Hi Andreas, Raghav and Jacob,

Thank you for your input. I have attached links to the final draft of the proposal. If anyone has further suggestions, I would be grateful to hear them and happy to incorporate them.

Thanks for your time.

Wiki proposal
<https://github.com/scikit-learn/scikit-learn/wiki/%5BGSoc-2016%5D-Better-Missing-Value-Handling-in-scikit-learn>

PDF proposal
<https://drive.google.com/file/d/0BzDDRCWPRL5Zd0FJVWlPX3FQVE0/view>

Regards,
Maniteja
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

On Wed, Mar 30, 2016 at 12:54 AM, Maniteja Nandana <maniteja.modesty...@gmail.com> wrote:

> Hi everyone,
>
> Thanks for the input. I have created a wiki page here
> <https://github.com/maniteja123/scikit-learn/wiki/Better-Missing-Value-Handling-in-scikit-learn>
> for the work planned on better handling of missing data, including
> the stalled PR on Matrix Factorization, KNN imputation, and some
> additional features as suggested above. Please have a look at it; I
> would be really grateful for any input or suggestions on the proposal,
> and please correct me if I have missed something.
>
> Thanks for your time.
>
> Best regards,
> Maniteja.
>
> On Sat, Mar 26, 2016 at 12:15 AM, Raghav R V <rag...@gmail.com> wrote:
>
>> Yes! Exactly the same!
>>
>> On Fri, Mar 25, 2016 at 6:21 PM, Maniteja Nandana <maniteja.modesty...@gmail.com> wrote:
>>
>>> Hi Raghav,
>>>
>>> Thanks a lot for the idea. I would be glad to work on it. Along with
>>> the "output dummy one-hot encoder features for imputer to specify if
>>> the feature value is imputed or not", would the idea to add a
>>> "binary indicator feature (for each possibly missing feature) that
>>> indicate feature was imputed", as suggested here
>>> <https://github.com/scikit-learn/scikit-learn/issues/6556>, probably
>>> be a nice and easy addition?
>>>
>>> Thanks,
>>> Maniteja.
>>>
>>> On Fri, Mar 25, 2016 at 9:25 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>>
>>>> On 03/25/2016 11:11 AM, Raghav R V wrote:
>>>> > Hey Maniteja,
>>>> >
>>>> > I took a look at your proposal. As I said before, I feel it is a bit
>>>> > broad and you should try to narrow it down to a good theme.
>>>> >
>>>> > Since you have chosen more than one PR that is missing-value
>>>> > related, I have a suggestion for a theme -
>>>> >
>>>> > "Better Missing Value Handling"
>>>> >
>>>> > You could group the KNN imputation, matrix factorization with missing
>>>> > values and *outputting dummy one-hot encoded features for imputer to
>>>> > specify if the feature value is imputed or not. Implementing these
>>>> > properly and merging should be sufficient for a GSoC I feel. As an
>>>> > optional thing, you could add another imputation strategy.
>>>> >
>>>> > *I'll raise an issue so you understand that better.
>>>> +1