Hi everyone,

Thanks for the input. I have created a wiki page here
<https://github.com/maniteja123/scikit-learn/wiki/Better-Missing-Value-Handling-in-scikit-learn>
for the work proposed on better handling of missing data, including the
stalled PR on matrix factorization, KNN imputation, and some additional
features suggested earlier in this thread. Please have a look at it. I
would be really grateful for any input or suggestions on the proposal, and
please correct me in case I have missed something.
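
To make the indicator idea quoted further down in this thread concrete, here
is a rough sketch of what "impute, then append a binary feature per possibly
missing column" could look like. This is only an illustration under my own
assumptions: the function name `impute_with_indicators`, the plain-Python
mean strategy, and the 0.0 fallback for fully missing columns are all
hypothetical and are not scikit-learn's API.

```python
import math


def impute_with_indicators(rows):
    """Mean-impute NaNs per column and append 0/1 'was imputed' columns.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    n_cols = len(rows[0])

    # Column means over the observed (non-NaN) values only.
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    # Only columns with at least one missing value get an indicator.
    missing_cols = [j for j in range(n_cols)
                    if any(math.isnan(r[j]) for r in rows)]

    out = []
    for r in rows:
        filled = [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        flags = [1.0 if math.isnan(r[j]) else 0.0 for j in missing_cols]
        out.append(filled + flags)
    return out, missing_cols


X = [[1.0, 2.0], [float("nan"), 4.0], [3.0, 6.0]]
X_out, cols = impute_with_indicators(X)
```

Here the first column has one missing entry, so it is filled with the mean of
the observed values and a single extra column records which row was imputed;
the second column has no missing values and gets no indicator.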

Thanks for your time.

Best regards,
Maniteja.
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

On Sat, Mar 26, 2016 at 12:15 AM, Raghav R V <rag...@gmail.com> wrote:

> Yes! Exactly the same!
>
> On Fri, Mar 25, 2016 at 6:21 PM, Maniteja Nandana <
> maniteja.modesty...@gmail.com> wrote:
>
>> Hi Raghav,
>>
>> Thanks a lot for the idea. I would be glad to work on it. Along with
>> the "output dummy one-hot encoder features for imputer to specify if the
>> feature value is imputed or not", would the idea of adding a "binary
>> indicator feature (for each possibly missing feature) that indicate
>> feature was imputed", as suggested here
>> <https://github.com/scikit-learn/scikit-learn/issues/6556>, be a nice
>> and easy addition?
>>
>> Thanks,
>> Maniteja.
>>
>> On Fri, Mar 25, 2016 at 9:25 PM, Andreas Mueller <t3k...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On 03/25/2016 11:11 AM, Raghav R V wrote:
>>> > Hey Maniteja,
>>> >
>>> > I took a look at your proposal. As I said before I feel it is a bit
>>> > broad and you should try to narrow it down to a good theme.
>>> >
>>> > Since you have chosen more than one PR related to missing values,
>>> > I have a suggestion for a theme -
>>> >
>>> > "Better Missing Value Handling"
>>> >
>>> > You could group the knn imputation, matrix factorization with missing
>>> > values and *outputting dummy one-hot encoded features for imputer to
>>> > specify if the feature value is imputed or not. Implementing these
>>> > properly and merging should be sufficient for a GSoC I feel. As an
>>> > optional thing, you could add another imputation strategy.
>>> >
>>> > *I'll raise an issue so you understand that better.
>>> +1