Hey Maniteja,

I took a look at your proposal. As I said before, I feel it is a bit broad,
and you should try to narrow it down to a good theme.

Since you have chosen more than one PR related to missing values, I
have a suggestion for a theme:

"Better Missing Value Handling"

You could group the KNN imputation, matrix factorization with missing
values, and *outputting dummy one-hot encoded features from the imputer to
indicate whether each feature value was imputed. Implementing these properly
and getting them merged should be sufficient for a GSoC, I feel. As an
optional extra, you could add another imputation strategy.

*I'll raise an issue so you understand that better.
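
To make the starred item a bit more concrete, here is a rough sketch of the
idea in plain Python. It is only an illustration under my own assumptions:
the helper name impute_with_indicator is hypothetical (not a scikit-learn
API), it uses mean imputation, it emits one indicator column per input
feature, and it falls back to 0.0 for a column with no observed values.

```python
import math


def impute_with_indicator(rows):
    """Mean-impute NaNs per column and append 0/1 "was imputed" columns.

    `rows` is a list of equal-length lists of floats; NaN marks a missing
    value. Hypothetical helper for illustration only.
    """
    n_features = len(rows[0])

    # Column means computed over the observed (non-NaN) entries only.
    means = []
    for j in range(n_features):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: a fully missing column is filled with 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)

    result = []
    for row in rows:
        # 1.0 where the original value was missing, 0.0 otherwise.
        mask = [1.0 if math.isnan(v) else 0.0 for v in row]
        filled = [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        result.append(filled + mask)
    return result


nan = float("nan")
X = [[1.0, nan],
     [3.0, 4.0],
     [nan, 8.0]]
X_out = impute_with_indicator(X)
# Each output row is [feature_0, feature_1, imputed_0, imputed_1].
```

The point of the extra columns is that a downstream estimator can still see
which values were originally missing, even after they have been filled in.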

Thanks,

Raghav R V



On Wed, Mar 23, 2016 at 5:46 PM, Maniteja Nandana <
maniteja.modesty...@gmail.com> wrote:

> Hi Raghav,
>
> Thanks a lot for your reply. That helps so much.
>
> I understand that the proposal should be specific to a module, but right
> now I am not sure which of these implementations are the most sought-after.
> I will update the proposal based on the inputs.
>
> I have also looked at the stalled PRs for Metric learning NCA and Matrix
> Completion for missing values, but they are heavy on math. If they are of
> utmost importance, I would gladly spend time reading through the reference
> papers.
>
> I would really appreciate any other feedback on this proposal.
>
> Thank you again for your time!
>
> Best regards,
> Maniteja.
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> On 23 Mar 2016 5:27 pm, "Raghav R V" <rag...@gmail.com> wrote:
>
>> Hey Maniteja,
>>
>> Having taken a quick look at the list... my thoughts -
>>
>> * The KNN imputation is an important addition that got stalled.
>> * The semi-supervised NB with EM seems like a good addition; Olivier,
>> Larsmans (and Joel?) will have to comment on whether it should be a priority.
>> * The haversine metric is tagged "easy".
>> * "Meta-estimator for semi-supervised learning" is not hard, but I believe
>> it is API heavy and would involve devoting a considerable amount of time to
>> API discussions...
>> * "Label power set multilabel classification strategy" doesn't look like
>> a priority.
>> * I am not very sure if infomax ICA had much interest among core devs.
>> * *I think* people were pretty interested in Metric Learning NCA and
>> Matrix completion with missing values, but I believe they are math heavy.
>> Make sure you can handle that! Ping Olivier if you need more information.
>>
>> Also, please note that the proposal needs to have a central theme like
>> "Improvements in linear models" or "Improvements in tree models", and you
>> should propose to complete the stalled PRs under that theme...
>>
>> Thanks for the mail! Good luck on your proposal! Please note that the
>> deadline is on 25th of this month!
>>
>> Raghav
>>
>> On Mon, Mar 21, 2016 at 7:35 PM, Maniteja Nandana <
>> maniteja.modesty...@gmail.com> wrote:
>>
>>> Hello everyone,
>>>
>>> My name is Maniteja, a senior year computer science student from India (
>>> github <https://github.com/maniteja123>).
>>> It has been a wonderful learning opportunity contributing to the library
>>> for the past few months, and I would like to thank everyone for their support
>>> and for patiently answering my questions. I am really eager to contribute more
>>> to the best of my abilities. Since it was proposed to work on existing PRs, I
>>> have also added a more detailed version here
>>> <https://github.com/maniteja123/scikit-learn/wiki/Various-enhancements-to-scikit-learn>
>>>
>>> I wanted to seek feedback on the following issues and PRs. If any of
>>> the authors of the following PRs are interested in working on their PRs,
>>> please let me know. I am sorry for not asking prior permission; I couldn't
>>> contact each of you and also didn't want to create noise by commenting on
>>> all the PRs. I hope you understand. If it is okay for me to try working on
>>> these, please let me know your opinions and suggestions.
>>>
>>> Semi-supervised Naive Bayes using Expectation Maximization  #430
>>> <https://github.com/scikit-learn/scikit-learn/pull/430>
>>> Meta estimator for self trained model #1243
>>> <https://github.com/scikit-learn/scikit-learn/issues/1243>
>>> Use Bayesian priors in Nearest Neighbors classifier #399
>>> <https://github.com/scikit-learn/scikit-learn/issues/399> #970
>>> <https://github.com/scikit-learn/scikit-learn/pull/970>
>>> Classifier Chain for multi-label problems PRs: #3727
>>> <https://github.com/scikit-learn/scikit-learn/pull/3727> #4759
>>> <https://github.com/scikit-learn/scikit-learn/issues/4759>
>>> Label power set multilabel classification strategy PRs: #2461
>>> <https://github.com/scikit-learn/scikit-learn/pull/2461>
>>> Multioutput bagging  #4848
>>> <https://github.com/scikit-learn/scikit-learn/pull/4848>
>>> Added 'average' option to passive aggressive classifier/regressor. #4939
>>> <https://github.com/scikit-learn/scikit-learn/pull/4939>
>>> Add "grouped" option to Scaler classes: #4963
>>> <https://github.com/scikit-learn/scikit-learn/pull/4963>
>>> Metric precision at k score #4975
>>> <https://github.com/scikit-learn/scikit-learn/4975>
>>> Implement haversine metric in pairwise #4458
>>> <https://github.com/scikit-learn/scikit-learn/pull/4458> #4453
>>> <https://github.com/scikit-learn/scikit-learn/issues/4453>
>>> Add KNN strategy for imputation #4844
>>> <https://github.com/scikit-learn/scikit-learn/pull/4844>
>>> Add resample to preprocessing. #1454
>>> <https://github.com/scikit-learn/scikit-learn/pull/1454> #6568
>>> <https://github.com/scikit-learn/scikit-learn/issues/6568>
>>> Added metrics support for multiclass-multioutput classification #3681
>>> <https://github.com/scikit-learn/scikit-learn/pull/3681>
>>> random neural network algorithm #4703
>>> <https://github.com/scikit-learn/scikit-learn/pull/4703>
>>>
>>> Thank you for your time. I look forward to hearing back from you!
>>>
>>> Yours sincerely,
>>> Maniteja.