Hi Luca.
If you grant comment permissions on the Google doc, I could comment
in-place, which might be helpful.
As I think was noted earlier, the current PLS already implements
NIPALS. What would the addition be? Using it for PCA? That is not
entirely clear from the proposal.
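For concreteness, NIPALS for PCA would essentially be a power iteration
with deflation; here is a minimal NumPy sketch of my understanding (my
own illustration, not code from the proposal):

    import numpy as np

    def nipals_pca(X, n_components, max_iter=500, tol=1e-6):
        # Extract leading components one at a time by power iteration,
        # deflating X after each one, so no full SVD is ever formed.
        X = X - X.mean(axis=0)
        scores, loadings = [], []
        for _ in range(n_components):
            t = X[:, 0].copy()              # initial score vector
            for _ in range(max_iter):
                p = np.dot(X.T, t)          # project X onto the scores
                p /= np.linalg.norm(p)      # normalized loading vector
                t_new = np.dot(X, p)        # updated score vector
                if np.linalg.norm(t_new - t) < tol:
                    t = t_new
                    break
                t = t_new
            X = X - np.outer(t, p)          # deflate extracted component
            scores.append(t)
            loadings.append(p)
        return np.array(scores).T, np.array(loadings).T

If the proposal is just this iteration inside PCA, it would help to
spell out what it buys over the existing solvers.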
I think implementing this together with the other paper you mention will
take more than one or two weeks.
Please keep in mind that it needs tests, documentation, examples and
reviews.
The "massive parallel" paper only has 8 citations since 2013. That seems
pretty low impact and not very established.
Unsupervised Feature Selection Using Feature Similarity seems a much
safer bet (800 cites since 2002), though I am not familiar enough with
the area to say whether it is still competitive with the state of the
art, or still useful.
Feature Subset Selection and Ranking for Data Dimensionality Reduction
seems borderline with 120 cites since 2007.
I haven't actually had time to check the papers (yet?); this is just a
very superficial first review.
Instead of focusing on many algorithms, I think you should also allocate
some time to ensuring that we have good evaluation metrics and
cross-validation support for multi-output algorithms where Y might be an
input to transform (I am not sure for how many of these algorithms that
is the case).
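For instance, something as simple as the snippet below should be well
supported and tested for multi-output Y (a minimal sketch against the
current API; cross_val_score lives in sklearn.cross_validation today):

    import numpy as np
    from sklearn.cross_validation import cross_val_score
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    Y = np.dot(X[:, :2], rng.randn(2, 3)) + 0.1 * rng.randn(100, 3)

    # This works because PLSRegression has predict()/score(), but
    # transformers that need Y at transform() time do not fit the
    # scorer and Pipeline contracts as cleanly.
    scores = cross_val_score(PLSRegression(n_components=2), X, Y, cv=5)
    print(scores.mean())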
How is the multi-task lasso that you are proposing different from the
one already implemented in scikit-learn?
http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.MultiTaskLasso.html#sklearn.linear_model.MultiTaskLasso
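For reference, a minimal usage sketch of that estimator on toy data; the
point of the l2/l1 penalty is that the selected features are shared
across tasks:

    import numpy as np
    from sklearn.linear_model import MultiTaskLasso

    rng = np.random.RandomState(0)
    X = rng.randn(60, 8)
    W = np.zeros((8, 3))
    W[:3] = rng.randn(3, 3)              # only the first 3 features matter
    Y = np.dot(X, W) + 0.01 * rng.randn(60, 3)

    mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)
    # coef_ has shape (n_tasks, n_features); entire feature columns are
    # zeroed out, i.e. the sparsity pattern is shared across the tasks.
    print(np.abs(mtl.coef_).sum(axis=0))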
The project sounds great; the hardest part might be finding the right
mentor (Gael?).
Cheers,
Andy
On 03/06/2015 07:57 PM, Luca Puggini wrote:
Thanks a lot for the material on randomized PCA and random forests; it
will certainly help me in my research.
I talked with my supervisor and he said that I am free to apply for
this summer project.
I have used GAMs quite a lot and have done some work on high-dimensional
fault detection systems (and hence on metrics), but apparently these
topics are already taken.
My understanding from the previous emails is that NIPALS PCA may be of
interest. On the same topic, I have a couple of algorithms that I think
could be useful:
1- Sparse principal component analysis via regularized low-rank matrix
approximation.
http://www.sciencedirect.com/science/article/pii/S0047259X07000887
This is essentially the NIPALS equivalent for sparse PCA. It is more
efficient for high-dimensional problems because it avoids the initial
SVD (a rough sketch of the update is below, after point 2).
2- Feature Subset Selection and Ranking for Data Dimensionality
Reduction. http://eprints.whiterose.ac.uk/1947/1/weihl3.pdf
This is a method for unsupervised feature selection. It is similar to
sparse PCA, but it is optimized to maximize the percentage of explained
variance with respect to the number of selected variables (see the toy
sketch below).
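Roughly, the core of (1) is an alternating rank-one update with a soft
threshold on the loadings. A simplified sketch of how I would implement
it (not the exact procedure from the paper):

    import numpy as np

    def soft_threshold(a, lam):
        return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

    def sparse_pc_rank_one(X, lam, max_iter=200, tol=1e-6):
        # One sparse component via a regularized rank-one approximation:
        # alternate a sparse loading update with a score update,
        # NIPALS-style, with no initial SVD.
        u = X[:, 0] / np.linalg.norm(X[:, 0])
        for _ in range(max_iter):
            v = soft_threshold(np.dot(X.T, u), lam)  # sparse loadings
            if not np.any(v):
                break                    # penalty zeroed everything out
            Xv = np.dot(X, v)
            u_new = Xv / np.linalg.norm(Xv)
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        # For further components, deflate: X -= np.outer(u, np.dot(u, X)).
        return u, v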
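And for (2), a toy illustration of the idea of selecting features by
explained variance (my own sketch of a greedy forward search, not the
paper's exact procedure):

    import numpy as np

    def greedy_variance_selection(X, n_select):
        # Greedily add the column whose span best reconstructs all of X,
        # measuring the fraction of total variance explained.
        X = X - X.mean(axis=0)
        total = (X ** 2).sum()
        selected = []
        for _ in range(n_select):
            best_j, best_explained = None, -np.inf
            for j in range(X.shape[1]):
                if j in selected:
                    continue
                S = X[:, selected + [j]]
                B = np.linalg.lstsq(S, X)[0]  # least-squares fit of X on S
                explained = 1.0 - ((X - np.dot(S, B)) ** 2).sum() / total
                if explained > best_explained:
                    best_j, best_explained = j, explained
            selected.append(best_j)
        return selected, best_explained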
If these topics are not of interest, I would be happy to work on
- improving GMM, or
- global-optimization-based hyperparameter optimization.
I am not familiar with these two topics, but they are close to my
research area, so I would be happy to study them.
Now my understanding is that the staff should contact me to discuss the
various topics further. Please feel free to contact me at my private
email; I am happy to share my CV and my Python code (research-quality
code).
Thanks a lot,
Luca