Hi Henry.
Please discuss issues like these on the mailing list.
Any one particular developer might not have time to respond.

Bair's SPC is just "make_pipeline(SelectKBest(), PCA(), LogisticRegression())". So I wouldn't say "it didn't make it through"; I'd rather say "it's already implemented".
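For concreteness, here is a runnable version of that one-liner. It is a minimal sketch. The univariate scorer (f_classif), k=10, n_components=2, the added StandardScaler and the breast-cancer dataset are illustrative choices, not taken from Bair et al.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Bair-style supervised principal components as a plain pipeline:
# 1) keep the features most associated with the target (univariate screening),
# 2) run ordinary PCA on that reduced feature set,
# 3) fit a linear model on the leading components.
spc = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=10),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(spc, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

Because each step is an ordinary estimator, k and n_components can be tuned with GridSearchCV like any other pipeline parameters.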

There is indeed no supervised PCA in scikit-learn. The paper does not seem well-established enough for inclusion in scikit-learn; see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published

The paper has 50 citations, which is not a lot. It is basically a classification or regression algorithm with some nice visualization properties. To be included, it would need to outperform more established approaches on a variety of datasets. I only skimmed the paper, but they don't even seem to compare against linear approaches like ridge or lasso.
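A minimal sketch of the kind of comparison meant here, assuming regression tasks. RidgeCV and LassoCV serve as the established baselines. The "candidate" is a Bair-style SelectKBest + PCA pipeline, used purely as a stand-in for whichever method is being proposed. The three datasets, the alpha grid, k and n_components are arbitrary illustrative choices. A real evaluation would need many more datasets, and real-world ones.

```python
from sklearn.datasets import load_diabetes, make_friedman1, make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A few small datasets (illustrative only).
datasets = {
    "diabetes": load_diabetes(return_X_y=True),
    "friedman1": make_friedman1(n_samples=300, n_features=10, random_state=0),
    "sparse_linear": make_regression(
        n_samples=300, n_features=50, n_informative=5, noise=10.0, random_state=0
    ),
}

# Established linear baselines vs. a stand-in "candidate" method.
models = {
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=[0.1, 1.0, 10.0, 100.0])),
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)),
    "candidate": make_pipeline(
        StandardScaler(),
        SelectKBest(f_regression, k=5),
        PCA(n_components=3),
        LinearRegression(),
    ),
}

results = {}
for data_name, (X, y) in datasets.items():
    for model_name, model in models.items():
        # Mean R^2 over 5 cross-validation folds; higher is better.
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
        results[(data_name, model_name)] = r2
        print(f"{data_name:>14} {model_name:>10} R^2 = {r2:.3f}")
```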

That doesn't mean it wouldn't be beneficial to create an open-source Python implementation that is scikit-learn compatible; again, see
http://scikit-learn.org/dev/faq.html#can-i-add-this-new-algorithm-that-i-or-someone-else-just-published
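As an illustration of what "scikit-learn compatible" means in practice, here is a minimal sketch of a transformer following the fit/transform API. It assumes the linear formulation of Barshan et al.'s supervised PCA as I understand it. That formulation finds an orthonormal U maximizing tr(U^T X^T H L H X U), where H centers the samples and L is a kernel on the labels. The class name SupervisedPCA is hypothetical. The delta kernel on class labels and the mean-centering in transform are my assumptions. This is not Barshan's code and not an existing scikit-learn API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class SupervisedPCA(TransformerMixin, BaseEstimator):
    """Hypothetical linear supervised PCA transformer (illustrative sketch).

    Finds an orthonormal projection U maximizing tr(U^T X^T H L H X U),
    where H = I - (1/n) 11^T centers the samples and L is a kernel on the
    labels. Here L is the delta kernel: L[i, j] = 1 if y[i] == y[j] else 0.
    """

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self.mean_ = X.mean(axis=0)
        # H @ X is X with its column means removed, and H is symmetric,
        # so X^T H L H X == Xc^T L Xc without forming the n x n matrix H.
        Xc = X - self.mean_
        L = (y[:, None] == y[None, :]).astype(float)
        Q = Xc.T @ L @ Xc
        # Q is symmetric positive semi-definite; eigh sorts eigenvalues ascending.
        eigvals, eigvecs = np.linalg.eigh(Q)
        top = np.argsort(eigvals)[::-1][: self.n_components]
        self.components_ = eigvecs[:, top].T  # (n_components, n_features)
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        check_is_fitted(self, "components_")
        X = check_array(X)
        # Project the (training-mean-centered) data onto the learned basis.
        return (X - self.mean_) @ self.components_.T


X, y = load_iris(return_X_y=True)
spca = SupervisedPCA(n_components=2)
Z = spca.fit_transform(X, y)
print(Z.shape)  # (150, 2)

# Because it follows the fit/transform API, it drops into a Pipeline.
pipe = make_pipeline(SupervisedPCA(n_components=2), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

With the delta kernel, Q reduces to a weighted between-class scatter matrix of rank at most n_classes - 1, so extra components carry no label information. Any kernelized or dual variants discussed in the paper are not covered by this sketch.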

Cheers,
Andy


On 12/08/2015 05:16 AM, Henry Lin wrote:
Hi all,

My name's Henry Lin, and I'm a Master's student at the University of Illinois at Urbana-Champaign. You might remember me from a few pull requests to scikit-learn (5431 <https://github.com/scikit-learn/scikit-learn/pull/5431> and 5825 <https://github.com/scikit-learn/scikit-learn/pull/5825>).

I've recently been doing research on embedding methods for classification, and one algorithm I've become interested in is supervised principal component analysis by Barshan et al. here <http://www.sciencedirect.com/science/article/pii/S0031320310005819>. (It's not the "supervised principal components" by Bair, Hastie et al. here <https://web.stanford.edu/%7Ehastie/Papers/spca_JASA.pdf>.)

I was wondering whether there would be any interest in adding supervised principal component analysis to scikit-learn. This was previously proposed with Bair's SPC in this pull request <https://github.com/scikit-learn/scikit-learn/pull/5196>, but it never made it through (the workflow was too similar to a scikit-learn pipeline). The work by Barshan, on the other hand, is completely different from Bair's, and I think it would be interesting to have a supervised version of PCA in scikit-learn. (To my knowledge, there is currently no supervised PCA in the library.)

I am currently in contact with Elnaz Barshan, and she has given me the code from her paper. Using her MATLAB code I've been able to reproduce some of her results, and with some time I'll be able to rewrite her work in Python. I'd just like some validation from the scikit-learn maintainers (such as yourselves ☺) on whether this would be a worthwhile investment of my time. It would require me to confirm with her that she would like to see her code released publicly, and then, of course, to implement it in Python to scikit-learn's standards.

What do you guys think?
-Henry Lin

--
Henry Lin, Research Assistant
M.S. Computer Science
University of Illinois at Urbana-Champaign 2016
847-769-8729 | hal...@illinois.edu

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
