Indeed, it sounds interesting, but I'd still be curious how it compares against elastic net.

On 07/29/2015 05:41 PM, Stylianos Kampakis wrote:
Hi Andreas,

Sure. Actually, the purpose of the model is both regularization and dimensionality reduction for problems where the number of features can be larger than the number of instances (or, in any case, where there is a large number of features). It is particularly effective when there are many attributes that are highly correlated with each other.

L1 regularization breaks down in the presence of many correlated features: it tends to pick one feature from each correlated group more or less arbitrarily and discard the rest. L2 handles correlation better, but it ignores the cluster structure of highly correlated attributes. Supervised PCA is particularly well suited to these kinds of problems, and in the experiments in the paper it seems to outperform partial least squares.
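
To make it concrete, the core of the procedure is easy to sketch in numpy/scikit-learn terms. This is a rough illustration only, not the implementation I'd contribute; the fixed threshold below is a stand-in for one chosen by cross-validation:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression


def supervised_pca(X, y, threshold=2.0, n_components=1):
    # Step 1: score each feature by its univariate association with the
    # response (the standardized regression coefficient of Bair et al.).
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.dot(Xc.T, yc) / np.sqrt((Xc ** 2).sum(axis=0))

    # Step 2: keep only the features whose absolute score exceeds the
    # threshold (in practice the threshold is picked by cross-validation).
    selected = np.abs(scores) > threshold

    # Step 3: ordinary PCA on the retained features, then regress the
    # response on the leading "supervised" components.
    pca = PCA(n_components=n_components)
    components = pca.fit_transform(Xc[:, selected])
    model = LinearRegression().fit(components, y)
    return model, pca, selected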

I actually came upon this algorithm while trying to find a way to analyze GPS data gathered from the training sessions of a professional football team. Ridge logistic regression didn't give good results, and neither did LASSO, but supervised PCA worked well. It can also be used purely for dimensionality reduction, producing components that correlate with the response.
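
As for the comparison to elastic net: I haven't run a systematic benchmark yet, but a harness along these lines would make one straightforward (the synthetic data below is just a stand-in for an n << p problem with correlated features):

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV
from sklearn.model_selection import cross_val_score

# Low effective rank gives clusters of correlated features, with many
# more features than samples -- roughly the regime described above.
X, y = make_regression(n_samples=60, n_features=500, n_informative=20,
                       effective_rank=15, noise=5.0, random_state=0)

for name, est in [("elastic net", ElasticNetCV()),
                  ("lasso", LassoCV()),
                  ("ridge", RidgeCV())]:
    score = cross_val_score(est, X, y, cv=5).mean()
    print("%s: mean CV R^2 = %.3f" % (name, score))

Once supervised PCA is wrapped in a scikit-learn estimator with fit/predict, it could be dropped into the same loop.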

The work was presented at MathSport International 2015 (http://www.mathsportinternational2015.com/uploads/2/2/2/4/22242920/mathsport2015proceedings.pdf).

I am not sure how popular the method is in general, but for me it is going to be one of the standard tools in the presence of a large number of variables.

Best regards,
Stelios

2015-07-28 19:16 GMT+01:00 Andreas Mueller <t3k...@gmail.com>:

    Hi Stylianos.

    Can you give a bit more background on the model?
    It seems fairly well-cited but I haven't really seen it in practice.
    Is it still state of the art?
    The main purpose seems to be a particular type of regularization,
    right, not supervised dimensionality reduction?
    How does this compare against elastic net? There seems to be some
    comparison to PLS and lasso in the paper.

    It would be good to see that this is a widely useful method before
    adding it to sklearn.

    Cheers,
    Andy



    On 07/24/2015 06:40 AM, Stylianos Kampakis wrote:
    Dear all,

    I am thinking of contributing a new model to the library:
    supervised principal components analysis by Bair et al. (2006).

    I wanted to get in touch before contributing to make sure no-one
    else is working on that algorithm, since this is what the site
    recommends.

    Cheers,
    S. Kampakis


    