Hi Andreas,

Sure. The purpose of the model is actually both regularization and
dimensionality reduction, for problems where the number of features can be
larger than the number of instances (or, more generally, where there is a
large number of features). It is particularly effective when many of the
attributes are highly correlated with each other.

L1 regularization breaks down when there are many strong correlations: the
lasso tends to keep one attribute from a group of correlated attributes and
drop the rest. L2 deals better with this problem, but ignores the presence of
clusters of highly correlated attributes. Supervised PCA is particularly well
suited to this kind of problem, and in the paper it appears to outperform
partial least squares.
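
For anyone who, like Andy, has not seen the method in practice, here is a
minimal sketch of supervised PCA as I understand it from Bair et al. (2006):
score every feature by its standardized univariate regression coefficient
against the response, keep only the features whose absolute score exceeds a
threshold, take the leading principal component(s) of that reduced matrix, and
regress the response on them. The function names `supervised_pca_fit` and
`supervised_pca_predict`, the synthetic data and the threshold of 8.0 are
illustrative assumptions, not an existing scikit-learn API; in the paper the
threshold is chosen by cross-validation rather than fixed by hand.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression


def supervised_pca_fit(X, y, threshold, n_components=1):
    """Sketch of supervised PCA (after Bair et al., 2006), hypothetical API."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Univariate score s_j = x_j^T y / ||x_j|| on centred data
    norms = np.sqrt((Xc ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # constant features get a score of 0
    scores = np.abs(Xc.T @ yc) / norms
    # Screening step: keep only features above the threshold
    mask = scores > threshold
    if mask.sum() < n_components:
        raise ValueError("threshold keeps fewer features than n_components")
    # Unsupervised PCA, but only on the screened features
    pca = PCA(n_components=n_components).fit(X[:, mask])
    # Ordinary regression of y on the leading component(s)
    reg = LinearRegression().fit(pca.transform(X[:, mask]), y)
    return mask, pca, reg


def supervised_pca_predict(model, X):
    mask, pca, reg = model
    X = np.asarray(X, dtype=float)
    return reg.predict(pca.transform(X[:, mask]))


# Synthetic p > n data: a cluster of 20 highly correlated features driven by
# a latent variable that also drives y, plus 180 pure-noise features.
rng = np.random.default_rng(0)
n, p = 40, 200
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :20] += 3 * latent[:, None]
y = 2 * latent + 0.1 * rng.normal(size=n)

model = supervised_pca_fit(X, y, threshold=8.0)
y_hat = supervised_pca_predict(model, X)
```

The correlated cluster is summarized by a single component instead of being
split up (as the lasso would) or shrunk together with the noise features (as
ridge would), which is the situation described above.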

I actually came across this algorithm while trying to find a way to analyze
GPS data gathered during the training sessions of a professional football
team. Neither ridge nor LASSO logistic regression gave good results, but
supervised PCA worked well. It can also be used purely for dimensionality
reduction, producing components that correlate with the response.
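
Used purely as a dimensionality-reduction step (for example in front of a
logistic regression, as with the football data), the same idea could be
wrapped as a scikit-learn-style transformer. The sketch below is
hypothetical: `SupervisedPCATransformer` is not an existing scikit-learn
class, and the synthetic data and threshold grid are made up for
illustration. It scores features by absolute Pearson correlation with the
(0/1) response, keeps those above `threshold`, and returns the leading
principal components of the kept features, so the threshold can be tuned with
`GridSearchCV`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class SupervisedPCATransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: keep features whose absolute correlation
    with y exceeds `threshold`, then project them onto their leading
    principal components."""

    def __init__(self, threshold=0.3, n_components=1):
        self.threshold = threshold
        self.n_components = n_components

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        x_norms = np.sqrt((Xc ** 2).sum(axis=0))
        x_norms[x_norms == 0] = 1.0  # constant features get a score of 0
        y_norm = np.sqrt((yc ** 2).sum()) or 1.0
        # Absolute Pearson correlation of every feature with the response
        self.scores_ = np.abs(Xc.T @ yc) / (x_norms * y_norm)
        self.mask_ = self.scores_ > self.threshold
        if self.mask_.sum() < self.n_components:
            # Fall back to the top-scoring features so PCA always has input
            top = np.argsort(self.scores_)[-self.n_components:]
            self.mask_[top] = True
        self.pca_ = PCA(n_components=self.n_components).fit(X[:, self.mask_])
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return self.pca_.transform(X[:, self.mask_])


# Synthetic two-class data: 15 correlated features share a latent variable
# with the label; the other 285 features are noise.
rng = np.random.default_rng(1)
n, p = 60, 300
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :15] += 2 * latent[:, None]
y = (latent + 0.3 * rng.normal(size=n) > 0).astype(int)

pipe = Pipeline([
    ("spca", SupervisedPCATransformer(n_components=1)),
    ("clf", LogisticRegression()),
])
search = GridSearchCV(pipe, {"spca__threshold": [0.2, 0.4, 0.6]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Within a single fit, the correlation is just the Bair et al. style score
divided by the norm of the centred response, so the ranking of features is
the same; using the correlation keeps the threshold on a 0 to 1 scale, which
stays comparable across cross-validation folds of different sizes.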

The work was presented at MathSport International 2015; the proceedings are at
http://www.mathsportinternational2015.com/uploads/2/2/2/4/22242920/mathsport2015proceedings.pdf

I am not sure how popular this method is in general, but for me it is going
to be one of the standard methods to use when there are many variables.

Best regards,
Stelios

2015-07-28 19:16 GMT+01:00 Andreas Mueller <t3k...@gmail.com>:

>  Hi Stylianos.
>
> Can you give a bit more background on the model?
> It seems fairly well-cited but I haven't really seen it in practice.
> Is it still state of the art?
> The main purpose seems to be a particular type of regularization, right,
> not supervised dimensionality reduction?
> How does this compare against elastic net? There seems to be some
> comparison to PLS and lasso in the paper.
>
> It would be good to see that this is a widely useful method before adding
> it to sklearn.
>
> Cheers,
> Andy
>
>
>
> On 07/24/2015 06:40 AM, Stylianos Kampakis wrote:
>
> Dear all,
>
> I am thinking of contributing a new model to the library: the supervised
> principal component analysis of Bair et al. (2006).
>
> I wanted to get in touch before contributing, as the site recommends, to
> make sure no one else is working on that algorithm.
>
>  Cheers,
> S. Kampakis
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
