Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Luca Puggini
Thanks for all the answers. Then the fault is probably due to the overfitting of OCSVM. I was probably misled by the title of my reference paper, "Estimating the support of a high-dimensional distribution". Best, Luca On

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Gael Varoquaux
On Wed, Oct 14, 2015 at 01:18:19PM +, Luca Puggini wrote:
> I was expecting OCSVM to be not too much influenced by the increasing number of
> variables even if some of them are irrelevant.

I am not: it's based on an RBF kernel. These things are not well behaved in high dimensions.

Gaël
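One way to see the point about RBF kernels in high dimensions is to look at how the kernel values between random points behave as the number of features grows. The sketch below is only an illustration, not anything posted in the thread: the helper name `rbf_kernel_spread`, the standard-normal data, and the choice `gamma = 1 / n_features` are all assumptions made for the example.

```python
import math
import random


def rbf_kernel_spread(n_features, n_points=60, seed=0):
    """Return (mean, std) of RBF kernel values over all pairs of random points.

    Hypothetical helper for illustration: points are drawn from a standard
    normal distribution and gamma is set to 1 / n_features (an assumption).
    """
    rng = random.Random(seed)
    points = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
              for _ in range(n_points)]
    gamma = 1.0 / n_features

    values = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            # Squared Euclidean distance between points i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            values.append(math.exp(-gamma * sq_dist))

    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std


if __name__ == "__main__":
    for n_features in (2, 20, 500):
        mean, std = rbf_kernel_spread(n_features)
        print(n_features, round(mean, 3), round(std, 3))
```

As the dimension grows, pairwise distances concentrate, so the kernel values between distinct points become nearly identical and the kernel carries less contrast between "near" and "far" samples. Irrelevant features contribute to the distance just like relevant ones, which is one plausible reason a kernel-based OCSVM degrades when many variables are added.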

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Kyle Kastner
IncrementalPCA should get closer to "true" PCA as the number of components increases - so if anything the solution should be more stable rather than less. The difference mostly lies in the incremental processing - regular PCA with reduced components performs the full PCA, then only keeps a subset o
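A rough way to check the claim about IncrementalPCA and regular PCA is to fit both on the same data and compare the subspaces they find. This is a sketch under stated assumptions, not code from the thread: the synthetic low-rank data, the batch size, and the helper name `subspace_gap` are made up for illustration. Only `PCA`, `IncrementalPCA`, and their `components_` attribute come from scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.RandomState(0)
# Synthetic data (assumption): 5 strong latent directions plus weak noise,
# 500 samples and 20 features.
X = rng.randn(500, 5) @ rng.randn(5, 20) + 0.01 * rng.randn(500, 20)


def subspace_gap(n_components, batch_size=100):
    """Frobenius distance between the projectors onto the fitted subspaces.

    Comparing projectors avoids sign and ordering ambiguities in the
    individual components.
    """
    pca = PCA(n_components=n_components).fit(X)
    ipca = IncrementalPCA(n_components=n_components,
                          batch_size=batch_size).fit(X)
    proj_pca = pca.components_.T @ pca.components_
    proj_ipca = ipca.components_.T @ ipca.components_
    return np.linalg.norm(proj_pca - proj_ipca)


if __name__ == "__main__":
    for k in (2, 5, 10, 20):
        print(k, subspace_gap(k))
```

When all 20 components are kept, both models span the full feature space, so the gap is zero up to rounding error. For smaller values the gap depends on the data and on the batch size.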

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Luca Puggini
Thanks for the answer. I was expecting OCSVM to be not too much influenced by the increasing number of variables even if some of them are irrelevant. I am just wondering if the drop in performance is more likely to occur due to the overfitting of OCSVM or due to an unexpected behaviour of Incr

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread olologin
On 10/14/2015 02:28 PM, Oliver Tomic wrote: I am not sure whether there is such a feature in scikit-learn, but the cumulative (validated) explained variance after each component may also give a good indication of when to stop including further components. that is when it starts to drop. *expla

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Oliver Tomic
This sentence was rather cluttered. Here I try again: "A standard way to find out how many components you should include is to use cross validation when computing the PCA model and then study where the cumulative explained variance flattens out." OLI On Wed, 14 Oct 2015 13:24:39 +0200 Ol

Re: [Scikit-learn-general] Efficiency of Incremental PCA when n_components>>0

2015-10-14 Thread Oliver Tomic
Hi Luca, it seems to me that you are overfitting the data by using too many components? By including too many components it is possible that you are modelling noise in your PCA model, which in turn may lead to poorer predictions with your OCSVM. A standard way to find out how many compone
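The stopping rule described above can be sketched as a small helper that scans the validated cumulative explained variance and stops once an extra component no longer helps. This is a minimal sketch and not a scikit-learn feature: the function name `pick_n_components`, the `min_gain` tolerance, and the example numbers are hypothetical, and the validated values are assumed to have been computed beforehand (for example by cross validation).

```python
def pick_n_components(validated_cumulative, min_gain=0.01):
    """Return the number of components after which the curve flattens or drops.

    validated_cumulative[k - 1] is the cumulative explained variance, measured
    on held-out data, of a model with k components. min_gain is an assumed
    tolerance: the smallest improvement that still justifies one more component.
    """
    for k in range(1, len(validated_cumulative)):
        gain = validated_cumulative[k] - validated_cumulative[k - 1]
        if gain < min_gain:
            # Adding component k + 1 gains too little (or makes things worse).
            return k
    # The curve never flattened within the range that was tried.
    return len(validated_cumulative)


if __name__ == "__main__":
    # Made-up values for illustration only: the curve flattens after four
    # components and then starts to drop.
    curve = [0.40, 0.65, 0.80, 0.84, 0.845, 0.84]
    print(pick_n_components(curve))
```

On training data the cumulative explained variance can only increase with more components, so the flattening or drop is only informative when the values come from held-out data.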