This sentence was rather cluttered. Here I try again:

"A standard way to find out how many components you should use is to apply cross 
validation when computing the PCA model and then study where the cumulative 
explained variance flattens out."
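A minimal sketch of the "flattening out" check, assuming synthetic low-rank data in place of the real matrix. It reads the cumulative explained variance from scikit-learn's `IncrementalPCA` (`explained_variance_ratio_`) and applies a hypothetical stopping rule: keep components until the next one adds less than 1% of the total variance. These are training (calibration) values, not cross-validated ones.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
# Synthetic stand-in data (assumption): 8 strong latent factors plus noise
n_samples, n_features, true_rank = 500, 200, 8
X = rng.randn(n_samples, true_rank) @ rng.randn(true_rank, n_features)
X += 0.3 * rng.randn(n_samples, n_features)

ipca = IncrementalPCA(n_components=40, batch_size=100).fit(X)
ratios = ipca.explained_variance_ratio_  # sorted from largest to smallest
cumulative = np.cumsum(ratios)

# Hypothetical threshold: the curve counts as flat once a component adds < 1%
tol = 0.01
below = np.flatnonzero(ratios < tol)
n_keep = int(below[0]) if below.size else len(ratios)

print("cumulative explained variance:", np.round(cumulative[:12], 3))
print("components kept:", n_keep)
```

The threshold is arbitrary; looking at the printed curve (or a plot of it) is usually more informative than any single cut-off.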



OLI


 ---- On Wed, 14 Oct 2015 13:24:39 +0200 Oliver Tomic 
<oliverto...@zoho.com> wrote ----




Hi Luca,



It seems to me that you may be overfitting the data by using too many components. 
By including too many components, you may be modelling noise in your PCA model, 
which in turn may lead to poorer predictions with your OCSVM.



A standard way to find out how many components you should use is to apply cross 
validation when computing the PCA model and study where the increase in cumulative 
explained variance flattens out. I am not sure whether there is such a 
feature in scikit-learn, but the cumulative (validated) explained variance 
after each component may also give a good indication of when to stop including 
further components, namely when it starts to drop.
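One caveat for a do-it-yourself version: if you simply project held-out rows onto the first k training components, the held-out reconstruction error can only shrink as k grows (the subspaces are nested), so that naive "validated explained variance" never drops. As a swapped-in technique, the sketch below cross-validates the probabilistic PCA log-likelihood that scikit-learn's `PCA.score` returns, which does peak and then decline. The synthetic data and candidate list are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
# Synthetic stand-in data (assumption): 5 latent factors plus isotropic noise
n_samples, n_features, true_rank = 300, 50, 5
X = rng.randn(n_samples, true_rank) @ rng.randn(true_rank, n_features)
X += 0.5 * rng.randn(n_samples, n_features)

candidates = [1, 2, 3, 5, 8, 12, 20, 30]
cv_scores = []
for k in candidates:
    # With no scoring argument, cross_val_score calls PCA.score on each held-out
    # fold, i.e. the average log-likelihood under the probabilistic PCA model
    cv_scores.append(cross_val_score(PCA(n_components=k), X, cv=5).mean())

best_k = candidates[int(np.argmax(cv_scores))]
for k, s in zip(candidates, cv_scores):
    print(f"{k:3d} components: mean held-out log-likelihood {s:.2f}")
print("selected n_components:", best_k)
```

The component count with the highest mean held-out score is the natural choice; past that point the extra components mostly fit noise.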



OLI







---- On Wed, 14 Oct 2015 13:13:39 +0200 Luca Puggini 
<lucapug...@gmail.com> wrote ----




Hi, 

I am writing a fault detection system using OCSVM. 


I start with a huge matrix of shape (1006, 300000) and reduce its dimension 
with IncrementalPCA.
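In scikit-learn's `IncrementalPCA`, every batch passed to `partial_fit` must contain at least `n_components` rows, so with 1006 rows and 1000 components the data effectively has to be processed as a single batch. A minimal sketch of batched fitting, using a smaller random matrix as a stand-in for the real one (an assumption) and `sklearn.utils.gen_batches` to avoid a too-short final batch:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.utils import gen_batches

rng = np.random.RandomState(0)
# Smaller random stand-in for the real (1006, 300000) matrix (assumption)
X = rng.randn(1006, 2000)

n_components = 50
ipca = IncrementalPCA(n_components=n_components)

# min_batch_size folds a too-small final chunk into the previous batch:
# 1006 rows in chunks of 250 become 250, 250, 250, 256 instead of ending with 6
for batch in gen_batches(X.shape[0], 250, min_batch_size=n_components):
    ipca.partial_fit(X[batch])

Z = ipca.transform(X)  # (1006, 50) scores to feed into the OCSVM
```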




If I use 10 to 100 PCA components, I get a very good AUC score of around 0.97, while 
with 1000 components it drops to 0.5.
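A sketch of the evaluation loop behind such a comparison: fit the projection and the OCSVM on normal data only, then score a labelled test set for several values of `n_components`. The data, the fault model (a constant offset on every feature), `nu=0.05` and the component list are all assumptions, and the sketch is not meant to reproduce the 0.97 / 0.5 figures. Note that `gamma="scale"` sets the RBF width from the number and variance of the projected inputs, so the kernel itself changes with `n_components`.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.metrics import roc_auc_score
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
n_features, rank = 300, 5
loadings = rng.randn(rank, n_features)


def simulate(n, fault_shift=0.0):
    """Synthetic stand-in (assumption): low-rank process signal plus noise.

    fault_shift is a hypothetical fault: a constant offset added to every feature.
    """
    X = rng.randn(n, rank) @ loadings + rng.randn(n, n_features)
    return X + fault_shift


X_train = simulate(400)  # normal operation only, as OCSVM expects
X_test = np.vstack([simulate(200), simulate(200, fault_shift=1.0)])
y_test = np.r_[np.zeros(200), np.ones(200)]  # 1 = fault

aucs = {}
for k in (5, 20, 100, 250):
    ipca = IncrementalPCA(n_components=k).fit(X_train)
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    ocsvm.fit(ipca.transform(X_train))
    # decision_function is positive for inliers, so negate it to score faults
    fault_score = -ocsvm.decision_function(ipca.transform(X_test))
    aucs[k] = roc_auc_score(y_test, fault_score)

for k, auc in aucs.items():
    print(f"{k:4d} components: AUC = {auc:.3f}")
```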




Is it possible that incremental PCA becomes unstable when too many components 
are used? 

I cannot find another explanation for the drop in performance. As far as I 
know, OCSVM should scale well to high-dimensional datasets.
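One way to test the stability question directly is to fit both the exact `PCA` and `IncrementalPCA` on data small enough for exact PCA and compare them. A minimal sketch on synthetic data (an assumption; a subsample of the real matrix would take its place): it reports the largest gap between the two cumulative explained-variance curves and the cosines of the principal angles between the leading subspaces. Individual noise components with near-equal eigenvalues can legitimately differ between the two fits, which is why the comparison uses the leading subspace rather than single components.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.RandomState(0)
# Synthetic stand-in (assumption): 10 strong latent factors plus noise
X = rng.randn(1000, 10) @ rng.randn(10, 400) + 0.5 * rng.randn(1000, 400)

k = 20
exact = PCA(n_components=k, svd_solver="full").fit(X)
incremental = IncrementalPCA(n_components=k, batch_size=250).fit(X)

# Largest absolute gap between the two cumulative explained-variance curves
gap = np.max(np.abs(np.cumsum(exact.explained_variance_ratio_)
                    - np.cumsum(incremental.explained_variance_ratio_)))

# Cosines of the principal angles between the leading 10-dimensional subspaces
# (rows of components_ are orthonormal, so these are singular values of U1 @ U2.T)
lead = 10
cosines = np.linalg.svd(exact.components_[:lead] @ incremental.components_[:lead].T,
                        compute_uv=False)

print(f"max cumulative EVR gap: {gap:.2e}")
print(f"smallest principal-angle cosine: {cosines.min():.6f}")
```

Cosines close to 1 and a tiny gap suggest the incremental fit is not the source of the problem at that component count.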




The code is here http://jpst.it/CpqB 




Let me know.


Thanks!


-- 

Sent by mobile phone

------------------------------------------------------------------------------ 

_______________________________________________ 

Scikit-learn-general mailing list 

Scikit-learn-general@lists.sourceforge.net 

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general 