That sentence was rather cluttered. Let me try again:

"A standard way to find out how many components you should use is to apply cross-validation when computing the PCA model and then study where the cumulative explained variance flattens out."

OLI

---- On Wed, 14 Oct 2015 13:24:39 +0200 Oliver Tomic <oliverto...@zoho.com> wrote ----

Hi Luca,

it seems to me that you are overfitting the data by using too many components. By including too many components, you may be modelling noise in your PCA model, which in turn may lead to poorer predictions with your OCSVM.

A standard way to find out how many components you should use cross validation on when computing the PCA model and study when the increase in cumulative explained variance is flattening out.

I am not sure whether there is such a feature in scikit-learn, but the cumulative (validated) explained variance after each component may also give a good indication of when to stop including further components, that is, when it starts to drop.

OLI

---- On Wed, 14 Oct 2015 13:13:39 +0200 Luca Puggini <lucapug...@gmail.com> wrote ----

Hi,

I am writing a fault detection system using OCSVM. I start with a huge matrix (shape=(1006, 300000)) and reduce its dimension with IncrementalPCA. If I use 10 to 100 PCA components I get a very good AUC score of around 0.97, while with 1000 components it drops to 0.5.

Is it possible that incremental PCA becomes unstable when too many components are used? I cannot find another explanation for the drop in performance. As far as I know, OCSVM should be able to scale well to high-dimensional datasets.

The code is here: http://jpst.it/CpqB

Let me know. Thanks!

-- Sent by mobile phone
------------------------------------------------------------------------------
_______________________________________________ Scikit-learn-general mailing list Scikit-learn-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/scikit-learn-general