2013/1/11 <[email protected]>: > > BTW: When doing a RandomizedPCA, the explained variance of the first > component increase to 78% > * Turning whiten on or off has more or less no influence on the explained > variance. > > * However, plotting with class labels on => again no clear differentiation > between the two classes :(
It just means that your data is not linearly separable when you project it
onto the first 2 dimensions of PCA. This is no big deal though. Not all
problems are as easy as iris classification :)

What you can also try is to plot the histogram of each feature. For features
that are highly non-Gaussian (e.g. with a long tail), you should try a
sublinear scaling of them: `np.sign(x_i) * np.log1p(np.abs(x_i))` instead of
`x_i`, or alternatively `np.sign(x_i) * np.sqrt(np.abs(x_i))`. If a histogram
shows a multimodal profile then maybe percentile binning would help too.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
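
A minimal sketch of the suggestions above, assuming a numeric feature matrix
`X` and class labels `y` (the variable names and the toy data are
illustrative, not from the original thread):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Toy data standing in for the real dataset: one long-tailed feature,
# one roughly Gaussian feature, and arbitrary binary class labels.
rng = np.random.RandomState(0)
X = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.5, size=500),
    rng.normal(loc=0.0, scale=1.0, size=500),
])
y = rng.randint(0, 2, size=500)

# 1) Plot the histogram of each feature to spot long tails or multimodality.
fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 3))
for j, ax in enumerate(np.atleast_1d(axes)):
    ax.hist(X[:, j], bins=50)
    ax.set_title("feature %d" % j)

# 2) Sublinear scaling of long-tailed features, keeping the sign.
X_log = np.sign(X) * np.log1p(np.abs(X))
X_sqrt = np.sign(X) * np.sqrt(np.abs(X))

# 3) Percentile binning of a multimodal feature (here: deciles of feature 0).
edges = np.percentile(X[:, 0], np.linspace(0, 100, 11))
binned = np.digitize(X[:, 0], edges[1:-1])

# 4) Re-check the first two PCA dimensions with class labels after rescaling.
X_pca = PCA(n_components=2).fit_transform(X_log)
plt.figure()
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.show()
```

Whether the log or sqrt scaling works better can be checked by simply
re-plotting the per-feature histograms after the transform.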
