2013/1/11 <[email protected]>: > > BTW: When doing a RandomizedPCA, the explained variance of the first > component increase to 78% > * Turning whiten on or off has more or less no influence on the explained > variance. > > * However, plotting with class labels on => again no clear differentiation > between the two classes :(
It just means that your data is not linearly separable when you project it
onto the first 2 dimensions of PCA. This is no big deal though. Not all
problems are as easy as iris classification :)

What you can also try is to plot the histogram of each feature. For features
that are highly non-Gaussian (e.g. with a long tail), you should try a
sublinear scaling of them: `np.sign(x_i) * np.log1p(np.abs(x_i))` instead of
`x_i`, or alternatively `np.sign(x_i) * np.sqrt(np.abs(x_i))`. If a histogram
shows a multimodal profile then maybe percentile binning would help too.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
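
A minimal sketch of the suggestions above, assuming a numeric feature matrix
`X` and class labels `y` (the variable names and the toy data are
illustrative, not from the original thread):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Toy data standing in for the real dataset: one long-tailed feature,
# one roughly Gaussian feature, and arbitrary binary class labels.
rng = np.random.RandomState(0)
X = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.5, size=500),
    rng.normal(loc=0.0, scale=1.0, size=500),
])
y = rng.randint(0, 2, size=500)

# 1) Plot the histogram of each feature to spot long tails or multimodality.
fig, axes = plt.subplots(1, X.shape[1], figsize=(8, 3))
for j, ax in enumerate(np.atleast_1d(axes)):
    ax.hist(X[:, j], bins=50)
    ax.set_title("feature %d" % j)

# 2) Sublinear scaling of long-tailed features, keeping the sign.
X_log = np.sign(X) * np.log1p(np.abs(X))
X_sqrt = np.sign(X) * np.sqrt(np.abs(X))

# 3) Percentile binning of a multimodal feature (here: deciles of feature 0).
edges = np.percentile(X[:, 0], np.linspace(0, 100, 11))
binned = np.digitize(X[:, 0], edges[1:-1])

# 4) Re-check the first two PCA dimensions with class labels after rescaling.
X_pca = PCA(n_components=2).fit_transform(X_log)
plt.figure()
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.show()
```

Whether the log or sqrt scaling works better can be checked by simply
re-plotting the per-feature histograms after the transform.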
