Hi All,
I have a "biggish" dataset (to use Gaƫl's terminology ;), 45K samples x 300
features, that I want to cluster. I have very heterogeneous features -- some
are continuous, others are quasi-continuous (high counts), others are
discrete (counts of rare events), others are angles (uniformly distributed
in [-pi, pi])... Is it kosher to use standard scaling and K-means on such a
dataset? What clustering method would you recommend?
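For concreteness, this is roughly the pipeline I had in mind (a minimal
sketch -- the array and the number of clusters are just placeholders):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    X = np.random.rand(45000, 300)  # placeholder for my real data

    # z-score every feature, then run plain K-means
    X_scaled = StandardScaler().fit_transform(X)
    labels = KMeans(n_clusters=10, random_state=0).fit_predict(X_scaled)

My worry is that z-scoring doesn't obviously make sense for the angular and
rare-count features, hence the question above.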
Additionally, there are some confounding factors that I want to account
for, as samples were processed in batches. What's the best way to deal with
this? Intuitively, I was going to scale each batch independently, but is
there a function/class within sklearn that will do this for me?
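Something like this loop is what I was picturing for the per-batch scaling
(rough sketch, assuming a `batches` array with one batch label per sample):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X = np.random.rand(45000, 300)            # placeholder data
    batches = np.random.randint(0, 5, 45000)  # placeholder batch labels

    # standardize each batch independently
    X_scaled = np.empty_like(X)
    for b in np.unique(batches):
        mask = batches == b
        X_scaled[mask] = StandardScaler().fit_transform(X[mask])

If there's an existing transformer in sklearn that does this in one step, I'd
rather use that.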
Thanks,
Juan.