On 10/08/2012 10:59 PM, Diego Casado wrote:
Ahhhhhh

Now I've understood it! Thanks Andreas... OK, I was confused because the /bench_k_means/ function here <http://scikit-learn.org/dev/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py> used them to compute some metrics and measure how well the model fits the data. OK!

So, no problem with the numpy vectors, since I obtained mine from a topic distribution using gensim <http://radimrehurek.com/gensim/> and an LDA model. One last simple remark: how can I know that I am using the right number of iterations in k-means, or an appropriate K, etc., if I cannot run the metrics? And to represent it visually, I guess I should just keep it simple with a 2D PCA reduction, or is there any other dimension-scaling method?

Thanks a lot Andreas!

Setting the parameters for an unsupervised algorithm like clustering without ground truth
is a hard problem with no obvious solution.
For K-Means, you shouldn't worry too much about the number of iterations, but picking the number of clusters
is important.
You can try the silhouette coefficient:
http://scikit-learn.org/dev/modules/clustering.html#silhouette-coefficient
but you probably shouldn't rely on it too much.
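
As a rough sketch of how the silhouette coefficient could be used to compare several values of K: the data below is synthetic (make_blobs stands in for your topic vectors), and the helper name silhouette_by_k and the range of K are made up for illustration. Treat the scores as a hint, not a verdict.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit K-Means once per candidate k and return {k: mean silhouette}.

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Mean silhouette over all samples; values lie in [-1, 1],
        # higher means tighter, better separated clusters.
        scores[k] = silhouette_score(X, labels)
    return scores


# Assumption: synthetic data in place of real LDA topic distributions.
X, _ = make_blobs(n_samples=300, centers=4, n_features=10, random_state=0)

scores = silhouette_by_k(X, range(2, 8))
best_k = max(scores, key=scores.get)

for k in sorted(scores):
    print("k=%d  silhouette=%.3f" % (k, scores[k]))
print("highest silhouette at k=%d" % best_k)
```

If the curve is flat or has several similar peaks, that is a sign the score alone cannot settle the choice of K, so combine it with visual inspection.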

Visual inspection is always good to explore your data.

Start with PCA in two or three dimensions (see http://scikit-learn.org/dev/auto_examples/datasets/plot_iris_dataset.html#example-datasets-plot-iris-dataset-py). You can also try KernelPCA and the algorithms in the manifold module (MDS is pretty slow, so maybe stick with the rest),
but definitely start with PCA.
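
A minimal sketch of the projection step, assuming the input is an array of per-document topic proportions. The Dirichlet samples below are only a stand-in for real LDA output, and the sizes (200 documents, 20 topics) and the RBF kernel are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

# Assumption: random topic distributions in place of real gensim/LDA output.
# Each row sums to 1, like a document's topic proportions.
rng = np.random.RandomState(0)
X = rng.dirichlet(alpha=np.ones(20), size=200)

# Linear projection to 2D; this is the baseline to look at first.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of the total variance kept by the two components.
# A small value means the 2D picture hides most of the structure.
kept = pca.explained_variance_ratio_.sum()
print("variance kept by 2 PCA components: %.2f" % kept)

# Nonlinear alternative with an RBF kernel (default gamma).
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)

print(X_pca.shape, X_kpca.shape)
```

Each resulting array has one 2D point per document, so it can be passed to a scatter plot and colored by the K-Means label to see whether the clusters look separated.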

I recently blogged about doing an animation but that is more of a toy-thing ;)
http://peekaboo-vision.blogspot.de/2012/10/animating-random-projections-of-high.html

Cheers,
Andy
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
