On 10/08/2012 10:59 PM, Diego Casado wrote:
Ahhhhhh
Now I've understood it! Thanks Andreas... OK, I was confused because the
/bench_k_means/ function in here
<http://scikit-learn.org/dev/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py>
used them to test some metrics and the performance of the model on the
data. OK!
So, no problem with the numpy vectors, since I obtained mine from a
topic distribution using gensim <http://radimrehurek.com/gensim/> and
an LDA model.
One last simple question: how can I know that I am using the right
number of iterations in k-means, or an appropriate value of K, etc.,
if I cannot run the metrics?
And to represent it visually, I guess I should just keep it simple with
a 2D PCA reduction, or is there any other dimensionality reduction?
Thanks a lot Andreas!
Setting the parameters for an unsupervised algorithm like clustering
without ground truth is a hard problem with no obvious solution.
For K-Means, you shouldn't worry too much about the number of
iterations, but picking the number of clusters is important.
You can try the silhouette coefficient:
http://scikit-learn.org/dev/modules/clustering.html#silhouette-coefficient
but you probably shouldn't rely on it too much.
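A rough sketch of how that comparison could look, assuming a recent
scikit-learn where KMeans and sklearn.metrics.silhouette_score are
available. The data here is a random stand-in for your topic
distributions (the 300 x 10 shape and the range of k are made up for
illustration), so only the pattern matters, not the numbers:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Stand-in data (assumption): 300 documents, each a distribution over
# 10 topics. Replace X with your own (n_samples, n_topics) array.
rng = np.random.RandomState(0)
X = rng.dirichlet(np.ones(10) * 0.3, size=300)

# Fit K-Means for several candidate cluster counts and record the
# mean silhouette coefficient for each one.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))

# The k with the highest score is a candidate, not a verdict.
best_k = max(scores, key=scores.get)
print("highest silhouette at k =", best_k)
```

Treat the highest-scoring k as a hint to check against visual
inspection, since the score tends to favour compact, well-separated
clusters.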
Visual inspection is always good to explore your data.
Start with PCA in two or three dimensions (see
http://scikit-learn.org/dev/auto_examples/datasets/plot_iris_dataset.html#example-datasets-plot-iris-dataset-py).
You can also try KernelPCA and the algorithms in the manifold module
(MDS is pretty slow, maybe stick with the rest),
but definitely start with PCA.
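For the PCA starting point, something along these lines should do.
Again the data is a random stand-in for your topic vectors, and
plot_2d is just a hypothetical helper name; it needs matplotlib and
is only defined here, not called:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data (assumption): 300 documents x 10 topic proportions.
rng = np.random.RandomState(0)
X = rng.dirichlet(np.ones(10) * 0.3, size=300)

# Project onto the first two principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)

# Fraction of the total variance kept by each of the two components.
print(pca.explained_variance_ratio_)


def plot_2d(points, labels=None):
    """Scatter plot of 2D points, optionally coloured by cluster label."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    plt.scatter(points[:, 0], points[:, 1], c=labels, s=10)
    plt.xlabel("first principal component")
    plt.ylabel("second principal component")
    plt.show()
```

Calling plot_2d(X_2d, labels) with the labels from K-Means colours
each point by its cluster. If the two components explain only a small
share of the variance, the picture can hide structure that exists in
the full space.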
I recently blogged about doing an animation but that is more of a
toy-thing ;)
http://peekaboo-vision.blogspot.de/2012/10/animating-random-projections-of-high.html
Cheers,
Andy