K-means requires the number of clusters to be specified in advance. There
are several algorithms (X-means, ART, ...) that do not need this
information. Unfortunately, none of them is implemented in MLlib at the
moment (you could lend a hand and help the community).
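
To illustrate the first point, here is a minimal pure-Python sketch of
Lloyd's k-means. It is not MLlib code: the function name `kmeans`, its
parameters and the toy points are made up for illustration. Note that `k`
has to be passed in by the caller; a common workaround is to run it for
several values of `k` and compare the resulting cost.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(group):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))


def kmeans(points, k, seed=0, iterations=20):
    """Lloyd's algorithm; k is an input, it is not discovered."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers depend on the seed
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    # Within-cluster sum of squared distances for the final centers.
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


# Hypothetical toy data: two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# Sweep over candidate values of k and compare the costs by eye.
costs = {k: kmeans(points, k)[1] for k in (1, 2, 3, 6)}
for k, cost in costs.items():
    print(k, round(cost, 3))
```

The cost only ever shrinks as `k` grows (it reaches zero when every point is
its own cluster), so the sweep shows where adding clusters stops paying off
but does not pick a value of `k` for you.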

Anyway, it seems to me that you will not be satisfied with those
algorithms (X-means, ART, ...) either. As I understand it, what you want to
obtain is the exact number of clusters. Note that whenever you change the
input parameters (random seed, ...), the number of clusters may be different.
Clustering is a great tool, but it won't give you the one true answer (a
single number).
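
As a rough illustration of that instability, the sketch below uses a simple
threshold-based "leader" clustering. It is neither X-means nor ART, just a
small stand-in that likewise does not take the number of clusters as input.
The function name `leader_clustering`, the threshold and the toy data are
assumptions chosen for the example.

```python
import random


def leader_clustering(points, threshold):
    """Scan points in order: a point joins an existing leader if one lies
    within `threshold`, otherwise it becomes a new leader (a new cluster)."""
    leaders = []
    for p in points:
        if not any(abs(p - q) <= threshold for q in leaders):
            leaders.append(p)
    return leaders


data = [0.0, 1.5, 3.0]

# Same data, same threshold, only the processing order differs.
clusters_a = leader_clustering([0.0, 1.5, 3.0], threshold=1.5)
clusters_b = leader_clustering([1.5, 0.0, 3.0], threshold=1.5)
print(len(clusters_a), len(clusters_b))

# In practice the order often comes from a seeded shuffle, so the
# cluster count can change with the seed.
for seed in range(3):
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    print(seed, shuffled, len(leader_clustering(shuffled, 1.5)))
```

Neither result is wrong; each is a valid answer for the order the points
happened to arrive in.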


regards, Tomas



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Clustering-text-data-with-MLlib-tp20883p20899.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.