K-means does need the number of clusters to be specified in advance. Several algorithms (X-means, ART, ...) do not need this information. Unfortunately, none of them is implemented in MLlib at the moment (you could lend a hand and help the community).
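To illustrate that k is an input rather than something k-means discovers, here is a minimal sketch in plain Python. It is not MLlib's implementation; the helper names (`kmeans`, `sq_dist`, `mean`) and the toy points are made up for illustration. It runs a basic Lloyd's iteration for a few values of k and two seeds and prints the within-cluster cost, which the algorithm reports for whatever k it is given without ever telling you which k is "right".

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(group) for coord in zip(*group))


def kmeans(points, k, seed, iterations=20):
    """Basic Lloyd's algorithm; the caller must supply k."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # starting centers depend on the seed
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    # Within-cluster sum of squared distances for the final centers.
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


# Hypothetical toy data: nine distinct 2-D points.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
    (1.0, 9.0), (0.5, 8.0), (1.5, 8.5),
]

if __name__ == "__main__":
    for k in (2, 3, 4):
        for seed in (0, 1):
            _, cost = kmeans(points, k, seed)
            print(f"k={k} seed={seed} cost={cost:.3f}")
```

Comparing the printed costs across k is a common heuristic for picking a value, but it remains a judgment call, not an answer produced by the algorithm.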
Anyway, it seems to me that you would not be satisfied with those algorithms (X-means, ART, ...) either. As I understand it, what you want is a precise number of clusters. Note that whenever you change the input parameters (random seed, ...), the number of clusters might differ. Clustering is a great tool, but it won't give you the one true answer (a single number).

Regards,
Tomas