I have been working on a project to derive a linkage matrix from the output of Spark's Bisecting K-means algorithm, so that the selection steps can be plotted as a dendrogram. I am having trouble producing valid indices when I use more than 3 or 4 clusters, and I am hoping someone has the time and interest to take a look.
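To make the index requirement concrete, here is a minimal Python sketch of the linkage-matrix convention used by SciPy. It is not the modified Spark code from this project. It assumes the bisection tree is already available as plain dictionaries, and that nodes carry heap-style ids (root 1, children 2i and 2i+1); the function name `to_linkage`, the input layout, and the sample sizes and heights are all made up for illustration. With n leaf clusters, the leaves must be renumbered 0..n-1 and row i of the matrix must create cluster n + i, so raw tree ids cannot be used directly.

```python
def to_linkage(children, heights, leaf_sizes, root=1):
    """Convert a binary tree into SciPy-style linkage rows.

    children:   {internal_node_id: (left_id, right_id)}
    heights:    {internal_node_id: value used for the distance column}
    leaf_sizes: {leaf_node_id: number of points in that leaf cluster}
    All three inputs are hypothetical structures for this sketch.
    """
    n = len(leaf_sizes)
    # Leaves get the compact ids 0..n-1, whatever their tree ids were.
    new_id = {leaf: i for i, leaf in enumerate(sorted(leaf_sizes))}
    size = dict(leaf_sizes)
    rows = []

    def visit(node):
        if node in new_id:  # leaf: already has a compact id
            return
        left, right = children[node]
        # Post-order: both children get their ids before the parent row.
        visit(left)
        visit(right)
        size[node] = size[left] + size[right]
        a, b = sorted((new_id[left], new_id[right]))
        rows.append([float(a), float(b), float(heights[node]), float(size[node])])
        # Row i (0-based) creates cluster n + i.
        new_id[node] = n + len(rows) - 1

    visit(root)
    return rows


# Made-up tree with 4 leaf clusters (tree ids 3, 4, 10, 11).
children = {1: (2, 3), 2: (4, 5), 5: (10, 11)}
heights = {1: 3.0, 2: 2.0, 5: 1.0}
leaf_sizes = {3: 50, 4: 30, 10: 40, 11: 30}

rows = to_linkage(children, heights, leaf_sizes)
for row in rows:
    print(row)
```

The post-order walk only guarantees that a cluster is never referenced before the row that forms it; it does not force the distance column to be non-decreasing, so the heights passed in would need to grow from the leaves toward the root for a clean plot. Converted to a NumPy array, rows of this shape are what `scipy.cluster.hierarchy.dendrogram` expects.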
To build the matrix, I modified the Bisecting K-means algorithm to produce a Z-linkage matrix, based on yu-iskw's work. I also changed it to write more information about the selection steps to the log at run time. Test outputs on the Iris dataset with both k = 3 and k = 10 clusters are in my Stack Overflow post: <https://stackoverflow.com/questions/49265521/bisecting-kmeans-cluster-indices-in-apache-spark>

The project so far (with a simple sbt build and the compiled jars) is in my GitHub repo, <https://github.com/GabeChurch/IncubatingProjects>, and is also described in the Stack Overflow post above.