reduce resource requirements. I found that Canopy Clustering helps with that,
but I could not find anything equivalent to it in Spark. Is something
available, or is it planned for a future release?
Please let me know. Thank you.
I want to run the k-means of MLlib on a big dataset. It seems that for big
datasets we need to perform pre-clustering methods such as canopy clustering:
by starting with an initial clustering, the number of more expensive distance
measurements can be significantly reduced by ignoring points outside of the
initial canopies.
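For context, the canopy pass described above can be sketched on a single machine as follows. This is only an illustration of the technique (with the usual loose/tight thresholds T1 > T2 and Euclidean distance standing in for the "cheap" metric), not a Spark or MLlib API:

```python
import math
import random

def canopy_clustering(points, t1, t2):
    """Group points into (possibly overlapping) canopies.

    t1 > t2: a point within t2 of a canopy center is claimed by that
    canopy and removed from the candidate pool; a point within t1 is
    added to the canopy but may still join or seed other canopies.
    """
    assert t1 > t2, "loose threshold t1 must exceed tight threshold t2"
    pool = list(points)
    canopies = []
    while pool:
        # Pick an arbitrary remaining point as the next canopy center.
        center = pool.pop(random.randrange(len(pool)))
        canopy = [center]
        remaining = []
        for p in pool:
            d = math.dist(center, p)
            if d < t1:
                canopy.append(p)   # inside the loose radius: join canopy
            if d >= t2:
                remaining.append(p)  # outside tight radius: stays in pool
        pool = remaining
        canopies.append((center, canopy))
    return canopies
```

A subsequent k-means run would then only need to compute expensive distances between points and centroids that share a canopy, which is where the resource savings come from.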