MLlib: Does Spark support Canopy Clustering?

2019-04-02 Thread Alok Bhandari
Hello all,

I am interested in using the bisecting k-means algorithm implemented in Spark.
While using bisecting k-means, I found that some of my clustering requests
on large datasets failed with OOM issues.

Since the dataset size is expected to be large, I wanted to use some
pre-processing steps to reduce resource requirements. I found that Canopy
Clustering helps with that, but I could not find anything equivalent to it in
Spark. Is something available, or is it planned for a future release?

Please let me know. Thank you.


canopy clustering

2014-11-10 Thread amin mohebbi
I want to run MLlib's k-means on a big dataset. It seems that for big
datasets we need to apply a pre-clustering method such as canopy clustering.
Starting from an initial clustering, the number of more expensive distance
measurements can be significantly reduced by ignoring points outside of the
initial canopies.
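
For readers unfamiliar with the idea, here is a minimal single-machine sketch
of canopy clustering in plain Python. It is not part of Spark or MLlib; the
function names, the thresholds t1 and t2, and the sample points are all
assumptions made for illustration. In practice the canopy step would use a
cheap approximate distance, with the expensive distance reserved for points
that share a canopy.

```python
import math
import random


def euclidean(a, b):
    """Distance used for the canopy pass (a cheap metric in real use)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_clusters(points, t1, t2, distance=euclidean, seed=0):
    """Group points into possibly overlapping canopies.

    t1 is the loose threshold and t2 the tight threshold (t1 > t2).
    Both names are illustrative, not taken from any Spark API.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick an arbitrary remaining point as the next canopy center.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                # Within the loose threshold: the point joins this canopy.
                members.append(p)
            if d >= t2:
                # Outside the tight threshold: it may still seed or join
                # another canopy later.
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Hypothetical sample data: two well-separated groups.
sample_points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (10.0, 10.0), (10.3, 9.8)]
canopies = canopy_clusters(sample_points, t1=3.0, t2=1.0)
```

A later k-means pass would then only compute exact distances between a point
and the centers that fall in the same canopy.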

If I am not mistaken, MLlib's k-means offers three initialization
modes: k-means++, k-means||, and random.
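
As a point of comparison, here is a minimal sketch of k-means++ seeding in
plain Python; k-means|| is generally described as a parallel variant of the
same idea. This is not MLlib's implementation, and the function names and
sample data are hypothetical. It only illustrates that this kind of
initialization chooses starting centers and does not by itself skip any
distance computations in later iterations.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers with k-means++ style weighted sampling."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest center
        # chosen so far, so far-away points are more likely to be picked.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Hypothetical sample data.
sample = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (10.0, 10.0), (10.3, 9.8)]
seeds = kmeans_pp_seeds(sample, k=2)
```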

So, can anyone explain whether we can use k-means|| instead of canopy
clustering? Or do these two methods work completely differently?

 
 

Best Regards 

... 

Amin Mohebbi 

PhD candidate in Software Engineering  
 at university of Malaysia   

Tel : +60 18 2040 017 



E-Mail : tp025...@ex.apiit.edu.my 

  amin_...@me.com

canopy clustering

2014-11-09 Thread aminn_524
I want to run MLlib's k-means on a big dataset. It seems that for big
datasets we need to apply a pre-clustering method such as canopy clustering.
Starting from an initial clustering, the number of more expensive distance
measurements can be significantly reduced by ignoring points outside of the
initial canopies.

If I am not mistaken, MLlib's k-means offers three initialization
modes: k-means++, k-means||, and random.

So, can anyone explain whether we can use k-means|| instead of canopy
clustering? Or do these two methods work completely differently?



