Hi all, I am facing the same issue. With k running from 60 to 120 in steps of 10 (i.e. k = 60, 70, 80, ..., 120), the algorithm takes around 2.5 hours on an 800 MB data set with 38 dimensions.
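For a rough sense of scale, here is a minimal sketch (plain Python, not Spark code) of how the work in such a sweep adds up. It assumes the textbook Lloyd-style iteration, in which every point is compared against every center, so one iteration costs roughly k * d multiply-adds per point. The function name is hypothetical, and the actual MLlib implementation and iteration counts may differ.

```python
def sweep_cost_per_point(k_values, dims):
    """Approximate multiply-adds per data point, per iteration, summed over
    one k-means run for each k. Assumes a brute-force comparison of each
    point against every center (an assumption, not MLlib's exact behavior)."""
    return sum(k * dims for k in k_values)


# k = 60, 70, ..., 120, as in the sweep described above
k_values = list(range(60, 121, 10))
dims = 38  # dimensionality of the data set mentioned above

total_k = sum(k_values)                       # combined number of centers across all runs
cost = sweep_cost_per_point(k_values, dims)   # multiply-adds per point, per iteration

print(k_values)
print(total_k)
print(cost)
```

Under these assumptions, the seven runs together cost about as much per iteration as a single run with k = 630.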
On Sun, Mar 29, 2015 at 9:34 AM, davidshen84 [via Apache Spark User List] <ml-node+s1001560n2227...@n3.nabble.com> wrote:

> Hi Jao,
>
> Sorry to revive this old thread. I am having the same problem you did.
> I want to know if you have figured out how to improve k-means on Spark.
>
> I am using Spark 1.2.0. My data set is about 270k vectors, each with about
> 350 dimensions. If I set k=500, the job takes about 3 hours on my cluster. The
> cluster has 7 executors, each with 8 cores...
>
> If I set k=5000, which is the required value for my task, the job goes on
> forever...
>
> Thanks,
> David

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Why-KMeans-with-mllib-is-so-slow-tp20480p26467.html