Hi Burak,

k = 3
dimension = 785 features
Spark 1.4
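
For context, KMeansModel.computeCost returns the sum of squared distances from each point to its nearest cluster center. The snippet below is a minimal plain-Python sketch of that quantity, not Spark's implementation; the names `squared_distance` and `compute_cost` and the sample points are hypothetical, chosen only for illustration.

```python
# Sketch of the K-means cost: for every point, take the squared Euclidean
# distance to the closest center, then sum over all points.
# Illustrative only; Spark computes this in a distributed fashion over an RDD.

def squared_distance(point, center):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(point, center))


def compute_cost(points, centers):
    """Sum of squared distances of each point to its nearest center."""
    return sum(
        min(squared_distance(p, c) for c in centers)
        for p in points
    )


# Tiny made-up example: 3 points, 2 centers, 2 dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
cost = compute_cost(points, centers)
```

In this naive form, each point costs k × d squared differences, which for k = 3 and 785 features is 2,355 per point. Whether that arithmetic is what dominates the 16+ minutes is not something the sketch can tell; it only shows what is being summed.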

On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:

> Hi,
>
> How are you running K-Means? What is your k? What is the dimension of your
> dataset (columns)? Which Spark version are you using?
>
> Thanks,
> Burak
>
> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>
>> Hi,
>>
>> For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of
>> time (16+ minutes).
>>
>> Most of the time is spent in this task:
>>
>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>
>> Can this be improved?
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/

--
Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/