What are the other parameters? Are you just setting k=3? What about the number of runs? How many partitions do you have? How many cores does your machine have?
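For reference, computeCost returns the sum of squared distances from each point to its nearest center, so with k=3 the arithmetic per point is small, which is why I am asking about partitions and cores. A minimal plain-Python sketch of that quantity follows; it is not Spark's implementation, and the name `kmeans_cost` and the toy data are made up for illustration:

```python
def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        # Squared distance from p to every center; keep the smallest one.
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


# Toy data (hypothetical): three 2-D points and two centers.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]

print(kmeans_cost(points, centers))
```

The work grows as (number of points) x k x (dimension), so for a 30MB dataset with 785 features and k=3 the computation itself should be modest.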
Thanks,
Burak

On Mon, Jul 13, 2015 at 10:57 AM, Nirmal Fernando <nir...@wso2.com> wrote:

> Hi Burak,
>
> k = 3
> dimension = 785 features
> Spark 1.4
>
> On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
>> Hi,
>>
>> How are you running K-Means? What is your k? What is the dimension of
>> your dataset (columns)? Which Spark version are you using?
>>
>> Thanks,
>> Burak
>>
>> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> For a fairly large dataset, 30MB, KMeansModel.computeCost takes a lot of
>>> time (16+ minutes).
>>>
>>> It takes a lot of time at this task:
>>>
>>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>>
>>> Can this be improved?
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>
> --
>
> Thanks & regards,
> Nirmal