Re: [MLLib][Kmeans] KMeansModel.computeCost takes lot of time

Nirmal Fernando Mon, 13 Jul 2015 10:59:08 -0700

Hi Burak,

k = 3
dimension = 785 features
Spark 1.4


On Mon, Jul 13, 2015 at 10:28 PM, Burak Yavuz <brk...@gmail.com> wrote:

> Hi,
>
> How are you running K-Means? What is your k? What is the dimension of your
> dataset (columns)? Which Spark version are you using?
>
> Thanks,
> Burak
>
> On Mon, Jul 13, 2015 at 2:53 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>
>> Hi,
>>
>> For a fairly large dataset (30MB), KMeansModel.computeCost takes a lot of
>> time (16+ mins).
>>
>> Most of the time is spent in this task:
>>
>> org.apache.spark.rdd.DoubleRDDFunctions.sum(DoubleRDDFunctions.scala:33)
>> org.apache.spark.mllib.clustering.KMeansModel.computeCost(KMeansModel.scala:70)
>>
>> Can this be improved?
>>
>> --
>>
>> Thanks & regards,
>> Nirmal
>>
>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>> Mobile: +94715779733
>> Blog: http://nirmalfdo.blogspot.com/
>>
>>
>>
>


-- 

Thanks & regards,
Nirmal

Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
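For context, KMeansModel.computeCost returns the sum, over all points, of the squared Euclidean distance from each point to its nearest cluster center. The `DoubleRDDFunctions.sum` frame in the stack trace is the action that launches the job, so the time reported there covers the per-point distance computation and any upstream stages that have to run to produce the input RDD. The sketch below is a Spark-free illustration of that quantity only, written in plain Python; the function names `squared_distance` and `compute_cost` are hypothetical and are not part of the MLlib API.

```python
# Hypothetical, Spark-free sketch of the quantity KMeansModel.computeCost reports:
# the sum over all points of the squared Euclidean distance to the closest center.
# Names here are illustrative only; they are not MLlib functions.
from typing import Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_cost(points: Sequence[Vector], centers: Sequence[Vector]) -> float:
    """Sum of squared distances from each point to its nearest center.

    Each point is compared with every center, so the work is O(n * k * d)
    for n points, k centers and d dimensions, and every point is visited once.
    """
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    centers = [[0.0, 0.0], [10.0, 10.0]]
    points = [[1.0, 0.0], [0.0, 2.0], [10.0, 13.0]]
    # Nearest-center squared distances are 1, 4 and 9.
    print(compute_cost(points, centers))  # prints 14.0
```

With k = 3 and 785 features, as in this thread, each point costs roughly 3 x 785 multiply-adds, so the arithmetic itself is a single linear pass over the data.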
