Re: [MLlib] kmeans random initialization, same seed every time

2017-03-14 Thread Yuhao Yang
Hi Julian, Thanks for reporting this. This is a valid issue, and I created https://issues.apache.org/jira/browse/SPARK-19957 to track it. Right now the seed defaults to this.getClass.getName.hashCode.toLong, which indeed stays the same across multiple fits. Feel free to leave your comments
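To illustrate why the default seed never changes, here is a minimal sketch that needs no Spark on the classpath. It applies the same hashCode expression to a class-name string (assuming the estimator's runtime class name is org.apache.spark.ml.clustering.KMeans) and shows one possible workaround: deriving a distinct seed per run from a master seed. SeedSketch, defaultSeed and seedsForRuns are hypothetical names used only for this example.

```scala
import scala.util.Random

object SeedSketch {
  // Same expression as the default described above, applied to a class-name string.
  // A string's hashCode is deterministic, so this value is identical on every fit.
  def defaultSeed(className: String): Long = className.hashCode.toLong

  // Hypothetical helper: derive one seed per run from a single master seed,
  // so runs differ from each other but the whole experiment stays reproducible.
  def seedsForRuns(masterSeed: Long, runs: Int): Seq[Long] = {
    val rng = new Random(masterSeed)
    Seq.fill(runs)(rng.nextLong())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: this is the runtime class name of the DataFrame-API estimator.
    val name = "org.apache.spark.ml.clustering.KMeans"
    println(s"default seed: ${defaultSeed(name)}")
    seedsForRuns(42L, 3).foreach(s => println(s"run seed: $s"))
    // With Spark available, each seed would be passed explicitly, for example:
    //   new KMeans().setK(k).setInitMode("random").setSeed(seed).fit(df)
    // where k and df are your own cluster count and input DataFrame.
  }
}
```

Passing the seed explicitly through setSeed sidesteps the fixed default, and keeping the master seed around means the full set of runs can still be reproduced later.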

Re: [MLlib] kmeans random initialization, same seed every time

2017-03-14 Thread Julian Keppel
I'm sorry, I left out some important information: I use Spark version 2.0.2 with Scala 2.11.8.

2017-03-14 13:44 GMT+01:00 Julian Keppel:
> Hi everybody,
>
> I make some experiments with the Spark kmeans implementation of the new
> DataFrame-API. I compare clustering results of different runs with
>

[MLlib] kmeans random initialization, same seed every time

2017-03-14 Thread Julian Keppel
Hi everybody, I am running some experiments with the Spark kmeans implementation of the new DataFrame API. I compare clustering results of different runs with different parameters. I noticed that for random initialization mode, the seed value is the same every time. How is it calculated? In my unders