Re: [MLlib] kmeans random initialization, same seed every time
Hi Julian,

Thanks for reporting this. This is a valid issue, and I created https://issues.apache.org/jira/browse/SPARK-19957 to track it.

Right now the seed defaults to this.getClass.getName.hashCode.toLong, which indeed stays the same across multiple fits. Feel free to leave your comments on the ticket or send a PR with a fix.

As a workaround for your problem, you can call .setSeed(new Random().nextLong()) before fit().

Thanks,
Yuhao

2017-03-14 5:46 GMT-07:00 Julian Keppel:
> I'm sorry, I left out some important information. I use Spark version 2.0.2
> with Scala 2.11.8.
>
> 2017-03-14 13:44 GMT+01:00 Julian Keppel:
>
>> Hi everybody,
>>
>> I am running some experiments with the Spark kmeans implementation of the
>> new DataFrame API. I compare clustering results of different runs with
>> different parameters. I noticed that for random initialization mode, the
>> seed value is the same every time. How is it calculated? In my
>> understanding the seed should be random if it is not provided by the user.
>>
>> Thank you for your help.
>>
>> Julian
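For readers who want the workaround spelled out, here is a minimal sketch in Scala. It assumes spark-mllib is on the classpath; the object name KMeansSeedWorkaround, the helper freshKMeans and the parameter k are made up for illustration and are not part of Spark.

```scala
import scala.util.Random
import org.apache.spark.ml.clustering.KMeans

// Hypothetical helper, not part of Spark: builds a KMeans estimator
// that gets a new seed each time it is created.
object KMeansSeedWorkaround {
  def freshKMeans(k: Int): KMeans =
    new KMeans()
      .setK(k)
      .setInitMode("random")            // random initialization, as in the question
      .setSeed(new Random().nextLong()) // overrides the fixed default seed
}
```

Calling KMeansSeedWorkaround.freshKMeans(3).fit(dataset) on a DataFrame with a "features" column should then start from different initial centers on each run. If a particular run has to be reproduced later, it may be worth logging the value returned by getSeed before fitting.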
Re: [MLlib] kmeans random initialization, same seed every time
I'm sorry, I left out some important information. I use Spark version 2.0.2 with Scala 2.11.8.

2017-03-14 13:44 GMT+01:00 Julian Keppel:
> Hi everybody,
>
> I am running some experiments with the Spark kmeans implementation of the
> new DataFrame API. I compare clustering results of different runs with
> different parameters. I noticed that for random initialization mode, the
> seed value is the same every time. How is it calculated? In my
> understanding the seed should be random if it is not provided by the user.
>
> Thank you for your help.
>
> Julian
[MLlib] kmeans random initialization, same seed every time
Hi everybody,

I am running some experiments with the Spark kmeans implementation of the new DataFrame API. I compare clustering results of different runs with different parameters. I noticed that for random initialization mode, the seed value is the same every time. How is it calculated? In my understanding the seed should be random if it is not provided by the user.

Thank you for your help.

Julian
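The observation above can be checked without fitting a model, by comparing the default seeds of two separately created estimators. This is only a sketch, assuming spark-mllib is on the classpath; the names DefaultSeedCheck and defaultSeeds are illustrative and not part of Spark.

```scala
import org.apache.spark.ml.clustering.KMeans

// Hypothetical helper, not part of Spark: returns the seeds of two
// estimators on which setSeed was never called.
object DefaultSeedCheck {
  def defaultSeeds(): (Long, Long) = {
    val first = new KMeans().setInitMode("random")
    val second = new KMeans().setInitMode("random")
    (first.getSeed, second.getSeed)
  }
}
```

If the two values returned by DefaultSeedCheck.defaultSeeds() are equal, the default seed is not being drawn at random, which matches the behaviour reported here.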