Re: Does the kFold in Spark always give you the same split?
Are you using SGD for logistic regression? It has a random element too, by nature. I looked into the code and see that you can't set a seed, but the sampling is actually done with a fixed seed per partition anyway. Hm.

In general you would not expect these algorithms to produce the same result every time, given their stochastic nature. In this particular case, I'm not sure whether you can, or should be able to, get the implementation to act deterministically. Even if the overt use of randomness is seedable, some non-determinism in the distributed processing itself may be having an effect.

On Fri, Jan 30, 2015 at 7:27 PM, Jianguo Li wrote:
> Thanks. I did specify a seed parameter.
>
> It seems the problem is not caused by kFold. I ran another experiment
> without cross validation: I just built a model with the training data and
> then tested it on the test data. However, the accuracy still varies from
> one run to another. Interestingly, this only happens when I run the
> experiment on our cluster. If I run it on my local machine, I can
> reproduce the result each time. Has anybody encountered a similar issue
> before?
>
> Thanks,
>
> Jianguo

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
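One possible source of the cluster-only variation discussed in this thread, offered as an illustration rather than a diagnosis: floating-point addition is not associative, so a total assembled from per-partition partial sums can differ in its last bits depending on how the values are grouped. The plain-Python sketch below does not use Spark at all; `partitioned_sum` and the sample values are hypothetical, chosen only for the demonstration.

```python
from functools import reduce
import operator


def partitioned_sum(partitions):
    """Sum each partition left to right, then combine the partial sums."""
    # reduce with operator.add performs plain floating-point additions in order
    partials = [reduce(operator.add, part, 0.0) for part in partitions]
    return reduce(operator.add, partials, 0.0)


# The same three values, grouped two different ways (hypothetical data).
grouping_a = [[0.1, 0.2], [0.3]]
grouping_b = [[0.1], [0.2, 0.3]]

total_a = partitioned_sum(grouping_a)
total_b = partitioned_sum(grouping_b)

print(total_a, total_b)
print("identical:", total_a == total_b)
```

A tiny difference like this in an aggregated gradient could, over many iterations, move a borderline prediction and change the measured accuracy. Whether that is what happens in the experiment described here is an assumption, not something verified against Spark.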
Re: Does the kFold in Spark always give you the same split?
Thanks. I did specify a seed parameter.

It seems the problem is not caused by kFold. I ran another experiment without cross validation: I just built a model with the training data and then tested it on the test data. However, the accuracy still varies from one run to another. Interestingly, this only happens when I run the experiment on our cluster. If I run it on my local machine, I can reproduce the result each time. Has anybody encountered a similar issue before?

Thanks,

Jianguo

On Fri, Jan 30, 2015 at 11:22 AM, Sean Owen wrote:
> Have a look at the source code for MLUtils.kFold. Yes, there is a
> random element. That's good; you want the folds to be randomly chosen.
> Note there is a seed parameter, as in a lot of the APIs, that lets you
> fix the RNG seed and so get the same result every time, if you need
> to.
Re: Does the kFold in Spark always give you the same split?
Have a look at the source code for MLUtils.kFold. Yes, there is a random element. That's good; you want the folds to be randomly chosen. Note there is a seed parameter, as in a lot of the APIs, that lets you fix the RNG seed and so get the same result every time, if you need to.

On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li wrote:
> Hi,
>
> I am using the utility function kFold provided in Spark for doing k-fold
> cross validation using logistic regression. However, each time I run the
> experiment, I get a different result. Since everything else stays
> constant, I was wondering if this is due to the kFold function I used. Does
> anyone know if the kFold gives you a different split on a data set each time
> you call it?
>
> Thanks,
>
> Jianguo
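To make the point about the seed concrete, here is a minimal sketch in plain Python. It is not Spark's MLUtils.kFold and does not reproduce its sampling; `k_fold_indices` is a hypothetical helper that only shows how fixing an RNG seed makes a random fold assignment repeatable.

```python
import random


def k_fold_indices(n, num_folds, seed):
    """Return (train, validation) index lists for each of num_folds folds."""
    rng = random.Random(seed)  # private RNG, so the seed alone determines the split
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into num_folds validation folds.
    folds = [sorted(indices[k::num_folds]) for k in range(num_folds)]
    splits = []
    for k, validation in enumerate(folds):
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, validation))
    return splits


same_a = k_fold_indices(12, 3, seed=42)
same_b = k_fold_indices(12, 3, seed=42)

# Calling twice with the same seed yields the same folds.
print("repeatable:", same_a == same_b)
```

Passing a different seed would normally produce a different assignment, which is the behaviour you want when the folds are meant to be chosen at random.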
Does the kFold in Spark always give you the same split?
Hi,

I am using the utility function kFold provided in Spark for doing k-fold cross validation using logistic regression. However, each time I run the experiment, I get a different result. Since everything else stays constant, I was wondering if this is due to the kFold function I used. Does anyone know if the kFold gives you a different split on a data set each time you call it?

Thanks,

Jianguo