Thanks. I did specify a seed parameter. It seems that the problem is not caused by kFold: I ran another experiment without cross validation, where I just built a model with the training data and then tested the model on the test data. The accuracy still varies from one run to another. Interestingly, this only happens when I run the experiment on our cluster. When I run it on my local machine, I can reproduce the result each time. Has anybody encountered a similar issue before?
Thanks,
Jianguo

On Fri, Jan 30, 2015 at 11:22 AM, Sean Owen <[email protected]> wrote:
> Have a look at the source code for MLUtils.kFold. Yes, there is a
> random element. That's good; you want the folds to be randomly chosen.
> Note there is a seed parameter, as in a lot of the APIs, that lets you
> fix the RNG seed and so get the same result every time, if you need
> to.
>
> On Fri, Jan 30, 2015 at 4:12 PM, Jianguo Li <[email protected]> wrote:
> > Hi,
> >
> > I am using the utility function kFold provided in Spark for doing k-fold
> > cross validation using logistic regression. However, each time I run the
> > experiment, I get a different result. Since everything else stays
> > constant, I was wondering if this is due to the kFold function I used.
> > Does anyone know if kFold gives you a different split on a data set each
> > time you call it?
> >
> > Thanks,
> >
> > Jianguo
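
For reference, here is a minimal sketch of the seed parameter mentioned in the quoted reply. It is not the code from the experiment above: the object name SeededKFold, the helper validationFolds, the toy data set of 1 to 100, the partition count, and the local[2] master are all assumptions made for illustration. It calls MLUtils.kFold twice with the same seed and checks whether the validation folds match.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.util.MLUtils

// Hypothetical example object; names and data are illustrative only.
object SeededKFold {
  // Returns the sorted validation elements of each fold for a given seed.
  def validationFolds(sc: SparkContext, seed: Int): Seq[Seq[Int]] = {
    // Toy data with an explicit partition count, so both calls see the
    // same partitioning and element order.
    val data = sc.parallelize(1 to 100, numSlices = 4)
    MLUtils.kFold(data, numFolds = 5, seed = seed)
      .map { case (_, validation) => validation.collect().sorted.toSeq }
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumed local master; on a cluster the master is set by the launcher.
    val conf = new SparkConf().setAppName("SeededKFold").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val first = validationFolds(sc, seed = 42)
      val second = validationFolds(sc, seed = 42)
      println(s"same folds with same seed: ${first == second}")
    } finally sc.stop()
  }
}
```

As far as I understand, the sampling behind kFold is applied partition by partition, so a fixed seed is only expected to give the same folds when the input RDD is partitioned and ordered the same way on every run. That may be worth checking when a job is repeatable locally but not on a cluster.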
