On 04/03/2015 01:39 PM, Sebastian Raschka wrote:
> Was there a reason to implement the SGDClassifier with shuffling after each 
> epoch rather than selecting a random training sample in each iteration? In 
> terms of computational efficiency, it shouldn't make much of a difference 
> whether you generate a random number at each iteration or shuffle the 
> training set once per epoch, right?
Actually, we added the default shuffling recently.
Arguably, the "standard" implementation of SGD is to shuffle the data 
once before training, and never again.
Convergence proofs are usually done assuming i.i.d. sampling of points, 
though the plots in the papers are typically produced by shuffling, either 
once at the beginning or after each epoch, as that tends to work better in 
practice.
If you look at talks by Francis Bach or Shalev-Shwartz, they often mention 
this, and I've also heard them say that they don't know why shuffling works 
better when the theory calls for i.i.d. sampling.
I've actually seen a talk by Shalev-Shwartz where he was surprised that he 
had to shuffle between iterations to get the convergence he had proved.

The reason you want to shuffle at least once at the beginning is that SGD 
breaks if the data is sorted, say, by label.
We benchmarked shuffling once vs. shuffling every epoch, and the runtime was 
basically the same, so we opted for shuffling every epoch.
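
In scikit-learn itself this is controlled by SGDClassifier's shuffle 
parameter; roughly, the three options discussed above look like this (toy 
data, illustrative only):

    # Hedged sketch: SGDClassifier's shuffle parameter and sklearn.utils.shuffle
    # are real; the toy data and variable names are illustrative only.
    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.utils import shuffle

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2) + 2, rng.randn(100, 2) - 2])
    y = np.array([0] * 100 + [1] * 100)   # deliberately sorted by label

    # shuffle=False on label-sorted data: every epoch sees all of class 0,
    # then all of class 1 -- exactly the case where SGD breaks
    clf_sorted = SGDClassifier(shuffle=False, random_state=0).fit(X, y)

    # the current default (shuffle=True): reshuffle the training set each epoch
    clf_default = SGDClassifier(shuffle=True, random_state=0).fit(X, y)

    # the "standard" alternative: shuffle once up front, then never again
    X_once, y_once = shuffle(X, y, random_state=0)
    clf_once = SGDClassifier(shuffle=False, random_state=0).fit(X_once, y_once)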

Maybe mblondel or larsmans can give better answers, though ;)

Hth,
Andy
