Hi, all, 

I have a question about the implementation of stochastic gradient descent in 
the SGDClassifier. Based on the documentation (at 
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html)
 it sounds like the n_iter parameter refers to an entire pass over the 
training dataset, which is shuffled after each epoch via `shuffle=True`. 

This corresponds to a stochastic gradient descent algorithm like the one below 
(from Wikipedia):

        • Choose an initial weight vector and learning rate
        • Randomly shuffle the examples in the training set.
        • Repeat until an approximate minimum is obtained:
                • for i = 1, 2, ..., n do:
                        • calc. gradient + weight update
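
A minimal sketch of this epoch-based variant (my own illustration on a 
squared-loss linear model, not scikit-learn's actual implementation):

```python
import numpy as np

def sgd_shuffle(X, y, lr=0.1, n_epochs=200, seed=0):
    """Epoch-based SGD: shuffle once per epoch, then visit every sample once."""
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):    # one shuffle per epoch
            grad = (X[i] @ w - y[i]) * X[i]  # squared-loss gradient for sample i
            w -= lr * grad
    return w
```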

However, in the "standard" implementation of SGD, isn't each training sample 
picked randomly for each iteration (where iteration means gradient step, not 
epoch), which would be random sampling with replacement?

        • Choose an initial weight vector and learning rate
        • Repeat until an approximate minimum is obtained or the maximum number 
of iterations is reached:
                • Randomly select one sample from the training set.
                • calc. gradient + weight update
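
For comparison, a sketch of this with-replacement variant (again just an 
illustration on a squared-loss linear model, not scikit-learn code):

```python
import numpy as np

def sgd_with_replacement(X, y, lr=0.1, n_iter=1000, seed=0):
    """SGD drawing one sample uniformly at random (with replacement) per step."""
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.randint(len(X))          # random sample with replacement
        grad = (X[i] @ w - y[i]) * X[i]  # squared-loss gradient for sample i
        w -= lr * grad
    return w
```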

This random selection of a sample at each iteration is mentioned in, e.g.:

Bottou, Léon. "Large-scale machine learning with stochastic gradient descent." 
Proceedings of COMPSTAT'2010. Physica-Verlag HD, 2010. 177-186.

Zhang, Tong. "Solving large scale linear prediction problems using stochastic 
gradient descent algorithms." Proceedings of the twenty-first international 
conference on Machine learning. ACM, 2004.

Was there a reason to implement the SGDClassifier with shuffling after each 
epoch rather than selecting a random training sample at each iteration? In 
terms of computational efficiency, would it make a difference whether a random 
number is generated at each iteration or the training set is shuffled once per 
epoch?

Best,
Sebastian
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
