Small batch sizes are typically used to speed up training (more weight updates per pass over the data) and to avoid the problem that training sets often don't fit into memory. The additional noise from the stochastic approach may also help escape local minima and/or improve generalization performance (e.g., as discussed in the recent paper comparing SGD to other optimizers). In any case, since the batch size is effectively a hyperparameter, I would just experiment with a few values and compare. Also, since you have a small dataset, I would maybe also try plain batch gradient descent (i.e., batch size = n training samples).
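To make the "experiment with a few values" suggestion concrete, here is a minimal sketch (not from your code, just an illustration) that compares a handful of batch_size values for an MLPRegressor via cross-validation; the synthetic data, hidden layer size, and candidate batch sizes are placeholders you would swap for your own.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(40, 5)                       # ~40 observations, like in your case
y = 2.0 * X[:, 0] + 0.1 * rng.randn(40)    # toy target; replace with your data

# 5-fold CV leaves 32 training samples per fold, so batch_size=32 here
# amounts to full-batch gradient descent.
for batch_size in [4, 8, 16, 32]:
    mlp = MLPRegressor(hidden_layer_sizes=(10,), solver='adam',
                       batch_size=batch_size, max_iter=5000, random_state=0)
    scores = cross_val_score(mlp, X, y, cv=5, scoring='r2')
    print("batch_size=%2d: mean R^2 = %.3f" % (batch_size, scores.mean()))

With only 10-50 observations the CV estimates will be noisy, so I would repeat this over a few random seeds before trusting any single batch_size.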
Best,
Sebastian

Sent from my iPhone

> On Sep 24, 2017, at 4:35 PM, Thomas Evangelidis <teva...@gmail.com> wrote:
>
> Greetings,
>
> I train MLPRegressors using small datasets, usually with 10-50 observations.
> The default batch_size=min(200, n_samples) for the adam optimizer, and
> because my n_samples is always < 200, it is eventually batch_size=n_samples.
> According to the theory, stochastic gradient-based optimizers like adam
> perform better in the small batch regime. Considering the above, what would
> be a good batch_size value in my case (e.g. 4)? Is there any rule of thumb to
> select the batch_size when n_samples is small, or must the choice be based
> on trial and error?
>
> --
> ======================================================================
> Dr Thomas Evangelidis
> Post-doctoral Researcher
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/2S049,
> 62500 Brno, Czech Republic
>
> email: tev...@pharm.uoa.gr
>        teva...@gmail.com
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn