Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Feynman Liang
Good point; I agree that defaulting to online SGD (single example per iteration) would be a poor UX due to performance. On Fri, Aug 7, 2015 at 12:44 PM, Meihua Wu wrote: > Feynman, thanks for clarifying. > > If we default miniBatchFraction = (1 / numInstances), then we will > only hit one row fo

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Koen Vantomme
Sent from my Sony Xperia™ smartphone. Meihua Wu wrote: >Feynman, thanks for clarifying. > >If we default miniBatchFraction = (1 / numInstances), then we will >only hit one row for every iteration of SGD regardless the number of >partitions and executors. In other words the par

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
Feynman, thanks for clarifying. If we default miniBatchFraction = (1 / numInstances), then we will only hit one row for every iteration of SGD, regardless of the number of partitions and executors. In other words, the parallelism provided by the RDD is lost in this approach. I think this is something w
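
To make the parallelism point concrete, here is a rough sketch in plain Python (not Spark code). The partition sizes and the helper name `expected_rows_per_partition` are hypothetical, chosen only for illustration. It shows that with miniBatchFraction = 1 / numInstances, the expected number of sampled rows summed over all partitions is about one, so most partitions contribute nothing to a given iteration.

```python
def expected_rows_per_partition(partition_sizes, fraction):
    """Expected number of rows each partition contributes to one mini-batch,
    assuming every row is kept independently with probability `fraction`."""
    return [size * fraction for size in partition_sizes]


# Hypothetical layout: 1000 rows spread evenly over 4 partitions.
partition_sizes = [250, 250, 250, 250]
num_instances = sum(partition_sizes)

per_partition = expected_rows_per_partition(partition_sizes, 1.0 / num_instances)
total = sum(per_partition)

print(per_partition)  # roughly a quarter of a row per partition
print(total)          # roughly one row per iteration overall
```

Adding more partitions or executors only shrinks each partition's share; the expected total stays at about one row per iteration.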

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Feynman Liang
Yep, I think that's what Gerald is saying and they are proposing to default miniBatchFraction = (1 / numInstances). Is that correct? On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu wrote: > I think in the SGD algorithm, the mini batch sample is done without > replacement. So with fraction=1, then all

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Meihua Wu
I think in the SGD algorithm, the mini-batch sample is drawn without replacement. So with fraction=1, all the rows will be sampled exactly once to form the miniBatch, resulting in the deterministic/classical case. On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang wrote: > Sounds reasonable to me,
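
A small sketch of sampling without replacement, in plain Python rather than Spark. It assumes each row is kept independently with probability `fraction` (Bernoulli sampling), and the function name `sample_mini_batch` is made up for this example. Under that assumption, fraction=1.0 returns every row exactly once, whatever the seed.

```python
import random


def sample_mini_batch(rows, fraction, seed):
    """Keep each row independently with probability `fraction`.
    A row can appear at most once, so this is sampling without replacement."""
    rng = random.Random(seed)
    # rng.random() is in [0.0, 1.0), so fraction=1.0 keeps every row.
    return [row for row in rows if rng.random() < fraction]


rows = list(range(10))

full_batch = sample_mini_batch(rows, 1.0, seed=42)
half_batch = sample_mini_batch(rows, 0.5, seed=42)

print(full_batch)  # all ten rows, in their original order
print(half_batch)  # a subset with no repeated rows
```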

Re: miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Feynman Liang
Sounds reasonable to me, feel free to create a JIRA (and PR if you're up for it) so we can see what others think! On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler < gerald.loeff...@googlemail.com> wrote: > hi, > > if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0, > doesn’t that mak

miniBatchFraction for LinearRegressionWithSGD

2015-08-07 Thread Gerald Loeffler
hi, if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0, doesn’t that make it deterministic/classical gradient descent rather than SGD? Specifically, miniBatchFraction=1.0 means the entire data set, i.e. all rows. In the spirit of SGD, shouldn’t the default be the fraction that r
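
The question can be illustrated with a toy one-dimensional least-squares problem. This is plain Python, not the MLlib implementation. The data, the function name `mini_batch_gradient`, and the Bernoulli-style sampling are all assumptions made for the sketch. With fraction=1.0 the gradient does not depend on the random seed, which is what makes the procedure ordinary gradient descent.

```python
import random


def mini_batch_gradient(rows, weight, fraction, seed):
    """Average squared-loss gradient for the model y ~ weight * x,
    computed over a batch where each row is kept with probability `fraction`."""
    rng = random.Random(seed)
    batch = [(x, y) for (x, y) in rows if rng.random() < fraction]
    if not batch:
        return 0.0  # empty batch: no update this iteration
    return sum((weight * x - y) * x for x, y in batch) / len(batch)


# Hypothetical data generated by y = 2 * x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# With fraction=1.0 every seed sees the same batch, hence the same gradient.
full_a = mini_batch_gradient(rows, 0.0, 1.0, seed=1)
full_b = mini_batch_gradient(rows, 0.0, 1.0, seed=2)

print(full_a, full_b)  # -15.0 -15.0
```

With a smaller fraction, different seeds generally select different rows, and the gradient becomes a noisy estimate of the full-batch value.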