Re: miniBatchFraction for LinearRegressionWithSGD

Koen Vantomme Fri, 07 Aug 2015 13:12:27 -0700


Sent from my Sony Xperia™ smartphone


---- Meihua Wu wrote ----

>Feynman, thanks for clarifying.
>
>If we default miniBatchFraction = (1 / numInstances), then we will
>only hit one row for every iteration of SGD regardless of the number of
>partitions and executors. In other words the parallelism provided by
>the RDD is lost in this approach. I think this is something we need to
>consider for the default value of miniBatchFraction.
>
>On Fri, Aug 7, 2015 at 11:24 AM, Feynman Liang <fli...@databricks.com> wrote:
>> Yep, I think that's what Gerald is saying and they are proposing to default
>> miniBatchFraction = (1 / numInstances). Is that correct?
>>
>> On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetr...@gmail.com>
>> wrote:
>>>
>>> I think in the SGD algorithm, the mini-batch sample is drawn without
>>> replacement. So with fraction=1, all the rows will be sampled
>>> exactly once to form the miniBatch, resulting in the
>>> deterministic/classical case.
>>>
>>> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fli...@databricks.com>
>>> wrote:
>>> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're
>>> > up for it) so we can see what others think!
>>> >
>>> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
>>> > <gerald.loeff...@googlemail.com> wrote:
>>> >>
>>> >> hi,
>>> >>
>>> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
>>> >> doesn’t that make it deterministic/classical gradient descent rather
>>> >> than SGD?
>>> >>
>>> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
>>> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
>>> >> that results in exactly one row of the data set?
>>> >>
>>> >> thank you
>>> >> gerald
>>> >>
>>> >> --
>>> >> Gerald Loeffler
>>> >> mailto:gerald.loeff...@googlemail.com
>>> >> http://www.gerald-loeffler.net
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
>>> >>
>>> >
>>
>>
>
>
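
To make the two points in this thread concrete, here is a minimal pure-Python
sketch. It is not Spark code and does not call any Spark API. It assumes the
mini-batch is drawn by keeping each row independently with probability
miniBatchFraction (sampling without replacement, as Meihua describes); under
that assumption a fraction of 1 / numInstances gives one row per iteration on
average, not exactly one. The function names, the round-robin partition
assignment, the 1/sqrt(i) step decay and the toy data are all hypothetical
choices made for illustration.

```python
import random


def sample_minibatch(rows, fraction, rng):
    """Keep each row independently with probability `fraction` (no replacement)."""
    return [row for row in rows if rng.random() < fraction]


def sgd_linear_regression(rows, fraction, iterations=200, step=1.0, seed=42):
    """Fit y ~ w * x (single weight, no intercept) by mini-batch SGD on squared loss."""
    w = 0.0
    for i in range(1, iterations + 1):
        rng = random.Random(seed + i)  # reproducible, different sample each iteration
        batch = sample_minibatch(rows, fraction, rng)
        if not batch:
            continue  # nothing sampled this iteration, so no update
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= (step / i ** 0.5) * grad  # step size decays as 1/sqrt(i)
    return w


def partitions_touched(num_rows, num_partitions, fraction, rng):
    """Count partitions contributing at least one sampled row in one iteration.

    Rows are assigned to partitions round-robin (row index modulo partition count).
    """
    touched = set()
    for idx in range(num_rows):
        if rng.random() < fraction:
            touched.add(idx % num_partitions)
    return len(touched)


if __name__ == "__main__":
    n = 100
    rows = [(k / n, 3.0 * k / n) for k in range(1, n + 1)]  # noiseless y = 3x
    print("fraction=1.0 :", sgd_linear_regression(rows, 1.0))
    print("fraction=1/n :", sgd_linear_regression(rows, 1.0 / n))
    rng = random.Random(0)
    print("partitions touched, fraction=1.0:", partitions_touched(n, 8, 1.0, rng))
    print("partitions touched, fraction=1/n:", partitions_touched(n, 8, 1.0 / n, rng))
```

With fraction=1.0 every iteration sees every row, so the seed has no effect and
the run is plain (deterministic) gradient descent, and all partitions take part
in each step. With fraction=1/n only about one row is sampled per iteration, so
at most a partition or two has any work to do, which is the loss of parallelism
Meihua points out.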
