---- Meihua Wu wrote ----
>Feynman, thanks for clarifying.
>
>If we default miniBatchFraction = (1 / numInstances), then we will
>only hit one row for every iteration of SGD regardless of the number of
>partitions and executors. In other words the parallelism provided by
>the RDD is lost in this approach. I think this is something we need to
>consider for the default value of miniBatchFraction.
>
>On Fri, Aug 7, 2015 at 11:24 AM, Feynman Liang <fli...@databricks.com> wrote:
>> Yep, I think that's what Gerald is saying and they are proposing to default
>> miniBatchFraction = (1 / numInstances). Is that correct?
>>
>> On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetr...@gmail.com>
>> wrote:
>>>
>>> I think in the SGD algorithm, the mini batch sample is done without
>>> replacement. So with fraction=1, all the rows will be sampled
>>> exactly once to form the miniBatch, resulting in the
>>> deterministic/classical case.
>>>
>>> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fli...@databricks.com>
>>> wrote:
>>> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're up
>>> > for it) so we can see what others think!
>>> >
>>> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
>>> > <gerald.loeff...@googlemail.com> wrote:
>>> >>
>>> >> hi,
>>> >>
>>> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
>>> >> doesn’t that make it a deterministic/classical gradient descent rather
>>> >> than an SGD?
>>> >>
>>> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
>>> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
>>> >> that results in exactly one row of the data set?
>>> >>
>>> >> thank you
>>> >> gerald
>>> >>
>>> >> --
>>> >> Gerald Loeffler
>>> >> mailto:gerald.loeff...@googlemail.com
>>> >> http://www.gerald-loeffler.net
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> >> For additional commands, e-mail: user-h...@spark.apache.org
>>> >>
>>> >
>>
>
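
To make the two settings discussed above concrete, here is a minimal sketch in plain Python (not Spark code). It assumes the mini batch is drawn by keeping each row independently with probability miniBatchFraction, which is one common way to sample without replacement. The helper name sample_mini_batch, the seeds, and the data set size are made up for illustration. Under that assumption, a fraction of 1.0 returns every row exactly once, while a fraction of 1 / numInstances returns one row only on average, so some iterations may see an empty batch.

```python
import random


def sample_mini_batch(rows, fraction, seed):
    """Hypothetical helper: keep each row independently with probability `fraction`.

    This is per-row (Bernoulli) sampling without replacement, an assumption
    made for this sketch rather than a copy of any library's implementation.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


num_instances = 1000                # illustrative data set size
rows = list(range(num_instances))

# fraction = 1.0: rng.random() is always < 1.0, so every row is kept once.
full_batch = sample_mini_batch(rows, 1.0, seed=42)

# fraction = 1 / numInstances: batch size varies from iteration to iteration.
sizes = [
    len(sample_mini_batch(rows, 1.0 / num_instances, seed=42 + i))
    for i in range(200)
]
mean_size = sum(sizes) / len(sizes)
empty_iterations = sum(1 for s in sizes if s == 0)

print("rows in batch with fraction 1.0:", len(full_batch))
print("mean batch size with fraction 1/n:", mean_size)
print("iterations with an empty batch:", empty_iterations)
```

If this sampling assumption holds, each iteration with the small fraction touches roughly one row in total, no matter how many partitions hold the data, which is the loss of parallelism Meihua points out.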