Hi Peng,

Short answer: Yes. It has been run on billions of rows and tens of millions of columns.

Long answer: There are many ways to implement LR in a distributed fashion, and they differ in how they depend on the dataset dimensions and the compute cluster size. The MLlib implementation distributes the gradient computation, which is instance-parallel. You can find more info here:
http://spark.apache.org/docs/latest/mllib-linear-methods.html

Joseph

On Tue, Feb 3, 2015 at 7:21 AM, Peng Zhang <pzhang.x...@icloud.com> wrote:
> Hi Everyone,
>
> Is LogisticRegressionWithSGD in MLlib scalable?
>
> If so, what is the idea behind the scalable implementation?
>
> Thanks in advance,
>
> Peng
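
To illustrate what "instance-parallel" gradient computation means in the long answer, here is a minimal sketch in plain Python. It is not MLlib's code: the partitions are ordinary lists standing in for the chunks of a distributed dataset, and the names partial_gradient, combine, distributed_gradient and train are hypothetical, chosen only for this example. The idea is that each partition sums the logistic-loss gradient over its own instances, and only those small per-partition sums (one vector of length num_features each) have to be combined.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(partition, weights):
    """Sum the logistic-loss gradient over the instances of ONE partition.

    In a real cluster this would run on the worker holding the partition.
    Returns (gradient_sum, instance_count).
    """
    grad = [0.0] * len(weights)
    for features, label in partition:
        margin = sum(w * x for w, x in zip(weights, features))
        error = sigmoid(margin) - label  # label is 0 or 1
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad, len(partition)


def combine(a, b):
    """Merge two partial results; only vectors of size num_features move."""
    return [x + y for x, y in zip(a[0], b[0])], a[1] + b[1]


def distributed_gradient(partitions, weights):
    # "Map" step: independent per partition, so it parallelizes over instances.
    partials = [partial_gradient(p, weights) for p in partitions]
    # "Reduce" step: add up the partial sums and average.
    total, count = reduce(combine, partials)
    return [g / count for g in total]


def train(partitions, num_features, step_size=0.5, iterations=200):
    """Plain gradient descent; the driver only ever sees averaged gradients."""
    weights = [0.0] * num_features
    for _ in range(iterations):
        grad = distributed_gradient(partitions, weights)
        weights = [w - step_size * g for w, g in zip(weights, grad)]
    return weights


# Toy data: features are [bias, x]; two "partitions" of two instances each.
partitions = [
    [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
    [([1.0, 3.0], 1), ([1.0, 4.0], 1)],
]
weights = train(partitions, num_features=2)
```

The per-iteration communication in this sketch grows with the number of features and partitions, not with the number of rows, which is one reason this style of implementation can handle very large row counts.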