Jörn, my question is not about the model type but about Spark's ability to reuse an already trained ML model when training a new one.
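A minimal sketch of what "reusing an already trained model" means here, written in plain Python rather than against any Spark API. Everything in it is an assumption made for illustration: the function names (`sgd_fit_chunk`, `fit_in_chunks`, `make_chunk`), the step size, the epoch count, and the synthetic data. It only shows the idea of carrying the weights learned on one chunk into the fit of the next chunk instead of starting from scratch each time.

```python
import random


def sgd_fit_chunk(rows, weights, intercept, step=0.05, epochs=5):
    """Run a few epochs of plain SGD for squared error over one chunk.

    Training starts from the weights and intercept passed in, so a model
    fitted on an earlier chunk is refined rather than overwritten.
    """
    w = list(weights)
    b = intercept
    for _ in range(epochs):
        for features, label in rows:
            prediction = b + sum(wi * xi for wi, xi in zip(w, features))
            error = prediction - label
            # Gradient step for 0.5 * error**2 on this single row.
            for i, xi in enumerate(features):
                w[i] -= step * error * xi
            b -= step * error
    return w, b


def fit_in_chunks(chunks, n_features, step=0.05, epochs=5):
    """Train over the chunks one after another, warm-starting each fit."""
    w, b = [0.0] * n_features, 0.0
    for chunk in chunks:
        # The parameters from the previous chunk are the initial model here.
        w, b = sgd_fit_chunk(chunk, w, b, step=step, epochs=epochs)
    return w, b


def make_chunk(n_rows, rng):
    """Hypothetical noise-free data: label = 2*x0 - 3*x1 + 0.5."""
    rows = []
    for _ in range(n_rows):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        rows.append((x, 2.0 * x[0] - 3.0 * x[1] + 0.5))
    return rows


rng = random.Random(0)
chunks = [make_chunk(500, rng) for _ in range(4)]
weights, intercept = fit_in_chunks(chunks, n_features=2)
print(weights, intercept)
```

On a cluster each chunk would be a separate DataFrame and the update would be distributed, but the question is the same: whether the fit on chunk k can start from the parameters produced by chunk k-1.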

On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> Is it really required to have one billion samples for just linear
> regression? Probably your model would do equally well with much less
> samples. Have you checked bias and variance if you use much less random
> samples?
>
> On 22. Aug 2017, at 12:58, Sea aj <saj3...@gmail.com> wrote:
>
> I have a large dataframe of 1 billion rows of type LabeledPoint. I tried
> to train a linear regression model on the df but it failed due to lack of
> memory although I'm using 9 slaves, each with 100gb of ram and 16 cores of
> CPU.
>
> I decided to split my data into multiple chunks and train the model in
> multiple phases but I learned the linear regression model in ml library
> does not have "setinitialmodel" function to be able to pass the trained
> model from one chunk to the rest of chunks. In another word, each time I
> call the fit function over a chunk of my data, it overwrites the previous
> mode.
>
> So far the only solution I found is using Spark Streaming to be able to
> split the data to multiple dfs and then train over each individually to
> overcome memory issue.
>
> Do you know if there's any other solution?
>
>
> On Mon, Jul 10, 2017 at 7:57 AM, Jayant Shekhar <jayantbaya...@gmail.com>
> wrote:
>
>> Hello Mahesh,
>>
>> We have built one. You can download from here :
>> https://www.sparkflows.io/download
>>
>> Feel free to ping me for any questions, etc.
>>
>> Best Regards,
>> Jayant
>>
>>
>> On Sun, Jul 9, 2017 at 9:35 PM, Mahesh Sawaiker <
>> mahesh_sawai...@persistent.com> wrote:
>>
>>> Hi,
>>>
>>> 1) Is anyone aware of any workbench kind of tool to run ML jobs in
>>> spark. Specifically is the tool could be something like a Web application
>>> that is configured to connect to a spark cluster.
>>>
>>> User is able to select input training sets probably from hdfs , train
>>> and then run predictions, without having to write any Scala code.
>>>
>>> 2) If there is not tool, is there value in having such tool, what could
>>> be the challenges.
>>>
>>> Thanks,
>>>
>>> Mahesh
>>>
>>>
>>> DISCLAIMER
>>> ==========
>>> This e-mail may contain privileged and confidential information which is
>>> the property of Persistent Systems Ltd. It is intended only for the use of
>>> the individual or entity to which it is addressed. If you are not the
>>> intended recipient, you are not authorized to read, retain, copy, print,
>>> distribute or use this message. If you have received this communication in
>>> error, please notify the sender and delete all copies of this message.
>>> Persistent Systems Ltd. does not accept any liability for virus infected
>>> mails.