Re: LBGFS optimizer performace

2015-03-06 Thread Gustavo Enrique Salazar Torres
Hi there: Yeah, I came to that same conclusion after tuning the Spark SQL shuffle parameter. I also cut out some classes I was using to parse my dataset and finally created the schema with only the fields needed for my model (before that I was creating it with 63 fields when I needed just 15). So I came

Re: LBGFS optimizer performace

2015-03-05 Thread DB Tsai
PS: I recommend compressing the data when you cache the RDD. Compression/decompression and serialization/deserialization add some overhead, but compression helps a lot with iterative algorithms because more data fits in the cache. Sincerely, DB Tsai
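The trade-off described in this message can be illustrated with a small plain-Python sketch. It is not Spark code: it only pickles and zlib-compresses a hypothetical partition to show the size saving and the extra decode step paid on each read. In Spark itself, the relevant property is `spark.rdd.compress`, which applies to serialized storage levels; the helper names and the sample data below are assumptions made for illustration.

```python
import pickle
import zlib


def serialize_partition(records, compress=True):
    """Pickle a list of records and optionally compress the resulting bytes."""
    blob = pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)
    return zlib.compress(blob) if compress else blob


def deserialize_partition(blob, compressed=True):
    """Reverse serialize_partition; this work is repeated on every read."""
    raw = zlib.decompress(blob) if compressed else blob
    return pickle.loads(raw)


# Hypothetical partition: 10,000 labeled points with 15 features each.
partition = [(i % 2, [float(i % 7)] * 15) for i in range(10_000)]

plain = serialize_partition(partition, compress=False)
packed = serialize_partition(partition, compress=True)

# Compare the in-memory footprint of the two representations.
print("serialized bytes:", len(plain))
print("serialized + compressed bytes:", len(packed))
```

How much space is saved depends entirely on the data; the highly repetitive sample here compresses far better than a real dataset is likely to.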

Re: LBGFS optimizer performace

2015-03-03 Thread Joseph Bradley
Is that error actually occurring in LBFGS? It looks like it might be happening before the data even gets to LBFGS. (Perhaps the outer join you're trying to do is making the dataset size explode a bit.) Are you able to call count() (or any RDD action) on the data before you pass it to LBFGS? On

Re: LBGFS optimizer performace

2015-03-03 Thread Joseph Bradley
I would recommend caching; if you can't persist, iterative algorithms will not work well. I don't think calling count on the dataset is problematic; every iteration in LBFGS iterates over the whole dataset and does a lot more computation than count(). It would be helpful to see some error
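To make the caching point concrete, here is a toy sketch in plain Python, not Spark. `LazyDataset` is a hypothetical stand-in for an RDD that is recomputed on every pass unless it is cached, and plain gradient descent stands in for LBFGS, since only the access pattern matters here: each step makes a full pass over the data, and each pass does more work than a count.

```python
import math


class LazyDataset:
    """Toy stand-in for an RDD: rebuilds its rows on every pass unless cached."""

    def __init__(self, build):
        self._build = build      # function that recomputes all rows
        self._use_cache = False
        self._rows = None
        self.builds = 0          # number of times the rows were recomputed

    def cache(self):
        # Like RDD caching, this only marks the dataset; rows are kept
        # the first time they are actually computed.
        self._use_cache = True
        return self

    def rows(self):
        if self._use_cache and self._rows is not None:
            return self._rows
        self.builds += 1
        rows = self._build()
        if self._use_cache:
            self._rows = rows
        return rows


def logistic_loss_and_gradient(weights, dataset):
    """One full pass: average logistic loss and gradient over all rows."""
    rows = dataset.rows()
    n = len(rows)
    loss = 0.0
    grad = [0.0] * len(weights)
    for label, features in rows:
        margin = sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-margin))
        loss += -math.log(p) if label == 1 else -math.log(1.0 - p)
        for j, x in enumerate(features):
            grad[j] += (p - label) * x
    return loss / n, [g / n for g in grad]


def train(dataset, steps=20, learning_rate=0.5):
    """Gradient descent (a stand-in for LBFGS): one dataset pass per step."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        _, grad = logistic_loss_and_gradient(weights, dataset)
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


def make_rows():
    # Hypothetical (label, features) rows; the first feature is a bias term.
    return [(1, [1.0, 2.0]), (0, [1.0, -1.0]), (1, [1.0, 1.5]), (0, [1.0, -2.0])]


uncached = LazyDataset(make_rows)
cached = LazyDataset(make_rows).cache()
train(uncached)
train(cached)
print("rebuilds without cache:", uncached.builds)
print("rebuilds with cache:", cached.builds)
```

Without the cache the rows are rebuilt once per optimization step; with it they are built once and reused, which is why an expensive upstream computation hurts so much in an iterative algorithm.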

Re: LBGFS optimizer performace

2015-03-03 Thread Gustavo Enrique Salazar Torres
Yeah, I can call count before that and it works. I was also over-caching tables, but I removed those. Now there is no caching, but it gets really slow since it recomputes my table RDD many times. I also hacked the LBFGS code to pass the number of examples, which I calculated outside in a Spark SQL

LBGFS optimizer performace

2015-03-02 Thread Gustavo Enrique Salazar Torres
Hi there: I'm using the LBFGS optimizer to train a logistic regression model. The code I implemented follows the pattern shown in https://spark.apache.org/docs/1.2.0/mllib-linear-methods.html but the training data is obtained from a Spark SQL RDD. The problem I'm having is that LBFGS tries to count the

Re: LBGFS optimizer performace

2015-03-02 Thread Akhil Das
Can you try increasing your driver memory, reducing the number of executors, and increasing the executor memory? Thanks, Best Regards. On Tue, Mar 3, 2015 at 10:09 AM, Gustavo Enrique Salazar Torres gsala...@ime.usp.br wrote: Hi there: I'm using LBFGS optimizer to train a logistic regression model. The
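One way to try the memory suggestion above is through the `spark-submit` options `--driver-memory`, `--executor-memory`, and `--num-executors` (the last applies when running on YARN). The sketch below only assembles such a command line; the helper name, the memory sizes, the class name, and the jar are hypothetical placeholders, not values taken from this thread.

```python
def spark_submit_command(app_args, driver_memory, executor_memory, num_executors):
    """Build a spark-submit argument list with explicit memory settings."""
    return [
        "spark-submit",
        "--driver-memory", driver_memory,        # more memory for the driver
        "--executor-memory", executor_memory,    # more memory per executor
        "--num-executors", str(num_executors),   # fewer, larger executors (YARN)
    ] + list(app_args)


# Hypothetical application and sizes, chosen only for illustration.
command = spark_submit_command(
    ["--class", "com.example.TrainModel", "app.jar"],
    driver_memory="4g",
    executor_memory="8g",
    num_executors=4,
)
print(" ".join(command))
```

The right values depend on the cluster and the dataset, so these numbers are a starting point for experimentation rather than a recommendation.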