Re: Issues while running MLlib matrix factorization ALS algorithm

2016-09-19 Thread Roshani Nagmote
Thanks Nick. It's working. On Mon, Sep 19, 2016 at 11:11 AM, Nick Pentreath wrote: > Try als.setCheckpointInterval (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS@

2016-09-19 Thread Nick Pentreath
Try als.setCheckpointInterval ( http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS@setCheckpointInterval(checkpointInterval:Int):ALS.this.type ) On Mon, 19 Sep 2016 at 20:01 Roshani Nagmote wrote: > Hello Sean, > > Can
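The following is a minimal sketch of Nick's suggestion using the RDD-based `org.apache.spark.mllib.recommendation.ALS` Scala API. The checkpoint directory, the toy ratings, and the rank/iteration values are placeholders chosen for illustration and do not come from the thread. On the cluster, the directory would be the HDFS path already passed to `setCheckpointDir`. The interval only has an effect when a checkpoint directory is set on the `SparkContext`. The code is written as a standalone script. In `spark-shell`, reuse the existing `sc` instead of creating a new one.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

val conf = new SparkConf().setAppName("als-checkpoint-sketch").setMaster("local[2]")
val sc = new SparkContext(conf)

// Checkpointing only happens when a checkpoint directory is set.
// On a cluster this would be an HDFS path; a local temp dir is used here (assumption).
sc.setCheckpointDir(Files.createTempDirectory("als-checkpoints").toString)

// Tiny hypothetical ratings: Rating(user, product, rating).
val ratings = sc.parallelize(Seq(
  Rating(1, 10, 4.0), Rating(1, 20, 1.0),
  Rating(2, 10, 5.0), Rating(2, 30, 2.0),
  Rating(3, 20, 3.0), Rating(3, 30, 4.0)
))

val model: MatrixFactorizationModel = new ALS()
  .setRank(2)
  .setIterations(20)
  .setLambda(0.01)
  .setCheckpointInterval(5) // checkpoint every 5 iterations instead of the default 10
  .run(ratings)

// Materialize a few results before stopping the context.
val rank = model.rank
val numUsers = model.userFeatures.count()
val numProducts = model.productFeatures.count()
println(s"rank=$rank users=$numUsers products=$numProducts prediction(1,30)=${model.predict(1, 30)}")

sc.stop()
```

With a checkpoint directory set, the factors are checkpointed every fifth iteration. This truncates the RDD lineage that otherwise grows with each iteration, and that growing lineage is what can lead to the StackOverflowError mentioned later in the thread.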

2016-09-19 Thread Roshani Nagmote
Hello Sean, Can you please tell me how to set the checkpoint interval? I did set checkpointDir("hdfs:/"), but I want to reduce the checkpoint interval from its default value of 10. How should that be done? Sorry if it's a very basic question; I am a novice in Spark. Thanks, Roshani On Fri, Sep 16,

2016-09-16 Thread Roshani Nagmote
Hello, Thanks for your reply. Yes, it's the Netflix dataset. When I get the "no space left on device" error, my '/mnt' directory is filled up; I checked. /usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.mllib.MovieLensALS --jars

2016-09-16 Thread Sean Owen
Oh, this is the Netflix dataset, right? I recognize it from the number of users/items. It's not fast on a laptop or anything, and it takes plenty of memory, but it succeeds. I haven't run this recently, but it worked in Spark 1.x. On Fri, Sep 16, 2016 at 5:13 PM, Roshani Nagmote

2016-09-16 Thread Roshani Nagmote
I am also surprised that I face these problems with a fairly small dataset on 14 M4.2xlarge machines. Could you please let me know which dataset you can run 100 iterations of rank 30 on with your laptop? I am currently just trying to run the default example code given with Spark to run ALS on movie

2016-09-16 Thread Sean Owen
You may have to decrease the checkpoint interval to, say, 5 if you're getting a StackOverflowError. You may have a particularly deep lineage being created during iterations. "No space left on device" means you don't have enough local disk to accommodate the big shuffles in some stage. You can add more