Re: Issues while running MLlib matrix factorization ALS algorithm
Thanks Nick, it's working.

On Mon, Sep 19, 2016 at 11:11 AM, Nick Pentreath wrote:
> Try als.setCheckpointInterval (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS@setCheckpointInterval(checkpointInterval:Int):ALS.this.type)
Re: Issues while running MLlib matrix factorization ALS algorithm
Try als.setCheckpointInterval (http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.recommendation.ALS@setCheckpointInterval(checkpointInterval:Int):ALS.this.type)

On Mon, 19 Sep 2016 at 20:01 Roshani Nagmote wrote:
> Hello Sean,
>
> Can you please tell me how to set the checkpoint interval? I did set
> checkpointDir("hdfs:/"), but if I want to reduce the default checkpoint
> interval, which is 10, how should it be done?
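For reference, a minimal sketch of how the two settings mentioned in this thread fit together (assuming Spark 2.0's MLlib ALS; the checkpoint and input paths are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALSCheckpointExample"))

    // The checkpoint directory must be set on the SparkContext first;
    // ALS writes intermediate RDDs there to truncate the lineage.
    sc.setCheckpointDir("hdfs:///checkpoints") // placeholder path

    // Placeholder input: "user,item,rating" lines.
    val ratings = sc.textFile("hdfs:///ratings.csv")
      .map(_.split(","))
      .map(a => Rating(a(0).toInt, a(1).toInt, a(2).toDouble))

    val model = new ALS()
      .setRank(32)
      .setIterations(100)
      // Checkpoint every 5 iterations instead of the default 10, so the
      // lineage stays shallow enough to avoid StackOverflowError.
      .setCheckpointInterval(5)
      .run(ratings)

    sc.stop()
  }
}
```

With the interval at 5, the lineage is cut twice as often, at the cost of writing intermediate RDDs to HDFS more frequently.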
Re: Issues while running MLlib matrix factorization ALS algorithm
Hello Sean,

Can you please tell me how to set the checkpoint interval? I did set checkpointDir("hdfs:/"), but if I want to reduce the default checkpoint interval, which is 10, how should it be done?

Sorry if it's a very basic question; I am a novice in Spark.

Thanks,
Roshani

On Fri, Sep 16, 2016 at 11:14 AM, Roshani Nagmote <roshaninagmo...@gmail.com> wrote:
Re: Issues while running MLlib matrix factorization ALS algorithm
Hello,

Thanks for your reply.

Yes, it's the Netflix dataset. And when I get "no space left on device", my /mnt directory is the one that fills up; I checked.

/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.mllib.MovieLensALS --jars /usr/lib/spark/examples/jars/scopt_2.11-3.3.0.jar /usr/lib/spark/examples/jars/spark-examples_2.11-2.0.0.jar --rank 32 --numIterations 100 --kryo s3://dataset_netflix

When I run the above command, I get the following error:

Job aborted due to stage failure: Task 221 in stage 53.0 failed 4 times, most recent failure: Lost task 221.3 in stage 53.0 (TID 9817, ): java.io.FileNotFoundException: /mnt/yarn/usercache/hadoop/appcache/application_1473786456609_0042/blockmgr-045c2dec-7765-4954-9c9a-c7452f7bd3b7/08/shuffle_168_221_0.data.b17d39a6-4d3c-4198-9e25-e19ca2b4d368 (No space left on device)

I don't think I should need to add disk space, as the data is not that big. So is there any way I can set parameters so that it does not use as much disk space? I don't know much about tuning parameters.

It would be great if anyone could help me with this.

Thanks,
Roshani

> On Sep 16, 2016, at 9:18 AM, Sean Owen wrote:
Re: Issues while running MLlib matrix factorization ALS algorithm
Oh, this is the Netflix dataset, right? I recognize it from the number of users/items. It's not fast on a laptop or anything, and takes plenty of memory, but it succeeds. I haven't run this recently, but it worked in Spark 1.x.

On Fri, Sep 16, 2016 at 5:13 PM, Roshani Nagmote wrote:
> I am also surprised that I face these problems with a fairly small dataset
> on 14 m4.2xlarge machines. Could you please let me know on which dataset
> you can run 100 iterations of rank 30 on your laptop?

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Issues while running MLlib matrix factorization ALS algorithm
I am also surprised that I face these problems with a fairly small dataset on 14 m4.2xlarge machines. Could you please let me know on which dataset you can run 100 iterations of rank 30 on your laptop?

I am currently just trying to run the default example code shipped with Spark to run ALS on the MovieLens dataset. I did not change anything in the code. However, I am running this example on the Netflix dataset (1.5 GB).

Thanks,
Roshani

On Friday, September 16, 2016, Sean Owen wrote:
> You may have to decrease the checkpoint interval to, say, 5 if you're
> getting StackOverflowError. You may have a particularly deep lineage
> being created during iterations.
Re: Issues while running MLlib matrix factorization ALS algorithm
You may have to decrease the checkpoint interval to, say, 5 if you're getting StackOverflowError. You may have a particularly deep lineage being created during iterations.

"No space left on device" means you don't have enough local disk to accommodate the big shuffles in some stage. You can add more disk, or maybe look at tuning shuffle params to do more in memory and avoid spilling to disk as much.

However, given the small data size, I'm surprised that you see either problem.

10-20 iterations is usually where the model stops improving much anyway.

I can run 100 iterations of rank 30 on my *laptop*, so something is fairly wrong in your setup or maybe in other parts of your user code.

On Thu, Sep 15, 2016 at 10:00 PM, Roshani Nagmote wrote:
> Hi,
>
> I need help to run the matrix factorization ALS algorithm in Spark MLlib.
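As an illustration of the "add more disk or tune shuffle params" suggestion, the original spark-submit command could be extended along these lines. This is only a sketch: the property names are real Spark 2.0 settings, but the values are guesses that would need tuning for the cluster at hand.

```shell
# Illustrative shuffle-tuning knobs (all real Spark 2.0 properties):
#   spark.executor.memory        - more heap => more shuffle data held in memory
#   spark.memory.fraction        - fraction of heap for execution/storage (default 0.6)
#   spark.shuffle.compress       - compress map output files (default true)
#   spark.shuffle.spill.compress - compress data spilled to disk (default true)
/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn \
  --class org.apache.spark.examples.mllib.MovieLensALS \
  --conf spark.executor.memory=16g \
  --conf spark.memory.fraction=0.8 \
  --conf spark.shuffle.compress=true \
  --conf spark.shuffle.spill.compress=true \
  --jars /usr/lib/spark/examples/jars/scopt_2.11-3.3.0.jar \
  /usr/lib/spark/examples/jars/spark-examples_2.11-2.0.0.jar \
  --rank 32 --numIterations 50 --kryo s3://dataset/input_dataset
```

Note that on YARN the executors write shuffle files to YARN's configured local directories (the /mnt paths in the error above), so adding instance storage there is the other lever if tuning alone does not help.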
Issues while running MLlib matrix factorization ALS algorithm
Hi,

I need help running the matrix factorization ALS algorithm in Spark MLlib.

I am using a dataset (1.5 GB) with 480189 users and 17770 items, formatted similarly to the MovieLens dataset. I am trying to run the MovieLensALS example jar on this dataset on an AWS Spark EMR cluster with 14 m4.2xlarge slaves.

Command run:
/usr/lib/spark/bin/spark-submit --deploy-mode cluster --master yarn --class org.apache.spark.examples.mllib.MovieLensALS --jars /usr/lib/spark/examples/jars/scopt_2.11-3.3.0.jar /usr/lib/spark/examples/jars/spark-examples_2.11-2.0.0.jar --rank 32 --numIterations 50 --kryo s3://dataset/input_dataset

Issues I get:
If I increase the rank to 70 or more and numIterations to 15 or more, I get the following errors:
1) StackOverflowError
2) No space left on device (during the shuffle phase)

Could you please let me know if there are any parameters I should tune to make this algorithm work on this dataset?

For better RMSE, I want to increase the number of iterations. Am I missing something very trivial? Could anyone help me run this algorithm on this specific dataset with more iterations?

Was anyone able to run ALS on Spark with more than 100 iterations and rank more than 30?

Any help will be greatly appreciated.

Thanks and Regards,
Roshani