Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-08-21 Thread Ravi Mody
I've been able to almost halve my memory usage with no instability issues. I lowered my storage.memoryFraction and increased my shuffle.memoryFraction (essentially swapping them). I set spark.yarn.executor.memoryOverhead to 6GB. And I lowered executor-cores in case other jobs are using the availab
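The settings described in this message could be passed to spark-submit roughly as follows. This is a minimal sketch, not the poster's actual command: the helper name `build_submit_args`, the script name `als_job.py`, the fraction values (the Spark 1.x defaults of 0.6 for storage and 0.2 for shuffle, swapped), the `yarn-client` master, and the executor-cores value are all assumptions for illustration. Only the 6 GB overhead comes from the message. In Spark 1.x, `spark.yarn.executor.memoryOverhead` is given in megabytes.

```python
def build_submit_args(script, storage_fraction=0.2, shuffle_fraction=0.6,
                      overhead_mb=6 * 1024, executor_cores=4):
    """Build a spark-submit argument list using Spark 1.x property names.

    All default values are illustrative assumptions except the 6 GB overhead.
    """
    conf = {
        # Spark 1.x defaults are 0.6 (storage) and 0.2 (shuffle); swapped here.
        "spark.storage.memoryFraction": storage_fraction,
        "spark.shuffle.memoryFraction": shuffle_fraction,
        # Off-heap allowance per executor on YARN, in megabytes.
        "spark.yarn.executor.memoryOverhead": overhead_mb,
    }
    args = ["spark-submit", "--master", "yarn-client",
            "--executor-cores", str(executor_cores)]
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args + [script]


# Hypothetical script name, for illustration only.
print(" ".join(build_submit_args("als_job.py")))
```

The same properties can also be set in `spark-defaults.conf` instead of on the command line.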

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack

2015-07-27 Thread Xiangrui Meng
It seems that the error happens before ALS iterations. Could you try `ratings.first()` right after `ratings = newrdd.map(lambda l: Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50)`? -Xiangrui On Fri, Jun 26, 2015 at 2:28 PM, Ayman Farahat wrote: > I tried something similar and got oration error
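One plausible reading of the "Too many values to unpack" subject line is that `partitionBy` expects an RDD of (key, value) pairs, while each `Rating` is a three-field tuple. The sketch below reproduces that kind of unpacking failure in plain Python, without Spark. The `Rating` defined here is a stand-in namedtuple, not the PySpark class, the sample values are made up, and the diagnosis itself is an assumption that the visible part of the thread does not confirm.

```python
from collections import namedtuple

# Stand-in for the three-field rating record (hypothetical, not pyspark's class).
Rating = namedtuple("Rating", ["user", "product", "rating"])


def unpack_as_pair(record):
    """Unpack a record the way key/value partitioning code would."""
    key, value = record
    return key, value


r = Rating(1, 2, 4.0)

# A bare three-field record cannot be unpacked into (key, value).
try:
    unpack_as_pair(r)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Wrapping the record as (user, record) gives the pair shape that works.
keyed = (r.user, r)
key, value = unpack_as_pair(keyed)
print(key, value)
```

Under that assumption, keying each record by user before partitioning and then taking the values, as in the later message that uses `tot.values()`, avoids the unpacking problem.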

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
Where do I do that ? Thanks Sent from my iPhone > On Jun 27, 2015, at 8:59 PM, Sabarish Sasidharan > wrote: > > Try setting the yarn executor memory overhead to a higher value like 1g or > 1.5g or more. > > Regards > Sab > >> On 28-Jun-2015 9:22 am, "Ayman Farahat" wrote: >> That's corre

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Try setting the yarn executor memory overhead to a higher value like 1g or 1.5g or more. Regards Sab On 28-Jun-2015 9:22 am, "Ayman Farahat" wrote: > That's correct this is Yarn > And spark 1.4 > Also using the Anaconda tar for Numpy and other Libs > > > Sent from my iPhone > > On Jun 27, 2015,

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
That's correct, this is YARN and Spark 1.4. Also using the Anaconda tar for NumPy and other libs. Sent from my iPhone > On Jun 27, 2015, at 8:50 PM, Sabarish Sasidharan > wrote: > > Are you running on top of YARN? Plus pls provide your infrastructure details. > > Regards > Sab > >> On 28-Jun-2

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Are you running on top of YARN? Plus pls provide your infrastructure details. Regards Sab On 28-Jun-2015 9:20 am, "Sabarish Sasidharan" < sabarish.sasidha...@manthan.com> wrote: > Are you running on top of YARN? Plus pls provide your infrastructure > details. > > Regards > Sab > On 28-Jun-2015 8:

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Sabarish Sasidharan
Are you running on top of YARN? Plus pls provide your infrastructure details. Regards Sab On 28-Jun-2015 8:47 am, "Ayman Farahat" wrote: > Hello; > I tried to adjust the number of blocks by repartitioning the input. > Here is How I do it; (I am partitioning by users ) > > tot = newrdd.map(lambd

Failed stages and dropped executors when running implicit matrix factorization/ALS : Same error after the re-partition

2015-06-27 Thread Ayman Farahat
Hello; I tried to adjust the number of blocks by repartitioning the input. Here is How I do it; (I am partitioning by users ) tot = newrdd.map(lambda l: (l[1],Rating(int(l[1]),int(l[2]),l[4]))).partitionBy(50).cache() ratings = tot.values() numIterations =8 rank = 80 model = ALS.trainImplicit(

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS : Too many values to unpack

2015-06-26 Thread Ayman Farahat
I tried something similar and got oration error I had 10 executors and 10 8 cores >>> ratings = newrdd.map(lambda l: >>> Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50) >>> mypart = ratings.getNumPartitions() >>> mypart 50 >>> numIterations =10 >>> rank = 100 >>> model = ALS.trainImplicit(rati

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
How do I set these partitions? Is this in the call to ALS, model = ALS.trainImplicit(ratings, rank, numIterations)? On Jun 26, 2015, at 12:33 PM, Xiangrui Meng wrote: > So you have 100 partitions (blocks). This might be too many for your dataset. > Try setting a smaller number of blocks, e.g.,

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
I set the number of partitions on the input dataset at 50. The number of CPU cores I'm using is 84 (7 executors, 12 cores). I'll look into getting a full stack trace. Any idea what my errors mean, and why increasing memory causes them to go away? Thanks. On Fri, Jun 26, 2015 at 11:26 AM, Xiangrui

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
So you have 100 partitions (blocks). This might be too many for your dataset. Try setting a smaller number of blocks, e.g., 32 or 64. When ALS starts iterations, you can see the shuffle read/write size from the "stages" tab of Spark WebUI. Vary the number of blocks and check the numbers there. Kryo ser
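In PySpark's MLlib API the number of blocks can be passed straight to `ALS.trainImplicit` through its `blocks` keyword, so no manual repartitioning is needed to try the suggestion above. The wrapper below is a minimal sketch: the function name `train_implicit` and its `als` parameter are hypothetical, and the parameter exists only so the sketch can be exercised without a cluster. By default it imports the real `pyspark.mllib.recommendation.ALS`.

```python
def train_implicit(ratings, rank, iterations, blocks, als=None):
    """Train implicit-feedback ALS with an explicit number of blocks.

    `als` is a hypothetical injection point for testing; when omitted,
    the real PySpark MLlib ALS class is imported and used.
    """
    if als is None:
        from pyspark.mllib.recommendation import ALS as als
    return als.trainImplicit(ratings, rank,
                             iterations=iterations, blocks=blocks)


# Usage on a cluster, assuming `ratings` is an RDD of Rating records:
#   model = train_implicit(ratings, rank=100, iterations=12, blocks=32)
# Rerun with blocks=64 and compare shuffle read/write in the WebUI.
```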

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
Hello; I checked my partitions/storage, and here is what I have: 80 executors, 5 GB per executor. Do I need to set additional params, say cores? spark.serializer org.apache.spark.serializer.KryoSerializer # spark.driver.memory 5g # spark.executor.extraJavaOp

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
No, they use the same implementation. On Fri, Jun 26, 2015 at 8:05 AM, Ayman Farahat wrote: > I use the mllib not the ML. Does that make a difference ? > > Sent from my iPhone > > On Jun 26, 2015, at 7:19 AM, Ravi Mody wrote: > > Forgot to mention: rank of 100 usually works ok, 120 consistently

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Xiangrui Meng
Please see my comments inline. It would be helpful if you can attach the full stack trace. -Xiangrui On Fri, Jun 26, 2015 at 7:18 AM, Ravi Mody wrote: > 1. These are my settings: > rank = 100 > iterations = 12 > users = ~20M > items = ~2M > training examples = ~500M-1B (I'm running into the issue

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ayman Farahat
I use the mllib not the ML. Does that make a difference ? Sent from my iPhone > On Jun 26, 2015, at 7:19 AM, Ravi Mody wrote: > > Forgot to mention: rank of 100 usually works ok, 120 consistently cannot > finish. > >> On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody wrote: >> 1. These are my set

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
Forgot to mention: rank of 100 usually works ok, 120 consistently cannot finish. On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody wrote: > 1. These are my settings: > rank = 100 > iterations = 12 > users = ~20M > items = ~2M > training examples = ~500M-1B (I'm running into the issue even with 500M >

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-26 Thread Ravi Mody
1. These are my settings: rank = 100 iterations = 12 users = ~20M items = ~2M training examples = ~500M-1B (I'm running into the issue even with 500M training examples) 2. The memory storage never seems to go too high. The user blocks may go up to ~10Gb, and each executor will have a few GB used o
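As a back-of-envelope check on these numbers, the dense factor matrices can be sized directly from the settings listed in the message. This sketch assumes 4-byte floats per factor value and ignores JVM object overhead, serialization, and any auxiliary block structures, so it gives a rough lower bound rather than a measurement. The function name is hypothetical.

```python
def factor_gigabytes(num_rows, rank, bytes_per_value=4):
    """Approximate size in GB of a dense num_rows x rank factor matrix."""
    return num_rows * rank * bytes_per_value / 1e9


users = 20 * 10**6   # ~20M users, from the message
items = 2 * 10**6    # ~2M items, from the message

for rank in (100, 120):
    print("rank", rank,
          "user factors GB:", factor_gigabytes(users, rank),
          "item factors GB:", factor_gigabytes(items, rank))
```

Under these assumptions the user factors come to about 8 GB at rank 100 and about 9.6 GB at rank 120, the same order of magnitude as the ~10 GB of user blocks reported.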

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-25 Thread Ayman Farahat
Was there any resolution to that problem? I am also seeing it with PySpark 1.4: 380 million observations, 100 factors, and 5 iterations. Thanks Ayman On Jun 23, 2015, at 6:20 PM, Xiangrui Meng wrote: > It shouldn't be hard to handle 1 billion ratings in 1.3. Just need > more information to guess w

Re: Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-23 Thread Xiangrui Meng
It shouldn't be hard to handle 1 billion ratings in 1.3. Just need more information to guess what happened: 1. Could you share the ALS settings, e.g., number of blocks, rank and number of iterations, as well as number of users/items in your dataset? 2. If you monitor the progress in the WebUI, how

Failed stages and dropped executors when running implicit matrix factorization/ALS

2015-06-19 Thread Ravi Mody
Hi, I'm running implicit matrix factorization/ALS in Spark 1.3.1 on fairly large datasets (1+ billion input records). As I grow my dataset I often run into issues with a lot of failed stages and dropped executors, ultimately leading to the whole application failing. The errors are like "org.apache.