I've been able to almost halve my memory usage with no instability issues.
I lowered my storage.memoryFraction and increased my shuffle.memoryFraction
(essentially swapping them). I set spark.yarn.executor.memoryOverhead to
6GB. And I lowered executor-cores in case other jobs are using the
available resources.
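For anyone tuning similarly, this is roughly the spark-submit shape (a
sketch: the executor memory, script name, and lowered core count are
illustrative; the memoryFraction values are just the 1.x defaults of
0.6/0.2 swapped; and note spark.yarn.executor.memoryOverhead takes MB in
Spark 1.x, so 6GB = 6144):

spark-submit \
  --num-executors 7 \
  --executor-cores 8 \
  --executor-memory 20g \
  --conf spark.storage.memoryFraction=0.2 \
  --conf spark.shuffle.memoryFraction=0.6 \
  --conf spark.yarn.executor.memoryOverhead=6144 \
  my_als_job.py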
It seems that the error happens before ALS iterations. Could you try
`ratings.first()` right after `ratings = newrdd.map(lambda l:
Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50)`?
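In other words, something like this (a sketch, assuming newrdd holds the
parsed input rows):

from pyspark.mllib.recommendation import Rating

ratings = newrdd.map(lambda l:
    Rating(int(l[1]), int(l[2]), l[4])).partitionBy(50)
ratings.first()  # forces a task to run before ALS ever starts; if this
                 # throws, the problem is in the input pipeline, not ALS

One thing to check while you are at it: RDD.partitionBy expects a
(key, value) pair RDD, and a Rating is a 3-field tuple, so first() may
surface exactly that.

-Xiangrui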
On Fri, Jun 26, 2015 at 2:28 PM, Ayman Farahat wrote:
Where do I do that?
Thanks
Sent from my iPhone
> On Jun 27, 2015, at 8:59 PM, Sabarish Sasidharan wrote:
Try setting the yarn executor memory overhead to a higher value like 1g or
1.5g or more.
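For example, on the spark-submit command line (the property takes MB in
Spark 1.x, so 1.5g would be):

--conf spark.yarn.executor.memoryOverhead=1536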
Regards
Sab
On 28-Jun-2015 9:22 am, "Ayman Farahat" wrote:
That's correct, this is YARN and Spark 1.4.
Also using the Anaconda tar for NumPy and other libs.
Sent from my iPhone
> On Jun 27, 2015, at 8:50 PM, Sabarish Sasidharan wrote:
Are you running on top of YARN? Also, please provide your infrastructure
details.
Regards
Sab
On 28-Jun-2015 8:47 am, "Ayman Farahat" wrote:
Hello;
I tried to adjust the number of blocks by repartitioning the input.
Here is how I do it (I am partitioning by users):

from pyspark.mllib.recommendation import ALS, Rating

tot = newrdd.map(lambda l:
    (l[1], Rating(int(l[1]), int(l[2]), l[4]))).partitionBy(50).cache()
ratings = tot.values()
numIterations = 8
rank = 80
model = ALS.trainImplicit(ratings, rank, numIterations)
I tried something similar and got an error.
I had 10 executors and 8 cores.
>>> ratings = newrdd.map(lambda l:
>>> Rating(int(l[1]),int(l[2]),l[4])).partitionBy(50)
>>> mypart = ratings.getNumPartitions()
>>> mypart
50
>>> numIterations =10
>>> rank = 100
>>> model = ALS.trainImplicit(ratings, rank, numIterations)
How do I set these partitions? Is this in the call to ALS:
model = ALS.trainImplicit(ratings, rank, numIterations)?
On Jun 26, 2015, at 12:33 PM, Xiangrui Meng wrote:
I set the number of partitions on the input dataset at 50. The number of
CPU cores I'm using is 84 (7 executors, 12 cores).
I'll look into getting a full stack trace. Any idea what my errors mean,
and why increasing memory causes them to go away? Thanks.
On Fri, Jun 26, 2015 at 11:26 AM, Xiangrui Meng wrote:
So you have 100 partitions (blocks). This might be too many for your
dataset. Try setting a smaller number of blocks, e.g., 32 or 64. When ALS
starts iterations, you can see the shuffle read/write size from the
"stages" tab of Spark WebUI. Vary the number of blocks and check the
numbers there.
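In the PySpark mllib API the block count can be passed straight to
trainImplicit via the blocks argument (a sketch using one of the values
suggested above; blocks=-1, the default, lets ALS choose based on the
input partitioning):

model = ALS.trainImplicit(ratings, rank, numIterations, blocks=32)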
Hello;
I checked on my partitions/storage and here is what I have:
I have 80 executors
5G per executor.
Do I need to set additional params, say cores?
spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions
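If it helps, cores and memory can be set in the same spark-defaults.conf
alongside the serializer (a sketch; the values are placeholders, not
recommendations):

spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.executor.memory               5g
spark.executor.cores                4
spark.yarn.executor.memoryOverhead  1024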
No, they use the same implementation.
On Fri, Jun 26, 2015 at 8:05 AM, Ayman Farahat wrote:
> I use the mllib, not the ML. Does that make a difference?
Please see my comments inline. It would be helpful if you can attach
the full stack trace. -Xiangrui
On Fri, Jun 26, 2015 at 7:18 AM, Ravi Mody wrote:
I use the mllib, not the ML. Does that make a difference?
Sent from my iPhone
> On Jun 26, 2015, at 7:19 AM, Ravi Mody wrote:
Forgot to mention: rank of 100 usually works ok, 120 consistently cannot
finish.
On Fri, Jun 26, 2015 at 10:18 AM, Ravi Mody wrote:
1. These are my settings:
rank = 100
iterations = 12
users = ~20M
items = ~2M
training examples = ~500M-1B (I'm running into the issue even with 500M
training examples)
2. The memory storage never seems to go too high. The user blocks may go
up to ~10GB, and each executor will have a few GB used.
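For scale, a back-of-envelope on the factor matrices alone (a sketch
assuming 8-byte doubles; 4-byte floats would halve it):

user_factors_gb = 20e6 * 100 * 8 / 1e9  # ~16 GB across the cluster
item_factors_gb = 2e6 * 100 * 8 / 1e9   # ~1.6 GB

That is before the ratings themselves and the shuffle traffic, so a few
GB per executor is plausible even when reported storage looks modest.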
Was there any resolution to that problem?
I am also hitting it with PySpark 1.4:
380 million observations
100 factors and 5 iterations
Thanks
Ayman
On Jun 23, 2015, at 6:20 PM, Xiangrui Meng wrote:
It shouldn't be hard to handle 1 billion ratings in 1.3. Just need
more information to guess what happened:
1. Could you share the ALS settings, e.g., number of blocks, rank and
number of iterations, as well as number of users/items in your
dataset?
2. If you monitor the progress in the WebUI, how […]
Hi, I'm running implicit matrix factorization/ALS in Spark 1.3.1 on fairly
large datasets (1+ billion input records). As I grow my dataset I often run
into issues with a lot of failed stages and dropped executors, ultimately
leading to the whole application failing. The errors are like
"org.apache.[…]".