Re: Spark executor OOM issue on YARN

2015-09-01 Thread ponkin
Hi,
Could you please post the full stack trace with the exceptions, along with the
command-line arguments you pass to spark-submit?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executor-OOM-issue-on-YARN-tp24522p24530.html




Re: Spark executor OOM issue on YARN

2015-08-31 Thread Umesh Kacha
Hi Ted, thanks. I know that by default spark.sql.shuffle.partitions is 200. It
would be great if you could help me solve the OOM issue.
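
As a point of reference, on Spark 1.x this setting can also be changed at
runtime on the HiveContext itself; a minimal sketch, with 500 as a purely
illustrative value:

  hiveContext.setConf("spark.sql.shuffle.partitions", "500")  // illustrative value, not a recommendation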

On Mon, Aug 31, 2015 at 11:43 PM, Ted Yu  wrote:

> Please see this thread w.r.t. spark.sql.shuffle.partitions :
> http://search-hadoop.com/m/q3RTtE7JOv1bDJtY
>
> FYI
>


Re: Spark executor OOM issue on YARN

2015-08-31 Thread Ted Yu
Please see this thread w.r.t. spark.sql.shuffle.partitions :
http://search-hadoop.com/m/q3RTtE7JOv1bDJtY

FYI



Spark executor OOM issue on YARN

2015-08-31 Thread unk1102
Hi, I have a Spark job whose executors hit OOM after some time; because of that
the job hangs, followed by a series of errors such as IOException, "RPC client
disassociated", and "shuffle not found".

I have tried almost everything but I don't know how to solve this OOM issue;
please advise, I am out of ideas. Here is what I tried, but nothing worked:

- I tried 60 executors, each with 12 GB / 2 cores.
- I tried 30 executors, each with 20 GB / 2 cores.
- I tried 40 executors, each with 30 GB / 6 cores (I also tried 7 and 8 cores).
- I tried setting spark.storage.memoryFraction to 0.2 to address the OOM; I
also tried setting it to 0.0.
- I tried setting spark.shuffle.memoryFraction to 0.4, since I need more
shuffle memory.
- I tried setting spark.default.parallelism to 500, 1000, and 1500, but it did
not help avoid the OOM. What is the ideal value for it?
- I also tried setting spark.sql.shuffle.partitions to 500, but it did not
help; it just creates 500 output part files. Please help me understand the
difference between spark.default.parallelism and spark.sql.shuffle.partitions
(a sketch of how these settings are applied follows this list).
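
For concreteness, a minimal sketch of how the settings listed above are
typically applied in a Spark 1.x driver program. The application name and the
values are illustrative (taken from the attempts above), not recommendations,
and the executor count itself is a YARN flag passed to spark-submit as
--num-executors:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.sql.hive.HiveContext

  // Values mirror the attempts described above; they are illustrative only.
  val conf = new SparkConf()
    .setAppName("group-by-job")                   // hypothetical application name
    .set("spark.executor.memory", "20g")          // heap per executor
    .set("spark.executor.cores", "2")             // cores per executor
    .set("spark.storage.memoryFraction", "0.2")   // heap fraction kept for cached/storage blocks
    .set("spark.shuffle.memoryFraction", "0.4")   // heap fraction available to shuffle aggregation
    .set("spark.default.parallelism", "1000")     // default partition count for RDD shuffles
    .set("spark.sql.shuffle.partitions", "500")   // partition count Spark SQL uses for joins/group by
  // On YARN the number of executors (e.g. 60) goes to spark-submit via --num-executors.

  val sc = new SparkContext(conf)
  val hiveContext = new HiveContext(sc)

In Spark 1.x, spark.default.parallelism governs RDD-level shuffles, while
queries issued through hiveContext.sql() use spark.sql.shuffle.partitions,
which is why changing only one of them may have no visible effect on the SQL
jobs.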

My data is skewed but not that large, and I don't understand why it hits OOM. I
don't cache anything; I just have four GROUP BY queries that I run through
hiveContext.sql(). I spawn around 1000 threads from the driver, and each thread
executes these four queries.
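
If I read the last paragraph correctly, the driver-side pattern looks roughly
like the sketch below. The query text, the fixed thread pool, and the collect()
at the end are assumptions made for illustration; only the "many driver
threads, each running the four queries through hiveContext.sql()" structure
comes from the description above:

  import java.util.concurrent.Executors
  import scala.concurrent.{Await, ExecutionContext, Future}
  import scala.concurrent.duration.Duration

  // hiveContext: org.apache.spark.sql.hive.HiveContext, assumed to be in scope.
  // Hypothetical stand-ins for the four GROUP BY queries.
  val queries = Seq(
    "SELECT k, count(*) FROM t1 GROUP BY k",
    "SELECT k, sum(v)   FROM t2 GROUP BY k",
    "SELECT k, max(v)   FROM t3 GROUP BY k",
    "SELECT k, avg(v)   FROM t4 GROUP BY k"
  )

  // Roughly 1000 driver-side threads, each submitting all four queries.
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1000))

  val work = (1 to 1000).map { _ =>
    Future {
      // Every sql() call launches its own shuffle stages on the cluster.
      queries.foreach(q => hiveContext.sql(q).collect())
    }
  }
  Await.result(Future.sequence(work), Duration.Inf)

Concurrent sql() calls share the same executor shuffle memory, so the degree of
driver-side concurrency is itself relevant when reasoning about executor
memory use.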



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-executor-OOM-issue-on-YARN-tp24522.html