Re: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Works in 1.4, but not in 1.5

2015-12-15 Thread Deenar Toraskar
On 16 December 2015 at 06:19, Deenar Toraskar <
deenar.toras...@thinkreactive.co.uk> wrote:

> Hi
>
> I had the same problem. I have a query that joins several small tables
> (five or so), each below the broadcast threshold, and Spark broadcasts
> all of them together without checking whether sufficient memory is
> available.
>
> I got around the issue by reducing
> *spark.sql.autoBroadcastJoinThreshold* so that the bigger tables in the
> query are no longer broadcast.
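>
> For example, dropping the threshold back toward the 10 MB default stops
> anything larger from being broadcast; the value below is illustrative,
> not a recommendation, and setting it to -1 disables automatic broadcast
> joins entirely:
>
> sparkConf.set("spark.sql.autoBroadcastJoinThreshold", "10485760"); // 10 MB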
>
> This looks like a bug to me. A fix would be to ensure that, in addition
> to the per-table threshold, there is a total broadcast size limit per
> query, so that only data up to that limit is broadcast, preventing
> executors from running out of memory.
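>
> Roughly, the planner could keep a running broadcast total per query. A
> sketch of the idea in plain Java (the budget value, the table sizes and
> the decision loop are all hypothetical, not actual Spark internals):
>
> long perTableThreshold = 100L * 1024 * 1024; // existing per-table limit
> long perQueryBudget = 256L * 1024 * 1024;    // hypothetical per-query cap
> long[] candidateTableSizes = {90L << 20, 80L << 20, 95L << 20}; // bytes
> long used = 0;
> for (long size : candidateTableSizes) {
>     if (size <= perTableThreshold && used + size <= perQueryBudget) {
>         used += size; // within budget: plan a broadcast join
>     } else {
>         // over budget or over threshold: fall back to a shuffle join
>     }
> }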
>
> Shall I raise a JIRA for this?
>
> Regards
> Deenar
>
>
> On 4 November 2015 at 22:55, Shuai Zheng wrote:
>
>> An update: this ONLY happens in Spark 1.5. I ran the same program under
>> Spark 1.4 and 1.4.1 with no issues (it was developed under Spark 1.4,
>> and I just re-tested it; it works). So the logic and the data are fine;
>> the failure is caused by the new version of Spark.
>>
>> [...]


Re: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Works in 1.4, but not in 1.5

2015-12-15 Thread Deenar Toraskar
Hi

I have created an issue for this
https://issues.apache.org/jira/browse/SPARK-12358

Regards
Deenar



RE: [Spark 1.5]: Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space -- Works in 1.4, but not in 1.5

2015-11-04 Thread Shuai Zheng
An update: this ONLY happens in Spark 1.5. I ran the same program under Spark
1.4 and 1.4.1 with no issues (it was developed under Spark 1.4, and I just
re-tested it; it works). So the logic and the data are fine; the failure is
caused by the new version of Spark.

 

So I want to know: is there any new setting I should apply in Spark 1.5 to make it work?

 

Regards,

 

Shuai

 

From: Shuai Zheng [mailto:szheng.c...@gmail.com] 
Sent: Wednesday, November 04, 2015 3:22 PM
To: user@spark.apache.org
Subject: [Spark 1.5]: Exception in thread "broadcast-hash-join-2"
java.lang.OutOfMemoryError: Java heap space

 

Hi All,

 

I have a program that runs a somewhat complex business job (several joins) in
Spark, and I get the exception below.

 

I am running on Spark 1.5 with these parameters:

 

spark-submit --deploy-mode client --executor-cores=24 --driver-memory=2G
--executor-memory=45G --class …

 

Some other settings:

 

sparkConf.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer").set("spark.kryoserializer.buff
er.max", "2047m");

sparkConf.set("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps").set("spark.sql.autoBroadcastJoinThreshold",
"104857600");

 

This is running on an AWS c3.8xlarge instance (32 vCPUs, 60 GB RAM). I am not
sure what parameters I should set given the OutOfMemoryError below.

 

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 10181"...
Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
        at org.apache.spark.sql.execution.joins.UnsafeHashedRelation$.apply(HashedRelation.scala:380)
        at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:123)
        at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:95)
        at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
        at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:100)
        at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
        at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:85)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

 

Any hint would be very helpful.

 

Regards,

 

Shuai