How much data are you dealing with, and how skewed is it? The code comes from Spark, as far as I can see. To overcome the problem, you have a few things to try:
1. Increase executor memory.
2. Try Hive's skew join.
3. Rewrite your query.

(Quick sketches of each follow below the quoted message.)

Thanks,
Xuefu

On Sat, Nov 28, 2015 at 12:37 AM, Jone Zhang <[email protected]> wrote:

> To add a little:
> The Hive version is 1.2.1.
> The Spark version is 1.4.1.
> The Hadoop version is 2.5.1.
>
> 2015-11-26 20:36 GMT+08:00 Jone Zhang <[email protected]>:
>
>> Here is the error message:
>>
>> java.lang.OutOfMemoryError: Java heap space
>>   at java.util.Arrays.copyOf(Arrays.java:2245)
>>   at java.util.Arrays.copyOf(Arrays.java:2219)
>>   at java.util.ArrayList.grow(ArrayList.java:242)
>>   at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
>>   at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
>>   at java.util.ArrayList.add(ArrayList.java:440)
>>   at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:95)
>>   at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:70)
>>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>>   at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:216)
>>   at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> And the note from SortByShuffler.java:
>>   // TODO: implement this by accumulating rows with the same key into a list.
>>   // Note that this list needs to be improved to prevent excessive memory usage,
>>   // but this can be done in a later phase.
>>
>> The join SQL runs successfully when I use Hive on MapReduce.
>> So how does MapReduce deal with it?
>> And is there a plan to improve this to prevent excessive memory usage?
>>
>> Best wishes!
>> Thanks!
>
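On (1), here is a minimal sketch of the settings I would start from when running Hive on Spark; the values are placeholders, so tune them to your data volume and your YARN container limits:

    -- Illustrative executor sizing for Hive on Spark; the numbers are placeholders.
    set spark.executor.memory=8g;
    set spark.yarn.executor.memoryOverhead=2048;  -- MB of off-heap headroom on YARN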
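On (2), skew join is driven by the properties below. Treat this as a sketch; I would verify how far runtime skew join support goes with the Spark engine in your version:

    -- Detect keys with too many rows at runtime and process them in a separate job.
    set hive.optimize.skewjoin=true;
    set hive.skewjoin.key=100000;  -- rows per key beyond which the key is treated as skewed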
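On (3), a common rewrite is to isolate the dominant key(s) and union the results. The sketch below assumes hypothetical tables t1 and t2 joined on a column k, where the value 'hot' carries most of the rows; adapt the names and the predicate to your actual query:

    -- The hot key goes through a map join, so it never hits the skewed shuffle.
    select /*+ mapjoin(t2) */ t1.k, t2.v
    from t1 join t2 on t1.k = t2.k
    where t1.k = 'hot'
    union all
    -- The remaining keys are well distributed and can use the regular shuffle join.
    select t1.k, t2.v
    from t1 join t2 on t1.k = t2.k
    where t1.k <> 'hot';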
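On the MapReduce question in the quoted message: the reduce side there streams the values for a key to the operator tree through an iterator backed by the sorted, merged spill files, so it never has to hold all rows for one key in memory at once. The SortByShuffler code in the stack trace, as the quoted TODO says, currently accumulates those rows in an in-memory ArrayList, which is why a heavily skewed key can exhaust the heap.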
