How much data are you dealing with, and how skewed is it? The code comes from Spark, as far as I can see. To overcome the problem, you have a few things to try:
1. Increase executor memory.
2. Try Hive's skew join.
3. Rewrite your query.

(Quick sketches of each follow below the quoted message.)

Thanks,
Xuefu

On Sat, Nov 28, 2015 at 12:37 AM, Jone Zhang <[email protected]> wrote:

> To add a little:
> The Hive version is 1.2.1.
> The Spark version is 1.4.1.
> The Hadoop version is 2.5.1.
>
> 2015-11-26 20:36 GMT+08:00 Jone Zhang <[email protected]>:
>
>> Here is the error message:
>>
>> java.lang.OutOfMemoryError: Java heap space
>>   at java.util.Arrays.copyOf(Arrays.java:2245)
>>   at java.util.Arrays.copyOf(Arrays.java:2219)
>>   at java.util.ArrayList.grow(ArrayList.java:242)
>>   at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
>>   at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
>>   at java.util.ArrayList.add(ArrayList.java:440)
>>   at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:95)
>>   at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler$ShuffleFunction$1.next(SortByShuffler.java:70)
>>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>>   at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:216)
>>   at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:70)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> And the note from SortByShuffler.java:
>>   // TODO: implement this by accumulating rows with the same key into a list.
>>   // Note that this list needs to be improved to prevent excessive memory usage,
>>   // but this can be done in a later phase.
>>
>> The join SQL runs successfully when I use Hive on MapReduce.
>> So how does MapReduce deal with it?
>> And is there a plan to improve this to prevent excessive memory usage?
>>
>> Best wishes!
>> Thanks!
>
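On (1), here is a minimal sketch of the settings I would start from when running Hive on Spark; the values are placeholders, so tune them to your data volume and your YARN container limits:

    -- Illustrative executor sizing for Hive on Spark; the numbers are placeholders.
    set spark.executor.memory=8g;
    set spark.yarn.executor.memoryOverhead=2048;  -- MB of off-heap headroom on YARN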
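On (2), skew join is driven by the properties below. Treat this as a sketch; I would verify how far runtime skew join support goes with the Spark engine in your version:

    -- Detect keys with too many rows at runtime and process them in a separate job.
    set hive.optimize.skewjoin=true;
    set hive.skewjoin.key=100000;  -- rows per key beyond which the key is treated as skewed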
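On (3), a common rewrite is to isolate the dominant key(s) and union the results. The sketch below assumes hypothetical tables t1 and t2 joined on a column k, where the value 'hot' carries most of the rows; adapt the names and the predicate to your actual query:

    -- The hot key goes through a map join, so it never hits the skewed shuffle.
    select /*+ mapjoin(t2) */ t1.k, t2.v
    from t1 join t2 on t1.k = t2.k
    where t1.k = 'hot'
    union all
    -- The remaining keys are well distributed and can use the regular shuffle join.
    select t1.k, t2.v
    from t1 join t2 on t1.k = t2.k
    where t1.k <> 'hot';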
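On the MapReduce question in the quoted message: the reduce side there streams the values for a key to the operator tree through an iterator backed by the sorted, merged spill files, so it never has to hold all rows for one key in memory at once. The SortByShuffler code in the stack trace, as the quoted TODO says, currently accumulates those rows in an in-memory ArrayList, which is why a heavily skewed key can exhaust the heap.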
