Hi, I am using Spark 1.0.1. I am trying to debug an OOM exception I saw during a join step.
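One thing I have been trying to rule out is key skew in the join, since a single hot key would make one reduce task pull most of the shuffle output. A minimal local sketch of the check I mean (with a pair RDD this would just be countByKey(); the plain collections and names here only stand in for the real RDDs):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SkewCheck {
    // Count records per join key. With a pair RDD in Spark this would be
    // rdd.countByKey(); plain collections stand in for the RDD here.
    static Map<String, Integer> countByKey(List<SimpleEntry<String, Integer>> pairs) {
        Map<String, Integer> counts = new HashMap<>();
        for (SimpleEntry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: key "a" is heavily skewed, so the reduce task
        // that owns "a" pulls most of the shuffle output.
        List<SimpleEntry<String, Integer>> sample = Arrays.asList(
                new SimpleEntry<>("a", 1), new SimpleEntry<>("a", 2),
                new SimpleEntry<>("a", 3), new SimpleEntry<>("b", 4));
        System.out.println(countByKey(sample));
    }
}
```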
Basically, I have an RDD of rows that I am joining with another RDD of tuples. Some of the tasks succeed, but a fair number fail with the OOM exception whose stack is below. The stack belongs to the 'reducer' that is reading shuffle output from the 'mapper'. My question is: what is the object being deserialized here, just a portion of an RDD or the whole RDD partition assigned to the current reducer? The rows in the RDD could be large, but definitely not something that would run to hundreds of MBs in size and thus run out of memory. Also, is there a way to determine the size of the object being deserialized that results in the error (either by looking at some staging HDFS dir or the logs)?

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
    java.util.Arrays.copyOf(Arrays.java:2367)
    java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    java.lang.StringBuilder.append(StringBuilder.java:204)
    java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3142)
    java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3050)
    java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2863)
    java.io.ObjectInputStream.readString(ObjectInputStream.java:1636)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1339)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    java.util.ArrayList.readObject(ArrayList.java:771)
    sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:606)
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
    org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
    org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
    org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)

Thanks,
pala
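P.S. For what it's worth, this is the rough sketch I have been using to estimate the on-the-wire size of a single row under plain Java serialization (the same JavaSerializer that appears in the stack above). The ArrayList-of-Strings "row" is just a made-up stand-in, and Spark may wrap the record further, so treat the number as a lower bound:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

public class SerializedSize {
    // Rough estimate of how many bytes an object occupies on the shuffle
    // wire under plain Java serialization; Spark's JavaSerializer uses the
    // same ObjectOutputStream machinery underneath.
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical row: an ArrayList of Strings, matching the
        // ArrayList/String frames in the stack trace above.
        ArrayList<String> row = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            row.add("field-" + i);
        }
        System.out.println("approx serialized size: " + serializedSize(row) + " bytes");
    }
}
```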