Hi,

I am using Spark 1.0.1. I am trying to debug an OOM exception I saw during a
join step.

Basically, I have an RDD of rows that I am joining with another RDD of
tuples.
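
For illustration, the join is roughly of this shape (all names and types
below are placeholders standing in for my actual classes, not the real code):

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._  // pair-RDD functions in Spark 1.0.x

    // Placeholder row class; the real one is serializable and can carry
    // fairly large string fields.
    case class Row(id: Long, fields: Seq[String])

    val sc = new SparkContext("local[2]", "join-oom-repro")

    // RDD of rows, keyed by id.
    val rowsRdd = sc.parallelize(Seq(Row(1L, Seq("a")), Row(2L, Seq("b"))))
      .map(r => (r.id, r))

    // RDD of tuples, keyed by the same id.
    val tuplesRdd = sc.parallelize(Seq((1L, "x"), (2L, "y")))

    // The join step where some reduce-side tasks hit the OOM below.
    val joined = rowsRdd.join(tuplesRdd)
    joined.count()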

Some of the tasks succeed, but a fair number fail with an OOM exception with
the stack below. The stack belongs to the 'reducer' that is reading shuffle
output from the 'mapper'.

My question is: what is the object being deserialized here - just a portion
of an RDD, or the whole RDD partition assigned to the current reducer? The
rows in the RDD can be large, but definitely not so large that they would run
to hundreds of MBs and exhaust memory.

Also, is there a way to determine the size of the object being deserialized
that triggers the error (either by looking at a staging HDFS dir or the
logs)?
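
For reference, the crude check I can already do on my side is to sample a few
(key, row) pairs from the row RDD above and measure their plain Java-serialized
size (the same java.io serialization the default spark.serializer uses), but
that only tells me about individual records, not about whatever aggregate the
reducer is deserializing:

    import java.io.{ByteArrayOutputStream, ObjectOutputStream}

    // Rough Java-serialized size of a single object.
    def serializedSize(obj: AnyRef): Int = {
      val bytes = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(bytes)
      out.writeObject(obj)
      out.close()
      bytes.size()
    }

    // Sample a handful of (key, row) pairs and print their sizes.
    rowsRdd.take(10).foreach { kv =>
      println("serialized (key, row) size = " + serializedSize(kv) + " bytes")
    }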

java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead
limit exceeded)
java.util.Arrays.copyOf(Arrays.java:2367)
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
java.lang.StringBuilder.append(StringBuilder.java:204)
java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3142)
java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3050)
java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2863)
java.io.ObjectInputStream.readString(ObjectInputStream.java:1636)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1339)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
java.util.ArrayList.readObject(ArrayList.java:771)
sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)



Thanks,
pala
