There is a difference between actual GC overhead, which can be reduced by
reusing objects, and this error, which actually means you ran out of
memory. This error can probably be relieved by increasing your executor
heap size, unless your data is corrupt and it is allocating huge arrays, or
you are otherwise keeping too much memory around.
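
If you want to try that, here's a minimal sketch (the app name and the 4g
figure are just placeholders, tune them to your cluster):

import org.apache.spark.{SparkConf, SparkContext}

// Raise the per-executor JVM heap before the SparkContext is created;
// spark.executor.memory is the standard config key for this.
val conf = new SparkConf()
  .setAppName("my-job")                 // placeholder app name
  .set("spark.executor.memory", "4g")   // illustrative size only
val sc = new SparkContext(conf)

The same setting can also be passed on the command line, e.g.
spark-submit --executor-memory 4g, if you'd rather not hard-code it.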

For your other question: you can reuse objects the way MapReduce does
(HadoopRDD does this internally by reusing Hadoop's Writables, for
instance), but the general Spark APIs don't support it because mutable
objects don't play well with caching or serialization.
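
To make that concrete, here is a rough sketch of the pitfall (the path and
names are made up): HadoopRDD hands back the same Writable instance for
every record, so if you cache or collect without copying, every element
ends up pointing at the last record read.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// HadoopRDD reuses one Text instance per partition, so don't cache it directly:
// lines.cache() would leave every cached element referencing the same object.
val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///path/to/input")

// Materialize an immutable copy of each record before caching.
val safe = lines.map { case (_, text) => text.toString }.cache()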


On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev <
kudryavtsev.konstan...@gmail.com> wrote:

> Hi all,
>
> I ran into the following exception during the map step:
> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
> exceeded)
> java.lang.reflect.Array.newInstance(Array.java:70)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:325)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
> com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> I'm using Spark 1.0. In map I create a new object each time; as I
> understand it, I can't reuse objects the way I would in MapReduce
> development? I was wondering if you could point me to how it is possible
> to avoid this GC overhead... thank you in advance
>
> Thank you,
> Konstantin Kudryavtsev
>
