Hi Konstantin,

I just ran into the same problem. I worked around it by reducing the
number of cores used to execute the job; otherwise the job would never
finish.

Contrary to what many people believe, this error does not necessarily mean
you were running out of memory. A better explanation can be found here:
http://stackoverflow.com/questions/4371505/gc-overhead-limit-exceeded and
is copied here for reference:

"Excessive GC Time and OutOfMemoryError

The concurrent collector will throw an OutOfMemoryError if too much time is
being spent in garbage collection: if more than 98% of the total time is
spent in garbage collection and less than 2% of the heap is recovered, an
OutOfMemoryError will be thrown. This feature is designed to prevent
applications from running for an extended period of time while making
little or no progress because the heap is too small. If necessary, this
feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the
command line.

The policy is the same as that in the parallel collector, except that time
spent performing concurrent collections is not counted toward the 98% time
limit. In other words, only collections performed while the application is
stopped count toward excessive GC time. Such collections are typically due
to a concurrent mode failure or an explicit collection request (e.g., a
call to System.gc())."
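
If you really want to disable that check (usually I'd rather give the
executors more heap instead), here is a minimal sketch of how the flag
could be passed through to the executors, assuming a Spark 1.x
SparkConf-based setup; the memory value is just a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: forward the GC overhead flag to the executor JVMs.
    // Disabling the check hides the symptom; more heap or fewer
    // concurrent tasks is usually the better fix.
    val conf = new SparkConf()
      .setAppName("gc-overhead-sketch")
      .set("spark.executor.extraJavaOptions", "-XX:-UseGCOverheadLimit")
      .set("spark.executor.memory", "4g") // placeholder, tune for your cluster
    val sc = new SparkContext(conf)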

It could be that many tasks are running on the same node and all of them
compete for CPU while running GC, which slows things down and triggers the
error you saw. By reducing the number of cores, more CPU resources are
available to each task, so the GC can finish before the error gets thrown.
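
For reference, a minimal sketch of the kind of setting I mean, assuming
YARN mode (spark.executor.cores) or standalone mode (spark.cores.max);
the numbers are placeholders, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch only: run fewer concurrent tasks per executor so each
    // task (and its GC work) gets more CPU headroom.
    val conf = new SparkConf()
      .setAppName("fewer-cores-sketch")
      .set("spark.executor.cores", "2") // YARN: cores per executor (placeholder)
      .set("spark.cores.max", "16")     // standalone: total-cores cap (placeholder)
    val sc = new SparkContext(conf)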

HTH,

Jerry


On Tue, Jul 8, 2014 at 1:35 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> There is a difference between actual GC overhead, which can be reduced by
> reusing objects, and this error, which actually means you ran out of
> memory. This error can probably be relieved by increasing your executor
> heap size, unless your data is corrupt and it is allocating huge arrays, or
> you are otherwise keeping too much memory around.
>
> For your other question, you can reuse objects similar to MapReduce
> (HadoopRDD does this by actually using Hadoop's Writables, for instance),
> but the general Spark APIs don't support this because mutable objects are
> not friendly to caching or serializing.
>
>
> On Tue, Jul 8, 2014 at 9:27 AM, Konstantin Kudryavtsev <
> kudryavtsev.konstan...@gmail.com> wrote:
>
>> Hi all,
>>
>> I ran into the following exception during the map step:
>> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
>> exceeded)
>> java.lang.reflect.Array.newInstance(Array.java:70)
>> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:325)
>> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:293)
>> com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:43)
>> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:115)
>> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
>> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:155)
>> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:154)
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> I'm using Spark 1.0. In the map I create a new object each time; as I
>> understand it, I can't reuse objects the way I could in MapReduce
>> development? I wondered if you could point me to how I can avoid the GC
>> overhead... thank you in advance
>>
>> Thank you,
>> Konstantin Kudryavtsev
>>
>
>
