Hi Christian,
- What is the processing time of each of your batches? Is it exceeding the 15-second batch interval?
- How many jobs are queued?
- Can you take a heap dump and see which objects are occupying the heap? There is a sketch of the jmap commands I have in mind below.
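
For the heap dump, something along these lines against the driver process should be enough (a rough sketch, not a full recipe; use the jmap from the same JDK the driver runs on, and replace <driver-pid> with the driver's process id):

jmap -histo:live <driver-pid>
jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>

The first prints a class histogram like the one quoted below; the second writes a full dump you can open in a heap analyzer (Eclipse MAT, jvisualvm) to see what is actually retaining the RDDInfo and $colon$colon instances.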
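Also, since the mail below mentions multiple updateStateByKey() calls with an explicit 240-second checkpoint interval, it is worth double-checking that the interval is really applied to each stateful stream. A minimal Java 8 / Spark 1.4 sketch of what I mean (the stream name "events", the key/value types and the update function are placeholders, not your code):

import java.util.List;
import com.google.common.base.Optional;               // Spark 1.x Java API uses Guava's Optional
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Hypothetical update function: folds the new values of a batch into the running state.
Function2<List<Long>, Optional<Long>, Optional<Long>> updateFunc =
    (newValues, state) -> {
      long sum = state.or(0L);
      for (Long v : newValues) {
        sum += v;
      }
      return Optional.of(sum);
    };

// "events" stands in for one of your existing JavaPairDStream<String, Long> inputs.
JavaPairDStream<String, Long> counts = events.updateStateByKey(updateFunc);

// Explicit checkpoint interval on the state stream itself; this is what bounds
// the lineage of the state RDDs from batch to batch.
counts.checkpoint(Durations.seconds(240));

Note that the application-wide checkpoint directory (jssc.checkpoint(<dir>)) still has to be set for updateStateByKey() to work; the per-stream call above only controls how often that stream's state is materialized.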
Regards,
Shahbaz

On Tue, May 31, 2016 at 12:21 AM, christian.dancu...@rbc.com <christian.dancu...@rbc.com> wrote:

> Hi All,
>
> We have a Spark Streaming v1.4 / Java 8 application that slows down and
> eventually runs out of heap space. The less driver memory, the faster it
> happens.
>
> Appended is our Spark configuration and a snapshot of the heap taken using
> jmap on the driver process. The RDDInfo, $colon$colon and [C objects keep
> growing as we observe. We also tried to use G1GC, but it acts the same.
>
> Our dependency graph contains multiple updateStateByKey() calls. For each,
> we explicitly set the checkpoint interval to 240 seconds.
>
> We have our batch interval set to 15 seconds, with no delays at the start
> of the process.
>
> Spark configuration (Spark Driver Memory: 6GB, Spark Executor Memory: 2GB):
> spark.streaming.minRememberDuration=180s
> spark.ui.showConsoleProgress=false
> spark.streaming.receiver.writeAheadLog.enable=true
> spark.streaming.unpersist=true
> spark.streaming.stopGracefullyOnShutdown=true
> spark.streaming.ui.retainedBatches=10
> spark.ui.retainedJobs=10
> spark.ui.retainedStages=10
> spark.worker.ui.retainedExecutors=10
> spark.worker.ui.retainedDrivers=10
> spark.sql.ui.retainedExecutions=10
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> spark.kryoserializer.buffer.max=128m
>
>  num     #instances        #bytes  class name
> ----------------------------------------------
>    1:   8828200   565004800  org.apache.spark.storage.RDDInfo
>    2:  20794893   499077432  scala.collection.immutable.$colon$colon
>    3:   9646097   459928736  [C
>    4:   9644398   231465552  java.lang.String
>    5:  12760625   204170000  java.lang.Integer
>    6:     21326   111198632  [B
>    7:    556959    44661232  [Lscala.collection.mutable.HashEntry;
>    8:   1179788    37753216  java.util.concurrent.ConcurrentHashMap$Node
>    9:   1169264    37416448  java.util.Hashtable$Entry
>   10:    552707    30951592  org.apache.spark.scheduler.StageInfo
>   11:    367107    23084712  [Ljava.lang.Object;
>   12:    556948    22277920  scala.collection.mutable.HashMap
>   13:      2787    22145568  [Ljava.util.concurrent.ConcurrentHashMap$Node;
>   14:    116997    12167688  org.apache.spark.executor.TaskMetrics
>   15:    360425     8650200  java.util.concurrent.LinkedBlockingQueue$Node
>   16:    360417     8650008  org.apache.spark.deploy.history.yarn.HandleSparkEvent
>   17:      8332     8478088  [Ljava.util.Hashtable$Entry;
>   18:    351061     8425464  scala.collection.mutable.ArrayBuffer
>   19:    116963     8421336  org.apache.spark.scheduler.TaskInfo
>   20:    446136     7138176  scala.Some
>   21:    211968     5087232  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
>   22:    116963     4678520  org.apache.spark.scheduler.SparkListenerTaskEnd
>   23:    107679     4307160  org.apache.spark.executor.ShuffleWriteMetrics
>   24:     72162     4041072  org.apache.spark.executor.ShuffleReadMetrics
>   25:    117223     3751136  scala.collection.mutable.ListBuffer
>   26:     81473     3258920  org.apache.spark.executor.InputMetrics
>   27:    125903     3021672  org.apache.spark.rdd.RDDOperationScope
>   28:     91455     2926560  java.util.HashMap$Node
>   29:        89     2917776  [Lscala.concurrent.forkjoin.ForkJoinTask;
>   30:    116957     2806968  org.apache.spark.scheduler.SparkListenerTaskStart
>   31:      2122     2188568  [Lorg.apache.spark.scheduler.StageInfo;
>   32:     16411     1819816  java.lang.Class
>   33:     87862     1405792  org.apache.spark.scheduler.SparkListenerUnpersistRDD
>   34:     22915      916600  org.apache.spark.storage.BlockStatus
>   35:      5887      895568  [Ljava.util.HashMap$Node;
>   36:       480      855552  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
>   37:      7569      834968  [I
>   38:      9626      770080  org.apache.spark.rdd.MapPartitionsRDD
>   39:     31748      761952  java.lang.Long
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-heap-space-out-of-memory-tp27050.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org