I have an iterative algorithm. The results of each iteration are sent
to the master with .collect() and then sent back to the workers as a
broadcast variable. After a few iterations I run out of heap space
(stack trace below). This is expected: I only have enough memory for a
few copies of my broadcast variable.
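For readers who want the pattern spelled out, here is a minimal sketch of a collect-then-broadcast loop. It is not the actual RunGeocoderSpark code: the object name, the "local[2]" master, the toy data, and the update step are all hypothetical, and it assumes the pre-0.8 package layout (spark.SparkContext) that the stack trace suggests.

```scala
import spark.SparkContext

// Hypothetical sketch of the collect-then-broadcast loop described above.
// Names and the update rule are placeholders, not the real job.
object BroadcastLoopSketch {
  def main(args: Array[String]) {
    // "local[2]" and the job name are placeholder values.
    val sc = new SparkContext("local[2]", "BroadcastLoopSketch")
    val data = sc.parallelize(1 to 1000).cache()

    // Driver-side state that is re-broadcast on every iteration.
    var model = Array.fill(10)(0.0)

    for (iter <- 1 to 5) {
      // A new broadcast variable is created on each pass; nothing in this
      // sketch releases the ones created by earlier passes.
      val bcModel = sc.broadcast(model)

      // Toy update: workers read the broadcast value, and the per-record
      // results are gathered on the master with collect().
      val contributions =
        data.map(x => (x % 10, bcModel.value(x % 10) + x)).collect()

      // Fold the collected results into the next model on the master.
      val next = new Array[Double](10)
      for ((slot, v) <- contributions) next(slot) += v
      model = next
    }

    sc.stop()
  }
}
```

Each call to sc.broadcast stores another copy of the model on the master, which is where the trace shows the allocation failing.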
I've tried: System.setProperty("spark.cleaner.ttl", "1800000")
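One thing that may be worth double-checking about that setting: as far as I can tell from the Spark configuration docs, spark.cleaner.ttl is given in seconds, so "1800000" would be roughly 20 days, not 30 minutes. The sketch below assumes 30 minutes was the intent and that the property has to be set before the SparkContext is constructed. The object name is hypothetical, and I am not certain the cleaner covers broadcast blocks in this version.

```scala
// Hypothetical sketch: set the cleaner TTL before creating the SparkContext.
object CleanerTtlSketch {
  // Assumption: 30 minutes was the intended TTL, expressed in seconds.
  val ttlSeconds: Long = 30 * 60

  def main(args: Array[String]) {
    // Plain JVM system property; no Spark classes are needed to set it.
    System.setProperty("spark.cleaner.ttl", ttlSeconds.toString)
    println("spark.cleaner.ttl = " + System.getProperty("spark.cleaner.ttl"))

    // The SparkContext would be created after this point, e.g.
    // val sc = new SparkContext(master, jobName)  // placeholder arguments
  }
}
```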
I've found: https://github.com/mesos/spark/pull/771 , but I am not
sure what happened with that pull request.
What else can I do?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.IdentityHashMap.resize(IdentityHashMap.java:452)
    at java.util.IdentityHashMap.put(IdentityHashMap.java:428)
    at spark.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:114)
    at spark.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:160)
    at spark.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:159)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
    at scala.collection.immutable.List.foreach(List.scala:76)
    at spark.SizeEstimator$.visitSingleObject(SizeEstimator.scala:159)
    at spark.SizeEstimator$.spark$SizeEstimator$$estimate(SizeEstimator.scala:143)
    at spark.SizeEstimator$.estimate(SizeEstimator.scala:137)
    at spark.storage.MemoryStore.putValues(MemoryStore.scala:55)
    at spark.storage.BlockManager.liftedTree1$1(BlockManager.scala:538)
    at spark.storage.BlockManager.put(BlockManager.scala:534)
    at spark.storage.BlockManager.put(BlockManager.scala:485)
    at spark.storage.BlockManager.putSingle(BlockManager.scala:721)
    at spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:24)
    at spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcast.scala:54)
    at spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcast.scala:50)
    at spark.broadcast.BroadcastManager.newBroadcast(Broadcast.scala:50)
    at spark.SparkContext.broadcast(SparkContext.scala:439)
    at com.hrl.issl.osi.scripts.RunGeocoderSpark$$anonfun$coordinateDescentIterations$1.apply$mcVI$sp(RunGeocoderSpark.scala:189)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:78)
    at com.hrl.issl.osi.scripts.RunGeocoderSpark$.coordinateDescentIterations(RunGeocoderSpark.scala:186)
    at com.hrl.issl.osi.scripts.RunGeocoderSpark$.main(RunGeocoderSpark.scala:117)
    at com.hrl.issl.osi.scripts.RunGeocoderSpark.main(RunGeocoderSpark.scala)