Reading the code, is there any reason why setting spark.cleaner.ttl.MAP_OUTPUT_TRACKER directly won't get picked up?
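For reference, a minimal sketch of what I mean (an assumption on my part: this relies on the Spark 1.x MetadataCleaner, which reads component-specific keys of the form spark.cleaner.ttl.<component> and lets them override the global TTL):

```scala
import org.apache.spark.SparkConf

// Sketch: set a TTL only for the map-output-tracker metadata, so stale
// shuffle bookkeeping gets cleaned while cached RDDs and broadcast
// variables are untouched (we deliberately do NOT set the global
// spark.cleaner.ttl, which would clean those too).
val conf = new SparkConf()
  .setAppName("metadata-ttl-sketch")  // hypothetical app name
  .set("spark.cleaner.ttl.MAP_OUTPUT_TRACKER", "1800")  // seconds
```

Whether the component-specific key is actually honored for your workload is exactly the question, but per-component keys looked plausible from MetadataCleaner's config lookup.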
2015-11-17 14:45 GMT-05:00 Jonathan Coveney <jcove...@gmail.com>:

> So I have the following:
>
> broadcast some stuff
> cache an RDD
> do a bunch of stuff, eventually calling actions which reduce it to an
> acceptable size
>
> I'm getting an OOM on the driver (well, GC is getting out of control),
> largely because I have a lot of partitions and it looks like the job
> history is getting too large. ttl is an option, but the downside is that
> it will also delete the RDD... this isn't really what I want. What I want
> is to keep my in-memory data structures (the RDD, broadcast variable,
> etc.) but get rid of the old metadata that I don't need anymore (i.e.
> tasks that have already executed).
>
> Is there a way to achieve this?