Reading the code, is there any reason why
setting spark.cleaner.ttl.MAP_OUTPUT_TRACKER directly wouldn't get picked up?
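For reference, a minimal sketch of what that might look like in spark-defaults.conf. The key name comes from the suggestion above; the value being a TTL in seconds is an assumption worth verifying against your Spark version. The point is to scope cleanup to map-output metadata while leaving the general spark.cleaner.ttl unset, since the general TTL would also clean cached RDDs:

```
# spark-defaults.conf (sketch -- key name from the reply above;
# TTL value in seconds is an assumption, verify for your version)
spark.cleaner.ttl.MAP_OUTPUT_TRACKER   3600
```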

2015-11-17 14:45 GMT-05:00 Jonathan Coveney <jcove...@gmail.com>:

> So I have the following...

>
> broadcast some stuff
> cache an rdd
> do a bunch of stuff, eventually calling actions which reduce it to an
> acceptable size
>
> I'm getting an OOM on the driver (well, GC is getting out of control),
> largely because I have a lot of partitions and the job history is
> getting too large. spark.cleaner.ttl is an option, but the downside is
> that it will also delete the RDD, which isn't really what I want. What
> I want is to keep my in-memory data structures (the RDD, broadcast
> variables, etc.) but get rid of the old metadata that I no longer need
> (i.e. tasks that have already executed).
>
> Is there a way to achieve this?
>