So I have the following workflow:

1. Broadcast some data.
2. Cache an RDD.
3. Do a bunch of work on it, eventually calling actions which reduce it
   to an acceptable size.
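
To make that concrete, here is a rough sketch of the pattern in Scala. It is
not my actual job: the names (lookup, data, counts), the data, and the
partition count are all made up for illustration, and it assumes a local
master just so it can run standalone.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup; the real job runs on a cluster.
    val conf = new SparkConf().setAppName("workflow-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // 1. Broadcast some data (a made-up lookup table).
    val lookup = sc.broadcast(Map(0 -> "a", 1 -> "b"))

    // 2. Cache an RDD with many partitions (1000 is an arbitrary stand-in).
    val data = sc.parallelize(1 to 1000000, 1000).cache()

    // 3. Transform it, then call an action that reduces it to a small result.
    //    Each action launches one task per partition, and the driver keeps
    //    metadata for those tasks.
    val counts = data
      .map(x => lookup.value.getOrElse(x % 3, "other"))
      .countByValue()

    println(counts)
    sc.stop()
  }
}
```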

I'm getting an OOM on the driver (well, GC is getting out of control),
largely because I have a lot of partitions and it looks like the job
history is getting too large. The TTL is an option, but the downside is that
it will also delete the RDD, which isn't really what I want. What I want is
to keep my in-memory data structures (the RDD, broadcast variable, etc.) but
get rid of the old metadata that I don't need anymore (i.e., tasks that have
already executed).

Is there a way to achieve this?
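
For reference, the TTL option I mean is, as far as I know, the
spark.cleaner.ttl setting from Spark 1.x. The value is in seconds; the 3600
below is only an example, not what I actually run with. As I understand it,
it cleans up old metadata but also clears any RDD that has been persisted
for longer than that duration, which is the part I want to avoid.

```
# spark-defaults.conf (example value only)
spark.cleaner.ttl    3600
```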
