Github user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan The use-case we have in mind (and are currently using Broadcast
+ finalizers for) is the case where you have, for example, a connection pool
that is, for one reason or another, scoped to a particular stage in the job. In
this case, the pool itself is expensive to create and can be shared across
tasks, which makes closing the object in a try/finally for a single task, or
even a single partition, impractical: you'd end up potentially closing the
resource early and having to rebuild it several times. The fundamental trick is to
figure out a way to allow the driver to define the scope of the shared resource
(like a broadcast variable) and ensure it's really memory-only, so if there's a
better way to use the existing broadcast variable infrastructure to do this,
and prevent this kind of broadcast variable from being purged from the
MemoryStore, then I'm all for it.
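
To make the pattern concrete, here's a minimal sketch (not code from this PR; `ConnectionPool`, `PoolHolder`, and `SharedPoolSketch` are hypothetical names) of how a broadcast holder with a `@transient lazy val` lets every task on an executor share one expensive resource whose lifetime is defined on the driver:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical expensive-to-build resource standing in for a real pool.
class ConnectionPool(val connectionString: String) {
  def lookup(key: Int): String = s"value-for-$key"
}

// The @transient lazy val is rebuilt on first access after deserialization,
// so all tasks on a given executor see the same pool instance, while the
// broadcast (and therefore the pool's scope) is controlled from the driver.
class PoolHolder(connectionString: String) extends Serializable {
  @transient lazy val pool: ConnectionPool = new ConnectionPool(connectionString)
}

object SharedPoolSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("shared-pool-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Broadcast the holder, not the pool itself; only the cheap config travels.
    val holder = sc.broadcast(new PoolHolder("jdbc:example://host/db"))

    val results = sc
      .parallelize(1 to 100, 8)
      .mapPartitions { iter =>
        val pool = holder.value.pool // shared across tasks on this executor
        iter.map(i => pool.lookup(i))
      }
      .collect()

    println(results.take(5).mkString(", "))
    spark.stop()
  }
}
```

Because the pool is rebuilt at most once per executor after deserialization, tearing it down in a per-task try/finally would be too eager, which is exactly the problem described above.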