Github user JeetKunDoug commented on the issue:

    https://github.com/apache/spark/pull/21322
  
    @cloud-fan The use-case we have in mind (and are currently using Broadcast +
    finalizers for) is one where you have, for example, a connection pool that is,
    for one reason or another, scoped to a particular stage of the job. In this
    case the pool itself is expensive to create and can be shared across tasks,
    which makes closing the object in a try/finally for a single task, or even a
    single partition, impractical: you'd end up closing the resource early and
    having to rebuild it several times. The fundamental trick is to figure out a
    way to allow the driver to define the scope of the shared resource (like a
    broadcast variable) and to ensure it's really memory-only. If there's a better
    way to use the existing broadcast variable infrastructure to do this, and to
    prevent this kind of broadcast variable from being purged from the
    MemoryStore, then I'm all for it.
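
    For illustration, a minimal Scala sketch of the pattern described above. The
    `ConnectionPool` class, `PoolHolder`, and the JDBC URL are hypothetical names
    invented for this sketch (they are not from this PR): the driver broadcasts a
    serializable holder, each executor builds the pool lazily at most once, tasks
    in the stage share it, and only the driver decides when the resource goes away.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical stand-in for a real, expensive-to-build connection pool.
    class ConnectionPool(url: String) extends AutoCloseable {
      def borrow(): java.sql.Connection = java.sql.DriverManager.getConnection(url)
      def close(): Unit = () // release pooled connections here
    }

    // Serializable holder: the pool is built lazily, at most once per executor
    // JVM, so all tasks of the stage running on that executor share it.
    class PoolHolder(url: String) extends Serializable {
      @transient lazy val pool: ConnectionPool = {
        val p = new ConnectionPool(url)
        sys.addShutdownHook(p.close()) // crude cleanup; this PR is about doing better
        p
      }
    }

    object BroadcastPoolSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("broadcast-pool-sketch").getOrCreate()
        val sc = spark.sparkContext

        val holder = sc.broadcast(new PoolHolder("jdbc:postgresql://db.example.com/mydb"))

        sc.parallelize(1 to 1000, 8).foreachPartition { rows =>
          val pool = holder.value.pool // shared by every task on this executor
          rows.foreach { _ =>
            val conn = pool.borrow()
            try { /* do work with the connection */ } finally conn.close()
          }
          // Deliberately NOT closing the pool here: a per-task or per-partition
          // try/finally would tear it down while later tasks still need it.
        }

        holder.destroy() // the driver, not the tasks, ends the resource's lifetime
        spark.stop()
      }
    }
    ```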


