GitHub user JeetKunDoug commented on the issue:
https://github.com/apache/spark/pull/21322
@cloud-fan (also left on the JIRA ticket): Sorry this dropped off my
radar for so long; work and life took me away from it for a while. Looking at
the PR review comments, and now better understanding Broadcast variable behavior
(and some of the changes that took place in the 2.x series), it seems that
simply trying to close Broadcast variables won't work as intended. However, I
believe the underlying concept (driver-scoped shared variables, where the
variable lives until the job is done or the driver removes it) is still worth
pursuing. Being able to scope shared resources would be valuable: a DB connection
pool, for example, may need to change per phase of a job or be disposed of
early in a process, and both of those needs rule out static variables. Given that, I'd like to
propose we add a new concept, similar to Broadcast variables, called, perhaps,
Scoped Variables. The intent would be for these to be scoped by the driver, to be
relatively small from a memory-consumption perspective (unlike broadcast
variables, which can be much larger), and to be held in memory until
explicitly removed by the driver. Most of the infrastructure built for broadcast
variables already supports this use case, but we'd need either a "non-purgeable"
entry type in the MemoryStore or a separate store specific to these scoped
variables, to prevent them from being evicted the way cached items are.
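To make the proposed lifecycle concrete, here is a toy model of the eviction semantics. It is a hypothetical sketch and not Spark's MemoryStore or any existing Spark API. The class `ToyMemoryStore` and its methods are invented for illustration. Capacity is counted in entries rather than bytes, and the LRU policy for cached entries was chosen only to keep the example small. Cached entries can be evicted once capacity is exceeded. Scoped entries are pinned and leave only through an explicit `remove_scoped` call, which stands in for removal by the driver.

```python
from collections import OrderedDict


class ToyMemoryStore:
    """Hypothetical model (not Spark's MemoryStore): evictable cached entries
    plus pinned "scoped variables" that stay until explicitly removed."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of evictable entries (not bytes)
        self._cached = OrderedDict()  # evictable entries, kept in LRU order
        self._pinned = {}             # non-purgeable scoped variables

    def put_cached(self, key, value):
        self._cached[key] = value
        self._cached.move_to_end(key)  # mark as most recently used
        while len(self._cached) > self.capacity:
            self._cached.popitem(last=False)  # evict the least recently used entry

    def put_scoped(self, key, value):
        # Pinned entries do not count against capacity and are never evicted.
        self._pinned[key] = value

    def get(self, key):
        if key in self._pinned:
            return self._pinned[key]
        if key in self._cached:
            self._cached.move_to_end(key)  # a read refreshes LRU position
            return self._cached[key]
        return None

    def remove_scoped(self, key):
        # Explicit removal is the only way a scoped entry leaves the store.
        return self._pinned.pop(key, None)


store = ToyMemoryStore(capacity=2)
store.put_scoped("db-pool-config", {"max_connections": 10})
store.put_cached("block-a", [1, 2, 3])
store.put_cached("block-b", [4, 5, 6])
store.put_cached("block-c", [7, 8, 9])  # exceeds capacity, so "block-a" is evicted
print(store.get("block-a"))             # None
print(store.get("db-pool-config"))      # {'max_connections': 10}
store.remove_scoped("db-pool-config")   # driver-initiated removal
```

The key property is that pinned entries sit outside the evictable pool entirely, so memory pressure on cached data can never push them out. Whether that is modeled as a flag on existing MemoryStore entries or as a separate store is the open design choice.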
Thoughts on this? I'll start updating the PR to support
something like this sometime today, but it may still take a while to put
something workable together, so I'd appreciate any feedback when someone
has the time.