Github user JeetKunDoug commented on the issue:

    https://github.com/apache/spark/pull/21322
  
    @cloud-fan (also left on the JIRA ticket): Sorry this has dropped off my 
radar for so long; work and life took me away from it for a while. Having 
looked at the PR review comments and gained a better understanding of Broadcast 
Variable behavior (and some of the changes that took place in the 2.X series), 
it seems that simply trying to close Broadcast variables won't work as 
intended. However, I believe the underlying concept (driver-scoped shared 
variables, where the variable lives until the job is done or the driver removes 
it) is still worth pursuing. It would let us scope shared resources such as DB 
connection pools, which may need to change per phase of a job or be disposed of 
early in a process; static variables are not useful in either case. Given that, 
I'd like to propose we add a new concept, similar to Broadcast Variables, 
called, perhaps, Scoped Variables. The intent would be for these to be scoped 
by the driver, be relatively small from a memory-consumption perspective 
(unlike broadcast variables, which can be much larger), and be held in memory 
until explicitly removed by the driver. Most of the infrastructure work for 
broadcast variables supports this use case, but we'd need either a 
"non-purgeable" type in the MemoryStore, or some other store specific to these 
new scoped variables, to prevent them from being evicted the way cached items 
are.
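
    To make the intended lifecycle concrete, here is a rough sketch in plain 
Java. It is not Spark code and does not use any Spark API: the names 
ScopedVariable, ScopedVariableRegistry, create, lookup and remove are all 
hypothetical, and the registry is only a stand-in for whatever non-evictable 
store would back the feature. It shows a value that stays pinned until it is 
explicitly removed, at which point a closeable resource (such as a connection 
pool) is released.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical handle to a driver-scoped value (illustration only, not a Spark API). */
final class ScopedVariable<T> {
    private final long id;
    private final ScopedVariableRegistry registry;

    ScopedVariable(long id, ScopedVariableRegistry registry) {
        this.id = id;
        this.registry = registry;
    }

    long id() {
        return id;
    }

    /** Returns the value, or empty once it has been explicitly removed. */
    Optional<T> get() {
        return registry.lookup(id);
    }

    /** Explicit removal, which in the proposal only the driver would trigger. */
    void remove() {
        registry.remove(id);
    }
}

/** Hypothetical stand-in for a store whose entries are never evicted implicitly. */
final class ScopedVariableRegistry {
    private final Map<Long, Object> pinned = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Pins a value; it stays in the map until remove(id) is called. */
    <T> ScopedVariable<T> create(T value) {
        Objects.requireNonNull(value, "value");
        long id = nextId.getAndIncrement();
        pinned.put(id, value);
        return new ScopedVariable<>(id, this);
    }

    @SuppressWarnings("unchecked")
    <T> Optional<T> lookup(long id) {
        return Optional.ofNullable((T) pinned.get(id));
    }

    /** Drops the entry and closes it if it holds a closeable resource. */
    void remove(long id) {
        Object value = pinned.remove(id);
        if (value instanceof AutoCloseable) {
            try {
                ((AutoCloseable) value).close();
            } catch (Exception e) {
                throw new RuntimeException("failed to close scoped value " + id, e);
            }
        }
    }

    int size() {
        return pinned.size();
    }
}
```

    In this sketch, swapping a resource between job phases would amount to 
calling remove() on the old handle and create() for the new one. How the value 
reaches the executors, and how removal is propagated to them, is deliberately 
left out.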
     
    Thoughts on this? I'll start working on updating the PR to support 
something like this sometime today, but it might still take a while to get 
something workable put together, so I'd appreciate any feedback when someone 
has the time.

