Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/1082#issuecomment-50360869
Since broadcast variables also have an `unpersist()` method, should
`unpersistAll()` unpersist them too? If not, we should probably give it a more
specific name, such as `unpersistAllRdds()`.
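For reference, the RDD-only behavior is already easy to express on top of the existing `SparkContext.getPersistentRDDs` API. Here is a minimal sketch (not necessarily what this PR does; the helper object and method name are made up for illustration):

```scala
import org.apache.spark.SparkContext

object UnpersistUtil {
  // Unpersist every RDD currently marked as persistent in this SparkContext.
  // This only touches cached RDDs; broadcast variables would need separate handling.
  def unpersistAllRdds(sc: SparkContext, blocking: Boolean = false): Unit =
    sc.getPersistentRDDs.values.foreach(_.unpersist(blocking))
}
```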
What do you think is the typical use-case for wanting to unpersist all
state while retaining RDD lineage, etc.? I imagine that a Spark Job Server that
runs multiple jobs using the same context might want to unpersist _portions_ of
its state as jobs finish, but bulk-unpersisting all state could be a bad
approach there, since it might hurt the performance of jobs that are still
running.
It seems like this job-server use-case could be addressed by a mechanism
similar to region-based memory management, in which unpersistable objects are
created in some persistence context, persistence contexts can be nested, and an
entire persistence context's resources can be freed at once. This captures
`unpersistAll()` as a special case in which every object is allocated as part
of the root persistence context. Individual jobs could have their own
persistence contexts, allowing users to unpersist only the new broadcast
variables / RDDs associated with a particular job / unit of work.
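To make that concrete, here is a rough sketch of what such an API could look like. None of these names (`PersistenceContext`, `Unpersistable`, `register`, `newChild`, `close`) exist in Spark today; they are purely illustrative. A job server would open one child context per job, register that job's cached RDDs and broadcasts in it, and close the context when the job finishes; `unpersistAll()` then falls out as closing the root context.

```scala
import scala.collection.mutable

// Anything that can be unpersisted (cached RDDs, broadcast variables, ...).
trait Unpersistable {
  def unpersist(blocking: Boolean): Unit
}

class PersistenceContext {
  private val registered = mutable.Buffer.empty[Unpersistable]
  private val children = mutable.Buffer.empty[PersistenceContext]

  // Called whenever an RDD is cached or a broadcast is created in this context.
  def register(obj: Unpersistable): Unit = registered += obj

  // Open a nested context, e.g. one per job in a job server.
  def newChild(): PersistenceContext = {
    val child = new PersistenceContext
    children += child
    child
  }

  // Free everything allocated in this context and in any nested contexts;
  // enclosing contexts are left untouched.
  def close(blocking: Boolean = false): Unit = {
    children.foreach(_.close(blocking))
    registered.foreach(_.unpersist(blocking))
    children.clear()
    registered.clear()
  }
}
```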
This "persistence context" approach might be over-engineered, but I think
it would be helpful to come up with a small set of real use-cases where batch
unpersistence is needed to see whether `unpersistAll()` adequately addresses
them. `unpersist()` is essentially manual memory management, so maybe we can
borrow patterns from there, too.