Github user squito commented on the pull request:
https://github.com/apache/spark/pull/8427#issuecomment-134743609
@markhamstra yup, no question this will increase memory usage. The
question is whether we should consider it anyway. Maybe you were implicitly
answering "no", but I'm going to make my case again anyway :)
Clearly, if you have long-running jobs with lots of stages and you never do
anything to clean them up, then `stageIdToStage` is going to eat up all your
memory. But that will happen anyway: you'll already run out of memory because
of `MapOutputTracker` storing shuffle output (and most likely the huge number
of RDDs you've created that can't be gc'ed either). We'd add a few more hashmap
entries and more `Stage` objects, which shouldn't contain anything huge -- no
bigger than what we are already tracking. Certainly it'll have an effect,
though.
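To make the "a few more hashmap entries and `Stage` objects" claim a bit more concrete, here's a back-of-envelope sketch of the extra retained memory. The per-entry and per-`Stage` byte counts below are illustrative assumptions (typical ballpark figures for a 64-bit JVM), not measurements of Spark's actual objects:

```java
// Rough estimate of extra retained memory from keeping completed entries
// in stageIdToStage. All byte figures are assumptions for illustration.
public class StageMemoryEstimate {
    // Approximate overhead of one HashMap entry on a 64-bit JVM (assumed).
    static final long BYTES_PER_MAP_ENTRY = 48;
    // Rough guess at the retained size of a small Stage object (assumed).
    static final long BYTES_PER_STAGE = 500;

    static long extraBytes(long retainedStages) {
        return retainedStages * (BYTES_PER_MAP_ENTRY + BYTES_PER_STAGE);
    }

    public static void main(String[] args) {
        // e.g. a long-running app that has accumulated 100,000 stages:
        double mb = extraBytes(100_000) / (1024.0 * 1024.0);
        System.out.printf("~%.1f MB extra%n", mb);
    }
}
```

Under those assumptions, even 100,000 retained stages cost on the order of tens of megabytes -- real numbers would need measuring with a tool like jol or a heap dump.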
I think it's a pretty big usability improvement, so worth considering, but
that is totally subjective. I realize this is a bit hand-wavy for now -- I'll
try to quantify the memory usage effect so we can make a more informed decision
(if others are still somewhat interested).