GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3009#issuecomment-63690828
  
    > There are some DAGScheduler tests that verify that all of the data 
structures are empty after jobs run [...] Can we just do something similar for 
the UI?
    
    Sure, though there are a few subtleties here that make this trickier.  In 
JobProgressListener, we need to check that our data structures do not grow 
without bound (not that they're empty), so I think the assertions would look 
something like "once I've run at least `spark.ui.retainedStages` stages and 
`spark.ui.retainedJobs` jobs, the size of the JobProgressListener data 
structures that track non-active jobs / stages will not grow" and "once all 
active jobs / stages complete, the data structures associated with active 
jobs / stages will be empty."  I guess we also need a third constraint, 
something like "any job / stage that's active will eventually become 
non-active."
    
    I've opened a separate JIRA to fix the existing memory leak so that we can 
review / merge that fix separately (since this PR is already getting pretty 
big).
    
    > Should I review this again yet, or should I wait for you to fix the 
phantom stages thing?
    
    I'd hold off on further review until this afternoon, when I expect to 
have the phantom stages issue fixed.  Once I fix that and the memory leak, 
I think this will be good to go.

