Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3009#issuecomment-63690828
> There are some DAGScheduler tests that verify that all of the data
structures are empty after jobs run [...] Can we just do something similar for
the UI?
Sure, though there are a few subtleties here that make this trickier. In
JobProgressListener, we need to check that our data structures do not grow
without bound (not that they're empty), so I think the assertions would look
something like "once I've run at least `spark.ui.retainedStages` stages and
`spark.ui.retainedJobs` jobs, then the size of JobProgressListener data
structures that track non-active jobs / stages will not grow" and "once all
active jobs / stages complete, then the data structures associated with active
jobs / stages will be empty." I guess we also need a third constraint that's
something like "any job / stage that's active will eventually become
non-active."
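To make those invariants concrete, here is a minimal sketch in plain Scala of the kind of assertions I have in mind. The `BoundedStageTracker` class and its method names are illustrative stand-ins, not Spark's actual listener API; the point is just the two checks at the bottom: the non-active collections stop growing once `spark.ui.retainedStages` is exceeded, and the active collections drain to empty.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a UI listener's trimming behavior.
// Completed stages go into a buffer that is trimmed so it never exceeds
// the retainedStages limit; active stages are removed on completion.
class BoundedStageTracker(retainedStages: Int) {
  val activeStages = mutable.Set[Int]()
  val completedStages = mutable.Buffer[Int]()

  def onStageSubmitted(stageId: Int): Unit = activeStages += stageId

  def onStageCompleted(stageId: Int): Unit = {
    activeStages -= stageId
    completedStages += stageId
    // Drop the oldest entries once we exceed the retention limit.
    if (completedStages.size > retainedStages) {
      completedStages.remove(0, completedStages.size - retainedStages)
    }
  }
}

object BoundedStageTrackerDemo {
  def main(args: Array[String]): Unit = {
    val tracker = new BoundedStageTracker(retainedStages = 10)
    for (id <- 1 to 100) {
      tracker.onStageSubmitted(id)
      tracker.onStageCompleted(id)
    }
    // Invariant 1: after running more than retainedStages stages,
    // the non-active data structure's size stays at the limit.
    assert(tracker.completedStages.size == 10)
    // Invariant 2: once all active stages complete, the active
    // data structure is empty.
    assert(tracker.activeStages.isEmpty)
    println("invariants hold")
  }
}
```

A real test against JobProgressListener would feed it synthetic listener events instead, but the shape of the assertions would be the same.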
I've opened a separate JIRA to fix the existing memory leak so that we can
review / merge that fix separately (since this PR is already getting pretty
big).
> Should I review this again yet, or should I wait for you to fix the
phantom stages thing?
I'd hold off on further review until this afternoon, when I'll have finished
the phantom stages fix. Once that and the memory leak are fixed, I think this
will be good to go.