Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/5550#issuecomment-98805634
Sorry for letting this sit in my review queue for so long.
The original idea behind skipped stages was to indicate that there were
pending stages that _might_ have needed to be computed if previous stages'
outputs were lost, but which weren't actually executed as part of this job.
Intuitively, I think it makes sense for a completed job's progress bar to
be at 100%. It looks like this patch is trying to address some corner cases
where we report a numeric progress (_n/m_) that's greater than 100%, which
seems to happen when we either have pending stages that are actually executed
instead of being skipped or when stages are recomputed due to failures. I
guess there are a couple of options here for fixing the numeric progress
indicator: we can either decrease the numerator or increase the denominator.
Intuitively, it probably makes sense for the number of stages listed in the
progress bar to match the number of stages actually completed on the job
details page, so I guess that means increasing the denominator.
The fact that we display _(j failed)_ already conveys that extra work was
performed, so I don't think we need to allow > 100% progress indicators to
convey that the job may have done extra work due to failures.
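
To make the two options concrete, here is a minimal sketch in Python. It is
not Spark's actual UI code: the function name `progress_label` and its
parameters are hypothetical and chosen only for illustration. It shows how
an _n/m_ label can be kept at or below 100% either by clamping the numerator
or by growing the denominator to match the stages actually completed.

```python
def progress_label(completed, total, failed=0, clamp_numerator=False):
    """Build an n/m progress label that never exceeds 100%.

    Hypothetical helper for illustration; names are assumptions,
    not part of Spark.
    """
    if clamp_numerator:
        # Option 1: decrease the numerator so it never exceeds the
        # originally planned number of stages.
        shown_completed = min(completed, total)
        shown_total = total
    else:
        # Option 2: increase the denominator so it matches the number
        # of stages that actually completed (e.g. after recomputation).
        shown_completed = completed
        shown_total = max(total, completed)

    label = f"{shown_completed}/{shown_total}"
    if failed:
        # Extra work caused by failures is conveyed separately.
        label += f" ({failed} failed)"
    return label


print(progress_label(5, 4))                        # denominator grows
print(progress_label(5, 4, clamp_numerator=True))  # numerator clamped
print(progress_label(5, 4, failed=1))
```

With the second option the label stays consistent with a count of completed
stages shown elsewhere, while the failure suffix still signals that extra
work was done.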