GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5550#issuecomment-98805634
  
    Sorry for letting this sit in my review queue for so long.
    
    The original idea behind skipped stages was to indicate that there were 
pending stages that _might_ have needed to be computed if previous stages' 
outputs had been lost, but which weren't actually executed as part of this job.
    
    Intuitively, I think it makes sense for a completed job's progress bar to 
be at 100%.  It looks like this patch is trying to address some corner cases 
where the numeric progress we report (_n/m_) exceeds 100%, i.e. _n_ > _m_.  
That seems to happen either when pending stages are actually executed instead 
of being skipped, or when stages are recomputed due to failures.  There are 
two options for fixing the numeric progress indicator: we can either decrease 
the numerator or increase the denominator.  It probably makes sense for the 
number of stages listed in the progress bar to match the number of stages 
actually completed on the job details page, which means increasing the 
denominator.
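    
    To make the "increase the denominator" option concrete, here is a rough 
sketch in Python. It is not the Spark UI code and not what this patch does; 
`progress_label` and `progress_percent` and their parameters are hypothetical 
names, and it assumes the UI already knows the completed, expected, and failed 
counts.

```python
def progress_label(completed: int, total: int, failed: int = 0) -> str:
    """Build an n/m label that never shows n > m (hypothetical helper)."""
    # Assumption: when more stages completed than were originally expected
    # (e.g. recomputation after a failure), grow the denominator to match.
    denominator = max(total, completed)
    label = f"{completed}/{denominator}"
    if failed > 0:
        # Extra work due to failures is conveyed here, not by exceeding 100%.
        label += f" ({failed} failed)"
    return label


def progress_percent(completed: int, total: int) -> float:
    """Percentage for the bar, bounded at 100 by the same adjustment."""
    denominator = max(total, completed)
    if denominator == 0:
        return 0.0
    return 100.0 * completed / denominator
```

    With these definitions, a job that expected 4 stages but completed 5 would 
be labelled `5/5` with a full bar, rather than `5/4`.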
    
    The fact that we display _(j failed)_ already conveys that extra work was 
performed, so I don't think we need to allow > 100% progress indicators to 
convey that the job may have done extra work due to failures.

