Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/3009#issuecomment-63426995
  
    One super nit: the stage progress shows "X / Y" (with spaces) whereas the 
task one shows "X/Y" (no spaces). Can you make these consistent?
    
    In terms of the bigger issue, I agree with the sentiments expressed by both 
@markhamstra and @pwendell.
    
    Now, in terms of showing some estimate of total tasks, I looked into this a 
bit.  The bad news is that when a job starts, the DAGScheduler does in fact 
report all of the job's stages, even those that may have already completed, so 
the example I suggested above resulted in a job under "Completed jobs" that 
says "1/2" stages are complete (but "2/2" tasks).  
The good news is that the JobProgressListener already tracks the stages that 
have completed, so when we get the JobStartEvent, we can subtract the stage IDs 
that are already done.  This is imperfect for many of the reasons you mentioned 
above (e.g., a job could hang with "0" stages running if a fetch failure 
happens for a stage that we thought was complete when the job was submitted), 
but I think it is the most intuitive behavior in the general case.
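
    As a rough illustration of the subtraction described above, here is a 
minimal sketch. It is a language-neutral model written in Python, not Spark 
code: the function names and the `tasks_per_stage` mapping are hypothetical, and 
it assumes the listener already holds the set of completed stage IDs.

```python
# Hypothetical model of the listener bookkeeping; none of these names are Spark APIs.

def pending_stage_ids(job_stage_ids, completed_stage_ids):
    """Return the stages a newly started job still has to run."""
    return set(job_stage_ids) - set(completed_stage_ids)


def estimate_total_tasks(job_stage_ids, completed_stage_ids, tasks_per_stage):
    """Sum task counts over the stages that are not already complete.

    tasks_per_stage is an assumed mapping of stage ID -> number of tasks.
    """
    pending = pending_stage_ids(job_stage_ids, completed_stage_ids)
    return sum(tasks_per_stage[stage_id] for stage_id in pending)
```

    For a job that lists stages 0 and 1 where stage 0 was already computed, 
this leaves a single pending stage, so the finished job would presumably show 
"1/1" stages rather than "1/2". The estimate is only as good as the completed 
set at job start, which is the fetch-failure caveat mentioned above.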
    
    This means you will need to change the SparkListenerJobStart event to 
expose the number of tasks for each stage.  Another option is to pass the 
StageInfos themselves, based on what @markhamstra said about wanting to expose 
as much info as possible for others to use, although there are possibly some 
race conditions here if a StageInfo is modified by the DAGScheduler 
concurrently with a listener using it.  Either change looks relatively 
straightforward.
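
    One way to sidestep the possible race is to copy only the fields a 
listener needs into an immutable snapshot at the moment the event is created. 
The sketch below models that idea in Python under stated assumptions: 
`StageSnapshot`, `JobStartSnapshot`, and `make_job_start` are hypothetical 
names, and the mutable dicts stand in for live scheduler state. It is not how 
Spark defines its events.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageSnapshot:
    """Immutable copy of the per-stage fields a listener needs."""
    stage_id: int
    num_tasks: int


@dataclass(frozen=True)
class JobStartSnapshot:
    """Hypothetical job-start event carrying per-stage task counts."""
    job_id: int
    stages: Tuple[StageSnapshot, ...]


def make_job_start(job_id, live_stage_infos):
    """Copy values out of mutable scheduler state when the event is built.

    live_stage_infos is an assumed list of dicts with "stage_id" and
    "num_tasks" keys, standing in for objects the scheduler keeps mutating.
    """
    stages = tuple(
        StageSnapshot(info["stage_id"], info["num_tasks"])
        for info in live_stage_infos
    )
    return JobStartSnapshot(job_id, stages)
```

    Because the values are copied at construction, later changes to the 
scheduler's own records do not affect what a listener reads from the event. 
The trade-off is that listeners see less information than they would with the 
full StageInfo.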
    
    Thanks for the detailed writeup, Josh!

