Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/4029#issuecomment-69976817
  
    This is a nice patch, but I wonder whether there's a smaller fix that 
doesn't require changing SparkListener events; that would make it easier to 
backport the patch to `branch-1.2`.  The job page already knows the last stage 
in the job (the result stage), so I think we might be able to use the final 
stage's completion time as the job completion time and the first stage's 
submission time as the job start time.  However, there are a couple of 
corner-cases that this might miss: I could submit a job that spends a bunch of 
time queued behind other jobs before its first stage starts running, in which 
case it would be helpful to be able to distinguish between scheduler delays and 
stage durations.  Similarly, there might be a corner-case related to the job 
completion time if we have a job that spends a lot of time fetching results 
back to the driver after they've been stored in the block manager by completed 
tasks.
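    As a rough sketch of that first/last-stage approximation (in Scala, using a 
made-up `StageTimes` type for illustration rather than Spark's actual 
`StageInfo` class), the idea would be something like:

```scala
// Illustrative only: StageTimes and its field names are hypothetical,
// not Spark's real StageInfo API.
case class StageTimes(submissionTime: Long, completionTime: Long)

// Approximate a job's duration as the span from the first stage's
// submission time to the last stage's completion time.
def approximateJobDuration(stages: Seq[StageTimes]): Option[Long] =
  for {
    first <- stages.headOption
    last  <- stages.lastOption
  } yield last.completionTime - first.submissionTime

// Three stages running back-to-back, in arbitrary time units:
val stages = Seq(
  StageTimes(100L, 250L),
  StageTimes(250L, 400L),
  StageTimes(400L, 900L))
// approximateJobDuration(stages) == Some(800L)
```

    Note that, per the corner cases above, any scheduler delay before the first 
stage's submission and any result-fetching time after the last stage's 
completion are invisible to this approximation.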
    
    So, the approach here seems like the right fix.  I'd guess we might 
be able to do a separate fix in branch-1.2 that uses the first/last stage time 
approximations.
    
    I have a couple of comments on the code here, so I'll comment on those 
inline.


