Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3009#issuecomment-63588578
  
    I just pushed fixes for several UI issues, including a few sorting problems.  The biggest change is the addition of a "pending" state on the job details page (@pwendell, the implementation here is much simpler than the JobProgressListener hacking that I mentioned earlier; this shouldn't have any GC issues).
    
    @kayousterhout:
    
    > The good news is that the JobProgressListener already tracks the stages 
that have already completed, so when we get the JobStartEvent, we can subtract 
the stage IDs that are already done. 
    
    I don't think that this will work, since it seems that the skipped stages are assigned new stage IDs.  For instance, try:
    
    ```
    val rdd = sc.parallelize(Seq(1, 2, 3))
      .map(identity)
      .groupBy(identity)
      .map(identity)
      .groupBy(identity)
      .map(identity)
    rdd.count()
    rdd.count()
    ```
    
    In this case, both jobs will be submitted with three stage IDs, but none of those stage IDs will be the same.  You can see this on the "all stages" page: there are many gaps in the stage number sequence because the extra stage IDs are assigned to stages that are never run.
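
    If it helps to confirm this, here's a minimal sketch (the `StageIdLogger` name is hypothetical; it assumes a spark-shell session where `sc` is in scope) of a `SparkListener` that prints the stage IDs carried by each `SparkListenerJobStart`.  Registering it before running the snippet above should print two disjoint sets of IDs:

    ```
    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}

    // Hypothetical listener: logs the stage IDs submitted with each job so the
    // two count() jobs can be compared side-by-side.
    class StageIdLogger extends SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
        println(s"Job ${jobStart.jobId} submitted with stage IDs: " +
          jobStart.stageIds.mkString(", "))
      }
    }

    sc.addSparkListener(new StageIdLogger)
    ```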
    
    Given this, is there an easy way to figure out which stages will be run?

