Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    > I guess on the RDD side its not called RoundRobinPartitioner 
    
    Thanks for clarifying @tgravescs! I was looking at `RangePartitioner` and 
variants and was wondering what I was missing - did not make the obvious 
connection with SQL :-)
    
    > If we can't come up with another solution, I would actually be ok with 
failing short term, it's better than corruption
    
    If I understand correctly, the proposal is:
    * In `ShuffledRDD`, add a flag `orderSensitiveReducer` (?) to track the 
specific patterns we have identified as order sensitive (repartition with 
shuffle = true, zip, etc).
    * If a task is re-executed as part of stage re-execution and the flag is 
true, fail the job.
      * Task re-execution as part of the same stage, speculative execution, etc 
should not be an issue, since only one task completes.
      * ResultStage should not be affected.
      * I am unsure how caching data interacts here; this might need some 
investigation.
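    
    To make the rule above concrete, here is a toy model in plain Python. It 
is only a sketch of my reading of the proposal, not Spark code: 
`ToyShuffledRDD`, `order_sensitive_reducer`, `check_task_launch` and the 
`stage_attempt` counter are all made-up names, and treating ResultStage as 
simply skipping the check is an assumption.

```python
from dataclasses import dataclass


class OrderSensitiveRecomputeError(RuntimeError):
    """Raised when an order sensitive shuffle would be recomputed."""


@dataclass
class ToyShuffledRDD:
    name: str
    # Hypothetical flag from the proposal; not an existing Spark field.
    order_sensitive_reducer: bool = False


def check_task_launch(rdd, stage_attempt, is_result_stage=False):
    """Apply the proposed rule to a task that is about to be launched.

    stage_attempt is 0 for the first run of a stage. Task retries and
    speculative copies inside one attempt keep the same stage_attempt.
    """
    if is_result_stage:
        # Assumption: result stages skip the check entirely.
        return True
    if stage_attempt > 0 and rdd.order_sensitive_reducer:
        raise OrderSensitiveRecomputeError(
            f"{rdd.name}: stage re-executed with order sensitive input"
        )
    return True


# Flag set for a pattern identified as order sensitive.
repartitioned = ToyShuffledRDD("repartition(shuffle=true)", order_sensitive_reducer=True)
# Flag left at its default (False) for this toy example.
keyed = ToyShuffledRDD("reduceByKey")

check_task_launch(repartitioned, stage_attempt=0)  # first attempt: allowed
check_task_launch(keyed, stage_attempt=1)          # flag is false: allowed

try:
    check_task_launch(repartitioned, stage_attempt=1)
    failed = False
except OrderSensitiveRecomputeError:
    failed = True  # stage re-execution with the flag set fails the job
```

    The point of keying on the stage attempt rather than the task attempt is 
that retries and speculative copies within one attempt never trip the check.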
    
    This looks like a reasonable stop gap until we fix the issue.
    
    It also lets users unblock themselves by inserting a checkpoint before 
the order sensitive closure.
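    
    A small plain-Python illustration of why pinning the input helps. This is 
a simplified model, not Spark's implementation: `round_robin` is a 
hypothetical stand-in for round-robin distribution, and re-reading a fixed 
list stands in for reading checkpointed data.

```python
def round_robin(items, num_partitions, start=0):
    """Assign items to partitions by position, so the result depends on order."""
    parts = [[] for _ in range(num_partitions)]
    for offset, item in enumerate(items):
        parts[(start + offset) % num_partitions].append(item)
    return parts


# The same records, produced in a different order when the task is recomputed.
first_attempt = round_robin(["a", "b", "c", "d"], 2)
retry_attempt = round_robin(["b", "a", "c", "d"], 2)

# Reducer 0 read its partition from the first attempt, reducer 1 reads from
# the retry: "a" shows up twice and "b" is lost.
mixed = sorted(first_attempt[0] + retry_attempt[1])

# With the input order pinned (as reading back a checkpoint would do), every
# re-read yields the same assignment, so mixing attempts is harmless.
pinned = ["b", "a", "c", "d"]
read_one = round_robin(pinned, 2)
read_two = round_robin(pinned, 2)
stable = sorted(read_one[0] + read_two[1])
```

    Here `mixed` ends up as `['a', 'a', 'c', 'd']` while `stable` contains 
each record exactly once.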
    
    


