Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9610#discussion_r44678796
  
    --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ---
    @@ -93,6 +95,10 @@ private[spark] class IndexShuffleBlockResolver(conf: SparkConf) extends ShuffleB
         } {
           out.close()
         }
    +    indexFile.deleteOnExit()
    +    if (!tmp.renameTo(indexFile)) {
    +    throw new IOException(s"fail to rename index file $tmp to $indexFile")
    --- End diff ---
    
    Can you test for this? I think the worry was about different TaskSets 
attempting the same map stage. Imagine that attempt 1 of the stage successfully 
completes a task, and sends back a map output status, but that status gets 
ignored because that stage attempt got cancelled. Attempt 2 might then fail to 
send a new status for it.
    
    There seem to be two ways to fix it, if this problem can actually occur --
either add MapOutputStatuses even from failed task sets, or mark the new task
as successful if its output file already exists (see the sketch below).
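
    For concreteness, here is a minimal sketch of the second option. This is
not the actual change in this PR; the commitIndexFile helper and its shape are
assumed purely for illustration, and it only shows the idea of reusing an
index file left behind by another attempt of the same map task:

        import java.io.{File, IOException}

        object IndexCommitSketch {
          // Move `tmp` into place at `indexFile`. If another attempt of the same
          // map task has already committed an index file, keep the existing one
          // and discard `tmp`, so this attempt can still be treated as successful.
          def commitIndexFile(tmp: File, indexFile: File): Unit = {
            if (indexFile.exists()) {
              // Another attempt already produced the index file; reuse it.
              tmp.delete()
            } else if (!tmp.renameTo(indexFile)) {
              // The rename can also lose a race with a concurrent attempt, so
              // re-check for the target before failing the task.
              if (indexFile.exists()) {
                tmp.delete()
              } else {
                throw new IOException(s"failed to rename index file $tmp to $indexFile")
              }
            }
          }
        }

    A real fix would also need to verify that the existing index file is
consistent with the shuffle data file before reusing it; that check is omitted
here for brevity.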

