squito commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-519191919
 
 
   > BTW, another way to fix this problem is: always include the task id (not 
task attempt id) in the shuffle block id. This works, with a larger overhead:
   
   yes, actually this is exactly what I meant.  I think we're talking about the 
same id -- unfortunately the naming here within spark isn't great.  This is 
`TaskContext.taskAttemptId`: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TaskContext.scala#L168-L172
 aka the TID, which is also the id the new shuffle api uses.
   (You are probably thinking of `TaskContext.attemptNumber()`: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/TaskContext.scala#L162-L166
 which I know is sometimes referred to as the task attempt as well -- it's very 
confusing.)
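   To make the naming concrete, here's a minimal sketch (illustration only; the 
helper name is made up) showing the two ids side by side from inside a running 
task:

   ```scala
   import org.apache.spark.TaskContext

   // taskAttemptId (the TID) is unique across every task attempt in the whole
   // SparkContext; attemptNumber only counts retries of this one task, from 0.
   def describeAttempt(): String = {
     val ctx = TaskContext.get()
     s"partition=${ctx.partitionId()} " +
       s"TID=${ctx.taskAttemptId()} " +        // globally unique per attempt
       s"attemptNumber=${ctx.attemptNumber()}" // 0 first try, 1 first retry, ...
   }
   ```

   e.g. `sc.parallelize(1 to 4, 2).map(_ => describeAttempt()).collect()` gives 
every attempt a fresh TID, while attemptNumber stays 0 unless a task is retried.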
   
   As we both said, using that has a larger overhead, since you've got to put it 
in every shuffle block.  But I think it might be necessary for properly rolling 
back w/ speculative tasks, and I think it's also necessary with a centralized 
shuffle store (@yifeih do you remember the details here better?  I will need to 
look through the past discussions ...), where you may have multiple attempts of 
the same logical task -- perhaps from task retries or speculative attempts, 
perhaps from stage retries -- and you need to know unambiguously which one to 
use downstream.  In the scheme of things, that extra overhead seems pretty 
minimal.
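   For illustration, here's a rough sketch of what keying shuffle blocks by the 
TID might look like (hypothetical class, not Spark's actual `ShuffleBlockId`):

   ```scala
   // Hypothetical sketch, not the real ShuffleBlockId: the map side of the
   // name is the globally-unique TID rather than the map partition index, so
   // two attempts of the same map partition never produce colliding blocks.
   case class ShuffleBlockIdWithTid(
       shuffleId: Int,
       mapTaskAttemptId: Long, // TaskContext.taskAttemptId, aka the TID
       reduceId: Int) {
     def name: String = s"shuffle_${shuffleId}_${mapTaskAttemptId}_$reduceId"
   }
   ```

   The per-block cost is one extra long, and the driver would presumably also 
record which TID's output is the live one for each map partition, so reducers 
know exactly which attempt to read.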
