squito commented on issue #24892: [SPARK-25341][Core] Support rolling back a 
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-520482491
 
 
   Sorry I don't understand -- why don't you want to support speculative 
execution?
   
   Correctness before performance, yes.
   
   On the assumptions about task independence -- yes, that *was* the 
assumption, before this whole thread of issues related to non-deterministic 
tasks and stage retry.
   
   Fetch failures are more likely on large clusters & large workloads, 
precisely where speculative execution is important too.  If we couldn't get it 
to work together, then I would totally agree we should go for correctness.  But 
I think using the global TID would give us the behavior we want.
   
   I also think the global TID is simpler.  For one, debugging is simpler -- 
the shuffle id is actually the shuffle id; there isn't some other state that is 
tracked separately to know which block to get.
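
   To make that concrete, here is a minimal sketch in Python rather than 
Spark's Scala. Every name in it is hypothetical: `ShuffleBlockId` is a toy 
model, not Spark's class, and the `map_tid` field and the name format are 
assumptions for illustration only. The point is that if the block id carries 
the global TID of the map task that wrote it, two attempts at the same map 
partition (a speculative copy, or a rerun after stage retry) get distinct ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShuffleBlockId:
    """Toy model of a shuffle block id (hypothetical, not Spark's class)."""
    shuffle_id: int
    map_id: int      # map partition index
    reduce_id: int
    map_tid: int     # assumption: global TID of the map task that wrote the block

    def name(self) -> str:
        # Assumed name format: the usual three ids plus the writer's TID.
        return f"shuffle_{self.shuffle_id}_{self.map_id}_{self.reduce_id}_{self.map_tid}"


# Two attempts at the same map partition, e.g. an original task and a
# speculative copy, each with its own global TID.
original = ShuffleBlockId(shuffle_id=0, map_id=3, reduce_id=7, map_tid=42)
speculative = ShuffleBlockId(shuffle_id=0, map_id=3, reduce_id=7, map_tid=57)

# The ids differ, so neither attempt can be mistaken for the other, and the
# name alone says which task produced the block.
print(original.name())
print(speculative.name())
```

   With this shape a reader needs no separately tracked state to decide 
which block to fetch; the id itself is enough.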
   
   If you're really just concerned about the overhead of an additional field in 
the shuffle block, I think you could even swap out the original map partition 
id for the TID of the map task (though that would be more complex in other 
ways: the `MapStatus` would need to track the TID, since its position in the 
array would no longer be sufficient).
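
   A rough sketch of that variant, again in Python with hypothetical names 
(`MapStatus` and `ToyMapOutputTracker` below are simplified stand-ins, not 
Spark's classes, and the block name format is an assumption). The block name 
uses the map task's TID in place of the map partition id, so each status has 
to remember the TID of the task that produced it:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MapStatus:
    """Simplified stand-in: where the map output lives and which task wrote it."""
    location: str
    map_tid: int  # assumption: TID recorded because array position no longer names the block


class ToyMapOutputTracker:
    """Hypothetical tracker holding one status per map partition."""

    def __init__(self, num_map_partitions: int) -> None:
        self.statuses: List[Optional[MapStatus]] = [None] * num_map_partitions

    def register(self, map_partition: int, status: MapStatus) -> None:
        # A later attempt (for example after a stage retry) replaces the earlier one.
        self.statuses[map_partition] = status

    def block_name(self, shuffle_id: int, map_partition: int, reduce_id: int) -> str:
        status = self.statuses[map_partition]
        if status is None:
            raise KeyError(f"no map output registered for partition {map_partition}")
        # The TID takes the slot that the map partition id would normally occupy.
        return f"shuffle_{shuffle_id}_{status.map_tid}_{reduce_id}"


tracker = ToyMapOutputTracker(num_map_partitions=4)
tracker.register(3, MapStatus(location="host-a", map_tid=42))
first = tracker.block_name(shuffle_id=0, map_partition=3, reduce_id=7)

# After a rerun, the same partition is re-registered with a new TID.
tracker.register(3, MapStatus(location="host-b", map_tid=57))
second = tracker.block_name(shuffle_id=0, map_partition=3, reduce_id=7)
print(first, second)
```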
