squito commented on issue #26614: [SPARK-29976] New conf for single task stage speculation
URL: https://github.com/apache/spark/pull/26614#issuecomment-558271210
 
 
   yeah, I am also torn like @tgravescs. There are a ton of corner cases. I 
really don't like special-casing one task, but I'm not sure of a clean way to 
configure this. Even with 10 tasks, if you've got 4 cores per executor you can 
easily have 4 tasks stuck on your one bad executor, and with the default 
speculation quantile of 0.75 you would never finish the 8 tasks needed to start 
speculation. If you add in the fact that the poor performance may be across an 
entire node, and 64 cores per node is not uncommon, the limit goes way higher.
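
   To make that arithmetic concrete, here is a rough sketch of the quantile 
gate (illustrative only, not the exact TaskSetManager code, which may round 
differently):

   ```scala
   // Speculation only starts once quantile * numTasks tasks have succeeded.
   // With numTasks = 10 and quantile = 0.75 that is 8 tasks (rounding up),
   // so 4 tasks stuck on one bad executor (at most 6 successes) means the
   // threshold is never reached.
   def speculationCanStart(numTasks: Int, tasksSuccessful: Int, quantile: Double = 0.75): Boolean = {
     val minFinishedForSpeculation = math.ceil(quantile * numTasks).toInt
     tasksSuccessful >= minFinishedForSpeculation
   }

   speculationCanStart(numTasks = 10, tasksSuccessful = 6) // false: 6 < 8
   ```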
   
   Speculative execution is always a heuristic; we know it's not going to be 
perfect. I feel like when you enable speculation, you are saying you're 
willing to accept some wasted resources, so it's more acceptable to run some 
speculative tasks when you don't really need to. But how much waste is OK? In 
Tom's example, say you had 10k tasks that each took an hour, but all are 
actually running fine -- the waste is pretty serious: you'll launch a 
speculative version of each task, so that's 10k cpu-hours wasted.
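
   For reference, these are the knobs in play when you opt in; the quantile 
and multiplier values shown are the stock defaults:

   ```scala
   import org.apache.spark.SparkConf

   // Enabling speculation means accepting some duplicated work; these confs
   // control how eagerly speculative copies are launched.
   val conf = new SparkConf()
     .set("spark.speculation", "true")
     .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish first
     .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
   ```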
   
   One alternative might be to only have this kick in when all tasks are 
running on the same host (the TSM already knows the hosts of the running 
tasks, it's in `TaskInfo`, so it would be easy to see whether there is just 
one host used across all tasks).
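
   A rough sketch of that check, assuming the TSM hands us the `TaskInfo`s of 
the currently running tasks (`runningTaskInfos` below is an illustrative 
stand-in, not an actual TaskSetManager field):

   ```scala
   import org.apache.spark.scheduler.TaskInfo

   // Sketch: only let the single-task-stage speculation kick in when every
   // running task sits on the same host. `runningTaskInfos` stands in for
   // however the TaskSetManager exposes its running tasks' TaskInfo objects.
   def allOnOneHost(runningTaskInfos: Iterable[TaskInfo]): Boolean =
     runningTaskInfos.map(_.host).toSet.size == 1
   ```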
