squito commented on issue #26614: [SPARK-29976][CORE] New conf for single task stage speculation
URL: https://github.com/apache/spark/pull/26614#issuecomment-558746166

Sure, I think I'm OK with that; it's a decent compromise. You wouldn't launch speculative tasks if you've got multiple executors on a bad node, but that's OK (IIRC we also won't make a dynamic allocation request to get an executor on a new node, which would be needed to really handle that case).

A couple of nitpicky points:

* When you say the number of tasks <= the number of slots on one executor -- is that the total number of tasks in the taskset, or the delta `minFinishedForSpeculation - numSuccessfulTasks`? The reason to prefer the delta: say you've got 10 tasks in the taskset, but the last 4 are all running on the bad executor. The taskset as a whole is too big to meet that condition, but with `minFinishedForSpeculation = 7` and `numSuccessfulTasks = 6` you'd meet the delta (see the sketch after this list).
* Doesn't it still need another config to decide what the timeout is in this case?
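To make the delta point concrete, here is a minimal Scala sketch of the condition being discussed. It is not the actual `TaskSetManager` code from the PR; the names `slotsPerExecutor`, `runningTaskDurationsMs`, and `singleTaskDurationThresholdMs` are illustrative assumptions, and the timeout check stands in for whatever extra config the second bullet asks about.

```scala
// Sketch of the delta-based speculation check (illustrative, not Spark's code).
def shouldSpeculateRemaining(
    minFinishedForSpeculation: Int,        // e.g. ceil(numTasks * speculation quantile)
    numSuccessfulTasks: Int,
    slotsPerExecutor: Int,
    runningTaskDurationsMs: Seq[Long],     // durations of currently running tasks
    singleTaskDurationThresholdMs: Long    // hypothetical timeout config
): Boolean = {
  // Compare the *delta* to the slots of one executor, not the whole taskset:
  // with 10 tasks, minFinishedForSpeculation = 7, numSuccessfulTasks = 6,
  // the delta is 1 even though the taskset itself is too big to qualify.
  val remainingForQuantile = minFinishedForSpeculation - numSuccessfulTasks
  remainingForQuantile <= slotsPerExecutor &&
    // A separate timeout is still needed to decide when a straggler is "slow",
    // since there are too few finished tasks to derive a median duration.
    runningTaskDurationsMs.exists(_ > singleTaskDurationThresholdMs)
}
```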
