Thomas Graves created SPARK-21695:
-------------------------------------

             Summary: Spark scheduler locality algorithm can take longer then expected
                 Key: SPARK-21695
                 URL: https://issues.apache.org/jira/browse/SPARK-21695
             Project: Spark
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 2.1.0
            Reporter: Thomas Graves

Reference JIRA: https://issues.apache.org/jira/browse/SPARK-21656

I'm seeing an issue with some jobs where the scheduler takes a long time to schedule tasks on executors. The default locality wait is 3 seconds, so I was expecting an executor to get some task within at most 9 seconds (node local, rack local, any), but it's taking far longer than that. In the case of SPARK-21656 it takes 60+ seconds and the executors hit their idle timeout. We should investigate why and see if we can fix this.

On an initial look, it seems the scheduler resets the locality lastLaunchTime whenever it places any task on a node at that locality level. It appears this means it can take much longer than 3 seconds for any particular task to fall back, but this needs to be verified.
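
To make the suspected behavior concrete, here is a small toy model in Python. It is a sketch under stated assumptions, not Spark's scheduler code: the class `LocalityTimer`, the function `time_until_any`, and the four-entry level list (with `PROCESS_LOCAL` added so that an uninterrupted fallback takes 3 waits, i.e. 9 seconds) are all hypothetical names chosen for illustration. The model only encodes the hypothesis above: every launch resets the last-launch time, so an executor waiting for the ANY level can wait much longer than 9 seconds while local tasks keep launching elsewhere.

```python
# Toy model only: names and structure are assumptions, not Spark internals.
LEVELS = ["PROCESS_LOCAL", "NODE_LOCAL", "RACK_LOCAL", "ANY"]
LOCALITY_WAIT_MS = 3000  # the 3 second default locality wait from the report


class LocalityTimer:
    """Tracks which locality level a task set is currently allowed to use."""

    def __init__(self, start_ms=0):
        self.level_index = 0
        self.last_launch_ms = start_ms

    def allowed_level(self, now_ms):
        # Relax by one level for every full wait that passed without a launch.
        while (self.level_index < len(LEVELS) - 1
               and now_ms - self.last_launch_ms >= LOCALITY_WAIT_MS):
            self.last_launch_ms += LOCALITY_WAIT_MS
            self.level_index += 1
        return LEVELS[self.level_index]

    def record_launch(self, now_ms, level):
        # The suspected cause: any launch resets both the level and the timer.
        self.level_index = LEVELS.index(level)
        self.last_launch_ms = now_ms


def time_until_any(launch_interval_ms, num_launches, step_ms=500):
    """Return the first time (ms) at which the ANY level becomes allowed.

    A task is launched at the strictest level every `launch_interval_ms`,
    `num_launches` times in total; offers are checked every `step_ms`.
    """
    timer = LocalityTimer()
    launches = {launch_interval_ms * (i + 1) for i in range(num_launches)}
    now_ms = 0
    while True:
        if now_ms in launches:
            timer.record_launch(now_ms, LEVELS[0])
        elif timer.allowed_level(now_ms) == "ANY":
            return now_ms
        now_ms += step_ms


if __name__ == "__main__":
    print(time_until_any(2000, 0))   # no competing launches: 9000
    print(time_until_any(2000, 30))  # a launch every 2 s, 30 times: 69000
```

In this model, 30 launches spaced 2 seconds apart keep the timer from ever reaching 3 seconds, so the fallback to ANY only begins after the last launch at 60 seconds and completes at 69 seconds. Whether the real scheduler behaves this way is exactly what still needs to be verified.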