Parth Gandhi created SPARK-26755:
------------------------------------

             Summary: Optimize Spark Scheduler to dequeue speculative tasks more efficiently
                 Key: SPARK-26755
                 URL: https://issues.apache.org/jira/browse/SPARK-26755
             Project: Spark
          Issue Type: Improvement
          Components: Scheduler
    Affects Versions: 3.0.0
            Reporter: Parth Gandhi
         Attachments: Screen Shot 2019-01-28 at 11.21.05 AM.png, Screen Shot 2019-01-28 at 11.21.25 AM.png
Currently, when speculation is turned on, the Spark scheduler takes a long time to dequeue speculative tasks for large tasksets within a stage (for example, 100,000 tasks or more). On further analysis, we found that the "task-result-getter" threads remain blocked while one of the dispatcher-event-loop threads holds the lock on the TaskSchedulerImpl object:

{code:java}
def resourceOffers(offers: IndexedSeq[WorkerOffer]): Seq[Seq[TaskDescription]] = synchronized {
  // ... the entire method body runs while holding the TaskSchedulerImpl lock
}
{code}

While holding this lock, the dispatcher thread spends a long time executing the method "dequeueSpeculativeTask" in TaskSetManager.scala, which slows down the overall running time of the Spark job. We monitored how long that lock was held over the whole duration of the job and found it was close to 50%, i.e. the code within the synchronized block ran for almost half the duration of the entire Spark job.

Screenshots of the thread dump are attached to this issue for reference.
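The issue does not say how the dequeue should be optimized. As a rough illustration of why the cost grows with taskset size, the Scala sketch below compares two hypothetical speculative-task queues: one that rescans every pending speculatable task on each offer, and one that buckets task indices by preferred host up front. `SpecTask`, `ScanningQueue`, `IndexedQueue` and the workload are all made up for this sketch; it models only host-level locality and is not Spark's TaskSetManager code or API.

```scala
import scala.collection.mutable

// Hypothetical, simplified model: a speculatable task is an index plus the
// hosts it prefers. Not Spark's TaskSetManager; it only illustrates cost.
final case class SpecTask(index: Int, preferredHosts: Seq[String])

// Naive queue: every offer walks the pending set in insertion order, so a
// single dequeue can examine all n tasks, even when nothing matches the host.
final class ScanningQueue(tasks: Seq[SpecTask]) {
  private val pending = mutable.LinkedHashSet.from(tasks)
  var inspected = 0L // candidates examined so far (for comparison only)

  def dequeue(host: String): Option[Int] = {
    val hit = pending.find { t =>
      inspected += 1
      t.preferredHosts.contains(host)
    }
    hit.foreach(pending -= _)
    hit.map(_.index)
  }
}

// Indexed queue: build per-host buckets once, then pop only from the bucket
// for the offered host. A task listed under several hosts is deduplicated
// lazily through the `taken` set when it is popped from a second bucket.
final class IndexedQueue(tasks: Seq[SpecTask]) {
  private val forHost = mutable.HashMap.empty[String, mutable.ArrayBuffer[Int]]
  private val taken = mutable.HashSet.empty[Int]
  var inspected = 0L

  tasks.foreach { t =>
    t.preferredHosts.foreach { h =>
      forHost.getOrElseUpdate(h, mutable.ArrayBuffer.empty[Int]) += t.index
    }
  }

  def dequeue(host: String): Option[Int] = {
    val bucket = forHost.getOrElse(host, mutable.ArrayBuffer.empty[Int])
    var result: Option[Int] = None
    while (result.isEmpty && bucket.nonEmpty) {
      val i = bucket.remove(bucket.length - 1) // pop from the end: O(1)
      inspected += 1
      if (taken.add(i)) result = Some(i) // skip tasks already handed out
    }
    result
  }
}

// Made-up workload: 100000 speculatable tasks, each preferring 2 of 50 hosts.
val hosts = (0 until 50).map(h => s"host$h")
val tasks = (0 until 100000).map(i => SpecTask(i, Seq(hosts(i % 50), hosts((i + 1) % 50))))

val scanning = new ScanningQueue(tasks)
val indexed = new IndexedQueue(tasks)

// An offer from a host that no pending speculative task prefers.
println(scanning.dequeue("other-host")) // None, after examining all 100000 tasks
println(indexed.dequeue("other-host"))  // None, after examining 0 tasks
println(s"scan inspected ${scanning.inspected}, indexed inspected ${indexed.inspected}")
```

Bucketing trades a one-time indexing pass (and some extra memory) for per-offer work proportional to the tasks that actually prefer the offered host. Because resourceOffers runs while holding the TaskSchedulerImpl lock, shorter dequeues would also mean shorter lock hold times for the task-result-getter threads waiting on it.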