Ngone51 commented on code in PR #36162:
URL: https://github.com/apache/spark/pull/36162#discussion_r885645567
##########
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala:
##########
@@ -853,8 +857,11 @@ private[spark] class TaskSchedulerImpl(
     // (taskId, stageId, stageAttemptId, accumUpdates)
     val accumUpdatesWithTaskIds: Array[(Long, Int, Int, Seq[AccumulableInfo])] = {
       accumUpdates.flatMap { case (id, updates) =>
-        val accInfos = updates.map(acc => acc.toInfo(Some(acc.value), None))
         Option(taskIdToTaskSetManager.get(id)).map { taskSetMgr =>
+          val (accInfos, taskProgressRate) = getTaskAccumulableInfosAndProgressRate(updates)
Review Comment:
I'm a bit worried about the scheduler's throughput if the efficiency of
traversing the accumulators is a real concern. I still think we could do the
traversal only inside the speculation thread, to decouple it from the
scheduling thread. Moving this work to the speculation thread would also avoid
unnecessary traversals, since they're only needed when `checkSpeculatableTasks`
runs, whereas the current implementation traverses on every heartbeat update
and every successful task completion.
If we move it to the speculation thread, the implementation could also be a
bit simpler. In `TaskSchedulerImpl.executorHeartbeatReceived()`, we'd only need
to set `_accumulables`; and we don't even need to set `_accumulables`
ourselves, since that's already covered by `DAGScheduler.updateAccumulators()`.
Then we'd only need to focus on the calculation/traversal in
`InefficientTaskCalculator`. The first-time traversal might be a bit slow, but
we can cache the records/runtime for finished tasks and the progress rate for
running tasks. Even if it's slow, I think that's still better than slowing
down the scheduling thread. A rough sketch of what I mean is below.
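Roughly something like this (just a minimal sketch of the caching idea, not
the PR's actual code; `TaskProgress`, `SpeculationSideCalculator`,
`onHeartbeat`, and `progressRate` are made-up names for illustration):

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical snapshot of a running task's raw progress, as reported by
// heartbeats. Stand-in for whatever the accumulator updates carry.
final case class TaskProgress(recordsRead: Long, runtimeNs: Long)

// Hypothetical speculation-side helper: the scheduling thread only records
// raw updates; rates are computed lazily on the speculation thread.
final class SpeculationSideCalculator {
  // Latest raw progress per running task, written from the heartbeat path.
  private val latestProgress = new ConcurrentHashMap[Long, TaskProgress]()
  // Cached (snapshot, rate) per task so repeated checkSpeculatableTasks()
  // calls don't recompute anything for tasks with no new updates.
  private val cached = new ConcurrentHashMap[Long, (TaskProgress, Double)]()

  // Heartbeat path: O(1), no accumulator traversal on the scheduling thread.
  def onHeartbeat(taskId: Long, progress: TaskProgress): Unit =
    latestProgress.put(taskId, progress)

  // Speculation thread only: compute (or reuse) the progress rate lazily.
  def progressRate(taskId: Long): Double = {
    val latest = latestProgress.get(taskId)
    if (latest == null || latest.runtimeNs == 0L) {
      0.0
    } else {
      val hit = cached.get(taskId)
      if (hit != null && hit._1 == latest) {
        hit._2 // nothing changed since the last check; reuse the cached rate
      } else {
        val rate = latest.recordsRead.toDouble / latest.runtimeNs
        cached.put(taskId, (latest, rate))
        rate
      }
    }
  }
}
```

The point is that the heartbeat path stays O(1) and the per-accumulator work
only happens on the speculation thread, and only when `checkSpeculatableTasks`
actually fires.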
@weixiuli @mridulm WDYT?