pkotikalapudi commented on code in PR #42352:
URL: https://github.com/apache/spark/pull/42352#discussion_r1577025511
##########
core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:
##########
@@ -96,6 +96,30 @@ import org.apache.spark.util.{Clock, SystemClock,
ThreadUtils, Utils}
* If an executor with caching data blocks has been idle for more than
this duration,
* the executor will be removed
*
+ * Dynamic resource allocation is also extended to work for structured
streaming use case.
+ * (micro-batch paradigm).
+ * For it to work we would still need the above configs + few additional
configs.
+ *
+ * For executor allocation, In traditional DRA target number of executors are
added based on the
Review Comment:
From my understanding in batch queries each stage can have varied resource
requirements depending upon what it does. So DRA has `schedulerBacklogTimeout`
to figure out when it should ask for more resources ([more on
it](schedulerBacklogTimeout)). So the
[pendingTasks](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L289)
are determined by [the pending tasks of current
stage](https://github.com/apache/spark/blob/v3.5.1/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L883).
I have [modified
it](https://github.com/apache/spark/pull/42352/files#diff-fdddb0421641035be18233c212f0e3ccd2d6a49d345bd0cd4eac08fc4d911e21R1003)
to consider the pending tasks of other stages as well because structured
streaming deals with micro-batches and we want to scale out if the there are
still other stages pending in the same micro-batch.
for eg:
with current DRA code, if config
`spark.dynamicAllocation.schedulerBacklogTimeout` is set to 6 seconds and we
use that for structured streaming job where a micro-batch consists of 4 stages
which will run at max for 5 seconds each.
Then it wouldn't scale out even if 20 seconds pass because it is just
5+5+5+5 = 30seconds.
But the above mentioned changes I have done, while running the second stage
on the 6th second it figures out that other stages in the micro-batch are
pending so it scale-out appropriately.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]