LuciferYang opened a new pull request, #56700: URL: https://github.com/apache/spark/pull/56700
### What changes were proposed in this pull request?

`DataflowGraphTransformer.transformDownNodes` resolves flows on a bounded thread pool. Previously it drove them from a `while` loop that, on each pass, partitioned the in-flight futures with the non-blocking `future.isDone`, reaped the completed ones, and scheduled a new flow if a slot was free. When all slots were in flight (or the queue was drained and only the last futures remained) and none had completed, the pass reaped nothing and scheduled nothing, then looped again immediately: it busy-spun on `isDone` and pinned a core for the duration of resolution.

This PR drives the loop with an `ExecutorCompletionService` instead. Completed tasks are drained with the non-blocking `poll()`, and when nothing can be scheduled but tasks are still running, the loop blocks on `take()` until the next one finishes rather than spinning.

Behavior is otherwise unchanged: the same flows are scheduled in the same order, exceptions are still propagated via `Future.get()`, and an `outstanding` counter replaces the `ArrayBuffer[Future]` for slot bookkeeping.

### Why are the changes needed?

Resolving a graph with more flows than the parallelism (10) kept one CPU core at 100% doing no useful work for the whole resolution, which is wasteful and shows up as unexplained driver CPU usage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing graph-resolution suites (`ConnectValidPipelineSuite`, `ConnectInvalidPipelineSuite`, `SqlPipelineSuite`, `TriggeredGraphExecutionSuite`, `MaterializeTablesSuite`) still pass; the change only affects how the loop waits, not what it resolves. A dedicated test is not included because reliably asserting the absence of a busy-wait requires CPU-time or timing measurements that are flaky in CI.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.8)
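For readers who want to see the shape of the loop described above, here is a minimal Java sketch of a bounded scheduling loop built on `ExecutorCompletionService`. It is not the code from this PR: the class `CompletionLoopSketch`, the method `resolveAll`, and the use of plain `Callable` tasks in place of flows are all hypothetical names chosen for illustration, and the real Scala implementation in `DataflowGraphTransformer` may differ in its details. The sketch only assumes what the description states: drain finished work with `poll()`, schedule when a slot is free, and block on `take()` when nothing can be scheduled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the resolution loop; not Spark's actual code.
final class CompletionLoopSketch {

    // Runs every task with at most `parallelism` in flight and returns the
    // results in completion order. Task failures surface through Future.get().
    static <T> List<T> resolveAll(List<Callable<T>> tasks, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        ExecutorCompletionService<T> completion = new ExecutorCompletionService<>(pool);
        Iterator<Callable<T>> pending = tasks.iterator();
        List<T> results = new ArrayList<>();
        int outstanding = 0; // slot bookkeeping instead of a buffer of futures
        try {
            while (pending.hasNext() || outstanding > 0) {
                // Reap whatever has already finished, without blocking.
                Future<T> done;
                while ((done = completion.poll()) != null) {
                    results.add(done.get());
                    outstanding--;
                }
                if (pending.hasNext() && outstanding < parallelism) {
                    // A slot is free: schedule the next task, in input order.
                    completion.submit(pending.next());
                    outstanding++;
                } else if (outstanding > 0) {
                    // Nothing can be scheduled: block until a task finishes
                    // instead of spinning on isDone.
                    results.add(completion.take().get());
                    outstanding--;
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `CompletionLoopSketch.resolveAll(tasks, 10)` with more than ten tasks exercises the blocking branch: once ten tasks are in flight, the calling thread parks in `take()` until one completes. Wrapping checked exceptions in `IllegalStateException` is a simplification for the sketch only.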
