LuciferYang opened a new pull request, #56561:
URL: https://github.com/apache/spark/pull/56561

   ### What changes were proposed in this pull request?
   
   `ShuffleBlockFetcherIterator.removePendingChunks` collected the removed 
chunk ids with a `foreach { _ => ... }` whose body re-ran `flatMap` over the 
whole `fetchRequestsToRemove` queue on every iteration. With N removed 
requests it repeated the same work N times; the result was correct only 
because the accumulator is a `HashSet`, which absorbed the duplicates.
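
   A minimal sketch of the pre-patch pattern, for illustration only. 
`QuadraticCollectSketch`, `ChunkId`, `FetchRequest`, and `collectRemoved` are 
hypothetical simplified stand-ins, not Spark's actual `ShuffleBlockChunkId` or 
`FetchRequest` types or code. The sketch counts the chunk ids it visits to make 
the repeated work visible.

```scala
import scala.collection.mutable

object QuadraticCollectSketch {
  // Hypothetical stand-ins for Spark's ShuffleBlockChunkId and FetchRequest.
  final case class ChunkId(shuffleId: Int, reduceId: Int, chunkId: Int)
  final case class FetchRequest(address: String, chunks: Seq[ChunkId])

  // Mirrors the pre-patch shape: each iteration of the outer foreach ignores
  // its element and re-flatMaps the WHOLE removed queue.
  // Returns the collected ids and the number of chunk ids visited.
  def collectRemoved(
      removed: mutable.Queue[FetchRequest]): (mutable.HashSet[ChunkId], Int) = {
    val removedChunkIds = new mutable.HashSet[ChunkId]()
    var visited = 0
    removed.foreach { _ =>
      val all = removed.flatMap(_.chunks)
      visited += all.size
      // Duplicates from earlier iterations are silently absorbed by the set.
      removedChunkIds ++= all
    }
    (removedChunkIds, visited)
  }
}
```

With 3 removed requests of 2 chunks each, this returns 6 distinct ids but 
visits 18 chunk ids, i.e. N passes over the same data.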
   
   This PR processes each removed request exactly once and drops the 
now-unnecessary intermediate queue, so the helper becomes a single 
`dequeueAll(pred).foreach` pass.
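
   A minimal sketch of the single-pass shape, assuming hypothetical 
simplified types (`ChunkId`, `FetchRequest`) and a hypothetical 
`removeMatching` helper; Spark's real predicate and types differ. 
`mutable.Queue.dequeueAll` removes and returns every matching element, and 
each returned request contributes its chunk ids once.

```scala
import scala.collection.mutable

object SinglePassRemoveSketch {
  // Hypothetical stand-ins for Spark's ShuffleBlockChunkId and FetchRequest.
  final case class ChunkId(shuffleId: Int, reduceId: Int, chunkId: Int)
  final case class FetchRequest(address: String, chunks: Seq[ChunkId])

  // Removes every request matching `pred` from `queue` and records its chunk
  // ids. No intermediate queue: each removed request is touched exactly once.
  def removeMatching(
      queue: mutable.Queue[FetchRequest],
      removedChunkIds: mutable.HashSet[ChunkId])(
      pred: FetchRequest => Boolean): Unit = {
    queue.dequeueAll(pred).foreach { req =>
      removedChunkIds ++= req.chunks
    }
  }
}
```

Because each removed request is visited once, the work is linear in the number 
of removed chunk ids, and the queue keeps only the non-matching requests.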
   
   ### Why are the changes needed?
   
   To remove accidental O(N^2) work (N being the number of removed requests) 
and a redundant intermediate collection on the push-based shuffle 
fetch-failure fallback path, and to make the intent clearer.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The returned set of removed chunk ids is unchanged.
   
   ### How was this patch tested?
   
   The existing `ShuffleBlockFetcherIteratorSuite`, which covers the 
SPARK-32922 chunk-fetch fallback paths, still passes.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.8)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

