HyukjinKwon opened a new pull request, #56724: URL: https://github.com/apache/spark/pull/56724
### What changes were proposed in this pull request?

Group the abort-path test `MetricsFailureInjectionSuite."Force checksum mismatch aborts a downstream ResultStage"` by the high-cardinality `id` column instead of the 5-value `low_cardinality_col`, so every one of the 20 reducer partitions reads the corrupted mapper-0.

### Why are the changes needed?

The test was flaky under Maven (~3/10 scheduled runs; it always passed on SBT). Only ~5 of the 20 reducer partitions held mapper-0's few low-cardinality keys, and the mapper-0 corruption is applied **asynchronously** after the first result task succeeds (`RESULT_STAGE_DELAY=1`). The indeterminate-stage abort therefore only fired if one of those few partitions happened to be scheduled *after* the corruption landed, which is a scheduling race.

Grouping by the high-cardinality `id` makes every reducer depend on mapper-0, so once the corruption lands the remaining result tasks (dispatched only after the first completes, on `local[2]`) deterministically hit it and the abort always fires.

### Does this PR introduce any user-facing change?

No, test only.

### How was this patch tested?

Ran the suite **20×** under Maven (the environment where it flaked) on a fork; all 20 passed.

- ❌ Before (flaky, scheduled `Build / Maven (Scala 2.13, JDK 21)`): https://github.com/apache/spark/actions/runs/28035705490
- ❌ Before (flaky, scheduled `Build / Maven (Scala 2.13, JDK 25)`): https://github.com/apache/spark/actions/runs/28035606804
- ✅ After (this fix, MetricsFailureInjectionSuite ×20 under Maven, all green): https://github.com/HyukjinKwon/spark/actions/runs/28066715792

### Was this patch authored or co-authored using generative AI tooling?

Yes, Generated-by: Claude Code

This pull request and its description were written by Isaac.
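
The partition-coverage argument under "Why are the changes needed?" can be illustrated with a small model. This is only a sketch under stated assumptions, not the test's code: a plain modulo stands in for Spark's hash partitioner, the key ranges are made up (100 distinct ids is an arbitrary count), and `partitions_touched` and `abort_can_fire` are hypothetical helpers.

```python
NUM_REDUCERS = 20  # reducer partitions, as stated in the description


def partitions_touched(keys, num_partitions=NUM_REDUCERS):
    """Return the reducer partitions that receive at least one of the keys.

    Assumption: plain modulo stands in for the real hash partitioner,
    which hashes keys differently.
    """
    return {k % num_partitions for k in keys}


def abort_can_fire(dependent_partitions, done_before_corruption):
    """True if some reducer that reads mapper-0 has not yet run when the
    corruption lands, so it can still observe the checksum mismatch."""
    return bool(dependent_partitions - done_before_corruption)


# Old grouping: mapper-0 emits only 5 distinct keys (hypothetical values).
low = partitions_touched(range(5))
# New grouping: mapper-0 emits many distinct ids (100 is illustrative).
high = partitions_touched(range(100))

print(len(low), "of", NUM_REDUCERS, "reducers read mapper-0 (low cardinality)")
print(len(high), "of", NUM_REDUCERS, "reducers read mapper-0 (high cardinality)")

# Unlucky schedule: the five dependent reducers all ran before the corruption.
unlucky = set(range(5))
print(abort_can_fire(low, unlucky))   # False: no remaining reducer reads mapper-0
print(abort_can_fire(high, unlucky))  # True: 15 remaining reducers still read it
```

In this model the low-cardinality grouping can miss the abort whenever the few dependent reducers finish before the corruption lands, while the high-cardinality grouping leaves dependent reducers pending under any schedule in which at least one reducer has not yet run.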
