HyukjinKwon commented on PR #56724:
URL: https://github.com/apache/spark/pull/56724#issuecomment-4786345367

   <!-- ai-code-review -->
   **Automated code review** (posted as a comment, not an approval) — via 
`spark-dev:code-review`.
   
   **Verdict: no blocking findings.** Test-only change; the fix is sound and 
well-justified.
   
   **What it does:** Groups the abort-path test by the high-cardinality `id` 
column instead of the 5-value `low_cardinality_col`, so all 20 reducer 
partitions depend on the corrupted mapper-0 and the indeterminate-stage abort 
fires deterministically (previously only ~5 partitions read mapper-0, so the 
abort hinged on scheduling order — the flake).
   
   **Design (Pass A):** Clean. No architecture/layer impact. The `local[2]` + 
`SHUFFLE_PARTITIONS=20` reasoning matches the sibling test in the same suite 
(`MetricsFailureInjectionSuite.scala:597`), so it's consistent with the 
established pattern. The scenario the test is named for (2-stage query → 
ResultStage abort under `RESULT_STAGE_DELAY=1` + forced checksum mismatch) is 
unchanged — only the grouping key's cardinality changes.
   
   **Correctness (Pass B):** `id` exists in the fixture (`setUpTestTable`, line 
48). The assertion (`intercept[SparkException]` containing `"indeterminate"`) 
is unaffected. No metric-value assertions depend on the grouping key in this 
test (unlike the sibling tests), so switching to `id` is safe.
   
   **Nit (non-blocking):** the new comment says every one of the 20 reducers 
reads mapper-0 and at least one is *guaranteed* to hit the corrupted mapper. 
That's a practical near-certainty rather than a strict guarantee — it depends 
on 300 `id` values hashing across all 20 partitions and on `local[2]` 
dispatching the remaining result tasks after the corruption event is processed. 
Both hold overwhelmingly, but "guaranteed"/"every" slightly overstate it; 
consider "effectively always" to be precise. Not worth blocking on.
   
   **Verification:** suite run 20× under Maven (the env where it flaked) — all 
green: https://github.com/HyukjinKwon/spark/actions/runs/28066715792
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to