zhengruifeng commented on PR #55766: URL: https://github.com/apache/spark/pull/55766#issuecomment-4507484858
## Measured CI time: before vs. after

Comparing a recent scheduled `build_maven.yml` run on master against the validation push on this branch (both JDK 17, Scala 2.13, Hadoop 3).

**Runs**

- Before: apache/spark scheduled run [25992372470](https://github.com/apache/spark/actions/runs/25992372470) on master, 2026-05-17 (pre-optimization).
- After: validation push run [26153415924](https://github.com/zhengruifeng/spark/actions/runs/26153415924) on this PR branch, 2026-05-20 (with the precompile artifact).

**Per-matrix-entry duration** (sorted by "before")

| Matrix entry                            |   Before |    After |        Δ |
| :-------------------------------------- | -------: | -------: | -------: |
| sql#core - other tests                  |  2:03:26 |  1:34:50 | −0:28:36 |
| sql#core - slow tests                   |  1:59:59 |  1:29:42 | −0:30:17 |
| sql#core - extended tests               |  1:51:01 |  1:04:28 | −0:46:33 |
| sql#hive - other tests                  |  1:46:57 |  1:05:02 | −0:41:55 |
| connector#kafka-0-10, …                 |  1:41:22 |  0:58:30 | −0:42:52 |
| core,launcher,common, …                 |  1:39:35 |  0:57:00 | −0:42:35 |
| repl,sql#hive-thriftserver              |  1:15:19 |  0:36:02 | −0:39:17 |
| mllib-local,mllib,sql#pipelines         |  1:14:18 |  0:31:20 | −0:42:58 |
| connect                                 |  1:13:42 |  0:20:21 | −0:53:21 |
| sql#hive - slow tests                   |  1:11:35 |  0:33:29 | −0:38:06 |
| sql#api,catalyst,yarn,k8s#core          |  1:09:22 |  0:24:00 | −0:45:22 |
| graphx,streaming,hadoop-cloud           |  0:51:28 |  0:09:27 | −0:42:01 |

Every entry drops by **28–53 min** (≈40 min on average), matching the redundant `mvn -DskipTests … clean install` (~25–40 min) the PR removes from each matrix entry. The `repl,sql#hive-thriftserver` entry still saves ~39 min here despite the "compilation-loop" special case that re-runs `clean install`, likely because the cached `~/.m2/repository/org/apache/spark/` from the precompile artifact still shortens that re-run.
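For anyone who wants to re-check the per-entry numbers, here is a minimal Python sketch that recomputes the savings from the before/after durations. The durations are transcribed from the per-matrix-entry figures above; the helper names `to_seconds` and `fmt` are illustrative only and are not part of the PR or the workflow.

```python
# Durations (before, after) copied from the per-matrix-entry figures, as H:MM:SS.
DURATIONS = {
    "sql#core - other tests": ("2:03:26", "1:34:50"),
    "sql#core - slow tests": ("1:59:59", "1:29:42"),
    "sql#core - extended tests": ("1:51:01", "1:04:28"),
    "sql#hive - other tests": ("1:46:57", "1:05:02"),
    "connector#kafka-0-10, …": ("1:41:22", "0:58:30"),
    "core,launcher,common, …": ("1:39:35", "0:57:00"),
    "repl,sql#hive-thriftserver": ("1:15:19", "0:36:02"),
    "mllib-local,mllib,sql#pipelines": ("1:14:18", "0:31:20"),
    "connect": ("1:13:42", "0:20:21"),
    "sql#hive - slow tests": ("1:11:35", "0:33:29"),
    "sql#api,catalyst,yarn,k8s#core": ("1:09:22", "0:24:00"),
    "graphx,streaming,hadoop-cloud": ("0:51:28", "0:09:27"),
}


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    """Format a non-negative number of seconds as H:MM:SS."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Time saved per matrix entry (before minus after), in seconds.
savings = {
    name: to_seconds(before) - to_seconds(after)
    for name, (before, after) in DURATIONS.items()
}

total_before = sum(to_seconds(before) for before, _ in DURATIONS.values())
total_after = sum(to_seconds(after) for _, after in DURATIONS.values())
mean_saving = round(sum(savings.values()) / len(savings))

print("smallest saving:", fmt(min(savings.values())))  # 0:28:36
print("largest saving: ", fmt(max(savings.values())))  # 0:53:21
print("mean saving:    ", fmt(mean_saving))            # 0:41:09
print("sum before:     ", fmt(total_before))           # 17:58:04
print("sum after:      ", fmt(total_after))            # 9:44:11
```

The mean comes out at about 41 minutes, consistent with the "≈40 min" figure quoted above.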
**Aggregate**

| Metric                                |   Before |    After |                   Δ |
| :------------------------------------ | -------: | -------: | ------------------: |
| Sum of matrix entries                 | 17:58:04 |  9:44:11 |            −8:13:53 |
| + new `precompile-maven` job          |          |  0:49:24 |                     |
| **Total CI compute per run**          | 17:58:04 | 10:33:35 | **−7:24:29 (−41%)** |
| Workflow wall-clock                   |  2:03:30 |  3:27:51 |            +1:24:21 |

**On the wall-clock delta:** the +1h 24m is mostly fork-runner queueing. In the after-run, matrix jobs started in a stagger between 10:38 and 11:47 (slowest entry waited ~1h 17m for a runner), whereas on apache/spark all 12 entries start within 3 s. Netting out the queue and looking at `precompile + longest reduced matrix entry` ≈ 49 min + 1h 35m = **~2h 24m**, vs. the 2h 3m baseline, i.e. roughly **+20 min sequential cost** on official infra, in exchange for the ~7h 25m compute saving per scheduled run.

Across the three scheduled Maven workflows (JDK 17 / 21 / 25), that's **~22 h of CI compute saved per day**.

This also confirms the PR description's "~315–325m (~5h) net saved per run" estimate is actually conservative on this run (measured ~7h 25m).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
