zhengruifeng commented on PR #56369:
URL: https://github.com/apache/spark/pull/56369#issuecomment-4655178998

   ### CI before vs after (gzip → zstd)
   
   Measured on the `build_and_test` shared compile artifact 
(`compile-artifact.tar.*`): the Precompile **compress** step, the per-job 
**extract (decompress)** step, and the artifact **size**.
   
   - **Before (gzip):** apache/spark post-merge run 
[27129268105](https://github.com/apache/spark/actions/runs/27129268105) — 
`e8ca2874188` (SPARK-56830, the commit right before this change)
   - **After (zstd):** this PR's run 
[27137241542](https://github.com/zhengruifeng/spark/actions/runs/27137241542) — 
`2b7f14cee28`
   
   | Metric | Before (gzip) | After (zstd) | Change |
   |---|---|---|---|
   | Compress — `Package compile output` (Precompile) | 90 s | 13 s | **~6.9x faster (-77 s)** |
   | Decompress — `Extract precompiled artifact`, per consumer job | ~20.7 s (20-21 s) | ~9.0 s (6-14 s) | **~2.3x faster (~-12 s/job)** |
   | Artifact size (`ls -lh`) | 2.2 GB | 2.1 GB | roughly equal (slightly smaller) |
   
   The two runs have different matrix sizes (13 vs 24 consumer extracts), so 
the comparison is on the single compress step and per-job decompress, not 
cumulative totals.
   
   **Takeaway:** ~7x faster compress and ~2.3x faster decompress at no size 
cost. The `Precompile` job gates every downstream job, so the -77 s compress 
saving shifts all consumers ~77 s earlier, and each consumer's own extract is 
~12 s faster, for roughly **90 s off the precompile -> test critical path**. The 
compress win comes from `zstd` being faster per core plus `-T0` parallelizing 
across all cores (gzip is single-threaded); the decompress win comes from 
per-core `zstd` vs `gzip` speed.
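
The arithmetic behind these figures can be reproduced with a short sketch. The timings are copied from the table above; the variable and function names are illustrative only, and the critical-path estimate assumes exactly one compress step followed by one extract step, as the takeaway describes.

```python
# Timings in seconds, copied from the comparison table (gzip vs zstd).
compress_gzip, compress_zstd = 90.0, 13.0
extract_gzip, extract_zstd = 20.7, 9.0  # per-consumer-job averages


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return before / after


compress_saved = compress_gzip - compress_zstd  # seconds saved in Precompile
extract_saved = extract_gzip - extract_zstd     # seconds saved per consumer job

# Assumption: the critical path contains one compress and one extract,
# so the two savings add up.
critical_path_saved = compress_saved + extract_saved

print(f"compress:   {speedup(compress_gzip, compress_zstd):.1f}x faster, "
      f"{compress_saved:.0f} s saved")
print(f"decompress: {speedup(extract_gzip, extract_zstd):.1f}x faster, "
      f"{extract_saved:.1f} s saved per job")
print(f"critical path: about {critical_path_saved:.0f} s shorter")
```

Because the two runs have different matrix sizes, the sketch deliberately works from the single compress step and the per-job extract average rather than summing over all consumer jobs.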
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
