zhengruifeng commented on PR #56369: URL: https://github.com/apache/spark/pull/56369#issuecomment-4655178998
### CI before vs after (gzip → zstd)

Measured on the `build_and_test` shared compile artifact (`compile-artifact.tar.*`): the Precompile **compress** step, the per-job **extract (decompress)** step, and the artifact **size**.

- **Before (gzip):** apache/spark post-merge run [27129268105](https://github.com/apache/spark/actions/runs/27129268105), `e8ca2874188` (SPARK-56830, the commit right before this change)
- **After (zstd):** this PR's run [27137241542](https://github.com/zhengruifeng/spark/actions/runs/27137241542), `2b7f14cee28`

| Metric | Before (gzip) | After (zstd) | Change |
|---|---|---|---|
| Compress: `Package compile output` (Precompile) | 90 s | 13 s | **~6.9x faster (-77 s)** |
| Decompress: `Extract precompiled artifact`, per consumer job | ~20.7 s (20-21 s) | ~9.0 s (6-14 s) | **~2.3x faster (~-12 s/job)** |
| Artifact size (`ls -lh`) | 2.2 GB | 2.1 GB | roughly equal (slightly smaller) |

The two runs have different matrix sizes (13 vs 24 consumer extracts), so the comparison is on the single compress step and per-job decompress, not cumulative totals.

**Takeaway:** ~7x faster compress and ~2.3x faster decompress at no size cost. The `Precompile` job gates every downstream job, so the -77 s compress saving shifts all consumers ~77 s earlier, and each consumer's own extract is ~12 s faster: roughly **90 s off the precompile -> test critical path**. The compress win comes from `zstd` being faster per core plus `-T0` parallelizing across all cores (gzip is single-threaded); the decompress win is per-core `zstd` vs `gzip` speed.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
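For readers who want to re-derive the headline numbers, the arithmetic behind the table and the takeaway can be sketched as follows. This is only an illustration: the timings are copied from the comment, while the variable and function names are hypothetical and not part of the PR or the CI workflow.

```python
# Timings (seconds) copied from the before/after table in the comment.
# All names below are illustrative; nothing here comes from the workflow itself.
gzip_compress_s = 90.0
zstd_compress_s = 13.0
gzip_extract_s = 20.7   # average per consumer job
zstd_extract_s = 9.0    # average per consumer job


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' step is."""
    return before_s / after_s


compress_speedup = speedup(gzip_compress_s, zstd_compress_s)
extract_speedup = speedup(gzip_extract_s, zstd_extract_s)

# Precompile gates every consumer, so its saving and one consumer's
# extract saving add up along the precompile -> test critical path.
compress_saved_s = gzip_compress_s - zstd_compress_s
extract_saved_s = gzip_extract_s - zstd_extract_s
critical_path_saved_s = compress_saved_s + extract_saved_s

print(f"compress: {compress_speedup:.1f}x faster, {compress_saved_s:.0f} s saved")
print(f"extract:  {extract_speedup:.1f}x faster, {extract_saved_s:.1f} s saved per job")
print(f"critical path: about {critical_path_saved_s:.0f} s saved")
```

The sum uses a single consumer's extract saving rather than a total across jobs, since the two runs have different matrix sizes and the consumer jobs are not compared cumulatively.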
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
