Yikun opened a new pull request, #37103: URL: https://github.com/apache/spark/pull/37103
### What changes were proposed in this pull request?

This patch makes two improvements:
- Add `cache-from`: this speeds up the cache build and ensures the image is NOT completely refreshed unless `REFRESH_DATE` is changed intentionally.
- Add `FULL_REFRESH_DATE` to the dockerfile: changing it refreshes the cache/image completely.

### Why are the changes needed?

Without this PR, any change to the dockerfile causes the cache image to be **completely refreshed**. This makes the CI tmp image (cache-based refresh, used by the real jobs such as pyspark/sparkr/lint) behave differently from the infra cache (full refresh). As a result, if a PR changes the dockerfile, the pyspark/sparkr/lint CI may pass in that PR, but the next pyspark/sparkr/lint CI run may fail after the cache is refreshed (because dependencies may change when the image is fully refreshed).

With this PR, if you change the dockerfile, the cache image job does a cache-based refresh (it reuses the previous cache as much as possible and rebuilds only the remaining layers where the cache mismatches), which keeps its behavior consistent with the pyspark/sparkr/lint job results.

This behavior is to some extent similar to a static image: you can update `FULL_REFRESH_DATE` to force a complete cache refresh. The advantage is that you can see the pyspark/sparkr/lint CI results in GA when you do a full refresh.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested locally.
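
For illustration only, the difference between a cache-based refresh and a full refresh can be sketched with a simplified model of layer caching. This is not Docker's actual implementation (which also considers file contents for `COPY`/`ADD`), and the dockerfile lines, dates, and helper names below are hypothetical. The model assumes each layer's cache key depends on its own instruction plus every instruction before it, so changing an early line such as `FULL_REFRESH_DATE` invalidates everything after it, while changing a late line keeps the earlier layers.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction; each key chains in its parent's key."""
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def reused_layers(previous, current):
    """Count the leading layers of `current` that match the `previous` build's cache."""
    count = 0
    for old, new in zip(layer_keys(previous), layer_keys(current)):
        if old != new:
            break  # keys are chained, so every later layer mismatches too
        count += 1
    return count


# Hypothetical dockerfile contents, not the real Spark infra image.
base = [
    "FROM ubuntu:20.04",
    "ENV FULL_REFRESH_DATE 20220706",
    "RUN apt-get update",
    "RUN pip install numpy",
]

# Cache-based refresh: only the last step changed, earlier layers are reused.
tail_edit = base[:3] + ["RUN pip install numpy pandas"]

# Full refresh: bumping the date near the top invalidates all later layers.
full_refresh = [base[0], "ENV FULL_REFRESH_DATE 20220801"] + base[2:]

print(reused_layers(base, tail_edit))     # 3
print(reused_layers(base, full_refresh))  # 1
```

In this model, editing a late step reuses three of the four layers, while bumping the date keeps only the base `FROM` layer, which mirrors the intent described above of forcing a full refresh on demand.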
