Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 14:34:50 GMT, Erik Joelsson wrote: >> Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then >> it's perfectly in order! :-) > > It's using JarArchive.gmk, which is a similar, but different and probably > also needs the same treatment. We should probably share more code between > them. I believe so, it's target docs-zip not jrtfs : https://github.com/openjdk/jdk/blob/51a5731d6dc4b6f6feac920a4b8b49c15fd6b34f/make/Docs.gmk#L712 - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 14:28:31 GMT, Magnus Ihse Bursie wrote: >> make/Main.gmk line 511: >> >>> 509: MAKEFILE := Docs, \ >>> 510: TARGET := docs-zip, \ >>> 511: DEPS := docs-jdk buildtools-jdk, \ >> >> Is this really needed? I did not think jrtfs-jar used zip? > > Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then it's > perfectly in order! :-) It's using JarArchive.gmk, which is a similar, but different and probably also needs the same treatment. We should probably share more code between them. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 14:24:00 GMT, Magnus Ihse Bursie wrote: >> Andrew Leonard has updated the pull request incrementally with one >> additional commit since the last revision: >> >> 8276743: Make openjdk build Zip Archive generation "reproducible" >> >> Signed-off-by: Andrew Leonard > > make/Main.gmk line 511: > >> 509: MAKEFILE := Docs, \ >> 510: TARGET := docs-zip, \ >> 511: DEPS := docs-jdk buildtools-jdk, \ > > Is this really needed? I did not think jrtfs-jar used zip? Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then it's perfectly in order! :-) - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard wrote: >> This PR adds a new openjdk build tool MakeZipReproducible, which if >> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a >> deterministic way, ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > Andrew Leonard has updated the pull request incrementally with one additional > commit since the last revision: > > 8276743: Make openjdk build Zip Archive generation "reproducible" > > Signed-off-by: Andrew Leonard Looks good. - Marked as reviewed by erikj (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard wrote: >> This PR adds a new openjdk build tool MakeZipReproducible, which if >> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a >> deterministic way, ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > Andrew Leonard has updated the pull request incrementally with one additional > commit since the last revision: > > 8276743: Make openjdk build Zip Archive generation "reproducible" > > Signed-off-by: Andrew Leonard I think it looks good now. Let Erik have a final say also, before integrating. Thank you for your effort! - Marked as reviewed by ihse (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard wrote: >> This PR adds a new openjdk build tool MakeZipReproducible, which if >> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a >> deterministic way, ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > Andrew Leonard has updated the pull request incrementally with one additional > commit since the last revision: > > 8276743: Make openjdk build Zip Archive generation "reproducible" > > Signed-off-by: Andrew Leonard make/Main.gmk line 511: > 509: MAKEFILE := Docs, \ > 510: TARGET := docs-zip, \ > 511: DEPS := docs-jdk buildtools-jdk, \ Is this really needed? I did not think jrtfs-jar used zip? - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Fri, 12 Nov 2021 14:22:09 GMT, Magnus Ihse Bursie wrote: >> @erikj79 all tests pass, ready for re-review please, thanks > > @andrew-m-leonard Yes, pushing an empty commit is much better. The Skara > tooling will automatically squash all commits in the PR when it is > integrated, so it will be fully invisible in the end. > > But there is a way to trigger re-runs, although I realize is has close to > zero visibility (I'll need to blame Github for that :-(). > > Go to your personal fork of the JDK, select the "Actions" tab, select > "Pre-submit tests" in the list to the left, and then press the `Run workflow` > button in the cyan `This workflow has a workflow_dispatch event trigger.` > field. > > There you can select branch to run, and also additionally modify the set of > platforms run. > @magicus is pushing an empty commit or dummy change preferable? Yes, I think that's the only good way of re-triggering github actions. While working on a PR, it's better to let history be history so review comments don't get lost. When we finally integrate to mainline, everything will be squashed automatically. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Fri, 12 Nov 2021 11:15:30 GMT, Andrew Leonard wrote: >>> I think that the post-processing of the zip file can be dependent on this >>> variable and that it serves no purpose to introduce a separate variable >>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as >>> ENABLE_REPRODUCIBLE_BUILD. Do you agree? >> >> Sure, that works for me. > > @erikj79 all tests pass, ready for re-review please, thanks @andrew-m-leonard Yes, pushing an empty commit is much better. The Skara tooling will automatically squash all commits in the PR when it is integrated, so it will be fully invisible in the end. But there is a way to trigger re-runs, although I realize is has close to zero visibility (I'll need to blame Github for that :-(). Go to your personal fork of the JDK, select the "Actions" tab, select "Pre-submit tests" in the list to the left, and then press the `Run workflow` button in the cyan `This workflow has a workflow_dispatch event trigger.` field. There you can select branch to run, and also additionally modify the set of platforms run. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Wed, 10 Nov 2021 14:39:09 GMT, Erik Joelsson wrote: >> @erikj79 The flag --enable-reproducible-builds sets >> ENABLE_REPRODUCIBLE_BUILD in spec.gmk. This is set by our JIB profiles. I >> propose that we also turn it on for GHA builds. >> >> I think that the post-processing of the zip file can be dependent on this >> variable and that it serves no purpose to introduce a separate variable >> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as >> ENABLE_REPRODUCIBLE_BUILD. Do you agree? > >> I think that the post-processing of the zip file can be dependent on this >> variable and that it serves no purpose to introduce a separate variable >> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as >> ENABLE_REPRODUCIBLE_BUILD. Do you agree? > > Sure, that works for me. @erikj79 all tests pass, ready for re-review please, thanks - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]
> This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard Andrew Leonard has updated the pull request incrementally with one additional commit since the last revision: 8276743: Make openjdk build Zip Archive generation "reproducible" Signed-off-by: Andrew Leonard - Changes: - all: https://git.openjdk.java.net/jdk/pull/6311/files - new: https://git.openjdk.java.net/jdk/pull/6311/files/8d148065..ea477b9c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=04 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/6311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311 PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]
On Thu, 11 Nov 2021 19:48:04 GMT, Andrew Leonard wrote: >> This PR adds a new openjdk build tool GenerateZip, which generates a final >> "zip" file from an input folder, and creates it in a deterministic way, >> ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > Andrew Leonard has refreshed the contents of this pull request, and previous > commits have been removed. The incremental views will show differences > compared to the previous content of the PR. Looks like other PRs also getting mac or windows test failures.. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]
On Thu, 11 Nov 2021 20:18:40 GMT, Magnus Ihse Bursie wrote: > In the future, please refrain from force pushing to a PR. It makes history > hard to follow for reviewers, and is generally strongly discouraged. OpenJDK > uses the Skara tools which will automatically squash and rebase the commits > in the PR. @magicus I needed to cause a re-submit tests due to a macos timeout, and there seems no github Action or PR command to cause that, so I just force pushed, couldn't see any other way I presume there is? @magicus is pushing an empty commit or dummy change preferable? - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]
On Thu, 11 Nov 2021 19:48:04 GMT, Andrew Leonard wrote: >> This PR adds a new openjdk build tool GenerateZip, which generates a final >> "zip" file from an input folder, and creates it in a deterministic way, >> ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > Andrew Leonard has refreshed the contents of this pull request, and previous > commits have been removed. The incremental views will show differences > compared to the previous content of the PR. In the future, please refrain from force pushing to a PR. It makes history hard to follow for reviewers, and is generally strongly discouraged. OpenJDK uses the Skara tools which will automatically squash and rebase the commits in the PR. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]
> This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard Andrew Leonard has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8276743: Make openjdk build Zip Archive generation "reproducible" Signed-off-by: Andrew Leonard - Changes: - all: https://git.openjdk.java.net/jdk/pull/6311/files - new: https://git.openjdk.java.net/jdk/pull/6311/files/44036af7..8d148065 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/6311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311 PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Wed, 10 Nov 2021 14:39:09 GMT, Erik Joelsson wrote: >> @erikj79 The flag --enable-reproducible-builds sets >> ENABLE_REPRODUCIBLE_BUILD in spec.gmk. This is set by our JIB profiles. I >> propose that we also turn it on for GHA builds. >> >> I think that the post-processing of the zip file can be dependent on this >> variable and that it serves no purpose to introduce a separate variable >> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as >> ENABLE_REPRODUCIBLE_BUILD. Do you agree? > >> I think that the post-processing of the zip file can be dependent on this >> variable and that it serves no purpose to introduce a separate variable >> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as >> ENABLE_REPRODUCIBLE_BUILD. Do you agree? > > Sure, that works for me. @erikj79 @magicus I have just pushed a new commit with the suggested changes, and it works well, thank you for the help I've also done a basic average benchmarking, on my rather slow Ubuntu VM: - Existing src.zip processing : 18 seconds - Additional MakeZipReproducible : 6 seconds - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v3]
> This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard Andrew Leonard has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8276743: Make openjdk build Zip Archive generation "reproducible" Signed-off-by: Andrew Leonard - Changes: - all: https://git.openjdk.java.net/jdk/pull/6311/files - new: https://git.openjdk.java.net/jdk/pull/6311/files/f8a816af..44036af7 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311 PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]
On Wed, 10 Nov 2021 11:22:39 GMT, Andrew Leonard wrote: >> Actually, you don't even need to save the ZipEntry:s in memory, you can just >> extract filenames from them on the first pass, sort them, then lookup the >> entries in ZipFile again on the second lap. :) I don't think that's >> necessary though. > > @erikj79 thanks I didn't realize you can do that: "you can use the ZipFile > class to access ZipEntry's in arbitrary order" See new commit - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]
On Tue, 9 Nov 2021 14:00:18 GMT, Erik Joelsson wrote: >> Andrew Leonard has updated the pull request incrementally with one >> additional commit since the last revision: >> >> 8276743: Make openjdk build Zip Archive generation "reproducible" >> >> Signed-off-by: Andrew Leonard > > make/common/ZipArchive.gmk line 29: > >> 27: _ZIP_ARCHIVE_GMK := 1 >> 28: >> 29: include ../ToolsJdk.gmk > > Should probably add a comment about inclusion of this file now requires an > explicit dependency on build-tools in Main.gmk. done - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]
> This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard Andrew Leonard has updated the pull request incrementally with one additional commit since the last revision: 8276743: Make openjdk build Zip Archive generation "reproducible" Signed-off-by: Andrew Leonard - Changes: - all: https://git.openjdk.java.net/jdk/pull/6311/files - new: https://git.openjdk.java.net/jdk/pull/6311/files/dcf48d65..f8a816af Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=00-01 Stats: 544 lines in 5 files changed: 241 ins; 287 del; 16 mod Patch: https://git.openjdk.java.net/jdk/pull/6311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311 PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Wed, 10 Nov 2021 14:29:22 GMT, Magnus Ihse Bursie wrote: > I think that the post-processing of the zip file can be dependent on this > variable and that it serves no purpose to introduce a separate variable > ENABLE_REPRODUCIBLE_ZIP that is set to the same value as > ENABLE_REPRODUCIBLE_BUILD. Do you agree? Sure, that works for me. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 17:28:39 GMT, Erik Joelsson wrote: >> This PR adds a new openjdk build tool GenerateZip, which generates a final >> "zip" file from an input folder, and creates it in a deterministic way, >> ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > I agree that ideally reproducibility should be on by default, but if there is > a cost, then you can be sure OpenJDK developers will be looking for a way to > remove it for faster turnaround times. I would propose a specific configure > parameter for this specific case, reproducible zip files, that is default on > for release builds and off for debug builds (debug builds aren't reproducible > by nature) and let the existing meta flag also control the value of this new > flag. @erikj79 The flag --enable-reproducible-builds sets ENABLE_REPRODUCIBLE_BUILD in spec.gmk. This is set by our JIB profiles. I propose that we also turn it on for GHA builds. I think that the post-processing of the zip file can be dependent on this variable and that it serves no purpose to introduce a separate variable ENABLE_REPRODUCIBLE_ZIP that is set to the same value as ENABLE_REPRODUCIBLE_BUILD. Do you agree? - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 17:31:16 GMT, Erik Joelsson wrote: >> You are already keeping all the filenames in memory for sorting, so reading >> up the ZipEntry:s isn't that much more data, just some extra metadata for >> each entry. The actual file contents is not part of the ZipEntry object. >> When actually copying the files, you can use the ZipFile class to access >> ZipEntry's in arbitrary order to read their streams as InputStream. > > Actually, you don't even need to save the ZipEntry:s in memory, you can just > extract filenames from them on the first pass, sort them, then lookup the > entries in ZipFile again on the second lap. :) I don't think that's necessary > though. @erikj79 thanks I didn't realize you can do that: "you can use the ZipFile class to access ZipEntry's in arbitrary order" - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 17:26:05 GMT, Erik Joelsson wrote: >> @erikj79 so had a bit of a think, and part of the unzipping.. then >> re-gen'ing is not having to load all the entries into memory. You can't >> guarantee the order "zip" has created them in, so realistically i'd have to >> read all the ZipEntry's into memory, then re-write.. which we can do.. >> src.zip is only 55MB or so, so memory requirements won't be huge given >> src.zip is the only target here currently. > > You are already keeping all the filenames in memory for sorting, so reading > up the ZipEntry:s isn't that much more data, just some extra metadata for > each entry. The actual file contents is not part of the ZipEntry object. When > actually copying the files, you can use the ZipFile class to access > ZipEntry's in arbitrary order to read their streams as InputStream. Actually, you don't even need to save the ZipEntry:s in memory, you can just extract filenames from them on the first pass, sort them, then lookup the entries in ZipFile again on the second lap. :) I don't think that's necessary though. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard wrote: > This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard I agree that ideally reproducibility should be on by default, but if there is a cost, then you can be sure OpenJDK developers will be looking for a way to remove it for faster turnaround times. I would propose a specific configure parameter for this specific case, reproducible zip files, that is default on for release builds and off for debug builds (debug builds aren't reproducible by nature) and let the existing meta flag also control the value of this new flag. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 14:55:52 GMT, Andrew Leonard wrote: >> make/common/ZipArchive.gmk line 178: >> >>> 176:(cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \ >>> 177: $(RM) $$@ && \ >>> 178: $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \ >> >> Having to explode the zip here is unfortunate. This means we are creating an >> almost full copy of the whole src tree in the build directory, something I >> tried to avoid by leveraging the include/exclude functionality of zip, >> instead of generating make rules for copying the files I wanted into a >> source tree to run zip on. This may be a small overhead on Linux, but I'm >> pretty sure it will be very noticeable on Windows. >> >> When reading about your tool at first, I assumed it would read the >> intermediate zip file directly when rebuilding the zip. I don't think >> modifying it to do that would be too complicated, basically read and >> processing ZipEntrys instead of walking the file system. > > @erikj79 so had a bit of a think, and part of the unzipping.. then re-gen'ing > is not having to load all the entries into memory. You can't guarantee the > order "zip" has created them in, so realistically i'd have to read all the > ZipEntry's into memory, then re-write.. which we can do.. src.zip is only > 55MB or so, so memory requirements won't be huge given src.zip is the only > target here currently. You are already keeping all the filenames in memory for sorting, so reading up the ZipEntry:s isn't that much more data, just some extra metadata for each entry. The actual file contents is not part of the ZipEntry object. When actually copying the files, you can use the ZipFile class to access ZipEntry's in arbitrary order to read their streams as InputStream. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 14:57:52 GMT, Magnus Ihse Bursie wrote: > We already have an --enable-reproducible-builds. If (and I say if) we need to > turn this on/off with a flag, this would to fine. > > However, as I have said previously in a private discussion with Andrew, I > prefer it if we can make reproducible builds so cheap, reliable and > uncontroversial so we can always to reproducible builds, and remove that flag. > > For this case, I think it depends on two things: > > 1. the extra time to make the zip file reproducible. Some benchmarking on > the GenerateZip for src.zip would be good to have, at least for some ballpark > figures. > > 2. When is src.zip built? (I don't remember right now) If it is part of > the exploded-image, then it is really time sensitive (and should maybe move > out of that target). If it part of the jdk-image, I'd say it is not as > sensitive, due to >a) this is very much slower anyway, and >b) that part is much more parallelizible, so src.zip can be produced > while waiting for jlink or whatever. @magicus 1) I'll do some rough benchmarks 2) "zip-source" is only dependent on "gensrc", ie.all module -gensrc stages must have completed, so in theory it should run in parallel while the linking goes on, a quick scan of a parallel build log seems to confirm that, the "Updating support/src.zip" occurs quite some time before the exploded image linking: 20:34:45 Compiling 31 files for jdk.management.agent 20:34:45 Compiling 16 files for jdk.security.jgss 20:34:45 Compiling 30 files for jdk.security.auth 20:34:45 Creating support/native/java.security.jgss/libj2gss/static/libj2gss.a from 3 file(s) 20:34:45 Updating support/src.zip 20:34:46 Creating support/native/jdk.management.agent/libmanagement_agent/static/libmanagement_agent.a from 1 file(s) 20:34:46 Creating support/native/jdk.security.auth/libjaas/static/libjaas.a from 1 file(s) 20:34:48 Compiling 136 files for jdk.jdeps 20:34:48 Creating support/test/lib-test/jtreg/native/bin/jvm-test-launcher from 1 file(s) 20:34:49 Creating support/modules_libs/java.base/libverify.so from 1 file(s) . 20:36:17 Optimizing the exploded image 20:36:18 Creating java.datatransfer.jmod 20:36:18 Creating java.instrument.jmod 20:36:18 Creating java.compiler.jmod - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 15:00:59 GMT, Magnus Ihse Bursie wrote: > And have you verified that running with `zip -X` is not enough? I remember > checking into this before, but I don't remember the conclusions. The purpose > is to leave out some extra information, such as time stamps, but it might not > be enough to guarantee reproducible builds. Correct, I have tried that already, that does remove the extended timestamp information, and as long as the input source is already correctly timestamped it works to a point, that point being the "file ordering" is not-deterministic and I think is dependent on how the OS file queries returns files. From history this is what I tried: https://github.com/andrew-m-leonard/jdk-1/blob/f18b5826b9ae3e7e5750190a669edcfd8bb8b2cc/make/common/ZipArchive.gmk#L166 > Also, the name is not really a correct description at this point. Maybe a > straight `MakeZipReproducible` instead or something like that? MakeZipReproducible sounds good. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard wrote: > This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard We already have an --enable-reproducible-builds. If (and I say if) we need to turn this on/off with a flag, this would to fine. However, as I have said previously in a private discussion with Andrew, I prefer it if we can make reproducible builds so cheap, reliable and uncontroversial so we can always to reproducible builds, and remove that flag. For this case, I think it depends on two things: 1) the extra time to make the zip file reproducible. Some benchmarking on the GenerateZip for src.zip would be good to have, at least for some ballpark figures. 2) When is src.zip built? (I don't remember right now) If it is part of the exploded-image, then it is really time sensitive (and should maybe move out of that target). If it part of the jdk-image, I'd say it is not as sensitive, due to a) this is very much slower anyway, and b) that part is much more parallelizible, so src.zip can be produced while waiting for jlink or whatever. Also, the name is not really a correct description at this point. Maybe a straight `MakeZipReproducible` instead or something like that? And have you verified that running with `zip -X` is not enough? I remember checking into this before, but I don't remember the conclusions. The purpose is to leave out some extra information, such as time stamps, but it might not be enough to guarantee reproducible builds. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 13:58:24 GMT, Erik Joelsson wrote: >> This PR adds a new openjdk build tool GenerateZip, which generates a final >> "zip" file from an input folder, and creates it in a deterministic way, >> ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > make/common/ZipArchive.gmk line 178: > >> 176: (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \ >> 177: $(RM) $$@ && \ >> 178: $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \ > > Having to explode the zip here is unfortunate. This means we are creating an > almost full copy of the whole src tree in the build directory, something I > tried to avoid by leveraging the include/exclude functionality of zip, > instead of generating make rules for copying the files I wanted into a source > tree to run zip on. This may be a small overhead on Linux, but I'm pretty > sure it will be very noticeable on Windows. > > When reading about your tool at first, I assumed it would read the > intermediate zip file directly when rebuilding the zip. I don't think > modifying it to do that would be too complicated, basically read and > processing ZipEntrys instead of walking the file system. @erikj79 so had a bit of a think, and part of the unzipping.. then re-gen'ing is not having to load all the entries into memory. You can't guarantee the order "zip" has created them in, so realistically i'd have to read all the ZipEntry's into memory, then re-write.. which we can do.. src.zip is only 55MB or so, so memory requirements won't be huge given src.zip is the only target here currently. - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 14:04:27 GMT, Erik Joelsson wrote: > I think it would it make sense add a configure parameter to to enable/disable > this new functionality. I think developers would want to opt out of extra > unnecessary build steps when building and testing locally. Not sure what the > default should be. For other reproducible measures, we require active > enabling today. Maybe default on for release builds and off for debug builds. @erikj79 so I was hoping to not add another configure arg, when the intentions of this should just all be positive, ordered Also @magicus I get the impression from your comments that we don't really want to end up with in order to build reproducibly you have to specify all these options x,y,z,... Maybe we should just have one big flag --with-reproducible-build ? - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 13:58:24 GMT, Erik Joelsson wrote: >> This PR adds a new openjdk build tool GenerateZip, which generates a final >> "zip" file from an input folder, and creates it in a deterministic way, >> ensuring ordering and timestamps are set as specified. >> >> Using this tool in ZipArchive.gmk will ensure src.zip is then created >> deterministically. >> >> Signed-off-by: Andrew Leonard > > make/common/ZipArchive.gmk line 178: > >> 176: (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \ >> 177: $(RM) $$@ && \ >> 178: $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \ > > Having to explode the zip here is unfortunate. This means we are creating an > almost full copy of the whole src tree in the build directory, something I > tried to avoid by leveraging the include/exclude functionality of zip, > instead of generating make rules for copying the files I wanted into a source > tree to run zip on. This may be a small overhead on Linux, but I'm pretty > sure it will be very noticeable on Windows. > > When reading about your tool at first, I assumed it would read the > intermediate zip file directly when rebuilding the zip. I don't think > modifying it to do that would be too complicated, basically read and > processing ZipEntrys instead of walking the file system. @erikj79 sure, I can look at doing that > make/jdk/src/classes/build/tools/generatezip/GenerateZip.java line 161: > >> 159: boolean pathIsDir = fpath.isDirectory(); >> 160: >> 161: // Keep a sorted Set of files to be processed, so that the Jmod >> is reproducible > > Aren't we generating zip files rather than jmod files here? copy error... - PR: https://git.openjdk.java.net/jdk/pull/6311
Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard wrote: > This PR adds a new openjdk build tool GenerateZip, which generates a final > "zip" file from an input folder, and creates it in a deterministic way, > ensuring ordering and timestamps are set as specified. > > Using this tool in ZipArchive.gmk will ensure src.zip is then created > deterministically. > > Signed-off-by: Andrew Leonard I think it would it make sense add a configure parameter to to enable/disable this new functionality. I think developers would want to opt out of extra unnecessary build steps when building and testing locally. Not sure what the default should be. For other reproducible measures, we require active enabling today. Maybe default on for release builds and off for debug builds. make/common/ZipArchive.gmk line 29: > 27: _ZIP_ARCHIVE_GMK := 1 > 28: > 29: include ../ToolsJdk.gmk Should probably add a comment about inclusion of this file now requires an explicit dependency on build-tools in Main.gmk. make/common/ZipArchive.gmk line 178: > 176: (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \ > 177: $(RM) $$@ && \ > 178: $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \ Having to explode the zip here is unfortunate. This means we are creating an almost full copy of the whole src tree in the build directory, something I tried to avoid by leveraging the include/exclude functionality of zip, instead of generating make rules for copying the files I wanted into a source tree to run zip on. This may be a small overhead on Linux, but I'm pretty sure it will be very noticeable on Windows. When reading about your tool at first, I assumed it would read the intermediate zip file directly when rebuilding the zip. I don't think modifying it to do that would be too complicated, basically read and processing ZipEntrys instead of walking the file system. make/jdk/src/classes/build/tools/generatezip/GenerateZip.java line 161: > 159: boolean pathIsDir = fpath.isDirectory(); > 160: > 161: // Keep a sorted Set of files to be processed, so that the Jmod > is reproducible Aren't we generating zip files rather than jmod files here? - PR: https://git.openjdk.java.net/jdk/pull/6311
RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"
This PR adds a new openjdk build tool GenerateZip, which generates a final "zip" file from an input folder, and creates it in a deterministic way, ensuring ordering and timestamps are set as specified. Using this tool in ZipArchive.gmk will ensure src.zip is then created deterministically. Signed-off-by: Andrew Leonard - Commit messages: - 8276743: Make openjdk build Zip Archive generation "reproducible" Changes: https://git.openjdk.java.net/jdk/pull/6311/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6311=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8276743 Stats: 303 lines in 3 files changed: 300 ins; 0 del; 3 mod Patch: https://git.openjdk.java.net/jdk/pull/6311.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311 PR: https://git.openjdk.java.net/jdk/pull/6311