Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Andrew Leonard
On Fri, 12 Nov 2021 14:34:50 GMT, Erik Joelsson  wrote:

>> Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then 
>> it's perfectly in order! :-)
>
> It's using JarArchive.gmk, which is a similar, but different and probably 
> also needs the same treatment. We should probably share more code between 
> them.

I believe so, it's target docs-zip not jrtfs : 
https://github.com/openjdk/jdk/blob/51a5731d6dc4b6f6feac920a4b8b49c15fd6b34f/make/Docs.gmk#L712

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Erik Joelsson
On Fri, 12 Nov 2021 14:28:31 GMT, Magnus Ihse Bursie  wrote:

>> make/Main.gmk line 511:
>> 
>>> 509: MAKEFILE := Docs, \
>>> 510: TARGET := docs-zip, \
>>> 511: DEPS :=  docs-jdk buildtools-jdk, \
>> 
>> Is this really needed? I did not think jrtfs-jar used zip?
>
> Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then it's 
> perfectly in order! :-)

It's using JarArchive.gmk, which is a similar, but different and probably also 
needs the same treatment. We should probably share more code between them.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Magnus Ihse Bursie
On Fri, 12 Nov 2021 14:24:00 GMT, Magnus Ihse Bursie  wrote:

>> Andrew Leonard has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8276743: Make openjdk build Zip Archive generation "reproducible"
>>   
>>   Signed-off-by: Andrew Leonard 
>
> make/Main.gmk line 511:
> 
>> 509: MAKEFILE := Docs, \
>> 510: TARGET := docs-zip, \
>> 511: DEPS :=  docs-jdk buildtools-jdk, \
> 
> Is this really needed? I did not think jrtfs-jar used zip?

Sorry, I'm mis-reading the lines here. It's of course for docs-zip. Then it's 
perfectly in order! :-)

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Erik Joelsson
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard  wrote:

>> This PR adds a new openjdk build tool MakeZipReproducible, which if 
>> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a 
>> deterministic way, ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> Andrew Leonard has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8276743: Make openjdk build Zip Archive generation "reproducible"
>   
>   Signed-off-by: Andrew Leonard 

Looks good.

-

Marked as reviewed by erikj (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Magnus Ihse Bursie
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard  wrote:

>> This PR adds a new openjdk build tool MakeZipReproducible, which if 
>> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a 
>> deterministic way, ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> Andrew Leonard has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8276743: Make openjdk build Zip Archive generation "reproducible"
>   
>   Signed-off-by: Andrew Leonard 

I think it looks good now. Let Erik have a final say also, before integrating. 

Thank you for your effort!

-

Marked as reviewed by ihse (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Magnus Ihse Bursie
On Fri, 12 Nov 2021 08:24:13 GMT, Andrew Leonard  wrote:

>> This PR adds a new openjdk build tool MakeZipReproducible, which if 
>> ENABLE_REPRODUCIBLE_BUILD is enabled, generates a final "zip" file in a 
>> deterministic way, ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> Andrew Leonard has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8276743: Make openjdk build Zip Archive generation "reproducible"
>   
>   Signed-off-by: Andrew Leonard 

make/Main.gmk line 511:

> 509: MAKEFILE := Docs, \
> 510: TARGET := docs-zip, \
> 511: DEPS :=  docs-jdk buildtools-jdk, \

Is this really needed? I did not think jrtfs-jar used zip?

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-12 Thread Erik Joelsson
On Fri, 12 Nov 2021 14:22:09 GMT, Magnus Ihse Bursie  wrote:

>> @erikj79 all tests pass, ready for re-review please, thanks
>
> @andrew-m-leonard Yes, pushing an empty commit is much better. The Skara 
> tooling will automatically squash all commits in the PR when it is 
> integrated, so it will be fully invisible in the end.
> 
> But there is a way to trigger re-runs, although I realize is has close to 
> zero visibility (I'll need to blame Github for that :-().
> 
> Go to your personal fork of the JDK, select the "Actions" tab, select 
> "Pre-submit tests" in the list to the left, and then press the `Run workflow` 
> button in the cyan `This workflow has a workflow_dispatch event trigger.` 
> field.
> 
> There you can select branch to run, and also additionally modify the set of 
> platforms run.

> @magicus is pushing an empty commit or dummy change preferable?

Yes, I think that's the only good way of re-triggering github actions. While 
working on a PR, it's better to let history be history so review comments don't 
get lost. When we finally integrate to mainline, everything will be squashed 
automatically.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-12 Thread Magnus Ihse Bursie
On Fri, 12 Nov 2021 11:15:30 GMT, Andrew Leonard  wrote:

>>> I think that the post-processing of the zip file can be dependent on this 
>>> variable and that it serves no purpose to introduce a separate variable 
>>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
>>> ENABLE_REPRODUCIBLE_BUILD. Do you agree?
>> 
>> Sure, that works for me.
>
> @erikj79 all tests pass, ready for re-review please, thanks

@andrew-m-leonard Yes, pushing an empty commit is much better. The Skara 
tooling will automatically squash all commits in the PR when it is integrated, 
so it will be fully invisible in the end.

But there is a way to trigger re-runs, although I realize is has close to zero 
visibility (I'll need to blame Github for that :-().

Go to your personal fork of the JDK, select the "Actions" tab, select 
"Pre-submit tests" in the list to the left, and then press the `Run workflow` 
button in the cyan `This workflow has a workflow_dispatch event trigger.` field.

There you can select branch to run, and also additionally modify the set of 
platforms run.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-12 Thread Andrew Leonard
On Wed, 10 Nov 2021 14:39:09 GMT, Erik Joelsson  wrote:

>> @erikj79 The flag --enable-reproducible-builds sets 
>> ENABLE_REPRODUCIBLE_BUILD in spec.gmk. This is set by our JIB profiles. I 
>> propose that we also turn it on for GHA builds. 
>> 
>> I think that the post-processing of the zip file can be dependent on this 
>> variable and that it serves no purpose to introduce a separate variable 
>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
>> ENABLE_REPRODUCIBLE_BUILD. Do you agree?
>
>> I think that the post-processing of the zip file can be dependent on this 
>> variable and that it serves no purpose to introduce a separate variable 
>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
>> ENABLE_REPRODUCIBLE_BUILD. Do you agree?
> 
> Sure, that works for me.

@erikj79 all tests pass, ready for re-review please, thanks

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v5]

2021-11-12 Thread Andrew Leonard
> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

Andrew Leonard has updated the pull request incrementally with one additional 
commit since the last revision:

  8276743: Make openjdk build Zip Archive generation "reproducible"
  
  Signed-off-by: Andrew Leonard 

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6311/files
  - new: https://git.openjdk.java.net/jdk/pull/6311/files/8d148065..ea477b9c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]

2021-11-12 Thread Andrew Leonard
On Thu, 11 Nov 2021 19:48:04 GMT, Andrew Leonard  wrote:

>> This PR adds a new openjdk build tool GenerateZip, which generates a final 
>> "zip" file from an input folder, and creates it in a deterministic way, 
>> ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> Andrew Leonard has refreshed the contents of this pull request, and previous 
> commits have been removed. The incremental views will show differences 
> compared to the previous content of the PR.

Looks like other PRs also getting mac or windows test failures..

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]

2021-11-11 Thread Andrew Leonard
On Thu, 11 Nov 2021 20:18:40 GMT, Magnus Ihse Bursie  wrote:

> In the future, please refrain from force pushing to a PR. It makes history 
> hard to follow for reviewers, and is generally strongly discouraged. OpenJDK 
> uses the Skara tools which will automatically squash and rebase the commits 
> in the PR.
@magicus I needed to cause a re-submit tests due to a macos timeout, and there 
seems no github Action or PR command to cause that, so I just force pushed, 
couldn't see any other way I presume there is?

@magicus is pushing an empty commit or dummy change preferable?

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]

2021-11-11 Thread Magnus Ihse Bursie
On Thu, 11 Nov 2021 19:48:04 GMT, Andrew Leonard  wrote:

>> This PR adds a new openjdk build tool GenerateZip, which generates a final 
>> "zip" file from an input folder, and creates it in a deterministic way, 
>> ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> Andrew Leonard has refreshed the contents of this pull request, and previous 
> commits have been removed. The incremental views will show differences 
> compared to the previous content of the PR.

In the future, please refrain from force pushing to a PR. It makes history hard 
to follow for reviewers, and is generally strongly discouraged. OpenJDK uses 
the Skara tools which will automatically squash and rebase the commits in the 
PR.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v4]

2021-11-11 Thread Andrew Leonard
> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

Andrew Leonard has refreshed the contents of this pull request, and previous 
commits have been removed. The incremental views will show differences compared 
to the previous content of the PR. The pull request contains one new commit 
since the last revision:

  8276743: Make openjdk build Zip Archive generation "reproducible"
  
  Signed-off-by: Andrew Leonard 

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6311/files
  - new: https://git.openjdk.java.net/jdk/pull/6311/files/44036af7..8d148065

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=02-03

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-11 Thread Andrew Leonard
On Wed, 10 Nov 2021 14:39:09 GMT, Erik Joelsson  wrote:

>> @erikj79 The flag --enable-reproducible-builds sets 
>> ENABLE_REPRODUCIBLE_BUILD in spec.gmk. This is set by our JIB profiles. I 
>> propose that we also turn it on for GHA builds. 
>> 
>> I think that the post-processing of the zip file can be dependent on this 
>> variable and that it serves no purpose to introduce a separate variable 
>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
>> ENABLE_REPRODUCIBLE_BUILD. Do you agree?
>
>> I think that the post-processing of the zip file can be dependent on this 
>> variable and that it serves no purpose to introduce a separate variable 
>> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
>> ENABLE_REPRODUCIBLE_BUILD. Do you agree?
> 
> Sure, that works for me.

@erikj79 @magicus I have just pushed a new commit with the suggested changes, 
and it works well, thank you for the help

I've also done a basic average benchmarking, on my rather slow Ubuntu VM:
- Existing src.zip processing : 18 seconds
- Additional MakeZipReproducible : 6 seconds

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v3]

2021-11-11 Thread Andrew Leonard
> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

Andrew Leonard has refreshed the contents of this pull request, and previous 
commits have been removed. The incremental views will show differences compared 
to the previous content of the PR. The pull request contains one new commit 
since the last revision:

  8276743: Make openjdk build Zip Archive generation "reproducible"
  
  Signed-off-by: Andrew Leonard 

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6311/files
  - new: https://git.openjdk.java.net/jdk/pull/6311/files/f8a816af..44036af7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]

2021-11-11 Thread Andrew Leonard
On Wed, 10 Nov 2021 11:22:39 GMT, Andrew Leonard  wrote:

>> Actually, you don't even need to save the ZipEntry:s in memory, you can just 
>> extract filenames from them on the first pass, sort them, then lookup the 
>> entries in ZipFile again on the second lap. :) I don't think that's 
>> necessary though.
>
> @erikj79 thanks I didn't realize you can do that: "you can use the ZipFile 
> class to access ZipEntry's in arbitrary order"

See new commit

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]

2021-11-11 Thread Andrew Leonard
On Tue, 9 Nov 2021 14:00:18 GMT, Erik Joelsson  wrote:

>> Andrew Leonard has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   8276743: Make openjdk build Zip Archive generation "reproducible"
>>   
>>   Signed-off-by: Andrew Leonard 
>
> make/common/ZipArchive.gmk line 29:
> 
>> 27: _ZIP_ARCHIVE_GMK := 1
>> 28: 
>> 29: include ../ToolsJdk.gmk
> 
> Should probably add a comment about inclusion of this file now requires an 
> explicit dependency on build-tools in Main.gmk.

done

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible" [v2]

2021-11-11 Thread Andrew Leonard
> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

Andrew Leonard has updated the pull request incrementally with one additional 
commit since the last revision:

  8276743: Make openjdk build Zip Archive generation "reproducible"
  
  Signed-off-by: Andrew Leonard 

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/6311/files
  - new: https://git.openjdk.java.net/jdk/pull/6311/files/dcf48d65..f8a816af

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk=6311=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk=6311=00-01

  Stats: 544 lines in 5 files changed: 241 ins; 287 del; 16 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-10 Thread Erik Joelsson
On Wed, 10 Nov 2021 14:29:22 GMT, Magnus Ihse Bursie  wrote:

> I think that the post-processing of the zip file can be dependent on this 
> variable and that it serves no purpose to introduce a separate variable 
> ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
> ENABLE_REPRODUCIBLE_BUILD. Do you agree?

Sure, that works for me.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-10 Thread Magnus Ihse Bursie
On Tue, 9 Nov 2021 17:28:39 GMT, Erik Joelsson  wrote:

>> This PR adds a new openjdk build tool GenerateZip, which generates a final 
>> "zip" file from an input folder, and creates it in a deterministic way, 
>> ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> I agree that ideally reproducibility should be on by default, but if there is 
> a cost, then you can be sure OpenJDK developers will be looking for a way to 
> remove it for faster turnaround times. I would propose a specific configure 
> parameter for this specific case, reproducible zip files, that is default on 
> for release builds and off for debug builds (debug builds aren't reproducible 
> by nature) and let the existing meta flag also control the value of this new 
> flag.

@erikj79 The flag --enable-reproducible-builds sets ENABLE_REPRODUCIBLE_BUILD 
in spec.gmk. This is set by our JIB profiles. I propose that we also turn it on 
for GHA builds. 

I think that the post-processing of the zip file can be dependent on this 
variable and that it serves no purpose to introduce a separate variable 
ENABLE_REPRODUCIBLE_ZIP that is set to the same value as 
ENABLE_REPRODUCIBLE_BUILD. Do you agree?

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-10 Thread Andrew Leonard
On Tue, 9 Nov 2021 17:31:16 GMT, Erik Joelsson  wrote:

>> You are already keeping all the filenames in memory for sorting, so reading 
>> up the ZipEntry:s isn't that much more data, just some extra metadata for 
>> each entry. The actual file contents is not part of the ZipEntry object. 
>> When actually copying the files, you can use the ZipFile class to access 
>> ZipEntry's in arbitrary order to read their streams as InputStream.
>
> Actually, you don't even need to save the ZipEntry:s in memory, you can just 
> extract filenames from them on the first pass, sort them, then lookup the 
> entries in ZipFile again on the second lap. :) I don't think that's necessary 
> though.

@erikj79 thanks I didn't realize you can do that: "you can use the ZipFile 
class to access ZipEntry's in arbitrary order"

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Erik Joelsson
On Tue, 9 Nov 2021 17:26:05 GMT, Erik Joelsson  wrote:

>> @erikj79 so had a bit of a think, and part of the unzipping.. then 
>> re-gen'ing is not having to load all the entries into memory. You can't 
>> guarantee the order "zip" has created them in, so realistically i'd have to 
>> read all the ZipEntry's into memory, then re-write.. which we can do.. 
>> src.zip is only 55MB or so, so memory requirements won't be huge given 
>> src.zip is the only target here currently.
>
> You are already keeping all the filenames in memory for sorting, so reading 
> up the ZipEntry:s isn't that much more data, just some extra metadata for 
> each entry. The actual file contents is not part of the ZipEntry object. When 
> actually copying the files, you can use the ZipFile class to access 
> ZipEntry's in arbitrary order to read their streams as InputStream.

Actually, you don't even need to save the ZipEntry:s in memory, you can just 
extract filenames from them on the first pass, sort them, then lookup the 
entries in ZipFile again on the second lap. :) I don't think that's necessary 
though.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Erik Joelsson
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard  wrote:

> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

I agree that ideally reproducibility should be on by default, but if there is a 
cost, then you can be sure OpenJDK developers will be looking for a way to 
remove it for faster turnaround times. I would propose a specific configure 
parameter for this specific case, reproducible zip files, that is default on 
for release builds and off for debug builds (debug builds aren't reproducible 
by nature) and let the existing meta flag also control the value of this new 
flag.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Erik Joelsson
On Tue, 9 Nov 2021 14:55:52 GMT, Andrew Leonard  wrote:

>> make/common/ZipArchive.gmk line 178:
>> 
>>> 176:(cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \
>>> 177: $(RM) $$@ && \
>>> 178: $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \
>> 
>> Having to explode the zip here is unfortunate. This means we are creating an 
>> almost full copy of the whole src tree in the build directory, something I 
>> tried to avoid by leveraging the include/exclude functionality of zip, 
>> instead of generating make rules for copying the files I wanted into a 
>> source tree to run zip on. This may be a small overhead on Linux, but I'm 
>> pretty sure it will be very noticeable on Windows.
>> 
>> When reading about your tool at first, I assumed it would read the 
>> intermediate zip file directly when rebuilding the zip. I don't think 
>> modifying it to do that would be too complicated, basically read and 
>> processing ZipEntrys instead of walking the file system.
>
> @erikj79 so had a bit of a think, and part of the unzipping.. then re-gen'ing 
> is not having to load all the entries into memory. You can't guarantee the 
> order "zip" has created them in, so realistically i'd have to read all the 
> ZipEntry's into memory, then re-write.. which we can do.. src.zip is only 
> 55MB or so, so memory requirements won't be huge given src.zip is the only 
> target here currently.

You are already keeping all the filenames in memory for sorting, so reading up 
the ZipEntry:s isn't that much more data, just some extra metadata for each 
entry. The actual file contents is not part of the ZipEntry object. When 
actually copying the files, you can use the ZipFile class to access ZipEntry's 
in arbitrary order to read their streams as InputStream.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
On Tue, 9 Nov 2021 14:57:52 GMT, Magnus Ihse Bursie  wrote:

> We already have an --enable-reproducible-builds. If (and I say if) we need to 
> turn this on/off with a flag, this would to fine.
> 
> However, as I have said previously in a private discussion with Andrew, I 
> prefer it if we can make reproducible builds so cheap, reliable and 
> uncontroversial so we can always to reproducible builds, and remove that flag.
> 
> For this case, I think it depends on two things:
> 
> 1. the extra time to make the zip file reproducible. Some benchmarking on 
> the GenerateZip for src.zip would be good to have, at least for some ballpark 
> figures.
> 
> 2. When is src.zip built? (I don't remember right now) If it is part of 
> the exploded-image, then it is really time sensitive (and should maybe move 
> out of that target). If it part of the jdk-image, I'd say it is not as 
> sensitive, due to
>a) this is very much slower anyway, and
>b) that part is much more parallelizible, so src.zip can be produced 
> while waiting for jlink or whatever.

@magicus
1) I'll do some rough benchmarks
2) "zip-source" is only dependent on "gensrc", ie.all module -gensrc stages 
must have completed, so in theory it should run in parallel while the linking 
goes on, a quick scan of a parallel build log seems to confirm that, the 
"Updating support/src.zip" occurs quite some time before the exploded image 
linking:

20:34:45  Compiling 31 files for jdk.management.agent
20:34:45  Compiling 16 files for jdk.security.jgss
20:34:45  Compiling 30 files for jdk.security.auth
20:34:45  Creating support/native/java.security.jgss/libj2gss/static/libj2gss.a 
from 3 file(s)
20:34:45  Updating support/src.zip
20:34:46  Creating 
support/native/jdk.management.agent/libmanagement_agent/static/libmanagement_agent.a
 from 1 file(s)
20:34:46  Creating support/native/jdk.security.auth/libjaas/static/libjaas.a 
from 1 file(s)
20:34:48  Compiling 136 files for jdk.jdeps
20:34:48  Creating support/test/lib-test/jtreg/native/bin/jvm-test-launcher 
from 1 file(s)
20:34:49  Creating support/modules_libs/java.base/libverify.so from 1 file(s)
.
20:36:17  Optimizing the exploded image
20:36:18  Creating java.datatransfer.jmod
20:36:18  Creating java.instrument.jmod
20:36:18  Creating java.compiler.jmod

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
On Tue, 9 Nov 2021 15:00:59 GMT, Magnus Ihse Bursie  wrote:

> And have you verified that running with `zip -X` is not enough? I remember 
> checking into this before, but I don't remember the conclusions. The purpose 
> is to leave out some extra information, such as time stamps, but it might not 
> be enough to guarantee reproducible builds.

Correct, I have tried that already, that does remove the extended timestamp 
information, and as long as the input source is already correctly timestamped 
it works to a point, that point being the "file ordering" is not-deterministic 
and I think is dependent on how the OS file queries returns files. From history 
this is what I tried: 
https://github.com/andrew-m-leonard/jdk-1/blob/f18b5826b9ae3e7e5750190a669edcfd8bb8b2cc/make/common/ZipArchive.gmk#L166

> Also, the name is not really a correct description at this point. Maybe a 
> straight `MakeZipReproducible` instead or something like that?

MakeZipReproducible sounds good.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Magnus Ihse Bursie
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard  wrote:

> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

We already have an --enable-reproducible-builds. If (and I say if) we need to 
turn this on/off with a flag, this would to fine.

However, as I have said previously in a private discussion with Andrew, I 
prefer it if we can make reproducible builds so cheap, reliable and 
uncontroversial so we can always to reproducible builds, and remove that flag.

For this case, I think it depends on two things:
1) the extra time to make the zip file reproducible. Some benchmarking on the 
GenerateZip for src.zip would be good to have, at least for some ballpark 
figures.

2) When is src.zip built? (I don't remember right now) If it is part of the 
exploded-image, then it is really time sensitive (and should maybe move out of 
that target). If it part of the jdk-image, I'd say it is not as sensitive, due 
to
a) this is very much slower anyway, and
b) that part is much more parallelizible, so src.zip can be produced while 
waiting for jlink or whatever.

Also, the name is not really a correct description at this point. Maybe a 
straight `MakeZipReproducible` instead or something like that?

And have you verified that running with `zip -X` is not enough? I remember 
checking into this before, but I don't remember the conclusions. The purpose is 
to leave out some extra information, such as time stamps, but it might not be 
enough to guarantee reproducible builds.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
On Tue, 9 Nov 2021 13:58:24 GMT, Erik Joelsson  wrote:

>> This PR adds a new openjdk build tool GenerateZip, which generates a final 
>> "zip" file from an input folder, and creates it in a deterministic way, 
>> ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> make/common/ZipArchive.gmk line 178:
> 
>> 176: (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \
>> 177:  $(RM) $$@ && \
>> 178:  $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \
> 
> Having to explode the zip here is unfortunate. This means we are creating an 
> almost full copy of the whole src tree in the build directory, something I 
> tried to avoid by leveraging the include/exclude functionality of zip, 
> instead of generating make rules for copying the files I wanted into a source 
> tree to run zip on. This may be a small overhead on Linux, but I'm pretty 
> sure it will be very noticeable on Windows.
> 
> When reading about your tool at first, I assumed it would read the 
> intermediate zip file directly when rebuilding the zip. I don't think 
> modifying it to do that would be too complicated, basically read and 
> processing ZipEntrys instead of walking the file system.

@erikj79 so had a bit of a think, and part of the unzipping.. then re-gen'ing 
is not having to load all the entries into memory. You can't guarantee the 
order "zip" has created them in, so realistically i'd have to read all the 
ZipEntry's into memory, then re-write.. which we can do.. src.zip is only 55MB 
or so, so memory requirements won't be huge given src.zip is the only target 
here currently.

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
On Tue, 9 Nov 2021 14:04:27 GMT, Erik Joelsson  wrote:

> I think it would it make sense add a configure parameter to to enable/disable 
> this new functionality. I think developers would want to opt out of extra 
> unnecessary build steps when building and testing locally. Not sure what the 
> default should be. For other reproducible measures, we require active 
> enabling today. Maybe default on for release builds and off for debug builds.

@erikj79 so I was hoping to not add another configure arg, when the intentions 
of this should just all be positive, ordered Also @magicus I get 
the impression from your comments that we don't really want to end up with in 
order to build reproducibly you have to specify all these options x,y,z,...
Maybe we should just have one big flag --with-reproducible-build ?

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
On Tue, 9 Nov 2021 13:58:24 GMT, Erik Joelsson  wrote:

>> This PR adds a new openjdk build tool GenerateZip, which generates a final 
>> "zip" file from an input folder, and creates it in a deterministic way, 
>> ensuring ordering and timestamps are set as specified.
>> 
>> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
>> deterministically.
>> 
>> Signed-off-by: Andrew Leonard 
>
> make/common/ZipArchive.gmk line 178:
> 
>> 176: (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \
>> 177:  $(RM) $$@ && \
>> 178:  $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \
> 
> Having to explode the zip here is unfortunate. This means we are creating an 
> almost full copy of the whole src tree in the build directory, something I 
> tried to avoid by leveraging the include/exclude functionality of zip, 
> instead of generating make rules for copying the files I wanted into a source 
> tree to run zip on. This may be a small overhead on Linux, but I'm pretty 
> sure it will be very noticeable on Windows.
> 
> When reading about your tool at first, I assumed it would read the 
> intermediate zip file directly when rebuilding the zip. I don't think 
> modifying it to do that would be too complicated, basically read and 
> processing ZipEntrys instead of walking the file system.

@erikj79 sure, I can look at doing that

> make/jdk/src/classes/build/tools/generatezip/GenerateZip.java line 161:
> 
>> 159: boolean pathIsDir = fpath.isDirectory();
>> 160: 
>> 161: // Keep a sorted Set of files to be processed, so that the Jmod 
>> is reproducible
> 
> Aren't we generating zip files rather than jmod files here?

copy error...

-

PR: https://git.openjdk.java.net/jdk/pull/6311


Re: RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Erik Joelsson
On Tue, 9 Nov 2021 12:59:17 GMT, Andrew Leonard  wrote:

> This PR adds a new openjdk build tool GenerateZip, which generates a final 
> "zip" file from an input folder, and creates it in a deterministic way, 
> ensuring ordering and timestamps are set as specified.
> 
> Using this tool in ZipArchive.gmk will ensure src.zip is then created 
> deterministically.
> 
> Signed-off-by: Andrew Leonard 

I think it would it make sense add a configure parameter to to enable/disable 
this new functionality. I think developers would want to opt out of extra 
unnecessary build steps when building and testing locally. Not sure what the 
default should be. For other reproducible measures, we require active enabling 
today. Maybe default on for release builds and off for debug builds.

make/common/ZipArchive.gmk line 29:

> 27: _ZIP_ARCHIVE_GMK := 1
> 28: 
> 29: include ../ToolsJdk.gmk

Should probably add a comment about inclusion of this file now requires an 
explicit dependency on build-tools in Main.gmk.

make/common/ZipArchive.gmk line 178:

> 176:  (cd $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/files && \
> 177:   $(RM) $$@ && \
> 178:   $(UNZIP) -q $$(SUPPORT_OUTPUTDIR)/ziptmp/$1/tmp.zip && \

Having to explode the zip here is unfortunate. This means we are creating an 
almost full copy of the whole src tree in the build directory, something I 
tried to avoid by leveraging the include/exclude functionality of zip, instead 
of generating make rules for copying the files I wanted into a source tree to 
run zip on. This may be a small overhead on Linux, but I'm pretty sure it will 
be very noticeable on Windows.

When reading about your tool at first, I assumed it would read the intermediate 
zip file directly when rebuilding the zip. I don't think modifying it to do 
that would be too complicated, basically read and processing ZipEntrys instead 
of walking the file system.

make/jdk/src/classes/build/tools/generatezip/GenerateZip.java line 161:

> 159: boolean pathIsDir = fpath.isDirectory();
> 160: 
> 161: // Keep a sorted Set of files to be processed, so that the Jmod 
> is reproducible

Aren't we generating zip files rather than jmod files here?

-

PR: https://git.openjdk.java.net/jdk/pull/6311


RFR: 8276743: Make openjdk build Zip Archive generation "reproducible"

2021-11-09 Thread Andrew Leonard
This PR adds a new openjdk build tool GenerateZip, which generates a final 
"zip" file from an input folder, and creates it in a deterministic way, 
ensuring ordering and timestamps are set as specified.

Using this tool in ZipArchive.gmk will ensure src.zip is then created 
deterministically.

Signed-off-by: Andrew Leonard 

-

Commit messages:
 - 8276743: Make openjdk build Zip Archive generation "reproducible"

Changes: https://git.openjdk.java.net/jdk/pull/6311/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk=6311=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8276743
  Stats: 303 lines in 3 files changed: 300 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/6311.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/6311/head:pull/6311

PR: https://git.openjdk.java.net/jdk/pull/6311