Friends - I'm trying to make a process to generate byte-for-byte reproducible zip files.
I got the contents identical, including timestamps and permissions. But three bytes at the 98.08% mark (offsets 5543078 up to, but not including, 5543081, in a file of size 5651451) differ between my run and a friend's run. Velocity-dependent? His was done on a train. ;-)

try.diffoscope.org is no help:
"Format-specific differences are supported for ZIP archives but no file-specific differences were detected; falling back to a binary diff."

I can get the same info as provided by diffoscope with

  $ diff <(hexdump marble-ea2bb52c-mb-fab.zip) <(hexdump marble-ea2bb52c-ld-fab.zip)
  346443c346443
  < 05494a0 0300 ca68 642c 73cf 642e 7875 000b 0401
  ---
  > 05494a0 0300 ca68 642c ca68 642c 7875 000b 0401

That is, 73cf642e becomes ca68642c (as hexdump prints it).

The diff is so small, it seems silly to post both files, but I'll do that anyway.

  7cbdcc8b2fed002ed73017ff55e574b654fb82d061658534b4287de22339df64  marble-ea2bb52c-ld-fab.zip
  573fe7e8cb662fb3e22e16c1ab4d3520f8275a0ab3dd2064df841e108a08af0e  marble-ea2bb52c-mb-fab.zip

  http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-ld-fab.zip
  http://recycle.lbl.gov/~ldoolitt/marble-ea2bb52c-mb-fab.zip

Are there any zip file format experts here who can explain where this comes from? And, more importantly, can suggest how to fix the environment to prevent it?

The script making this file is at
  https://github.com/BerkeleyLab/Marble/blob/main/design/scripts/manufacturing.sh
but because I already got the _contents_ to match, I assert that the only important lines for the purposes of this question are

  export LC_COLLATE=C
  umask 0022
  touch --date="@$SOURCE_DATE_EPOCH" fab/*
  TZ=UTC zip --latest-time "$zipfile" fab/*

Side note: the "ea2bb52c" in the file names above refers to the commit ID in the GitHub repo.

 - Larry
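
P.S. As a sanity check on the offsets quoted above, here is a minimal Python sketch (not part of manufacturing.sh) that turns the two hexdump lines back into bytes and reports which file offsets differ. It assumes hexdump's default output on a little-endian machine, where each 4-digit group is a 16-bit word printed with its two bytes swapped. The helper name hexdump_line_to_bytes is made up for illustration.

```python
import struct

def hexdump_line_to_bytes(line):
    """Convert one default-format hexdump line to (offset, bytes).

    Assumption: little-endian host, so each 4-hex-digit group is a
    16-bit word whose low byte comes first in the file.
    """
    fields = line.split()
    offset = int(fields[0], 16)  # first column is the offset in hex
    data = b"".join(struct.pack("<H", int(word, 16)) for word in fields[1:])
    return offset, data

# The two lines reported by diff, copied from the post.
mb_line = "05494a0 0300 ca68 642c 73cf 642e 7875 000b 0401"
ld_line = "05494a0 0300 ca68 642c ca68 642c 7875 000b 0401"

offset, mb = hexdump_line_to_bytes(mb_line)
_, ld = hexdump_line_to_bytes(ld_line)

# Absolute file offsets at which the two files disagree.
differing = [offset + i for i, (a, b) in enumerate(zip(mb, ld)) if a != b]
print(differing)

# Read the 4 bytes starting at the first difference as a
# little-endian 32-bit integer in each file.
start = differing[0] - offset
(mb_value,) = struct.unpack_from("<I", mb, start)
(ld_value,) = struct.unpack_from("<I", ld, start)
print(hex(mb_value), hex(ld_value))
```

This reports offsets 5543078, 5543079 and 5543080, and the 32-bit values 0x642e73cf (mb) and 0x642cca68 (ld). If those happen to be Unix timestamps (a guess on my part, not something the dump proves), both fall in early April 2023.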