Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
On Mon, Apr 15, 2024 at 03:00:42PM +0200, Fay Stegerman wrote:
> > (thanks again!), am I correct to assume that thus there's no need
> > to file a separate bug against libscout?
> It's generating a broken ZIP file with duplicate entries. It really shouldn't
> be doing that, regardless of whether we can extract the files nonetheless.
> That's still a bug that should be reported and fixed.

ok, will do, mostly using this bug as reference, thanks!

> > (which is nice, though maybe could only be shown once?)
> Ah. It correctly shows that twice as there could be differences between the
> two files being compared wrt whether they have duplicate entries (and if so
> how many).
>
> And if you run 'diffoscope foo.zip bar.zip' it'll show those two different
> file names. But in this case we have nested archives and the path (and in
> this case also the number of duplicate entries) is identical for both, so
> maybe we can tweak the output to show which top-level file it belongs to?

yes. :)

> > though this latter is done using diffoscope from unstable while the
> > rest of the userland is bullseye, so this might be expected as well?
> Ah. Looks like zipdetails(1) on bullseye doesn't support the --redact,
> --scan, and --utc options yet.

right, thanks for confirming in detail!

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Dance like no one's watching. Encrypt like everyone is.
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-15 12:13]:
> I've got two remaining questions about libscout (and diffoscope)
>
> On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> > unzip does seem to extract all the files, though it errors out. Not sure
> > what diffoscope should do here. This is definitely a broken ZIP file. That
> > bug should probably be reported against libscout or whatever tooling it
> > used to create that JAR.
>
> you filed https://github.com/python/cpython/issues/117779
> (thanks again!), am I correct to assume that thus there's no need
> to file a separate bug against libscout?

It's generating a broken ZIP file with duplicate entries. It really shouldn't
be doing that, regardless of whether we can extract the files nonetheless.
That's still a bug that should be reported and fixed.

> and 2nd,
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/libscout.html
> now as expected displays:
>
> './usr/share/java/libscout.jar' has 35 duplicate entries
> './usr/share/java/libscout.jar' has 35 duplicate entries
>
> (which is nice, though maybe could only be shown once?)

Ah. It correctly shows that twice as there could be differences between the two
files being compared wrt whether they have duplicate entries (and if so how
many).

And if you run 'diffoscope foo.zip bar.zip' it'll show those two different file
names. But in this case we have nested archives and the path (and in this case
also the number of duplicate entries) is identical for both, so maybe we can
tweak the output to show which top-level file it belongs to?

> but
> https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/arm64/diffoscope-results/libscout.html
> shows this:
>
> Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255.
> Standard output:
[...]
> though this latter is done using diffoscope from unstable while the
> rest of the userland is bullseye, so this might be expected as well?

Ah. Looks like zipdetails(1) on bullseye doesn't support the --redact, --scan,
and --utc options yet.

- Fay
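One way to notice such a mismatch before invoking the tool is to compare the options it advertises with the ones about to be passed. The following is only a sketch and not what diffoscope does: `missing_options` is a hypothetical helper, and it assumes that a zipdetails version supporting a long option also lists it literally in its help output. The sample help text is the zipdetails 1.11 output quoted in this thread.

```python
import re

# Options taken from the failing command quoted in this thread.
WANTED_OPTIONS = ("--redact", "--scan", "--utc")


def missing_options(help_text, wanted=WANTED_OPTIONS):
    """Return the options from `wanted` that do not occur as whole tokens in help_text.

    Hypothetical helper: assumes supported long options appear literally in the help.
    """
    missing = []
    for option in wanted:
        # Match the option only when it is not part of a longer word or option name.
        pattern = r"(?<![\w-])" + re.escape(option) + r"(?![\w-])"
        if not re.search(pattern, help_text):
            missing.append(option)
    return missing


# Help text as quoted in the thread for zipdetails 1.11 (the thread elides the rest).
OLD_HELP = (
    "zipdetails [OPTIONS] file\n"
    "Display details about the internal structure of a Zip file.\n"
    "This is zipdetails version 1.11\n"
    "OPTIONS\n"
    " -h display help\n"
    " -v Verbose - output more stuff\n"
)
```

Calling `missing_options(OLD_HELP)` on the quoted 1.11 help reports all three options as missing, which matches the failure seen on bullseye.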
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
Hi again,

I've got two remaining questions about libscout (and diffoscope)

On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> unzip does seem to extract all the files, though it errors out. Not sure what
> diffoscope should do here. This is definitely a broken ZIP file. That bug
> should probably be reported against libscout or whatever tooling it used to
> create that JAR.

you filed https://github.com/python/cpython/issues/117779
(thanks again!), am I correct to assume that thus there's no need
to file a separate bug against libscout?

and 2nd,
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/libscout.html
now as expected displays:

'./usr/share/java/libscout.jar' has 35 duplicate entries
'./usr/share/java/libscout.jar' has 35 duplicate entries

(which is nice, though maybe could only be shown once?)

but
https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/arm64/diffoscope-results/libscout.html
shows this:

Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255.
Standard output:

    zipdetails [OPTIONS] file

    Display details about the internal structure of a Zip file.

    This is zipdetails version 1.11

    OPTIONS
      -h display help
      -v Verbose - output more stuff
    [...]

Archive contents identical but files differ, possibly due to different
compression levels. Falling back to binary comparison.
'./usr/share/java/libscout.jar' has 35 duplicate entries
'./usr/share/java/libscout.jar' has 35 duplicate entries

though this latter is done using diffoscope from unstable while the
rest of the userland is bullseye, so this might be expected as well?

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

:wq
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
Fay Stegerman wrote: > https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140 Nice; I have applied this locally in Git and will release shortly. :) Regards, -- ,''`. : :' : Chris Lamb `. `'` la...@debian.org chris-lamb.co.uk `-
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-11 12:54]: > On Thu, Apr 11, 2024 at 11:28:19AM +0100, Chris Lamb wrote: > [...] > > Applied in Git with attribution taken from your email. > [...] > > Fixed as well. And it adds a nice comment displaying the issue. > > awesome, thank you both! The promised cpython issue: https://github.com/python/cpython/issues/117779 And a small follow-up: https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140 - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
On Thu, Apr 11, 2024 at 11:28:19AM +0100, Chris Lamb wrote:
[...]
> Applied in Git with attribution taken from your email.
[...]
> Fixed as well. And it adds a nice comment displaying the issue.

awesome, thank you both!

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Make facts great again.
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
tags 1068705 + pending thanks Fay Stegerman wrote: > The attached patch avoids the crash in this case, FWIW. […] Applied in Git with attribution taken from your email. > I would still recommend catching the error for other cases. Fixed as well. And it adds a nice comment displaying the issue. Regards, -- ,''`. : :' : Chris Lamb `. `'` la...@debian.org chris-lamb.co.uk `-
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Fay Stegerman [2024-04-11 04:28]:
> * Holger Levsen [2024-04-11 02:14]:
> > > unzip does seem to extract all the files, though it errors out. Not sure
> > > what diffoscope should do here. This is definitely a broken ZIP file.
> > > That bug should probably be reported against libscout or whatever
> > > tooling it used to create that JAR.
> >
> > I agree it's more complicated, but fundamentally, diffoscope should *not*
> > crash here! (but rather report the broken zip file.)
>
> I think we all agree it shouldn't crash :)
>
> What I meant is that I'm not sure it should simply catch the error, report the
> file as broken, and not attempt extraction, or if it makes sense to attempt to
> work around this issue, at least in cases like this specific one where the
> entries are exact duplicates and the files can presumably be safely extracted.
> I think my workaround (which could be implemented slightly differently as
> well, without modifying the ZipFile, but processing it differently in
> diffoscope) would accomplish that for this JAR at least. I could make an MR
> for that. Though as I said I will also report this upstream to cpython,
> probably tomorrow.
>
> - Fay

The attached patch avoids the crash in this case, FWIW. I would still recommend
catching the error for other cases.

- Fay

diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 2a27042a..4bfb1527 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -182,7 +182,12 @@ class ZipDirectory(Directory, ArchiveMember):
 
 class ZipContainer(Archive):
     def open_archive(self):
-        return zipfile.ZipFile(self.source.path, "r")
+        zf = zipfile.ZipFile(self.source.path, "r")
+        self.name_to_info = {}
+        for info in zf.infolist():
+            if info.filename not in self.name_to_info:
+                self.name_to_info[info.filename] = info
+        return zf
 
     def close_archive(self):
         self.archive.close()
@@ -199,7 +204,8 @@ class ZipContainer(Archive):
         ).encode(sys.getfilesystemencoding(), errors="replace")
 
         try:
-            with self.archive.open(member_name) as source, open(
+            info = self.name_to_info[member_name]
+            with self.archive.open(info) as source, open(
                 targetpath, "wb"
             ) as target:
                 shutil.copyfileobj(source, target)
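The idea in the patch (remember the first ZipInfo seen for each name, then open members through that object instead of by name) can be tried outside diffoscope. The sketch below is standalone and uses hypothetical helper names. Note that its demo archive contains two separate entries sharing a name, which is not the same as the libscout.jar case, where both central directory entries point at one local entry.

```python
import io
import warnings
import zipfile


def first_entry_by_name(zf):
    """Map each filename to the first ZipInfo with that name, ignoring later duplicates."""
    name_to_info = {}
    for info in zf.infolist():
        name_to_info.setdefault(info.filename, info)
    return name_to_info


def read_member(zf, name_to_info, member_name):
    """Read a member through its ZipInfo object rather than through its name."""
    with zf.open(name_to_info[member_name]) as source:
        return source.read()


def make_demo_zip():
    """Build an in-memory ZIP where 'a.txt' is written twice (made-up demo data)."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        # zipfile emits a UserWarning for the duplicate name; that is expected here.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("a.txt", b"first")
            zf.writestr("a.txt", b"second")
            zf.writestr("b.txt", b"other")
    buf.seek(0)
    return buf
```

With the demo archive, `read_member(zf, first_entry_by_name(zf), "a.txt")` returns the contents of the first `a.txt` entry.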
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-11 02:14]: > > unzip does seem to extract all the files, though it errors out. Not sure > > what > > diffoscope should do here. This is definitely a broken ZIP file. That bug > > should probably be reported against libscout or whatever tooling it used to > > create that JAR. > > I agree it's more complicated, but fundamentally, diffoscope should *not* > crash > here! (but rather report the broken zip file.) I think we all agree it shouldn't crash :) What I meant is that I'm not sure it should simply catch the error, report the file as broken, and not attempt extraction, or if it makes sense to attempt to work around this issue, at least in cases like this specific one where the entries are exact duplicates and the files can presumably be safely extracted. I think my workaround (which could be implemented slightly differently as well, without modifying the ZipFile, but processing it differently in diffoscope) would accomplish that for this JAR at least. I could make an MR for that. Though as I said I will also report this upstream to cpython, probably tomorrow. - Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
Fay Stegerman wrote: > Salsa is probably better for figuring out what to do next, but I get > these mails too :) Oh, hey! o/ > unzip does seem to extract all the files, though it errors out. Not sure what > diffoscope should do here. This is definitely a broken ZIP file. First; great debugging there, thank you. :) Okay, separate from your suggestion that a bug should be filed against libscout with its broken zip file, I think that diffoscope should not traceback and crash on this particular input. We do this elsewhere with (most) invalid inputs and it makes a lot of sense here as well. I'll modify diffoscope tomorrow morning to catch the specific exception being thrown by Python's builtin zipfile module and add a suitable message as a user-visible 'comment' — again, something we have plenty of prior art for elsewhere in the codebase. Thanks again. Best wishes, -- o ⬋ ⬊ Chris Lamb o o reproducible-builds.org ⬊ ⬋ o
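A rough illustration of the "catch the exception and emit a comment" approach described above. This is not diffoscope's actual code: `extract_or_comment` and the message wording are hypothetical. The demo triggers `zipfile.BadZipFile` by clobbering a local file header signature, which is a different defect from the overlapped entries in libscout.jar but raises the same exception class.

```python
import io
import zipfile


def extract_or_comment(zf, member_name):
    """Return (data, comment); exactly one of the two is None.

    Hypothetical helper: a malformed member yields a user-visible comment
    instead of a traceback.
    """
    try:
        with zf.open(member_name) as source:
            return source.read(), None
    except zipfile.BadZipFile as exc:
        return None, f"Malformed zip member {member_name!r}: {exc}"


def make_zip(corrupt=False):
    """Build a one-member in-memory ZIP, optionally with a broken local header."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
    raw = bytearray(buf.getvalue())
    if corrupt:
        # Overwrite the local file header signature; the central directory stays valid,
        # so the archive still opens but reading the member fails.
        raw[0:4] = b"XXXX"
    return io.BytesIO(bytes(raw))
```

The caller can then attach the returned comment to the comparison result and carry on with the remaining members.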
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> Salsa is probably better for figuring out what to do next, but I get these
> mails too :)

:)

> The libscout.jar has duplicate ZIP entries in the central directory, pointing
> to the same actual entry in the ZIP. So the "overlapped entries" error is
> entirely correct, even if it's not a zip bomb.

ah!

> unzip does seem to extract all the files, though it errors out. Not sure what
> diffoscope should do here. This is definitely a broken ZIP file. That bug
> should probably be reported against libscout or whatever tooling it used to
> create that JAR.

I agree it's more complicated, but fundamentally, diffoscope should *not* crash
here! (but rather report the broken zip file.)

thanks!

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

I’ve said it once, and I’ll say it a thousand times:
If the penalty for breaking a law is a fine, then that law only exists for the
poor.
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Fay Stegerman [2024-04-11 01:48]:
> * Holger Levsen [2024-04-10 19:43]:
> > On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> > > Holger Levsen wrote:
> > >
> > > > when building libscout 2.3.2-3 on current unstable, the result is also
> > > > unreproducible, but diffoscope crashes when analysing the diff.
> > > I think this is somewhat related to:
> > > https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> > > … which was said to be fixed by Fay in
> > > cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> > > that released as diffoscope version 263 on 2024-04-05.
> > > However, I can see that the current output of libscout/amd64 on
> > > tests.reproducible-builds.org is failing with this very version:
> >
> > yes, indeed.
> >
> > also, this happened before too, I'm sure about at least with diffoscope 260
> > already.
> >
> > > Will loop Fay in via Salsa presently.
> >
> > thank you!
>
> Salsa is probably better for figuring out what to do next, but I get these
> mails too :)
>
> The libscout.jar has duplicate ZIP entries in the central directory, pointing
> to the same actual entry in the ZIP. So the "overlapped entries" error is
> entirely correct, even if it's not a zip bomb.
>
> >>> import zipfile
> >>> zf = zipfile.ZipFile("libscout.jar")
> >>> fh = zf.open("javax/annotation/CheckForNull.class")
> zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)
[...]

I do have a workaround of sorts for this specific case of duplicate entries.
I'll open a cpython issue to report it to upstream. Though they may not consider
this a bug, possibly even the correct behaviour. Not sure myself tbh :)

>>> for info in reversed(zf.infolist()):
...     zf.NameToInfo[info.filename] = info
>>> fh = zf.open("javax/annotation/CheckForNull.class")  # works now

- Fay
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
* Holger Levsen [2024-04-10 19:43]:
> On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> > Holger Levsen wrote:
> >
> > > when building libscout 2.3.2-3 on current unstable, the result is also
> > > unreproducible, but diffoscope crashes when analysing the diff.
> > I think this is somewhat related to:
> > https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> > … which was said to be fixed by Fay in
> > cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> > that released as diffoscope version 263 on 2024-04-05.
> > However, I can see that the current output of libscout/amd64 on
> > tests.reproducible-builds.org is failing with this very version:
>
> yes, indeed.
>
> also, this happened before too, I'm sure about at least with diffoscope 260
> already.
>
> > Will loop Fay in via Salsa presently.
>
> thank you!

Salsa is probably better for figuring out what to do next, but I get these mails
too :)

The libscout.jar has duplicate ZIP entries in the central directory, pointing to
the same actual entry in the ZIP. So the "overlapped entries" error is entirely
correct, even if it's not a zip bomb.

>>> import zipfile
>>> zf = zipfile.ZipFile("libscout.jar")
>>> fh = zf.open("javax/annotation/CheckForNull.class")
zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)
>>> len([i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"])
2
>>> len(zf.namelist()) - len(set(zf.namelist()))
35
>>> x, y = [i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"]
>>> x.header_offset
23065534
>>> y.header_offset
23065534
>>> x._end_offset
23065890
>>> y._end_offset
23065534
>>> zf.open(x)
>>> zf.open(y)
Traceback (most recent call last):
zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)

$ unzip -q -d foo libscout.jar
error: invalid zip file with overlapped components (possible zip bomb)

unzip does seem to extract all the files, though it errors out. Not sure what
diffoscope should do here. This is definitely a broken ZIP file. That bug
should probably be reported against libscout or whatever tooling it used to
create that JAR.

FWIW, it seems the libscout.jar files in both .deb files are identical apart
from timestamps and the ordering of entries in the ZIP.

- Fay
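The duplicate count from the session above (`len(namelist) - len(set(namelist))`) can be wrapped into small helpers. This is a sketch with hypothetical function names, exercised on a made-up in-memory archive rather than on libscout.jar. The demo archive writes the same name twice as two separate entries, whereas in libscout.jar the duplicated central directory entries share one local entry.

```python
import io
import warnings
import zipfile
from collections import Counter


def count_duplicate_entries(zf):
    """Return how many central directory entries repeat an earlier filename."""
    names = zf.namelist()
    return len(names) - len(set(names))


def duplicated_names(zf):
    """Map each repeated filename to the number of times it is listed."""
    counts = Counter(zf.namelist())
    return {name: n for name, n in counts.items() if n > 1}


def make_demo_zip():
    """Build an in-memory ZIP where 'a.txt' is listed twice (made-up demo data)."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        # zipfile emits a UserWarning for the duplicate name; that is expected here.
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr("a.txt", b"first")
            zf.writestr("a.txt", b"second")
            zf.writestr("b.txt", b"other")
    buf.seek(0)
    return buf
```

Both helpers only inspect the central directory, so they do not read any member data and should not hit the "Overlapped entries" check.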
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> Holger Levsen wrote:
>
> > when building libscout 2.3.2-3 on current unstable, the result is also
> > unreproducible, but diffoscope crashes when analysing the diff.
> I think this is somewhat related to:
> https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> … which was said to be fixed by Fay in
> cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> that released as diffoscope version 263 on 2024-04-05.
> However, I can see that the current output of libscout/amd64 on
> tests.reproducible-builds.org is failing with this very version:

yes, indeed.

also, this happened before too, I'm sure about at least with diffoscope 260
already.

> Will loop Fay in via Salsa presently.

thank you!

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Fischers Fritz fischt Plastik.
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
Holger Levsen wrote: > when building libscout 2.3.2-3 on current unstable, the result is also > unreproducible, but diffoscope crashes when analysing the diff. I think this is somewhat related to: https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362 … which was said to be fixed by Fay in cc3b077f6ef97b4e20036e9823926fe633c7d4d0 that released as diffoscope version 263 on 2024-04-05. However, I can see that the current output of libscout/amd64 on tests.reproducible-builds.org is failing with this very version: Tue Apr 9 12:14:14 UTC 2024 I: diffoscope 263 will be used to compare the two builds: From https://gist.github.com/lamby/e5db96d4d61612485a469b826590192e/raw (saved output for posterity) Will loop Fay in via Salsa presently. Regards, -- ,''`. : :' : Chris Lamb `. `'` la...@debian.org chris-lamb.co.uk `-
Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye
package: diffoscope
version: 263

hi,

diffoscope 263 crashes on libscout 2.3.2-3 build on unstable but not bullseye:

libscout 2.3.2-3 is part of bullseye (but neither bookworm nor trixie) and
builds unreproducible there and diffoscope is able to show a diff.

when building libscout 2.3.2-3 on current unstable, the result is also
unreproducible, but diffoscope crashes when analysing the diff.

this happens on all 4 tested archs.

I've copied the packages in question to
https://tests.reproducible-builds.org/debian/diffoscope-libscout/artifacts/r00t-me/
for further investigation. (because one .deb is 20mb and there's 16 of them.)

(someone please remind me to delete them there once this bug has been closed.)

-- 
cheers,
        Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

The hardest part about defending against social engineering is that it doesn't
attack the weakness of a community. It attacks its *strengths*: trust,
collaboration, and mutual assistance. (Russ Allbery)