Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-16 Thread Holger Levsen
On Mon, Apr 15, 2024 at 03:00:42PM +0200, Fay Stegerman wrote:
> > (thanks again!), am I correct to assume that thus there's no need
> > to file a separate bug against libscout?
> It's generating a broken ZIP file with duplicate entries.  It really shouldn't
> be doing that, regardless of whether we can extract the files nonetheless.
> That's still a bug that should be reported and fixed.

ok, will do, mostly using this bug as reference, thanks!

> > (which is nice, though maybe could only be shown once?)
> Ah.  It correctly shows that twice as there could be differences between the
> two files being compared wrt whether they have duplicate entries (and if so
> how many).
>
> And if you run 'diffoscope foo.zip bar.zip' it'll show those two different
> file names.  But in this case we have nested archives and the path (and in
> this case also the number of duplicate entries) is identical for both, so
> maybe we can tweak the output to show which top-level file it belongs to?

yes.

:)
 
> > though this latter is done using diffoscope from unstable while the
> > rest of the userland is bullseye, so this might be expected as well?
> Ah.  Looks like zipdetails(1) on bullseye doesn't support the --redact,
> --scan, and --utc options yet.

right, thanks for confirming in detail!


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Dance like no one's watching. Encrypt like everyone is.




Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-15 Thread Fay Stegerman
* Holger Levsen [2024-04-15 12:13]:
> I've got two remaining questions about libscout (and diffoscope)
>
> On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> > unzip does seem to extract all the files, though it errors out.  Not sure
> > what diffoscope should do here.  This is definitely a broken ZIP file.
> > That bug should probably be reported against libscout or whatever tooling
> > it used to create that JAR.
>
> you filed https://github.com/python/cpython/issues/117779
> (thanks again!), am I correct to assume that thus there's no need
> to file a separate bug against libscout?

It's generating a broken ZIP file with duplicate entries.  It really shouldn't
be doing that, regardless of whether we can extract the files nonetheless.
That's still a bug that should be reported and fixed.

> and 2nd, 
> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/libscout.html
> now as expected displays:
>
> './usr/share/java/libscout.jar' has 35 duplicate entries
> './usr/share/java/libscout.jar' has 35 duplicate entries
>
> (which is nice, though maybe could only be shown once?)

Ah.  It correctly shows that twice as there could be differences between the two
files being compared wrt whether they have duplicate entries (and if so how
many).

And if you run 'diffoscope foo.zip bar.zip' it'll show those two different file
names.  But in this case we have nested archives and the path (and in this case
also the number of duplicate entries) is identical for both, so maybe we can
tweak the output to show which top-level file it belongs to?

> but 
> https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/arm64/diffoscope-results/libscout.html
> shows this:
>
> Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255. 
> Standard output:
[...]
> though this latter is done using diffoscope from unstable while the
> rest of the userland is bullseye, so this might be expected as well?

Ah.  Looks like zipdetails(1) on bullseye doesn't support the --redact, --scan,
and --utc options yet.

- Fay



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-15 Thread Holger Levsen
Hi again,

I've got two remaining questions about libscout (and diffoscope)

On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> unzip does seem to extract all the files, though it errors out.  Not sure what
> diffoscope should do here.  This is definitely a broken ZIP file.  That bug
> should probably be reported against libscout or whatever tooling it used to
> create that JAR.

you filed https://github.com/python/cpython/issues/117779
(thanks again!), am I correct to assume that thus there's no need
to file a separate bug against libscout?

and 2nd, 
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/libscout.html
now as expected displays:

'./usr/share/java/libscout.jar' has 35 duplicate entries
'./usr/share/java/libscout.jar' has 35 duplicate entries

(which is nice, though maybe could only be shown once?)

but 
https://tests.reproducible-builds.org/debian/rb-pkg/bullseye/arm64/diffoscope-results/libscout.html
shows this:

Command `'zipdetails --redact --scan --utc {}'` failed with exit code 255. 
Standard output:
zipdetails [OPTIONS] file

Display details about the internal structure of a Zip file.

This is zipdetails version 1.11

OPTIONS
 -h display help
 -v Verbose - output more stuff
[...]
Archive contents identical but files differ, possibly due to different 
compression levels. Falling back to binary comparison.
'./usr/share/java/libscout.jar' has 35 duplicate entries
'./usr/share/java/libscout.jar' has 35 duplicate entries


though this latter is done using diffoscope from unstable while the
rest of the userland is bullseye, so this might be expected as well?


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

:wq




Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-12 Thread Chris Lamb
Fay Stegerman wrote:

> https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140

Nice; I have applied this locally in Git and will release shortly. :)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org  chris-lamb.co.uk
   `-



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-11 Thread Fay Stegerman
* Holger Levsen [2024-04-11 12:54]:
> On Thu, Apr 11, 2024 at 11:28:19AM +0100, Chris Lamb wrote:
> [...]
> > Applied in Git with attribution taken from your email.
> [...]
> > Fixed as well. And it adds a nice comment displaying the issue.
> 
> awesome, thank you both!

The promised cpython issue: https://github.com/python/cpython/issues/117779

And a small follow-up:
https://salsa.debian.org/reproducible-builds/diffoscope/-/merge_requests/140

- Fay



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-11 Thread Holger Levsen
On Thu, Apr 11, 2024 at 11:28:19AM +0100, Chris Lamb wrote:
[...]
> Applied in Git with attribution taken from your email.
[...]
> Fixed as well. And it adds a nice comment displaying the issue.

awesome, thank you both!


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Make facts great again.




Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-11 Thread Chris Lamb
tags 1068705 + pending
thanks

Fay Stegerman wrote:

> The attached patch avoids the crash in this case, FWIW. […]

Applied in Git with attribution taken from your email.

> I would still recommend catching the error for other cases.

Fixed as well. And it adds a nice comment displaying the issue.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org  chris-lamb.co.uk
   `-



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Fay Stegerman
* Fay Stegerman [2024-04-11 04:28]:
> * Holger Levsen [2024-04-11 02:14]:
> > > unzip does seem to extract all the files, though it errors out.  Not
> > > sure what diffoscope should do here.  This is definitely a broken ZIP
> > > file.  That bug should probably be reported against libscout or whatever
> > > tooling it used to create that JAR.
> >
> > I agree it's more complicated, but fundamentally, diffoscope should *not*
> > crash here! (but rather report the broken zip file.)
>
> I think we all agree it shouldn't crash :)
>
> What I meant is that I'm not sure it should simply catch the error, report the
> file as broken, and not attempt extraction, or if it makes sense to attempt to
> work around this issue, at least in cases like this specific one where the
> entries are exact duplicates and the files can presumably be safely extracted.
> I think my workaround (which could be implemented slightly differently as
> well, without modifying the ZipFile, but processing it differently in
> diffoscope) would accomplish that for this JAR at least.  I could make an MR
> for that.  Though as I said I will also report this upstream to cpython,
> probably tomorrow.
> 
> - Fay

The attached patch avoids the crash in this case, FWIW.  I would still recommend
catching the error for other cases.

- Fay
diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 2a27042a..4bfb1527 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -182,7 +182,12 @@ class ZipDirectory(Directory, ArchiveMember):
 
 class ZipContainer(Archive):
     def open_archive(self):
-        return zipfile.ZipFile(self.source.path, "r")
+        zf = zipfile.ZipFile(self.source.path, "r")
+        self.name_to_info = {}
+        for info in zf.infolist():
+            if info.filename not in self.name_to_info:
+                self.name_to_info[info.filename] = info
+        return zf
 
     def close_archive(self):
         self.archive.close()
@@ -199,7 +204,8 @@ class ZipContainer(Archive):
         ).encode(sys.getfilesystemencoding(), errors="replace")
 
         try:
-            with self.archive.open(member_name) as source, open(
+            info = self.name_to_info[member_name]
+            with self.archive.open(info) as source, open(
                 targetpath, "wb"
             ) as target:
                 shutil.copyfileobj(source, target)
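
A standalone sketch of the idea behind the patch above, assuming only the
standard zipfile module: keep the first ZipInfo seen for each name and open
members through that object rather than by name.  first_entry_map and
read_member are hypothetical names, not diffoscope functions, and the demo
archive has two independent a.txt entries rather than the overlapping
duplicates found in libscout.jar.

```python
import io
import warnings
import zipfile


def first_entry_map(zf):
    """Map each member name to the first ZipInfo listed for it."""
    name_to_info = {}
    for info in zf.infolist():
        # setdefault keeps the earliest entry and ignores later duplicates.
        name_to_info.setdefault(info.filename, info)
    return name_to_info


def read_member(zf, name_to_info, member_name):
    """Read a member via its ZipInfo instead of a lookup by name."""
    with zf.open(name_to_info[member_name]) as source:
        return source.read()


# Illustrative archive only: zipfile warns about the repeated name.
buf = io.BytesIO()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "w") as out:
        out.writestr("a.txt", "first")
        out.writestr("a.txt", "second")

with zipfile.ZipFile(buf) as zf:
    first = read_member(zf, first_entry_map(zf), "a.txt")
```

ZipFile.open() accepts either a name or a ZipInfo object, which is what lets
the caller choose a specific entry explicitly.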


Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Fay Stegerman
* Holger Levsen [2024-04-11 02:14]:
> > unzip does seem to extract all the files, though it errors out.  Not sure
> > what diffoscope should do here.  This is definitely a broken ZIP file.
> > That bug should probably be reported against libscout or whatever tooling
> > it used to create that JAR.
>
> I agree it's more complicated, but fundamentally, diffoscope should *not*
> crash here! (but rather report the broken zip file.)

I think we all agree it shouldn't crash :)

What I meant is that I'm not sure it should simply catch the error, report the
file as broken, and not attempt extraction, or if it makes sense to attempt to
work around this issue, at least in cases like this specific one where the
entries are exact duplicates and the files can presumably be safely extracted.
I think my workaround (which could be implemented slightly differently as well,
without modifying the ZipFile, but processing it differently in diffoscope)
would accomplish that for this JAR at least.  I could make an MR for that.
Though as I said I will also report this upstream to cpython, probably tomorrow.

- Fay



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Chris Lamb
Fay Stegerman wrote:

> Salsa is probably better for figuring out what to do next, but I get
> these mails too :)

Oh, hey! o/

> unzip does seem to extract all the files, though it errors out.  Not sure what
> diffoscope should do here.  This is definitely a broken ZIP file.

First; great debugging there, thank you. :)

Okay, separate from your suggestion that a bug should be filed against
libscout with its broken zip file, I think that diffoscope should not
traceback and crash on this particular input. We do this elsewhere with
(most) invalid inputs and it makes a lot of sense here as well.

I'll modify diffoscope tomorrow morning to catch the specific
exception being thrown by Python's builtin zipfile module and add a
suitable message as a user-visible 'comment' — again, something we have
plenty of prior art for elsewhere in the codebase. Thanks again.


Best wishes,

-- 
  o
⬋   ⬊  Chris Lamb
   o o reproducible-builds.org 
⬊   ⬋
  o
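
A minimal sketch of the "catch the exception and attach a comment" approach
described in the message above.  It assumes only the standard zipfile module;
extract_or_comment, the comments list and the message wording are hypothetical
and are not diffoscope's actual code.  The demo triggers zipfile.BadZipFile
through a deliberately corrupted byte (a CRC mismatch), not through the
overlapped-entries check.

```python
import io
import zipfile


def extract_or_comment(zf, member, comments):
    """Return the member's bytes, or None after recording a comment."""
    try:
        with zf.open(member) as source:
            return source.read()
    except zipfile.BadZipFile as exc:
        # Report the problem to the user instead of crashing.
        comments.append(f"Unable to extract {member!r}: {exc}")
        return None


# Build a stored (uncompressed) archive, then corrupt one data byte so
# that reading the member fails its CRC-32 check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as out:
    out.writestr("a.txt", "hello")
corrupted = buf.getvalue().replace(b"hello", b"jello")

comments = []
with zipfile.ZipFile(io.BytesIO(corrupted)) as zf:
    result = extract_or_comment(zf, "a.txt", comments)
```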



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Holger Levsen
On Thu, Apr 11, 2024 at 01:48:18AM +0200, Fay Stegerman wrote:
> Salsa is probably better for figuring out what to do next, but I get these
> mails too :)

:)
 
> The libscout.jar has duplicate ZIP entries in the central directory, pointing
> to the same actual entry in the ZIP.  So the "overlapped entries" error is
> entirely correct, even if it's not a zip bomb.

ah!

> unzip does seem to extract all the files, though it errors out.  Not sure what
> diffoscope should do here.  This is definitely a broken ZIP file.  That bug
> should probably be reported against libscout or whatever tooling it used to
> create that JAR.

I agree it's more complicated, but fundamentally, diffoscope should *not* crash
here! (but rather report the broken zip file.)

thanks!


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

I’ve said it once, and I’ll say it a thousand times: If the penalty for
breaking a law is a fine, then that law only exists for the poor.




Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Fay Stegerman
* Fay Stegerman [2024-04-11 01:48]:
> * Holger Levsen [2024-04-10 19:43]:
> > On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> > > Holger Levsen wrote:
> > > 
> > > > when building libscout 2.3.2-3 on current unstable, the result is also 
> > > > unreproducible, but diffoscope crashes when analysing the diff.
> > > I think this is somewhat related to:
> > >   https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> > > … which was said to be fixed by Fay in
> > > cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> > > that was released as diffoscope version 263 on 2024-04-05.
> > > However, I can see that the current output of libscout/amd64 on
> > > tests.reproducible-builds.org is failing with this very version:
> > 
> > yes, indeed.
> > 
> > also, this happened before too, I'm sure at least with diffoscope 260
> > already.
> >  
> > > Will loop Fay in via Salsa presently.
> > 
> > thank you!
> 
> Salsa is probably better for figuring out what to do next, but I get these
> mails too :)
>
> The libscout.jar has duplicate ZIP entries in the central directory, pointing
> to the same actual entry in the ZIP.  So the "overlapped entries" error is
> entirely correct, even if it's not a zip bomb.
>
>   >>> import zipfile
>   >>> zf = zipfile.ZipFile("libscout.jar")
>   >>> fh = zf.open("javax/annotation/CheckForNull.class")
>   zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)
[...]

I do have a workaround of sorts for this specific case of duplicate entries.
I'll open a cpython issue to report it to upstream.  Though they may not
consider this a bug, possibly even the correct behaviour.  Not sure myself
tbh :)

  >>> for info in reversed(zf.infolist()):
  ...   zf.NameToInfo[info.filename] = info
  >>> fh = zf.open("javax/annotation/CheckForNull.class") # works now

- Fay



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Fay Stegerman
* Holger Levsen [2024-04-10 19:43]:
> On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> > Holger Levsen wrote:
> > 
> > > when building libscout 2.3.2-3 on current unstable, the result is also 
> > > unreproducible, but diffoscope crashes when analysing the diff.
> > I think this is somewhat related to:
> >   https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> > … which was said to be fixed by Fay in
> > cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> > that was released as diffoscope version 263 on 2024-04-05.
> > However, I can see that the current output of libscout/amd64 on
> > tests.reproducible-builds.org is failing with this very version:
> 
> yes, indeed.
> 
> also, this happened before too, I'm sure at least with diffoscope 260
> already.
>  
> > Will loop Fay in via Salsa presently.
> 
> thank you!

Salsa is probably better for figuring out what to do next, but I get these mails
too :)

The libscout.jar has duplicate ZIP entries in the central directory, pointing to
the same actual entry in the ZIP.  So the "overlapped entries" error is entirely
correct, even if it's not a zip bomb.

  >>> import zipfile
  >>> zf = zipfile.ZipFile("libscout.jar")
  >>> fh = zf.open("javax/annotation/CheckForNull.class")
  zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)
  >>> len([i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"])
  2
  >>> len(zf.namelist()) - len(set(zf.namelist()))
  35
  >>> x, y = [i for i in zf.infolist() if i.filename == "javax/annotation/CheckForNull.class"]
  >>> x.header_offset
  23065534
  >>> y.header_offset
  23065534
  >>> x._end_offset
  23065890
  >>> y._end_offset
  23065534
  >>> zf.open(x)
  
  >>> zf.open(y)
  Traceback (most recent call last):
  zipfile.BadZipFile: Overlapped entries: 'javax/annotation/CheckForNull.class' (possible zip bomb)

$ unzip -q -d foo libscout.jar
error: invalid zip file with overlapped components (possible zip bomb)

unzip does seem to extract all the files, though it errors out.  Not sure what
diffoscope should do here.  This is definitely a broken ZIP file.  That bug
should probably be reported against libscout or whatever tooling it used to
create that JAR.

FWIW, it seems the libscout.jar files in both .deb files are identical apart
from timestamps and the ordering of entries in the ZIP.

- Fay
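
The duplicate count from the session above can be reproduced on any archive
with a small helper.  This is a minimal sketch using only the standard zipfile
module; duplicate_entries is a hypothetical helper name and the in-memory
archive is made up for illustration.  Unlike libscout.jar, its two a.txt
entries have separate data, so it demonstrates only the counting, not the
overlap error.

```python
import io
import warnings
import zipfile
from collections import Counter


def duplicate_entries(zf):
    """Map each repeated member name to the number of times it is listed."""
    counts = Counter(zf.namelist())
    return {name: n for name, n in counts.items() if n > 1}


# Build a small in-memory archive with one repeated name.
buf = io.BytesIO()
with warnings.catch_warnings():
    # zipfile emits a UserWarning when a name is written twice.
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "w") as out:
        out.writestr("a.txt", "first")
        out.writestr("a.txt", "second")
        out.writestr("b.txt", "only")

with zipfile.ZipFile(buf) as zf:
    dupes = duplicate_entries(zf)
    # Same expression as in the session above: surplus entries overall.
    extra = len(zf.namelist()) - len(set(zf.namelist()))
```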



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Holger Levsen
On Wed, Apr 10, 2024 at 06:12:21PM +0100, Chris Lamb wrote:
> Holger Levsen wrote:
> 
> > when building libscout 2.3.2-3 on current unstable, the result is also 
> > unreproducible, but diffoscope crashes when analysing the diff.
> I think this is somewhat related to:
>   https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362
> … which was said to be fixed by Fay in
> cc3b077f6ef97b4e20036e9823926fe633c7d4d0
> that was released as diffoscope version 263 on 2024-04-05.
> However, I can see that the current output of libscout/amd64 on
> tests.reproducible-builds.org is failing with this very version:

yes, indeed.

also, this happened before too, I'm sure at least with diffoscope 260
already.
 
> Will loop Fay in via Salsa presently.

thank you!


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

Fischers Fritz fischt Plastik.




Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-10 Thread Chris Lamb
Holger Levsen wrote:

> when building libscout 2.3.2-3 on current unstable, the result is also 
> unreproducible, but diffoscope crashes when analysing the diff.

I think this is somewhat related to:

  https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/362

… which was said to be fixed by Fay in cc3b077f6ef97b4e20036e9823926fe633c7d4d0
that was released as diffoscope version 263 on 2024-04-05.

However, I can see that the current output of libscout/amd64 on
tests.reproducible-builds.org is failing with this very version:

  Tue Apr  9 12:14:14 UTC 2024  I: diffoscope 263 will be used to compare the two builds:

  From https://gist.github.com/lamby/e5db96d4d61612485a469b826590192e/raw
  (saved output for posterity)

Will loop Fay in via Salsa presently.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org  chris-lamb.co.uk
   `-



Bug#1068705: diffoscope crashes on libscout 2.3.2-3 build on unstable but not bullseye

2024-04-09 Thread Holger Levsen
package: diffoscope
version: 263

hi,

diffoscope 263 crashes on the libscout 2.3.2-3 build on unstable but not on
bullseye: libscout 2.3.2-3 is part of bullseye (but neither bookworm nor
trixie); it builds unreproducibly there, and diffoscope is able to show a diff.

when building libscout 2.3.2-3 on current unstable, the result is also 
unreproducible, but diffoscope crashes when analysing the diff.

this happens on all 4 tested archs.

I've copied the packages in question to
https://tests.reproducible-builds.org/debian/diffoscope-libscout/artifacts/r00t-me/
for further investigation. (because one .deb is 20 MB and there are 16 of them.)


(someone please remind me to delete them there once this bug has been closed.)


-- 
cheers,
Holger

 ⢀⣴⠾⠻⢶⣦⠀
 ⣾⠁⢠⠒⠀⣿⡁  holger@(debian|reproducible-builds|layer-acht).org
 ⢿⡄⠘⠷⠚⠋⠀  OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C
 ⠈⠳⣄

The hardest part about defending against social engineering is that it
doesn't attack the weakness of a community.  It attacks its
*strengths*: trust, collaboration, and mutual assistance. (Russ Allbery)

