Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-12-04 Thread Chris Lamb
tags 879011 + pending
thanks

Ximin wrote:

> I've fixed it a different way:
>
> https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?id=25fee28c8b29dbc66cd7bb57a2ab427651050c23

Marking as pending to match.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-11-28 Thread Ximin Luo
Mike Hommey:
> On Thu, Oct 19, 2017 at 07:15:35PM +0900, Mike Hommey wrote:
>> Attached is my work in progress. I won't finish it right now because
>> it's getting late and I don't have all the optional tools to run the
>> test suite.
>>
>> It's worth noting that this changes the output, and the notable
>> difference is that the information whether files are stored or deflated
>> is lost. It doesn't seem at first glance that libarchive exposes that,
>> so that might be a showstopper :-/
> 
> Attached here is a full patchset that passes all zip-related tests. Please
> take a look at the changes in output, whether this is wanted or not.
> 

I've fixed it a different way:

https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?id=25fee28c8b29dbc66cd7bb57a2ab427651050c23

I didn't notice any slowdown that might have occurred due to the lack of random 
access to the file.

Let me know if this doesn't fix the original issue. I agree the 
stderr-to-stdout-when-tty is ridiculous. Funnily enough the behaviour isn't 
present when `-v` is enabled...

(I have applied 0001.patch since that's a separate issue, so thanks for that 
too.)

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git



Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-20 Thread Mike Hommey
On Thu, Oct 19, 2017 at 07:15:35PM +0900, Mike Hommey wrote:
> Attached is my work in progress. I won't finish it right now because
> it's getting late and I don't have all the optional tools to run the
> test suite.
> 
> It's worth noting that this changes the output, and the notable
> difference is that the information whether files are stored or deflated
> is lost. It doesn't seem at first glance that libarchive exposes that,
> so that might be a showstopper :-/

Attached here is a full patchset that passes all zip-related tests. Please
take a look at the changes in output, whether this is wanted or not.

Mike
>From 3b02350b9e9b021b401762b606f7fb6205dc5dc1 Mon Sep 17 00:00:00 2001
From: Mike Hommey 
Date: Fri, 20 Oct 2017 20:36:54 +0900
Subject: [PATCH 1/2] Extract libarchive contents with a file extension

Some of the commands running on extracted content, like javap, require a
specific file extension to work, so the original extension is better
preserved.
---
 diffoscope/comparators/utils/libarchive.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/diffoscope/comparators/utils/libarchive.py b/diffoscope/comparators/utils/libarchive.py
index db953a5..0f8cb4b 100644
--- a/diffoscope/comparators/utils/libarchive.py
+++ b/diffoscope/comparators/utils/libarchive.py
@@ -222,6 +222,8 @@ class LibarchiveContainer(Archive):
 # Keep directory sizes small. could be improved but should be
 # good enough for "ordinary" large archives.
 dst = os.path.join(tmpdir, str(idx // 4096), str(idx % 4096))
+root, ext = os.path.splitext(entry.pathname)
+dst += ext
 # Maintain a mapping of archive path to the extracted path,
 # avoiding the need to sanitise filenames.
 self._members[entry.pathname] = dst
-- 
2.14.1

>From 79b564f7d52e08d88aae20d55602b2b1b147a515 Mon Sep 17 00:00:00 2001
From: Mike Hommey 
Date: Fri, 20 Oct 2017 20:39:08 +0900
Subject: [PATCH 2/2] Use libarchive instead of zipinfo/bsdtar for zip files

---
 diffoscope/comparators/apk.py   |  10 +--
 diffoscope/comparators/zip.py   | 123 +++-
 tests/comparators/test_apk.py   |  28 +++-
 tests/comparators/test_dex.py   |  11 ++-
 tests/comparators/test_epub.py  |   6 +-
 tests/comparators/test_zip.py   |   7 --
 tests/data/apk_zipinfo_expected_diff|   6 --
 tests/data/dex_expected_diffs   |  17 -
 tests/data/epub_expected_diffs  |  42 +--
 tests/data/mozzip_zipinfo_expected_diff |  18 +
 tests/data/zip_bsdtar_expected_diff |   6 +-
 tests/data/zip_zipinfo_expected_diff|  12 ++--
 12 files changed, 55 insertions(+), 231 deletions(-)
 delete mode 100644 tests/data/apk_zipinfo_expected_diff

diff --git a/diffoscope/comparators/apk.py b/diffoscope/comparators/apk.py
index 7623f82..e00d9f2 100644
--- a/diffoscope/comparators/apk.py
+++ b/diffoscope/comparators/apk.py
@@ -26,10 +26,9 @@ from diffoscope.tools import tool_required
 from diffoscope.tempfiles import get_temporary_directory
 from diffoscope.difference import Difference
 
-from .utils.file import File
 from .utils.archive import Archive
 from .utils.compare import compare_files
-from .zip import Zipinfo, ZipinfoVerbose
+from .zip import ZipFile
 from .missing_file import MissingFile
 
 logger = logging.getLogger(__name__)
@@ -146,17 +145,12 @@ class ApkContainer(Archive):
 return differences
 
 
-class ApkFile(File):
+class ApkFile(ZipFile):
 FILE_TYPE_HEADER_PREFIX = b"PK\x03\x04"
 FILE_TYPE_RE = re.compile(r'^(Java|Zip) archive data.*\b')
 FILE_EXTENSION_SUFFIX = '.apk'
 CONTAINER_CLASS = ApkContainer
 
-def compare_details(self, other, source=None):
-zipinfo_difference = Difference.from_command(Zipinfo, self.path, other.path) or \
- Difference.from_command(ZipinfoVerbose, self.path, other.path)
-return [zipinfo_difference]
-
 
 def filter_apk_metadata(filepath, archive_name):
 new_filename = os.path.join(os.path.dirname(filepath), 'APK metadata')
diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 0815ddc..01f5071 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -23,84 +23,14 @@ import shutil
 import os.path
 import zipfile
 
-from diffoscope.tools import tool_required
 from diffoscope.difference import Difference
-
 from .utils.file import File
+from .utils.libarchive import LibarchiveContainer, list_libarchive
 from .directory import Directory
-from .utils.archive import Archive, ArchiveMember
-from .utils.command import Command
-
-
-class Zipinfo(Command):
-@tool_required('zipinfo')
-def cmdline(self):
-return ['zipinfo', self.path]
-
-def filter(self, line):
-# we don't care about the archive file path
-if line.startswith(b'Archive:'):
- 

Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-19 Thread Mike Hommey
On Thu, Oct 19, 2017 at 06:27:07PM +0900, Mike Hommey wrote:
> On Thu, Oct 19, 2017 at 04:41:04PM +0900, Mike Hommey wrote:
> > On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote:
> > > On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote:
> > > > Package: diffoscope
> > > > Version: 87
> > > > Severity: normal
> > > > 
> > > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings
> > > > like:
> > > > warning [/tmp/tmplgigxgm__diffoscope/0/24]:  17283883 extra bytes at 
> > > > beginning or within zipfile
> > > > 
> > > > Now, when both ends have the same warning, as expected, the diff still
> > > > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path.
> > > 
> > > Note these messages are sent to stderr, and I noticed that for readelf,
> > > stderr is not part of the diff. It seems it would make sense to do the
> > > same for zipinfo.
> > 
> > The code actually does the same thing as for readelf. The problem is
> > that zipinfo is not consistently sending its warning to stderr.
> > 
> > $ zipinfo omni.ja > /dev/null
> > warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
> >   (attempting to process anyway)
> > error [omni.ja]:  reported length of central directory is
> >   -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
> >   zipfile?).  Compensating...
> > 
> > ^ means the warnings are on stderr, right? Nope
> > 
> > $ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01
> > Archive:  omni.ja
> > Zip file size: 17400328 bytes, number of entries: 1313
> > warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
> >   (attempting to process anyway)
> > error [omni.ja]:  reported length of central directory is
> >   -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
> >   zipfile?).  Compensating...
> > 1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed:  0.0%
> > 
> > It looks like it only sends those messages to stderr when it's a tty.
> 
> I haven't found a way to make things work with zipinfo's crazy behavior,
> however, I was able to get the zip comparator to work by using
> libarchive, like for other archive formats. I still need to do some more
> cleanup of the code, adjust the apk comparator, and adjust tests.

Attached is my work in progress. I won't finish it right now because
it's getting late and I don't have all the optional tools to run the
test suite.

It's worth noting that this changes the output, and the notable
difference is that the information whether files are stored or deflated
is lost. It doesn't seem at first glance that libarchive exposes that,
so that might be a showstopper :-/

Mike
diff --git a/diffoscope/comparators/apk.py b/diffoscope/comparators/apk.py
index 7623f82..e00d9f2 100644
--- a/diffoscope/comparators/apk.py
+++ b/diffoscope/comparators/apk.py
@@ -26,10 +26,9 @@ from diffoscope.tools import tool_required
 from diffoscope.tempfiles import get_temporary_directory
 from diffoscope.difference import Difference
 
-from .utils.file import File
 from .utils.archive import Archive
 from .utils.compare import compare_files
-from .zip import Zipinfo, ZipinfoVerbose
+from .zip import ZipFile
 from .missing_file import MissingFile
 
 logger = logging.getLogger(__name__)
@@ -146,17 +145,12 @@ class ApkContainer(Archive):
 return differences
 
 
-class ApkFile(File):
+class ApkFile(ZipFile):
 FILE_TYPE_HEADER_PREFIX = b"PK\x03\x04"
 FILE_TYPE_RE = re.compile(r'^(Java|Zip) archive data.*\b')
 FILE_EXTENSION_SUFFIX = '.apk'
 CONTAINER_CLASS = ApkContainer
 
-def compare_details(self, other, source=None):
-zipinfo_difference = Difference.from_command(Zipinfo, self.path, 
other.path) or \
- Difference.from_command(ZipinfoVerbose, 
self.path, other.path)
-return [zipinfo_difference]
-
 
 def filter_apk_metadata(filepath, archive_name):
 new_filename = os.path.join(os.path.dirname(filepath), 'APK metadata')
diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py
index 0815ddc..01f5071 100644
--- a/diffoscope/comparators/zip.py
+++ b/diffoscope/comparators/zip.py
@@ -23,84 +23,14 @@ import shutil
 import os.path
 import zipfile
 
-from diffoscope.tools import tool_required
 from diffoscope.difference import Difference
-
 from .utils.file import File
+from .utils.libarchive import LibarchiveContainer, list_libarchive
 from .directory import Directory
-from .utils.archive import Archive, ArchiveMember
-from .utils.command import Command
-
-
-class Zipinfo(Command):
-@tool_required('zipinfo')
-def cmdline(self):
-return ['zipinfo', self.path]
-
-def filter(self, line):
-# we don't care about the archive file path
-if line.startswith(b'Archive:'):
-return b''
-return line
-
-
-class ZipinfoVerbose(Zipinfo):
-@tool_required('zipinfo')
-def cmdline(self):
-return ['zipinfo', '-v', self.path]
-
-

Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-19 Thread Mike Hommey
On Thu, Oct 19, 2017 at 04:41:04PM +0900, Mike Hommey wrote:
> On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote:
> > On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote:
> > > Package: diffoscope
> > > Version: 87
> > > Severity: normal
> > > 
> > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings
> > > like:
> > > warning [/tmp/tmplgigxgm__diffoscope/0/24]:  17283883 extra bytes at 
> > > beginning or within zipfile
> > > 
> > > Now, when both ends have the same warning, as expected, the diff still
> > > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path.
> > 
> > Note these messages are sent to stderr, and I noticed that for readelf,
> > stderr is not part of the diff. It seems it would make sense to do the
> > same for zipinfo.
> 
> The code actually does the same thing as for readelf. The problem is
> that zipinfo is not consistently sending its warning to stderr.
> 
> $ zipinfo omni.ja > /dev/null
> warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
>   (attempting to process anyway)
> error [omni.ja]:  reported length of central directory is
>   -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
>   zipfile?).  Compensating...
> 
> ^ means the warnings are on stderr, right? Nope
> 
> $ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01
> Archive:  omni.ja
> Zip file size: 17400328 bytes, number of entries: 1313
> warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
>   (attempting to process anyway)
> error [omni.ja]:  reported length of central directory is
>   -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
>   zipfile?).  Compensating...
> 1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed:  0.0%
> 
> It looks like it only sends those messages to stderr when it's a tty.

I haven't found a way to make things work with zipinfo's crazy behavior,
however, I was able to get the zip comparator to work by using
libarchive, like for other archive formats. I still need to do some more
cleanup of the code, adjust the apk comparator, and adjust tests.

Mike



Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-19 Thread Mike Hommey
On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote:
> On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote:
> > Package: diffoscope
> > Version: 87
> > Severity: normal
> > 
> > While diffing firefox, zipinfo is run on omni.ja, and issues warnings
> > like:
> > warning [/tmp/tmplgigxgm__diffoscope/0/24]:  17283883 extra bytes at 
> > beginning or within zipfile
> > 
> > Now, when both ends have the same warning, as expected, the diff still
> > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path.
> 
> Note these messages are sent to stderr, and I noticed that for readelf,
> stderr is not part of the diff. It seems it would make sense to do the
> same for zipinfo.

The code actually does the same thing as for readelf. The problem is
that zipinfo is not consistently sending its warning to stderr.

$ zipinfo omni.ja > /dev/null
warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [omni.ja]:  reported length of central directory is
  -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
  zipfile?).  Compensating...

^ means the warnings are on stderr, right? Nope

$ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01
Archive:  omni.ja
Zip file size: 17400328 bytes, number of entries: 1313
warning [omni.ja]:  17283876 extra bytes at beginning or within zipfile
  (attempting to process anyway)
error [omni.ja]:  reported length of central directory is
  -17283876 bytes too long (Atari STZip zipfile?  J.H.Holm ZIPSPLIT 1.1
  zipfile?).  Compensating...
1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed:  0.0%

It looks like it only sends those messages to stderr when it's a tty.

Mike



Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-19 Thread Mike Hommey
On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote:
> Package: diffoscope
> Version: 87
> Severity: normal
> 
> While diffing firefox, zipinfo is run on omni.ja, and issues warnings
> like:
> warning [/tmp/tmplgigxgm__diffoscope/0/24]:  17283883 extra bytes at 
> beginning or within zipfile
> 
> Now, when both ends have the same warning, as expected, the diff still
> shows a difference because of the /tmp/tmp*__diffoscope/0/fd path.

Note these messages are sent to stderr, and I noticed that for readelf,
stderr is not part of the diff. It seems it would make sense to do the
same for zipinfo.

Mike



Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names

2017-10-18 Thread Mike Hommey
Package: diffoscope
Version: 87
Severity: normal

While diffing firefox, zipinfo is run on omni.ja, and issues warnings
like:
warning [/tmp/tmplgigxgm__diffoscope/0/24]:  17283883 extra bytes at beginning 
or within zipfile

Now, when both ends have the same warning, as expected, the diff still
shows a difference because of the /tmp/tmp*__diffoscope/0/fd path.

Mike


-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (1, 
'experimental-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.13.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8), 
LANGUAGE=ja_JP.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages diffoscope depends on:
ii  python33.6.3-1
ii  python3-distro 1.0.1-2
ii  python3-libarchive-c   2.1-3.1
ii  python3-magic  1:5.32-1
ii  python3-pkg-resources  36.2.7-2

Versions of packages diffoscope recommends:
ii  acl   2.2.52-3+b1
pn  apktool   
pn  binutils-multiarch
ii  bzip2 1.0.6-8.1
pn  caca-utils
ii  colord1.3.3-2
pn  device-tree-compiler  
pn  docx2txt  
pn  enjarify  
pn  fontforge-extras  
pn  fp-utils  
pn  genisoimage   
ii  gettext   0.19.8.1-4
pn  ghc   
ii  ghostscript   9.22~dfsg-1
pn  giflib-tools  
ii  gnupg 2.2.1-2
pn  imagemagick   
pn  jsbeautifier  
pn  libarchive-tools  
pn  llvm  
pn  mono-utils
pn  odt2txt   
pn  oggvideotools 
ii  openjdk-7-jdk [java-sdk]  7u121-2.6.8-1
ii  openjdk-8-jdk [java-sdk]  8u144-b01-2
ii  openssh-client1:7.6p1-2
pn  pdftk 
pn  pgpdump   
pn  poppler-utils 
pn  python3-argcomplete   
ii  python3-binwalk   2.1.1-16
ii  python3-debian0.1.31
ii  python3-defusedxml0.5.0-1
pn  python3-guestfs   
pn  python3-progressbar   
pn  python3-rpm   
pn  python3-tlsh  
pn  r-base-core   
ii  rpm2cpio  4.12.0.2+dfsg1-2+b1
pn  sng   
ii  sqlite3   3.20.1-1
ii  squashfs-tools1:4.3-4
ii  tcpdump   4.9.2-1
ii  unzip 6.0-21
ii  vim-common2:8.0.1144-1
ii  xxd   2:8.0.1144-1+b1
ii  xz-utils  5.2.2-1.3

Versions of packages diffoscope suggests:
ii  libjs-jquery  3.2.1-1

-- no debconf information