Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
tags 879011 + pending thanks Ximin wrote: > I've fixed it a different way: > > https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?id=25fee28c8b29dbc66cd7bb57a2ab427651050c23 Marking as pending to match. Regards, -- ,''`. : :' : Chris Lamb `. `'` la...@debian.org / chris-lamb.co.uk `-
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
Mike Hommey: > On Thu, Oct 19, 2017 at 07:15:35PM +0900, Mike Hommey wrote: >> Attached is my work in progress. I won't finish it right now because >> it's getting late and I don't have all the optional tools to run the >> test suite. >> >> It's worth noting that this changes the output, and the notable >> difference is that the information whether files are stored or deflated >> is lost. It doesn't seem at first glance that libarchive exposes that, >> so that might be a showstopper :-/ > > Attached here is a full patchset that passes all zip-related tests. Please > take a look at the changes in output, whether this is wanted or not. > I've fixed it a different way: https://anonscm.debian.org/cgit/reproducible/diffoscope.git/commit/?id=25fee28c8b29dbc66cd7bb57a2ab427651050c23 I didn't notice any slowdown that might have occurred due to the lack of random access to the file. Let me know if this doesn't fix the original issue. I agree the stderr-to-stdout-when-tty is ridiculous. Funnily enough the behaviour isn't present when `-v` is enabled... (I have applied 0001.patch since that's a separate issue, so thanks for that too.) X -- GPG: ed25519/56034877E1F87C35 GPG: rsa4096/1318EFAC5FBBDBCE https://github.com/infinity0/pubkeys.git
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
On Thu, Oct 19, 2017 at 07:15:35PM +0900, Mike Hommey wrote: > Attached is my work in progress. I won't finish it right now because > it's getting late and I don't have all the optional tools to run the > test suite. > > It's worth noting that this changes the output, and the notable > difference is that the information whether files are stored or deflated > is lost. It doesn't seem at first glance that libarchive exposes that, > so that might be a showstopper :-/ Attached here is a full patchset that passes all zip-related tests. Please take a look at the changes in output, whether this is wanted or not. Mike >From 3b02350b9e9b021b401762b606f7fb6205dc5dc1 Mon Sep 17 00:00:00 2001 From: Mike HommeyDate: Fri, 20 Oct 2017 20:36:54 +0900 Subject: [PATCH 1/2] Extract libarchive contents with a file extension Some of the commands running on extracted content, like javap, require a specific file extension to work, so the original extension is better preserved. --- diffoscope/comparators/utils/libarchive.py | 2 ++ 1 file changed, 2 insertions(+) diff --git a/diffoscope/comparators/utils/libarchive.py b/diffoscope/comparators/utils/libarchive.py index db953a5..0f8cb4b 100644 --- a/diffoscope/comparators/utils/libarchive.py +++ b/diffoscope/comparators/utils/libarchive.py @@ -222,6 +222,8 @@ class LibarchiveContainer(Archive): # Keep directory sizes small. could be improved but should be # good enough for "ordinary" large archives. dst = os.path.join(tmpdir, str(idx // 4096), str(idx % 4096)) +root, ext = os.path.splitext(entry.pathname) +dst += ext # Maintain a mapping of archive path to the extracted path, # avoiding the need to sanitise filenames. self._members[entry.pathname] = dst -- 2.14.1 >From 79b564f7d52e08d88aae20d55602b2b1b147a515 Mon Sep 17 00:00:00 2001 From: Mike Hommey Date: Fri, 20 Oct 2017 20:39:08 +0900 Subject: [PATCH 2/2] Use libarchive instead of zipinfo/bsdtar for zip files --- diffoscope/comparators/apk.py | 10 +-- diffoscope/comparators/zip.py | 123 +++- tests/comparators/test_apk.py | 28 +++- tests/comparators/test_dex.py | 11 ++- tests/comparators/test_epub.py | 6 +- tests/comparators/test_zip.py | 7 -- tests/data/apk_zipinfo_expected_diff| 6 -- tests/data/dex_expected_diffs | 17 - tests/data/epub_expected_diffs | 42 +-- tests/data/mozzip_zipinfo_expected_diff | 18 + tests/data/zip_bsdtar_expected_diff | 6 +- tests/data/zip_zipinfo_expected_diff| 12 ++-- 12 files changed, 55 insertions(+), 231 deletions(-) delete mode 100644 tests/data/apk_zipinfo_expected_diff diff --git a/diffoscope/comparators/apk.py b/diffoscope/comparators/apk.py index 7623f82..e00d9f2 100644 --- a/diffoscope/comparators/apk.py +++ b/diffoscope/comparators/apk.py @@ -26,10 +26,9 @@ from diffoscope.tools import tool_required from diffoscope.tempfiles import get_temporary_directory from diffoscope.difference import Difference -from .utils.file import File from .utils.archive import Archive from .utils.compare import compare_files -from .zip import Zipinfo, ZipinfoVerbose +from .zip import ZipFile from .missing_file import MissingFile logger = logging.getLogger(__name__) @@ -146,17 +145,12 @@ class ApkContainer(Archive): return differences -class ApkFile(File): +class ApkFile(ZipFile): FILE_TYPE_HEADER_PREFIX = b"PK\x03\x04" FILE_TYPE_RE = re.compile(r'^(Java|Zip) archive data.*\b') FILE_EXTENSION_SUFFIX = '.apk' CONTAINER_CLASS = ApkContainer -def compare_details(self, other, source=None): -zipinfo_difference = Difference.from_command(Zipinfo, self.path, other.path) or \ - Difference.from_command(ZipinfoVerbose, self.path, other.path) -return [zipinfo_difference] - def filter_apk_metadata(filepath, archive_name): new_filename = os.path.join(os.path.dirname(filepath), 'APK metadata') diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py index 0815ddc..01f5071 100644 --- a/diffoscope/comparators/zip.py +++ b/diffoscope/comparators/zip.py @@ -23,84 +23,14 @@ import shutil import os.path import zipfile -from diffoscope.tools import tool_required from diffoscope.difference import Difference - from .utils.file import File +from .utils.libarchive import LibarchiveContainer, list_libarchive from .directory import Directory -from .utils.archive import Archive, ArchiveMember -from .utils.command import Command - - -class Zipinfo(Command): -@tool_required('zipinfo') -def cmdline(self): -return ['zipinfo', self.path] - -def filter(self, line): -# we don't care about the archive file path -if line.startswith(b'Archive:'): -
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
On Thu, Oct 19, 2017 at 06:27:07PM +0900, Mike Hommey wrote: > On Thu, Oct 19, 2017 at 04:41:04PM +0900, Mike Hommey wrote: > > On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote: > > > On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote: > > > > Package: diffoscope > > > > Version: 87 > > > > Severity: normal > > > > > > > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings > > > > like: > > > > warning [/tmp/tmplgigxgm__diffoscope/0/24]: 17283883 extra bytes at > > > > beginning or within zipfile > > > > > > > > Now, when both ends have the same warning, as expected, the diff still > > > > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path. > > > > > > Note these messages are sent to stderr, and I noticed that for readelf, > > > stderr is not part of the diff. It seems it would make sense to do the > > > same for zipinfo. > > > > The code actually does the same thing as for readelf. The problem is > > that zipinfo is not consistently sending its warning to stderr. > > > > $ zipinfo omni.ja > /dev/null > > warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile > > (attempting to process anyway) > > error [omni.ja]: reported length of central directory is > > -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 > > zipfile?). Compensating... > > > > ^ means the warnings are on stderr, right? Nope > > > > $ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01 > > Archive: omni.ja > > Zip file size: 17400328 bytes, number of entries: 1313 > > warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile > > (attempting to process anyway) > > error [omni.ja]: reported length of central directory is > > -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 > > zipfile?). Compensating... > > 1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed: 0.0% > > > > It looks like it only sends those messages to stderr when it's a tty. > > I haven't found a way to make things work with zipinfo's crazy behavior, > however, I was able to get the zip comparator to work by using > libarchive, like for other archive formats. I still need to do some more > cleanup of the code, adjust the apk comparator, and adjust tests. Attached is my work in progress. I won't finish it right now because it's getting late and I don't have all the optional tools to run the test suite. It's worth noting that this changes the output, and the notable difference is that the information whether files are stored or deflated is lost. It doesn't seem at first glance that libarchive exposes that, so that might be a showstopper :-/ Mike diff --git a/diffoscope/comparators/apk.py b/diffoscope/comparators/apk.py index 7623f82..e00d9f2 100644 --- a/diffoscope/comparators/apk.py +++ b/diffoscope/comparators/apk.py @@ -26,10 +26,9 @@ from diffoscope.tools import tool_required from diffoscope.tempfiles import get_temporary_directory from diffoscope.difference import Difference -from .utils.file import File from .utils.archive import Archive from .utils.compare import compare_files -from .zip import Zipinfo, ZipinfoVerbose +from .zip import ZipFile from .missing_file import MissingFile logger = logging.getLogger(__name__) @@ -146,17 +145,12 @@ class ApkContainer(Archive): return differences -class ApkFile(File): +class ApkFile(ZipFile): FILE_TYPE_HEADER_PREFIX = b"PK\x03\x04" FILE_TYPE_RE = re.compile(r'^(Java|Zip) archive data.*\b') FILE_EXTENSION_SUFFIX = '.apk' CONTAINER_CLASS = ApkContainer -def compare_details(self, other, source=None): -zipinfo_difference = Difference.from_command(Zipinfo, self.path, other.path) or \ - Difference.from_command(ZipinfoVerbose, self.path, other.path) -return [zipinfo_difference] - def filter_apk_metadata(filepath, archive_name): new_filename = os.path.join(os.path.dirname(filepath), 'APK metadata') diff --git a/diffoscope/comparators/zip.py b/diffoscope/comparators/zip.py index 0815ddc..01f5071 100644 --- a/diffoscope/comparators/zip.py +++ b/diffoscope/comparators/zip.py @@ -23,84 +23,14 @@ import shutil import os.path import zipfile -from diffoscope.tools import tool_required from diffoscope.difference import Difference - from .utils.file import File +from .utils.libarchive import LibarchiveContainer, list_libarchive from .directory import Directory -from .utils.archive import Archive, ArchiveMember -from .utils.command import Command - - -class Zipinfo(Command): -@tool_required('zipinfo') -def cmdline(self): -return ['zipinfo', self.path] - -def filter(self, line): -# we don't care about the archive file path -if line.startswith(b'Archive:'): -return b'' -return line - - -class ZipinfoVerbose(Zipinfo): -@tool_required('zipinfo') -def cmdline(self): -return ['zipinfo', '-v', self.path] - -
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
On Thu, Oct 19, 2017 at 04:41:04PM +0900, Mike Hommey wrote: > On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote: > > On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote: > > > Package: diffoscope > > > Version: 87 > > > Severity: normal > > > > > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings > > > like: > > > warning [/tmp/tmplgigxgm__diffoscope/0/24]: 17283883 extra bytes at > > > beginning or within zipfile > > > > > > Now, when both ends have the same warning, as expected, the diff still > > > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path. > > > > Note these messages are sent to stderr, and I noticed that for readelf, > > stderr is not part of the diff. It seems it would make sense to do the > > same for zipinfo. > > The code actually does the same thing as for readelf. The problem is > that zipinfo is not consistently sending its warning to stderr. > > $ zipinfo omni.ja > /dev/null > warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile > (attempting to process anyway) > error [omni.ja]: reported length of central directory is > -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 > zipfile?). Compensating... > > ^ means the warnings are on stderr, right? Nope > > $ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01 > Archive: omni.ja > Zip file size: 17400328 bytes, number of entries: 1313 > warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile > (attempting to process anyway) > error [omni.ja]: reported length of central directory is > -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 > zipfile?). Compensating... > 1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed: 0.0% > > It looks like it only sends those messages to stderr when it's a tty. I haven't found a way to make things work with zipinfo's crazy behavior, however, I was able to get the zip comparator to work by using libarchive, like for other archive formats. I still need to do some more cleanup of the code, adjust the apk comparator, and adjust tests. Mike
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
On Thu, Oct 19, 2017 at 03:57:29PM +0900, Mike Hommey wrote: > On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote: > > Package: diffoscope > > Version: 87 > > Severity: normal > > > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings > > like: > > warning [/tmp/tmplgigxgm__diffoscope/0/24]: 17283883 extra bytes at > > beginning or within zipfile > > > > Now, when both ends have the same warning, as expected, the diff still > > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path. > > Note these messages are sent to stderr, and I noticed that for readelf, > stderr is not part of the diff. It seems it would make sense to do the > same for zipinfo. The code actually does the same thing as for readelf. The problem is that zipinfo is not consistently sending its warning to stderr. $ zipinfo omni.ja > /dev/null warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile (attempting to process anyway) error [omni.ja]: reported length of central directory is -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating... ^ means the warnings are on stderr, right? Nope $ zipinfo omni.ja 2> /dev/null | grep -v 10-Jan-01 Archive: omni.ja Zip file size: 17400328 bytes, number of entries: 1313 warning [omni.ja]: 17283876 extra bytes at beginning or within zipfile (attempting to process anyway) error [omni.ja]: reported length of central directory is -17283876 bytes too long (Atari STZip zipfile? J.H.Holm ZIPSPLIT 1.1 zipfile?). Compensating... 1313 files, 17188436 bytes uncompressed, 17188436 bytes compressed: 0.0% It looks like it only sends those messages to stderr when it's a tty. Mike
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
On Wed, Oct 18, 2017 at 09:27:29PM +0900, Mike Hommey wrote: > Package: diffoscope > Version: 87 > Severity: normal > > While diffing firefox, zipinfo is run on omni.ja, and issues warnings > like: > warning [/tmp/tmplgigxgm__diffoscope/0/24]: 17283883 extra bytes at > beginning or within zipfile > > Now, when both ends have the same warning, as expected, the diff still > shows a difference because of the /tmp/tmp*__diffoscope/0/fd path. Note these messages are sent to stderr, and I noticed that for readelf, stderr is not part of the diff. It seems it would make sense to do the same for zipinfo. Mike
Bug#879011: diffoscope: zipinfo diff shows warning differences that are due to temporary file names
Package: diffoscope Version: 87 Severity: normal While diffing firefox, zipinfo is run on omni.ja, and issues warnings like: warning [/tmp/tmplgigxgm__diffoscope/0/24]: 17283883 extra bytes at beginning or within zipfile Now, when both ends have the same warning, as expected, the diff still shows a difference because of the /tmp/tmp*__diffoscope/0/fd path. Mike -- System Information: Debian Release: buster/sid APT prefers unstable-debug APT policy: (500, 'unstable-debug'), (500, 'unstable'), (500, 'testing'), (1, 'experimental-debug'), (1, 'experimental') Architecture: amd64 (x86_64) Foreign Architectures: i386 Kernel: Linux 4.13.0-1-amd64 (SMP w/4 CPU cores) Locale: LANG=ja_JP.UTF-8, LC_CTYPE=ja_JP.UTF-8 (charmap=UTF-8), LANGUAGE=ja_JP.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages diffoscope depends on: ii python33.6.3-1 ii python3-distro 1.0.1-2 ii python3-libarchive-c 2.1-3.1 ii python3-magic 1:5.32-1 ii python3-pkg-resources 36.2.7-2 Versions of packages diffoscope recommends: ii acl 2.2.52-3+b1 pn apktool pn binutils-multiarch ii bzip2 1.0.6-8.1 pn caca-utils ii colord1.3.3-2 pn device-tree-compiler pn docx2txt pn enjarify pn fontforge-extras pn fp-utils pn genisoimage ii gettext 0.19.8.1-4 pn ghc ii ghostscript 9.22~dfsg-1 pn giflib-tools ii gnupg 2.2.1-2 pn imagemagick pn jsbeautifier pn libarchive-tools pn llvm pn mono-utils pn odt2txt pn oggvideotools ii openjdk-7-jdk [java-sdk] 7u121-2.6.8-1 ii openjdk-8-jdk [java-sdk] 8u144-b01-2 ii openssh-client1:7.6p1-2 pn pdftk pn pgpdump pn poppler-utils pn python3-argcomplete ii python3-binwalk 2.1.1-16 ii python3-debian0.1.31 ii python3-defusedxml0.5.0-1 pn python3-guestfs pn python3-progressbar pn python3-rpm pn python3-tlsh pn r-base-core ii rpm2cpio 4.12.0.2+dfsg1-2+b1 pn sng ii sqlite3 3.20.1-1 ii squashfs-tools1:4.3-4 ii tcpdump 4.9.2-1 ii unzip 6.0-21 ii vim-common2:8.0.1144-1 ii xxd 2:8.0.1144-1+b1 ii xz-utils 5.2.2-1.3 Versions of packages diffoscope suggests: ii libjs-jquery 3.2.1-1 -- no debconf information