Re: [arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

2019-03-18 Thread Maxim Baz via arch-projects
Thanks for the detailed reply Eli :)

Before I go further, I should mention that this thread has sparked some 
interest among our teammates,
and folks on IRC are researching now a possibility of switching from xz to zstd 
entirely.

So I think we will return to this topic soon with some updates :)

> And it is not a pixz issue either. For example xz -T $num will compress
> with up to $num cores, as opposed to the default 1, but basic testing
> shows that it produces the same output with 2, 3, 4 etc. cores but
> different output from what you get when using 1 core. You cannot even
> hardcode -T0, since (in addition to requiring packagers to use more
> cores than they want to) that may still get interpreted as 1, which is
> then unreproducible. You can verify this using taskset -c 1...
>
> (This has been discussed already in #archlinux-pacman.)

Turns out it is possible to make xz use more threads and produce reproducible 
results,
check out this message: 
https://lists.debian.org/debian-dpkg/2016/10/msg00011.html

I've confirmed (with real VMs) that -T2, -T3, -T8, all produce the same result, 
on both single-core and multi-core CPUs,
and -T0 produces the same result on multi-core CPU, but -T0 produces a 
different result on a single-core CPU.

Therefore, the simplest workaround that produces reproducible results on all 
types of CPU is this:

-T$([ $(nproc) -lt 2  ] && echo '2' || echo '0')

Or maybe even something like this:

-T$(( $(nproc) + 1  ))

> For the topic of configurable implementations, there is an additional
> hurdle. Since compression happens inside the chroot, even if we did
> support it, it would need some way to look up the required binary and
> install it in the chroot, but we have no tooling for this and in fact
> makepkg does not even know how to check that the binary exists, so you
> will probably just get:
>
>   -> Compressing package...
> /usr/share/makepkg/util/compress.sh: line 39: pixz: command not found
> bsdtar: Write error
> ==> ERROR: Failed to create package file.
>
> The solution would have to require either a known, embedded list of
> "good" ones (if pixz is not reproducible when running the same command
> line twice then we'd need to be able to blacklist that), or else assume
> the user installed it when adding it to their makepkg.conf and using
> pacman -Qo. It is probably overkill to use pkgfile and support
> compressors that aren't installed on the host either...

That is correct.

One final remark, pixz does produce the same result when is run twice, I made a 
mistake when I was testing it.
But it doesn't matter, since we'll probably settle for zstd or xz -T.


--
Maxim Baz


signature.asc
Description: PGP signature


Re: [arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

2019-03-18 Thread Eli Schwartz via arch-projects
On 3/18/19 5:56 AM, Maxim Baz via arch-projects wrote:
> I did see an earlier patch proposing the same, which was rejected because it 
> might break delta packages [1].
> However, someone mentioned that pacman is dropping support for delta packages 
> altogether [2] referring to a removal commit in pacman repo [3], so I wanted 
> to bring this up again :)
> 
> I'm creating a new thread because according to my tests the earlier patch [1] 
> is incomplete.
> 
> 
> 1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
> 2: https://github.com/AladW/aurutils/issues/547
> 3: 
> https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e

Well, I would ideally like to add delta packages back into pacman,
whether that is via xdelta3 or something else, and it is also worth
noting it got removed at least in large part because archlinux.org
itself has never really used it, and therefore:
- it doesn't exactly get well tested
- for our use cases we don't ultimately notice the difference

(I would like to one day add delta updates to dbscripts. :( But before
that happens, I would need to convince Allan with a tested, safe delta
implementation that someone actually understands and is willing to
maintain.)

Supporting packages in the official repos which are not byte-identical
to the default COMPRESSXZ makes it quite difficult to ever add delta
support back.

...

The fundamental problem behind delta updates remains the same, though,
which is that you cannot actually generate the same package twice using
the same data (even from a byte-identical compressed tar, which is
basically what xdelta3 did). As Levente pointed out, this completely
nukes reproducible builds.

And it is not a pixz issue either. For example xz -T $num will compress
with up to $num cores, as opposed to the default 1, but basic testing
shows that it produces the same output with 2, 3, 4 etc. cores but
different output from what you get when using 1 core. You cannot even
hardcode -T0, since (in addition to requiring packagers to use more
cores than they want to) that may still get interpreted as 1, which is
then unreproducible. You can verify this using taskset -c 1...

(This has been discussed already in #archlinux-pacman.)

Without a guarantee that a given compressor will always be reproducible
no matter how many cores are used, we cannot take advantage of
multithreaded compression on any level.

More generally, without a variation on
https://patchwork.archlinux.org/patch/692/ we cannot take advantage of
any sort of configurability. But my patch was rejected.

Specific sub-issues with unknown configurability that isn't recorded in
the .BUILDINFO:

User-configurable implementations.
Without a guarantee that any implementation of a format retains
byte-level compatibility with the standard format (in this case xz), we
cannot support this as we do not know what was being used.

User-configurable options.
These are inherently unreproducible as long as they include the
compression level.

...

For the topic of configurable implementations, there is an additional
hurdle. Since compression happens inside the chroot, even if we did
support it, it would need some way to look up the required binary and
install it in the chroot, but we have no tooling for this and in fact
makepkg does not even know how to check that the binary exists, so you
will probably just get:

  -> Compressing package...
/usr/share/makepkg/util/compress.sh: line 39: pixz: command not found
bsdtar: Write error
==> ERROR: Failed to create package file.

The solution would have to require either a known, embedded list of
"good" ones (if pixz is not reproducible when running the same command
line twice then we'd need to be able to blacklist that), or else assume
the user installed it when adding it to their makepkg.conf and using
pacman -Qo. It is probably overkill to use pkgfile and support
compressors that aren't installed on the host either...

-- 
Eli Schwartz
Bug Wrangler and Trusted User



signature.asc
Description: OpenPGP digital signature


Re: [arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

2019-03-18 Thread Maxim Baz via arch-projects
On Mon, Mar 18, 2019 at 10:58:57AM +0100, Levente Polyak via arch-projects 
wrote:
> Messing with this in devtools used for building official repo packages kills 
> reproducible builds when not agreed on very specific flags and tools to 
> compress.
> This is not a good idea to do and we won't kill reproducibility by going wild 
> west.

True, sad but valid point.

I've just confirmed that pixz produces different results when run twice on the 
same data, so I can't even propose to just switch the default to pixz without 
breaking reproducible builds :(

Thanks for the quick feedback!

--
Maxim Baz


signature.asc
Description: PGP signature


Re: [arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

2019-03-18 Thread Levente Polyak via arch-projects
On March 18, 2019 10:56:34 AM GMT+01:00, Maxim Baz via arch-projects 
 wrote:
>I did see an earlier patch proposing the same, which was rejected
>because it might break delta packages [1].
>However, someone mentioned that pacman is dropping support for delta
>packages altogether [2] referring to a removal commit in pacman repo
>[3], so I wanted to bring this up again :)
>
>I'm creating a new thread because according to my tests the earlier
>patch [1] is incomplete.
>
>
>1:
>https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
>2: https://github.com/AladW/aurutils/issues/547
>3:
>https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e
>
>--



Messing with this in devtools used for building official repo packages kills 
reproducible builds when not agreed on very specific flags and tools to 
compress.
This is not a good idea to do and we won't kill reproducibility by going wild 
west. 


Re: [arch-projects] [PATCH] [devtools] makechrootpkg: load COMPRESS* vars from makepkg.conf

2019-03-18 Thread Maxim Baz via arch-projects
I did see an earlier patch proposing the same, which was rejected because it 
might break delta packages [1].
However, someone mentioned that pacman is dropping support for delta packages 
altogether [2] referring to a removal commit in pacman repo [3], so I wanted to 
bring this up again :)

I'm creating a new thread because according to my tests the earlier patch [1] 
is incomplete.


1: https://lists.archlinux.org/pipermail/arch-projects/2017-May/004574.html
2: https://github.com/AladW/aurutils/issues/547
3: 
https://git.archlinux.org/pacman.git/commit/?id=9adb0d5b37df7ca668e23877e85431dabeea005e

--
Maxim Baz


signature.asc
Description: PGP signature