Hi!

On Sun, 2018-03-11 at 21:51:05 +0100, Balint Reczey wrote:
> Package: dpkg
> Version: 1.19.0.5
> Severity: wishlist
> Tags: patch

> Please add support for Zstandard compression to dpkg and other
> programs generated by the dpkg source package [1].

Thanks. I started implementing this several weeks ago after having
discussed it with Julian Andres Klode on IRC, but stopped after seeing
the implementation getting messy given the current code structure.

In any case, as I mentioned on IRC to Andres, this is something I
pondered about already in 2016, when Paul Wise blogged about it, and
which I also told Andres about at the time when he was adding lz4
support to apt. :) But parked it for later as there were several
apparent problems with it at the time.

So, the items that come to mind (most from the dpkg FAQ [F]:

* Availability in general Unix systems would be one. I think the code
  should be portable, but I've not checked properly.
* Size of the shared library another, it would be by far the fattest
  compression lib used by dpkg. It's not entirely clear whether the
  shlib embeds a zlib library?
* Increase in the (build-)essential set (directly and transitively).
* It also seems the format has changed quite some times already, and
  it's probably the reason for the fat shlib. Not sure if the format
  has stabilized enough to use this as good long-term storage format,
  and what's the policy regarding supporting old formats for example,
  given that this is intended mainly to be used for real-time and
  streaming content and similar. For example the Makefile for libzstd
  defaults to supporting v0.4+ only, which does not look great.
* The license seems fine, as being very permissive, or it could affect
  availability. This one I need to add to the FAQ.
* Memory usage seemed fine or slight better depending on the compression
  level, but not when wanting equal or less space used?
* Space used seemed worse.
* Compression and decompression speed seemed better depending on the
  compression and decompression levels.

[F] 
<https://wiki.debian.org/Teams/Dpkg/FAQ#Q:_Can_we_add_support_for_new_compressors_for_.deb_packages.3F>

Overall I'm still not sure whether this is worth it. Also the
tradeoffs for stable are different to unstable/testing, or for
fast/slow networks, or long-term storage, one-time installations,
or things like CI and similar.

In any case this would still need discussion on debian-devel, and
involvement from other parts of the project, at least ftp-masters for
example. And whether the added "eternal" support makes sense if we are
or not planning to eventually switch to the compressor as the default,
for example, etc.

> $ rm -rf firefox-xz/* ;time  dpkg-deb -R firefox-xz.deb firefox-xz/
> real 0m4,270s
> user 0m4,220s
> sys 0m0,630s
> $ rm -rf firefox-zstd/* ;time  dpkg-deb -R firefox-zstd.deb firefox-zstd/
> real 0m0,765s
> user 0m0,556s
> sys 0m0,462s

Right, although that might end up being noise when factored into a
normal dpkg installation, due to the fsync()s, or maintscript
execution, etc.

> Tests on the full Ubuntu main archive showed ~6% average increase in
> the size of the binary packages.

What about the total increase? Because it's not the same say a 15%
increase in a 500 MiB .deb, than a 2% in a 100 KiB one obviously. :)



And follows a quick code review, not very deep, given that whether to
include support for this is still open.


> From 79aad733cbc7edd44e124702f82b8a46a3a4aea9 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint.rec...@canonical.com>
> Date: Thu, 8 Mar 2018 09:53:36 +0100
> Subject: [PATCH 1/4] dpkg: Add Zstandard compression support

> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
> index 52e9ce67d..7f898210e 100644
> --- a/dpkg-deb/main.c
> +++ b/dpkg-deb/main.c
> @@ -108,7 +108,7 @@ usage(const struct cmdinfo *cip, const char *value)
>  "      --[no-]uniform-compression   Use the compression params on all 
> members.\n"
>  "  -z#                              Set the compression level when 
> building.\n"
>  "  -Z<type>                         Set the compression type used when 
> building.\n"
> -"                                     Allowed types: gzip, xz, none.\n"
> +"                                     Allowed types: gzip, xz, zstd, none.\n"
>  "  -S<strategy>                     Set the compression strategy when 
> building.\n"
>  "                                     Allowed values: none; extreme (xz);\n"
>  "                                     filtered, huffman, rle, fixed 
> (gzip).\n"

In theory the proper way to introduce this is to first enable
decompression and then after a full stable release cycle add compression
support.

> diff --git a/lib/dpkg/compress.c b/lib/dpkg/compress.c
> index 44075cdb6..e20add3b7 100644
> --- a/lib/dpkg/compress.c
> +++ b/lib/dpkg/compress.c
> @@ -750,6 +753,127 @@ static const struct compressor compressor_lzma = {
>       .decompress = decompress_lzma,
>  };
>  
> +/*
> + * Zstd compressor.
> + */
> +
> +#define ZSTD         "zstd"
> +
> +#ifdef WITH_LIBZSTD
> +
> +static void
> +decompress_zstd(int fd_in, int fd_out, const char *desc)
> +{
> +  size_t const buf_in_size = ZSTD_DStreamInSize();

The indentation should be tabs, not spaces here. There are several
other problems with spacing and blank lines, etc in other places.

> +  void*  const buf_in  = malloc(buf_in_size);

m_malloc().

> +  size_t const buf_out_size = ZSTD_DStreamOutSize();
> +  void*  const buf_out = malloc(buf_out_size);
> +  size_t init_result, just_read, to_read;
> +  ZSTD_DStream* const dstream = ZSTD_createDStream();
> +  if (dstream == NULL) {
> +    ohshit(_("ZSTD_createDStream() error "));
> +  }

Unnecessary braces, and non-obvious error message.

> +  /* TODO: a file may consist of multiple appended frames (ex : pzstd).
> +   * The following implementation decompresses only the first frame */

The commit fixing this should be folded into this one.

> +  init_result = ZSTD_initDStream(dstream);
> +  if (ZSTD_isError(init_result)) {
> +    ohshit(_("ZSTD_initDStream() error : %s"), 
> ZSTD_getErrorName(init_result));
> +  }
> +  to_read = init_result;
> +  while ((just_read = fd_read(fd_in, buf_in, to_read))) {
> +    ZSTD_inBuffer input = { buf_in, just_read, 0 };
> +    while (input.pos < input.size) {
> +      ZSTD_outBuffer output = { buf_out, buf_out_size, 0 };
> +      to_read = ZSTD_decompressStream(dstream, &output , &input);
> +      if (ZSTD_isError(to_read)) {
> +        ohshit(_("ZSTD_decompressStream() error : %s \n"),
> +               ZSTD_getErrorName(to_read));
> +      }
> +      fd_write(fd_out, output.dst, output.pos);

Return value not checked.

> +    }
> +  }
> +}

> diff --git a/man/dpkg-source.man b/man/dpkg-source.man
> index 2233d7a8d..991162003 100644
> --- a/man/dpkg-source.man
> +++ b/man/dpkg-source.man
> @@ -176,7 +176,7 @@ Specify the compression to use for created tarballs and 
> diff files
>  (\fB\-\-compression\fP since dpkg 1.15.5).
>  Note that this option will not cause existing tarballs to be recompressed,
>  it only affects new files. Supported values are:
> -.IR gzip ", " bzip2 ", " lzma " and " xz .
> +.IR gzip ", " bzip2 ", " lzma ", " zstd " and " xz .
>  The default is \fIxz\fP for formats 2.0 and newer, and \fIgzip\fP for
>  format 1.0. \fIxz\fP is only supported since dpkg 1.15.5.
>  .TP

> diff --git a/scripts/Dpkg/Compression.pm b/scripts/Dpkg/Compression.pm
> index 3dbc4adf0..4ea512fdc 100644
> --- a/scripts/Dpkg/Compression.pm
> +++ b/scripts/Dpkg/Compression.pm
> @@ -72,6 +72,12 @@ my $COMP = {
>       decomp_prog => [ 'unxz', '--format=lzma' ],
>       default_level => 6,
>      },
> +    zstd => {
> +     file_ext => 'zst',
> +     comp_prog => [ 'zstd', '-q' ],
> +     decomp_prog => [ 'unzstd', '-q' ],
> +     default_level => 19,
> +    },
>      xz => {
>       file_ext => 'xz',
>       comp_prog => [ 'xz' ],

I don't think it makes sense to add support for source packages, as
long as there's no significant usage by upstreams, which I don't think
there is.

> From 9dec1a3f6be2e3d525a92f5a123300618407cb19 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint.rec...@canonical.com>
> Date: Thu, 8 Mar 2018 10:14:30 +0100
> Subject: [PATCH 2/4] Add test for zstd decompression

This should be folded into the main implementation commit.

> From c927d94df0fdc59c25961505a5438b0dfc58710a Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint.rec...@canonical.com>
> Date: Fri, 9 Mar 2018 15:19:43 +0100
> Subject: [PATCH 3/4] dpkg: Support Zstandard compressed packages with multiple
>  frames

This should be folded with the main implementation commit.


> From d4b3f22299339f4b54f0013b5f86eff48db1e8c4 Mon Sep 17 00:00:00 2001
> From: Balint Reczey <balint.rec...@canonical.com>
> Date: Fri, 9 Mar 2018 11:19:24 +0100
> Subject: [PATCH 4/4] dpkg: Enable zstd uniform compression
> 
> ---
>  dpkg-deb/main.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/dpkg-deb/main.c b/dpkg-deb/main.c
> index 7f898210e..7a40ecb80 100644
> --- a/dpkg-deb/main.c
> +++ b/dpkg-deb/main.c
> @@ -245,6 +245,7 @@ int main(int argc, const char *const *argv) {
>    if (opt_uniform_compression &&
>        (compress_params.type != COMPRESSOR_TYPE_NONE &&
>         compress_params.type != COMPRESSOR_TYPE_GZIP &&
> +       compress_params.type != COMPRESSOR_TYPE_ZSTD &&
>         compress_params.type != COMPRESSOR_TYPE_XZ))
>      badusage(_("unsupported compression type '%s' with uniform compression"),
>               compressor_get_name(compress_params.type));

I'm not sure why this would be split into a different commit, or
introduced at a different point. Older dpkg-deb will either not understand
the data.tar.zst or both the data.tar.zst and the control.tar.zst, so
there's no point in delaying this.

Thanks,
Guillem

Reply via email to