On 12/10/24 12:37, enh wrote:
On Sun, Dec 8, 2024 at 12:51 AM Rob Landley <r...@landley.net> wrote:

On 12/7/24 18:39, enh wrote:
On Sat, Dec 7, 2024, 18:25 Rob Landley <r...@landley.net> wrote:

On 12/6/24 13:57, enh wrote:
We're seeing ever more zstd-compressed files in the wild, so even though
toybox can't compress/decompress zstd without an external helper, it
still seems useful to integrate with any that happens to be on the
system.

No short option for zstd, even though every other explicit archive
format has one?


technically there are a couple of other compression options that are
longopt only,

In gnu/gnu.

such as --lzma (but i haven't added those here because i've
yet to see them used).

this probably made sense when it was added in 2019, and it wasn't clear
how popular zstd was going to become. (especially in comparison to the
other options we don't have.)

though tbh, zstd seems more popular in non-tar contexts ... i had to ask
the internet what the long and short extensions were!

Imma hijack -Z. I'm aware in debian that's "compress" but we've never
supported that format, which was patented in the 1980s causing it to be
completely replaced by gzip except for some old legacy archives you can
"compress -d file.Z | tar x" if you like.


yeah, sounds reasonable.

coincidentally i saw https://www.phoronix.com/news/Linux-EFI-Zboot-Gzip-Zstd
"Linux EFI Zboot Abandoning "Compression Library Museum", Focusing On Gzip
& Zstd" which made me laugh, given that that had been my reaction to the
other formats that gnu tar supports (and has single-letter options for!)
that toybox tar doesn't (and almost certainly shouldn't) like lzip and lzop.
presumably characters from a children's show in a language i don't speak?

Way back when pkzip 2.0 came out there was arj and pak and zoo and several others, and I was never entirely sure what the under-the-covers differences were (especially since the archive and the compression are two different formats). I also remember that zip itself supported a bunch of legacy formats (hence the Nancy Button: "Unzip, expand, explode, what pervert came up with this" in "the little calligraphic button catalogue on the prairie" circa 1984. I think that was the first one I got at that Dr. Who convention, "Don't crush that dwarf, hand me the calligraphic button catalogue" was later...)

I blogged about there being a similar group of compression formats (supported in the linux kernel's zimage and initramfs expanders) and having no idea which would "win", and winding up with xz because txz was the format kernel tarballs were available in and I found a public domain expander program.

I don't know what the difference between xz and zstd is, I've mostly avoided technology that comes from faceboot because zuckerberg and thiel somehow manage to be worse than gates and ballmer.

(I just like there to BE a short option, and another obvious contender
isn't presenting itself. Plus I haven't got an obvious way to test this
anyway.)

yeah, i just tested manually. it did occur to me that the test shell script
could check to see whether there's a zstd(1) binary on the path, and skip
any zstd tests if not?
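
A minimal sketch of that "skip if there's no helper on the path" idea, in Python rather than the shell the toybox test suite actually uses. The names `have_helper` and `run_zstd_tests` are hypothetical, made up for illustration, and are not part of toybox or its test scripts:

```python
import shutil

def have_helper(name="zstd"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None

def run_zstd_tests(tests, helper="zstd"):
    """Run each test callable, or mark them all skipped if the helper is missing.

    Returns a list of (test name, result string) pairs.
    """
    if not have_helper(helper):
        reason = "SKIP (no %s on PATH)" % helper
        return [(t.__name__, reason) for t in tests]
    return [(t.__name__, "PASS" if t() else "FAIL") for t in tests]
```

Recording the reason next to the skip, instead of silently dropping the test, is one possible way to get the extra granularity discussed further down the thread.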

At some point I need to categorize the skips. Not sure how yet, there's a missing design idea.

But "gnu/command never passed this", "musl never passed this", "busybox doesn't pass this", "bionic never passed this", "old glibc passes this but new one has version skew"...

I want more granularity out of skipped but dunno what the annotation(s) should be. Maybe "skip strings" added to the end of the line as a parenthetical? With a VERBOSE=why added to VERBOSE=allfailnopassquietspam

As I said, missing design work...

(and there's really no excuse for me not adding a file(1) test beyond "we
don't have tests for _most_ of the recognized formats", though "this is
just a constant prefix match" is a slightly better excuse.)
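
For reference, the constant prefix in question can be sketched like this. The zstd format document gives the frame magic number as 0xFD2FB528 stored little-endian, so the first four bytes on disk are 28 B5 2F FD. The function name `looks_like_zstd` is hypothetical, and this is not how toybox file(1) is written (that's C); it also ignores skippable frames and the legacy formats:

```python
# zstd frame magic 0xFD2FB528, stored little-endian at the start of the file
ZSTD_MAGIC = bytes([0x28, 0xB5, 0x2F, 0xFD])

def looks_like_zstd(data):
    """Return True if `data` starts with the zstd frame magic number."""
    return data[:4] == ZSTD_MAGIC

def file_looks_like_zstd(path):
    """Read only the first four bytes of `path` and check them."""
    with open(path, "rb") as f:
        return looks_like_zstd(f.read(4))
```

A test for it only needs a four-byte input file, which is part of why the missing test is hard to excuse.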

I'm always up for adding more tests, but I haven't been trying to do so piecemeal because it doesn't save work for an eventual "trying to be systemic" pass where you go line by line through the source and relevant standards and write a test for every decision.


https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md

I note that I have yet to see zstd tarballs in the wild. Not one of the kernel formats, not one of the linux from scratch formats... Implementing "zip" is higher on my priority list, which means finishing the deflate compression side, which means answering the dictionary reset question. (Although if I don't care about producing binary equivalent tarballs, "every X bytes" is fine. Maybe every 250k? The problem with calculating a non-default huffman tree is you need to read the data before compressing it to count the symbol frequency, so what's the input buffer size...)
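
To illustrate the buffering problem: a rough sketch of the "every X bytes" approach, where each fixed-size chunk has to be read in full and its symbols counted before a Huffman tree for it can be built. This is a simplification under stated assumptions: it counts raw byte values, whereas a real deflate compressor counts literal/length and distance symbols after match finding. The 250k chunk size is just the number floated above, and `chunk_frequencies` is a hypothetical name:

```python
from collections import Counter

def chunk_frequencies(data, chunk_size=250_000):
    """Yield (chunk, Counter of byte values) for each chunk_size slice of data.

    The whole chunk must be buffered before its counts are complete,
    which is why the chunk size ends up being the input buffer size.
    """
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        yield chunk, Counter(chunk)
```

Each yielded count would feed the tree for that chunk only, so the reset interval and the buffer size end up being the same decision.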

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net