Re: [Toybox] [PATCH] file, tar: basic zstd awareness.

Rob Landley Wed, 11 Dec 2024 03:27:51 -0800

On 12/10/24 12:37, enh wrote:

On Sun, Dec 8, 2024 at 12:51 AM Rob Landley <r...@landley.net> wrote:

On 12/7/24 18:39, enh wrote:

On Sat, Dec 7, 2024, 18:25 Rob Landley <r...@landley.net> wrote:

On 12/6/24 13:57, enh wrote:

We're seeing ever more zstd-compressed files in the wild, so even though
toybox can't compress/decompress zstd without an external helper, it
still seems useful to integrate with any that happens to be on the
system.
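A minimal sketch of what integrating with an external helper could look like in C, assuming a fork/exec/pipe approach: the archive is wired to the helper's stdin, and the decompressed data is read back from its stdout. The name pipe_through() is hypothetical and this is not toybox's actual code. "zstd -dc" is the standard zstd command line for decompressing to stdout.

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>   /* caller uses waitpid() on *pid */

/* Hypothetical helper: run argv[0] with stdin taken from in_fd and return
 * a file descriptor that reads the helper's stdout. The child's pid goes
 * in *pid so the caller can reap it. Returns -1 if pipe/fork fails. */
static int pipe_through(char *const argv[], int in_fd, pid_t *pid)
{
  int fds[2];

  if (pipe(fds)) return -1;
  *pid = fork();
  if (*pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (!*pid) {
    /* Child: stdin <- archive, stdout -> pipe, then become the helper. */
    if (in_fd != 0) {
      dup2(in_fd, 0);
      close(in_fd);
    }
    dup2(fds[1], 1);
    close(fds[0]);
    close(fds[1]);
    execvp(argv[0], argv);
    _exit(127);  /* helper (e.g. zstd) isn't installed */
  }
  close(fds[1]);
  return fds[0];
}
```

Usage would be something like `char *unzstd[] = {"zstd", "-dc", NULL}; fd = pipe_through(unzstd, archive_fd, &pid);`, reading tar data from fd and calling waitpid() at the end. An exit status of 127 means no helper was found on the system.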


No short option for zstd, even though every other explicit archive
format has one?


technically there are a couple of other compression options that are
longopt only,


In gnu/gnu.

such as --lzma (but i haven't added those here because i've
yet to see them used).

this probably made sense when it was added in 2019, and it wasn't clear how
popular zstd was going to become. (especially in comparison to the other
options we don't have.)

though tbh, zstd seems more popular in non-tar contexts ... i had to ask
the internet what the long and short extensions were!


Imma hijack -Z. I'm aware in debian that's "compress" but we've never
supported that format, which was patented in the 1980s, causing it to be
completely replaced by gzip except for some old legacy archives you can
"compress -d file.Z | tar x" if you like.
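A hypothetical lookup table for tar's short compression options with -Z reassigned to zstd as proposed above. The -z, -j and -J letters are the usual gzip, bzip2 and xz conventions; the struct and function names are made up for illustration and aren't toybox's.

```c
#include <stddef.h>

/* Hypothetical mapping from a short option letter to the external helper
 * tar would run. -Z is zstd here, not compress(1) as in Debian's tar. */
struct compressor {
  char opt;          /* short option letter */
  const char *name;  /* helper binary to run */
};

static const struct compressor compressors[] = {
  {'z', "gzip"},
  {'j', "bzip2"},
  {'J', "xz"},
  {'Z', "zstd"},
};

/* Return the helper name for option letter c, or NULL if there is none. */
static const char *helper_for_opt(char c)
{
  size_t i;

  for (i = 0; i < sizeof(compressors)/sizeof(*compressors); i++)
    if (compressors[i].opt == c) return compressors[i].name;
  return NULL;
}
```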


yeah, sounds reasonable.

coincidentally i saw https://www.phoronix.com/news/Linux-EFI-Zboot-Gzip-Zstd
"Linux EFI Zboot Abandoning "Compression Library Museum", Focusing On Gzip
& Zstd" which made me laugh, given that that had been my reaction to the
other formats that gnu tar supports (and has single-letter options for!)
that toybox tar doesn't (and almost certainly shouldn't) like lzip and lzop.
presumably characters from a children's show in a language i don't speak?

Way back when pkzip 2.0 came out there was arj and pak and zoo and several others, I was never entirely sure what the under-the-covers differences were (especially since the archive and the compression are two different formats). I also remember that zip itself supported a bunch of legacy formats (hence the Nancy Button: "Unzip, expand, explode, what pervert came up with this" in "the little calligraphic button catalogue on the prairie" circa 1984. I think that was the first one I got at that Dr. Who convention, "Don't crush that dwarf, hand me the calligraphic button catalogue" was later...)

I blogged about there being a similar group of compression formats (supported in the linux kernel's zimage and initramfs expanders) and having no idea which would "win", and winding up with xz because txz was the format kernel tarballs were available in and I found a public domain expander program.

I don't know what the difference between xz and zstd is, I've mostly avoided technology that comes from faceboot because zuckerberg and thiel somehow manage to be worse than gates and ballmer.

(I just like there to BE a short option, and another obvious contender
isn't presenting itself. Plus I haven't got an obvious way to test this
anyway.)


yeah, i just tested manually. it did occur to me that the test shell script
could check to see whether there's a zstd(1) binary on the path, and skip
any zstd tests if not?
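The "is there a zstd(1) on the PATH?" check amounts to the same search which(1) does. The real test suite is shell, so this C sketch only illustrates the logic; in_path() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return 1 if name is an executable regular file in one of the
 * colon-separated directories in path (e.g. getenv("PATH")), else 0.
 * An empty entry means the current directory, as in POSIX PATH handling. */
static int in_path(const char *name, const char *path)
{
  char buf[4096];
  struct stat st;

  while (path) {
    const char *colon = strchr(path, ':');
    int len = colon ? (int)(colon - path) : (int)strlen(path);
    int n = len ? snprintf(buf, sizeof(buf), "%.*s/%s", len, path, name)
                : snprintf(buf, sizeof(buf), "./%s", name);

    /* Skip entries that didn't fit rather than testing a truncated path. */
    if (n > 0 && (size_t)n < sizeof(buf) && !stat(buf, &st)
        && S_ISREG(st.st_mode) && !access(buf, X_OK)) return 1;
    path = colon ? colon + 1 : NULL;
  }
  return 0;
}
```

A test runner could then skip with something like `if (!in_path("zstd", getenv("PATH")))`. A NULL PATH simply reports "not found."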

At some point I need to categorize the skips. Not sure how yet, there's a missing design idea.

But "gnu/command never passed this", "musl never passed this", "busybox doesn't pass this", "bionic never passed this", "old glibc passes this but new one has version skew"...

I want more granularity out of skipped but dunno what the annotation(s) should be. Maybe "skip strings" added to the end of the line as a parenthetical? With a VERBOSE=why added to VERBOSE=allfailnopassquietspam


As I said, missing design work...

(and there's really no excuse for me not adding a file(1) test beyond "we
don't have tests for _most_ of the recognized formats", though "this is
just a constant prefix match" is a slightly better excuse.)
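Recognizing zstd really is a constant prefix match. Per the zstd format spec, a frame starts with the magic number 0xFD2FB528 stored little-endian, so the file begins with the bytes 28 B5 2F FD. A minimal sketch follows. is_zstd() is a hypothetical name, not the code in toybox's file.c, and it ignores skippable frames, which use a different magic range.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if buf starts with a zstd frame magic number, else 0.
 * 0xFD2FB528 little-endian is the byte sequence 28 B5 2F FD. */
static int is_zstd(const unsigned char *buf, size_t len)
{
  static const unsigned char magic[4] = {0x28, 0xb5, 0x2f, 0xfd};

  return len >= sizeof(magic) && !memcmp(buf, magic, sizeof(magic));
}
```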

I'm always up for adding more tests, but I haven't been trying to do so piecemeal because it doesn't save work for an eventual "trying to be systematic" pass where you go line by line through the source and relevant standards and write a test for every decision.

https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md
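In the format document linked above, the byte after the magic number is the Frame_Header_Descriptor, which says how big the rest of the frame header is. Here is a sketch that decodes it. The struct and function names are hypothetical; the bit layout follows that document.

```c
#include <stddef.h>

/* Decoded Frame_Header_Descriptor fields (the byte after the magic). */
struct zstd_fhd {
  unsigned fcs_bytes;   /* Frame_Content_Size field size: 0, 1, 2, 4 or 8 */
  int single_segment;   /* 1: no Window_Descriptor byte follows */
  int has_checksum;     /* 1: a 4-byte Content_Checksum ends the frame */
  unsigned did_bytes;   /* Dictionary_ID field size: 0, 1, 2 or 4 */
};

/* Decode descriptor byte b into *h. Returns -1 if the reserved bit is
 * set (the spec requires it to be zero), else 0. */
static int zstd_parse_fhd(unsigned char b, struct zstd_fhd *h)
{
  static const unsigned did_size[4] = {0, 1, 2, 4};
  unsigned fcs_flag = b >> 6;            /* bits 7-6 */

  if (b & 0x08) return -1;               /* bit 3: Reserved_bit */
  h->single_segment = (b >> 5) & 1;      /* bit 5 */
  h->has_checksum = (b >> 2) & 1;        /* bit 2 */
  h->did_bytes = did_size[b & 3];        /* bits 1-0 */
  /* Flag 0 means no size field, unless Single_Segment forces 1 byte. */
  h->fcs_bytes = fcs_flag ? 1u << fcs_flag : (unsigned)h->single_segment;
  return 0;
}
```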

I note that I have yet to see zstd tarballs in the wild. Not one of the kernel formats, not one of the linux from scratch formats...

Implementing "zip" is higher on my priority list, which means finishing deflate compression side, which means answering the dictionary reset question. (Although if I don't care about producing binary equivalent tarballs, "every X bytes" is fine. Maybe every 250k? The problem with calculating a non-default huffman tree is you need to read the data before compressing it to count the symbol frequency, so what's the input buffer size...)
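The "read the data before compressing it" problem can be sketched like this: read the input one fixed-size block at a time and count symbol frequencies for each block before encoding it. For simplicity, this counts raw bytes. Real deflate builds its dynamic Huffman tables from the literal/length and distance symbols produced by LZ77 matching, so an actual encoder would count those instead. The 250k block size is only the guess floated above, and all names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE (250 * 1024)  /* the "every 250k?" guess, not tuned */

/* Count how often each byte value occurs in buf. */
static void count_freqs(const unsigned char *buf, size_t len,
                        unsigned long freq[256])
{
  size_t i;

  memset(freq, 0, 256 * sizeof(*freq));
  for (i = 0; i < len; i++) freq[buf[i]]++;
}

/* Read fp one block at a time. Each block's histogram is what a dynamic
 * Huffman table for that block would be built from; per_block (may be
 * NULL) receives it. Returns the number of blocks read. */
static size_t scan_blocks(FILE *fp,
                          void (*per_block)(const unsigned long *freq, size_t len))
{
  static unsigned char buf[BLOCK_SIZE];
  unsigned long freq[256];
  size_t len, blocks = 0;

  while ((len = fread(buf, 1, sizeof(buf), fp)) > 0) {
    count_freqs(buf, len, freq);
    if (per_block) per_block(freq, len);
    blocks++;
  }
  return blocks;
}
```

The tradeoff is that a bigger block gives the tree more data to fit, at the cost of more buffered memory and less adaptation when the input's statistics shift.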


Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
