On Tuesday, February 27th, 2024 at 20:51, Oliver Webb via Toybox
<[email protected]> wrote:
> On Tuesday, February 27th, 2024 at 20:01, Rob Landley [email protected] wrote:
>
> > On 2/27/24 19:23, Oliver Webb wrote:
> >
> > > The below patch does a bunch cleanup on xzcat.c.
> > > It resolves the ifdefs that were in the code (But everything they were
> > > checking was defined so they didn't do anything)
> > > Adds Tests for errors (And while we are changing the test suite, converts
> > > testing -> testcmd)
> > > Rearange/Rewrite large comments to be smaller (and sometimes C99!).
> > > It puts main at the bottom and do_xzcat above it, the way code is usually
> > > arranged..
> > > Removes some function prototypes and the inline keyword (useless in
> > > modern C since it doesn't force anything)
> > > Removes "!= 0" (And turns "x == 0" to "!x")
> > > Some types replaced with their LP64 counterparts (uint8_t to char, etc)
> > >
> > > Yes, I know that this is large for a patch, but a lot of that is
> > > shuffling code/comments. None of the logic has been
> > > changed. When all is said and done, this patch shrinks xzcat.c by over 10%
> >
> > Applied (out of the list's spam filter, "message body too big"), but you've
> > introduced a glitch (missing flush maybe). When I tested it against the xz
> > files
> > I had lying around via "xzcat $FILE | tar tv", one file failed with your
> > patch
> > and succeeded before it. I copied it to my web server so you can take a
> > look:
> >
> > wget https://landley.net/musl-cross-make.tar.xz
> >
> > (Yeah, it's 200 megs. Sorry about that. That was the first one that went
> > "boing".)
>
>
> $ ./xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
>
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
>
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s
>
> Hmmm, there seems to be missing bytes at the end
>
> $ ./xzcat musl-cross-make.tar.xz | xxd > fi
>
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | xxd > fi2
>
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s14364673,14364800d14364672
> $ diff fi fi2
> < 0db30000: 2022 2469 220a 2020 646f 6e65 0a66 690a "$i". done.fi.
> < 0db30010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> [zeros until end of file]
>
> putting a fflush at the end of reading a file doesn't fix anything, it's
> losing
> exactly half a page so not flushing would also be my guess
>
> > Also, I note that there is an "upstream" for this code, which may have
> > changed
> > since I merged this (many moons ago), and any upstream changes are probably
> > worth a look:
> >
> > https://git.tukaani.org/xz-embedded.git
> >
> > The one in pending is an old version of that glued together into one file
> > with a
> > config header, NEWTOY() and command_main() wrapper. (Yes, the upstream is
> > public
> > domain! Extractor only, not compressor. I haven't found a public domain xz
> > compressor yet.)
> >
> > I'm also unclear on the difference between xz and 7zip, and the linux
> > kernel's
> > "general setup menu" also offers lzma, lzo, lz4, and zstd as initramfs
> > compression options. I just know kernel.org has tar.xz files, and
> > https://linuxfromscratch.org/lfs/view/12.0/chapter03/packages.html has .gz,
> > .bz2, and .xz tarballs, hence those are the three formats toybox can
> > decompress.
> >
> > It should be able to create gz and zip, which is also what
> > https://github.com/landley/toybox/tags offers to create. (Both of those are
> > deflate, I know how and wrote one in java once long ago. I just got
> > sidetracked
> > on when to do dictionary resets to match the hashes of existing tarballs.
> > The
> > answer is probably "nuts to your white mice" and just do it every 250k or
> > something, and oh well the hashes don't match but you get the data back and
> > that's the important thing. But first I wanted to grab a bunch of existing
> > .gz
> > tarballs and CHECK when they do dictionary resets, because maybe it IS
> > already
> > something simple like that.)
> >
> > Rob
> >
> > P.S. Alas, I don't make a habit of extensively reading incompatibly licensed
> > code before writing a 0BSD version, so this kind of question can linger a
> > while
> > on the todo list when I could theoretically just go track it down in the
> > source...
The symbols that differ:
name old new delta
-----------------------------------------------------------------------
xz_dec_run 1901 1907 6
zhelp_data 100 98 -2
help_data 94 90 -4
do_xzcat 491 467 -24
xz_dec_bcj_reset 41 0 -41
-----------------------------------------------------------------------
-65 total
2 are help text, do_xzcat was due to replacing the case statement for error
handling with a array
If I had to guess, xz_dec_run or xz_dec_bcj_reset
It was the case statement I stripped out in xz_dec_bcj_reset, it works with the
attached patch
- Oliver Webb [email protected]From 4b5f26948c13af8661a3113e8df3117abc05ed8b Mon Sep 17 00:00:00 2001
From: Oliver Webb <[email protected]>
Date: Tue, 27 Feb 2024 21:26:17 -0600
Subject: [PATCH] fix bug in xzcat.c
---
toys/pending/xzcat.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/toys/pending/xzcat.c b/toys/pending/xzcat.c
index 5177f243..b50c9524 100644
--- a/toys/pending/xzcat.c
+++ b/toys/pending/xzcat.c
@@ -55,6 +55,7 @@ enum xz_ret {
// truncated or otherwise corrupt.
};
+
// Passing input and output buffers to XZ code
// Only the contents of the output buffer from out[out_pos] onward, and
// the variables in_pos and out_pos are modified by the XZ code.
@@ -212,7 +213,7 @@ struct xz_dec_bcj {
// in an x86 instruction.
static int bcj_x86_test_msbyte(char b)
{
- return !b || !~b;
+ return !b || b == 0xFF;
}
static size_t bcj_x86(struct xz_dec_bcj *s, char *buf, size_t size)
@@ -639,6 +640,20 @@ enum xz_ret xz_dec_bcj_run(struct xz_dec_bcj *s, struct xz_dec_lzma2 *lzma2,
*/
enum xz_ret xz_dec_bcj_reset(struct xz_dec_bcj *s, char id)
{
+ switch (id) {
+ case BCJ_X86:
+ case BCJ_POWERPC:
+ case BCJ_IA64:
+ case BCJ_ARM:
+ case BCJ_ARMTHUMB:
+ case BCJ_SPARC:
+ break;
+
+ default:
+ /* Unsupported Filter ID */
+ return XZ_OPTIONS_ERROR;
+ }
+
s->type = id;
s->ret = XZ_OK;
s->pos = 0;
@@ -2860,16 +2875,17 @@ void do_xzcat(int fd, char *name)
if (ret == XZ_OK || ret == XZ_UNSUPPORTED_CHECK)
continue;
- else if (ret == XZ_STREAM_END) {
- xz_dec_end(s);
- return;
- }
-
if (fwrite(out, 1, b.out_pos, stdout) != b.out_pos) {
msg = "Write error\n";
goto error;
}
+ if (ret == XZ_STREAM_END) {
+ xz_dec_end(s);
+ return;
+ }
+
+
msg = (ret-3 < ARRAY_LEN(errors)) ? errors[ret-3] : "Bug!";
goto error;
}
--
2.44.0
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net