Re: [Toybox] [PATCH] Clean up xz a good amount

Oliver Webb via Toybox Tue, 27 Feb 2024 19:28:40 -0800

On Tuesday, February 27th, 2024 at 20:51, Oliver Webb via Toybox 
<[email protected]> wrote:
> On Tuesday, February 27th, 2024 at 20:01, Rob Landley [email protected] wrote:
> 
> > On 2/27/24 19:23, Oliver Webb wrote:
> > 
> > > The below patch does a bunch cleanup on xzcat.c.
> > > It resolves the ifdefs that were in the code (But everything they were 
> > > checking was defined so they didn't do anything)
> > > Adds Tests for errors (And while we are changing the test suite, converts 
> > > testing -> testcmd)
> > > Rearange/Rewrite large comments to be smaller (and sometimes C99!).
> > > It puts main at the bottom and do_xzcat above it, the way code is usually 
> > > arranged..
> > > Removes some function prototypes and the inline keyword (useless in 
> > > modern C since it doesn't force anything)
> > > Removes "!= 0" (And turns "x == 0" to "!x")
> > > Some types replaced with their LP64 counterparts (uint8_t to char, etc)
> > > 
> > > Yes, I know that this is large for a patch, but a lot of that is 
> > > shuffling code/comments. None of the logic has been
> > > changed. When all is said and done, this patch shrinks xzcat.c by over 10%
> > 
> > Applied (out of the list's spam filter, "message body too big"), but you've
> > introduced a glitch (missing flush maybe). When I tested it against the xz 
> > files
> > I had lying around via "xzcat $FILE | tar tv", one file failed with your 
> > patch
> > and succeeded before it. I copied it to my web server so you can take a 
> > look:
> > 
> > wget https://landley.net/musl-cross-make.tar.xz
> > 
> > (Yeah, it's 200 megs. Sorry about that. That was the first one that went 
> > "boing".)
> 
> 
> $ ./xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
> 
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | ./count -l > /dev/null
> 
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s
> 
> Hmmm, there seems to be missing bytes at the end
> 
> $ ./xzcat musl-cross-make.tar.xz | xxd > fi
> 
> 229834752 bytes, 219Mb, 153Mb/s, 0m01s
> $ xzcat musl-cross-make.tar.xz | xxd > fi2
> 
> 229836800 bytes, 219Mb, 509Mb/s, 0m00s14364673,14364800d14364672
> $ diff fi fi2
> < 0db30000: 2022 2469 220a 2020 646f 6e65 0a66 690a "$i". done.fi.
> < 0db30010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> [zeros until end of file]
> 
> putting a fflush at the end of reading a file doesn't fix anything, it's 
> losing
> exactly half a page so not flushing would also be my guess
> 
> > Also, I note that there is an "upstream" for this code, which may have 
> > changed
> > since I merged this (many moons ago), and any upstream changes are probably
> > worth a look:
> > 
> > https://git.tukaani.org/xz-embedded.git
> > 
> > The one in pending is an old version of that glued together into one file 
> > with a
> > config header, NEWTOY() and command_main() wrapper. (Yes, the upstream is 
> > public
> > domain! Extractor only, not compressor. I haven't found a public domain xz
> > compressor yet.)
> > 
> > I'm also unclear on the difference between xz and 7zip, and the linux 
> > kernel's
> > "general setup menu" also offers lzma, lzo, lz4, and zstd as initramfs
> > compression options. I just know kernel.org has tar.xz files, and
> > https://linuxfromscratch.org/lfs/view/12.0/chapter03/packages.html has .gz,
> > .bz2, and .xz tarballs, hence those are the three formats toybox can 
> > decompress.
> > 
> > It should be able to create gz and zip, which is also what
> > https://github.com/landley/toybox/tags offers to create. (Both of those are
> > deflate, I know how and wrote one in java once long ago. I just got 
> > sidetracked
> > on when to do dictionary resets to match the hashes of existing tarballs. 
> > The
> > answer is probably "nuts to your white mice" and just do it every 250k or
> > something, and oh well the hashes don't match but you get the data back and
> > that's the important thing. But first I wanted to grab a bunch of existing 
> > .gz
> > tarballs and CHECK when they do dictionary resets, because maybe it IS 
> > already
> > something simple like that.)
> > 
> > Rob
> > 
> > P.S. Alas, I don't make a habit of extensively reading incompatibly licensed
> > code before writing a 0BSD version, so this kind of question can linger a 
> > while
> > on the todo list when I could theoretically just go track it down in the 
> > source...


The symbols that differ:
name                                           old       new      delta
-----------------------------------------------------------------------
xz_dec_run                                    1901      1907          6
zhelp_data                                     100        98         -2
help_data                                       94        90         -4
do_xzcat                                       491       467        -24
xz_dec_bcj_reset                                41         0        -41
-----------------------------------------------------------------------
                                                                    -65 total
2 are help text, do_xzcat was due to replacing the case statement for error 
handling with a array
If I had to guess, xz_dec_run or xz_dec_bcj_reset

It was the case statement I stripped out in xz_dec_bcj_reset, it works with the 
attached patch

- Oliver Webb [email protected]

From 4b5f26948c13af8661a3113e8df3117abc05ed8b Mon Sep 17 00:00:00 2001
From: Oliver Webb <[email protected]>
Date: Tue, 27 Feb 2024 21:26:17 -0600
Subject: [PATCH] fix bug in xzcat.c

---
 toys/pending/xzcat.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/toys/pending/xzcat.c b/toys/pending/xzcat.c
index 5177f243..b50c9524 100644
--- a/toys/pending/xzcat.c
+++ b/toys/pending/xzcat.c
@@ -55,6 +55,7 @@ enum xz_ret {
   // truncated or otherwise corrupt.
 };
 
+
 // Passing input and output buffers to XZ code
 // Only the contents of the output buffer from out[out_pos] onward, and
 // the variables in_pos and out_pos are modified by the XZ code.
@@ -212,7 +213,7 @@ struct xz_dec_bcj {
 // in an x86 instruction.
 static int bcj_x86_test_msbyte(char b)
 {
-  return !b || !~b;
+  return !b || b == 0xFF;
 }
 
 static size_t bcj_x86(struct xz_dec_bcj *s, char *buf, size_t size)
@@ -639,6 +640,20 @@ enum xz_ret xz_dec_bcj_run(struct xz_dec_bcj *s, struct xz_dec_lzma2 *lzma2,
  */
 enum xz_ret xz_dec_bcj_reset(struct xz_dec_bcj *s, char id)
 {
+  switch (id) {
+  case BCJ_X86:
+  case BCJ_POWERPC:
+  case BCJ_IA64:
+  case BCJ_ARM:
+  case BCJ_ARMTHUMB:
+  case BCJ_SPARC:
+    break;
+
+  default:
+    /* Unsupported Filter ID */
+    return XZ_OPTIONS_ERROR;
+  }
+
   s->type = id;
   s->ret = XZ_OK;
   s->pos = 0;
@@ -2860,16 +2875,17 @@ void do_xzcat(int fd, char *name)
     if (ret == XZ_OK || ret == XZ_UNSUPPORTED_CHECK)
       continue;
 
-    else if (ret == XZ_STREAM_END) {
-      xz_dec_end(s);
-      return;
-    }
-
     if (fwrite(out, 1, b.out_pos, stdout) != b.out_pos) {
       msg = "Write error\n";
       goto error;
     }
 
+    if (ret == XZ_STREAM_END) {
+      xz_dec_end(s);
+      return;
+    }
+
+
     msg = (ret-3 < ARRAY_LEN(errors)) ? errors[ret-3] : "Bug!";
     goto error;
   }
-- 
2.44.0

_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net

Re: [Toybox] [PATCH] Clean up xz a good amount

Reply via email to