On 4/22/21 8:06 AM, Yi-yo Chiang via Toybox wrote:
> Was playing with the new cpio command and spotted a few oddities. Some of which
> I'm not sure are bugs or WAI?

This got caught in gmail's spam filter, just fished it out. Is it still
relevant?

> 1. cpio -i might not preserve mtime, due to later entries might modify previous
> entries' mtime.
>
> $ mkdir a && touch a/b
> $ touch -d @0 a a/b
> $ ls -al a
> total 8
> drwxr-xr-x 2 yochiang primarygroup 4096 Jan 1 1970 .
> drwxr-xr-x 4 yochiang primarygroup 4096 Apr 22 20:35 ..
> -rw-r--r-- 1 yochiang primarygroup 0 Jan 1 1970 b

If they're modifying it, then they're changing the mtime, yes. Tar saves
directory time modifications and applies them later to mitigate this:

https://github.com/landley/toybox/blob/0.8.4/toys/posix/tar.c#L417

> $ # both a/ and a/b have timestamp at epoch+0
> $ find a | toybox cpio -H newc -o >a.cpio
> $ mkdir stage && cd stage
> $ toybox cpio -i <../a.cpio
> $ ls -al a
> total 8
> drwxr-xr-x 2 yochiang primarygroup 4096 Apr 22 20:37 .
> drwxr-xr-x 3 yochiang primarygroup 4096 Apr 22 20:37 ..
> -rw-r--r-- 1 yochiang primarygroup 0 Jan 1 1970 b
>
> The timestamp of a/b is correct, but a/ isn't. This is because a/ 's timestamp
> was updated when we create a/b.

Exactly.

> Not sure if this is a design choice to simplify code?

What does the kernel extractor do?

> Fixing this could mean we need a "fix-up" phase after all entries are extracted
> and fix up all the extracted file's st_mtime, which means we would memorize the
> list of all files we extract, which doesn't sound like a good idea in terms of
> memory consumption?

Just directories, and you can make simplifying assumptions about all the files
in a directory coming right after that directory so you have a single stack
you're going down and then you pop your way back up. I did this for tar. I
didn't bother for cpio because nobody'd asked.

> 2. Archives created by cpio command are non-deterministic due to unstable inode
> numbers.
>
> $ # using the same a.cpio from previous example
> $ toybox cpio -idu <../a.cpio
> $ find a | toybox cpio -H newc -o | sha1sum
> d17aa2355dc17239b90cae724d74d6a56bef67c3 -
> $ rm -rf ./*
> $ toybox cpio -idu <../a.cpio
> $ find a | toybox cpio -H newc -o | sha1sum
> bf1428382bdb9240fedb38c46746a30d25ae4daa -
>
> Even though the source files are exactly the same, the produced archives have
> different contents. Upon close inspection the diff happens in the st_ino and
> st_mtime field.
>
> How about we add an option, say "-s" for "stable" or "-P" for "Portability",
> that changes the output to have deterministic output by renumbering st_ino,
> st_mtime, st_dev and such?

Easy enough to do, but I haven't even implemented hardlink support yet. (This
stuff still isn't my day job, and I generally spin off todo items faster than I
get to them. I still haven't addressed the test suite pathing issue from
http://landley.net/notes-2021.html#30-04-2021 for example...)

I can add it to the todo heap, but I'm currently distracted elsewhere. I
recently noticed that bash job control keeps a persistent exited PID result list
forever (it's cleared by a call to "wait" with no arguments, but not by anything
ELSE I've noticed yet):

  $ exit 37 &
  [1] 16876
  $ for i in $(seq 1 100); do exit $i & done
  ...
  $ wait 16876 ; echo $?
  37
  $ wait 16900; echo $?
  bash: wait: pid 16900 is not a child of this shell
  127
  landley@driftwood:~/toybox/toybox$ wait 16876 ; echo $?
  37

AND that "set -b" (notify of job termination immediately) exists, which means
the job control plumbing has to be SIGCHLD based and thus the tables being
updated have to be accessed in a signal safe manner DESPITE being dynamically
resizable, which means the job control plumbing I've implemented so far has to
be redesigned. (I actually noticed this yesterday but was busy with $DAYJOB
stuff and just got back to it, and haven't finished the redesign yet. I think I
need two different (volatile *) to make this work...)

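For illustration only, the directory stack idea from the mtime discussion above
(push a directory when it is created, set its saved mtime once an entry outside
it shows up) could be sketched roughly like this. This is Python rather than the
toybox C code, it is not what tar.c actually does line for line, and extract()
plus its (path, is_dir, mtime) tuples are hypothetical names made up for the
example. It assumes each directory is listed before its contents and that its
contents follow it contiguously, as "find a" output does.

```python
import os
import tempfile


def extract(entries, root):
    """Create entries in archive order, restoring directory mtimes late.

    entries: iterable of (relative_path, is_dir, mtime) tuples (hypothetical
    format), each directory listed before its contents.
    """
    pending = []  # stack of (relative_dir_path, mtime) for "open" directories

    def pop_finished(path):
        # Any directory on the stack that `path` is not inside of is done:
        # nothing else will be created in it, so its mtime can be set now.
        while pending and not path.startswith(pending[-1][0] + "/"):
            name, mtime = pending.pop()
            os.utime(os.path.join(root, name), (mtime, mtime))

    for path, is_dir, mtime in entries:
        pop_finished(path)
        full = os.path.join(root, path)
        if is_dir:
            os.makedirs(full, exist_ok=True)
            pending.append((path, mtime))  # defer the mtime until we leave
        else:
            with open(full, "wb"):
                pass  # empty file is enough for the demonstration
            os.utime(full, (mtime, mtime))

    pop_finished("")  # end of archive: unwind whatever is still on the stack


if __name__ == "__main__":
    stage = tempfile.mkdtemp()
    extract([("a", True, 0), ("a/b", False, 0)], stage)
    print(os.stat(os.path.join(stage, "a")).st_mtime)
```

Memory use is bounded by the directory nesting depth rather than the number of
files, which is the point of the simplifying assumption. An archive that
revisits a directory after leaving it would defeat this and need a real fix-up
list instead.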
Rob
