and to the surprise of absolutely no-one... they got back to me with a test script to repro this with (which i assume will work in any kernel tree, and isn't specific to ACK)...

```
#!/bin/bash
set -x

# $1 is a label used to name the output tarball and the perf data file.
run () {
  start_time=$(date +%s)
  # Quote the pattern so the shell doesn't glob-expand *.h in the cwd.
  find /usr/include/linux/ -name '*.h' -print0 | \
    /usr/bin/perf record -g -- \
      tar czf target."$1".tar.gz \
        --absolute-names \
        --dereference \
        --transform "s,/usr/include/,," \
        --null -T -
  mv perf.data perf."$1".data
  end_time=$(date +%s)
  elapsed=$(( end_time - start_time ))
  echo "$elapsed s"
}
```

...and reported that all the time is going into tar fork()ing and exec()ing sed :-(

so although NORECURSE was a clever hack to work around the _first_ tar+sed problem, it seems to have had the expected result of landing me back with the original tar+sed problem :-(

On Wed, Aug 31, 2022 at 2:55 PM enh <e...@google.com> wrote:

> On Tue, Aug 16, 2022 at 12:22 PM enh <e...@google.com> wrote:
>
>> On Tue, Aug 16, 2022 at 10:28 AM enh <e...@google.com> wrote:
>>
>>> On Tue, Aug 16, 2022 at 1:43 AM Rob Landley <r...@landley.net> wrote:
>>>
>>>> On 8/15/22 18:50, enh via Toybox wrote:
>>>> > and here's their minimized repro case:
>>>> >
>>>> > echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> >
>>>> > cat /tmp/find.txt | prebuilts/build-tools/path/linux-x86/tar czf /tmp/out.tar.gz \
>>>> > --absolute-names \
>>>> > --transform 's,^/,,' -T -
>>>> >
>>>> > This fails with
>>>> >
>>>> > tar: bad xform
>>>>
>>>> Hmmm...
>>>>
>>>> $ echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> $ cat /tmp/find.txt | PATH=$PWD/sub9:$PATH ./tar czf out.tar.gz \
>>>> --absolute-names --transform 's,^/,,' -T -
>>>> $ tar tvf /tmp/out.tar.gz
>>>> -rw-r--r-- landley/landley 1 2022-08-16 01:53 tmp/foo.txt
>>>>
>>>> Working for me? (The sub9 bit was because I stuck toybox sed in the $PATH to make sure that wasn't it...)
>>>>
>>>
>>> repros for me, both with their prebuilt but also with a fresh clone (on either macos or linux):
>>>
>>> /tmp/toybox$ cat /tmp/find.txt | ./toybox tar czf /tmp/out.tar.gz --absolute-names --transform 's,^/,,' -T -
>>> tar: bad xform
>>> /tmp/toybox$
>>>
>>> a bit of printf debugging shows we're reading nothing back:
>>>
>>> /tmp/toybox$ cat /tmp/find.txt | strace -f ./toybox tar czf /tmp/out.tar.gz --absolute-names --transform 's,^/,,' -T - 2> /tmp/out
>>> argv[0]="sed"
>>> argv[1]="-e"
>>> argv[2]="s,^/,,"
>>> pid=1779946
>>> stdin="/tmp/foo.txt"
>>> len=0 Success
>>> total=0 result="(null)"
>>>
>>> but strace implies we're not actually exec()ing sed at all?
>>>
>>
>> and if i `CONFIG_TOYBOX_NORECURSE=y`, it calls sed and works...
>>
>
> ...though this might be about to come back and bite me. i'm hearing as-yet unconfirmed reports that toybox `tar czf` is a lot slower than gnu tar, and, given that they're using `--transform` while they're assuming it's tar or gzip, i'm wondering whether it's actually the fact that we're forking out to sed for every file?
>
> i've asked for repro steps or a `perf record` i can look at...
>
>>
>>> let me know if you've already fixed this on your branch and that's why you can't repro, otherwise i'll keep looking after my meeting...
>>>
>>>> > However, if the file names are fed via -T /tmp/find.txt, it works:
>>>>
>>>> Hmmm... the child process shouldn't have access to the parent's stdin, we replaced it with a pipe? There was a potential bug in that area, but commit dc8b46d5ddab should have fixed it last month and I don't _think_ it would have applied here anyway...
>>>>
>>>> > echo > /tmp/foo.txt; echo /tmp/foo.txt > /tmp/find.txt
>>>> >
>>>> > prebuilts/build-tools/path/linux-x86/tar czf /tmp/out.tar.gz \
>>>> > --absolute-names \
>>>> > --transform 's,^/,,' -T /tmp/find.txt
>>>> >
>>>> > (the "prebuilts/build-tools/path/linux-x86/" stuff is just a directory full of symlinks to toybox.)
>>>>
>>>> Multiplexer instead of standalone build shouldn't make a difference if you've disabled command recursion. (Modulo you're calling tar out of a specific path but it then grabs sed out of the $PATH, but I haven't yet implemented the extra argument processing that would specifically require toybox sed...)
>>>>
>>>> (The extra error message is a little trickier than my first guess because you can have multiple --xform things which turn into a list of -e entries to sed... Possibly instead of error_exit() I should error_msg(), dump the sed command line on a second line, and then xexit()...)
>>>>
>>>> Rob
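A rough way to see the cost being described at the top of this thread, outside of toybox: the bash sketch below (an illustration, not toybox's implementation) turns several transform expressions into a list of `-e` arguments for sed, then compares running a fresh sed for every filename against streaming all filenames through one long-lived sed coprocess. The coprocess approach is a swapped-in alternative, not something toybox is stated to do. It assumes bash 4.1 or later (for `coproc` and `{varname}` redirections), a sed that accepts `-u`, and that every expression maps one input line to exactly one output line; `build_sed_args`, `xform_per_file` and `xform_coproc` are hypothetical names.

```bash
#!/bin/bash
# Illustration only, not toybox's implementation. Assumes bash >= 4.1
# (coproc, {varname} redirections) and a sed that accepts -u.

sed_args=()

# Hypothetical helper: turn each transform expression into "-e EXPR",
# mirroring how several --xform options become a list of -e entries to sed.
build_sed_args () {
  sed_args=()
  local expr
  for expr in "$@"; do
    sed_args+=(-e "$expr")
  done
}

# Per-file approach: reads one name per line on stdin and starts a brand-new
# sed process (fork + exec) for every single name. Call build_sed_args first.
xform_per_file () {
  local name
  while IFS= read -r name; do
    printf '%s\n' "$name" | sed "${sed_args[@]}"
  done
}

# Long-lived approach: start sed once as a coprocess and stream every name
# through it. Assumes each expression maps one input line to exactly one
# output line (a "d" command would make the read below block forever).
xform_coproc () {
  local name out pid in_fd
  # -u makes sed flush its output after each line, so the read below
  # doesn't stall waiting for sed's output buffer to fill.
  coproc XFORM { sed -u "${sed_args[@]}"; }
  pid=$XFORM_PID
  in_fd=${XFORM[1]}
  while IFS= read -r name; do
    printf '%s\n' "$name" >&"$in_fd"
    IFS= read -r out <&"${XFORM[0]}"
    printf '%s\n' "$out"
  done
  # Closing sed's stdin makes it exit; then reap it.
  exec {in_fd}>&-
  wait "$pid"
}
```

For example, after `build_sed_args 's,/usr/include/,,'`, timing `find /usr/include/linux/ -name '*.h' | xform_per_file` against the same pipeline into `xform_coproc` (with bash's `time` keyword) would be expected to show the per-file version paying for one process startup per name, while the coprocess version pays for one in total.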
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net