On Mon, Feb 6, 2023 at 7:35 PM Rob Landley <r...@landley.net> wrote: > > > > On 2/6/23 12:07, enh wrote: > >> > -*- > >> > > >> > as for the new tar, i updated the prebuilts yesterday and we've seen > >> > enough OOM kills since that i've had to revert it. > >> > >> Grrr. I've got a couple of suspects for what I screwed up, but... Lemme > >> see if I > >> can work the ASAN leak detector into my workflow at all here. (There's a > >> lot of > >> commands that intentionally leak resources on the way out because the OS > >> will > >> free them, but there's a category of commands that should NOT do that > >> because > >> they process theoretically unbounded input and cannot be allowed to > >> leak-per.) > > I haven't wired up the ASAN leak detector yet.
i don't think there's a leak. > >> > every case i saw > >> > was from > >> > https://cs.android.com/android/platform/superproject/+/master:system/update_engine/Android.bp;l=947?q=ue_unittest_disk_imgs > > I wget that, and it was html. > > >> > so basically: > >> > ``` > >> > tar -jxf sample_images/sample_images.tar.bz2 -C gen disk_ext2_1k.img > >> > disk_ext2_4k.img disk_ext2_4k_empty.img disk_ext2_unittest.img > >> > ``` > >> > > >> > (direct link to the .tar.bz2: > >> > https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/) > > I wget _that_ and it was html. > > >> Wait, OOM on _extract_? Did I change the extract codepath? (That's > >> sounding more > >> like I damaged dirtree in the non-breadth case...) > > > > yeah, that didn't make much sense to me either, and i should have > > explicitly said "i'm seeing tar killed by SIGKILL and _assume_ that's > > the oom killer". but -- now i've had time to repro -- it looks like > > suicide instead: > > I committed a fix in the --sort path that in theory removed a memory leak, and > removed an "always add null terminator to empty directories thus dereferencing > null" thinko that was disguised by the memory leak. > > But when I stress tested "tar c /home | tar t" (over 1.5 terabytes of files > including an AOSP checkout) I don't like the memory behavior, it was up to 256 > megs used when I killed it. (Possibly just glibc's horrible heap, but I wanted > to poke at it more to be sure.) > > I still haven't fiddled with the extract path, that I know... > > > ~/aosp-master-with-phones$ ./out/host/linux-x86/bin/toybox tar -jxf > > system/update_engine/sample_images/sample_images.tar.bz2 -C > > out/soong/.temp/sbox/c744f840647dbf476d005753d9a7965d5af2771b/out/gen > > disk_ext2_1k.img disk_ext2_4k.img disk_ext2_4k_empty.img > > disk_ext2_unittest.img > > tar: had errors > > I didn't manage to grab your test image on friday because the THIRD link I > tried > https://android.googlesource.com/platform/system/update_engine/+/refs/heads/master/sample_images/sample_images.tar.bz2 > is still an html file with no obvious "download the darn file" link. There are > "txt" and "json" links at the bottom, but the txt was not a bzip2 file. > > I just went and looked at the code instead, on the theory this worked before > so > must have been part of the recent changes... > > > read(4, > > "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 512) = 512 > > kill(3992265, SIGKILL) = 0 > > wait4(3992265, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3992265 > > exit_group(0) = ? > > +++ exited with 0 +++ > > ... OOM killer maybe? no, sorry --- this was the point i was trying to make. _you're_ calling kill(). (if it was the OOM killer, you'd just see the signal appear out of nowhere.) i think this is probably the earlier race workaround coming back to bite us? https://github.com/landley/toybox/blob/master/toys/posix/tar.c#L1120 > > (yes, that pid is legit. now you see why i needed the column size fix > > for ps/top :-) ) > > > > (note that the command line there should give you convenient repro > > steps --- you don't need a whole AOSP tree, just the .tar.bz2 i linked > > to in the first post.) > > Ok, I backed up to "update engine" link which told me the git clone invocation > for this repo, and the resulting 485 files totaling 33 megabytes did include > the > relevant file, although at 6k you probably could have just attached it. > > Let's see... > > $ mkdir sub && cd sub > $ ~/toybox/toybox/tar xvf ../sample_images.tar.bz2 > disk_ext2_1k.img > disk_ext2_4k.img > disk_ext2_4k_empty.img > disk_ext2_unittest.img > disk_sqfs_empty.img > disk_sqfs_default.img > disk_sqfs_unittest.img > $ echo $? > 0 > > Seems to have worked with the test binary I made before that last checkin? > Unless the -C or explicit -j were doing something weird, or I need more to the > reproduction sequence. (Built with NDK or musl...?) it's definitely flaky for me, but i was able to reproduce most of the time. given my suspicion that it's related to your earlier attempt to deal with a race condition, the active ingredient between my 8/10 failure rate and your inability to repro might just be "i have a really fast machine"? > Rob _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net