(looks like it breaks find too --- with O_PATH find stops reporting anything but the directory it's given.)
On Thu, Jun 20, 2019 at 4:27 PM enh <[email protected]> wrote: > > On Tue, Jun 18, 2019 at 12:24 AM Rob Landley <[email protected]> wrote: > > > > On 6/17/19 5:12 PM, enh wrote: > > >> Sigh. It's a pity I can't see what the actual failure is. (When you say > > >> timezone > > >> testing, do you mean toybox's date.test...?) > > > > > > no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of > > > Java to do what could be done in a couple of lines of shell, if only > > > we supported the latter. > > > > *blink* *blink* How is a change in toybox's shared plumbing making java > > fail...? > > (Is java shelling back out to the command line?) > > yes, exactly. that's why i'm grumbling about "we should have just > supported shell scripts". rewriting shell scripts as repeated calls to > adb in Java isn't helping anyone. (but it's the hammer they had, > so...) > > > I've debugged my way into the kernel and back out to track down weirdness, > > but I > > haven't got a reproduction environment for this. (Or nearly enough > > familiarity > > with the layers of java, selinux, and adb that seem to be involved in > > reproducing this). Is the vm _required_ for this to reproduce? What the... > > > > If you run the failing test twice (not rebuild, but redo the adb > > transaction) > > does it fail again? If so, does swapping in the previous toybox binary make > > it > > succeed? > > > > If that's the case we don't need the _build_, we just need the VM and the > > test > > harness... > > > > >>>>> increasingly convinced that the DIRTREE_STATELESS patch does break > > >>>>> something, and it's not just an infrastructure issue... i wouldn't > > >>>>> normally send such a useless bug report, but i've failed to get to > > >>>>> this in 3 days, and i'm not likely to for at least 3 more at this > > >>>>> point, so i thought i'd at least mention it...) > > >>>> > > >>>> This isn't going to break anything, is it? > > >>>> > > >>>> - openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags); > > >>>> + openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), > > >>>> flags); > > >>> > > >>> (one thing that occurred to me over the weekend is that it anywhere we > > >>> use O_PATH might break macOS, since there is no O_PATH there. > > >> > > >> #ifdef __APPLE__ > > >> #define O_PATH 0 > > >> #endif > > > > > > i actually meant behavioral breaks. specifically xabspath breaks the > > > usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule. > > > > O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac to > > work, > > I just didn't include a NOP. (Same with the above code snippet.) > > > > > i'm not worried about this right now, but it's something to think > > > about if we're going to seriously support macOS. > > > > I did Python for years, this is part of the reason I want a test suite > > covering > > everything. There's no substitute for thorough runtime testing. (Of course > > in C > > there's no substitute for thorough code review in _addition_ to thorough > > runtime > > testing.) > > > > That said, the interaction with another OS should boil down to A) supporting > > another libc (finding NEW ways to break sprintf), B) what system calls it > > makes, > > and the arguments thereto (which includes different structure layouts). > > That's > > assuming same toolchain and a processor type we've supported before. :) > > > > >>> but the > > >>> failures in question are on Android. [the builds in question don't > > >>> contain a new host prebuilt.]) > > >> > > >> So the host prebuilt is the same, you rebuild toybox from source, and > > >> then a > > >> test it runs afterwards fails? > > > > > > yes, this is only rebuilding the device toybox. and then the > > > should-be-unrelated test fails. > > > > Ok, sounds like you're already avoiding a full rebuild and doctoring the > > image > > to pass or not pass? > > > > > next time i'm actually at a real > > > computer i'll try to find out exactly what commands this test is > > > running on the device. > > > > I'm sorry this is being weird. I need to reproduce your test environment > > here so > > I can avoid this sort of thing in future (or at least be some help when it > > happens). > > > > Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't > > spot an > > adb call into a VM in that last I checked, but how would I know?) > > > > >> Is the test the only thing that fails? (Or does the build stop there?) > > > > > > afaict it's only this. (the tests all run after the build is done.) > > > > So you have a build image, and a command line that calls adb to do Things > > to it. > > Can the test run more than once, or does it modify the image to prevent > > itself > > from running again? (If you don't need to rebuild between each run, why > > does it > > take so long? The test is part of a large test suite and that's what hasn't > > been > > isolated yet?0 > > > > How big is the image, and what emulator do you need to run it? > > > > Also, how would I spray down an strace binary with enough selinux > > permissions > > that it could actually run on android? (Presumably there's some sort of > > setprop > > idkfa "^^vv<><>baSTART" call buried in the AOSP build scripts?) > > > > >> Can this sideload test be extracted from the larger build? Maybe I can > > >> get an > > >> image I can replace toybox in and try the test again to see what's going > > >> on? > > > > > > afaict it's only affecting x86? which is presumably another clue. but > > > i'm not certain that's true yet because i haven't had time to confirm > > > that everything still works on arm/arm64. > > > > You have an environment in which it reproduces, but it could be _anything_. > > For > > all we know it could be toybox passing a critical size which causes an extra > > block to be allocated in the filesystem which moves the xattr into a > > pathological alignment where we trigger an selinux bug. (I've hit, and > > debugged, > > crazier. That's why when I get a bug I FREEZE and don't NOBODY MOVE and we > > SNAPSHOT and pull out the lapidoptry dissection equipment before the needle > > we > > stepped on gets mixed back into the haystack.) > > > > Yeah, makes me a kinda slow developer at times. > > > > (I'm still thinking the O_PATH thing is the most likely externally visible > > change. That's something you could spot in strace.) > > still haven't had time to actually reproduce this properly, but > abusing the build servers in the background while i did other stuff > did let me confirm that it's the O_PATH. removing that fixes things > again. > > one symptom i noticed by accident when running toybox with the O_PATH > change locally: ps doesn't show any processes! oh, and it's > reproduceable on my desktop too. so i don't know that that's _the_ > issue, but that's certainly _an_ issue, and a convincing reason to > revert at least the O_PATH hunk. > > patch attached. > > > Rob _______________________________________________ Toybox mailing list [email protected] http://lists.landley.net/listinfo.cgi/toybox-landley.net
