((as is very visible from running the find tests. i _really_ need to start automating some of these tests... could have saved a week that way :-( ))
On Thu, Jun 20, 2019 at 4:33 PM enh <[email protected]> wrote:
>
> (looks like it breaks find too --- with O_PATH find stops reporting
> anything but the directory it's given.)
>
> On Thu, Jun 20, 2019 at 4:27 PM enh <[email protected]> wrote:
> >
> > On Tue, Jun 18, 2019 at 12:24 AM Rob Landley <[email protected]> wrote:
> > >
> > > On 6/17/19 5:12 PM, enh wrote:
> > > >> Sigh. It's a pity I can't see what the actual failure is. (When you
> > > >> say timezone testing, do you mean toybox's date.test...?)
> > > >
> > > > no, it's "x86 CtsHostTzDataTests". afaict that's hundreds of lines of
> > > > Java to do what could be done in a couple of lines of shell, if only
> > > > we supported the latter.
> > >
> > > *blink* *blink* How is a change in toybox's shared plumbing making java
> > > fail...? (Is java shelling back out to the command line?)
> >
> > yes, exactly. that's why i'm grumbling about "we should have just
> > supported shell scripts". rewriting shell scripts as repeated calls to
> > adb in Java isn't helping anyone. (but it's the hammer they had,
> > so...)
> >
> > > I've debugged my way into the kernel and back out to track down
> > > weirdness, but I haven't got a reproduction environment for this. (Or
> > > nearly enough familiarity with the layers of java, selinux, and adb
> > > that seem to be involved in reproducing this). Is the vm _required_
> > > for this to reproduce? What the...
> > >
> > > If you run the failing test twice (not rebuild, but redo the adb
> > > transaction) does it fail again? If so, does swapping in the previous
> > > toybox binary make it succeed?
> > >
> > > If that's the case we don't need the _build_, we just need the VM and
> > > the test harness...
> > >
> > > >>>>> increasingly convinced that the DIRTREE_STATELESS patch does break
> > > >>>>> something, and it's not just an infrastructure issue...
> > > >>>>> i wouldn't normally send such a useless bug report, but i've
> > > >>>>> failed to get to this in 3 days, and i'm not likely to for at
> > > >>>>> least 3 more at this point, so i thought i'd at least mention
> > > >>>>> it...)
> > > >>>>
> > > >>>> This isn't going to break anything, is it?
> > > >>>>
> > > >>>> - openat(dirtree_parentfd(new), new->name, O_CLOEXEC), flags);
> > > >>>> + openat(dirtree_parentfd(new), new->name, O_PATH|O_CLOEXEC), flags);
> > > >>>
> > > >>> (one thing that occurred to me over the weekend is that anywhere we
> > > >>> use O_PATH might break macOS, since there is no O_PATH there.
> > > >>
> > > >> #ifdef __APPLE__
> > > >> #define O_PATH 0
> > > >> #endif
> > > >
> > > > i actually meant behavioral breaks. specifically xabspath breaks the
> > > > usual "you have to pick one of O_RDONLY/O_RDWR/O_WRONLY" rule.
> > >
> > > O_RDONLY is zero on Linux? We can | in O_RDONLY if we need that for mac
> > > to work, I just didn't include a NOP. (Same with the above code snippet.)
> > >
> > > > i'm not worried about this right now, but it's something to think
> > > > about if we're going to seriously support macOS.
> > >
> > > I did Python for years, this is part of the reason I want a test suite
> > > covering everything. There's no substitute for thorough runtime testing.
> > > (Of course in C there's no substitute for thorough code review in
> > > _addition_ to thorough runtime testing.)
> > >
> > > That said, the interaction with another OS should boil down to A)
> > > supporting another libc (finding NEW ways to break sprintf), B) what
> > > system calls it makes, and the arguments thereto (which includes
> > > different structure layouts). That's assuming same toolchain and a
> > > processor type we've supported before. :)
> > >
> > > >>> but the failures in question are on Android.
> > > >>> [the builds in question don't contain a new host prebuilt.])
> > > >>
> > > >> So the host prebuilt is the same, you rebuild toybox from source, and
> > > >> then a test it runs afterwards fails?
> > > >
> > > > yes, this is only rebuilding the device toybox. and then the
> > > > should-be-unrelated test fails.
> > >
> > > Ok, sounds like you're already avoiding a full rebuild and doctoring the
> > > image to pass or not pass?
> > >
> > > > next time i'm actually at a real computer i'll try to find out
> > > > exactly what commands this test is running on the device.
> > >
> > > I'm sorry this is being weird. I need to reproduce your test environment
> > > here so I can avoid this sort of thing in future (or at least be some
> > > help when it happens).
> > >
> > > Would telling AOSP to build a Pixel 3xl image help in any way? (I didn't
> > > spot an adb call into a VM in that last I checked, but how would I know?)
> > >
> > > >> Is the test the only thing that fails? (Or does the build stop there?)
> > > >
> > > > afaict it's only this. (the tests all run after the build is done.)
> > >
> > > So you have a build image, and a command line that calls adb to do Things
> > > to it. Can the test run more than once, or does it modify the image to
> > > prevent itself from running again? (If you don't need to rebuild between
> > > each run, why does it take so long? The test is part of a large test
> > > suite and that's what hasn't been isolated yet?)
> > >
> > > How big is the image, and what emulator do you need to run it?
> > >
> > > Also, how would I spray down an strace binary with enough selinux
> > > permissions that it could actually run on android? (Presumably there's
> > > some sort of setprop idkfa "^^vv<><>baSTART" call buried in the AOSP
> > > build scripts?)
> > >
> > > >> Can this sideload test be extracted from the larger build?
> > > >> Maybe I can get an image I can replace toybox in and try the test
> > > >> again to see what's going on?
> > > >
> > > > afaict it's only affecting x86? which is presumably another clue. but
> > > > i'm not certain that's true yet because i haven't had time to confirm
> > > > that everything still works on arm/arm64.
> > >
> > > You have an environment in which it reproduces, but it could be
> > > _anything_. For all we know it could be toybox passing a critical size
> > > which causes an extra block to be allocated in the filesystem which
> > > moves the xattr into a pathological alignment where we trigger an
> > > selinux bug. (I've hit, and debugged, crazier. That's why when I get a
> > > bug I FREEZE and don't NOBODY MOVE and we SNAPSHOT and pull out the
> > > lepidoptery dissection equipment before the needle we stepped on gets
> > > mixed back into the haystack.)
> > >
> > > Yeah, makes me a kinda slow developer at times.
> > >
> > > (I'm still thinking the O_PATH thing is the most likely externally
> > > visible change. That's something you could spot in strace.)
> >
> > still haven't had time to actually reproduce this properly, but
> > abusing the build servers in the background while i did other stuff
> > did let me confirm that it's the O_PATH. removing that fixes things
> > again.
> >
> > one symptom i noticed by accident when running toybox with the O_PATH
> > change locally: ps doesn't show any processes! oh, and it's
> > reproducible on my desktop too. so i don't know that that's _the_
> > issue, but that's certainly _an_ issue, and a convincing reason to
> > revert at least the O_PATH hunk.
> >
> > patch attached.
> >
> > > Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
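A minimal sketch of why the O_PATH hunk plausibly breaks directory walkers such as find and ps: per Linux open(2), a descriptor opened with O_PATH only names a location and cannot be used for read-style operations, so listing the directory through it fails with EBADF. The count_entries() helper below is hypothetical and is not toybox code. The fallback O_PATH value is the asm-generic Linux one and is an assumption on other architectures.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

// Fallback for when the libc headers hide O_PATH. This is the asm-generic
// Linux value (x86, arm); other architectures may differ (an assumption).
#ifndef O_PATH
#define O_PATH 010000000
#endif

// Hypothetical helper, not toybox code: open "dir" with "extra" ORed into
// the open flags, then count its entries (including "." and "..").
// Returns -1 if the directory can't be opened or can't be listed.
int count_entries(const char *dir, int extra)
{
  int fd = openat(AT_FDCWD, dir, O_RDONLY|O_CLOEXEC|extra), count = 0;
  DIR *dp;

  if (fd == -1) return -1;
  if (!(dp = fdopendir(fd))) {
    close(fd);
    return -1;
  }

  // readdir() returns NULL both at end of directory and on error, so
  // errno is the only way to tell the two apart.
  errno = 0;
  while (readdir(dp)) {
    count++;
    errno = 0;
  }
  if (errno) count = -1;
  closedir(dp);

  return count;
}
```

On Linux, count_entries("/proc", 0) should return a positive count while count_entries("/proc", O_PATH) should return -1, which would be consistent with ps showing no processes and find reporting only its starting directory.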
