On Tue, Oct 3, 2023 at 1:44 PM Rob Landley <r...@landley.net> wrote: > > On 10/3/23 13:38, enh wrote: > >> Trying that by hand on devuan (using coreutils' timeout): > >> > >> $ timeout .1 / > >> timeout: failed to run command ‘/’: Permission denied > >> $ echo $? > >> 126 > >> > >> From the bash man page: > >> > >> If a command is not found, the child process created to execute it re‐ > >> turns a status of 127. If a command is found but is not executable, > >> the return status is 126. > >> > >> I'm not sure how you can "file not found" the root directory? (Selinux? > >> Filehandle exhaustion? Even chmod 000 should return EPERM not ENOENT.) > >> > >> The relevant code is xwrap.c line 233: > >> > >> execvp(argv[0], argv); > >> > >> toys.exitval = 126+(errno == ENOENT); > > > > +Colin Cross who just saw this too. > So why is execvp("/", {"/", 0}); returning ENOENT? It's saying it cannot > _find_ > the root directory, not that it can't execute a directory. Hmmm... > > The execvp man page says: > > If the specified filename includes a slash character, then PATH is ig‐ > nored, and the file at the specified pathname is executed. > > Which seems like it would moot: > > If permission is denied for a file (the attempted execve(2) failed with > the error EACCES), these functions will continue searching the rest of > the search path. If no other file is found, however, they will return > with errno set to EACCES. > > Which would still be returning something other than ENOENT anyway. > > Hmmm... > > If the header of a file isn't recognized (the attempted execve(2) > failed with the error ENOEXEC), these functions will execute the shell > (/bin/sh) with the path of the file as its first argument. (If this > attempt fails, no further searching is done.) > > I don't THINK that's a likely fallback path here? Although /bin/sh not found > might explain it. But that would be deterministically reproducible and you're > having an intermittent issue, right?
correct. ccross tells me it's ~2% of all runs in CI. locally, i commented out all the other tests, and just ran this repeatedly on a device, and it did repro after about half an hour. i've kicked off a similar test on the host, and i've kicked off the device again but with strace in the mix (which hopefully doesn't slow things down enough to make the problem disappear!). > $ cat > potato.c << EOF > #include <unistd.h> > int main(int argc, char *argv[]) { execvp("/", (char *[]){"/", 0}); } > EOF > $ gcc potato.c > $ strace ./a.out > execve("/", ["/"], 0x7ffebd0880d8 /* 36 vars */) = -1 EACCES (Permission > denied) > $ sudo strace ./a.out > execve("/", ["/"], 0x7ffc32654e08 /* 16 vars */) = -1 EACCES (Permission > denied) > $ ls -ld / > drwxr-xr-x 24 root root 4096 Feb 9 2023 / > > In general root doesn't care about permission bits, and there's no /bin/sh > follow-up to the syscall failure here. Tried again with bionic and there were > a > couple extra mprotect() calls on the way out but still no /bin/sh fallback... > > So the question here is does the kernel have a weird intermittent codepath, or > does bionic+selinux have a weird intermittent codepath? (yeah, that's why i'm trying on the host now.) > Let's see: in the vanilla kernel source fs/exec.c has SYSCALL_DEFINE3(execve) > which does return do_execve(getname(filename), argv, envp); which wraps > do_execveat_common() on line 1888 of the same file. > > A quick cheat grepping for EACCES shows two uses in this file, one in > SYSCALL_DEFINE1(uselib) which I just BOGGLE at because how are shared > libraries > THE KERNEL'S PROBLEM... but I really doubt we get there here. No, the NORMAL > codepath (which we're apparently not reaching) is do_open_execat(int fd, > struct > filename *name, int flags) which says no, may_open() already checked and this > is > just a race condition check, and it's common plumbing in another file that > returns this error code. Alright, cheat failed, back to drilling. > > Back to do_execveat_common(): filename was not a NULL pointer or similar. > UCOUNT_RLIMIT_NPROC would return -EAGAIN. What error code might alloc_bprm() > return, it's on line 1512 of this same file and it is understandably ENOMEM. > count() can return EFAULT, E2BIG, and ERESTARTNOHAND. (Huh, launching a > process > with an argv of { NULL } has a kernel workaround with shaking finger of shame > in > the log? Did not know that.) (yeah, surprisingly that "broke userspace" in the mild sense of "we had tests" that made sure the _dynamic linker_ didn't crash in that circumstance. and of course, we only had that test because it'd happened in real life. but, yeah, i'm pretty happy with "don't do that".) > bprm_stack_limits() can set E2BIG. > copy_string_kernel() and copy_strings() can both EFAULT or E2BIG. > > And now we're on to bprm_execve(), which I can drill through after lunch... > > Rob > > P.S. I note that 127 is to me an ACCEPTABLE failure return code for this since > attempting to run the root directory is shenanigans in the first place. From > toybox's perspective, it's possible the test is being unnecessarily specific > here. But it would be nice to understand what's going on, pursuing this we may > learn something about your system setup... when i wrote the test i was trying to make sure we tested _all_ the paths through.... ...but there's a bug _somewhere_ for this to be non-deterministic. > P.P.S. If this is (char)-1 getting returned in the wrong place by some obscure > codepath... I'm not gonna be _that_ surprised, to tell the truth. > Disappointed, > but not really surprised. > > P.P.P.S. I assume this was seen on 64 bit arm android of a current flavor > running the test suite through vendor_init or some such? this was a hwasan build on cheetah. (a) because that's what happens to be on my desk, but (b) because ccross' example was arm64 hwasan too. _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net