    Date:        Sun, 10 Nov 2024 09:35:29 +0000
    From:        Taylor R Campbell <riastr...@netbsd.org>
    Message-ID:  <20241110093534.e816384...@mail.netbsd.org>

  | Yikes!  Do you have a reproducer handy for this?

Yes, it turns out to be trivially easy to do, once the bug is
understood - it is just very rare for it to actually happen in
anything in the wild.

All you need is

        main()
        {
                dup3(0, 5, O_CLOEXEC);
                /* use fcntl(5, F_GETFD) and verify close-on-exec
                   is set if you want */
                execl("verifier", "verifier", NULL);
        }

(plus all the boilerplate, #include etc, and a little error
checking), where "verifier" is just

        #!/bin/sh
        fdflags -v

(executable) - or anything else which checks that there are no open
fds in the new process which shouldn't be there, and none with
close-on-exec set, which should be impossible in a newly created
process.

Then you need another main() using fcntl(F_DUPFD_CLOEXEC) (or also
put that in the same one), and another which does whatever is
required to get the kernel fd_clone() function to be called with
flags containing O_CLOEXEC.  That one I can't do (now anyway)
because I didn't bother to work out what that would be exactly.

  | Can you file a PR to record the reproducer, and track pullups?

I could, but I don't really think it is needed for this one.

No ATF tests to check it either - tests can only ever cover
specific errors (specific ways of turning on close-on-exec that are
done improperly).  Any future failure would have to come from some
new way (some newly added functionality, done improperly again),
which we cannot possibly write a test for now.  We could only test
the cases that are already fixed (or were never wrong), which no-one
is ever going to deliberately (or even accidentally) go and break
now.

I will submit pullups for it, and make sure they happen, so there
also isn't really a need for tracking anything, beyond what the
releng pullup tickets provide.

There doesn't even need to be much testing in HEAD for this one
before the pullups happen, the fixes are so obvious, and so
obviously correct.

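The dup3() and fcntl(F_DUPFD_CLOEXEC) cases described above might be
fleshed out along the following lines.  This is only a sketch under
stated assumptions: the helper names (fd_survives_exec, leak_via_dup3,
leak_via_fcntl) are made up for illustration, and a portable
"(: <&N)" test in /bin/sh stands in for "fdflags -v".  The check also
runs in a forked child so that the result can be reported back,
whereas the program described above execs directly; whether the fork
changes the behaviour on an affected kernel is untested here.

```c
#define _GNU_SOURCE     /* assumption: needed for dup3() on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original mail): fork, exec
 * /bin/sh, and let the new image report whether descriptor `fd`
 * is still open.
 * Returns 1 if fd survived the exec, 0 if it was closed, -1 on error.
 */
int fd_survives_exec(int fd)
{
    char script[64];
    int status;
    pid_t pid;

    /* "(: <&N)" exits 0 only when descriptor N is open in the shell. */
    snprintf(script, sizeof(script), "(: <&%d) 2>/dev/null", fd);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);     /* the exec itself failed */
    }
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 127)
        return -1;
    return WEXITSTATUS(status) == 0;
}

/* First case: dup3() with O_CLOEXEC onto descriptor `target`. */
int leak_via_dup3(int src, int target)
{
    int r;
    int fd = dup3(src, target, O_CLOEXEC);

    if (fd == -1)
        return -1;
    r = fd_survives_exec(fd);
    close(fd);
    return r;
}

/* Second case: fcntl(F_DUPFD_CLOEXEC), lowest free fd >= minfd. */
int leak_via_fcntl(int src, int minfd)
{
    int r;
    int fd = fcntl(src, F_DUPFD_CLOEXEC, minfd);

    if (fd == -1)
        return -1;
    r = fd_survives_exec(fd);
    close(fd);
    return r;
}
```

Each leak_via_* call returns 0 when close-on-exec was honoured, and 1
when the descriptor leaked into the new image.  Passing 0 and 5
matches the values used above, but note that dup3() silently closes
whatever was already open as the target descriptor.
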
A bigger issue would be if there's another case hiding somewhere
that I didn't find - I can't test for that as I don't know what
it is, if it exists.  However I doubt there is such a thing.

This is rarely observed in anything real, as all it takes is one
instance of setting close-on-exec the right way, even if that fd
is no longer open when the exec happens, to hide the existence of
the bug in any other fds which enabled the flag using a method
which the kernel didn't handle properly.  So if you added

        fcntl(dup(0), F_SETFD, 1);

to the above program, then the broken dup3() case above would
never be noticed.

Right now I need to work out how my changes to the shell (that I
was testing when I encountered the kernel problem - I was looking
very closely at when close-on-exec happened, and didn't) seem to
have broken the b5 i386 testbed (and perhaps more) quite so badly.

kre
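
The masking call mentioned above, fcntl(dup(0), F_SETFD, 1), can be
wrapped as a small helper so that it is easy to switch on and off in
a reproducer.  A sketch only: the function name is hypothetical, and
it closes the duplicate again straight away, relying on the statement
above that the descriptor need not still be open when the exec
happens.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set close-on-exec via plain F_SETFD on a
 * throwaway duplicate of `src`, then close the duplicate again.
 * Returns 1 if the flag read back as set, 0 if not, -1 on error.
 */
int set_cloexec_via_setfd(int src)
{
    int flags;
    int fd = dup(src);

    if (fd == -1)
        return -1;
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    flags = fcntl(fd, F_GETFD);     /* read the flag back */
    close(fd);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

Going by the description above, calling this before the dup3() in a
reproducer would be expected to hide the leak on an affected kernel,
while leaving it out keeps the bug visible.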