Date:        Sun, 10 Nov 2024 09:35:29 +0000
    From:        Taylor R Campbell <riastr...@netbsd.org>
    Message-ID:  <20241110093534.e816384...@mail.netbsd.org>

  | Yikes!  Do you have a reproducer handy for this?

Yes, it turns out to be trivially easy to do, once the
bug is understood - just very rare to actually happen in
anything in the wild.

All you need is

        main() {
                dup3(0, 5, O_CLOEXEC);
                /* use fcntl(5, F_GETFD) and verify
                   close-on-exec is set if you want */
                execl("verifier", "verifier", NULL);
        }

plus all the boilerplate (#include) etc, and a little error checking, etc)
to 

where "verifier" is just

        #!/bin/sh
        fdflags -v

(executable) - or anything else which checks there are no open
fds in the new process which shouldn't be there and none with
close-on-exec set which should be impossible in a newly created
process.

Then you need another main() using fcntl(F_DUPFD_CLOEXEC) (or
also put that in the same one) and another which does whatever
is required to get the kernel fd_clone() function to be called
with flags containing O_CLOEXEC, that one I can't do (now anyway)
because I didn't bother to work out what that would be exactly.

  | Can you file a PR to record the reproducer, and track pullups?

I could, but I don't really think it is needed for this one.

No ATF tests to check it either - the tests can only ever be
for the specific errors (specific ways of turning on close-on-exec
that are done improperly) which would now have to be some new
way (some new added functionality done improperly again) which
we cannot possibly write a test for now, only for the cases that
are already fixed (or were never wrong) which no-one is ever
going to deliberately (or even accidentally) go and break now.

I will submit pullups for it, and make sure they happen, so there
also isn't really a need for tracking anything, beyond what the
releng pullup tickets provide.   There doesn't even need to be
much testing in HEAD for this one before the pullups happen, the
fixes are so obvious, and so obviously correct.

A bigger issue would be if there's another case hiding somewhere
that I didn't find - I can't test for that as I don't know what
it is, if it exists.   However I doubt there is such a thing.

This is rarely observed in anything real, as all it takes is
one instance of setting close-on-exec the right way, even if that
fd is no longer still open when the exec happens, to hide the
existence of the bug in any other fd's which enabled the flag
using a method which the kernel didn't do properly.  So if you
added

        fcntl(dup(0), F_SETFD, 1);

to the above program then the broken case dup3() above, would
never be noticed.

Right now I need to work out how my changes to the shell (that
I was testing when I encountered the kernel problem - I was looking
very closely at when close-on-exec happened, and didn't) seem to
have broken the b5 i386 testbed (and perhaps more) quite so badly.

kre

Reply via email to