On 5/16/22 10:47, enh wrote:
> 
> 
> On Sat, May 14, 2022 at 5:59 AM Rob Landley <r...@landley.net
> <mailto:r...@landley.net>> wrote:
> 
>     On 5/11/22 15:20, enh via Toybox wrote:
>     > the toybox build is pretty noisy on a current mac, complaining that
>     > vfork() should be replaced by fork() or posix_spawn().
> 
>     Sigh.
> 
>     > looks like it's because they've changed vfork() to just be fork() and
>     > would like people to accept that they understand that by changing
>     > their source to say fork() instead... (i'll copy & paste the man page
>     > here because afaik apple doesn't give us anywhere to link to...)
> 
>     I.E. mac will never work on a nommu system. Got it.
>  
> eh, makes sense to me ... i recently rejected patches for non-4KiB page sizes
> because there's no obvious path to shipping such a thing, so it's just untested
> code that will bitrot immediately. (exhibit A: "this isn't the first time
> someone's tried to clean up the hard-coded 4096es".)

Even the architectures that support non-4k page size generally have a 4k page
size option. (Modulo huge pages, which we all just smile and hand wave...)
 
>     >      The vfork system call can be used to create new processes. As of macOS
>     >      12.0, this system call behaves identically to the fork(2) system call,
>     >      except without calling any handlers registered with pthread_atfork(2).
> 
>     That's very much not what it's for. Apple hasn't got anybody left there who
>     understands why it was there in the first place.
> 
> for the systems macOS _does_ run on though, the "the fork() ... exec() model
> doesn't work great if you have lots of fds/page table entries/etc" problem is
> what they're more likely to be thinking about.

For the page table entries, vfork() solves that because it defers their
allocation until exec time. Parent and child share everything INCLUDING THE
STACK which is why the parent is stopped until the child finishes (via exec or
_exit()), and the child can't return from the current function it's in (because
doing so and calling ANY other function would stomp the parent's return vector
on the stack).
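
A minimal sketch of what those rules look like in practice, assuming a POSIX
system that still provides vfork(). The helper name run_with_vfork() is made up
for illustration and is not from toybox:

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] via vfork() and return its exit status,
 * or -1 if the process could not be created or did not exit normally. */
int run_with_vfork(char *const argv[])
{
  int status;
  pid_t pid = vfork();

  if (pid < 0) return -1;
  if (!pid) {
    /* Child: still borrowing the parent's memory and stack. The only safe
     * moves are exec or _exit(); no return, no exit(), no stdio. */
    execvp(argv[0], argv);
    _exit(127);
  }

  /* Parent: only resumes here after the child has exec'd or _exit()ed. */
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child never leaves run_with_vfork(), so the parent's stack frame is still
intact when it wakes up.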

> linux added close_range() as a
> workaround for the fd problem, whereas Apple is (afaik) the only kernel that's
> actually implemented posix_spawn() as a system call.

Musl uses vfork() to implement posix_spawn(), albeit via clone:

        pid = __clone(child, stack+sizeof stack,
                CLONE_VM|CLONE_VFORK|SIGCHLD, &args);
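
For comparison, here is roughly what the caller's side of posix_spawn() looks
like. This is a sketch with a made-up helper name (run_with_spawn()); whether
the libc builds it on clone() as above or the kernel does it in one system call
is invisible at this level:

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn argv[0] (searched in PATH), wait for it, and
 * return its exit status, or -1 on failure. */
int run_with_spawn(char *const argv[])
{
  pid_t pid;
  int status;

  /* posix_spawnp() returns 0 on success or an errno value, never -1.
   * NULL file actions and attributes mean "inherit everything". */
  if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ)) return -1;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```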

> so they can theoretically
> at least -- i've never poked deeper -- do a better job than anyone else of
> avoiding unnecessary work when spawning. 

Except the 80/20 solution to this was already done for them via vfork(), which
people who _understood_ it have been using for 40 years even on systems with an
mmu.

>     (There are more nommu processors in the world than there are mmu processors.
>     It's sort of a "total weight of all insects" vs "total weight of all mammals"
>     thing. You just don't _notice_ them...)
> 
> aye, but zero of them are running macOS :-)

Indeed.

>     >      This system call is deprecated. In a future release, it may begin to
>     >      return errors in all cases, or may be removed entirely.  It is extremely
>     >      strongly recommended to replace all uses with fork(2) or, ideally,
>     >      posix_spawn(3).
>     >
>     > weirdly it looks like you can use _POSIX_C_SOURCE to make this go
>     > away? from their unistd.h:
>     >
>     > #if !defined(_POSIX_C_SOURCE)
>     > __deprecated_msg("Use posix_spawn or fork")
>     > #endif
>     > pid_t    vfork(void) __WATCHOS_PROHIBITED __TVOS_PROHIBITED;
> 
>     We can move that header into portability.h if we need to? It's #included before
>     the rest for a reason...
> 
>     > -Wno-deprecated-declarations in configure works too,
> 
>     I'm happy to wait for it to break and then fix it better then. :)
> 
> ack.

I can't test (modulo the github test which doesn't run reliably), happy to apply
either patch...

>     > though their
>     > threat to make vfork() always fail in a future release makes me
>     > question whether that's such a good idea. (i haven't followed apple
>     > closely enough to know whether there's any precedent for violent
>     > breakage like that. i'm not sure why you wouldn't just remove the
>     > symbol rather than replace it with an implementation that always
>     > fails?)
> 
>     Ask Rich Felker with his insistence that fork() must always be there on nommu
>     systems and return -EINVAL, and me patching it out with an #ifdef in my build
>     script.
> 
>     Rob
> 
>     P.S. A few years ago I did a contract at a place that ported an old system from
>     wince to Linux and ran an ~80-thread .net app under mono, and every time they
>     forked from one of those threads it froze the whole app for 75 milliseconds while
>     it copied all of its memory to the new process. (Because forking from a thread
>     defeats copy on write, the kernel doesn't even TRY to track it and just copies
>     everything, which involves locks.) Unfortunately something needed 4ms response
>     time to avoid going "boing". The fix was to replace the fork() with a vfork()
>     which only froze the ONE thread that called it (until it could do the rest of
>     its setup and _exit() or exec() the new process). No big memory copy, thus no
>     latency spike. On a system that had an MMU, and yet...
> 
> yeah, i thought i'd already said (but didn't spot it in my quoted bits when
> replying) that i've seen the Android system health folks switching a few fork()s
> to vfork()s lately.

If you UNDERSTAND it, it's actually a very nice optimization. Toybox uses it by
default so there's a single codepath that gets all the testing, but "faster and
cheaper" is a nice fringe benefit. :)

P.S. The XVFORK() macro isn't a config thing that uses fork or vfork, it always
uses vfork but what it's hiding is:

  lib/lib.h:#define XVFORK() xvforkwrap(vfork())

Which means vfork() is always called before xvforkwrap(), and then we
immediately jump into a new function call context where we don't have to worry
about stomping parent stack context and can do all the error checking and such
we like without games.

Plus it's annotated with returns_twice which lets the darn compiler know NOT to
"optimize out" local variables because it thinks they're no longer used and that
stack space can be reclaimed by variables with a later lifetime. (There was a
very annoying bug in netcat because of that. At one point there was either a
musl or a uClibc bug (I forget which) because setjmp() was missing that
annotation on some architecture too...)
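
A sketch of that wrapper pattern, assuming GCC or clang (for the attribute
syntax). The names my_vforkwrap(), MY_XVFORK() and spawn_status() are stand-ins
invented here; this is not toybox's actual xvforkwrap() body:

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a wrapper like xvforkwrap(). returns_twice tells
 * the compiler this call can "return" in both child and parent, so it must
 * not recycle the stack slots of locals that are live across the call. */
pid_t __attribute__((returns_twice)) my_vforkwrap(pid_t pid)
{
  /* Error handling lives in its own call frame, below the caller's. */
  if (pid == -1) {
    perror("vfork");
    exit(1);
  }
  return pid;
}

/* vfork() is evaluated in the caller's frame first, then its result is
 * passed to the wrapper. */
#define MY_XVFORK() my_vforkwrap(vfork())

/* Example caller: returns the child's exit status, or -1. */
int spawn_status(char *const argv[])
{
  int status;
  pid_t pid = MY_XVFORK();

  if (!pid) {
    execvp(argv[0], argv);
    _exit(127);
  }
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The vfork() call itself has to stay in the macro: if it moved inside the
wrapper function, the child would return out of the frame that called vfork(),
which is exactly the thing the child isn't allowed to do.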

> i have no immediate plans to make vfork() == fork(). people
> manage to screw up enough with just fork() (or they get really "clever" and try
> to use clone() directly) that i don't think i'd save myself much of a headache
> that way.

Oh it's worse than that, you can actually very subtly break stuff that has
"child must run to completion before parent advances" assumptions that fork()
doesn't honor (but can coincidentally do in any amount of testing, especially
when the scheduler prioritizes the child over the parent, which people fought
about as an optimization until Linux grew
/proc/sys/kernel/sched_child_runs_first which I am not making up. It's
technically slower if you set up the child then switch to the parent which is
just gonna call wait() and then switch back to the child to do the exec() work.
The question is really about cache flushing and which one is a win depends on
cache size, but I think the "subtle errors if you depend on it" means better bug
finding won out? Or something? I remember the kernel guys arguing about it, not
the conclusion...)
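
If code really does need "parent doesn't advance until the child has exec'd or
exited" and it's calling fork(), one way to get that without leaning on the
scheduler is a close-on-exec pipe. This is a sketch of that general technique
with a made-up helper name, not something toybox does:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>   /* callers need this to reap the returned pid */
#include <unistd.h>

/* Hypothetical helper: fork(), then block until the child has either exec'd
 * or exited. Returns the child's pid (still to be reaped) or -1. */
pid_t fork_until_exec(char *const argv[])
{
  int fds[2];
  char c;
  ssize_t n;
  pid_t pid;

  if (pipe(fds)) return -1;
  /* The write end closes by itself when exec succeeds. */
  if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) || (pid = fork()) < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (!pid) {
    close(fds[0]);
    execvp(argv[0], argv);
    _exit(127);
  }

  close(fds[1]);
  /* Nothing is ever written; read() returns 0 (EOF) once the child's copy of
   * the write end is gone, i.e. after a successful exec or after _exit(). */
  do n = read(fds[0], &c, 1); while (n > 0 || (n < 0 && errno == EINTR));
  close(fds[0]);
  return pid;
}
```

That makes the ordering explicit instead of something that happens to hold in
testing.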

I _think_ "child wrote something into a local variable that the parent then
reads" is technically undefined behavior because the vfork() child can run in a
copy of the stack on platforms that treat vfork() as a weird subset of
threading. It's sort of like a signal stack, but not really a win because unless
you're parsing where your local function starts on the stack you wind up having
to copy the whole stack which can be 8 megabytes, I think the last one I saw do
that kind of handwaved 64k and hoped for the best? (Note you STILL can't return
from the function with a truncated stack, now for a DIFFERENT reason, so what
have you gained exactly?) It's a bad idea, but so was most of Solaris.

But yeah, fork() != vfork() and people who think you can just search and replace
tend to introduce subtle bugs that take YEARS to find all of.

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
