Re: fork speed vs /bin/sh

2003-11-28 Thread Terry Lambert
Peter Wemm wrote:
> What this shows is that vfork() is 3 times faster than fork() on static
> binaries, and 9 times faster on dynamic binaries.  If people are
> worried about a 40% slowdown, then perhaps they'd like to investigate
> a speedup that works no matter whether it's static or dynamic?  There is
> a reason that popen(3) uses vfork().  /bin/sh should too, regardless of
> whether it's dynamic or static.  csh/tcsh already uses vfork() for the
> same reason.

I'm a big fan of vfork(); the one problem I have with the use of
it is that people tend to treat it as "a faster fork()", when it
definitely is not.  The utility of vfork() is limited to the list
of allowed system calls, which are _exit() and execve(); all other
usage is undefined -- specifically, you cannot control things like
whether it's the parent or the child that gets affected by calls
like setsid(), setpgrp(), etc.
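
As a rough illustration of the restriction described above, here is a
minimal sketch of a vfork() child that touches nothing except execve()
and _exit().  The helper name spawn_wait() and the exit code 127 for a
failed exec are assumptions made for this example, not anything taken
from /bin/sh or libc.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run the program at argv[0] and return its exit
 * status, or -1 if the process could not be created or did not exit
 * normally.  Between vfork() and execve()/_exit() the child calls
 * nothing else.
 */
int
spawn_wait(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = vfork();
    if (pid == -1)
        return (-1);
    if (pid == 0) {
        /* Child: only execve() and _exit() are used here. */
        execve(argv[0], argv, environ);
        _exit(127);     /* reached only if execve() failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Calling spawn_wait() with an argument vector such as
{ "/bin/sh", "-c", "exit 3", NULL } would be expected to return 3.
Anything like setsid() or setpgrp() would have to happen in the exec'd
program itself rather than in the vfork() child.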

The other place that vfork() really sucks is in applications like
"screen" or other applications that have multiple children and act
as mux'es for them: during the vfork() to spawn off a new child
from the parent, the parent is stalled, and this in turn stalls
all the children, as well.

The vfork() system call is a good thing, particularly compared to
the fork() system call, IFF it's used appropriately.

For the most part, FreeBSD should consider creating a posix_spawn()
system call, instead, for most uses to which people put either the
fork() or vfork() system calls today.
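
For readers who have not seen the interface, POSIX specifies
posix_spawn() in <spawn.h>.  The following is a minimal sketch of the
usual fork-then-exec pattern expressed with it, assuming a platform
that ships that interface; the helper name spawn_status() is made up
for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <spawn.h>

extern char **environ;

/*
 * Hypothetical helper: start the program at argv[0] with posix_spawn()
 * and return its exit status, or -1 on failure.  No file actions or
 * spawn attributes are used (both arguments are NULL).
 */
int
spawn_status(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    /* posix_spawn() returns 0 on success and an error number otherwise. */
    if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return (-1);
    if (waitpid(pid, &status, 0) == -1)
        return (-1);
    return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

The caller never runs any code in a half-created child, which is what
sidesteps the vfork() restrictions discussed above.  Things such as
descriptor redirection go through the file-actions and attribute
arguments instead.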

-- Terry
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: fork speed vs /bin/sh

2003-11-27 Thread Matthew Dillon
:What this shows is that vfork() is 3 times faster than fork() on static
:binaries, and 9 times faster on dynamic binaries.  If people are
:worried about a 40% slowdown, then perhaps they'd like to investigate
:a speedup that works no matter whether it's static or dynamic?  There is
:a reason that popen(3) uses vfork().  /bin/sh should too, regardless of
:whether it's dynamic or static.  csh/tcsh already uses vfork() for the
:same reason.
:
:NetBSD have already taken advantage of this speedup and their /bin/sh uses
:vfork().  Some enterprising individual who cares about /bin/sh speed should
:check that out.  Start looking near #ifdef DO_SHAREDVFORK.

That isn't really a fair comparison because your vfork is hitting a
degenerate case and isn't actually doing anything significant.  You
really need to exec() something.  I've included a program below 
that [v]fork/exec's "./sh -c exit 0" 5000 times.

Dell2550, 2xCPU (MP build), DFly

0.000u  4.095s 0:02.53 161.6%  154+107k 0+0io 0pf+0w  VFORK/EXEC STATIC SH
0.000u  6.681s 0:04.04 165.3%   94+97k  0+0io 0pf+0w  FORK/EXEC STATIC SH
0.500u 16.844s 0:16.34 106.1%   53+84k  0+0io 0pf+0w  VFORK/EXEC DYNAMIC SH
0.093u 18.303s 0:23.86  77.0%   42+79k  0+0io 0pf+0w  FORK/EXEC DYNAMIC SH


Athlon64, 2xCPU (UP), DFly

0.078u 0.687s 0:00.74 101.3%  399+226k 0+0io 0pf+0w  VFORK/EXEC STATIC SH
0.117u 0.968s 0:01.07 100.0%  273+208k 0+0io 0pf+0w  FORK/EXEC STATIC SH
2.218u 2.484s 0:04.71  99.5%  121+180k 0+0io 1pf+0w  VFORK/EXEC DYNAMIC SH
2.281u 2.773s 0:04.98 101.4%  113+179k 0+0io 0pf+0w  FORK/EXEC DYNAMIC SH

1.304u 2.289s 0:03.60  99.4%  121+180k 0+0io 0pf+0w  VFORK/EXEC DYNAMIC SH WITH PREBINDING
1.296u 2.648s 0:03.90 100.7%  112+180k 0+0io 1pf+0w  FORK/EXEC DYNAMIC SH WITH PREBINDING



These results were rather unexpected, actually.  I'm not sure why the
numbers on the DELL box are so bad with a dynamic 'sh' but I suspect that
the dynamic linking is blowing out the L1 cache.

In any case, taking the Athlon64 system, the difference between static and
dynamic is around 4 seconds, while the difference between vfork and fork
is only around 0.25 seconds, so while moving to vfork() helps, it doesn't
help all that much.

Unless you happen to be hitting a boundary condition on the L1 cache,
that is.  If that is indeed the case on the Dell box (which only
has a 16K L1 cache, whereas the AMD64 has a 64K L1 cache), then the
difference is around 14 seconds between vfork static and vfork dynamic
versus an additional 8 seconds going from vfork to fork.  Vfork would
probably be a significant improvement on the Dell box.
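
The arithmetic behind those figures can be reproduced from the elapsed
(wall-clock) column of the timings above.  This is only a sketch: the
struct and function names are invented for the example, and the
vfork-to-fork cost is taken from the dynamic rows, which is an
assumption about which rows were being compared.

```c
#include <assert.h>
#include <stddef.h>

/* Elapsed seconds for 5000 [v]fork/exec iterations, from the tables. */
struct timing {
    double vfork_static;
    double fork_static;
    double vfork_dynamic;
    double fork_dynamic;
};

static const struct timing dell2550 = { 2.53, 4.04, 16.34, 23.86 };
static const struct timing athlon64 = { 0.74, 1.07, 4.71, 4.98 };

/* Extra time for a dynamic sh over a static sh, both via vfork(). */
double
linking_cost(const struct timing *t)
{
    assert(t != NULL);
    return (t->vfork_dynamic - t->vfork_static);
}

/* Extra time for fork() over vfork(), both with a dynamic sh. */
double
fork_cost(const struct timing *t)
{
    assert(t != NULL);
    return (t->fork_dynamic - t->vfork_dynamic);
}
```

With these numbers linking_cost() comes to about 3.97 seconds on the
Athlon64 and 13.81 on the Dell2550, while fork_cost() is about 0.27 and
7.52 seconds respectively.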

Prebinding yields around a 20% improvement for the dynamic 'sh'
on the Athlon64, but on the Dell2550 prebinding actually made things
go slower (not shown above), from 23.8 seconds to 26 seconds.  I
think there is an edge case due to prebinding having a greater L1 cache
impact.  For larger, more complex programs prebinding shows definite,
if small, improvements.

-Matt

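/*
 * CD into the directory containing the ./sh executable before running
 */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
    int i;
    pid_t pid;

    for (i = 0; i < 5000; ++i) {
        if ((pid = vfork()) == 0) { /* < CHANGE THIS FORK/VFORK */
            /* child: run "./sh -c exit 0"; falls through only on failure */
            execl("./sh", "./sh", "-c", "exit", "0", (char *)NULL);
            write(2, "problem\n", 8);
            _exit(1);
        }
        /* parent: reap the child before starting the next iteration */
        if (pid > 0)
            waitpid(pid, NULL, 0);
    }
    return(0);
}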



fork speed vs /bin/sh

2003-11-27 Thread Peter Wemm
I *know* I'm going to regret posting this, but if people care about
the speed of their shell, then perhaps you want to look at this:

[EMAIL PROTECTED]:46am]/tmp-149> cc -O -o vforkathon.dynamic vforkathon.c
[EMAIL PROTECTED]:46am]/tmp-150> cc -O -static -o vforkathon.static vforkathon.c
[EMAIL PROTECTED]:47am]/tmp-151> cc -O -static -o forkathon.static forkathon.c
[EMAIL PROTECTED]:47am]/tmp-152> cc -O -o forkathon.dynamic forkathon.c
[EMAIL PROTECTED]:47am]/tmp-153> time ./forkathon.dynamic
0.120u 17.192s 0:17.81 97.1%  5+169k 0+0io 0pf+0w
[EMAIL PROTECTED]:47am]/tmp-154> time ./forkathon.static
0.051u  5.939s 0:06.38 93.7% 15+177k 0+0io 0pf+0w
[EMAIL PROTECTED]:47am]/tmp-155> time ./vforkathon.dynamic
0.015u  2.006s 0:02.30 87.3%  5+176k 0+0io 0pf+0w
[EMAIL PROTECTED]:48am]/tmp-156> time ./vforkathon.static
0.022u  2.020s 0:02.34 87.1% 16+182k 0+0io 0pf+0w

What this shows is that vfork() is 3 times faster than fork() on static
binaries, and 9 times faster on dynamic binaries.  If people are
worried about a 40% slowdown, then perhaps they'd like to investigate
a speedup that works no matter whether it's static or dynamic?  There is
a reason that popen(3) uses vfork().  /bin/sh should too, regardless of
whether it's dynamic or static.  csh/tcsh already uses vfork() for the
same reason.

NetBSD have already taken advantage of this speedup and their /bin/sh uses
vfork().  Some enterprising individual who cares about /bin/sh speed should
check that out.  Start looking near #ifdef DO_SHAREDVFORK.

In case anybody was wondering:

[EMAIL PROTECTED]:48am]/tmp-157> cat forkathon.c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(int ac, char *av[])
{
    int i;
    pid_t pid;

    for (i = 0; i < 10; i++) {
        pid = fork();
        switch (pid) {
        case 0:
            _exit(0);
        default:
            waitpid(pid, NULL, 0);
        }
    }
}
[EMAIL PROTECTED]:53am]/tmp-158> diff forkathon.c vforkathon.c
12c12
<   pid = fork();
---
>   pid = vfork();


Cheers,
-Peter
--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
"All of this is for nothing if we don't go to the stars" - JMS/B5
