I've gotten a bit tired of seeing "could not create semaphores: No space
left on device" failures in the buildfarm, so I looked into whether we
should consider preferring unnamed POSIX semaphores over SysV semaphores.

We've had code for named and unnamed POSIX semaphores in our tree for
a long time, but it's not actually used on any current platform AFAIK.
There are good reasons to avoid the named-semaphore variant: typically
that eats a file descriptor per sema per backend.  However that
complaint doesn't necessarily apply to unnamed semaphores.  Indeed,
it seems that on Linux an unnamed POSIX semaphore is basically a futex,
which eats zero kernel resources; all the state is in userspace.

Although in normal cases the semaphore code paths aren't very heavily
exercised in our code, I was able to get a measurable performance
difference by building with --disable-spinlocks, so that spinlocks are
emulated with semaphores.  On an 8-core RHEL6 machine, "pgbench -S -c 20
-j 20" seems to be about 4% faster with unnamed semaphores than SysV
semaphores.  It'd be good to replicate that test on some higher-end
hardware, but provisionally I'd say unnamed semaphores are faster.

The data structure is bigger: Linux's type sem_t is 32 bytes on 64-bit
machines (16 bytes on 32-bit) whereas we use 8 bytes for SysV semaphores.
But there aren't normally a huge number of semaphores in a cluster, and
anyway this comparison is cheating because it ignores the space taken for
the kernel data structures backing the SysV semaphores.
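
The sizes quoted above are what I see on Linux; a trivial helper like the following (the name report_sem_size is made up for this example) would let anyone check the figure on their own platform.

```c
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: print and return the userspace footprint of one
 * unnamed POSIX semaphore on the platform this is compiled for.
 */
size_t
report_sem_size(void)
{
    printf("sizeof(sem_t) = %zu bytes\n", sizeof(sem_t));
    return sizeof(sem_t);
}
```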

There was some previous discussion about this in
but that thread tailed off without a resolution, partly because it wasn't
the kind of change we'd consider making in late beta.  One thing
I expressed concern about there was whether there are any hidden kernel
resources underlying an unnamed semaphore.  So far as I can tell by
strace'ing sem_init and sem_destroy, there are not, at least on Linux.

Another issue is raised in today's discussion
where it appears that we might need to be more careful about putting
memory barriers into the unnamed-semaphore code (probably because it
might not enter the kernel).  But if that's a bug, we'd want to fix it
anyway, IMO.

So for Linux, I think probably we should switch.

macOS seems not to have unnamed POSIX semaphores, only named ones (the
functions exist, but they always fail with ENOSYS).  However, some
googling suggests that other BSD derivatives do have these primitives, so
somebody ought to do a similar comparison on them to see if switching is a
win.  (The first thread above asserts that it is for FreeBSD, but someone
should recheck using a test case that stresses semaphores more.)
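
Since the functions can exist and still fail at runtime, a link-time check alone isn't enough to tell whether a platform is usable. A rough runtime probe might look like this; the function name unnamed_sem_supported and its return convention are assumptions for illustration, not anything configure does today.

```c
#include <errno.h>
#include <semaphore.h>

/*
 * Hypothetical probe: returns 1 if a process-shared unnamed semaphore can
 * be created, 0 if the platform reports ENOSYS, -1 on any other error.
 */
int
unnamed_sem_supported(void)
{
    sem_t   sem;

    /* pshared = 1, since we would need cross-process semaphores. */
    if (sem_init(&sem, 1, 1) == 0)
    {
        sem_destroy(&sem);
        return 1;
    }
    return (errno == ENOSYS) ? 0 : -1;
}
```

This only shows that sem_init works at all; it says nothing about whether switching is a performance win on that platform.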

Dunno about other platforms.  sem_init is nominally required by SUS v2,
but it doesn't seem to actually exist everywhere, so I doubt we can drop
SysV altogether.  I'd be inclined to change the default on a
platform-by-platform basis, not whole hog.

If anyone wants to test, the main thing you have to do to try this in
the existing code is to add "USE_UNNAMED_POSIX_SEMAPHORES=1" and
"--disable-spinlocks" to your configure arguments.  On Linux you may need
to add -lrt to the backend LIBS list, though on my machine configure is
putting that in already.

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)