Re: [PATCH 1/1] Initial mach based shared memory support.

2021-03-25 Thread David Steele

On 1/19/21 6:50 AM, James Hilliard wrote:

On Mon, Jan 18, 2021 at 11:12 PM Tom Lane  wrote:


James Hilliard  writes:

from my understanding due to mach semantics a number of the sanity checks
we are doing for sysv shmem are probably unnecessary when using mach
API's due to the mach port task bindings(mach_task_self()) effectively
ensuring our maps are already task bound and won't conflict with other tasks.


Really?  If so, this whole patch is pretty much dead in the water,
because the fact that sysv shmem is globally accessible is exactly
why we use it (well, that and the fact that you can find out how many
processes are attached to it).  It's been a long time since we cared
about sysv shmem as a source of shared storage.  What we really use
it for is as a form of mutual exclusion, i.e. being able to detect
whether any live children remain from a dead postmaster.  That is,
PGSharedMemoryIsInUse is not some minor ancillary check, it's the main
reason this module exists at all.  If POSIX-style mmap'd shmem had the
same capability we'd likely have dropped sysv altogether long ago.

>

Oh, I had figured the mutual exclusion check was just to ensure that
one didn't accidentally overlap an existing shared memory map.


At the very least, this patch does not seem to be ready for review, so I 
have marked it Waiting on Author.


Since it appears the approach needs to be reconsidered, my 
recommendation is to mark it Returned with Feedback at the end of the CF 
so I'll do that unless there are objections.


Thanks,
--
-David
da...@pgmasters.net




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-20 Thread Thomas Munro
On Sun, Nov 22, 2020 at 9:19 PM James Hilliard
 wrote:
> In order to avoid hitting these limits we can bypass the wrapper layer
> and just use mach directly.

FWIW I looked into using mach_vm_alllocate() years ago because I
wanted to be able to use its VM_FLAGS_SUPERPAGE_SIZE_2MB flag to
implement PostgreSQL's huge_pages option for the main shared memory
area, but I ran into some difficulty getting such mapping to be
inherited by fork() children.  There may be some way to get past that,
but it seems the current crop of Apple Silicon has only 4KB and 16KB
pages and I don't know if that's interesting enough.  On the other
hand, I just saw a claim that "running an arm64 Debian VM on Apple M1,
using a 16K page size instead of 4K reduces this kernel build time by
*16%*".

[1] https://twitter.com/AtTheHackOfDawn/status/1333895115174187011




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-19 Thread Tom Lane
Peter Eisentraut  writes:
> This is the first I've heard in years of shared memory limits being a 
> problem on macOS.  What settings or what levels of concurrency do you 
> use to provoke these errors?

I suppose it wouldn't be too hard to run into shmmni with aggressive
parallel testing; the default is just 32.  But AFAIK it still works
to set the shm limits via /etc/sysctl.conf.  (I wonder if you have
to disable SIP to mess with that, though.)

regards, tom lane




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-19 Thread Peter Eisentraut

On 2020-11-22 09:19, James Hilliard wrote:

OSX implements sysv shmem support via a mach wrapper, however the mach
sysv shmem wrapper has some severe restrictions that prevent us from
allocating enough memory segments in some cases.

These limits appear to be due to the way the wrapper itself is
implemented and not mach.

For example when running a test suite that spins up many postgres
instances I commonly see this issue:
DETAIL:  Failed system call was shmget(key=5432002, size=56, 03600).


This is the first I've heard in years of shared memory limits being a 
problem on macOS.  What settings or what levels of concurrency do you 
use to provoke these errors?






Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-19 Thread James Hilliard
On Mon, Jan 18, 2021 at 11:12 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > from my understanding due to mach semantics a number of the sanity checks
> > we are doing for sysv shmem are probably unnecessary when using mach
> > API's due to the mach port task bindings(mach_task_self()) effectively
> > ensuring our maps are already task bound and won't conflict with other 
> > tasks.
>
> Really?  If so, this whole patch is pretty much dead in the water,
> because the fact that sysv shmem is globally accessible is exactly
> why we use it (well, that and the fact that you can find out how many
> processes are attached to it).  It's been a long time since we cared
> about sysv shmem as a source of shared storage.  What we really use
> it for is as a form of mutual exclusion, i.e. being able to detect
> whether any live children remain from a dead postmaster.  That is,
> PGSharedMemoryIsInUse is not some minor ancillary check, it's the main
> reason this module exists at all.  If POSIX-style mmap'd shmem had the
> same capability we'd likely have dropped sysv altogether long ago.
Oh, I had figured the mutual exclusion check was just to ensure that
one didn't accidentally overlap an existing shared memory map.

If it's just an additional sanity check maybe we should make it non-fatal for
ENOSPC returns as it is impossible to increase the sysv shm segment limits
on OSX from my understanding.

But there should be a way to do this with mach, I see a few options, the task
bound map behavior from my understanding is what you get when using
mach_task_self(), however you can also look up a task using pid_for_task()
from my understanding(there may be security checks here that could cause
issues). It also may make sense to write the postmaster task ID and use that
instead of the PID. I'll try and rework this.

By the way is PGSharedMemoryIsInUse only using the pid(id2) for looking up
the shmem segment key or is something else needed? I noticed there is an
unused id1 variable as well that gets passed to it.
>
> I've occasionally wondered if we should take another look at file locking
> as a mechanism for ensuring only one postmaster+children process group
> can access a data directory.  Back in the day it was too untrustworthy,
> but maybe that has changed.
Yeah, there's probably some ugly edge cases there still.
>
> regards, tom lane




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-18 Thread Tom Lane
James Hilliard  writes:
> from my understanding due to mach semantics a number of the sanity checks
> we are doing for sysv shmem are probably unnecessary when using mach
> API's due to the mach port task bindings(mach_task_self()) effectively
> ensuring our maps are already task bound and won't conflict with other tasks.

Really?  If so, this whole patch is pretty much dead in the water,
because the fact that sysv shmem is globally accessible is exactly
why we use it (well, that and the fact that you can find out how many
processes are attached to it).  It's been a long time since we cared
about sysv shmem as a source of shared storage.  What we really use
it for is as a form of mutual exclusion, i.e. being able to detect
whether any live children remain from a dead postmaster.  That is,
PGSharedMemoryIsInUse is not some minor ancillary check, it's the main
reason this module exists at all.  If POSIX-style mmap'd shmem had the
same capability we'd likely have dropped sysv altogether long ago.

I've occasionally wondered if we should take another look at file locking
as a mechanism for ensuring only one postmaster+children process group
can access a data directory.  Back in the day it was too untrustworthy,
but maybe that has changed.

regards, tom lane




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-18 Thread James Hilliard
On Mon, Jan 18, 2021 at 5:29 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > OSX implements sysv shmem support via a mach wrapper, however the mach
> > sysv shmem wrapper has some severe restrictions that prevent us from
> > allocating enough memory segments in some cases.
> > ...
> > In order to avoid hitting these limits we can bypass the wrapper layer
> > and just use mach directly.
>
> I wanted to review this, but it's impossible because the kernel calls
> you're using have you've-got-to-be-kidding documentation like this:
>
> https://developer.apple.com/documentation/kernel/1402504-mach_vm_page_info?language=objc
FYI there's a good chance I'm not using some of these info API's correctly,
from my understanding due to mach semantics a number of the sanity checks
we are doing for sysv shmem are probably unnecessary when using mach
API's due to the mach port task bindings(mach_task_self()) effectively
ensuring our maps are already task bound and won't conflict with other tasks.

The mach vm API for our use case does appear to be superior to the shmem API
on top of not being bound by the sysv wrapper limits due to it being
natively task
bound as opposed to normal sysv shared memory maps which appears to be
system wide global maps and thus requires lots of key conflict checks which
should be entirely unnecessary when using task bound mach vm maps.

For example PGSharedMemoryIsInUse() from my understanding should always
return false for the mach vm version as it is already task scoped.

I'm not 100% sure here as I'm not very familiar with all the ways postgres uses
shared memory however. Is it true that postgresql shared memory is strictly used
by child processes fork()'d from the parent postmaster? If that is the
case I think
as long as we ensure all our map operations are bound to the parent postmaster
mach_task_self() port then we should never conflict with another postmaster's
process tree's memory maps.

I tried to largely copy existing sysv shmem semantics in this first
proof of concept as
I'm largely unfamiliar with postgresql internals but I suspect we can
safely remove
much of the map key conflict checking code entirely.

Here's a few additional references I found which may be helpful.

This one is more oriented for kernel programming, however much of the info is
relevant to the userspace interface as well it would appear:
https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/KernelProgramming/vm/vm.html#//apple_ref/doc/uid/TP3905-CH210-TPXREF109

This book gives an overview of the mach userspace interface as well, although
I don't think it's a fully complete API reference, but it covers some
of the basic use
cases:
https://flylib.com/books/en/3.126.1.89/1/

Apparently the hurd kernel uses a virtual memory interface that is based on
the mach kernel, and it has more detailed API docs, note that these tend to
not have the "mach_" prefix but appear to largely function the same:
https://www.gnu.org/software/hurd/gnumach-doc/Virtual-Memory-Interface.html

There is additional documentation from the OSFMK kernel(which was combined with
XNU by apple from my understanding) here related to the virtual memory
interface,
like the herd docs these tend to not have the "mach_" prefix:
http://web.mit.edu/darwin/src/modules/xnu/osfmk/man/
>
> Google finds the source code too, but that's equally bereft of useful
> documentation.  So I wonder where you obtained the information necessary
> to write this patch.
>
> I did notice however that mach_shmem.c seems to be 95% copy-and-paste from
> sysv_shmem.c, including a lot of comments that don't feel particularly
> true or relevant here.  I wonder whether we need a separate file as
> opposed to a few #ifdef's around the kernel calls.
>
> regards, tom lane




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-18 Thread James Hilliard
On Mon, Jan 18, 2021 at 5:29 PM Tom Lane  wrote:
>
> James Hilliard  writes:
> > OSX implements sysv shmem support via a mach wrapper, however the mach
> > sysv shmem wrapper has some severe restrictions that prevent us from
> > allocating enough memory segments in some cases.
> > ...
> > In order to avoid hitting these limits we can bypass the wrapper layer
> > and just use mach directly.
>
> I wanted to review this, but it's impossible because the kernel calls
> you're using have you've-got-to-be-kidding documentation like this:
>
> https://developer.apple.com/documentation/kernel/1402504-mach_vm_page_info?language=objc
Yeah, the documentation situation with Apple tends to be some degree of terrible
in general. Ultimately I was referencing the sysv shmem wrapper code
itself a lot:
https://github.com/apple/darwin-xnu/blob/master/bsd/kern/sysv_shm.c

I also used the browser shmem implementations as references:
https://github.com/chromium/chromium/blob/master/base/memory/platform_shared_memory_region_mac.cc
https://github.com/mozilla/gecko-dev/blob/master/ipc/glue/SharedMemoryBasic_mach.mm
>
> Google finds the source code too, but that's equally bereft of useful
> documentation.  So I wonder where you obtained the information necessary
> to write this patch.
Mostly trial and error while using the above references.
>
> I did notice however that mach_shmem.c seems to be 95% copy-and-paste from
> sysv_shmem.c, including a lot of comments that don't feel particularly
> true or relevant here.  I wonder whether we need a separate file as
> opposed to a few #ifdef's around the kernel calls.
Well I basically started with copying sysv_shmem.c and then started
replacing the sysv
shmem calls with their mach equivalents. I kept it as a separate file
since the mach
calls use different semantics and that's also how the other projects
seem to separate
out their mach shared memory handlers. I left a number of the original
comments since
I wasn't entirely sure what best to replace them with.
>
> regards, tom lane




Re: [PATCH 1/1] Initial mach based shared memory support.

2021-01-18 Thread Tom Lane
James Hilliard  writes:
> OSX implements sysv shmem support via a mach wrapper, however the mach
> sysv shmem wrapper has some severe restrictions that prevent us from
> allocating enough memory segments in some cases.
> ...
> In order to avoid hitting these limits we can bypass the wrapper layer
> and just use mach directly.

I wanted to review this, but it's impossible because the kernel calls
you're using have you've-got-to-be-kidding documentation like this:

https://developer.apple.com/documentation/kernel/1402504-mach_vm_page_info?language=objc

Google finds the source code too, but that's equally bereft of useful
documentation.  So I wonder where you obtained the information necessary
to write this patch.

I did notice however that mach_shmem.c seems to be 95% copy-and-paste from
sysv_shmem.c, including a lot of comments that don't feel particularly
true or relevant here.  I wonder whether we need a separate file as
opposed to a few #ifdef's around the kernel calls.

regards, tom lane




[PATCH 1/1] Initial mach based shared memory support.

2020-11-22 Thread James Hilliard
OSX implements sysv shmem support via a mach wrapper, however the mach
sysv shmem wrapper has some severe restrictions that prevent us from
allocating enough memory segments in some cases.

These limits appear to be due to the way the wrapper itself is
implemented and not mach.

For example when running a test suite that spins up many postgres
instances I commonly see this issue:
DETAIL:  Failed system call was shmget(key=5432002, size=56, 03600).

In order to avoid hitting these limits we can bypass the wrapper layer
and just use mach directly.

This is roughly how all major browsers seem to implement shared memory
support on OSX.

This still needs a lot of cleanup but seems to compile and basic tests
seem to pass.
---
 configure |  15 +-
 configure.ac  |  11 +-
 src/backend/port/mach_shmem.c | 797 ++
 src/include/pg_config.h.in|   3 +
 src/include/postgres.h|  14 +
 5 files changed, 831 insertions(+), 9 deletions(-)
 create mode 100644 src/backend/port/mach_shmem.c

diff --git a/configure b/configure
index dd64692345..5a2eb14dea 100755
--- a/configure
+++ b/configure
@@ -18043,16 +18043,21 @@ fi
 
 
 # Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
+if test "$PORTNAME" = "win32"; then
 
-$as_echo "#define USE_SYSV_SHARED_MEMORY 1" >>confdefs.h
+$as_echo "#define USE_WIN32_SHARED_MEMORY 1" >>confdefs.h
 
-  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
+  SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test "$PORTNAME" = "darwin"; then
+
+$as_echo "#define USE_MACH_SHARED_MEMORY 1" >>confdefs.h
+
+  SHMEM_IMPLEMENTATION="src/backend/port/mach_shmem.c"
 else
 
-$as_echo "#define USE_WIN32_SHARED_MEMORY 1" >>confdefs.h
+$as_echo "#define USE_SYSV_SHARED_MEMORY 1" >>confdefs.h
 
-  SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
 fi
 
 # Select random number source. If a TLS library is used then it will be the
diff --git a/configure.ac b/configure.ac
index 748fb50236..1bbcd5dc78 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2144,12 +2144,15 @@ fi
 
 
 # Select shared-memory implementation type.
-if test "$PORTNAME" != "win32"; then
-  AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared 
memory.])
-  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
-else
+if test "$PORTNAME" = "win32"; then
   AC_DEFINE(USE_WIN32_SHARED_MEMORY, 1, [Define to select Win32-style shared 
memory.])
   SHMEM_IMPLEMENTATION="src/backend/port/win32_shmem.c"
+elif test "$PORTNAME" = "darwin"; then
+  AC_DEFINE(USE_MACH_SHARED_MEMORY, 1, [Define to select Mach-style shared 
memory.])
+  SHMEM_IMPLEMENTATION="src/backend/port/mach_shmem.c"
+else
+  AC_DEFINE(USE_SYSV_SHARED_MEMORY, 1, [Define to select SysV-style shared 
memory.])
+  SHMEM_IMPLEMENTATION="src/backend/port/sysv_shmem.c"
 fi
 
 # Select random number source. If a TLS library is used then it will be the
diff --git a/src/backend/port/mach_shmem.c b/src/backend/port/mach_shmem.c
new file mode 100644
index 00..2fe42ed770
--- /dev/null
+++ b/src/backend/port/mach_shmem.c
@@ -0,0 +1,797 @@
+/*-
+ *
+ * mach_shmem.c
+ *   Implement shared memory using mach facilities
+ *
+ * These routines used to be a fairly thin layer on top of mach shared
+ * memory functionality.  With the addition of anonymous-shmem logic,
+ * they're a bit fatter now.  We still require a mach shmem block to
+ * exist, though, because mmap'd shmem provides no way to find out how
+ * many processes are attached, which we need for interlocking purposes.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ *   src/backend/port/mach_shmem.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include 
+#include 
+#include 
+#ifdef HAVE_SYS_IPC_H
+#include 
+#endif
+
+#include "miscadmin.h"
+#include "portability/mem.h"
+#include "storage/dsm.h"
+#include "storage/ipc.h"
+#include "storage/pg_shmem.h"
+#include "utils/pidfile.h"
+
+#include 
+#include 
+#include 
+
+
+/*
+ * As of PostgreSQL 9.3, we normally allocate only a very small amount of
+ * System V shared memory, and only for the purposes of providing an
+ * interlock to protect the data directory.  The real shared memory block
+ * is allocated using mmap().  This works around the problem that many
+ * systems have very low limits on the amount of System V shared memory
+ * that can be allocated.  Even a limit of a few megabytes will be enough
+ * to run many copies of PostgreSQL without needing to adjust system settings.
+ *
+ * We assume that no one will attempt to run PostgreSQL 9.3 or later on
+ * systems that are ancient enough that anonymous shared memory is not