Re: [HACKERS] POSIX shared memory redux

A.M. Thu, 14 Apr 2011 07:27:23 -0700

On Apr 13, 2011, at 9:30 PM, Robert Haas wrote:

> On Wed, Apr 13, 2011 at 6:11 PM, A.M. <[email protected]> wrote:
>>> I don't see why we need to get rid of SysV shared memory; needing less
>>> of it seems just as good.
>> 
>> 1. As long as one keeps SysV shared memory around, the postgresql project has 
>> to maintain the annoying platform-specific document on how to configure the 
>> poorly named kernel parameters. If the SysV region is very small, that means 
>> I can run more postgresql instances within the same kernel limits, but one 
>> can still hit the limits. My patch allows the postgresql project to delete 
>> that page and the hassles with it.
>> 
>> 2. My patch proves that SysV is wholly unnecessary. Are you attached to it? 
>> (Pun intended.)
> 
> With all due respect, I think this is an unproductive conversation.
> Your patch proves that SysV is wholly unnecessary only if we also
> agree that fcntl() locking is just as reliable as the nattch
> interlock, and Tom and I are trying to explain why we don't believe
> that's the case.  Saying that we're just wrong without responding to
> our points substantively doesn't move the conversation forward.


Sorry- it wasn't meant to be an attack- just a dumb pun. I am trying to argue 
that, even if the fcntl locking is unreliable, the startup procedure is just as 
reliable as it is now. The reasons being:

1) the SysV nattch method's primary purpose is to protect the shmem region. 
This is no longer necessary in my patch because the shared memory is unlinked 
immediately after creation, so only the initial postmaster and its children 
have access.

2) the standard postgresql lock file remains the same
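
To illustrate point 1, here is a minimal sketch of the create-then-unlink 
idea using POSIX shm_open() and shm_unlink(). It is not the code from the 
patch; the function name create_unlinked_shm and the segment name passed to 
it are hypothetical.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a POSIX shared memory segment, remove its name right away, and map
 * it. Returns the mapping, or MAP_FAILED on error.
 * "name" is a hypothetical segment name such as "/demo_segment".
 */
void *create_unlinked_shm(const char *name, size_t size)
{
    /* O_EXCL: fail instead of silently attaching to an existing segment. */
    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return MAP_FAILED;

    /* Once the name is gone, no unrelated process can open the segment. */
    if (shm_unlink(name) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    if (ftruncate(fd, (off_t) size) < 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return addr;
}
```

A process forked after this call inherits the mapping, which is how the 
children would keep access while everyone else is shut out.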

Furthermore, there is a case that the SysV nattch check cannot catch but 
fcntl locking can: if two separate machines have a postgresql data directory 
mounted over NFS, postgresql will currently allow both machines to start a 
postmaster in that directory. The SysV nattch check finds nothing, because 
SysV shared memory only has presence on one kernel, and the pid in the lock 
file is a pid on the first machine, so postgresql will say "starting anyway". 
With fcntl locking, this can be fixed.
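
A rough sketch of the kind of fcntl interlock I mean, not the patch itself. 
The helper names try_lock_file and second_process_is_refused are made up for 
this example, and the demo runs both contenders on one machine. Across NFS 
the same calls only help if the server and clients actually support POSIX 
byte-range locks.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Try to take an exclusive advisory lock on the whole file without blocking.
 * Returns the open descriptor (lock held while it stays open) or -1.
 */
int try_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock fl = {0};
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file", i.e. the whole file */

    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);           /* another process holds a conflicting lock */
        return -1;
    }
    return fd;
}

/*
 * Demo: the parent holds the lock and a forked child tries to take it too.
 * Returns 1 if the child was refused, 0 if it got the lock, -1 on error.
 */
int second_process_is_refused(const char *path)
{
    int fd = try_lock_file(path);
    if (fd < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* fcntl locks are not inherited across fork, so the child competes. */
        _exit(try_lock_file(path) < 0 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    close(fd);               /* closing the descriptor releases the lock */
    if (waited < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```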


> 
> In case it's not clear, here again is what we're concerned about: A
> System V shm *cannot* be removed until nobody is attached to it.  A
> lock file can be removed, or the lock can be accidentally released by
> the apparently innocuous operation of closing a file descriptor.
> 
>> Both you and Tom have somehow assumed that the patch alters current 
>> postgresql behavior. In fact, the opposite is true. I haven't changed any of 
>> the existing behavior. The "robust" behavior remains. I merely added fcntl 
>> interlocking on top of the lock file to replace the SysV shmem check.
> 
> This seems contradictory.  If you replaced the SysV shmem check, then
> it's not there, which means you altered the behavior.

From what I understood, the primary purpose of the SysV check was to protect 
the shared memory from multiple stompers. The interlock was a neat 
side-effect. 

The lock file contents are currently important to get the pid of a 
potentially conflicting postmaster. With the fcntl API, we can return a live 
conflicting PID (whether a postmaster or a stuck child), so that's an 
improvement. This could be used, for example, for STONITH, to reliably kill a 
dying replication clone- just loop on the pids returned from the lock.
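
For reference, a small sketch of how F_GETLK reports a holder. The function 
name lock_holder_pid is hypothetical. Note that F_GETLK describes only one 
conflicting lock per call and never reports locks held by the calling 
process, and I would not assume the reported pid is meaningful when the 
holder is on another NFS client.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel who would block an exclusive lock on the whole file.
 * Returns the pid of one conflicting lock holder, 0 if nobody else holds a
 * conflicting lock, or -1 on error.
 */
pid_t lock_holder_pid(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;     /* the lock we would like to take */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* whole file */

    if (fcntl(fd, F_GETLK, &fl) < 0)
        return -1;

    /* F_UNLCK means the lock could be granted: no conflicting holder. */
    if (fl.l_type == F_UNLCK)
        return 0;

    return fl.l_pid;
}
```

Looping would then mean: query, deal with the pid returned, and query again 
until the result is 0.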

Even if the fcntl check passes, the pid in the lock file is checked, so the 
lock file behavior remains the same.
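
As a reminder of what that pid check amounts to, here is a simplified sketch, 
not the actual postmaster code. It assumes the pid is the first thing in the 
lock file, and the function name lock_file_pid_is_live is invented for the 
example.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Read a pid from the start of the lock file and test whether a process with
 * that pid exists. Returns 1 if it exists, 0 if it does not, and -1 if the
 * file cannot be read or holds no usable pid.
 */
int lock_file_pid_is_live(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long pid = 0;
    int fields = fscanf(f, "%ld", &pid);
    fclose(f);
    if (fields != 1 || pid <= 0)
        return -1;

    /* Signal 0 sends nothing; it only checks existence and permission. */
    if (kill((pid_t) pid, 0) == 0)
        return 1;

    /* EPERM: the process exists but belongs to someone else. */
    return (errno == EPERM) ? 1 : 0;
}
```

This also shows the weakness on NFS: the check only knows about processes on 
the local machine.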

If you were to implement a daemon with a shared data directory but no shared 
memory, how would you implement the interlock? Would you still insist on SysV 
shmem? Unix daemons generally rely on lock files alone. Perhaps there is a 
different API on which we can agree.

Cheers,
M
-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
