On 11/20/2013 09:58 PM, Robert Haas wrote:
On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
How many allocations? What size will they have typically, minimum and maximum?

The facility is intended to be general, so the answer could vary
widely by application.  The testing that I have done so far suggests
that for message-passing, relatively small queue sizes (a few kB,
perhaps 1 MB at the outside) should be sufficient.  However,
applications such as parallel sort could require vast amounts of
shared memory.  Consider a machine with 1TB of memory performing a
512GB internal sort.  You're going to need 512GB of shared memory for that.

Hmm. Those two use cases are quite different. For message-passing, you want a lot of small queues, but for parallel sort, you want one huge allocation. I wonder if we shouldn't even try a one-size-fits-all solution.

For message-passing, there isn't much need to even use dynamic shared memory. You could just assign one fixed-sized, single-reader multiple-writer queue for each backend.
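
To illustrate what I have in mind, a rough sketch is below. The struct and the names in it are made up for this mail, they don't correspond to anything that exists in the tree:

/*
 * One fixed-size queue per backend, carved out of the main shared memory
 * segment at postmaster startup.  The owning backend is the only reader;
 * writers take the spinlock to reserve space.  Sizes and names are
 * illustrative only.
 */
#include "postgres.h"
#include "storage/s_lock.h"
#include "storage/shmem.h"

#define BACKEND_QUEUE_SIZE 8192

typedef struct BackendQueue
{
    slock_t     mutex;      /* protects write_off */
    uint32      read_off;   /* advanced only by the owning backend */
    uint32      write_off;  /* advanced by writers, while holding mutex */
    char        data[BACKEND_QUEUE_SIZE];  /* ring buffer */
} BackendQueue;

/* one entry per backend, allocated once at postmaster startup */
BackendQueue *backendQueues;    /* = ShmemAlloc(MaxBackends * sizeof(BackendQueue)) */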

For parallel sort, you'll want to utilize all the available memory and all CPUs for one huge sort. So all you really need is a single huge shared memory segment. If one process is already using that 512GB segment to do a sort, you do *not* want to allocate a second 512GB segment. You'll want to wait for the first operation to finish first. Or maybe you'll want to have 3-4 somewhat smaller segments in use at the same time, but not more than that.

* As discussed in the "Something fishy happening on frogmouth" thread, I
don't like the fact that the dynamic shared memory segments will be
permanently leaked if you kill -9 postmaster and destroy the data directory.

Your test elicited different behavior for the dsm code vs. the main
shared memory segment because it involved running a new postmaster
with a different data directory but the same port number on the same
machine, and expecting that that new - and completely unrelated -
postmaster would clean up the resources left behind by the old,
now-destroyed cluster.  I tend to view that as a defect in your test
case more than anything else, but as I suggested previously, we could
potentially change the code to use something like 1000000 + (port *
100) with a forward search for the control segment identifier, instead
of using a state file, mimicking the behavior of the main shared
memory segment.  I'm not sure we ever reached consensus on whether
that was overall better than what we have now.
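
If I understand the forward-search idea correctly, it would look roughly like the logic below. This is only a sketch of the idea, not the actual dsm code, and PostPortNumber stands in for however the base key would really be derived:

#include <sys/ipc.h>
#include <sys/shm.h>
#include <errno.h>

key_t   key = 1000000 + (PostPortNumber * 100);
int     shmid;

for (;;)
{
    shmid = shmget(key, segsize, IPC_CREAT | IPC_EXCL | 0600);
    if (shmid >= 0)
        break;              /* got a fresh segment at this key */
    if (errno == EEXIST)
    {
        /*
         * Key already in use: decide whether it's an orphaned segment
         * from a dead cluster that can be reclaimed, otherwise move on
         * to the next key.
         */
        key++;
        continue;
    }
    elog(ERROR, "shmget failed: %m");
}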

I really think we need to do something about it. To use your earlier example of parallel sort, it's not acceptable to permanently leak a 512 GB segment on a system with 1 TB of RAM.

One idea is to create the shared memory object with shm_open, and wait until all the worker processes that need it have attached to it. Then, shm_unlink() it, before using it for anything. That way the segment will be automatically released once all the processes close() it, or die. In particular, kill -9 will release it. (This is a variant of my earlier idea to create a small number of anonymous shared memory file descriptors in postmaster startup with shm_open(), and pass them down to child processes with fork()). I think you could use that approach with SysV shared memory as well, by destroying the segment with shmctl(IPC_RMID) immediately after all processes have attached to it.
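
In code, the shm_open variant would look something like this. It's only a sketch to show the lifetime behavior; the segment name is made up, wait_for_workers_to_attach() is a placeholder for whatever synchronization we'd actually use, and all error handling is omitted:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

extern void wait_for_workers_to_attach(void);   /* placeholder */

void *
create_unlinked_segment(size_t segsize)
{
    int     fd;
    void   *addr;

    fd = shm_open("/pgsql_dsm_example", O_CREAT | O_EXCL | O_RDWR, 0600);
    ftruncate(fd, segsize);
    addr = mmap(NULL, segsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);              /* the mapping keeps the memory alive */

    /* Workers shm_open() and mmap() the same name before this returns. */
    wait_for_workers_to_attach();

    /*
     * Remove the name.  The memory stays around as long as some process
     * has it mapped, and the kernel reclaims it when the last mapping
     * goes away -- even if the processes are killed with -9.
     */
    shm_unlink("/pgsql_dsm_example");

    return addr;
}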

- Heikki
