Sorry for the unclear language. Late Friday evening in my time zone is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <> wrote:
>>>> I expect that to be useful for parallel query and anything else where
>>>> processes need to share variable-size data.  However, that's different
>> from this because ours can grow to arbitrary size and shrink again by
>>>> allocating and freeing with DSM segments.  We also do everything with
>>>> relative pointers since DSM segments can be mapped at different
>>>> addresses in different processes, whereas this would only work with
>>>> memory carved out of the main shared memory segment (or some new DSM
>>>> facility that guaranteed identical placement in every address space).
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.
>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>> First, the behavior in this area is highly dependent on choice of
>> operating system and configuration parameters.  We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad.  I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server.  Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area.  I
>> had a MacBook Pro up until a year or two ago where a sufficiently
>> large shared memory request would cause a kernel panic.  That bug will
>> probably be fixed at some point if it hasn't been already, but
>> probably by returning an error rather than making it work.
>> Second, there's no way to give memory back once you've touched it.  If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done.  If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap.  In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that.  If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine.  But if your usage
>> fluctuates I believe it will be a big problem.  With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that this case, IMHO, is well worth considering.
> You are right that this may depend heavily on the OS. But you are
> only partially right that it's impossible to give the memory back
> once you've touched it. It is possible in many cases with additional
> measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case Windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with the mmap MAP_NORESERVE
> flag and the madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option is, of
> course, susceptible to having this portion of the address space
> hijacked by an inadvertent caller of mmap with a NULL addr
> argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easily passing
> direct memory pointers between backends it is a clear win again.
> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. You have a shared hash table. You
> cannot keep pointers there. You need to store offsets against the
> base address. Any reference would involve additional arithmetic.
> When these things add up, the net effect might become noticeable.
> Regards,
> Aleksey

Sent via pgsql-hackers mailing list