Sorry for the unclear language. Blame the late Friday evening here.
On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <adema...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>>> I expect that to be useful for parallel query and anything else where
>>>> processes need to share variable-size data. However, that's different
>> from this because ours can grow to arbitrary size and shrink again by
>> allocating and freeing DSM segments. We also do everything with
>>>> relative pointers since DSM segments can be mapped at different
>>>> addresses in different processes, whereas this would only work with
>>>> memory carved out of the main shared memory segment (or some new DSM
>>>> facility that guaranteed identical placement in every address space).
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup. If the pages are not touched, the OS VM
>>> subsystem will not commit them.
>> In my opinion, that's not going to fly. If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>> First, the behavior in this area is highly dependent on choice of
>> operating system and configuration parameters. We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad. I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server. Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area. I
>> had a MacBook Pro up until a year or two ago where a sufficiently
>> large shared memory request would cause a kernel panic. That bug will
>> probably be fixed at some point if it hasn't been already, but
>> probably by returning an error rather than making it work.
>> Second, there's no way to give memory back once you've touched it. If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done. If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap. In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that. If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine. But if your usage
>> fluctuates I believe it will be a big problem. With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case is, IMHO, well worth it.
> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you've touched it. It is possible in many cases with additional
> measures, namely extra control over memory mapping. Surprisingly,
> Windows has the most straightforward solution here. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and the madvise syscall. Finally, it's possible to repeatedly
> mmap and munmap portions of a contiguous address space, providing
> a given addr argument to both of them. The last option is, of
> course, susceptible to having that portion of the address space
> hijacked by an inadvertent caller of mmap with a NULL addr argument.
> But this could probably be avoided by imposing disciplined use of
> mmap in the PostgreSQL core and extensions.
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. In terms of allocator
> development time, it is a clear win. In terms of easily passing direct
> memory pointers between backends, it is a clear win again.
> In terms of resulting performance, I don't know. This would take
> a few cycles on every step. With a shared hash table you cannot
> keep raw pointers in it; you need to store offsets against the
> base address, and any reference then involves additional arithmetic.
> When these things add up, the net effect might become noticeable.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)