Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 3:43 AM, Tom Lane  wrote:
> DSM already exists, and for many purposes its lack of a
> within-a-shmem-segment dynamic allocator is irrelevant; the same purpose
> is served (with more speed, more reliability, and less code) by releasing
> the whole DSM segment when no longer needed.  The DSM segment effectively
> acts like a memory context, saving code from having to account precisely
> for every single allocation it makes.
>
> I grant that having a dynamic allocator added to DSM will support even
> more use-cases.  What I'm not convinced of is that we need a dynamic
> allocator within the fixed-size shmem segment.  Robert already listed some
> reasons why that's rather dubious, but I'll add one more: any leak becomes
> a really serious bug, because the only way to recover the space is to
> restart the whole database instance.
>

Okay, if you say that DSM segments work best for accumulating
transient data that can all be freed at once when it is no longer
needed, then I agree with that.

My code is for long-lived data that can be allocated and freed
chunk by chunk, for instance when an extension wants to store more
data, in a more complicated fashion, than fits into an ordinary
dynahash created with the HASH_SHARED_MEM flag.

Regards,
Aleksey


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 12:45 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleksey Demakov <adema...@gmail.com> writes:
>> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> In my opinion, that's not going to fly.  If I thought otherwise, I
>>> would not have developed the DSM facility in the first place.
>
>> Essentially this is pessimizing for the lowest common denominator
>> among OSes.
>
> You're right, but that doesn't mean that the community is going to take
> much interest in an unportable replacement for code that already exists.

Excuse me, what code already exists? As far as I understand, we are
comparing the approach taken in my code against Robert's code, which
is not yet available to the community.

Discussing DSM is beside the point.

My code can be hooked smoothly into the existing system from an
extension module with just a couple of calls:

RequestAddinShmemSpace() and ShmemInitStruct().
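As a rough illustration, the hook-up could look something like the following minimal sketch of an extension's initialization code. The sharena_init() call, the area name, and the 64 MB size are assumptions made up for illustration; only RequestAddinShmemSpace(), ShmemInitStruct(), and shmem_startup_hook are the real PostgreSQL API:

```c
/* Hypothetical extension setup reserving a region of the main shared
 * memory segment for the allocator.  sharena_init() and the size are
 * illustrative assumptions, not the actual API of the repository. */
#include "postgres.h"
#include "fmgr.h"
#include "storage/ipc.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

#define SHARENA_AREA_SIZE (64 * 1024 * 1024)

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;

static void
sharena_shmem_startup(void)
{
	bool		found;
	void	   *area;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	/* Carve our area out of the main shared memory segment. */
	area = ShmemInitStruct("sharena area", SHARENA_AREA_SIZE, &found);
	if (!found)
		sharena_init(area, SHARENA_AREA_SIZE);	/* hypothetical call */
}

void
_PG_init(void)
{
	/* Ask for space before the shared memory segment is created. */
	RequestAddinShmemSpace(SHARENA_AREA_SIZE);
	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = sharena_shmem_startup;
}
```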

After that, this extension could use my concurrent memory allocator
and safe memory reclamation facility to implement highly optimized
concurrent data structures of its choice, e.g. the concurrent data
structures that I am going to add to the package in the future.

All in all, currently this is not a replacement for anything. This is
an experimental add-on and food for thought for interested people.

Integrating my code right into the core to replace anything there is
a very remote possibility. I understand that if it ever happens, it
would take very serious work and multiple iterations.

> Especially not an unportable replacement that also needs sweeping
> assumptions like "disciplined use of mmap in postgresql core and
> extensions".  You don't have to look further than the availability of
> mmap to plperlu programmers to realize that that won't fly.  (Even if
> we threw all the untrusted PLs overboard, I believe plain old stdio
> is willing to use mmap in many versions of libc.)
>

Sorry, I made a sloppy statement about mmap/munmap use. As
correctly pointed out by Andres Freund, it is problematic. So the
whole line about "disciplined use of mmap in postgresql core and
extensions" goes away. Forget it.

But the other techniques that I mentioned do not require such
special discipline.

The corrected statement is that a single contiguous shared space
is practically doable on many platforms with some effort. And this
approach would make the implementation of many shared data
structures more efficient.

Furthermore, I'd guess there is not much point in enabling parallel
query execution on a MacBook. Or at least one wouldn't expect
superb results from it anyway.

I'd make a wild claim that the users who would benefit most of the
time from parallel queries or my concurrency work are the same
users who run platforms that can support a single address space.

Thus, if there is a solution that benefits, say, 95% of target users,
why refrain from it in the name of the other 5%? Should not the
support of those 5% be treated as a lower-priority fallback, while
the main effort is put into optimizing for the 95-percenters?

Regards,
Aleksey




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Sat, Jun 18, 2016 at 12:31 AM, Andres Freund <and...@anarazel.de> wrote:
> On 2016-06-18 00:23:14 +0600, Aleksey Demakov wrote:
>> Finally, it's possible to repeatedly mmap
>> and munmap on portions of a contiguous address space providing
>> a given addr argument for both of them. The last option might, of
>> course, is susceptible to hijacking this portion of the address by an
>> inadvertent caller of mmap with NULL addr argument. But probably
>> this could be avoided by imposing a disciplined use of mmap in
>> postgresql core and extensions.
>
> I don't think that's particularly realistic. malloc() uses mmap(NULL)
> internally.  And you can't portably mmap non-file backed memory from
> different processes; you need something like tmpfs backed / posix shared
> memory / for it.  On linux you can do stuff like madvise(MADV_FREE),
> which kinda helps.

Oops. Agreed.




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
Sorry for the unclear language. Late Friday evening at my place is to blame.

On Sat, Jun 18, 2016 at 12:23 AM, Aleksey Demakov <adema...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>>> I expect that to be useful for parallel query and anything else where
>>>> processes need to share variable-size data.  However, that's different
>>>> from this because ours can grow to arbitrary size and shrink again by
>>>> allocating and freeing with DSM segments.  We also do everything with
>>>> relative pointers since DSM segments can be mapped at different
>>>> addresses in different processes, whereas this would only work with
>>>> memory carved out of the main shared memory segment (or some new DSM
>>>> facility that guaranteed identical placement in every address space).
>>>>
>>>
>>> I believe it would be perfectly okay to allocate huge amount of address
>>> space with mmap on startup.  If the pages are not touched, the OS VM
>>> subsystem will not commit them.
>>
>> In my opinion, that's not going to fly.  If I thought otherwise, I
>> would not have developed the DSM facility in the first place.
>>
>> First, the behavior in this area is highly dependent on choice of
>> operating system and configuration parameters.  We've had plenty of
>> experience with requiring non-default configuration parameters to run
>> PostgreSQL, and it's all bad.  I don't really want to have to tell
>> users that they must run with a particular value of
>> vm.overcommit_memory in order to run the server.  Nor do I want to
>> tell users of other operating systems that their ability to run
>> PostgreSQL is dependent on the behavior their OS has in this area.  I
>> had a MacBook Pro up until a year or two ago where a sufficiently large
>> shared memory request would cause a kernel panic.  That bug will
>> probably be fixed at some point if it hasn't been already, but
>> probably by returning an error rather than making it work.
>>
>> Second, there's no way to give memory back once you've touched it.  If
>> you decide to do a hash join on a 250GB inner table using a shared
>> hash table, you're going to have 250GB in swap-backed pages floating
>> around when you're done.  If the user has swap configured (and more
>> and more people don't), the operating system will eventually page
>> those out, but until that happens those pages are reducing the amount
>> of page cache that's available, and after it happens they're using up
>> swap.  In either case, the space consumed is consumed to no purpose.
>> You don't care about that hash table any more once the query
>> completes; there's just no way to tell the operating system that.  If
>> your workload follows an entirely predictable pattern and you always
>> have about the same amount of usage of this facility then you can just
>> reuse the same pages and everything is fine.  But if your usage
>> fluctuates I believe it will be a big problem.  With DSM, we can and
>> do explicitly free the memory back to the OS as soon as we don't need
>> it any more - and that's a big benefit.
>>
>
> Essentially this is pessimizing for the lowest common denominator
> among OSes. Having a contiguous address space makes things so
> much simpler that considering this case, IMHO, is well worth of it.
>
> You are right that this might highly depend on the OS. But you are
> only partially right that it's impossible to give the memory back once
> you touched it. It is possible in many cases with additional measures.
> That is with additional control over memory mapping. Surprisingly, in
> this case windows has the most straightforward solution. VirtualAlloc
> has separate MEM_RESERVE and MEM_COMMIT flags. On various
> Unix flavours it is possible to play with mmap MAP_NORESERVE
> flag and madvise syscall. Finally, it's possible to repeatedly mmap
> and munmap on portions of a contiguous address space providing
> a given addr argument for both of them. The last option might, of
> course, is susceptible to hijacking this portion of the address by an
> inadvertent caller of mmap with NULL addr argument. But probably
> this could be avoided by imposing a disciplined use of mmap in
> postgresql core and extensions.
>
> Thus providing a single contiguous shared address space is doable.
> The other question is how much it would buy. As for development
> time of an allocator it is a clear win. In terms of easy passing direct
> memory pointers between backends this is a clear win again.

Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 10:54 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Jun 17, 2016 at 12:34 PM, Aleksey Demakov <adema...@gmail.com> wrote:
>>> I expect that to be useful for parallel query and anything else where
>>> processes need to share variable-size data.  However, that's different
>>> from this because ours can grow to arbitrary size and shrink again by
>>> allocating and freeing with DSM segments.  We also do everything with
>>> relative pointers since DSM segments can be mapped at different
>>> addresses in different processes, whereas this would only work with
>>> memory carved out of the main shared memory segment (or some new DSM
>>> facility that guaranteed identical placement in every address space).
>>>
>>
>> I believe it would be perfectly okay to allocate huge amount of address
>> space with mmap on startup.  If the pages are not touched, the OS VM
>> subsystem will not commit them.
>
> In my opinion, that's not going to fly.  If I thought otherwise, I
> would not have developed the DSM facility in the first place.
>
> First, the behavior in this area is highly dependent on choice of
> operating system and configuration parameters.  We've had plenty of
> experience with requiring non-default configuration parameters to run
> PostgreSQL, and it's all bad.  I don't really want to have to tell
> users that they must run with a particular value of
> vm.overcommit_memory in order to run the server.  Nor do I want to
> tell users of other operating systems that their ability to run
> PostgreSQL is dependent on the behavior their OS has in this area.  I
> had a MacBook Pro up until a year or two ago where a sufficiently large
> shared memory request would cause a kernel panic.  That bug will
> probably be fixed at some point if it hasn't been already, but
> probably by returning an error rather than making it work.
>
> Second, there's no way to give memory back once you've touched it.  If
> you decide to do a hash join on a 250GB inner table using a shared
> hash table, you're going to have 250GB in swap-backed pages floating
> around when you're done.  If the user has swap configured (and more
> and more people don't), the operating system will eventually page
> those out, but until that happens those pages are reducing the amount
> of page cache that's available, and after it happens they're using up
> swap.  In either case, the space consumed is consumed to no purpose.
> You don't care about that hash table any more once the query
> completes; there's just no way to tell the operating system that.  If
> your workload follows an entirely predictable pattern and you always
> have about the same amount of usage of this facility then you can just
> reuse the same pages and everything is fine.  But if your usage
> fluctuates I believe it will be a big problem.  With DSM, we can and
> do explicitly free the memory back to the OS as soon as we don't need
> it any more - and that's a big benefit.
>

Essentially this is pessimizing for the lowest common denominator
among OSes. Having a contiguous address space makes things so
much simpler that considering this case is, IMHO, well worth it.

You are right that this may depend highly on the OS. But you are
only partially right that it's impossible to give the memory back once
you have touched it. It is possible in many cases with additional
measures, that is, with additional control over memory mapping.
Surprisingly, in this case Windows has the most straightforward
solution: VirtualAlloc has separate MEM_RESERVE and MEM_COMMIT
flags. On various Unix flavours it is possible to play with the mmap
MAP_NORESERVE flag and the madvise syscall. Finally, it's possible
to repeatedly mmap and munmap portions of a contiguous address
space, providing a given addr argument to both of them. The last
option is, of course, susceptible to hijacking of this portion of the
address space by an inadvertent caller of mmap with a NULL addr
argument. But probably this could be avoided by imposing disciplined
use of mmap in the postgresql core and extensions.

Thus providing a single contiguous shared address space is doable.
The other question is how much it would buy. In terms of allocator
development time it is a clear win. In terms of easily passing direct
memory pointers between backends it is a clear win again.

In terms of resulting performance, I don't know. Without it, every
access would take a few extra cycles: in a shared hash table you
cannot keep pointers, so you need to store offsets against the base
address, and any reference involves additional arithmetic. When
these things add up, the net effect might become noticeable.
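The offset-vs-pointer cost can be sketched as follows. This is a self-contained illustration of the general technique with made-up names (demo_segment, shm_ptr, shm_off), not code from the patch:

```c
/* Base-relative references in a shared segment: when segments may be
 * mapped at different addresses (the DSM case), shared structures must
 * store offsets and pay an extra addition on every dereference; with a
 * single guaranteed mapping address, plain pointers can be stored
 * directly.  All names here are illustrative. */
#include <stdint.h>

static char demo_segment[1024];	/* stand-in for a shared segment */

typedef uint32_t shm_offset;	/* offset from the segment base */

/* Turn a base-relative offset into a usable pointer (one addition). */
static inline void *
shm_ptr(char *base, shm_offset off)
{
	return base + off;
}

/* Turn a pointer back into a base-relative offset. */
static inline shm_offset
shm_off(char *base, const void *ptr)
{
	return (shm_offset) ((const char *) ptr - base);
}
```

Every lookup in a shared hash table would go through something like shm_ptr(), which is exactly the per-reference arithmetic the paragraph above refers to.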

Regards,
Aleksey




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 10:18 PM, Robert Haas  wrote:
> On Fri, Jun 17, 2016 at 11:30 AM, Tom Lane  wrote:
> But I'm a bit confused about where it gets the bytes it wants to
> manage.  There's no call to dsm_create() or ShmemAlloc() anywhere in
> the code, at least not that I could find quickly.  The only way to get
> shar_base set to a non-NULL value seems to be to call SharAttach(),
> and if there's no SharCreate() where would we get that non-NULL value?
>

You are right, I just have to tidy up the initialisation code before
publishing it.

> I expect that to be useful for parallel query and anything else where
> processes need to share variable-size data.  However, that's different
> from this because ours can grow to arbitrary size and shrink again by
> allocating and freeing with DSM segments.  We also do everything with
> relative pointers since DSM segments can be mapped at different
> addresses in different processes, whereas this would only work with
> memory carved out of the main shared memory segment (or some new DSM
> facility that guaranteed identical placement in every address space).
>

I believe it would be perfectly okay to allocate a huge amount of
address space with mmap on startup.  If the pages are not touched,
the OS VM subsystem will not commit them.
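On Linux-like systems the reserve-then-commit idea can be sketched like this. Note that this OS-dependent behavior is exactly what Robert objects to, so treat it as a non-portable illustration under Linux assumptions, not a proposal:

```c
/* Reserve a large contiguous address range without committing memory,
 * then commit portions on demand.  PROT_NONE / MAP_NORESERVE behavior
 * is OS-specific; this sketch assumes Linux. */
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space only; returns NULL on failure. */
static void *
reserve_address_space(size_t size)
{
	void	   *p = mmap(NULL, size, PROT_NONE,
						 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

	return (p == MAP_FAILED) ? NULL : p;
}

/* Commit a portion of the reserved range so it can actually be used. */
static int
commit_pages(void *addr, size_t size)
{
	return mprotect(addr, size, PROT_READ | PROT_WRITE);
}
```

Giving memory back is the hard part: on Linux, madvise(MADV_DONTNEED) or re-mmapping the range over itself can release committed pages, but neither is portable.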

>  I've been a bit reluctant to put it out there
> until we have a tangible application of the allocator working, for
> fear people will say "that's not good for anything!".  I'm confident
> it's good for lots of things, but other people have been known not to
> share my confidence.
>

This is what I've been told by the Postgres Pro folks too. But I felt that
this thing deserves to be shown to the community sooner rather than later.

Regards,
Aleksey




Re: [HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
On Fri, Jun 17, 2016 at 9:30 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Aleksey Demakov <adema...@gmail.com> writes:
>> I have some very experimental code to enable dynamic memory allocation
>> of shared memory for postgresql backend processes.
>
> Um ... what's this do that the existing DSM stuff doesn't do?
>

It operates over a single large shared memory segment. Within this
segment it allows allocating and freeing small chunks of memory, from
16 bytes to 16 kilobytes. Chunks are carved out of fixed-size 32 KB
blocks. Each block is used to allocate chunks of a single size class.
When a block is full, another block for the given size class is taken
from the shared segment.
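The size-class arithmetic described above might look roughly like this. The power-of-two rounding is an assumption made for illustration; the actual allocator may use finer-grained classes:

```c
/* Sketch of the described scheme: chunks of 16 bytes to 16 KB carved
 * out of fixed 32 KB blocks, one size class per block.  Power-of-two
 * classes are an illustrative assumption, not the patch's policy. */
#include <stddef.h>

#define CHUNK_MIN	16
#define CHUNK_MAX	(16 * 1024)
#define BLOCK_SIZE	(32 * 1024)

/* Round a request up to its (hypothetical) size class. */
static size_t
size_class(size_t size)
{
	size_t		cls = CHUNK_MIN;

	while (cls < size && cls < CHUNK_MAX)
		cls <<= 1;
	return cls;
}

/* Number of chunks of this class that fit into one block. */
static size_t
chunks_per_block(size_t size)
{
	return BLOCK_SIZE / size_class(size);
}
```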

The goal is to support high levels of concurrency for alloc / free
calls, therefore the allocator is mostly non-blocking. Currently it
uses Heller's lazy-list algorithm to maintain the block lists of a
given size class, so it takes slocks once in a while, when a new block
is added or removed. If this proves to cause scalability problems,
Heller's list might be replaced with Maged Michael's lock-free list to
make the whole allocator entirely lock-free.

Additionally, it provides an epoch-based memory reclamation facility
that solves the ABA problem for lock-free algorithms. I am going to
implement some lock-free algorithms (extendable hash tables and
probably skip lists) on top of this facility.
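For readers unfamiliar with the technique, the gist of epoch-based reclamation can be shown with a toy, single-threaded sketch. A real facility must track per-process epochs and use atomic operations, none of which is shown here; all names are made up:

```c
/* Toy illustration of epoch-based reclamation: freed objects are parked
 * in a limbo list tagged with the current epoch and physically freed
 * only after the epoch has advanced twice, when no reader can still
 * hold a reference.  Single-threaded sketch, not the patch's code. */
#include <stdlib.h>

#define EPOCHS 3

struct limbo_node
{
	struct limbo_node *next;
	void	   *ptr;
};

static unsigned current_epoch;
static struct limbo_node *limbo[EPOCHS];
static int	reclaimed_count;	/* for demonstration only */

/* Defer freeing: park the pointer in the current epoch's limbo list. */
static void
epoch_retire(void *ptr)
{
	struct limbo_node *n = malloc(sizeof *n);

	n->ptr = ptr;
	n->next = limbo[current_epoch % EPOCHS];
	limbo[current_epoch % EPOCHS] = n;
}

/* Advance the global epoch; everything retired two epochs ago is now
 * safe to reclaim, since no reader can still be inside that epoch. */
static void
epoch_advance(void)
{
	struct limbo_node *n;
	unsigned	victim;

	current_epoch++;
	victim = (current_epoch + 1) % EPOCHS;	/* slot two epochs behind */
	n = limbo[victim];
	limbo[victim] = NULL;
	while (n)
	{
		struct limbo_node *next = n->next;

		free(n->ptr);
		free(n);
		reclaimed_count++;
		n = next;
	}
}
```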

Regards,
Aleksey




[HACKERS] Experimental dynamic memory allocation of postgresql shared memory

2016-06-17 Thread Aleksey Demakov
Hi all,

I have some very experimental code to enable dynamic memory allocation
of shared memory for postgresql backend processes. The source code in
the repository is not complete yet. Moreover, it is not immediately
useful by itself. However, it might serve as the basis for
higher-level features, such as expandable hash tables or other data
structures to share data between backends. Ultimately it might be used
for an in-memory data store accessible via the FDW interface. Although
such higher-level features are not available yet, the code might still
be interesting for curious eyes.

https://github.com/ademakov/sharena

The first stage of this project was funded by Postgres Pro. Many
thanks to this wonderful team.

Regards,
Aleksey




Re: [HACKERS] Using FDW AddForeignUpdateTargets for a hidden pseudo-column

2016-06-14 Thread Aleksey Demakov
A very quick and dirty hack I did in src/backend/optimizer/plan/initsplan.c (in 9.5.3):

--- initsplan.c.orig	2016-06-14 19:08:27.0 +0600
+++ initsplan.c	2016-06-14 19:10:55.0 +0600
@@ -185,9 +185,12 @@
 		if (IsA(node, Var))
 		{
 			Var		   *var = (Var *) node;
-			RelOptInfo *rel = find_base_rel(root, var->varno);
+			RelOptInfo *rel;
 			int			attno = var->varattno;
 
+			if (var->varno == INDEX_VAR)
+				continue;
+			rel = find_base_rel(root, var->varno);
 			if (bms_is_subset(where_needed, rel->relids))
 				continue;
 			Assert(attno >= rel->min_attr && attno <= rel->max_attr);


And then in my FDW I add the tuple id column like this:

static void
MyAddForeignUpdateTargets(Query *parsetree,
						  RangeTblEntry *target_rte,
						  Relation target_relation)
{
	Var		   *var;
	TargetEntry *tle;

	/* Make a Var representing the desired value */
	var = makeVar(INDEX_VAR,	/* instead of parsetree->resultRelation */
				  target_relation->rd_att->natts + 1,
				  INT8OID,
				  -1,
				  InvalidOid,
				  0);

	/* Wrap it in a resjunk TLE with the right name ... */
	tle = makeTargetEntry((Expr *) var,
						  list_length(parsetree->targetList) + 1,
						  pstrdup(MY_FDW_TUPLE_ID),
						  true);

	/* ... and add it to the query's targetlist */
	parsetree->targetList = lappend(parsetree->targetList, tle);
}

I was able to successfully run a couple of very simple tests with
these. This seems to indicate that tweaking the core to handle this
case properly is doable.

The question is whether this approach is conceptually correct and, if
so, which other places need to be patched.

Regards,
Aleksey



[HACKERS] Using FDW AddForeignUpdateTargets for a hidden pseudo-column

2016-06-14 Thread Aleksey Demakov
Hi all,

I have a data store where tuples have unique identities that normally
are not visible. I also have an FDW to work with this data store. As
per the docs, to implement updates for this data store I have an
AddForeignUpdateTargets() function that adds an artificial column to
the target list.

It seems to me that I cannot re-use a system attribute number for this
artificial resjunk column (as, for instance, postgres_fdw uses
SelfItemPointerAttributeNumber). These attributes have specific
meanings not compatible with my tuple identity.

On the other hand, using a regular AttrNumber might confuse the query
planner. In contrast with, e.g., Oracle FDW, which can use a unique
key to identify a row, in my data store the tuple identity is normally
not visible. So the query planner might break if it sees a Var node
with an unexpected varattno number.

What is the best approach to handle such a case?

1. Give up on this entirely and require a unique key for any table
used through FDW.

2. Force the FDW to expose the identity column, either explicitly by
the user who creates the foreign table or by adding it automatically
in the corresponding trigger (preferably still keeping it hidden for
normal scans).

3. Modify the postgresql core to nicely handle the case of an unknown
target column added by an FDW.

4. Something else?

Regards,
Aleksey





Re: [HACKERS] Rationalizing code-sharing among src/bin/ directories

2016-03-24 Thread Aleksey Demakov
Hi there,

> On 23 Mar 2016, at 22:38, Tom Lane  wrote:
> Anybody want to bikeshed the directory name src/feutils?  Maybe fe-utils
> would be more readable.  And where to put the corresponding header files?
> src/include/fe-utils?


For me, "utils" sounds like something of an auxiliary nature. If some
pretty basic stuff is going to be added there, then I believe that
fe_common or perhaps fe_shared would be a bit more suitable.

Regards,
Aleksey


