On Fri, Nov 9, 2018 at 2:19 PM Ideriha, Takeshi
<ideriha.take...@jp.fujitsu.com> wrote:
> From: Thomas Munro [mailto:thomas.mu...@enterprisedb.com]
> >I know of 3 ideas that would make your idea C work, so that you could
> >share something as complicated as a query plan directly without having
> >to deserialise it to use it:
> >
> >1.  Allow the creation of DSA areas inside the traditional fixed memory
> >segment (instead of DSM), in a fixed-sized space reserved by the
> >postmaster.  That is, use dsa.c's ability to allocate and free memory,
> >and possibly free a whole area at once to avoid leaking memory in some
> >cases (like MemoryContexts), but in this mode dsa_pointer would be
> >directly castable to a raw pointer.  Then you could provide a regular
> >MemoryContext interface for it, and use it via palloc(), as you said,
> >and all the code that knows how to construct lists and trees and plan
> >nodes etc would All Just Work.  It would be your plan C, and all the
> >pointers would be usable in every process, but limited in total size at
> >start-up time.
> >
> >2.  Allow regular DSA in DSM to use raw pointers into DSM segments, by
> >mapping segments at the same address in every backend.  This involves
> >reserving a large virtual address range up front in the postmaster, and
> >then managing the space, trapping SEGV to map/unmap segments into parts
> >of that address space as necessary (instead of doing that in
> >dsa_get_address()).  AFAIK that would work, but it seems to be a bit
> >weird to go to such lengths.  It would be a kind of home-made simulation
> >of threads.  On the other hand, that is what we're already doing in
> >dsa.c, except more slowly due to extra software address translation from
> >funky pseudo-addresses.
> >
> >3.  Something something threads.
>
> I'm thinking of going with plan 1.  Not having to think about address
> translation seems tempting.  Plan 2 (as well as plan 3) looks like a big
> project.

The existing dsa_create_in_place() interface was intended to support
that, but it has never been used that way, so I'm not sure what extra
problems will come up.  Here are some assorted thoughts:

* You can prevent a DSA area from creating extra DSM segments, so that
it is constrained to stay entirely in the space you give it, by calling
dsa_set_size_limit(area, size) with the same size that you gave to
dsa_create_in_place(); now you have a DSA area that manages a single
fixed-sized chunk of memory that you gave it, in your case inside the
traditional shared memory segment (but it could be anywhere, including
inside a DSM segment or another DSA area!).
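
For what it's worth, here's a rough, untested sketch of what that setup
could look like with the existing interfaces.  The size constant, the
struct name and the function name are made up for illustration, and a
real patch would also have to add this space to the shared memory size
calculation:

#include "postgres.h"

#include "storage/lwlock.h"
#include "storage/shmem.h"
#include "utils/dsa.h"

/* Made-up size; a real patch would probably size this with a GUC. */
#define SHARED_PLAN_AREA_SIZE   (16 * 1024 * 1024)

static dsa_area *shared_plan_area = NULL;

void
SharedPlanAreaShmemInit(void)
{
    bool        found;
    void       *place;

    place = ShmemInitStruct("shared plan area",
                            SHARED_PLAN_AREA_SIZE, &found);
    if (!found)
    {
        /* First process here creates the area inside 'place'. */
        shared_plan_area = dsa_create_in_place(place,
                                               SHARED_PLAN_AREA_SIZE,
                                               LWLockNewTrancheId(),
                                               NULL);
        /* Refuse to create extra DSM segments; stay inside 'place'. */
        dsa_set_size_limit(shared_plan_area, SHARED_PLAN_AREA_SIZE);
    }
    else
    {
        /* Everyone else attaches to the existing in-place area. */
        shared_plan_area = dsa_attach_in_place(place, NULL);
    }
}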

* You can probably write a MemoryContext wrapper for it, if it has
only one segment that is in the traditional shared memory segment.
You would need to do a very simple kind of address translation: the
result from palloc() needs to be base + dsa_allocate()'s result, and
base needs to be subtracted from the argument to pfree() before
dsa_free() is called.  That is a version of your idea C that should
work AFAIK.
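
To make that concrete, here's a simplified, untested sketch of just the
translation step, leaving out the MemoryContextMethods boilerplate a
real wrapper would need; shared_plan_area is the made-up name from the
previous sketch, and shared_plan_base is assumed to hold the 'place'
pointer the area was created in:

/* Base address of the in-place area (the 'place' pointer above). */
static char *shared_plan_base;

static void *
shared_area_alloc(Size size)
{
    dsa_pointer dp = dsa_allocate(shared_plan_area, size);

    /*
     * With a single in-place segment, the dsa_pointer is just an offset
     * from the base of the area, so the resulting raw pointer is valid
     * in every process.  (dsa_get_address() would compute the same.)
     */
    return shared_plan_base + dp;
}

static void
shared_area_free(void *ptr)
{
    dsa_free(shared_plan_area,
             (dsa_pointer) ((char *) ptr - shared_plan_base));
}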

* Once you have that working, you now have a new kind of resource
management problem on your hands: memory leaks will be cluster-wide
and cluster-life-time!  That's hard, because the goal is to be able to
use arbitrary code in the tree that deals with plans etc, but that
code all assumes that it can "throw" (elog()) on errors.  PostgreSQL C
is generally "garbage collected" (in a way), but in this sketch, that
doesn't work anymore: this area *never* goes out of scope and gets
cleaned up.  Generally, languages with exceptions either need garbage
collection or scoped destructors to clean up the mess, but in this
sketch we don't have that anymore...  much like allocating stuff in
TopMemoryContext, except worse because it doesn't go away when one
backend exits.

* I had some ideas about some kind of "allocation rollback" interface:
you begin an "allocation transaction", allocate a bunch of stuff
(perhaps indirectly, by calling some API that makes query plans or
whatever and is totally unaware of this stuff).  Then if there is an
error, whatever was allocated so far is freed in the usual cleanup
paths by a rollback that happens via the resource manager machinery.
If you commit, then the allocation becomes permanent.  Then you only
commit stuff that you promise not to leak (perhaps stuff that has been
added to a very carefully managed cluster-wide plan cache).  I am not
sure of the details, and this might be crazy...
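
In case it helps to see the shape of it, the usage I have in mind would
be something like the following; every function name here is invented
for illustration and nothing like it exists today:

    shared_alloc_begin(shared_plan_area);  /* tracked by CurrentResourceOwner */

    /*
     * Anything allocated between begin and commit is remembered.  If
     * copyObject() throws, the usual resource owner cleanup would free
     * everything allocated since shared_alloc_begin().
     */
    plan = copyObject(source_plan);
    shared_plan_cache_insert(key, plan);    /* also hypothetical */

    shared_alloc_commit();                  /* allocations become permanent */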

-- 
Thomas Munro
http://www.enterprisedb.com
