On Thu, Nov 10, 2016 at 12:37 AM, Thomas Munro
<thomas.mu...@enterprisedb.com> wrote:
> On Tue, Nov 1, 2016 at 5:06 PM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> On Wed, Oct 5, 2016 at 11:28 PM, Thomas Munro
>> <thomas.mu...@enterprisedb.com> wrote:
>>> [dsa-v3.patch]
>> Here is a new version which just adds CLOBBER_FREED_MEMORY support to 
>> dsa_free.
> Here is a new version that fixes a bug I discovered in freepage.c today.
> Details:  When dsa_free decides to give back a whole superblock back
> to the free page manager for a segment with FreePageManagerPut, and
> there was already exactly one span of exactly one free page in that
> segment, and the span being 'put' is not adjacent to that existing
> free page, then the singleton format must be converted to a btree with
> the existing page as root and the newly put span as the sole leaf.
> But in that special case we forgot to add the newly put span to the
> appropriate free list.  Not only did we lose track of it, but a future
> call to FreePageManagerPut might try to merge it with another adjacent
> span, which will try to manipulate the freelist that it expects it to
> be in and blow up.  The fix is just to add a call to
> FreePagePushSpanLeader in this corner case before the early return.

Since a lot of the design of this patch is mine - from my earlier work
on sb_alloc - I don't expect to have a ton of objections to it. And
I'd like to get it committed, because other parallelism work depends
on it (bitmap heap scan and parallel hash join in particular), and
because it's got other uses as well.  However, I don't want to be
perceived as slamming my code (or that of my colleagues) into the tree
without due opportunity for other people to provide feedback, so if
anyone has questions, comments, concerns, or review to offer, please

I think we should develop versions of this that (1) allocate from the
main shared memory segment and (2) allocate from backend-private
memory.  Per my previous benchmarking results, allocating from
backend-private memory would be a substantial win for tuplesort.c
because this allocator is substantially more memory-efficient for
large memory contexts than aset.c, and Tomas Vondra tested it out and
found that it is also faster for logical decoding than the approach he
proposed.  Perhaps that's not an argument for holding up his proposed
patches for that problem, but I think it IS a good argument for
pressing forward with a backend-private version of this allocator.
I'm not saying that should be part of the initial commit of this code,
but I think it's a good direction to pursue.

One question that we need to resolve is where the code should live in
the source tree.  When I wrote the original patches upon which this
work was based, I think that I put all the code in
src/backend/utils/mmgr, since it's all memory-management code.  In
this patch, Thomas left the free page manager code there, but put the
allocator itself in src/backend/storage/ipc.  There's a certain logic
to that because dynamic shared areas (dsa.c) sit right next to dynamic
shared memory (dsm.c) but it feels a bit schizophrenic to have half of
the code in storage/ipc and the other half in utils/mmgr.  I guess my
view is that utils/mmgr is a better fit, because I think that this is
basically memory management code that happens to use shared memory,
rather than basically IPC that happens to be an allocator.  If we
decide that this stuff goes in storage/ipc then that's basically
saying that everything that uses dynamic shared memory is going to end
up in that directory, which seems like a mess.  The fact that the
free-page manager, at least, could be used for allocations not based
on DSM strengthens that argument in my view. Other opinions?

The #ifdef HAVE__BUILTIN_TYPES_COMPATIBLE_P hack in relptr.h, for
which I believe I'm responsible, is ugly.  There is probably a
compiler out there that has __typeof__ but not
__builtin_types_compatible_p, and we could cater to that by adding a
separate configure test for __typeof__.  A little browsing of the
documentation on at https://gcc.gnu.org/onlinedocs/ seems to suggest
that __builtin_types_compatible_p didn't exist before GCC 3.1, but
__typeof__ seems to be there even in 2.95.3.  That's not very
interesting, honestly, because 3.1 came out in 2002, but there might
be non-GCC compilers that implement __typeof__ but not
__builtin_types_compatible_p.  I am inclined not to worry about this
unless somebody feels otherwise, but it's not beautiful.

I wonder if it wouldn't be a good idea to allow the dsa_area_control
to be stored wherever instead of insisting that it's got to be located
inside the first DSM segment backing the pool itself.  For example,
you could make dsa_create_dynamic() take a third argument which is a
pointer to enough space for a dsa_area_control, and it would
initialize it in place.  Then you could make dsa_attach_dynamic() take
a pointer to that same structure instead of taking a dsa_handle.
Actually, I think dsa_handle goes away: it becomes the caller's
responsibility to figure out the correct pointer address to pass in
the second process.  The advantage of this design change is that you
could stuff the dsa_area_control into the existing parallel DSM and
only create additional DSMs if anything is actually allocated.  What
would be even cooler is to allow a little bit of space inside the
parallel DSM that gets used first, and then, when that overflows, we
start creating new DSMs.  Say, 64kB.  Am I sounding greedy yet?  It
just seems like a good idea not to needlessly multiply the number of

+               /* Unlink span. */
+               /* TODO: Does it even need to be linked in in the
first place? */
+               LWLockAcquire(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE),
+                                         LW_EXCLUSIVE);
+               unlink_span(area, span);
+               LWLockRelease(DSA_SCLASS_LOCK(area, DSA_SCLASS_SPAN_LARGE));

In answer to the TODO, I think this isn't strictly necessary, but it
seems like a good idea to do it anyway for debuggability.  If we
didn't do this, the space occupied by a large object wouldn't be
"known" in any way other than by having disappeared from the free page
map, whereas this way it's linked into the DSA's listed of allocated
chunks like anything else, so for example dsa_dump() can print it.  I
recommend removing this TODO.

+       /*
+        * TODO: We could take Max(fpm->contiguous_pages, result of
+        * FreePageBtreeCleanup) and give it to FreePageManagerUpdatLargest as a
+        * starting point for its search, potentially avoiding a bunch of work,
+        * since there is no way the largest contiguous run is bigger than that.
+        */

Typo: Updat.

+       /*
+        * TODO: Figure out how to avoid setting this every time. It
may not be as
+        * simple as it looks.
+        */

Something isn't right with this function, because it takes the trouble
to calculate a value for contiguous_pages that it then doesn't use for
anything.  I think the original idea here was that if we calculated a
value for contiguous_pages that was less than fpm->contiguous_pages,
there was no need to dirty it.  If we can't get away with that for
some reason, then there's no point in calculating the value in the
first place.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to