On 09/25/2016 08:48 PM, Petr Jelinek wrote:
Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:

This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):

* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab

So it's just wrapper around the other two allocators to make this
usecase easier to handle. Do you expect there will be use for the
GenSlab eventually outside of the reoderbuffer?

Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.

I haven't thought about other places that might benefit from these new allocators very much - in general, it's useful for places that produce a stream of equally-sized items (GenSlab relaxes this), that are pfreed() in ~FIFO manner (i.e. roughly in the order of allocation).

So none of this is meant as a universal replacement of AllocSet,
but in the suitable cases the results seem really promising. For
example for the simple test query in [1], the performance
improvement is this:

        N |   master |  patched
    10000 |    100ms |    100ms
    50000 |  15000ms |    350ms
   100000 | 146000ms |    700ms
   200000 |        ? |   1400ms

That's a fairly significant improvement, and the submitted version
of the patches should perform even better (~2x, IIRC).

I agree that it improves performance quite nicely and that
reoderbuffer could use this.

About the code. I am not quite sure that this needs to be split into
two patches especially if 1/3 of the second patch is the removal of
the code added by the first one and otherwise it's quite small and
straightforward. That is unless you expect the GenSlab to not go in.

I don't know - it seemed natural to first introduce the Slab, as it's easier to discuss it separately, and it works for 2 of the 3 contexts needed in reorderbuffer.

GenSlab is an improvement of Slab, or rather based on it, so that it works for the third context. And it introduces some additional ideas (particularly the generational design, etc.)

Of course, none of this means it has to be committed in two separate chunks, or that I don't expect GenSlab to get committed ...

In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand however is why the freelist is both
array and doubly linked list and why there is new implementation of
said doubly linked list given that we have dlist.

Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a list of free chunks for each possible size (there's 11 sizes, starting with 8 bytes and then doubling the size). So freelist[0] is a list of free 8B chunks, freelist[1] is a list of free 16B chunks, etc.

In Slab, the freelist has two levels - first there's a bitmap on each block (which is possible, as the chunks have constant size), tracking which chunks of that particular block are free. This makes it trivial to check that all chunks on the block are free, and free the whole block (which is impossible with AllocSet).

Second, the freelist at the context level tracks blocks with a given number of free chunks - so freelist[0] tracks completely full blocks, freelist[1] is a list of blocks with 1 free chunk, etc. This is used to reuse space on almost full blocks first, in the hope that some of the less full blocks will get completely empty (and freed to the OS).

Is that clear?

+ * SlabContext is our standard implementation of MemoryContext.
+ *


Meh, that's clearly a bogus comment.

+ * SlabChunk
+ *             The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */

Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?

It is true, a lot of stuff in MemoryContext infrastructure relies on that. For example when you do pfree(ptr), we actually do something like

   header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))

to get the chunk header - which includes pointer to the memory context and other useful stuff.

This also means we can put additional fields before StandardChunkHeader as that does not break this pointer arithmetic, i.e. SlabChunkData is effectively defined like this:

typedef struct SlabChunkData
    /* block owning this chunk */
    void *block;

    /* standard header */
    StandardChunkHeader header;
} SlabChunkData;

+#define SlabPointerIsValid(pointer) PointerIsValid(pointer)

What's the point of this given that it's defined in the .c file?

Meh, I've copied this from aset.c, but I see it's useless there too.

+static void *
+SlabAlloc(MemoryContext context, Size size)
+       Slab    set = (Slab) context;
+       SlabBlock       block;
+       SlabChunk       chunk;
+       int                     idx;
+       AssertArg(SlabIsValid(set));
+       Assert(size == set->chunkSize);

I wonder if there should be stronger protection (ie, elog) for the size

Perhaps, I'm not opposed to that.

+static void *
+SlabRealloc(MemoryContext context, void *pointer, Size size)
+       Slab    set = (Slab)context;
+       /* can't do actual realloc with slab, but let's try to be gentle */
+       if (size == set->chunkSize)
+               return pointer;
+       /* we can't really do repalloc with this allocator */
+       Assert(false);

This IMHO should definitely be elog.

Yeah, you're probably right.

+static void
+SlabCheck(MemoryContext context)
+       /* FIXME */

Do you plan to implement this interface?

Yes, although I'm not sure what checks should go there. The only thing I can think of right now is checking that the number of free chunks on a block (according to the bitmap) matches the freelist index.

+#define SLAB_DEFAULT_BLOCK_SIZE                8192
+#define SLAB_LARGE_BLOCK_SIZE          (8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these
could be written with same style?

Not sure I understand what you mean by "based on"? I don't quite remember how I came up with those constants, but I guess 8kB and 8MB seemed like good values.

Also, what style you mean? I've used the same style as for ALLOCSET_* constants in the same file.


Since this is relatively simple wrapper it looks mostly ok to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
mcxt but I guess the alternative there is to error out so this is
probably preferable. Would want to hear other opinions here.

Yeah, I'd like to get some opinions on that too - that's why I left there the FIXMEs, actually.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to