On 09/25/2016 08:48 PM, Petr Jelinek wrote:
On 02/08/16 17:44, Tomas Vondra wrote:
This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):
Slab

* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab

* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination of AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in the Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for the current Slab
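The generational chunk-size tuning could be sketched roughly like this (an illustrative C sketch only; the function name and the round-up policy are hypothetical, not the patch's actual code):

```c
#include <assert.h>

/* Hypothetical sketch: pick the chunk size for the next Slab generation
 * so that it covers the recent allocation requests, rounded up to a
 * power of two. Illustrative only - not the actual GenSlab logic. */
static int
next_generation_chunk_size(const int *recent_sizes, int n)
{
    int     max = 0;
    int     size = 8;       /* smallest chunk size considered */

    /* find the largest recent request */
    for (int i = 0; i < n; i++)
        if (recent_sizes[i] > max)
            max = recent_sizes[i];

    /* round up to the next power of two */
    while (size < max)
        size *= 2;

    return size;
}
```

Requests larger than the resulting chunk size would then fall through to the AllocSet part of the context.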
So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will be use for
GenSlab eventually outside of the reorderbuffer?
Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.
I haven't thought about other places that might benefit from these new
allocators very much - in general, it's useful for places that produce a
stream of equally-sized items (GenSlab relaxes this), that are pfreed()
in ~FIFO manner (i.e. roughly in the order of allocation).
So none of this is meant as a universal replacement of AllocSet,
but in the suitable cases the results seem really promising. For
example for the simple test query in , the performance
improvement is this:
     N |   master | patched
 10000 |    100ms |   100ms
 50000 |  15000ms |   350ms
100000 | 146000ms |   700ms
200000 |        ? |  1400ms
That's a fairly significant improvement, and the submitted version
of the patches should perform even better (~2x, IIRC).
I agree that it improves performance quite nicely and that
reorderbuffer could use this.
About the code. I am not quite sure that this needs to be split into
two patches especially if 1/3 of the second patch is the removal of
the code added by the first one and otherwise it's quite small and
straightforward. That is unless you expect the GenSlab to not go in.
I don't know - it seemed natural to first introduce the Slab, as it's
easier to discuss it separately, and it works for 2 of the 3 contexts
needed in reorderbuffer.
GenSlab is an improvement of Slab, or rather based on it, so that it
works for the third context. And it introduces some additional ideas
(particularly the generational design, etc.)
Of course, none of this means it has to be committed in two separate
chunks, or that I don't expect GenSlab to get committed ...
In general it seems understandable, the initial description helps to
understand what's happening well enough.
One thing I don't understand however is why the freelist is both
array and doubly linked list and why there is new implementation of
said doubly linked list given that we have dlist.
Hmm, perhaps that should be explained better.
In AllocSet, we only have a global freelist of chunks, i.e. we have a
list of free chunks for each possible size (there are 11 sizes, starting
with 8 bytes and then doubling the size). So freelist[0] is a list of
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
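That size-to-index mapping can be sketched like so (a simplified illustration of the idea; the real logic lives in AllocSetFreeIndex() in aset.c and is written differently):

```c
#include <assert.h>

/* Simplified sketch: map a request size to an AllocSet freelist index.
 * Index 0 holds 8-byte chunks, and each subsequent index doubles the
 * chunk size, so a 9-byte request lands in the 16-byte list. */
static int
freelist_index(int size)
{
    int     idx = 0;
    int     chunk = 8;      /* size of chunks on freelist[0] */

    while (chunk < size)
    {
        chunk <<= 1;        /* double the chunk size */
        idx++;
    }
    return idx;
}
```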
In Slab, the freelist has two levels - first there's a bitmap on each
block (which is possible, as the chunks have constant size), tracking
which chunks of that particular block are free. This makes it trivial to
check that all chunks on the block are free, and free the whole block
(which is impossible with AllocSet).
Second, the freelist at the context level tracks blocks with a given
number of free chunks - so freelist[0] tracks completely full blocks,
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
reuse space on almost full blocks first, in the hope that some of the
less full blocks will get completely empty (and freed to the OS).
Is that clear?
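To make the two-level design concrete, here is an illustrative C sketch (all names are made up for illustration; the real structures in the patch differ):

```c
#include <assert.h>
#include <string.h>

#define CHUNKS_PER_BLOCK 64

/* Illustrative sketch of the two-level freelist. Each block tracks its
 * free chunks in a per-block bitmap; the context would keep an array of
 * block lists indexed by the number of free chunks per block. */
typedef struct BlockSketch
{
    unsigned char bitmap[CHUNKS_PER_BLOCK / 8];     /* 1 bit per chunk, 1 = free */
    int         nfree;          /* number of free chunks on this block */
    struct BlockSketch *next;   /* next block on the same freelist entry */
} BlockSketch;

/* Mark chunk 'idx' as free and return the new free-chunk count; the
 * caller would then move the block to freelist[block->nfree]. */
static int
mark_chunk_free(BlockSketch *block, int idx)
{
    block->bitmap[idx / 8] |= (1 << (idx % 8));
    block->nfree++;
    return block->nfree;
}

/* A block whose chunks are all free can be returned to the OS. */
static int
block_is_empty(const BlockSketch *block)
{
    return block->nfree == CHUNKS_PER_BLOCK;
}
```

Reusing space on the almost-full blocks first (high freelist index) gives the nearly-empty blocks a chance to drain completely and get freed.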
+ * SlabContext is our standard implementation of MemoryContext.
Meh, that's clearly a bogus comment.
+ * SlabChunk
+ * The prefix of each piece of memory in a SlabBlock
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?
It is true, a lot of stuff in MemoryContext infrastructure relies on
that. For example when you do pfree(ptr), we actually do something like
header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
to get the chunk header - which includes pointer to the memory context
and other useful stuff.
This also means we can put additional fields before StandardChunkHeader
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:
typedef struct SlabChunkData
{
    /* block owning this chunk */
    void       *block;

    /* standard header */
    StandardChunkHeader header;
} SlabChunkData;
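The pointer arithmetic this relies on can be shown with a simplified model (a sketch with made-up names; the real header is StandardChunkHeader in utils/memutils.h and carries different fields):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for StandardChunkHeader, to illustrate how
 * pfree() recovers the header from a user pointer. */
typedef struct SketchHeader
{
    void       *context;    /* owning memory context */
    size_t      size;       /* requested allocation size */
} SketchHeader;

/* Allocate a chunk with the header placed immediately before the
 * pointer handed back to the caller. */
static void *
sketch_alloc(void *context, size_t size)
{
    SketchHeader *hdr = malloc(sizeof(SketchHeader) + size);

    hdr->context = context;
    hdr->size = size;
    return (char *) hdr + sizeof(SketchHeader);
}

/* Recover the header from a user pointer, as pfree() does. */
static SketchHeader *
sketch_header(void *ptr)
{
    return (SketchHeader *) ((char *) ptr - sizeof(SketchHeader));
}
```

Because the arithmetic only looks backwards by sizeof(StandardChunkHeader), any extra fields placed before the standard header (like the block pointer above) are invisible to it.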
+#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
What's the point of this given that it's defined in the .c file?
Meh, I've copied this from aset.c, but I see it's useless there too.
+static void *
+SlabAlloc(MemoryContext context, Size size)
+{
+    Slab        set = (Slab) context;
+    SlabBlock   block;
+    SlabChunk   chunk;
+    int         idx;
+
+    Assert(size == set->chunkSize);
I wonder if there should be stronger protection (i.e., elog) for the size check.
Perhaps, I'm not opposed to that.
+static void *
+SlabRealloc(MemoryContext context, void *pointer, Size size)
+{
+    Slab        set = (Slab) context;
+
+    /* can't do actual realloc with slab, but let's try to be gentle */
+    if (size == set->chunkSize)
+        return pointer;
+
+    /* we can't really do repalloc with this allocator */
This IMHO should definitely be elog.
Yeah, you're probably right.
+ /* FIXME */
Do you plan to implement this interface?
Yes, although I'm not sure what checks should go there. The only thing I
can think of right now is checking that the number of free chunks on a
block (according to the bitmap) matches the freelist index.
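That consistency check could look roughly like this (a hypothetical sketch of the idea just described, not code from the patch):

```c
#include <assert.h>

/* Count the set bits in a block's free-chunk bitmap. */
static int
count_free_chunks(const unsigned char *bitmap, int nbytes)
{
    int     count = 0;

    for (int i = 0; i < nbytes; i++)
        for (int bit = 0; bit < 8; bit++)
            if (bitmap[i] & (1 << bit))
                count++;
    return count;
}

/* The number of free chunks according to the bitmap must match the
 * index of the context freelist the block is linked into. */
static int
block_freelist_consistent(const unsigned char *bitmap, int nbytes,
                          int freelist_index)
{
    return count_free_chunks(bitmap, nbytes) == freelist_index;
}
```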
+#define SLAB_DEFAULT_BLOCK_SIZE 8192
+#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024)
I am guessing this is based on max_cached_tuplebufs? Maybe these
could be written in the same style?
Not sure I understand what you mean by "based on"? I don't quite
remember how I came up with those constants, but I guess 8kB and 8MB
seemed like good values.
Also, what style you mean? I've used the same style as for ALLOCSET_*
constants in the same file.
Since this is relatively simple wrapper it looks mostly ok to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
mcxt but I guess the alternative there is to error out so this is
probably preferable. Would want to hear other opinions here.
Yeah, I'd like to get some opinions on that too - that's why I left
there the FIXMEs, actually.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services