Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *       src/backend/utils/mmgr/slab.c
+ *
+ *
+ *     The constant allocation size allows significant simplification and various
+ *     optimizations that are not possible in AllocSet. Firstly, we can get rid
+ *     of the doubling and carve the blocks into chunks of exactly the right size
+ *     (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea; they'll quite
possibly get out of date. The argument can quite easily be made
without referring to a concrete reference to behaviour elsewhere.

+ *
+ *     At the context level, we use 'freelist' array to track blocks grouped by
+ *     number of free chunks. For example freelist[0] is a list of completely full
+ *     blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?


+ *     About MEMORY_CONTEXT_CHECKING:
+ *
+ *     Since we usually round request sizes up to the next power of 2, there
+ *     is often some unused space immediately after a requested data area.

I assume the "round up" stuff is copy-paste?


+ *     Thus, if someone makes the common error of writing past what they've
+ *     requested, the problem is likely to go unnoticed ... until the day when
+ *     there *isn't* any wasted space, perhaps because of different memory
+ *     alignment on a new platform, or some other effect.  To catch this sort
+ *     of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ *     the requested space whenever the request is less than the actual chunk
+ *     size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ *     About USE_VALGRIND and Valgrind client requests:
+ *
+ *     Valgrind provides "client request" macros that exchange information with
+ *     the host Valgrind (if any).  Under !USE_VALGRIND, memdebug.h stubs out
+ *     currently-used macros.
+ *
+ *     When running under Valgrind, we want a NOACCESS memory region both before
+ *     and after the allocation.  The chunk header is tempting as the preceding
+ *     region, but mcxt.c expects to able to examine the standard chunk header
+ *     fields.  Therefore, we use, when available, the requested_size field and
+ *     any subsequent padding.  requested_size is made NOACCESS before returning
+ *     a chunk pointer to a caller.  However, to reduce client request traffic,
+ *     it is kept DEFINED in chunks on the free list.
+ *
+ *     The rounded-up capacity of the chunk usually acts as a post-allocation
+ *     NOACCESS region.  If the request consumes precisely the entire chunk,
+ *     there is no such region; another chunk header may immediately follow.  In
+ *     that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ *     See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.


+typedef struct SlabBlockData *SlabBlock;               /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+       MemoryContextData header;       /* Standard memory-context fields */
+       /* Allocation parameters for this context: */
+       Size            chunkSize;              /* chunk size */
+       Size            fullChunkSize;  /* chunk size including header and alignment */
+       Size            blockSize;              /* block size */
+       int                     chunksPerBlock; /* number of chunks per block */
+       int                     minFreeChunks;  /* min number of free chunks in any block */
+       int                     nblocks;                /* number of blocks allocated */
+       /* Info about storage allocated in this context: */
+       SlabBlock       freelist[1];    /* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ *             Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).

+/*
+ * SlabChunk
+ *             The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to add fields in front of the StandardChunkHeader
+ * fields, which is used here to add a pointer to the block.
+ */

Wouldn't that be easier to enforce - particularly around alignment
requirements - by embedding a StandardChunkHeader here? That'd also
avoid redundancies.

+/* ----------
+ * Debug macros
+ * ----------
+ */
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+                       fprintf(stderr, "SlabFree: %s: %p, %d\n", \
+                               (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+                       fprintf(stderr, "SlabAlloc: %s: %p, %d\n", \
+                               (_cxt)->header.name, (_chunk), (_chunk)->size)
+#else
+#define SlabFreeInfo(_cxt, _chunk)
+#define SlabAllocInfo(_cxt, _chunk)
+#endif

Do we really have to copy that stuff from aset.c? Obviously no-one uses
that, since it doesn't even compile cleanly if HAVE_ALLOCINFO is
defined:
/home/andres/src/postgresql/src/backend/utils/mmgr/aset.c:302:20: warning: format ‘%d’ expects argument of type ‘int’, but argument 5 has type ‘Size {aka long unsigned int}’ [-Wformat=]
    fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \


+#ifdef CLOBBER_FREED_MEMORY
+
+/* Wipe freed memory for debugging purposes */
+static void
+wipe_mem(void *ptr, size_t size)

+#ifdef MEMORY_CONTEXT_CHECKING
+static void
+set_sentinel(void *base, Size offset)
+
+static bool
+sentinel_ok(const void *base, Size offset)
+#endif

+#ifdef RANDOMIZE_ALLOCATED_MEMORY
+static void
+randomize_mem(char *ptr, size_t size)

Let's move these into an mcxt_internal.h or mcxt_impl.h or such, as
static inlines.

+MemoryContext
+SlabContextCreate(MemoryContext parent,
+                                         const char *name,
+                                         Size blockSize,
+                                         Size chunkSize)
+{
+       int             chunksPerBlock;
+       Size    fullChunkSize;
+       Slab    set;
+
+       /* chunk, including SLAB header (both addresses nicely aligned) */
+       fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

+       /* make sure the block can store at least one chunk (plus a bitmap) */
+       if (blockSize - sizeof(SlabChunkData) < fullChunkSize + 1)
+               elog(ERROR, "block size %ld for slab is too small for chunks %ld",
+                                       blockSize, chunkSize);

I assume the 1 is the bitmap size?


+       /* so how many chunks can we fit into a block, including header and bitmap? */
+       chunksPerBlock
+               =  (8 * (blockSize - sizeof(SlabBlockData)) - 7) / (8 * fullChunkSize + 1);

I'm slightly drunk due to bad airline wine, but right now that seems a
bit odd and/or documentation worthy. I understand the (xxx + 7) / 8
pattern elsewhere, but right now I can't follow the - 7.

+/*
+ * SlabAlloc
+ *             Returns pointer to allocated memory of given size or NULL if
+ *             request could not be completed; memory is added to the set.
+ *
+ * No request may exceed:
+ *             MAXALIGN_DOWN(SIZE_MAX) - SLAB_BLOCKHDRSZ - SLAB_CHUNKHDRSZ
+ * All callers use a much-lower limit.

That seems like a meaningless comment in the context of a slab allocator
with a fixed size.


+       /*
+        * If there are no free chunks in any existing block, create a new block
+        * and put it to the last freelist bucket.
+        *
+        * (set->minFreeChunks == 0) means there are no blocks with free chunks,
+        * thanks to how minFreeChunks is updated at the end of SlabAlloc().
+        */
+       if (set->minFreeChunks == 0)
+       {
+               block = (SlabBlock)malloc(set->blockSize);

Space after cast - maybe run pgindent over the file before submission?
Doing that manually helps to avoid ugly damage by the per-release run
later. I'm pretty sure there'll be a significant number of changes.



+       if (block->nfree == 0)
+               block->firstFreeChunk = set->chunksPerBlock;
+       else
+       {
+               /* look for the next free chunk in the block, after the first one */
+               while ((++block->firstFreeChunk) < set->chunksPerBlock)
+               {
+                       int byte = block->firstFreeChunk / 8;
+                       int bit  = block->firstFreeChunk % 8;
+
+                       /* stop when you find 0 (unused chunk) */
+                       if (! (block->bitmapptr[byte] & (0x01 << bit)))
+                               break;
+               }

I haven't profiled (or even compiled) this patchset yet, but FWIW, in
the tuple deforming code, I could measure a noticeable speedup by
accessing bitmap-bytes in the native word-size, rather than char. I'm
*NOT* saying you should do that, but if this ever shows up as a
bottleneck, it might be worthwhile to optimize.

+       /*
+        * And finally update minFreeChunks, i.e. the index to the block with the
+        * lowest number of free chunks. We only need to do that when the block
+        * got full (otherwise we know the current block is the right one).
+        * We'll simply walk the freelist until we find a non-empty entry.
+        */
+       if (set->minFreeChunks == 0)
+               for (idx = 1; idx <= set->chunksPerBlock; idx++)
+                       if (set->freelist[idx])
+                       {
+                               set->minFreeChunks = idx;
+                               break;
+                       }

Yuck. This definitely needs braces.


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
