Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
>  src/backend/utils/mmgr/Makefile   |   2 +-
>  src/backend/utils/mmgr/aset.c     | 115 +--------------------------------
>  src/backend/utils/mmgr/memdebug.c | 131 ++++++++++++++++++++++++++++++++++++++
>  src/include/utils/memdebug.h      |  22 +++++++
>  4 files changed, 156 insertions(+), 114 deletions(-)
>  create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of them
in a header as static inlines (not randomize, but the rest).
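Something like this in memdebug.h should do (sketch only; the real
wipe_mem() additionally wraps the memset in Valgrind
MAKE_MEM_UNDEFINED/NOACCESS client requests, elided here):

#include <string.h>

#ifdef CLOBBER_FREED_MEMORY
static inline void
wipe_mem(void *ptr, size_t size)
{
	/* fill freed memory with a recognizable garbage pattern */
	memset(ptr, 0x7F, size);
}
#endif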


> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> From: Tomas Vondra <to...@2ndquadrant.com>
> Date: Wed, 30 Nov 2016 15:36:23 +0100
> Subject: [PATCH 2/3] slab allocator
> 
> ---
>  src/backend/replication/logical/reorderbuffer.c |  82 +--
>  src/backend/utils/mmgr/Makefile                 |   2 +-
>  src/backend/utils/mmgr/slab.c                   | 803 ++++++++++++++++++++++++
>  src/include/nodes/memnodes.h                    |   2 +-
>  src/include/nodes/nodes.h                       |   1 +
>  src/include/replication/reorderbuffer.h         |  15 +-
>  src/include/utils/memutils.h                    |   9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.




> +/*-------------------------------------------------------------------------
> + *
> + * slab.c
> + *     SLAB allocator definitions.
> + *
> + * SLAB is a custom MemoryContext implementation designed for cases of
> + * equally-sized objects.
> + *
> + *
> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Copyright year needs a bump, before a committer forgets it.


> + * IDENTIFICATION
> + *     src/backend/utils/mmgr/slab.c
> + *
> + *
> + *   The constant allocation size allows significant simplification and various
> + *   optimizations. Firstly, we can get rid of the doubling and carve the blocks
> + *   into chunks of exactly the right size (plus alignment), not wasting memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.


> + *   Each block includes a simple bitmap tracking which chunks are used/free.
> + *   This makes it trivial to check if all chunks on the block are free, and
> + *   eventually free the whole block (which is almost impossible with a global
> + *   freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?
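Standalone sketch of what I mean (names invented here):

#include <stdint.h>
#include <stddef.h>

typedef struct BlockSketch
{
	char	   *start;			/* first chunk in the block */
	size_t		chunkSize;		/* size of every chunk, >= 4 */
	int32_t		nfree;			/* number of free chunks */
	int32_t		firstFree;		/* head of intrusive freelist, or -1 */
} BlockSketch;

static void *
block_alloc_chunk(BlockSketch *block)
{
	char	   *chunk;

	if (block->firstFree < 0)
		return NULL;			/* block is full */

	chunk = block->start + block->firstFree * block->chunkSize;

	/* the free chunk's first 4 bytes hold the next free chunk's index */
	block->firstFree = *(int32_t *) chunk;
	block->nfree--;
	return chunk;
}

static void
block_free_chunk(BlockSketch *block, void *ptr)
{
	char	   *chunk = ptr;

	/* push onto the head - that's where the lifo behaviour comes from */
	*(int32_t *) chunk = block->firstFree;
	block->firstFree = (int32_t) ((chunk - block->start) / block->chunkSize);
	block->nfree++;
}

No bitmap scan anywhere, O(1) alloc and free within a block.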


> + *   At the context level, we use 'freelist' to track blocks ordered by number
> + *   of free chunks, starting with blocks having a single allocated chunk, and
> + *   with completely full blocks on the tail.

Why that way round? Filling blocks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks?  Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?

Thirdly, isn't that approach going to result in a quite long freelists
array, when you have small items and a decent blocksize? Is that
actually a reasonable thing to do?


> + *   This also allows various optimizations - for example when searching for
> + *   a free chunk, the allocator reuses space from the most full blocks first,
> + *   in the hope that some of the less full blocks will get completely empty
> + *   (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.


> + *   For each block, we maintain a pointer to the first free chunk - this is quite
> + *   cheap and allows us to skip all the preceding used chunks, eliminating
> + *   a significant number of lookups in many common usage patterns. In the worst
> + *   case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.


> +
> +#include "postgres.h"
> +
> +#include "utils/memdebug.h"
> +#include "utils/memutils.h"
> +#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.


> +/*
> + * SlabPointer
> + *           Aligned pointer which may be a member of an allocation set.
> + */
> +typedef void *SlabPointer;
> +typedef SlabContext *Slab;

I personally won't commit this with pointer hiding typedefs.  If
somebody else does, I can live with it, but for me it's bad enough taste
that I won't.


> +/*
> + * SlabContext is a specialized implementation of MemoryContext.
> + */
> +typedef struct SlabContext
> +{
> +     MemoryContextData header;       /* Standard memory-context fields */
> +     /* Allocation parameters for this context: */
> +     Size            chunkSize;              /* chunk size */
> +     Size            fullChunkSize;  /* chunk size including header and alignment */
> +     Size            blockSize;              /* block size */
> +     int                     chunksPerBlock; /* number of chunks per block */
> +     int                     minFreeChunks;  /* min number of free chunks in any block */
> +     int                     nblocks;                /* number of blocks allocated */
> +     /* blocks with free space, grouped by number of free chunks: */
> +     dlist_head      freelist[FLEXIBLE_ARRAY_MEMBER];
> +}    SlabContext;
> +

Why aren't these ints something unsigned?


> +/*
> + * SlabIsValid
> + *           True iff set is valid allocation set.
> + */
> +#define SlabIsValid(set) PointerIsValid(set)

It's not your fault, but this "iff" is obviously a lot stronger than the
actual test ;).  I seriously doubt this macro is worth anything...
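If it's kept at all, it could at least look at the node tag, i.e.
something like (assuming the new node tag ends up being named
T_SlabContext):

#define SlabIsValid(set) (PointerIsValid(set) && IsA(set, SlabContext))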


> +/*
> + * SlabReset
> + *           Frees all memory which is allocated in the given set.
> + *
> + * The code simply frees all the blocks in the context - we don't keep any
> + * keeper blocks or anything like that.
> + */

Why don't we?  Seems quite worthwhile?   Thinking about this, won't this
result in a drastic increase of system malloc/mmap/brk traffic when
there are lots of short transactions in reorderbuffer?
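Standalone sketch, with invented names - keep one empty block around
instead of returning everything to the OS on reset:

#include <stdlib.h>

typedef struct BlockHdr
{
	struct BlockHdr *next;
} BlockHdr;

typedef struct SlabSketch
{
	BlockHdr   *blocks;			/* all blocks in the context */
	BlockHdr   *keeper;			/* one block retained across resets */
} SlabSketch;

static void
slab_reset_sketch(SlabSketch *set)
{
	BlockHdr   *block = set->blocks;

	while (block != NULL)
	{
		BlockHdr   *next = block->next;

		if (set->keeper == NULL)
			set->keeper = block;	/* reuse on the next allocation */
		else
			free(block);
		block = next;
	}
	set->blocks = NULL;
}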


> +static void
> +SlabReset(MemoryContext context)
> +{
> +     /* walk over freelists and free the blocks */
> +     for (i = 0; i <= set->chunksPerBlock; i++)
> +     {
> +             dlist_mutable_iter miter;
> +
> +             dlist_foreach_modify(miter, &set->freelist[i])
> +             {
> +                     SlabBlock       block = dlist_container(SlabBlockData, node, miter.cur);
> +
> +                     dlist_delete(miter.cur);
> +
> +                     /* Normal case, release the block */

What does "normal case" refer to here? Given that there's no alternative
case...


> +     /*
> +      * We need to update the index of the next free chunk on the block. If we used
> +      * the last free chunk on this block, set it to chunksPerBlock (which is
> +      * not a valid chunk index). Otherwise look for the next chunk - we know
> +      * that it has to be above the current firstFreeChunk value, thanks to how
> +      * we maintain firstFreeChunk here and in SlabFree().
> +      */
> +     if (block->nfree == 0)
> +             block->firstFreeChunk = set->chunksPerBlock;
> +     else
> +     {
> +             /* look for the next free chunk in the block, after the first one */
> +             while ((++block->firstFreeChunk) < set->chunksPerBlock)
> +             {
> +                     int                     byte = block->firstFreeChunk / 8;
> +                     int                     bit = block->firstFreeChunk % 8;
> +
> +                     /* stop when you find 0 (unused chunk) */
> +                     if (!(block->bitmapptr[byte] & (0x01 << bit)))
> +                             break;
> +             }
> +
> +             /* must have found the free chunk */
> +             Assert(block->firstFreeChunk != set->chunksPerBlock);
> +     }

This and previous code just re-affirms my opinion that a bitmap is not
the best structure here.


> +     /* move the whole block to the right place in the freelist */
> +     dlist_delete(&block->node);
> +     dlist_push_head(&set->freelist[block->nfree], &block->node);

Hm.  What if we, instead of the array of doubly linked lists, just kept
a single linked list of blocks, and kept that list sorted by number of
free chunks?  Given that freeing / allocation never changes the number
of allocated chunks by more than 1, we'll never have to move an entry
far in that list to keep it sorted.
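Standalone sketch (invented names) of the re-sort step after a chunk
was freed, with the list kept ascending by nfree so allocation can
just take blocks from the head:

#include <stddef.h>

typedef struct BlockNode
{
	struct BlockNode *prev;
	struct BlockNode *next;
	int			nfree;
} BlockNode;

static void
resort_after_free(BlockNode **head, BlockNode *block)
{
	/*
	 * nfree just went up by one, so the block can only need to move
	 * towards the tail, usually past at most a few equal neighbors.
	 */
	while (block->next && block->next->nfree < block->nfree)
	{
		BlockNode  *n = block->next;
		BlockNode  *a = block->prev;
		BlockNode  *b = n->next;

		/* swap block with its successor */
		if (a)
			a->next = n;
		else
			*head = n;
		n->prev = a;
		n->next = block;
		block->prev = n;
		block->next = b;
		if (b)
			b->prev = block;
	}
}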


> +/*
> + * SlabRealloc
> + *           As Slab is designed for allocating equally-sized chunks of memory,
> + *           it can't really do an actual realloc.
> + *
> + * We try to be gentle and allow calls with exactly the same size, as in that
> + * case we can simply return the same chunk. When the size differs, we fail
> + * with an assert failure or return NULL.
> + *
> + * We might even support cases with (size < chunkSize). That however seems
> + * rather pointless - Slab is meant for chunks of constant size, and moreover
> + * realloc is usually used to enlarge the chunk.
> + *
> + * XXX Perhaps we should not be gentle at all and simply fail in all cases,
> + * to eliminate the (mostly pointless) uncertainty.

I think I'm in favor of that.  This seems more likely to hide a bug than
to be actually helpful.
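I.e. (sketch):

static void *
SlabRealloc(MemoryContext context, void *pointer, Size size)
{
	/* Slab only hands out fixed-size chunks */
	elog(ERROR, "slab allocator does not support realloc()");
	return NULL;				/* keep compiler quiet */
}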


Regards,

Andres

