Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, this patch is marked as committed in the CF application, but the second part (the generational allocator) was AFAICS never committed. Does anybody plan to push this forward? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 28 February 2017 at 01:02, Andres Freund wrote: > Hi, > > On 2017-02-27 03:17:32 -0800, Andres Freund wrote: >> I'll work on getting slab committed first, and then review / edit / >> commit generation.c later. One first note there is that I'm wondering >> if generation.c is a too generic filename. > > And pushed slab and its usage. Will have a look at generation.c > tomorrow. Attached is a patch to fix the compiler warning for compilers that don't understand that elog(ERROR) does not return. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services slaballoc_warning_fix.patch Description: Binary data
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/07/2017 12:19 AM, Andres Freund wrote:
> On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote:
>> Attached is the last part of the patch series, rebased to current master
>> and adopting the new chunk header approach.
>
> Something seems to have gone awry while sending that - the attachment is
> a whopping 0 bytes...

Why are you complaining? That makes it much easier to review and commit! But if you insist, here is the actual patch ;-)

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8aac670..0bdc214 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -149,15 +149,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;  /* ~8MB */
-
 /* ---
  * primary reorderbuffer support routines
  * ---
@@ -248,6 +239,10 @@ ReorderBufferAllocate(void)
                                     SLAB_DEFAULT_BLOCK_SIZE,
                                     sizeof(ReorderBufferTXN));
 
+    buffer->tup_context = GenerationContextCreate(new_ctx,
+                                                  "Tuples",
+                                                  SLAB_LARGE_BLOCK_SIZE);
+
     hash_ctl.keysize = sizeof(TransactionId);
     hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
     hash_ctl.hcxt = buffer->context;
@@ -258,15 +253,12 @@ ReorderBufferAllocate(void)
     buffer->by_txn_last_xid = InvalidTransactionId;
     buffer->by_txn_last_txn = NULL;
 
-    buffer->nr_cached_tuplebufs = 0;
-
     buffer->outbuf = NULL;
     buffer->outbufsize = 0;
 
     buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
     dlist_init(&buffer->toplevel_by_lsn);
-    slist_init(&buffer->cached_tuplebufs);
 
     return buffer;
 }
@@ -419,42 +411,12 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
 
     alloc_len = tuple_len + SizeofHeapTupleHeader;
 
-    /*
-     * Most tuples are below MaxHeapTupleSize, so we use a slab allocator for
-     * those. Thus always allocate at least MaxHeapTupleSize. Note that tuples
-     * generated for oldtuples can be bigger, as they don't have out-of-line
-     * toast columns.
-     */
-    if (alloc_len < MaxHeapTupleSize)
-        alloc_len = MaxHeapTupleSize;
-
-
-    /* if small enough, check the slab cache */
-    if (alloc_len <= MaxHeapTupleSize && rb->nr_cached_tuplebufs)
-    {
-        rb->nr_cached_tuplebufs--;
-        tuple = slist_container(ReorderBufferTupleBuf, node,
-                                slist_pop_head_node(&rb->cached_tuplebufs));
-        Assert(tuple->alloc_tuple_size == MaxHeapTupleSize);
-#ifdef USE_ASSERT_CHECKING
-        memset(&tuple->tuple, 0xa9, sizeof(HeapTupleData));
-        VALGRIND_MAKE_MEM_UNDEFINED(&tuple->tuple, sizeof(HeapTupleData));
-#endif
-        tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
-#ifdef USE_ASSERT_CHECKING
-        memset(tuple->tuple.t_data, 0xa8, tuple->alloc_tuple_size);
-        VALGRIND_MAKE_MEM_UNDEFINED(tuple->tuple.t_data, tuple->alloc_tuple_size);
-#endif
-    }
-    else
-    {
-        tuple = (ReorderBufferTupleBuf *)
-            MemoryContextAlloc(rb->context,
-                               sizeof(ReorderBufferTupleBuf) +
-                               MAXIMUM_ALIGNOF + alloc_len);
-        tuple->alloc_tuple_size = alloc_len;
-        tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
-    }
+    tuple = (ReorderBufferTupleBuf *)
+        MemoryContextAlloc(rb->tup_context,
+                           sizeof(ReorderBufferTupleBuf) +
+                           MAXIMUM_ALIGNOF + alloc_len);
+    tuple->alloc_tuple_size = alloc_len;
+    tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
 
     return tuple;
 }
@@ -468,21 +430,7 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
 void
 ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple)
 {
-    /* check whether to put into the slab cache, oversized tuples never are */
-    if (tuple->alloc_tuple_size == MaxHeapTupleSize &&
-        rb->nr_cached_tuplebufs < max_cached_tuplebufs)
-    {
-        rb->nr_cached_tuplebufs++;
-        slist_push_head(&rb->cached_tuplebufs, &tuple->node);
-        VALGRIND_MAKE_MEM_UNDEFINED(tuple->tuple.t_data, tuple->alloc_tuple_size);
-        VALGRIND_MAKE_MEM_UNDEFINED(tuple, sizeof(ReorderBufferTupleBuf));
-        VALGRIND_MAKE_MEM_DEFINED(&tuple->node, sizeof(tuple->node));
-        VALGRIND_MAKE_MEM_DEFINED(&tuple->alloc_tuple_size, sizeof(tuple->alloc_tuple_size));
-    }
-    else
-    {
-        pfree(tuple);
-    }
+    pfree(tuple);
 }
 
 /*
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index cd0e803..7263399 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr
 top_builddir = ../../../..
 include
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote: > Attached is the last part of the patch series, rebased to current master and > adopting the new chunk header approach. Something seems to have gone awry while sending that - the attachment is a whopping 0 bytes...
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-03-06 23:32:30 +0100, Tomas Vondra wrote: > > > The question however is whether this won't make the optimization > > > pointless. > > > I also wonder how much we save by this optimization and how widely it's > > > used? Can someone point me to some numbers? > > > > I don't recall any recent numbers. I'm more than a bit doubtful that it > > really matters - it's only used for the results of aggregate/window > > functions, and surely they've a good chunk of their own overhead... > > > > And if the benefit is negligible, trying to keep the optimization might > easily result in slowdown (compared to non-optimized code). I doubt the branch is noticeable here, given that we're doing a memory allocation otherwise. Should also be decently predictable. > But I'm puzzled why we haven't seen any reports of failures? I mean, doing > sum(int4) is not a particularly extravagant thing; if there really is an > issue, shouldn't we see a lot of reports? What are we missing? Reports about what? False positives causing crashes / wrong results? I think it's quite unlikely to actually trigger this in practice, because you need a properly aligned pointer, and then the preceding header has to point to a bit pattern that's equal to the context - that's presumably quite unlikely in practice. Regards, Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/06/2017 08:08 PM, Andres Freund wrote: Hi, On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote: On 03/06/2017 07:05 PM, Robert Haas wrote: On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund wrote: On 2017-03-06 12:40:18 -0500, Robert Haas wrote: On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund wrote: The issue was that on 32bit platforms the Datum returned by some functions (int2int4_sum in this case) isn't actually a separately allocated Datum, but rather just something embedded in a larger struct. That, combined with the following code: if (!peraggstate->resulttypeByVal && !*isnull && !MemoryContextContains(CurrentMemoryContext, DatumGetPointer(*result))) seems somewhat problematic to me. MemoryContextContains() can give false positives when used on memory that's not a distinctly allocated chunk, and if so, we violate memory lifetime rules. It's quite unlikely, given the required bit patterns, but nonetheless it's making me somewhat uncomfortable. Do others think this isn't an issue and we can just live with it? I think it's 100% broken to call MemoryContextContains() on something that's not guaranteed to be a palloc'd chunk. I agree, but to me it seems the only fix would be to just yank out the whole optimization? Dunno, haven't looked into it. I think it might be fixable by adding a flag into the chunk, with 'true' for regular allocations, and 'false' for the optimized ones. And then only use MemoryContextContains() for 'flag=true' chunks. I'm not quite following here. We only get a Datum and the knowledge that it's a pass-by-ref argument, so we really don't know that much. We could create an "EmbeddedDatum" type that has a preceding chunk header (appropriately for the version), that just gets zeroed out at start. Is that what you mean? Yes, that's roughly what I meant. The question however is whether this won't make the optimization pointless. I also wonder how much we save by this optimization and how widely it's used? Can someone point me to some numbers?
I don't recall any recent numbers. I'm more than a bit doubtful that it really matters - it's only used for the results of aggregate/window functions, and surely they've a good chunk of their own overhead... And if the benefit is negligible, trying to keep the optimization might easily result in slowdown (compared to non-optimized code). But I'm puzzled why we haven't seen any reports of failures? I mean, doing sum(int4) is not a particularly extravagant thing; if there really is an issue, shouldn't we see a lot of reports? What are we missing? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote: > On 03/06/2017 07:05 PM, Robert Haas wrote: > > On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund wrote: > > > On 2017-03-06 12:40:18 -0500, Robert Haas wrote: > > > > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund > > > > wrote: > > > > > The issue was that on 32bit platforms the Datum returned by some > > > > > functions (int2int4_sum in this case) isn't actually a separately > > > > > allocated Datum, but rather just something embedded in a larger > > > > > struct. That, combined with the following code: > > > > > if (!peraggstate->resulttypeByVal && !*isnull && > > > > > !MemoryContextContains(CurrentMemoryContext, > > > > > > > > > > DatumGetPointer(*result))) > > > > > seems somewhat problematic to me. MemoryContextContains() can give > > > > > false positives when used on memory that's not a distinctly allocated > > > > > chunk, and if so, we violate memory lifetime rules. It's quite > > > > > unlikely, given the required bit patterns, but nonetheless it's making > > > > > me somewhat uncomfortable. > > > > > > > > > > Do others think this isn't an issue and we can just live with it? > > > > > > > > I think it's 100% broken to call MemoryContextContains() on something > > > > that's not guaranteed to be a palloc'd chunk. > > > > > > I agree, but to me it seems the only fix would be to just yank out the > > > whole optimization? > > > > Dunno, haven't looked into it. > > > > I think it might be fixable by adding a flag into the chunk, with 'true' for > regular allocations, and 'false' for the optimized ones. And then only use > MemoryContextContains() for 'flag=true' chunks. I'm not quite following here. We only get a Datum and the knowledge that it's a pass-by-ref argument, so we really don't know that much. We could create an "EmbeddedDatum" type that has a preceding chunk header (appropriately for the version), that just gets zeroed out at start. Is that what you mean? 
> The question however is whether this won't make the optimization pointless. > I also wonder how much we save by this optimization and how widely it's > used? Can someone point me to some numbers? I don't recall any recent numbers. I'm more than a bit doubtful that it really matters - it's only used for the results of aggregate/window functions, and surely they've a good chunk of their own overhead... Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/06/2017 07:05 PM, Robert Haas wrote: On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund wrote: On 2017-03-06 12:40:18 -0500, Robert Haas wrote: On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund wrote: The issue was that on 32bit platforms the Datum returned by some functions (int2int4_sum in this case) isn't actually a separately allocated Datum, but rather just something embedded in a larger struct. That, combined with the following code: if (!peraggstate->resulttypeByVal && !*isnull && !MemoryContextContains(CurrentMemoryContext, DatumGetPointer(*result))) seems somewhat problematic to me. MemoryContextContains() can give false positives when used on memory that's not a distinctly allocated chunk, and if so, we violate memory lifetime rules. It's quite unlikely, given the required bit patterns, but nonetheless it's making me somewhat uncomfortable. Do others think this isn't an issue and we can just live with it? I think it's 100% broken to call MemoryContextContains() on something that's not guaranteed to be a palloc'd chunk. I agree, but to me it seems the only fix would be to just yank out the whole optimization? Dunno, haven't looked into it. I think it might be fixable by adding a flag into the chunk, with 'true' for regular allocations, and 'false' for the optimized ones. And then only use MemoryContextContains() for 'flag=true' chunks. The question however is whether this won't make the optimization pointless. I also wonder how much we save by this optimization and how widely it's used? Can someone point me to some numbers? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund wrote: > On 2017-03-06 12:40:18 -0500, Robert Haas wrote: >> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund wrote: >> > The issue was that on 32bit platforms the Datum returned by some >> > functions (int2int4_sum in this case) isn't actually a separately >> > allocated Datum, but rather just something embedded in a larger >> > struct. That, combined with the following code: >> > if (!peraggstate->resulttypeByVal && !*isnull && >> > !MemoryContextContains(CurrentMemoryContext, >> > >> > DatumGetPointer(*result))) >> > seems somewhat problematic to me. MemoryContextContains() can give >> > false positives when used on memory that's not a distinctly allocated >> > chunk, and if so, we violate memory lifetime rules. It's quite >> > unlikely, given the required bit patterns, but nonetheless it's making >> > me somewhat uncomfortable. >> > >> > Do others think this isn't an issue and we can just live with it? >> >> I think it's 100% broken to call MemoryContextContains() on something >> that's not guaranteed to be a palloc'd chunk. > > I agree, but to me it seems the only fix would be to just yank out the > whole optimization? Dunno, haven't looked into it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-03-06 12:40:18 -0500, Robert Haas wrote: > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund wrote: > > The issue was that on 32bit platforms the Datum returned by some > > functions (int2int4_sum in this case) isn't actually a separately > > allocated Datum, but rather just something embedded in a larger > > struct. That, combined with the following code: > > if (!peraggstate->resulttypeByVal && !*isnull && > > !MemoryContextContains(CurrentMemoryContext, > > > > DatumGetPointer(*result))) > > seems somewhat problematic to me. MemoryContextContains() can give > > false positives when used on memory that's not a distinctly allocated > > chunk, and if so, we violate memory lifetime rules. It's quite > > unlikely, given the required bit patterns, but nonetheless it's making > > me somewhat uncomfortable. > > > > Do others think this isn't an issue and we can just live with it? > > I think it's 100% broken to call MemoryContextContains() on something > that's not guaranteed to be a palloc'd chunk. I agree, but to me it seems the only fix would be to just yank out the whole optimization? - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund wrote: > The issue was that on 32bit platforms the Datum returned by some > functions (int2int4_sum in this case) isn't actually a separately > allocated Datum, but rather just something embedded in a larger > struct. That, combined with the following code: > if (!peraggstate->resulttypeByVal && !*isnull && > !MemoryContextContains(CurrentMemoryContext, > > DatumGetPointer(*result))) > seems somewhat problematic to me. MemoryContextContains() can give > false positives when used on memory that's not a distinctly allocated > chunk, and if so, we violate memory lifetime rules. It's quite > unlikely, given the required bit patterns, but nonetheless it's making > me somewhat uncomfortable. > > Do others think this isn't an issue and we can just live with it? I think it's 100% broken to call MemoryContextContains() on something that's not guaranteed to be a palloc'd chunk. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/04/2017 02:58 AM, Andres Freund wrote: On 2017-03-01 22:19:30 -0800, Andres Freund wrote: On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote: I've noticed two minor typos: 1) That is solved this by creating ... - extra "this" 2) Given this, routines like pfree their corresponding context ... - missing "find" or "determine" Will fix. And done. I also see you've explicitly mentioned the callbacks were added in 9.5. Doesn't that somewhat reintroduce the historical account? That section I just moved up, the version reference was there before. I left it in, because it seemed new enough to still be somewhat relevant; removed and added it, not sure what's better. Left that in place for now. Yeah. I hadn't realized it was just moved a bit, and moreover it probably makes sense to have some comments regarding differences between current versions. So +1 to that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-03-01 22:19:30 -0800, Andres Freund wrote: > On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote: > > I've noticed two minor typos: > > > > 1) That is solved this by creating ... > >- extra "this" > > > > 2) Given this, routines like pfree their corresponding context ... > >- missing "find" or "determine" > > Will fix. And done. > > I also see you've explicitly mentioned the callbacks were added in 9.5. > > Doesn't that somewhat reintroduce the historical account? > > That section I just moved up, the version reference was there before. I > left it in, because it seemed new enough to still be somewhat > relevant; removed and added it, not sure what's better. Left that in place for now.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/01/2017 05:29 AM, Andres Freund wrote: On 2017-02-28 20:18:35 -0800, Andres Freund wrote: - Andres, hoping the buildfarm turns greener Oh well, that didn't work. Investigating. Attached is the last part of the patch series, rebased to current master and adopting the new chunk header approach. Unlike Slab, this context needs the whole AllocSet header (size, requested_size), and also the block pointer, so no padding seems to be needed. I've tested this on x86-64 and armv7l, and the test_decoding test suite passes on both. FWIW, I'm still not entirely happy with the name "Generation". I agree with Andres that it's perhaps a bit too generic, but more importantly the name might actually be a bit obsolete. There used to be generations of chunks, but that's gone. Now it simply does not reuse the chunks at all, and frees the blocks when they get empty. It's not entirely FIFO though, because the transactions interleave, so later blocks may be released first. But the "allocated close, freed close" is still there. So perhaps something like "TemporalSet" or something like that would be a better name? Man, naming things is hard ... regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services gen-context.patch Description: binary/octet-stream
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-03-02 04:47:13 +0100, Tomas Vondra wrote: > On 03/01/2017 11:55 PM, Andres Freund wrote: > > On 2017-02-28 20:29:36 -0800, Andres Freund wrote: > > > On 2017-02-28 20:18:35 -0800, Andres Freund wrote: > > > > - Andres, hoping the buildfarm turns greener > > > > > > Oh well, that didn't work. Investigating. > > > > The fix for that was fairly trivial, and the buildfarm has cooled down. > > > > The issue was that on 32bit platforms the Datum returned by some > > functions (int2int4_sum in this case) isn't actually a separately > > allocated Datum, but rather just something embedded in a larger > > struct. That, combined with the following code: > > if (!peraggstate->resulttypeByVal && !*isnull && > > !MemoryContextContains(CurrentMemoryContext, > > > > DatumGetPointer(*result))) > > seems somewhat problematic to me. MemoryContextContains() can give > > false positives when used on memory that's not a distinctly allocated > > chunk, and if so, we violate memory lifetime rules. It's quite > > unlikely, given the required bit patterns, but nonetheless it's making > > me somewhat uncomfortable. > > > > I assume you're only using that code snippet as an example of code that > might be broken by MemoryContextContains() false positives, right? I'm mentioning that piece of code because it's what temporarily caused all 32bit animals to fail, when I had made MemoryContextContains() less forgiving. > (I don't see how the slab allocator could interfere with aggregates, as it's > only used for reorderbuffer.c). Indeed, this is independent of slab.c. I just came across it because I triggered crashes when shrinking the StandardChunkHeader to be just the chunk's MemoryContext. > > Do others think this isn't an issue and we can just live with it? > > > > My understanding is all the places calling MemoryContextContains() assume > they can't receive memory not allocated as a simple chunk by palloc(). If > that's not the case, it's likely broken. Yea, that's my conclusion too. 
Which means nodeAgg.c and nodeWindowAgg.c are broken afaics, because of e.g. int2int4_sum()'s use of Int64GetDatumFast() on sub-parts of larger allocations. - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote: > I've noticed two minor typos: > > 1) That is solved this by creating ... >- extra "this" > > 2) Given this, routines like pfree their corresponding context ... >- missing "find" or "determine" Will fix. > I also see you've explicitly mentioned the callbacks were added in 9.5. > Doesn't that somewhat reintroduce the historical account? That section I just moved up, the version reference was there before. I left it in, because it seemed new enough to still be somewhat relevant; removed and added it, not sure what's better. Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/01/2017 11:55 PM, Andres Freund wrote: On 2017-02-28 20:29:36 -0800, Andres Freund wrote: On 2017-02-28 20:18:35 -0800, Andres Freund wrote: - Andres, hoping the buildfarm turns greener Oh well, that didn't work. Investigating. The fix for that was fairly trivial, and the buildfarm has cooled down. The issue was that on 32bit platforms the Datum returned by some functions (int2int4_sum in this case) isn't actually a separately allocated Datum, but rather just something embedded in a larger struct. That, combined with the following code: if (!peraggstate->resulttypeByVal && !*isnull && !MemoryContextContains(CurrentMemoryContext, DatumGetPointer(*result))) seems somewhat problematic to me. MemoryContextContains() can give false positives when used on memory that's not a distinctly allocated chunk, and if so, we violate memory lifetime rules. It's quite unlikely, given the required bit patterns, but nonetheless it's making me somewhat uncomfortable. I assume you're only using that code snippet as an example of code that might be broken by MemoryContextContains() false positives, right? (I don't see how the slab allocator could interfere with aggregates, as it's only used for reorderbuffer.c). > Do others think this isn't an issue and we can just live with it? My understanding is all the places calling MemoryContextContains() assume they can't receive memory not allocated as a simple chunk by palloc(). If that's not the case, it's likely broken. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 03/01/2017 05:18 AM, Andres Freund wrote: On 2017-02-28 10:41:22 -0800, Andres Freund wrote: Hi, On 2017-02-27 23:44:20 -0800, Andres Freund wrote: *preliminary* patch attached. This needs a good bit of polishing (primarily comment work, verifying that valgrind works), but I'm too tired now. I'm not quite sure how to deal with mmgr/README - it's written as kind of a historical document, and the "Mechanisms to Allow Multiple Types of Contexts" is already quite out of date. I think it'd be good to rip out all the historical references and just describe the current state, but I'm not really enthusiastic about tackling that :/ While still not enthusiastic, I took a stab at doing so. While still not perfect, I do think this is an improvement. Is anybody uncomfortable going away from the current historical account style? I've pushed these now. I'm not claiming that the README revision is perfect, but we can incrementally improve it further... Thanks. I went through the README and it definitely looks better now. I've noticed two minor typos: 1) That is solved this by creating ... - extra "this" 2) Given this, routines like pfree their corresponding context ... - missing "find" or "determine" I also see you've explicitly mentioned the callbacks were added in 9.5. Doesn't that somewhat reintroduce the historical account? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-28 20:29:36 -0800, Andres Freund wrote: > On 2017-02-28 20:18:35 -0800, Andres Freund wrote: > > - Andres, hoping the buildfarm turns greener > > Oh well, that didn't work. Investigating. The fix for that was fairly trivial, and the buildfarm has cooled down. The issue was that on 32bit platforms the Datum returned by some functions (int2int4_sum in this case) isn't actually a separately allocated Datum, but rather just something embedded in a larger struct. That, combined with the following code: if (!peraggstate->resulttypeByVal && !*isnull && !MemoryContextContains(CurrentMemoryContext, DatumGetPointer(*result))) seems somewhat problematic to me. MemoryContextContains() can give false positives when used on memory that's not a distinctly allocated chunk, and if so, we violate memory lifetime rules. It's quite unlikely, given the required bit patterns, but nonetheless it's making me somewhat uncomfortable. Do others think this isn't an issue and we can just live with it? Regards, Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-28 20:18:35 -0800, Andres Freund wrote: > - Andres, hoping the buildfarm turns greener Oh well, that didn't work. Investigating.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-28 10:41:22 -0800, Andres Freund wrote: > Hi, > > On 2017-02-27 23:44:20 -0800, Andres Freund wrote: > > *preliminary* patch attached. This needs a good bit of polishing > > (primarily comment work, verifying that valgrind works), but I'm too > > tired now. > > > > I'm not quite sure how to deal with mmgr/README - it's written as kind > > of a historical document, and the "Mechanisms to Allow Multiple Types of > > Contexts" is already quite out of date. I think it'd be good to rip out > > all the historical references and just describe the current state, but > > I'm not really enthusiastic about tackling that :/ > > While still not enthusiastic, I took a stab at doing so. While still > not perfect, I do think this is an improvement. > > Is anybody uncomfortable going away from the current historical account > style? I've pushed these now. I'm not claiming that the README revision is perfect, but we can incrementally improve it further... - Andres, hoping the buildfarm turns greener
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-27 23:44:20 -0800, Andres Freund wrote: > *preliminary* patch attached. This needs a good bit of polishing > (primarily comment work, verifying that valgrind works), but I'm too > tired now. > > I'm not quite sure how to deal with mmgr/README - it's written as kind > of a historical document, and the "Mechanisms to Allow Multiple Types of > Contexts" is already quite out of date. I think it'd be good to rip out > all the historical references and just describe the current state, but > I'm not really enthusiastic about tackling that :/ While still not enthusiastic, I took a stab at doing so. While still not perfect, I do think this is an improvement. Is anybody uncomfortable going away from the current historical account style? - Andres >From 5eef2abe5e593b6a9b072b58bbecadbe15689a01 Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Tue, 28 Feb 2017 10:36:29 -0800 Subject: [PATCH 2/2] Overhaul mmgr/'s README. The README was written as a "historical account", and that style hasn't aged particularly well. Rephrase it to describe the current situation, instead of having various version specific comments. This also updates the description of how allocated chunks are associated with their corresponding context, the method of which has changed with the previous commit. Author: Andres Freund Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc729...@2ndquadrant.com --- src/backend/utils/mmgr/README | 292 +++--- 1 file changed, 132 insertions(+), 160 deletions(-) diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README index f97d7653de..b83b29c268 100644 --- a/src/backend/utils/mmgr/README +++ b/src/backend/utils/mmgr/README @@ -1,15 +1,7 @@ src/backend/utils/mmgr/README -Notes About Memory Allocation Redesign -== - -Up through version 7.0, Postgres had serious problems with memory leakage -during large queries that process a lot of pass-by-reference data. There -was no provision for recycling memory until end of query. 
This needed to be -fixed, even more so with the advent of TOAST which allows very large chunks -of data to be passed around in the system. This document describes the new -memory management system implemented in 7.1. - +Memory Context System Design Overview += Background -- @@ -38,10 +30,10 @@ to or get more memory from the same context the chunk was originally allocated in. At all times there is a "current" context denoted by the -CurrentMemoryContext global variable. The backend macro palloc() -implicitly allocates space in that context. The MemoryContextSwitchTo() -operation selects a new current context (and returns the previous context, -so that the caller can restore the previous context before exiting). +CurrentMemoryContext global variable. palloc() implicitly allocates space +in that context. The MemoryContextSwitchTo() operation selects a new current +context (and returns the previous context, so that the caller can restore the +previous context before exiting). The main advantage of memory contexts over plain use of malloc/free is that the entire contents of a memory context can be freed easily, without @@ -60,8 +52,10 @@ The behavior of palloc and friends is similar to the standard C library's malloc and friends, but there are some deliberate differences too. Here are some notes to clarify the behavior. -* If out of memory, palloc and repalloc exit via elog(ERROR). They never -return NULL, and it is not necessary or useful to test for such a result. +* If out of memory, palloc and repalloc exit via elog(ERROR). They +never return NULL, and it is not necessary or useful to test for such +a result. With palloc_extended() that behavior can be overridden +using the MCXT_ALLOC_NO_OOM flag. * palloc(0) is explicitly a valid operation. It does not return a NULL pointer, but a valid chunk of which no bytes may be used. However, the @@ -71,28 +65,18 @@ error. Similarly, repalloc allows realloc'ing to zero size. * pfree and repalloc do not accept a NULL pointer. 
This is intentional. -pfree/repalloc No Longer Depend On CurrentMemoryContext +The Current Memory Context +-- -Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk -whether it belongs to CurrentMemoryContext or not --- the chunk's owning -context will be invoked to handle the operation, regardless. This is a -change from the old requirement that CurrentMemoryContext must be set -to the same context the memory was allocated from before one can use -pfree() or repalloc(). - -There was some consideration of getting rid of CurrentMemoryContext entirely, -instead requiring the target memory context for allocation to be specified -explicitly. But we decided that would be too much notational overhead --- -we'd
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, *preliminary* patch attached. This needs a good bit of polishing (primarily comment work, verifying that valgrind works), but I'm too tired now. I'm not quite sure how to deal with mmgr/README - it's written as kind of a historical document, and the "Mechanisms to Allow Multiple Types of Contexts" is already quite out of date. I think it'd be good to rip out all the historical references and just describe the current state, but I'm not really enthusiastic about tackling that :/ On 2017-02-28 00:29:44 -0500, Tom Lane wrote: > Not sure either, but suggest we add a StaticAssert asserting there's no > padding; something along the lines of > offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == > MAXALIGN(sizeof(AllocSetChunkHeader)) Included. With Craig's help I've verified that the patch as attached works on a platform that's currently failing. Thanks! - Andres >From a59c3200dd127feb0cb09a055250ff6401aee1aa Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Mon, 27 Feb 2017 23:32:22 -0800 Subject: [PATCH] Remove StandardChunkHeader for Slab's benefit. --- src/backend/utils/mmgr/aset.c| 51 - src/backend/utils/mmgr/mcxt.c| 121 +++ src/backend/utils/mmgr/slab.c| 81 -- src/include/utils/memutils.h | 56 +++--- src/tools/pgindent/typedefs.list | 1 - 5 files changed, 92 insertions(+), 218 deletions(-) diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 8056c00ae4..31228ade19 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -97,20 +97,7 @@ */ #define ALLOC_BLOCKHDRSZ MAXALIGN(sizeof(AllocBlockData)) -#define ALLOC_CHUNKHDRSZ MAXALIGN(sizeof(AllocChunkData)) - -/* Portion of ALLOC_CHUNKHDRSZ examined outside aset.c. */ -#define ALLOC_CHUNK_PUBLIC \ - (offsetof(AllocChunkData, size) + sizeof(Size)) - -/* Portion of ALLOC_CHUNKHDRSZ excluding trailing padding. 
*/ -#ifdef MEMORY_CONTEXT_CHECKING -#define ALLOC_CHUNK_USED \ - (offsetof(AllocChunkData, requested_size) + sizeof(Size)) -#else -#define ALLOC_CHUNK_USED \ - (offsetof(AllocChunkData, size) + sizeof(Size)) -#endif +#define ALLOC_CHUNKHDRSZ sizeof(struct AllocChunkData) typedef struct AllocBlockData *AllocBlock; /* forward reference */ typedef struct AllocChunkData *AllocChunk; @@ -169,20 +156,27 @@ typedef struct AllocBlockData /* * AllocChunk * The prefix of each piece of memory in an AllocBlock - * - * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. */ typedef struct AllocChunkData { - /* aset is the owning aset if allocated, or the freelist link if free */ - void *aset; /* size is always the size of the usable space in the chunk */ Size size; + #ifdef MEMORY_CONTEXT_CHECKING /* when debugging memory usage, also store actual requested size */ /* this is zero in a free chunk */ Size requested_size; + +#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4 + Size padding; #endif + +#endif /* MEMORY_CONTEXT_CHECKING */ + + /* aset is the owning aset if allocated, or the freelist link if free */ + void *aset; + + /* there must not be any padding to reach a MAXALIGN boundary here! */ } AllocChunkData; /* @@ -334,6 +328,10 @@ AllocSetContextCreate(MemoryContext parent, { AllocSet set; + StaticAssertStmt(offsetof(AllocChunkData, aset) + sizeof(MemoryContext) == + MAXALIGN(sizeof(AllocChunkData)), + "padding calculation in AllocChunkData is wrong"); + /* * First, validate allocation parameters. (If we're going to throw an * error, we should do so before the context is created, not after.) We @@ -616,13 +614,14 @@ AllocSetAlloc(MemoryContext context, Size size) AllocAllocInfo(set, chunk); /* - * Chunk header public fields remain DEFINED. The requested - * allocation itself can be NOACCESS or UNDEFINED; our caller will - * soon make it UNDEFINED. Make extra space at the end of the chunk, - * if any, NOACCESS. + * Chunk's context fields remain DEFINED. 
The requested allocation + * itself can be NOACCESS or UNDEFINED; our caller will soon make it + * UNDEFINED. Make extra space at the end of the chunk, if any, + * NOACCESS. */ - VALGRIND_MAKE_MEM_NOACCESS((char *) chunk + ALLOC_CHUNK_PUBLIC, - chunk_size + ALLOC_CHUNKHDRSZ - ALLOC_CHUNK_PUBLIC); + VALGRIND_MAKE_MEM_NOACCESS((char *) chunk, + chunk_size + ALLOC_CHUNKHDRSZ); + VALGRIND_MAKE_MEM_DEFINED(chunk->aset, sizeof(MemoryContext)); return AllocChunkGetPointer(chunk); } @@ -709,7 +708,7 @@ AllocSetAlloc(MemoryContext context, Size size) chunk = (AllocChunk) (block->freeptr); /* Prepare to initialize the chunk header. */ -VALGRIND_MAKE_MEM_UNDEFINED(chunk, ALLOC_CHUNK_USED); +VALGRIND_MAKE_MEM_UNDEFINED(chunk, sizeof(AllocChunkData)); block->freeptr += (availchunk + ALLOC_CHUNKHDRSZ); availspace -= (availchunk +
Re: [HACKERS] PATCH: two slab-like memory allocators
Andres Freund writes: > Hm, that should be doable with something like > #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4 > which'd probably be better documentation than a macro that hides this > (arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better). Not sure either, but suggest we add a StaticAssert asserting there's no padding; something along the lines of offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) == MAXALIGN(sizeof(AllocSetChunkHeader)) regards, tom lane
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-27 22:57:24 -0500, Tom Lane wrote: > If the slab allocator would be happier with just a MemoryContext pointer > as chunk header, I think we should push in this direction rather than > invent some short-term hack. It would - it really doesn't need the size, because it's the same for the whole context, and thereby is a waste of space. Still wondering if we should band-aid this till that's done. > One could imagine redefining aset.c's chunk header along the lines of > > typedef struct AllocSetChunkHeader > { > Size size; /* size of data space allocated in chunk */ > #ifdef MEMORY_CONTEXT_CHECKING > Size requested_size; /* original request size */ > #if 32-bit-but-maxalign-is-8 > Size padding; /* needed to avoid padding below */ > #endif > #endif > MemoryContext context; /* owning context */ > /* there must not be any padding to reach a MAXALIGN boundary here! */ > } AllocSetChunkHeader; > > where we'd possibly need some help from configure to implement that inner > #if condition, but it seems doable enough. Hm, that should be doable with something like #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4 which'd probably be better documentation than a macro that hides this (arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better). Working on a patch now, will post but not push tonight. Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
Andres Freund writes: > Independently of this, we really should redefine StandardChunkHeader to > be only the MemoryContext. There's no need to have size/requested_size > part of StandardChunkHeader, given there's > MemoryContextMethods->get_chunk_space(). Yeah, perhaps. The trick would be to get things laid out so that the MemoryContext pointer is always immediately adjacent to the chunk data (no padding between). One could imagine redefining aset.c's chunk header along the lines of typedef struct AllocSetChunkHeader { Size size; /* size of data space allocated in chunk */ #ifdef MEMORY_CONTEXT_CHECKING Size requested_size; /* original request size */ #if 32-bit-but-maxalign-is-8 Size padding; /* needed to avoid padding below */ #endif #endif MemoryContext context; /* owning context */ /* there must not be any padding to reach a MAXALIGN boundary here! */ } AllocSetChunkHeader; where we'd possibly need some help from configure to implement that inner #if condition, but it seems doable enough. If the slab allocator would be happier with just a MemoryContext pointer as chunk header, I think we should push in this direction rather than invent some short-term hack. regards, tom lane
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-28 01:44:42 +0100, Tomas Vondra wrote: > On 02/27/2017 06:42 PM, Andres Freund wrote: > > Yea, I hadn't yet realized when writing that that termite actually, > > despite running on ppc64, compiles a 32bit postgres. Will thus > > duplicate StandardChunkHeader's contents into slab.c :( - I don't > > see an easy way around that... > > I've tried this - essentially copying the StandardChunkHeader's contents > into SlabChunk, but that does not seem to do the trick, sadly. Per pahole, > the structures then (at least on armv7l) look like this: > > struct SlabChunk { > void * block; /* 0 4 */ > MemoryContext context; /* 4 4 */ > Size size; /* 8 4 */ > Size requested_size; /* 12 4 */ > > /* size: 16, cachelines: 1, members: 4 */ > /* last cacheline: 16 bytes */ > }; > > struct StandardChunkHeader { > MemoryContext context; /* 0 4 */ > Size size; /* 4 4 */ > Size requested_size; /* 8 4 */ > > /* size: 12, cachelines: 1, members: 3 */ > /* last cacheline: 12 bytes */ > }; > So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so > pfree() grabs the block pointer but thinks it's the context :-( Hm. The only way I can think of to achieve the right thing here would be something like: typedef struct StandardChunkHeader { MemoryContext context; /* owning context */ Size size; /* size of data space allocated in chunk */ #ifdef MEMORY_CONTEXT_CHECKING /* when debugging memory usage, also store actual requested size */ Size requested_size; #endif union { char *data; /* ensure MAXALIGNed */ int64 alignof_int64; double alignof_double; } d; } StandardChunkHeader; typedef struct SlabChunk { void *block; StandardChunkHeader header; } SlabChunk; That's not overly pretty, but also not absolutely disgusting. Unifying the padding calculations between allocators would be a nice side-effect.
Note we at least previously had such union/double tricks in the tree, via http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8 It might be a good idea to have configure define maxaligned_type instead of including both int64/double (although it'll IIRC practically always be double that's maxaligned). Independently of this, we really should redefine StandardChunkHeader to be only the MemoryContext. There's no need to have size/requested_size part of StandardChunkHeader, given there's MemoryContextMethods->get_chunk_space(). > Not sure what to do about this - the only thing I can think about is > splitting SlabChunk into two separate structures, and align them > independently. > > The attached patch does that - it probably needs a bit more work on the > comments to make it commit-ready, but it fixes the test_decoding tests on > the rpi3 board I'm using for testing. That'd work as well, although at the very least I'd want to add a comment explaining the actual memory layout somewhere - this is a bit too finicky to expect to get right the next time round. Any preferences? Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 06:42 PM, Andres Freund wrote: On 2017-02-27 12:27:48 -0500, Tom Lane wrote: Andres Freund writes: The best theory I have so far is that slab.c's idea of StandardChunkHeader's size doesn't match what mcxt.c thinks it is (because slab.c simply embeds StandardChunkHeader, but mcxt uses MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't quite see how that'd cause the issue, since StandardChunkHeader's size should always be properly sized. Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader will contain 3 4-byte fields. However, there are some such machines on which MAXALIGN is 8. For example, looking at termite's configure output: checking size of void *... 4 checking size of size_t... 4 checking size of long... 4 checking alignment of short... 2 checking alignment of int... 4 checking alignment of long... 4 checking alignment of long long int... 8 checking alignment of double... 8 axolotl's output looks similar. I expect my old HPPA dinosaur will show the failure as well. Yea, I hadn't yet realized when writing that that termite actually, despite running on ppc64, compiles a 32bit postgres. Will thus duplicate StandardChunkHeader's contents into slab.c :( - I don't see an easy way around that... I've tried this - essentially copying the StandardChunkHeader's contents into SlabChunk, but that does not seem to do the trick, sadly.
Per pahole, the structures then (at least on armv7l) look like this: struct SlabChunk { void * block; /* 0 4 */ MemoryContext context; /* 4 4 */ Size size; /* 8 4 */ Size requested_size; /* 12 4 */ /* size: 16, cachelines: 1, members: 4 */ /* last cacheline: 16 bytes */ }; struct StandardChunkHeader { MemoryContext context; /* 0 4 */ Size size; /* 4 4 */ Size requested_size; /* 8 4 */ /* size: 12, cachelines: 1, members: 3 */ /* last cacheline: 12 bytes */ }; So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so pfree() grabs the block pointer but thinks it's the context :-( Not sure what to do about this - the only thing I can think about is splitting SlabChunk into two separate structures, and aligning them independently. The attached patch does that - it probably needs a bit more work on the comments to make it commit-ready, but it fixes the test_decoding tests on the rpi3 board I'm using for testing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services slab-fix.patch Description: binary/octet-stream
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 04:07 PM, Andres Freund wrote: On February 27, 2017 6:14:20 AM PST, Tomas Vondra wrote: On 02/27/2017 01:02 PM, Andres Freund wrote: Hi, On 2017-02-27 03:17:32 -0800, Andres Freund wrote: I'll work on getting slab committed first, and then review / edit / commit generation.c later. One first note there is that I'm wondering if generation.c is a too generic filename. And pushed slab and its usage. Will have a look at generation.c tomorrow. - Andres Gah. I don't want to annoy the person who just committed my patch, but can you give more time when asking for feedback? I mean, sending a modified patch on Friday midnight, and committing on Monday noon does not really give much time to look at it. Hm. The changes IMO weren't controversial (or surprising - most of them I had announced previously); I announced, when posting the review, that I'd push the patch later that weekend. If I hadn't been tired after doing the review/editing I'd have just pushed right then and there. It's hard to find time and attention, so not introducing a week of feedback time is quite worthwhile. I listed the changes I made primarily for posterity's sake. Most if not all committers make editorializing changes around commit, so that's not just me. If you specifically want I can try to give you more time to look at an edited patch, but that'll mean things move slower. I won't promise not to make minor changes just before commit either way, I always do another round of review just before push. I also agree the changes are not particularly controversial, but then why ask for comments at all? I'm OK with a committer making final tweaks and pushing it without asking, but if you ask for comments, let's give people time to actually respond. I agree introducing weeks of delays would be silly, but I'm not asking for that. I'm perfectly fine with two days for feedback, as long as it's not a weekend + half of Monday.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 06:40 PM, Andres Freund wrote: On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote: On 27/02/17 18:00, Andres Freund wrote: FWIW I think the ppc64 machines are failing because of an unrelated issue (changes to integer timestamps). We should probably look at 32bit machines first. Don't think so - termite is ppc64 afaics, and the failure doesn't look integer timestamp related (assert failure is clearly about this, and set of changed commits *only* include slab related commits). termite is ppc64 but with 4 byte pointer size according to configure so it might be related to that perhaps? Uh, ok. I checked the --configure options, but not the actual configure output (blame -ENOCOFFEE and jetlag). The output makes it fairly likely that my StandardChunkHeader theory is valid, so I'll work on a patch to clean that up. Thanks. I set up a rpi3 machine (armv7l) that fails with the same issue, so if you need to test the patch, let me know. While building, I've also noticed a bunch of warnings about string formatting, attached is a patch that fixes those. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c index a5e140e..c673bc3 100644 --- a/src/backend/utils/mmgr/slab.c +++ b/src/backend/utils/mmgr/slab.c @@ -207,7 +207,7 @@ SlabContextCreate(MemoryContext parent, /* Make sure the block can store at least one chunk.
*/ if (blockSize - sizeof(SlabBlock) < fullChunkSize) - elog(ERROR, "block size %ld for slab is too small for %ld chunks", + elog(ERROR, "block size %zu for slab is too small for %zu chunks", blockSize, chunkSize); /* Compute maximum number of chunks per block */ @@ -333,7 +333,7 @@ SlabAlloc(MemoryContext context, Size size) /* make sure we only allow correct request size */ if (size != slab->chunkSize) - elog(ERROR, "unexpected alloc chunk size %zu (expected %zu)", + elog(ERROR, "unexpected alloc chunk size %zu (expected %zu)", size, slab->chunkSize); /*
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-27 12:27:48 -0500, Tom Lane wrote: > Andres Freund writes: > > The best theory I have so far is that slab.c's idea of > > StandardChunkHeader's size doesn't match what mcxt.c thinks it is > > (because slab.c simply embeds StandardChunkHeader, but mcxt uses > > MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't > > quite see how that'd cause the issue, since StandardChunkHeader's size > > should always be properly sized. > > Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader > will contain 3 4-byte fields. However, there are some such machines on > which MAXALIGN is 8. For example, looking at termite's configure > output: > > checking size of void *... 4 > checking size of size_t... 4 > checking size of long... 4 > checking alignment of short... 2 > checking alignment of int... 4 > checking alignment of long... 4 > checking alignment of long long int... 8 > checking alignment of double... 8 > > axolotl's output looks similar. I expect my old HPPA dinosaur > will show the failure as well. Yea, I hadn't yet realized when writing that that termite actually, despite running on ppc64, compiles a 32bit postgres. Will thus duplicate StandardChunkHeader's contents into slab.c :( - I don't see an easy way around that...
Re: [HACKERS] PATCH: two slab-like memory allocators
Andres Freund writes: > The best theory I have so far is that slab.c's idea of > StandardChunkHeader's size doesn't match what mcxt.c thinks it is > (because slab.c simply embeds StandardChunkHeader, but mcxt uses > MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't > quite see how that'd cause the issue, since StandardChunkHeader's size > should always be properly sized. Uh, wrong. On a 32-bit machine with debug enabled, StandardChunkHeader will contain 3 4-byte fields. However, there are some such machines on which MAXALIGN is 8. For example, looking at termite's configure output: checking size of void *... 4 checking size of size_t... 4 checking size of long... 4 checking alignment of short... 2 checking alignment of int... 4 checking alignment of long... 4 checking alignment of long long int... 8 checking alignment of double... 8 axolotl's output looks similar. I expect my old HPPA dinosaur will show the failure as well. regards, tom lane
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote: > On 27/02/17 18:00, Andres Freund wrote: > > > >> FWIW I think the ppc64 machines are failing because of an unrelated issue > >> (changes to integer timestamps). We should probably look at 32bit machines > >> first. > > > > Don't think so - termite is ppc64 afaics, and the failure doesn't look > > integer timestamp related (assert failure is clearly about this, and set > > of changed commits *only* include slab related commits). > > > > termite is ppc64 but with 4 byte pointer size according to configure so > it might be related to that perhaps? Uh, ok. I checked the --configure options, but not the actual configure output (blame -ENOCOFFEE and jetlag). The output makes it fairly likely that my StandardChunkHeader theory is valid, so I'll work on a patch to clean that up. Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
Andres Freund writes: >> FWIW I think the ppc64 machines are failing because of an unrelated issue >> (changes to integer timestamps). We should probably look at 32bit machines >> first. > Don't think so - termite is ppc64 afaics, and the failure doesn't look > integer timestamp related (assert failure is clearly about this, and set > of changed commits *only* include slab related commits). There are a couple of animals that have --disable-integer-datetimes, but those are failing at the configure stage, and were doing so before today. regards, tom lane
Re: [HACKERS] PATCH: two slab-like memory allocators
On 27/02/17 18:00, Andres Freund wrote: > >> FWIW I think the ppc64 machines are failing because of unrelated issue >> (changes to integer timestamps). We should probably look at 32bit machines >> first. > > Don't think so - termite is ppc64 afaics, and the failure doesn't look > integer timestamp related (assert failure is clearly about this, and set > of changed commits *only* include slab related commits). > termite is ppc64 but with 4 byte pointer size according to configure so it might be related to that perhaps? -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-27 17:56:08 +0100, Tomas Vondra wrote: > No, I don't, but I'll ping Craig. I might ping him, but it's ~4AM in > Australia, though, so it'll take time. Did the same... ;) > FWIW I think the ppc64 machines are failing because of unrelated issue > (changes to integer timestamps). We should probably look at 32bit machines > first. Don't think so - termite is ppc64 afaics, and the failure doesn't look integer timestamp related (assert failure is clearly about this, and set of changed commits *only* include slab related commits). Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 05:48 PM, Andres Freund wrote: On 2017-02-27 07:55:32 -0800, Andres Freund wrote: On 2017-02-27 10:32:25 -0500, Tom Lane wrote: Andres Freund writes: And pushed slab and its usage. Will have a look at generation.c tomorrow. Perhaps first you need to find out why so much of the buildfarm is unhappy. Will do, after a morning coffee. Hm. Not entirely clear on what's going on yet. I've run the tests on hydra (community ppc 64 machine), which is pretty similar to termite which failed [1] with: TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010) The best theory I have so far is that slab.c's idea of StandardChunkHeader's size doesn't match what mcxt.c thinks it is (because slab.c simply embeds StandardChunkHeader, but mcxt uses MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't quite see how that'd cause the issue, since StandardChunkHeader's size should always be properly sized. Tomas, do you have access to termite (which appears to be run by Craig, under company mail)? No, I don't, but I'll ping Craig. I might ping him, but it's ~4AM in Australia, though, so it'll take time. FWIW I think the ppc64 machines are failing because of an unrelated issue (changes to integer timestamps). We should probably look at 32bit machines first. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-27 15:11:40 +0100, Tomas Vondra wrote: > > I've not reviewed the generational allocator yet, but during these > > measurements I get: > > postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', > > NULL, NULL); > > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in > > block 0x55d011ef10f0 exceeds 7234 allocated > > LOCATION: GenerationCheck, generation.c:693 > > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in > > block 0x55d01023eba0 exceeds 65532 allocated > > LOCATION: GenerationCheck, generation.c:693 > > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in > > block 0x55d00d7fb870 exceeds 65532 allocated > > LOCATION: GenerationCheck, generation.c:693 > > WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in > > block 0x55d00cde17b0 exceeds 65531 allocated > > LOCATION: GenerationCheck, generation.c:693 > > > > that seems to occur when there are currently in-progress transactions when > > finishing decoding: > ... > > could it be that the test's condition is inverted? > > Yeah, that seems like the culprit - the condition seems wrong. I wonder why > I haven't seen it during my tests, though ... I suspect it's because your tests only triggered a memory context reset when it was empty... But I ran decoding while a concurrent write transaction was ongoing... > > I'll work on getting slab committed first, and then review / edit / > > commit generation.c later. One first note there is that I'm wondering > > if generation.c is too generic a filename. > Naming things is hard. Indeed. I was thinking of genalloc, but that might be understood as general, rather than generational... Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-27 07:55:32 -0800, Andres Freund wrote: > On 2017-02-27 10:32:25 -0500, Tom Lane wrote: > > Andres Freund writes: > > > And pushed slab and its usage. Will have a look at generation.c > > > tomorrow. > > > > Perhaps first you need to find out why so much of the buildfarm > > is unhappy. > > Will do, after a morning coffee. Hm. Not entirely clear on what's going on yet. I've run the tests on hydra (community ppc64 machine), which is pretty similar to termite which failed [1] with: TRAP: BadArgument("!(((context) != ((void *) 0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010) The best theory I have so far is that slab.c's idea of StandardChunkHeader's size doesn't match what mcxt.c thinks it is (because slab.c simply embeds StandardChunkHeader, but mcxt uses MAXALIGN(sizeof(StandardChunkHeader))). That's not good, but I don't quite see how that'd cause the issue, since StandardChunkHeader's size should always be properly sized. Tomas, do you have access to termite (which appears to be run by Craig, under company mail)? If not, I can push a "blind" fix, but I'd rather have more information. Greetings, Andres Freund [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2017-02-27%2014%3A00%3A06
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-27 10:32:25 -0500, Tom Lane wrote: > Andres Freund writes: > > And pushed slab and its usage. Will have a look at generation.c > > tomorrow. > > Perhaps first you need to find out why so much of the buildfarm > is unhappy. Will do, after a morning coffee. - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
Andres Freund writes: > And pushed slab and its usage. Will have a look at generation.c > tomorrow. Perhaps first you need to find out why so much of the buildfarm is unhappy. regards, tom lane
Re: [HACKERS] PATCH: two slab-like memory allocators
On February 27, 2017 6:14:20 AM PST, Tomas Vondra wrote: >On 02/27/2017 01:02 PM, Andres Freund wrote: >> Hi, >> >> On 2017-02-27 03:17:32 -0800, Andres Freund wrote: >>> I'll work on getting slab committed first, and then review / edit / >>> commit generation.c later. One first note there is that I'm wondering >>> if generation.c is a too generic filename. >> >> And pushed slab and its usage. Will have a look at generation.c >> tomorrow. >> >> - Andres >Gah. I don't want to annoy the person who just committed my patch, but can >you give more time when asking for feedback? I mean, sending a modified >patch on Friday midnight, and committing on Monday noon does not really >give much time to look at it. Hm. The changes IMO weren't controversial (or surprising - most of them I had announced previously); I announced, when posting the review, that I'd push the patch later that weekend. If I hadn't been tired after doing the review/editing I'd have just pushed right then and there. It's hard to find time and attention, so not introducing a week of feedback time is quite worthwhile. I listed the changes I made primarily for posterity's sake. Most if not all committers make editorializing changes around commit, so that's not just me. If you specifically want I can try to give you more time to look at an edited patch, but that'll mean things move slower. I won't promise not to make minor changes just before commit either way; I always do another round of review just before push. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 01:02 PM, Andres Freund wrote: Hi, On 2017-02-27 03:17:32 -0800, Andres Freund wrote: I'll work on getting slab committed first, and then review / edit / commit generation.c later. One first note there is that I'm wondering if generation.c is a too generic filename. And pushed slab and its usage. Will have a look at generation.c tomorrow. - Andres Gah. I don't want to annoy the person who just committed my patch, but can you give more time when asking for feedback? I mean, sending a modified patch on Friday midnight and committing on Monday noon does not really give much time to look at it. The changes seem fine to me, thanks for spending time on this. Thanks -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/27/2017 12:17 PM, Andres Freund wrote: Hi, On 2017-02-24 14:10:38 -0800, Andres Freund wrote: I've not yet looked a lot at the next type of context - I want to get this much committed first... I plan to give this another pass sometime this weekend and then push soon. Before committing I wanted to make sure that http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us isn't a sufficient fix. With the test of N=100 from this thread I measured both runtime and memory usage (note that's peak virtual memory which includes 2GB of shared_buffers and such), in assert enabled builds. master: doesn't finish reasonably master+doubly linked list fix: 9390.805 ms VmPeak: 10,969,424 kb master+this thread: 6500.293 ms VmPeak: 2,969,528 kB So the doubly-linked-list fix is great (and much more backpatchable), but the patches in this thread are both better runtime *and* peak memory usage wise. So that seems like a clear call. Nice, thanks for doing the test. I've not yet reviewed the generational allocator yet, but during these measurements I get: postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', NULL, NULL); WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d011ef10f0 exceeds 7234 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d01023eba0 exceeds 65532 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00d7fb870 exceeds 65532 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00cde17b0 exceeds 65531 allocated LOCATION: GenerationCheck, generation.c:693 that seems to occur when there's currently in-progress transactions when finishing decoding: ... could it be that the test's condition is inverted? 
Yeah, that seems like the culprit - the condition seems wrong. I wonder why I haven't seen it during my tests, though ... I'll work on getting slab committed first, and then review / edit / commit generation.c later. One first note there is that I'm wondering if generation.c is a too generic filename. Naming things is hard. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-27 03:17:32 -0800, Andres Freund wrote: > I'll work on getting slab committed first, and then review / edit / > commit generation.c later. One first note there is that I'm wondering > if generation.c is a too generic filename. And pushed slab and its usage. Will have a look at generation.c tomorrow. - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-24 14:10:38 -0800, Andres Freund wrote: > I've not yet looked a lot at the next type of context - I want to get > this much committed first... > > I plan to give this another pass sometime this weekend and then push > soon. Before committing I wanted to make sure that http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us isn't a sufficient fix. With the test of N=100 from this thread I measured both runtime and memory usage (note that's peak virtual memory which includes 2GB of shared_buffers and such), in assert enabled builds. master: doesn't finish reasonably master+doubly linked list fix: 9390.805 ms VmPeak: 10,969,424 kb master+this thread: 6500.293 ms VmPeak: 2,969,528 kB So the doubly-linked-list fix is great (and much more backpatchable), but the patches in this thread are both better runtime *and* peak memory usage wise. So that seems like a clear call. I've not yet reviewed the generational allocator yet, but during these measurements I get: postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', NULL, NULL); WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d011ef10f0 exceeds 7234 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d01023eba0 exceeds 65532 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00d7fb870 exceeds 65532 allocated LOCATION: GenerationCheck, generation.c:693 WARNING: 01000: problem in Generation Tuples: number of free chunks 0 in block 0x55d00cde17b0 exceeds 65531 allocated LOCATION: GenerationCheck, generation.c:693 that seems to occur when there's currently in-progress transactions when finishing decoding: #0 GenerationCheck (context=0x5629129407c8) at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:692 #1 0x5629105d92db in GenerationReset 
(context=0x5629129407c8) at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:255 #2 0x5629105d93c6 in GenerationDelete (context=0x5629129407c8) at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:287 #3 0x5629105e1e12 in MemoryContextDelete (context=0x5629129407c8) at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:225 #4 0x5629105e1ee3 in MemoryContextDeleteChildren (context=0x562912940008) at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:245 #5 0x5629105e1de0 in MemoryContextDelete (context=0x562912940008) at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:208 #6 0x5629103d5451 in ReorderBufferFree (rb=0x562912906320) at /home/andres/src/postgresql/src/backend/replication/logical/reorderbuffer.c:278 #7 0x5629103cea4f in FreeDecodingContext (ctx=0x562912904310) at /home/andres/src/postgresql/src/backend/replication/logical/logical.c:462 #8 0x5629103d03f0 in pg_logical_slot_get_changes_guts (fcinfo=0x7fffc2042e50, confirm=0 '\000', could it be that the test's condition is inverted? I'll work on getting slab committed first, and then review / edit / commit generation.c later. One first note there is that I'm wondering if generation.c is a too generic filename. - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-21 01:43:46 +0100, Tomas Vondra wrote: > Attached is v9 of this patch series. This addresses most of the points > raised in the review, namely: Cool, thanks. > 3) get rid of the block-level bitmap tracking free chunks > > Instead of the bitmap, I've used a simple singly-linked list, using int32 > chunk indexes. Perhaps it could use the slist instead, but I'm not quite > sure MAXALIGN is guaranteed to be greater than pointer. I'm pretty sure it's guaranteed to be >= sizeof(void*). But this seems ok, so ... Attached are changes I made on the path to committing the patch. These look large enough that I wanted to give you a chance to comment: - The AllocFreeInfo changes aren't correct for 32bit systems with 64bit longs (like linux, unlike windows) - using %zu is better (z is code for size_t sized) - Split off the reorderbuffer changes - Removed SlabBlock/SlabChunk / renamed SlabBlockData - +#ifdef MEMORY_CONTEXT_CHECKING +#define SLAB_CHUNK_USED \ + (offsetof(SlabChunkData, requested_size) + sizeof(Size)) +#else doesn't work anymore, because requested_size isn't a top-level field. I first redefined it as (without surrounding ifdef) #define SLAB_CHUNK_USED \ (offsetof(SlabChunkData, header) + sizeof(StandardChunkHeader)) but I'm not really sure there's a whole lot of point in the define rather than just using sizeof() on the whole thing - there's no trailing padding? - SLAB_CHUNK_PUBLIC and SLAB_BLOCKHDRSZ are unused? - renamed 'set' variables (and comments) to slab. - used castNode(SlabContext, context) instead of manual casts - I rephrased + * + * We cache indexes of the first empty chunk on each block (firstFreeChunk), + * and freelist index for blocks with least free chunks (minFreeChunks), so + * that we don't have to search the freelist and block on every SlabAlloc() + * call, which is quite expensive. 
so it's not referencing firstFreeChunk anymore, since that seems to make less sense now that firstFreeChunk is essentially just the head of the list of free chunks. - added typedefs.list entries and pgindented slab.c - "mark the chunk as unused (zero the bit)" referenced non-existent bitmap afaics. - valgrind was triggering on block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk); because that was previously marked as NOACCESS (via wipe_mem). Explicitly marking as DEFINED solves that. - removed * XXX Perhaps we should not be gentle at all and simply fails in all cases, * to eliminate the (mostly pointless) uncertainty. - you'd included MemoryContext tup_context; in 0002, but it's not really useful yet (and the comments above it in reorderbuffer.h don't apply) - SlabFreeInfo/SlabAllocInfo didn't actually compile w/ HAVE_ALLOCINFO defined (pre StandardChunkHeader def) - some minor stuff. I'm kinda inclined to drop SlabFreeInfo/SlabAllocInfo. I've not yet looked a lot at the next type of context - I want to get this much committed first... I plan to give this another pass sometime this weekend and then push soon. - Andres >From cec3f8372137d2392ff7ac0ab1b2db11fc96e8b3 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 23 Feb 2017 22:35:44 -0800 Subject: [PATCH 1/3] Make useful infrastructure from aset.c generally available. An upcoming patch introduces a new type of memory context. To avoid duplicating debugging infrastructure with aset.c move useful pieces to memdebug.[ch]. While touching aset.c, fix printf format AllocFree* debug macros. 
Author: Tomas Vondra Reviewed-By: Andres Freund Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc...@2ndquadrant.com --- src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/aset.c | 115 +- src/backend/utils/mmgr/memdebug.c | 93 ++ src/include/utils/memdebug.h | 48 4 files changed, 144 insertions(+), 114 deletions(-) create mode 100644 src/backend/utils/mmgr/memdebug.c diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile index 1842bae386..fc5f793b7f 100644 --- a/src/backend/utils/mmgr/Makefile +++ b/src/backend/utils/mmgr/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr top_builddir = ../../../.. include $(top_builddir)/src/Makefile.global -OBJS = aset.o dsa.o freepage.o mcxt.o portalmem.o +OBJS = aset.o dsa.o freepage.o mcxt.o memdebug.o portalmem.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 4dfc3ec260..8056c00ae4 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -41,46 +41,6 @@ * chunks as chunks. Anything "large" is passed off to malloc(). Change * the number of freelists to change the small/large boundary. * - * - * About CLOBBER_FREED_MEMORY: - * - * If this symbol is defined, all freed memory is overwritten with 0x7F's. - *
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, Attached is v9 of this patch series. This addresses most of the points raised in the review, namely: 1) change most 'debug' stuff to 'static inline' in memdebug.h 2) fixed and reworded a bunch of comments 3) get rid of the block-level bitmap tracking free chunks Instead of the bitmap, I've used a simple singly-linked list, using int32 chunk indexes. Perhaps it could use the slist instead, but I'm not quite sure MAXALIGN is guaranteed to be greater than pointer. In any case, this seems to be working reasonably well - it saves a bit of code (but also made some code slightly more complex). Also, it seems to be a tad faster than v8 - after repeating the same benchmark as before, I get these numbers:

          master   slab-v8   slab-v9
   ---------------------------------
    1         50        28        25
    5      17500       180       160
   10         15       380       330
   20          ?       750       670

Although the results are quite noisy. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >From 7c70a7bef4029dd7f10c7dc9ff0dd92a7bd2f966 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Mon, 20 Feb 2017 20:16:16 +0100 Subject: [PATCH 1/3] move common bits to memdebug --- src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/aset.c | 115 +- src/backend/utils/mmgr/memdebug.c | 93 ++ src/include/utils/memdebug.h | 48 4 files changed, 144 insertions(+), 114 deletions(-) create mode 100644 src/backend/utils/mmgr/memdebug.c diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile index 1842bae..fc5f793 100644 --- a/src/backend/utils/mmgr/Makefile +++ b/src/backend/utils/mmgr/Makefile @@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr top_builddir = ../../../.. 
include $(top_builddir)/src/Makefile.global -OBJS = aset.o dsa.o freepage.o mcxt.o portalmem.o +OBJS = aset.o dsa.o freepage.o mcxt.o memdebug.o portalmem.o include $(top_srcdir)/src/backend/common.mk diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c index 4dfc3ec..33b4d01 100644 --- a/src/backend/utils/mmgr/aset.c +++ b/src/backend/utils/mmgr/aset.c @@ -41,46 +41,6 @@ * chunks as chunks. Anything "large" is passed off to malloc(). Change * the number of freelists to change the small/large boundary. * - * - * About CLOBBER_FREED_MEMORY: - * - * If this symbol is defined, all freed memory is overwritten with 0x7F's. - * This is useful for catching places that reference already-freed memory. - * - * About MEMORY_CONTEXT_CHECKING: - * - * Since we usually round request sizes up to the next power of 2, there - * is often some unused space immediately after a requested data area. - * Thus, if someone makes the common error of writing past what they've - * requested, the problem is likely to go unnoticed ... until the day when - * there *isn't* any wasted space, perhaps because of different memory - * alignment on a new platform, or some other effect. To catch this sort - * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond - * the requested space whenever the request is less than the actual chunk - * size, and verifies that the byte is undamaged when the chunk is freed. - * - * - * About USE_VALGRIND and Valgrind client requests: - * - * Valgrind provides "client request" macros that exchange information with - * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out - * currently-used macros. - * - * When running under Valgrind, we want a NOACCESS memory region both before - * and after the allocation. The chunk header is tempting as the preceding - * region, but mcxt.c expects to able to examine the standard chunk header - * fields. 
Therefore, we use, when available, the requested_size field and - * any subsequent padding. requested_size is made NOACCESS before returning - * a chunk pointer to a caller. However, to reduce client request traffic, - * it is kept DEFINED in chunks on the free list. - * - * The rounded-up capacity of the chunk usually acts as a post-allocation - * NOACCESS region. If the request consumes precisely the entire chunk, - * there is no such region; another chunk header may immediately follow. In - * that case, Valgrind will not detect access beyond the end of the chunk. - * - * See also the cooperating Valgrind client requests in mcxt.c. - * *- */ @@ -296,10 +256,10 @@ static const unsigned char LogTable256[256] = */ #ifdef HAVE_ALLOCINFO #define AllocFreeInfo(_cxt, _chunk) \ - fprintf(stderr, "AllocFree: %s: %p, %d\n", \ + fprintf(stderr, "AllocFree: %s: %p, %lu\n", \ (_cxt)->header.name, (_chunk), (_chunk)->size) #define
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/14/2017 03:22 AM, Andres Freund wrote: Hi, On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote: On 02/11/2017 02:33 AM, Andres Freund wrote: I have a hard time believing this the cache efficiency of linked lists (which may or may not be real in this case) out-weights this, but if you want to try, be my guest. I'm not following - why would there be overhead in anything for allocations bigger than 4 (or maybe 8) bytes? You can store the list (via chunk ids, not pointers) inside the chunks itself, where data otherwise would be. And I don't see why you'd need a doubly linked list, as the only operations that are needed are to push to the front of the list, and to pop from the front of the list - and both operations are simple to do with a singly linked list? Oh! I have not considered storing the chunk indexes (for linked lists) in the chunk itself, which obviously eliminates the overhead concerns, and you're right a singly-linked list should be enough. There will be some minimum-chunk-size requirement, depending on the block size/chunk size. I wonder whether it makes sense to try to be smart and make it dynamic, so that we only require 1B or 2B for cases when only that many chunks fit into a block, or just say that it's 4B and be done with it. I doubt it's worth it - it seems likely that the added branch is more noticeable overall than the possible savings of 3 bytes. Also, won't the space be lost due to alignment *anyway*? + /* chunk, including SLAB header (both addresses nicely aligned) */ + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize)); In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and use a plain slist - no point in being more careful than that. Hmm, I think you're right. I mean 2^32 chunks ought to be enough for anyone, right? 
Yea, that seems enough; but given the alignment thing pointed out above, I think we can just use plain pointers - and that definitely should be enough :P People in year 2078: Why the hell did they only use 32 bits? Wasn't it obvious we'll have tiny computers with 32EB of RAM? ;-) I'm still not buying the cache efficiency argument, though. One of the reasons is that the implementation prefers blocks with fewer free chunks when handling palloc(), so pfree() is making the block less likely to be chosen by the next palloc(). That'll possibly de-optimize L1, but for L2 usage the higher density seems like it'll be a win. All this memory is only accessed by a single backend, so packing as densely as possible is good. If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you end up with an array of 1024 doubly linked lists, which'll take up 64kb on its own. And there a certainly scenarios where even bigger block sizes could make sense. That's both memory overhead, and runtime overhead, because at reset-time we'll have to check the whole array (which'll presumably largely be empty lists). Resetting is a pretty common path... True, but it's not entirely clear if resetting is common for the paths where we use those new allocators. That's fair enough. There's still the memory overhead, but I guess we can also live with that. Right. My ambition was not to develop another general-purpose memory context that would work perfectly for everything, but something that works (better than the current code) for places like reorderbuffer. Also, if we accept that it might be a problem, what other solution you propose? I don't think just merging everything into a single list is a good idea, for the reasons I explained before (it might make the resets somewhat less expensive, but it'll make pfree() more expensive). 
>> Now that I think about it, a binary heap, as suggested elsewhere, isn't entirely trivial to use for this - it's more or less trivial to "fix" the heap after changing an element's value, but it's harder to find that element first. But a two-level list approach seems like it could work quite well - basically a simplified skip-list. A top-level list that has the property that all the elements have a distinct #free, and then, hanging off those, sub-lists for all the other blocks with the same number of chunks. I think that'd be a better implementation, but I can understand if you don't immediately want to go there. I don't want to go there. I'm not all that interested in reorderbuffer, to be honest, and this started more as a "Hold my beer!" hack, after a midnight discussion with Petr, than a seriously meant patch. I've already spent like 100x more time on it than I expected. What might work is replacing the array with a list, though. So we'd have a list of lists, which would eliminate the array overhead. That seems likely to be significantly worse, because a) iteration is more expensive b) accessing the relevant list to move between two different "freecount" lists would be O(n). Oh, right, I hadn't realized we won't know the current head of the list,
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote: > On 02/11/2017 02:33 AM, Andres Freund wrote: > > > I have a hard time believing this the cache efficiency of linked lists > > > (which may or may not be real in this case) out-weights this, but if you > > > want to try, be my guest. > > > > I'm not following - why would there be overhead in anything for > > allocations bigger than 4 (or maybe 8) bytes? You can store the list > > (via chunk ids, not pointers) inside the chunks itself, where data > > otherwise would be. And I don't see why you'd need a doubly linked > > list, as the only operations that are needed are to push to the front of > > the list, and to pop from the front of the list - and both operations > > are simple to do with a singly linked list? > > > > Oh! I have not considered storing the chunk indexes (for linked lists) in > the chunk itself, which obviously eliminates the overhead concerns, and > you're right a singly-linked list should be enough. > > There will be some minimum-chunk-size requirement, depending on the block > size/chunk size. I wonder whether it makes sense to try to be smart and make > it dynamic, so that we only require 1B or 2B for cases when only that many > chunks fit into a block, or just say that it's 4B and be done with it. I doubt it's worth it - it seems likely that the added branch is more noticeable overall than the possible savings of 3 bytes. Also, won't the space be lost due to alignment *anyway*? + /* chunk, including SLAB header (both addresses nicely aligned) */ + fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize)); In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and use a plain slist - no point in being more careful than that. > I mean 2^32 chunks ought to be enough for anyone, right? 
Yea, that seems enough; but given the alignment thing pointed out above, I think we can just use plain pointers - and that definitely should be enough :P > I'm still not buying the cache efficiency argument, though. One of the > reasons is that the implementation prefers blocks with fewer free chunks > when handling palloc(), so pfree() is making the block less likely to be > chosen by the next palloc(). That'll possibly de-optimize L1, but for L2 usage the higher density seems like it'll be a win. All this memory is only accessed by a single backend, so packing as densely as possible is good. > > If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you > > end up with an array of 1024 doubly linked lists, which'll take up 64kb > > on its own. And there a certainly scenarios where even bigger block > > sizes could make sense. That's both memory overhead, and runtime > > overhead, because at reset-time we'll have to check the whole array > > (which'll presumably largely be empty lists). Resetting is a pretty > > common path... > > > > True, but it's not entirely clear if resetting is common for the paths where > we use those new allocators. That's fair enough. There's still the memory overhead, but I guess we can also live with that. > Also, if we accept that it might be a problem, what other solution you > propose? I don't think just merging everything into a single list is a good > idea, for the reasons I explained before (it might make the resets somewhat > less expensive, but it'll make pfree() more expensive). Now that I think about it, a binary heap, as suggested elsewhere, isn't entirely trivial to use for this - it's more or less trivial to "fix" the heap after changing an element's value, but it's harder to find that element first. But a two-level list approach seems like it could work quite well - basically a simplified skip-list. 
A top-level list that has the property that all the elements have a distinct #free, and then, hanging off those, sub-lists for all the other blocks with the same number of chunks. I think that'd be a better implementation, but I can understand if you don't immediately want to go there. > What might work is replacing the array with a list, though. So we'd have a > list of lists, which would eliminate the array overhead. That seems likely to be significantly worse, because a) iteration is more expensive b) accessing the relevant list to move between two different "freecount" lists would be O(n). - Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/11/2017 02:33 AM, Andres Freund wrote: On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote: On 02/09/2017 10:37 PM, Andres Freund wrote: Hi, On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote: src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/aset.c | 115 + src/backend/utils/mmgr/memdebug.c | 131 ++ src/include/utils/memdebug.h | 22 +++ 4 files changed, 156 insertions(+), 114 deletions(-) create mode 100644 src/backend/utils/mmgr/memdebug.c I'm a bit loathe to move these to a .c file - won't this likely make these debugging tools even slower? Seems better to put some of them them in a header as static inlines (not randomize, but the rest). Do you have any numbers to support that? AFAICS compilers got really good in inlining stuff on their own. Unless you use LTO, they can't inline across translation units. And using LTO is slow enough for linking that it's not that much fun to use, as it makes compile-edit-compile cycles essentially take as long as a full rebuild. From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001 From: Tomas VondraDate: Wed, 30 Nov 2016 15:36:23 +0100 Subject: [PATCH 2/3] slab allocator --- src/backend/replication/logical/reorderbuffer.c | 82 +-- src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/slab.c | 803 src/include/nodes/memnodes.h| 2 +- src/include/nodes/nodes.h | 1 + src/include/replication/reorderbuffer.h | 15 +- src/include/utils/memutils.h| 9 + I'd like to see the reorderbuffer changes split into a separate commit from the slab allocator introduction. I rather dislike patches that only add a bunch of code, without actually using it anywhere. But if needed, this is trivial to do at commit time - just don't commit the reorderbuffer bits. Meh. + * Each block includes a simple bitmap tracking which chunks are used/free. 
+ * This makes it trivial to check if all chunks on the block are free, and + * eventually free the whole block (which is almost impossible with a global + * freelist of chunks, storing chunks from all blocks). Why is checking a potentially somewhat long-ish bitmap better than a simple counter, or a "linked list" of "next free chunk-number" or such (where free chunks simply contain the id of the subsequent chunk)? Using a list instead of a bitmap would also make it possible to get 'lifo' behaviour, which is good for cache efficiency. A simple chunk-number based singly linked list would only imply a minimum allocation size of 4 - that seems perfectly reasonable? A block-level counter would be enough to decide if all chunks on the block are free, but it's not sufficient to identify which chunks are free / available for reuse. The bitmap only has a single bit per chunk, so I find "potentially long-ish" a bit misleading. Any linked list implementation will require much more per-chunk overhead - as the chunks are fixed-length, it's possible to use a chunk index (instead of 64-bit pointers) to save some space. But with large blocks / small chunks that's still at least 2 or 4 bytes per index, and you'll need two (to implement a doubly-linked list, to make add/remove efficient). For example assume an 8kB block and 64B chunks, i.e. 128 chunks. With a bitmap that's 16B to track all free space on the block. A doubly linked list would require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B. I have a hard time believing that the cache efficiency of linked lists (which may or may not be real in this case) outweighs this, but if you want to try, be my guest. I'm not following - why would there be overhead in anything for allocations bigger than 4 (or maybe 8) bytes? You can store the list (via chunk ids, not pointers) inside the chunks themselves, where data otherwise would be. 
And I don't see why you'd need a doubly linked list, as the only operations that are needed are to push to the front of the list, and to pop from the front of the list - and both operations are simple to do with a singly linked list? Oh! I have not considered storing the chunk indexes (for linked lists) in the chunk itself, which obviously eliminates the overhead concerns, and you're right a singly-linked list should be enough. There will be some minimum-chunk-size requirement, depending on the block size/chunk size. I wonder whether it makes sense to try to be smart and make it dynamic, so that we only require 1B or 2B for cases when only that many chunks fit into a block, or just say that it's 4B and be done with it. I mean 2^32 chunks ought to be enough for anyone, right? I'm still not buying the cache efficiency argument, though. One of the reasons is that the implementation prefers blocks with fewer free chunks when
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote: > > > + /* move the whole block to the right place in the freelist */ > > > + dlist_delete(&block->node); > > > + dlist_push_head(&slab->freelist[block->nfree], &block->node); > > > > Hm. What if we, instead of the array of doubly linked lists, just kept > > a single linked list of blocks, and keep that list sorted by number of > > free chunks? Given that freeing / allocation never changes the number > > of allocated chunks by more than 1, we'll never have to move an entry > > far in that list to keep it sorted. > > > > Only assuming that there'll be only few blocks with the same number of free > chunks. If that's not the case, you'll have to walk many blocks to move the > block to the right place in the list. The array of lists handles such cases > way more efficiently, and I think we should keep it. The proper data structure would probably be a heap. Right now binaryheap.h is fixed-size - probably not too hard to change. Greetings, Andres Freund
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote: > On 02/09/2017 10:37 PM, Andres Freund wrote: > > Hi, > > > > On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote: > > > src/backend/utils/mmgr/Makefile | 2 +- > > > src/backend/utils/mmgr/aset.c | 115 + > > > src/backend/utils/mmgr/memdebug.c | 131 > > > ++ > > > src/include/utils/memdebug.h | 22 +++ > > > 4 files changed, 156 insertions(+), 114 deletions(-) > > > create mode 100644 src/backend/utils/mmgr/memdebug.c > > > > I'm a bit loathe to move these to a .c file - won't this likely make > > these debugging tools even slower? Seems better to put some of them > > them in a header as static inlines (not randomize, but the rest). > > > > Do you have any numbers to support that? AFAICS compilers got really good in > inlining stuff on their own. Unless you use LTO, they can't inline across translation units. And using LTO is slow enough for linking that it's not that much fun to use, as it makes compile-edit-compile cycles essentially take as long as a full rebuild. > > > From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001 > > > From: Tomas Vondra> > > Date: Wed, 30 Nov 2016 15:36:23 +0100 > > > Subject: [PATCH 2/3] slab allocator > > > > > > --- > > > src/backend/replication/logical/reorderbuffer.c | 82 +-- > > > src/backend/utils/mmgr/Makefile | 2 +- > > > src/backend/utils/mmgr/slab.c | 803 > > > > > > src/include/nodes/memnodes.h| 2 +- > > > src/include/nodes/nodes.h | 1 + > > > src/include/replication/reorderbuffer.h | 15 +- > > > src/include/utils/memutils.h| 9 + > > > > I'd like to see the reorderbuffer changes split into a separate commit > > from the slab allocator introduction. > > > > I rather dislike patches that only add a bunch of code, without actually > using it anywhere. > But if needed, this is trivial to do at commit time - just don't > commit the reorderbuffer bits. Meh. > > > + * Each block includes a simple bitmap tracking which chunks are > > > used/free. 
> > > + * This makes it trivial to check if all chunks on the block are > > > free, and > > > + * eventually free the whole block (which is almost impossible > > > with a global > > > + * freelist of chunks, storing chunks from all blocks). > > > > Why is checking a potentially somewhat long-ish bitmap better than a > > simple counter, or a "linked list" of "next free chunk-number" or such > > (where free chunks simply contain the id of the subsequent chunk)? > > Using a list instead of a bitmap would also make it possible to get > > 'lifo' behaviour, which is good for cache efficiency. A simple > > chunk-number based singly linked list would only imply a minimum > > allocation size of 4 - that seems perfectly reasonable? > > > > A block-level counter would be enough to decide if all chunks on the block > are free, but it's not sufficient to identify which chunks are free / > available for reuse. > > The bitmap only has a single bit per chunk, so I find "potentially long-ish" > a bit misleading. Any linked list implementation will require much more > per-chunk overhead - as the chunks are fixed-length, it's possible to use > a chunk index (instead of 64-bit pointers), to save some space. But with large > blocks / small chunks that's still at least 2 or 4 bytes per index, and > you'll need two (to implement a doubly-linked list, to make add/remove > efficient). > For example assume an 8kB block and 64B chunks, i.e. 128 chunks. With a bitmap > that's 16B to track all free space on the block. A doubly linked list would > require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B. > I have a hard time believing that the cache efficiency of linked lists > (which may or may not be real in this case) outweighs this, but if you > want to try, be my guest. I'm not following - why would there be overhead in anything for allocations bigger than 4 (or maybe 8) bytes? 
You can store the list (via chunk ids, not pointers) inside the chunks themselves, where data otherwise would be. And I don't see why you'd need a doubly linked list, as the only operations that are needed are to push to the front of the list, and to pop from the front of the list - and both operations are simple to do with a singly linked list? > > Thirdly, isn't that approach going to result in a quite long freelists > > array, when you have small items and a decent blocksize? That seems like > > a fairly reasonable thing to do? > > > > I'm confused. Why wouldn't that be reasonable? Or rather, what would be a > more reasonable way? If I understood correctly, you have an array of doubly linked lists. A block is stored in the list at the index #block's-free-elements. Is that right? If so, if
Re: [HACKERS] PATCH: two slab-like memory allocators
On 02/09/2017 10:37 PM, Andres Freund wrote: Hi, On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote: src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/aset.c | 115 + src/backend/utils/mmgr/memdebug.c | 131 ++ src/include/utils/memdebug.h | 22 +++ 4 files changed, 156 insertions(+), 114 deletions(-) create mode 100644 src/backend/utils/mmgr/memdebug.c I'm a bit loathe to move these to a .c file - won't this likely make these debugging tools even slower? Seems better to put some of them them in a header as static inlines (not randomize, but the rest). Do you have any numbers to support that? AFAICS compilers got really good in inlining stuff on their own. From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001 From: Tomas VondraDate: Wed, 30 Nov 2016 15:36:23 +0100 Subject: [PATCH 2/3] slab allocator --- src/backend/replication/logical/reorderbuffer.c | 82 +-- src/backend/utils/mmgr/Makefile | 2 +- src/backend/utils/mmgr/slab.c | 803 src/include/nodes/memnodes.h| 2 +- src/include/nodes/nodes.h | 1 + src/include/replication/reorderbuffer.h | 15 +- src/include/utils/memutils.h| 9 + I'd like to see the reorderbuffer changes split into a separate commit from the slab allocator introduction. I rather dislike patches that only add a bunch of code, without actually using it anywhere. But if needed, this is trivial to do at commit time - just don't commit the reorderbuffer bits. +/*- + * + * slab.c + * SLAB allocator definitions. + * + * SLAB is a custom MemoryContext implementation designed for cases of + * equally-sized objects. + * + * + * Portions Copyright (c) 2016, PostgreSQL Global Development Group Bump, before a committer forgets it. OK. + * IDENTIFICATION + * src/backend/utils/mmgr/slab.c + * + * + * The constant allocation size allows significant simplification and various + * optimizations. Firstly, we can get rid of the doubling and carve the blocks + * into chunks of exactly the right size (plus alignment), not wasting memory. 
Getting rid of it relative to what? I'd try to phrase it so these comments stand on their own. OK, will reword. + * Each block includes a simple bitmap tracking which chunks are used/free. + * This makes it trivial to check if all chunks on the block are free, and + * eventually free the whole block (which is almost impossible with a global + * freelist of chunks, storing chunks from all blocks). Why is checking a potentially somewhat long-ish bitmap better than a simple counter, or a "linked list" of "next free chunk-number" or such (where free chunks simply contain the id of the subsequent chunk)? Using a list instead of a bitmap would also make it possible to get 'lifo' behaviour, which is good for cache efficiency. A simple chunk-number based singly linked list would only imply a minimum allocation size of 4 - that seems perfectly reasonable? A block-level counter would be enough to decide if all chunks on the block are free, but it's not sufficient to identify which chunks are free / available for reuse. The bitmap only has a single bit per chunk, so I find "potentially long-ish" a bit misleading. Any linked list implementation will require much more per-chunk overhead - as the chunks are fixed-length, it's possible to use a chunk index (instead of 64-bit pointers) to save some space. But with large blocks / small chunks that's still at least 2 or 4 bytes per index, and you'll need two (to implement a doubly-linked list, to make add/remove efficient). For example assume an 8kB block and 64B chunks, i.e. 128 chunks. With a bitmap that's 16B to track all free space on the block. A doubly linked list would require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B. I have a hard time believing that the cache efficiency of linked lists (which may or may not be real in this case) outweighs this, but if you want to try, be my guest. 
+ * At the context level, we use 'freelist' to track blocks ordered by number + * of free chunks, starting with blocks having a single allocated chunk, and + * with completely full blocks on the tail. Why that way round? Filling chunks up as much as possible is good for cache and TLB efficiency, and allows for earlier de-allocation of partially used blocks? Oh, I see you do that in the next comment, but it still leaves me wondering. Also, is this actually a list? It's more an array of lists, right? I.e. it should be named freelists? Possibly. Naming things is hard. > Thirdly, isn't that approach going to result in a quite long freelists array, when
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote: > src/backend/utils/mmgr/Makefile | 2 +- > src/backend/utils/mmgr/aset.c | 115 + > src/backend/utils/mmgr/memdebug.c | 131 > ++ > src/include/utils/memdebug.h | 22 +++ > 4 files changed, 156 insertions(+), 114 deletions(-) > create mode 100644 src/backend/utils/mmgr/memdebug.c I'm a bit loathe to move these to a .c file - won't this likely make these debugging tools even slower? Seems better to put some of them them in a header as static inlines (not randomize, but the rest). > From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001 > From: Tomas Vondra> Date: Wed, 30 Nov 2016 15:36:23 +0100 > Subject: [PATCH 2/3] slab allocator > > --- > src/backend/replication/logical/reorderbuffer.c | 82 +-- > src/backend/utils/mmgr/Makefile | 2 +- > src/backend/utils/mmgr/slab.c | 803 > > src/include/nodes/memnodes.h| 2 +- > src/include/nodes/nodes.h | 1 + > src/include/replication/reorderbuffer.h | 15 +- > src/include/utils/memutils.h| 9 + I'd like to see the reorderbuffer changes split into a separate commit from the slab allocator introduction. > +/*- > + * > + * slab.c > + * SLAB allocator definitions. > + * > + * SLAB is a custom MemoryContext implementation designed for cases of > + * equally-sized objects. > + * > + * > + * Portions Copyright (c) 2016, PostgreSQL Global Development Group Bump, before a committer forgets it. > + * IDENTIFICATION > + * src/backend/utils/mmgr/slab.c > + * > + * > + * The constant allocation size allows significant simplification and > various > + * optimizations. Firstly, we can get rid of the doubling and carve the > blocks > + * into chunks of exactly the right size (plus alignment), not wasting > memory. Getting rid of it relative to what? I'd try to phrase it so these comments stand on their own. > + * Each block includes a simple bitmap tracking which chunks are used/free. 
> + * This makes it trivial to check if all chunks on the block are free, and > + * eventually free the whole block (which is almost impossible with a > global > + * freelist of chunks, storing chunks from all blocks). Why is checking a potentially somewhat long-ish bitmap better than a simple counter, or a "linked list" of "next free chunk-number" or such (where free chunks simply contain the id of the subsequent chunk)? Using a list instead of a bitmap would also make it possible to get 'lifo' behaviour, which is good for cache efficiency. A simple chunk-number based singly linked list would only imply a minimum allocation size of 4 - that seems perfectly reasonable? > + * At the context level, we use 'freelist' to track blocks ordered by > number > + * of free chunks, starting with blocks having a single allocated chunk, > and > + * with completely full blocks on the tail. Why that way round? Filling chunks up as much as possible is good for cache and TLB efficiency, and allows for earlier de-allocation of partially used blocks? Oh, I see you do that in the next comment, but it still leaves me wondering. Also, is this actually a list? It's more an array of lists, right? I.e. it should be named freelists? Thirdly, isn't that approach going to result in a quite long freelists array, when you have small items and a decent blocksize? That seems like a fairly reasonable thing to do? > + * This also allows various optimizations - for example when searching for > + * a free chunk, the allocator reuses space from the most full blocks > first, > + * in the hope that some of the less full blocks will get completely empty > + * (and returned back to the OS). Might be worth mentioning tlb/cache efficiency too. > + * For each block, we maintain a pointer to the first free chunk - this is > quite > + * cheap and allows us to skip all the preceding used chunks, eliminating > + * a significant number of lookups in many common usage patterns. 
In the > worst > + * case this performs as if the pointer was not maintained. Hm, so that'd be eliminated if we maintained a linked list of chunks (by chunk number) and a free_chunk_cnt or such. > + > +#include "postgres.h" > + > +#include "utils/memdebug.h" > +#include "utils/memutils.h" > +#include "lib/ilist.h" Move ilist up, above memdebug, so the list is alphabetically ordered. > +/* > + * SlabPointer > + * Aligned pointer which may be a member of an allocation set. > + */ > +typedef void *SlabPointer; > +typedef SlabContext *Slab; I personally won't commit this with pointer hiding typedefs. If somebody else does, I can live with it, but for me it's bad enough taste that I won't. >
Re: [HACKERS] PATCH: two slab-like memory allocators
On Tue, Dec 13, 2016 at 10:32 AM, Petr Jelinek wrote: > Okay, this version looks good to me, marked as RfC. The patches still apply, moved to CF 2017-03 with same status: RfC. -- Michael
Re: [HACKERS] PATCH: two slab-like memory allocators
On 13/12/16 01:45, Tomas Vondra wrote: > On 12/12/2016 11:39 PM, Tomas Vondra wrote: >> On 12/12/2016 05:05 AM, Petr Jelinek wrote: >>> >>> I'd be happy with this patch now (as in committer ready) except that it >>> does have some merge conflicts after the recent commits, so rebase is >>> needed. >>> >> >> Attached is a rebased version of the patch, resolving the Makefile merge >> conflicts. >> > > Meh, managed to rebase a wrong branch, missing fix to the off-by-one > error (fixed v6). Attached is v8, hopefully the correct one. > Okay, this version looks good to me, marked as RfC. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 12/12/2016 11:39 PM, Tomas Vondra wrote: On 12/12/2016 05:05 AM, Petr Jelinek wrote: I'd be happy with this patch now (as in committer ready) except that it does have some merge conflicts after the recent commits, so rebase is needed. Attached is a rebased version of the patch, resolving the Makefile merge conflicts. Meh, managed to rebase a wrong branch, missing fix to the off-by-one error (fixed v6). Attached is v8, hopefully the correct one. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-move-common-bits-to-memdebug-v8.patch Description: binary/octet-stream 0002-slab-allocator-v8.patch Description: binary/octet-stream 0003-generational-context-v8.patch Description: binary/octet-stream
Re: [HACKERS] PATCH: two slab-like memory allocators
On 12/12/2016 05:05 AM, Petr Jelinek wrote: I'd be happy with this patch now (as in committer ready) except that it does have some merge conflicts after the recent commits, so rebase is needed. Attached is a rebased version of the patch, resolving the Makefile merge conflicts. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services slab-allocators-v7.tgz Description: application/compressed-tar
Re: [HACKERS] PATCH: two slab-like memory allocators
On 01/12/16 03:26, Tomas Vondra wrote: > > On 11/27/2016 at 11:02 PM, Andres Freund wrote: >> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote: >>> On 27/11/16 21:47, Andres Freund wrote: Hi, >> +typedef struct SlabBlockData *SlabBlock;/* forward >> reference */ >> +typedef struct SlabChunkData *SlabChunk; >> >> Can we please not continue hiding pointers behind typedefs? It's a >> bad >> pattern, and that it's fairly widely used isn't a good excuse to >> introduce further usages of it. >> > > Why is it a bad pattern? It hides what is passed by reference, and what by value, and it makes it a guessing game whether you need -> or . since you don't know whether it's a pointer or the actual object. All to save a * in parameter and variable declaration?... >>> >>> FWIW I don't like that pattern either although it's used in many >>> parts of our code-base. >> >> But relatively few new ones, most of it is pretty old. >> > > I do agree it's not a particularly pretty pattern, but in this case it's > fairly isolated in the mmgr sources, and I quite value the consistency > in this part of the code (i.e. that aset.c, slab.c and generation.c all > use the same approach). So I haven't changed this. > > The attached v7 fixes the off-by-one error in slab.c, causing failures > in test_decoding isolation tests, and renames Gen to Generation, as > proposed by Petr. > I'd be happy with this patch now (as in committer ready) except that it does have some merge conflicts after the recent commits, so rebase is needed. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On Thu, Dec 1, 2016 at 1:26 PM, Tomas Vondra wrote: > > > On 11/27/2016 at 11:02 PM, Andres Freund wrote: > >> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote: >> >>> On 27/11/16 21:47, Andres Freund wrote: >>> Hi, +typedef struct SlabBlockData *SlabBlock; /* forward >> reference */ >> +typedef struct SlabChunkData *SlabChunk; >> >> Can we please not continue hiding pointers behind typedefs? It's a bad >> pattern, and that it's fairly widely used isn't a good excuse to >> introduce further usages of it. >> >> > Why is it a bad pattern? > It hides what is passed by reference, and what by value, and it makes it a guessing game whether you need -> or . since you don't know whether it's a pointer or the actual object. All to save a * in parameter and variable declaration?... >>> FWIW I don't like that pattern either although it's used in many >>> parts of our code-base. >>> >> >> But relatively few new ones, most of it is pretty old. >> >> > I do agree it's not a particularly pretty pattern, but in this case it's > fairly isolated in the mmgr sources, and I quite value the consistency in > this part of the code (i.e. that aset.c, slab.c and generation.c all use > the same approach). So I haven't changed this. > > The attached v7 fixes the off-by-one error in slab.c, causing failures in > test_decoding isolation tests, and renames Gen to Generation, as proposed > by Petr. > Moved to next CF with same status (needs review). Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] PATCH: two slab-like memory allocators
On 11/27/2016 at 11:02 PM, Andres Freund wrote: On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote: On 27/11/16 21:47, Andres Freund wrote: Hi, +typedef struct SlabBlockData *SlabBlock; /* forward reference */ +typedef struct SlabChunkData *SlabChunk; Can we please not continue hiding pointers behind typedefs? It's a bad pattern, and that it's fairly widely used isn't a good excuse to introduce further usages of it. Why is it a bad pattern? It hides what is passed by reference, and what by value, and it makes it a guessing game whether you need -> or . since you don't know whether it's a pointer or the actual object. All to save a * in parameter and variable declaration?... FWIW I don't like that pattern either although it's used in many parts of our code-base. But relatively few new ones, most of it is pretty old. I do agree it's not a particularly pretty pattern, but in this case it's fairly isolated in the mmgr sources, and I quite value the consistency in this part of the code (i.e. that aset.c, slab.c and generation.c all use the same approach). So I haven't changed this. The attached v7 fixes the off-by-one error in slab.c, causing failures in test_decoding isolation tests, and renames Gen to Generation, as proposed by Petr. regards Tomas slab-allocators-v7.tgz Description: application/compressed-tar
Re: [HACKERS] PATCH: two slab-like memory allocators
On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote: > On 27/11/16 21:47, Andres Freund wrote: > > Hi, > > > >>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */ > >>> +typedef struct SlabChunkData *SlabChunk; > >>> > >>> Can we please not continue hiding pointers behind typedefs? It's a bad > >>> pattern, and that it's fairly widely used isn't a good excuse to > >>> introduce further usages of it. > >>> > >> > >> Why is it a bad pattern? > > > > It hides what is passed by reference, and what by value, and it makes it > > a guessing game whether you need -> or . since you don't know whether > > it's a pointer or the actual object. All to save a * in parameter and > > variable declaration?... > > > > FWIW I don't like that pattern either although it's used in many parts > of our code-base. But relatively few new ones, most of it is pretty old.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 27/11/16 21:47, Andres Freund wrote: > Hi, > >>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */ >>> +typedef struct SlabChunkData *SlabChunk; >>> >>> Can we please not continue hiding pointers behind typedefs? It's a bad >>> pattern, and that it's fairly widely used isn't a good excuse to >>> introduce further usages of it. >>> >> >> Why is it a bad pattern? > > It hides what is passed by reference, and what by value, and it makes it > a guessing game whether you need -> or . since you don't know whether > it's a pointer or the actual object. All to save a * in parameter and > variable declaration?... > FWIW I don't like that pattern either although it's used in many parts of our code-base. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, > > +typedef struct SlabBlockData *SlabBlock; /* forward reference */ > > +typedef struct SlabChunkData *SlabChunk; > > > > Can we please not continue hiding pointers behind typedefs? It's a bad > > pattern, and that it's fairly widely used isn't a good excuse to > > introduce further usages of it. > > > > Why is it a bad pattern? It hides what is passed by reference, and what by value, and it makes it a guessing game whether you need -> or . since you don't know whether it's a pointer or the actual object. All to save a * in parameter and variable declaration?... Andres
Re: [HACKERS] PATCH: two slab-like memory allocators
On 11/27/2016 07:25 PM, Petr Jelinek wrote: On 15/11/16 01:44, Tomas Vondra wrote: Attached is v6 of the patch series, fixing most of the points: * common bits (valgrind/randomization/wipe) moved to memdebug.h/c Instead of introducing a new header file, I've added the prototypes to memdebug.h (which was already used for the valgrind stuff anyway), and the implementations to a new memdebug.c file. Not sure what you meant by "static inlines" though. I think Andres wanted to put the implementation to the static inline functions directly in the header (see parts of pg_list or how atomics work), but I guess you way works too. I see. Well turning that into static inlines just like in pg_list is possible. I guess the main reason is performance - for pg_list that probably makes sense, but the memory randomization/valgrind stuff is only ever used for debugging, which already does a lot of expensive stuff anyway. I've however kept SlabContext->freelist as an array, because there may be many blocks with the same number of free chunks, in which case moving the block in the list would be expensive. This way it's simply dlist_delete + dlist_push. +1 I get mxact isolation test failures in test_decoding with this version of patch: step s0w: INSERT INTO do_write DEFAULT VALUES; + WARNING: problem in slab TXN: number of free chunks 33 in block 0x22beba0 does not match bitmap 34 step s0start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false'); data and step s0alter: ALTER TABLE do_write ADD column ts timestamptz; step s0w: INSERT INTO do_write DEFAULT VALUES; + WARNING: problem in slab TXN: number of free chunks 33 in block 0x227c3f0 does not match bitmap 34 step s0start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false'); data D'oh! I believe this is a simple thinko in SlabCheck, which iterates over chunks like this: for (j = 0; j <= slab->chunksPerBlock; j++) ... 
which is of course an off-by-one error (and the 33 vs. 34 error message is consistent with this theory). Also, let's just rename the Gen to Generation. OK. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 15/11/16 01:44, Tomas Vondra wrote: > Attached is v6 of the patch series, fixing most of the points: > > * common bits (valgrind/randomization/wipe) moved to memdebug.h/c > > Instead of introducing a new header file, I've added the prototypes to > memdebug.h (which was already used for the valgrind stuff anyway), and > the implementations to a new memdebug.c file. Not sure what you meant by > "static inlines" though. I think Andres wanted to put the implementations into static inline functions directly in the header (see parts of pg_list or how atomics work), but I guess your way works too. > > I've however kept SlabContext->freelist as an array, because there may > be many blocks with the same number of free chunks, in which case moving > the block in the list would be expensive. This way it's simply > dlist_delete + dlist_push. > +1 I get mxact isolation test failures in test_decoding with this version of the patch: step s0w: INSERT INTO do_write DEFAULT VALUES; + WARNING: problem in slab TXN: number of free chunks 33 in block 0x22beba0 does not match bitmap 34 step s0start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false'); data and step s0alter: ALTER TABLE do_write ADD column ts timestamptz; step s0w: INSERT INTO do_write DEFAULT VALUES; + WARNING: problem in slab TXN: number of free chunks 33 in block 0x227c3f0 does not match bitmap 34 step s0start: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', 'false'); data Also, let's just rename the Gen to Generation. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Attached is v6 of the patch series, fixing most of the points:

* common bits (valgrind/randomization/wipe) moved to memdebug.h/c

Instead of introducing a new header file, I've added the prototypes to memdebug.h (which was already used for the valgrind stuff anyway), and the implementations to a new memdebug.c file. Not sure what you meant by "static inlines" though.

So the patch series now has three parts - 0001 with memdebug stuff, 0002 with slab and 0003 with gen (still a poor name).

* removing AllocSet references from both new memory contexts
* using FLEXIBLE_ARRAY_MEMBER in SlabContext
* using dlist instead of the custom linked list

I've however kept SlabContext->freelist as an array, because there may be many blocks with the same number of free chunks, in which case moving the block in the list would be expensive. This way it's simply dlist_delete + dlist_push.

* use StandardChunkHeader instead of the common fields
* removing pointer to context from block header for both contexts
* fix format in FreeInfo/AllocInfo (including for AllocSet)
* improved a bunch of comments (bitmap size, chunksPerBlock formula)
* did a pgindent run on the patch
* implement the missing methods in Gen (Stats/Check)
* fix a few minor bugs in both contexts

I haven't done anything with hiding pointers behind typedefs, because I don't know what's so wrong about that. I also haven't done anything with the bitmap access in SlabAlloc - I haven't found any reasonable case when it would be measurable, and I don't expect this to be even measurable in practice.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

slab-allocators-v6.tgz Description: application/compressed-tar
Re: [HACKERS] PATCH: two slab-like memory allocators
On 11/12/2016 08:12 PM, Andres Freund wrote: Hi, Subject: [PATCH 1/2] slab allocator diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 6ad7e7d..520f295 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c I'd rather have that in a separate followup commit... + * IDENTIFICATION + * src/backend/utils/mmgr/slab.c + * + * + * The constant allocation size allows significant simplification and various + * optimizations that are not possible in AllocSet. Firstly, we can get rid + * of the doubling and carve the blocks into chunks of exactly the right size + * (plus alignment), not wasting memory. References to AllocSet aren't necessarily a good idea, they'll quite possibly get out of date. The argument can be quite easily be made without referring to a concrete reference to behaviour elsewhere. Yeah, that's probably true. + * + * At the context level, we use 'freelist' array to track blocks grouped by + * number of free chunks. For example freelist[0] is a list of completely full + * blocks, freelist[1] is a block with a single free chunk, etc. Hm. Those arrays are going to be quite large for small allocations w/ big blocks (an imo sensible combination). Maybe it'd make more sense to model it as a linked list of blocks? Full blocks are at one end, empty ones at the other? So there'd be one huge list of blocks, sorted by the number of empty chunks? Hmm, that might work I guess. I don't think the combination of large blocks with small allocations is particularly sensible, though - what exactly would be the benefit of such combination? I would even consider enforcing some upper limit on the number of chunks per block - say, 256, for example. + * About MEMORY_CONTEXT_CHECKING: + * + * Since we usually round request sizes up to the next power of 2, there + * is often some unused space immediately after a requested data area. 
I assume the "round up" stuff is copy-paste? Yeah, sorry about that. + * Thus, if someone makes the common error of writing past what they've + * requested, the problem is likely to go unnoticed ... until the day when + * there *isn't* any wasted space, perhaps because of different memory + * ... + * + * See also the cooperating Valgrind client requests in mcxt.c. I think we need a preliminary patch moving a lot of this into something like mcxt_internal.h. Copying this comment, and a lot of the utility functions, into every memory context implementation is a bad pattern. Yes, I planned to do that for the next version of patch. Laziness. +typedef struct SlabBlockData *SlabBlock; /* forward reference */ +typedef struct SlabChunkData *SlabChunk; Can we please not continue hiding pointers behind typedefs? It's a bad pattern, and that it's fairly widely used isn't a good excuse to introduce further usages of it. Why is it a bad pattern? +/* + * SlabContext is a specialized implementation of MemoryContext. + */ +typedef struct SlabContext +{ + MemoryContextData header; /* Standard memory-context fields */ + /* Allocation parameters for this context: */ + SizechunkSize; /* chunk size */ + SizefullChunkSize; /* chunk size including header and alignment */ + SizeblockSize; /* block size */ + int chunksPerBlock; /* number of chunks per block */ + int minFreeChunks; /* min number of free chunks in any block */ + int nblocks;/* number of blocks allocated */ + /* Info about storage allocated in this context: */ + SlabBlock freelist[1];/* free lists (block-level) */ I assume this is a variable-length array? If so, that a) should be documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so actually will cause compiler warnings and potential misoptimizations. Will fix, thanks. +/* + * SlabBlockData + * Structure of a single block in SLAB allocator. + * + * slab: context owning this block What do we need this for? 
You're right the pointer to the owning context is unnecessary - there's nothing like a "standard block header" and we already have the pointer in the standard chunk header. But maybe keeping the pointer at least with MEMORY_CONTEXT_CHECKING would be a good idea?

+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).

Makes sense.

+/*
+ * SlabChunk
+ *		The prefix of each piece of memory in a SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to add fields in front of the StandardChunkHeader
+ * fields, which is used to add a pointer to the block.
+ */
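The review point about freelist[1] versus FLEXIBLE_ARRAY_MEMBER can be illustrated with a small sketch: declaring the trailing array as a C99 flexible array member lets the compiler know its true extent, and offsetof-based sizing of the allocation is then well defined. The Demo* names below only mirror the shape of SlabContext; they are not the actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

/* Context header with a variable-length freelist tail.  Using a flexible
 * array member (instead of freelist[1]) avoids compiler warnings and the
 * undefined behavior of indexing past a fixed-size array. */
typedef struct DemoContext
{
    size_t chunkSize;
    int    chunksPerBlock;
    /* variable-length tail: one slot per possible free-chunk count */
    void  *freelist[];          /* C99 flexible array member */
} DemoContext;

static DemoContext *
demo_context_create(size_t chunkSize, int chunksPerBlock)
{
    /* allocate header plus (chunksPerBlock + 1) freelist slots */
    DemoContext *ctx = calloc(1, offsetof(DemoContext, freelist) +
                              (chunksPerBlock + 1) * sizeof(void *));

    if (ctx == NULL)
        return NULL;
    ctx->chunkSize = chunkSize;
    ctx->chunksPerBlock = chunksPerBlock;
    return ctx;
}
```

The freelist needs chunksPerBlock + 1 slots because a block can have anywhere from 0 to chunksPerBlock free chunks, inclusive.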
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, Subject: [PATCH 1/2] slab allocator diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index 6ad7e7d..520f295 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c I'd rather have that in a separate followup commit... + * IDENTIFICATION + * src/backend/utils/mmgr/slab.c + * + * + * The constant allocation size allows significant simplification and various + * optimizations that are not possible in AllocSet. Firstly, we can get rid + * of the doubling and carve the blocks into chunks of exactly the right size + * (plus alignment), not wasting memory. References to AllocSet aren't necessarily a good idea, they'll quite possibly get out of date. The argument can be quite easily be made without referring to a concrete reference to behaviour elsewhere. + * + * At the context level, we use 'freelist' array to track blocks grouped by + * number of free chunks. For example freelist[0] is a list of completely full + * blocks, freelist[1] is a block with a single free chunk, etc. Hm. Those arrays are going to be quite large for small allocations w/ big blocks (an imo sensible combination). Maybe it'd make more sense to model it as a linked list of blocks? Full blocks are at one end, empty ones at the other? + * About MEMORY_CONTEXT_CHECKING: + * + * Since we usually round request sizes up to the next power of 2, there + * is often some unused space immediately after a requested data area. I assume the "round up" stuff is copy-paste? + * Thus, if someone makes the common error of writing past what they've + * requested, the problem is likely to go unnoticed ... until the day when + * there *isn't* any wasted space, perhaps because of different memory + * alignment on a new platform, or some other effect. 
To catch this sort + * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond + * the requested space whenever the request is less than the actual chunk + * size, and verifies that the byte is undamaged when the chunk is freed. + * + * + * About USE_VALGRIND and Valgrind client requests: + * + * Valgrind provides "client request" macros that exchange information with + * the host Valgrind (if any). Under !USE_VALGRIND, memdebug.h stubs out + * currently-used macros. + * + * When running under Valgrind, we want a NOACCESS memory region both before + * and after the allocation. The chunk header is tempting as the preceding + * region, but mcxt.c expects to able to examine the standard chunk header + * fields. Therefore, we use, when available, the requested_size field and + * any subsequent padding. requested_size is made NOACCESS before returning + * a chunk pointer to a caller. However, to reduce client request traffic, + * it is kept DEFINED in chunks on the free list. + * + * The rounded-up capacity of the chunk usually acts as a post-allocation + * NOACCESS region. If the request consumes precisely the entire chunk, + * there is no such region; another chunk header may immediately follow. In + * that case, Valgrind will not detect access beyond the end of the chunk. + * + * See also the cooperating Valgrind client requests in mcxt.c. I think we need a preliminary patch moving a lot of this into something like mcxt_internal.h. Copying this comment, and a lot of the utility functions, into every memory context implementation is a bad pattern. +typedef struct SlabBlockData *SlabBlock; /* forward reference */ +typedef struct SlabChunkData *SlabChunk; Can we please not continue hiding pointers behind typedefs? It's a bad pattern, and that it's fairly widely used isn't a good excuse to introduce further usages of it. +/* + * SlabContext is a specialized implementation of MemoryContext. 
+ */ +typedef struct SlabContext +{ + MemoryContextData header; /* Standard memory-context fields */ + /* Allocation parameters for this context: */ + SizechunkSize; /* chunk size */ + SizefullChunkSize; /* chunk size including header and alignment */ + SizeblockSize; /* block size */ + int chunksPerBlock; /* number of chunks per block */ + int minFreeChunks; /* min number of free chunks in any block */ + int nblocks;/* number of blocks allocated */ + /* Info about storage allocated in this context: */ + SlabBlock freelist[1];/* free lists (block-level) */ I assume this is a variable-length array? If so, that a) should be documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so actually will cause compiler warnings and potential misoptimizations. +/* + * SlabBlockData + * Structure
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/25/16 4:48 PM, Tomas Vondra wrote:
> The main issue that bugs me is the name of the Gen allocator, but I don't have any good naming ideas :( The basic characteristic of Gen is that it does not reuse space released by pfree(), relying on the fact that the whole block will become free. That should be reflected in the name somehow, I guess.

OneTime? OneUse? OneShot? AllocOnce? OneHitWonder? ;P

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/1/16 7:34 PM, Tomas Vondra wrote:
>> +/* otherwise add it to the proper freelist bin */
>>
>> Looks like something went missing... :)
>
> Ummm? The patch contains this:
>
> +/* otherwise add it to the proper freelist bin */
> +if (set->freelist[block->nfree])
> +set->freelist[block->nfree]->prev = block;
> +
> +block->next = set->freelist[block->nfree];
> +set->freelist[block->nfree] = block;
>
> Which does exactly the thing it should do. Or what is missing?

What's confusing is the "otherwise" right at the beginning of the function:

+static void
+add_to_freelist(Slab set, SlabBlock block)
+{
+	/* otherwise add it to the proper freelist bin */
+	if (set->freelist[block->nfree])
+		set->freelist[block->nfree]->prev = block;
+
+	block->next = set->freelist[block->nfree];
+	set->freelist[block->nfree] = block;
+}

Otherwise what? What's the other option? (Haven't looked at the newer patch, so maybe this isn't an issue anymore.)

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/23/2016 05:26 PM, Petr Jelinek wrote:
> On 23/10/16 16:26, Tomas Vondra wrote:
>> On 10/22/2016 08:30 PM, Tomas Vondra wrote:
>>> ...
>>> Moreover, the slab/gen allocators proposed here seem like a better fit for reorderbuffer, e.g. because they release memory. I haven't looked at sb_alloc too closely, but I think it behaves more like AllocSet in this regard (i.e. keeping the memory indefinitely).
>
> For reorderbuffer, from what I've seen in practice, I'd prefer proper freeing to a 5% performance gain, as I've seen walsenders taking GBs of memory due to reorderbuffer allocations that are never properly freed.

Right.

> About your actual patch. I do like both the Slab and the Gen allocators and think that we should proceed with them for the moment. You definitely need to rename the Gen one (don't ask me to what though) as it sounds like "generic" and do some finishing touches but I think it's the way to go. I don't see any point in GenSlab anymore.

Attached is a v5 of the patch that does this, i.e. throws away the GenSlab allocator and modifies reorderbuffer in two steps. First (0001) it adds the Slab allocator for TXN/Change allocations, and keeps the local slab cache for TupleBuf allocations (with a separate AllocSet context). Then (in 0002) it adds the Gen allocator for TupleBuf, removing the last bits of the local slab cache.

I do think this version is as simple as it gets - there's not much more we could simplify / remove. The main issue that bugs me is the name of the Gen allocator, but I don't have any good naming ideas :( The basic characteristic of Gen is that it does not reuse space released by pfree(), relying on the fact that the whole block will become free. That should be reflected in the name somehow, I guess.
regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

slab-allocators-v5.tgz Description: application/compressed-tar
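The defining behavior of the Gen allocator described above - pfree() never reuses individual chunks, and a block is released only once every chunk carved from it has been freed - can be sketched in miniature. This toy model (all Demo* names are illustrative, and the real generation.c differs in many details, e.g. it links blocks into a list rather than leaking them) bump-allocates chunks from the current block and tracks a per-block live count:

```c
#include <assert.h>
#include <stdlib.h>
#include <stddef.h>

#define DEMO_BLOCK_SIZE 1024

typedef struct DemoGenBlock
{
    int    nchunks;   /* chunks handed out from this block */
    int    nfreed;    /* chunks already freed by the caller */
    size_t used;      /* bump-allocation offset into data[] */
    char   data[DEMO_BLOCK_SIZE];
} DemoGenBlock;

typedef struct DemoGenChunk
{
    DemoGenBlock *block;   /* back-pointer used by free */
} DemoGenChunk;

static DemoGenBlock *current = NULL;

static void *
demo_gen_alloc(size_t size)
{
    size_t need = sizeof(DemoGenChunk) + size;
    DemoGenChunk *chunk;

    /* start a fresh block when the current one cannot fit the request */
    if (current == NULL || current->used + need > DEMO_BLOCK_SIZE)
    {
        current = calloc(1, sizeof(DemoGenBlock));
        if (current == NULL)
            return NULL;
    }
    chunk = (DemoGenChunk *) (current->data + current->used);
    current->used += need;
    current->nchunks++;
    chunk->block = current;
    return chunk + 1;
}

/* returns 1 when this free released the whole block, 0 otherwise */
static int
demo_gen_free(void *ptr)
{
    DemoGenChunk *chunk = ((DemoGenChunk *) ptr) - 1;
    DemoGenBlock *block = chunk->block;

    block->nfreed++;
    /* no per-chunk reuse: the space comes back only block-at-a-time */
    if (block->nfreed == block->nchunks && block != current)
    {
        free(block);
        return 1;
    }
    return 0;
}
```

Because there is no per-chunk freespace management, allocation is nearly free; the trade-off - as noted later in the thread - is that Slab can consume less memory when chunks are actually reused.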
Re: [HACKERS] PATCH: two slab-like memory allocators
On 23/10/16 16:26, Tomas Vondra wrote: > On 10/22/2016 08:30 PM, Tomas Vondra wrote: >> On 10/20/2016 04:43 PM, Robert Haas wrote: >>> >>> ... >>> >>> The sb_alloc allocator I proposed a couple of years ago would work >>> well for this case, I think. >>> >> >> Maybe, but it does not follow the Memory Context design at all, if I >> understand it correctly. I was willing to give it a spin anyway and see >> how it compares to the two other allocators, but this is a significant >> paradigm shift and certainly much larger step than what I proposed. >> >> I'm not even sure it's possible to implement a MemoryContext based on >> the same ideas as sb_alloc(), because one of the important points of >> sb_alloc design seems to be throwing away the chunk header. While that >> may be possible, it would certainly affect the whole tree (not just the >> reorderbuffer bit), and it'd require way more work. >> >> Moreover, the two allocators I proposed significantly benefit from the >> "same lifespan" assumption. I don't think sb_alloc can do that. >> > > I've given the sb_alloc patch another try - essentially hacking it into > reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's > faster than the allocators discussed in this thread. Based on a few very > quick tests on my laptop, the difference is usually ~5-10%. > > That might seem like a significant improvement, but it's negligible > compared to the "master -> slab/gen" improvement, which improves > performance by orders of magnitude (at least for the tested cases). > > Moreover, the slab/gen allocators proposed here seem like a better fit > for reorderbuffer, e.g. because they release memory. I haven't looked at > sb_alloc too closely, but I think it behaves more like AllocSet in this > regard (i.e. keeping the memory indefinitely). 
For reorderbuffer, from what I've seen in practice, I'd prefer proper freeing to a 5% performance gain, as I've seen walsenders taking GBs of memory due to reorderbuffer allocations that are never properly freed.

About your actual patch. I do like both the Slab and the Gen allocators and think that we should proceed with them for the moment. You definitely need to rename the Gen one (don't ask me to what though) as it sounds like "generic" and do some finishing touches, but I think it's the way to go. I don't see any point in GenSlab anymore.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/22/2016 08:30 PM, Tomas Vondra wrote: On 10/20/2016 04:43 PM, Robert Haas wrote: >> ... The sb_alloc allocator I proposed a couple of years ago would work well for this case, I think. Maybe, but it does not follow the Memory Context design at all, if I understand it correctly. I was willing to give it a spin anyway and see how it compares to the two other allocators, but this is a significant paradigm shift and certainly much larger step than what I proposed. I'm not even sure it's possible to implement a MemoryContext based on the same ideas as sb_alloc(), because one of the important points of sb_alloc design seems to be throwing away the chunk header. While that may be possible, it would certainly affect the whole tree (not just the reorderbuffer bit), and it'd require way more work. Moreover, the two allocators I proposed significantly benefit from the "same lifespan" assumption. I don't think sb_alloc can do that. I've given the sb_alloc patch another try - essentially hacking it into reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's faster than the allocators discussed in this thread. Based on a few very quick tests on my laptop, the difference is usually ~5-10%. That might seem like a significant improvement, but it's negligible compared to the "master -> slab/gen" improvement, which improves performance by orders of magnitude (at least for the tested cases). Moreover, the slab/gen allocators proposed here seem like a better fit for reorderbuffer, e.g. because they release memory. I haven't looked at sb_alloc too closely, but I think it behaves more like AllocSet in this regard (i.e. keeping the memory indefinitely). FWIW I'm not making any conclusions about sb_alloc benefits outside reorderbuffer.c - it might easily be worth pursuing, no doubt about that. The amount of remaining work however seems quite high, though. Attached is the modified sb_alloc patch that I used - it's mostly v1 with removed uses in nbtree etc. 
FWIW the patch does not implement sb_destroy_private_allocator (it's only defined in the header), which seems like a bug. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/contrib/test_freepage/Makefile b/contrib/test_freepage/Makefile new file mode 100644 index 000..b482fe9 --- /dev/null +++ b/contrib/test_freepage/Makefile @@ -0,0 +1,17 @@ +# contrib/test_freepage/Makefile + +MODULES = test_freepage + +EXTENSION = test_freepage +DATA = test_freepage--1.0.sql + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/test_freepage +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/test_freepage/test_freepage--1.0.sql b/contrib/test_freepage/test_freepage--1.0.sql new file mode 100644 index 000..5d3191e --- /dev/null +++ b/contrib/test_freepage/test_freepage--1.0.sql @@ -0,0 +1,15 @@ +/* contrib/test_freepage/test_freepage--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION test_freepage" to load this file. 
\quit + +CREATE FUNCTION init(size pg_catalog.int8) RETURNS pg_catalog.void + AS 'MODULE_PATHNAME' LANGUAGE C STRICT; +CREATE FUNCTION get(pages pg_catalog.int8) RETURNS pg_catalog.int8 + AS 'MODULE_PATHNAME' LANGUAGE C STRICT; +CREATE FUNCTION inquire_largest() RETURNS pg_catalog.int8 + AS 'MODULE_PATHNAME' LANGUAGE C STRICT; +CREATE FUNCTION put(first_page pg_catalog.int8, npages pg_catalog.int8) + RETURNS pg_catalog.void AS 'MODULE_PATHNAME' LANGUAGE C STRICT; +CREATE FUNCTION dump() RETURNS pg_catalog.text +AS 'MODULE_PATHNAME' LANGUAGE C STRICT; diff --git a/contrib/test_freepage/test_freepage.c b/contrib/test_freepage/test_freepage.c new file mode 100644 index 000..074cf56 --- /dev/null +++ b/contrib/test_freepage/test_freepage.c @@ -0,0 +1,113 @@ +/*-- + * + * test_freepage.c + * Test harness code for free page manager. + * + * Copyright (C) 2013, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/test_freepage/test_freepage.c + * + * - + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "miscadmin.h" +#include "utils/builtins.h" +#include "utils/freepage.h" + +PG_MODULE_MAGIC; +PG_FUNCTION_INFO_V1(init); +PG_FUNCTION_INFO_V1(get); +PG_FUNCTION_INFO_V1(inquire_largest); +PG_FUNCTION_INFO_V1(put); +PG_FUNCTION_INFO_V1(dump); + +Datum init(PG_FUNCTION_ARGS); +Datum get(PG_FUNCTION_ARGS); +Datum inquire_largest(PG_FUNCTION_ARGS); +Datum put(PG_FUNCTION_ARGS); +Datum dump(PG_FUNCTION_ARGS); + +char *space; +FreePageManager *fpm; + +Datum +init(PG_FUNCTION_ARGS) +{ + int64 size = PG_GETARG_INT64(0); + Size
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/20/2016 04:43 PM, Robert Haas wrote:
> On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek wrote:
>> I agree though that the usability beyond the ReorderBuffer is limited because everything that will want to use it for part of allocations will get much more complicated by the fact that it will have to use two different allocators.
>>
>> I was wondering if rather than trying to implement a new allocator we should maybe implement palloc_fixed which would use some optimized algorithm for fixed sized objects in our current allocator. The advantage of that would be that we could for example use that for things like ListCell easily (memory management of which I see quite often in profiles).
>
> The sb_alloc allocator I proposed a couple of years ago would work well for this case, I think.

Maybe, but it does not follow the Memory Context design at all, if I understand it correctly. I was willing to give it a spin anyway and see how it compares to the two other allocators, but this is a significant paradigm shift and certainly a much larger step than what I proposed.

I'm not even sure it's possible to implement a MemoryContext based on the same ideas as sb_alloc(), because one of the important points of the sb_alloc design seems to be throwing away the chunk header. While that may be possible, it would certainly affect the whole tree (not just the reorderbuffer bit), and it'd require way more work.

Moreover, the two allocators I proposed significantly benefit from the "same lifespan" assumption. I don't think sb_alloc can do that.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek wrote:
> I agree though that the usability beyond the ReorderBuffer is limited
> because everything that will want to use it for part of allocations will
> get much more complicated by the fact that it will have to use two
> different allocators.
>
> I was wondering if rather than trying to implement a new allocator we
> should maybe implement palloc_fixed which would use some optimized
> algorithm for fixed sized objects in our current allocator. The
> advantage of that would be that we could for example use that for things
> like ListCell easily (memory management of which I see quite often in
> profiles).

The sb_alloc allocator I proposed a couple of years ago would work well for this case, I think.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
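The palloc_fixed() idea floated above - a fast path for fixed-size objects inside the existing allocator - usually boils down to a per-size free list that recycles released objects instead of round-tripping through the general allocator. The sketch below is purely illustrative (no such API exists in PostgreSQL); the trick is that a freed object's own memory stores the free-list link:

```c
#include <assert.h>
#include <stdlib.h>

/* A freed object is reinterpreted as a list node. */
typedef struct DemoFreeItem
{
    struct DemoFreeItem *next;
} DemoFreeItem;

typedef struct DemoFixedPool
{
    size_t        size;      /* fixed object size (>= sizeof(void *)) */
    DemoFreeItem *freelist;  /* recycled objects */
} DemoFixedPool;

static void *
demo_fixed_alloc(DemoFixedPool *pool)
{
    if (pool->freelist != NULL)
    {
        /* pop a recycled object: no malloc needed */
        DemoFreeItem *item = pool->freelist;

        pool->freelist = item->next;
        return item;
    }
    return malloc(pool->size);
}

static void
demo_fixed_free(DemoFixedPool *pool, void *ptr)
{
    /* push onto the free list; the object's memory holds the link */
    DemoFreeItem *item = ptr;

    item->next = pool->freelist;
    pool->freelist = item;
}
```

This is essentially the ad-hoc cache reorderbuffer.c already used for its fixed-size objects; the thread's point is that a dedicated Slab context gives the same O(1) behavior with proper MemoryContext accounting.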
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/19/2016 02:51 PM, Tomas Vondra wrote:
> ...
> Yeah. There are three contexts in reorder buffers:
>
> - changes (fixed size)
> - txns (fixed size)
> - tuples (variable size)
>
> The first two work perfectly fine with Slab. The last one (tuples) is used to allocate variable-sized bits, so I've tried to come up with something smart - a sequence of Slabs + overflow AllocSet. I agree that in hindsight it's a bit strange, and that the "generational" aspect is the key aspect here - i.e. it might be possible to implement a memory context that allocates variable-length chunks but still segregates them into generations. That is, don't build this on top of Slab. That would also fix the issue with two allocators in GenSlab. I'll think about this.

And here is a fairly complete prototype of this idea, adding a "Gen" generational memory context based only on the "similar lifespan" assumption (and abandoning the fixed-size assumption). It's much simpler than GenSlab (which it's supposed to replace), and abandoning the idea of composing two memory contexts also fixed the warts with some of the API methods (e.g. repalloc).

I've also been thinking that perhaps "Gen" would be useful for all three contexts in ReorderBuffer - so I've done a quick test comparing the various combinations (using the test1() function used before).

          master   slabs   slabs+genslab   slabs+gen    gens
   50k     18700     210             220         190     190
   100k       16     380             470         350     350
   200k      N/A     750             920         740     679
   500k      N/A    2250            2240        1790    1740
   1000k     N/A    4600            5000        3910    3700

Where:

* master - 23ed2ba812117
* slabs - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen

The results are a bit noisy, but I think it's clear the new Gen context performs well - it actually seems a bit faster than GenSlab, and using only Gen for all three contexts does not hurt performance.
This is most likely due to the trivial (practically absent) freespace management in the Gen context, compared to both Slab and GenSlab.

So speed is not the only criterion - I haven't measured memory consumption, but I'm pretty sure there are cases where Slab consumes much less memory than Gen, thanks to reusing free space. I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.

There's still a fair bit of work on this, particularly implementing the missing API methods in Gen - GenCheck() and GenStats(). As Robert pointed out, there's also quite a bit of duplicated code between the different memory contexts (randomization and valgrind-related), so this needs to be moved to a shared place.

I'm also thinking that we need better names, particularly for the Gen allocator. It's supposed to mean Generational, although there are no explicit generations anymore. Slab is probably OK - it does not match any of the existing kernel slab allocators exactly, but it follows the same principles, which is what matters.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

slab-allocators-v4.tgz Description: application/compressed-tar
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/19/2016 12:27 AM, Petr Jelinek wrote: > On 18/10/16 22:25, Robert Haas wrote: >> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra >>wrote: >>> attached is v3 of the patches, with a few minor fixes in Slab, and much >>> larger fixes in GenSlab. >>> >>> Slab (minor fixes) >>> -- >>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we >>> still need to zero the free bitmap at the end of the block. >>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining >>> why/how firstFreeChunk and minFreeChunks are maintained. >>> - Fixed / improved a bunch of additional comments, based on feedback. >> >> I had a look at 0001 today, but it seems to me that it still needs >> work. It's still got a lot of remnants of where you've >> copy-and-pasted aset.c. I dispute this allegation: >> >> + * SlabContext is our standard implementation of MemoryContext. >> > > Are you looking at old version of the patch? I complained about this as > well and Tomas has changed that. > >> And then there's this: >> >> +#ifdef HAVE_ALLOCINFO >> +#define SlabFreeInfo(_cxt, _chunk) \ >> +fprintf(stderr, "AllocFree: %s: %p, %d\n", \ >> +(_cxt)->header.name, (_chunk), (_chunk)->size) >> +#define SlabAllocInfo(_cxt, _chunk) \ >> +fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \ >> +(_cxt)->header.name, (_chunk), (_chunk)->size) >> >> Well, it's pretty stupid that AllocSetAlloc is reporting it's name as >> AllocAlloc, a think that, as far as I can tell, is not real. But >> having this new context type also pretend to be AllocAlloc is even >> dumber. > > You are definitely looking at old version. > Yeah. >> >> More broadly, I'm not sure I like this design very much. The whole >> point of a slab context is that all of the objects are the same size. >> I wouldn't find it too difficult to support this patch if we were >> adding an allocator for fixed-size objects that was then being used to >> allocate objects which are fixed size. 
However, what we seem to be >> doing is creating an allocator for fixed-size objects and then using >> it for variable-size tuples. That's really pretty weird. Isn't the >> root of this problem that aset.c is utterly terrible at handling large >> number of allocations? Maybe we should try to attack that problem >> more directly. > > It's used for TXNs which are fixed and some tuples (there is > assumption that the decoded tuples have more or less normal > distribution). > Yeah. There are three contexts in reorder buffers: - changes (fixed size) - txns (fixed size) - tuples (variable size) The first two work perfectly fine with Slab. The last one (tuples) is used to allocate variable-sized bits, so I've tried to come up with something smart - a sequence of Slabs + overflow AllocSet. I agree that in hindsight it's a bit strange, and that the "generational" aspect is the key aspect here - i.e. it might be possible to implement a memory context that allocates variable-length chunks but still segregates them into generations. That is, don't build this on top of Slab. That would also fix the issue with two allocators in GenSlab. I'll think about this. > I agree though that the usability beyond the ReoderBuffer is limited > because everything that will want to use it for part of allocations will > get much more complicated by the fact that it will have to use two > different allocators. > It wasn't my (primary) goal to provide allocators usable outside ReorderBuffer. I've intended to show that perhaps using AllocSet and then trying to compensate for the pfree() issues is the wrong direction, and that perhaps different allocation strategy (exploiting the ReorderBuffer specifics) would work much better. And I think the two allocators show prove that. > > I was wondering if rather than trying to implement new allocator we > should maybe implement palloc_fixed which would use some optimized > algorithm for fixed sized objects in our current allocator. 
The > advantage of that would be that we could for example use that for things > like ListCell easily (memory management of which I see quite often in > profiles). > I don't see how inventing palloc_fixed() solves any of the problems, and I think it's going to be much more complicated than you think. The idea of injecting this into AllocSet seems like a dead end to me, as the code is already complex enough and it's likely to cause regressions no matter what you do. I prefer the idea of implementing separate specialized memory contexts. If the bar is moved to "implement palloc_fixed()" or something like that, someone else will have to do that - I'm not all that interested in ReorderBuffer (this was the first time I actually saw that code), so my motivation to spend much more time on this is rather small. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote
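The "generational" idea Tomas floats above - a context that hands out variable-length chunks but segregates them into generations so whole blocks can be recycled at once - can be illustrated with a toy sketch. This is not PostgreSQL code: all names (`GenContext`, `gen_alloc`, `gen_free`) are made up, header layouts are simplified, and context reset/teardown is omitted; it only shows the core mechanism of carving chunks sequentially off a current block and freeing a block once its last live chunk is released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One "generation": chunks are carved off sequentially, never reused. */
typedef struct GenBlock
{
    size_t      used;       /* bytes already handed out from data[] */
    size_t      capacity;   /* total size of data[] */
    int         nchunks;    /* live chunks remaining in this block */
    char        data[];
} GenBlock;

/* Per-chunk header: remembers the owning block so gen_free() can find it. */
typedef struct GenChunk
{
    GenBlock   *block;
} GenChunk;

typedef struct GenContext
{
    GenBlock   *current;    /* block that receives new allocations */
    size_t      block_size;
} GenContext;

void *
gen_alloc(GenContext *cxt, size_t size)
{
    size_t      needed = sizeof(GenChunk) + size;
    GenChunk   *chunk;

    /* round up so every chunk header stays suitably aligned */
    needed = (needed + 15) & ~(size_t) 15;

    /* start a new "generation" when the current block cannot fit the chunk */
    if (cxt->current == NULL ||
        cxt->current->used + needed > cxt->current->capacity)
    {
        size_t      cap = (needed > cxt->block_size) ? needed : cxt->block_size;
        GenBlock   *block = malloc(offsetof(GenBlock, data) + cap);

        block->used = 0;
        block->capacity = cap;
        block->nchunks = 0;
        cxt->current = block;
    }

    chunk = (GenChunk *) (cxt->current->data + cxt->current->used);
    cxt->current->used += needed;
    cxt->current->nchunks++;
    chunk->block = cxt->current;
    return chunk + 1;
}

void
gen_free(GenContext *cxt, void *ptr)
{
    GenChunk   *chunk = ((GenChunk *) ptr) - 1;
    GenBlock   *block = chunk->block;

    /* the whole block goes away once its last live chunk is released */
    if (--block->nchunks == 0 && block != cxt->current)
        free(block);
}
```

The point is that pfree() is O(1) with no freelists at all; the trade-off is that a block's memory is only reclaimed when every chunk in that generation has been freed, which matches the mostly-FIFO lifetime of decoded tuples in ReorderBuffer.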
Re: [HACKERS] PATCH: two slab-like memory allocators
On 18/10/16 22:25, Robert Haas wrote: > On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra >wrote: >> attached is v3 of the patches, with a few minor fixes in Slab, and much >> larger fixes in GenSlab. >> >> Slab (minor fixes) >> -- >> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we >> still need to zero the free bitmap at the end of the block. >> - Renamed minFreeCount to minFreeChunks, added a few comments explaining >> why/how firstFreeChunk and minFreeChunks are maintained. >> - Fixed / improved a bunch of additional comments, based on feedback. > > I had a look at 0001 today, but it seems to me that it still needs > work. It's still got a lot of remnants of where you've > copy-and-pasted aset.c. I dispute this allegation: > > + * SlabContext is our standard implementation of MemoryContext. > Are you looking at old version of the patch? I complained about this as well and Tomas has changed that. > And then there's this: > > +#ifdef HAVE_ALLOCINFO > +#define SlabFreeInfo(_cxt, _chunk) \ > +fprintf(stderr, "AllocFree: %s: %p, %d\n", \ > +(_cxt)->header.name, (_chunk), (_chunk)->size) > +#define SlabAllocInfo(_cxt, _chunk) \ > +fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \ > +(_cxt)->header.name, (_chunk), (_chunk)->size) > > Well, it's pretty stupid that AllocSetAlloc is reporting it's name as > AllocAlloc, a think that, as far as I can tell, is not real. But > having this new context type also pretend to be AllocAlloc is even > dumber. You are definitely looking at old version. > > More broadly, I'm not sure I like this design very much. The whole > point of a slab context is that all of the objects are the same size. > I wouldn't find it too difficult to support this patch if we were > adding an allocator for fixed-size objects that was then being used to > allocate objects which are fixed size. However, what we seem to be > doing is creating an allocator for fixed-size objects and then using > it for variable-size tuples. 
> That's really pretty weird. Isn't the > root of this problem that aset.c is utterly terrible at handling large > number of allocations? Maybe we should try to attack that problem > more directly. It's used for TXNs, which are fixed-size, and some tuples (there is an assumption that the decoded tuples have a more or less normal distribution). I agree though that the usability beyond ReorderBuffer is limited, because anything that wants to use it for part of its allocations gets much more complicated by having to use two different allocators. I was wondering if, rather than trying to implement a new allocator, we should maybe implement a palloc_fixed() which would use some optimized algorithm for fixed-sized objects in our current allocator. The advantage of that would be that we could, for example, use it for things like ListCell easily (memory management of which I see quite often in profiles). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
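The core trick behind the palloc_fixed() idea Petr describes - O(1) alloc and free for same-sized objects - is a push-down free list: a freed chunk's own bytes store the link to the next free chunk. A minimal free-standing sketch (the names `FixedAlloc`, `fixed_alloc`, `fixed_free` are hypothetical, and real code would carve chunks out of larger blocks instead of calling malloc per chunk):

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Fixed-size allocator sketch: freed chunks are pushed onto a singly
 * linked free list (the link lives inside the freed chunk itself) and
 * reused by the next allocation, so both paths are O(1).
 */
typedef struct FixedAlloc
{
    size_t      chunk_size; /* must be >= sizeof(void *) to hold the link */
    void       *freelist;   /* head of the list of recycled chunks */
} FixedAlloc;

void *
fixed_alloc(FixedAlloc *fa)
{
    if (fa->freelist != NULL)
    {
        void       *chunk = fa->freelist;

        fa->freelist = *(void **) chunk;    /* pop the recycled chunk */
        return chunk;
    }
    /* free list empty: fall back to a fresh allocation */
    return malloc(fa->chunk_size);
}

void
fixed_free(FixedAlloc *fa, void *chunk)
{
    *(void **) chunk = fa->freelist;        /* push onto the free list */
    fa->freelist = chunk;
}
```

Since the chunk size is constant, there is no need for the per-size freelist search or block splitting that aset.c does, which is exactly why a ListCell-style workload would benefit.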
Re: [HACKERS] PATCH: two slab-like memory allocators
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra wrote: > attached is v3 of the patches, with a few minor fixes in Slab, and much > larger fixes in GenSlab. > > Slab (minor fixes) > -- > - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we > still need to zero the free bitmap at the end of the block. > - Renamed minFreeCount to minFreeChunks, added a few comments explaining > why/how firstFreeChunk and minFreeChunks are maintained. > - Fixed / improved a bunch of additional comments, based on feedback. I had a look at 0001 today, but it seems to me that it still needs work. It's still got a lot of remnants of where you've copy-and-pasted aset.c. I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.

And all of this is just a direct copy-paste; I don't really want two copies:

+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation. The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to be able to examine the standard chunk header
+ * fields. Therefore, we use, when available, the requested_size field and
+ * any subsequent padding. requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller. However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.

And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+	fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+			(_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+	fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+			(_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as AllocAlloc, a thing that, as far as I can tell, is not real. But having this new context type also pretend to be AllocAlloc is even dumber.
+static void
+randomize_mem(char *ptr, size_t size)
+{
+	static int	save_ctr = 1;
+	size_t		remaining = size;
+	int			ctr;
+
+	ctr = save_ctr;
+	VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
+	while (remaining-- > 0)
+	{
+		*ptr++ = ctr;
+		if (++ctr > 251)
+			ctr = 1;
+	}
+	VALGRIND_MAKE_MEM_UNDEFINED(ptr - size, size);
+	save_ctr = ctr;
+}
+#endif   /* RANDOMIZE_ALLOCATED_MEMORY */

Another copy of this doesn't seem like a good idea, either. More broadly, I'm not sure I like this design very much. The whole point of a slab context is that all of the objects are the same size. I wouldn't find it too difficult to support this patch if we were adding an allocator for fixed-size objects that was then being used to allocate objects which are fixed size. However, what we seem to be doing is creating an allocator for fixed-size objects and then using it for variable-size tuples. That's really pretty weird. Isn't the root of this problem that aset.c is utterly terrible at handling large numbers of allocations? Maybe we should try to attack that problem more directly. On a related note, the autodestruct thing is a weird hack that's only necessary because of the hijinks already discussed in the previous paragraph. The context has no fixed lifetime; we're just trying to find a way of coping with possibly-shifting tuple sizes over time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: two slab-like memory allocators
On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra wrote: > For GenSlab the situation is less clear, as there probably are ways to make > it work, but I'd vote to keep it simple for now, and simply do elog(ERROR) > in the realloc() methods - both for Slab and GenSlab. The current use case > (reorderbuffer) does not need that, and it seems like a can of worms to me. Good plan. Realloc can be added later if there is an actual use case.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 05/10/16 03:11, Tomas Vondra wrote: > On 10/04/2016 09:44 PM, John Gorman wrote: >> >> Remind me again why we cannot realloc in place for sizes >> smaller than chunkSize? GenSlab is happy to allocate smaller >> sizes and put them into the fixed size chunks. >> >> Maybe SlabAlloc can be happy with sizes up to chunkSize. >> >> if (size <= set->chunkSize) >> return MemoryContextAlloc(set->slab, size); >> else >> return MemoryContextAlloc(set->aset, size); >> > > That'd be certainly possible, but it seems a bit strange as the whole > Slab is based on the idea that all chunks have the same size. Moreover, > people usually realloc() to a larger chunk, so it does not really fix > anything in practice. > > In GenSlab, the situation is more complicated. But I don't like the > coupling / moving chunks between contexts, etc. > > If realloc() support is a hard requirement, it immediately rules out > SlabContext() as an independent memory context. Which seems stupid, as > independent Slab contexts are quite useful for reorderbuffer use case. > > For GenSlab the situation is less clear, as there probably are ways to > make it work, but I'd vote to keep it simple for now, and simply do > elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The > current use case (reorderbuffer) does not need that, and it seems like a > can of worms to me. > Hmm, so in practice this means that the caller still has to know the details of which chunks go where. I would prefer it if the realloc just always failed, and "don't do realloc on GenSlab" were part of the spec of that context; the randomness that you described originally is the main problem, IMHO. Maybe you could add a new "constructor" function for Aset that would create an Aset which can't realloc, for use inside GenSlab? An alternative would of course be having the individual API calls behind Aset and Slab exported and used by GenSlab directly, instead of using child contexts.
Then all the calls would go to GenSlab, which could decide what to do (and move the chunks between the allocators). But honestly, given the use cases for GenSlab, I would at the moment prefer just to have a predictable error, as that can be done more cleanly and nobody needs the functionality so far; it can be revisited once we actually do need it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, attached is v3 of the patches, with a few minor fixes in Slab, and much larger fixes in GenSlab.

Slab (minor fixes)
--
- Removed the unnecessary memset() of new blocks in SlabAlloc(), although we still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.

GenSlab
---
Fixed a bunch of bugs that made GenSlab utterly useless. Firstly, chunkSize was not stored in GenSlabContextCreate(), so this check in SlabAlloc()

if (size <= set->chunkSize)
    return MemoryContextAlloc(set->slab, set->chunkSize);
else
    return MemoryContextAlloc(set->aset, size);

always fell through to the set->aset case, not allocating stuff in the Slab at all. Secondly, the nallocations / nbytes counters were not updated at all, so the Slab was never recreated, and GenSlab was not really generational. This only affected 1 of 3 contexts in ReorderBuffer, but apparently the "important" ones for performance. Both issues are fixed in the attached v3, which also introduces two additional improvements discussed in this thread:

- The chunk size is limited by ALLOCSET_SEPARATE_THRESHOLD, as for large chunks AllocSet works just fine (thanks to keeping them out of the freelist, etc.)
- Instead of specifying blockSize and chunkSize, GenSlabCreate() now accepts three parameters - minBlockSize, minChunkCount and chunkSize - and computes the minimum block size (>= minBlockSize) sufficient to store minChunkCount chunks of chunkSize bytes each. This works much better in the auto-tuning scenario.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services slab-allocators-v3.tgz Description: application/compressed-tar
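The v3 sizing rule described above - computing the minimum block size (>= minBlockSize) that can hold minChunkCount chunks of chunkSize bytes - might be sketched roughly as follows. This is an illustration, not the patch's actual code: it assumes minBlockSize is a power of two (like the usual 8kB default) and ignores per-block and per-chunk header overhead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the GenSlabCreate() block sizing: pick the smallest
 * power-of-2 multiple of minBlockSize large enough to hold
 * minChunkCount chunks of chunkSize bytes (header overhead ignored).
 */
static size_t
genslab_block_size(size_t minBlockSize, size_t minChunkCount, size_t chunkSize)
{
    size_t      needed = minChunkCount * chunkSize;
    size_t      block = minBlockSize;

    /* double the block until the requested number of chunks fits */
    while (block < needed)
        block *= 2;
    return block;
}
```

With chunkSize=96 and minChunkCount=1000 this yields 128kB, matching the example discussed elsewhere in the thread (the smallest 2^k block larger than 96000 bytes), which is why the approach plays well with auto-tuned chunk sizes: growing chunks simply produce proportionally larger blocks instead of failing.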
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/04/2016 09:44 PM, John Gorman wrote: SlabContext has a parent context. It can delegate requests that it cannot handle to the parent context which is GenSlab. Genslab can send them to the sister aset context. But Slab may also be used separately, not just as part of GenSlab (actually, reorderbuffer has two such contexts). That complicates things quite a bit, and it also seems a bit awkward, because: (a) It'd require a flag in SlabContext (or perhaps a pointer to the second context), which introduces coupling between the contexts. (b) SlabContext was meant to be extremely simple (based on the "single chunk size" idea), and this contradicts that a bit. (c) It'd move chunks between the memory contexts in unpredictable ways (although the user should treat it as a single context, and not reset the parts independently for example). > This could handle all reallocs so there will be no surprises. Yeah, but it's also > Remind me again why we cannot realloc in place for sizes smaller than chunkSize? GenSlab is happy to allocate smaller sizes and put them into the fixed size chunks. Maybe SlabAlloc can be happy with sizes up to chunkSize. if (size <= set->chunkSize) return MemoryContextAlloc(set->slab, size); else return MemoryContextAlloc(set->aset, size); That'd be certainly possible, but it seems a bit strange as the whole Slab is based on the idea that all chunks have the same size. Moreover, people usually realloc() to a larger chunk, so it does not really fix anything in practice. In GenSlab, the situation is more complicated. But I don't like the coupling / moving chunks between contexts, etc. If realloc() support is a hard requirement, it immediately rules out SlabContext() as an independent memory context. Which seems stupid, as independent Slab contexts are quite useful for reorderbuffer use case. 
For GenSlab the situation is less clear, as there probably are ways to make it work, but I'd vote to keep it simple for now, and simply do elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The current use case (reorderbuffer) does not need that, and it seems like a can of worms to me. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
SlabContext has a parent context. It can delegate requests that it cannot handle to the parent context which is GenSlab. Genslab can send them to the sister aset context. This could handle all reallocs so there will be no surprises. Remind me again why we cannot realloc in place for sizes smaller than chunkSize? GenSlab is happy to allocate smaller sizes and put them into the fixed size chunks. Maybe SlabAlloc can be happy with sizes up to chunkSize. if (size <= set->chunkSize) return MemoryContextAlloc(set->slab, size); else return MemoryContextAlloc(set->aset, size); On Sat, Oct 1, 2016 at 10:15 PM, Tomas Vondrawrote: > On 10/02/2016 12:23 AM, John Gorman wrote: > >> I reproduced the quadradic pfree performance problem and verified that >> these patches solved it. >> >> The slab.c data structures and functions contain no quadradic components. >> >> I noticed the sizing loop in SlabContextCreate() and came up with >> a similar formula to determine chunksPerBlock that you arrived at. >> >> Firstly, I've realized there's an issue when chunkSize gets too >>> large - once it exceeds blockSize, the SlabContextCreate() fails >>> as it's impossible to place a single chunk into the block. In >>> reorderbuffer, this may happen when the tuples (allocated in >>> tup_context) get larger than 8MB, as the context uses >>> SLAB_LARGE_BLOCK_SIZE (which is 8MB). >>> >>> But maybe there's a simpler solution - we may simply cap the >>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because >>> AllocSet handles those requests in a special way - for example >>> instead of tracking them in freelist, those chunks got freed >>> immediately. >>> >> >> I like this approach because it fixes the performance problems >> with smaller allocations and doesn't change how larger >> allocations are handled. >> >> > One more comment about GenSlab, particularly about unpredictability of the > repalloc() behavior. 
It works for "large" chunks allocated in the AllocSet > part, and mostly does not work for "small" chunks allocated in the > SlabContext. Moreover, the chunkSize changes over time, so for two chunks > of the same size, repalloc() may work on one of them and fail on the other, > depending on time of allocation. > > With SlabContext it's perfectly predictable - repalloc() call fails unless > the chunk size is exactly the same as before (which is perhaps a bit > pointless, but if we decide to fail even in this case it'll work 100% time). > > But with GenSlabContext it's unclear whether the call fails or not. > > I don't like this unpredictability - I'd much rather have consistent > failures (making sure people don't do repalloc() on with GenSlab). But I > don't see a nice way to achieve that - the repalloc() call does not go > through GenSlabRealloc() at all, but directly to SlabRealloc() or > AllocSetRealloc(). > > The best solution I can think of is adding an alternate version of > AllocSetMethods, pointing to a different AllocSetReset implementation. > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >
Re: [HACKERS] PATCH: two slab-like memory allocators
On Sun, Oct 2, 2016 at 10:15 AM, Tomas Vondrawrote: > One more comment about GenSlab, particularly about unpredictability of the > repalloc() behavior. It works for "large" chunks allocated in the AllocSet > part, and mostly does not work for "small" chunks allocated in the > SlabContext. Moreover, the chunkSize changes over time, so for two chunks of > the same size, repalloc() may work on one of them and fail on the other, > depending on time of allocation. > > With SlabContext it's perfectly predictable - repalloc() call fails unless > the chunk size is exactly the same as before (which is perhaps a bit > pointless, but if we decide to fail even in this case it'll work 100% time). > > But with GenSlabContext it's unclear whether the call fails or not. > > I don't like this unpredictability - I'd much rather have consistent > failures (making sure people don't do repalloc() on with GenSlab). But I > don't see a nice way to achieve that - the repalloc() call does not go > through GenSlabRealloc() at all, but directly to SlabRealloc() or > AllocSetRealloc(). > > The best solution I can think of is adding an alternate version of > AllocSetMethods, pointing to a different AllocSetReset implementation. You guys are still playing with this patch, so moved to next CF with "waiting on author". -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/02/2016 12:23 AM, John Gorman wrote: I reproduced the quadratic pfree performance problem and verified that these patches solved it. The slab.c data structures and functions contain no quadratic components. I noticed the sizing loop in SlabContextCreate() and came up with a similar formula to determine chunksPerBlock that you arrived at. Firstly, I've realized there's an issue when chunkSize gets too large - once it exceeds blockSize, the SlabContextCreate() fails as it's impossible to place a single chunk into the block. In reorderbuffer, this may happen when the tuples (allocated in tup_context) get larger than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB). But maybe there's a simpler solution - we may simply cap the chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those requests in a special way - for example, instead of tracking them in the freelist, those chunks get freed immediately. I like this approach because it fixes the performance problems with smaller allocations and doesn't change how larger allocations are handled. One more comment about GenSlab, particularly about the unpredictability of the repalloc() behavior. It works for "large" chunks allocated in the AllocSet part, and mostly does not work for "small" chunks allocated in the SlabContext. Moreover, the chunkSize changes over time, so for two chunks of the same size, repalloc() may work on one of them and fail on the other, depending on the time of allocation. With SlabContext it's perfectly predictable - the repalloc() call fails unless the chunk size is exactly the same as before (which is perhaps a bit pointless, but if we decide to fail even in this case it'll work 100% of the time). But with GenSlabContext it's unclear whether the call fails or not. I don't like this unpredictability - I'd much rather have consistent failures (making sure people don't do repalloc() with GenSlab).
But I don't see a nice way to achieve that - the repalloc() call does not go through GenSlabRealloc() at all, but directly to SlabRealloc() or AllocSetRealloc(). The best solution I can think of is adding an alternate version of AllocSetMethods, pointing to a different AllocSetReset implementation. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/02/2016 12:23 AM, John Gorman wrote: I reproduced the quadradic pfree performance problem and verified that these patches solved it. The slab.c data structures and functions contain no quadradic components. I noticed the sizing loop in SlabContextCreate() and came up with a similar formula to determine chunksPerBlock that you arrived at. ;-) Firstly, I've realized there's an issue when chunkSize gets too large - once it exceeds blockSize, the SlabContextCreate() fails as it's impossible to place a single chunk into the block. In reorderbuffer, this may happen when the tuples (allocated in tup_context) get larger than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB). But maybe there's a simpler solution - we may simply cap the chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those requests in a special way - for example instead of tracking them in freelist, those chunks got freed immediately. I like this approach because it fixes the performance problems with smaller allocations and doesn't change how larger allocations are handled. Right. In slab.c it looks like a line in the top comments could be clearer. Perhaps this is what is meant. < * (plus alignment), now wasting memory. * (plus alignment), not wasting memory. In slab.c some lines are over 80 characters could be folded. It would be nice to give each patch version a unique file name. OK, will fix. Nice patch, I enjoyed reading it! ;-) -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/02/2016 01:53 AM, Jim Nasby wrote: On 9/26/16 9:10 PM, Tomas Vondra wrote: Attached is v2 of the patch, updated based on the review. That means: +/* make sure the block can store at least one chunk (with 1B for a bitmap)? */ (and the comment below it) I find the question to be confusing... I think these would be better as +/* make sure the block can store at least one chunk (with 1B for a bitmap) */ and +/* number of chunks can we fit into a block, including header and bitmap */ Thanks, will rephrase the comments a bit. I'm also wondering if a 1B freespace bitmap is actually useful. If nothing else, there should probably be a #define for the initial bitmap size. That's not the point. The point is that we need to store at least one entry per block, with one bit in a bitmap. But we can't store just a single byte - we always allocate at least 1 byte. So it's more an explanation for the "1" literal in the check, nothing more. +/* otherwise add it to the proper freelist bin */ Looks like something went missing... :) Ummm? The patch contains this: + /* otherwise add it to the proper freelist bin */ + if (set->freelist[block->nfree]) + set->freelist[block->nfree]->prev = block; + + block->next = set->freelist[block->nfree]; + set->freelist[block->nfree] = block; Which does exactly the thing it should do. Or what is missing? Should zeroing the block in SlabAlloc be optional like it is with palloc/palloc0? Good catch. The memset(,0) should not be in SlabAlloc() as all, as the zeroing is handled in mctx.c, independently of the implementation. +/* + * If there are no free chunks in any existing block, create a new block + * and put it to the last freelist bucket. + */ +if (set->minFreeCount == 0) Couldn't there be blocks that have a free count > minFreeCount? (In which case you don't want to just alloc a new block, no?) Nevermind, after reading further down I understand what's going on. 
I got confused by "we've decreased nfree for a block with the minimum count" until I got to the part that deals with minFreeCount = 0. I think it'd be helpful if the "we've decreased nfree" comment mentioned that if nfree is 0 we'll look for the correct value after updating the freelists. Right. I think it'd be good to add an assert ensuring the minFreeCount value matches the block freelist. Or at least SlabCheck() could check this. +block->bitmapptr[idx/8] |= (0x01 << (idx % 8)); Did you consider using a pre-defined map of values to avoid the shift? I know I've seen that somewhere in the code... Patch 2... Doesn't GenSlabReset() need to actually free something? If the call magically percolates to the other contexts it'd be nice to note that in the comment. No, the other contexts are created as children of GenSlab, so the memory context infrastructure resets them automatically. I don't think this needs to be mentioned in the comments - it's pretty basic part of the parent-child context relationship. Thanks! -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
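Jim's suggestion above - replacing the shift in `block->bitmapptr[idx/8] |= (0x01 << (idx % 8))` with a pre-defined map of values - might look like the sketch below. The helper names are made up (the real patch works directly on the bitmap bytes); the only point is the lookup table standing in for the shift, in the style of the precomputed byte tables used elsewhere in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* one entry per bit position: bitmask[i] == (1 << i) */
static const uint8_t bitmask[8] = {1, 2, 4, 8, 16, 32, 64, 128};

/* mark chunk idx as used in the block's free bitmap */
static void
mark_chunk_used(uint8_t *bitmapptr, int idx)
{
    bitmapptr[idx / 8] |= bitmask[idx % 8];
}

/* test whether chunk idx is marked used */
static int
chunk_is_used(const uint8_t *bitmapptr, int idx)
{
    return (bitmapptr[idx / 8] & bitmask[idx % 8]) != 0;
}
```

Whether the table actually beats a single shift-by-register on modern CPUs is doubtful; the table mainly buys readability, not speed.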
Re: [HACKERS] PATCH: two slab-like memory allocators
On 9/26/16 9:10 PM, Tomas Vondra wrote: Attached is v2 of the patch, updated based on the review. That means: + /* make sure the block can store at least one chunk (with 1B for a bitmap)? */ (and the comment below it) I find the question to be confusing... I think these would be better as + /* make sure the block can store at least one chunk (with 1B for a bitmap) */ and + /* number of chunks can we fit into a block, including header and bitmap */ I'm also wondering if a 1B freespace bitmap is actually useful. If nothing else, there should probably be a #define for the initial bitmap size. + /* otherwise add it to the proper freelist bin */ Looks like something went missing... :) Should zeroing the block in SlabAlloc be optional like it is with palloc/palloc0? + /* +* If there are no free chunks in any existing block, create a new block +* and put it to the last freelist bucket. +*/ + if (set->minFreeCount == 0) Couldn't there be blocks that have a free count > minFreeCount? (In which case you don't want to just alloc a new block, no?) Nevermind, after reading further down I understand what's going on. I got confused by "we've decreased nfree for a block with the minimum count" until I got to the part that deals with minFreeCount = 0. I think it'd be helpful if the "we've decreased nfree" comment mentioned that if nfree is 0 we'll look for the correct value after updating the freelists. + block->bitmapptr[idx/8] |= (0x01 << (idx % 8)); Did you consider using a pre-defined map of values to avoid the shift? I know I've seen that somewhere in the code... Patch 2... Doesn't GenSlabReset() need to actually free something? If the call magically percolates to the other contexts it'd be nice to note that in the comment. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! 
http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Re: [HACKERS] PATCH: two slab-like memory allocators
I reproduced the quadratic pfree performance problem and verified that these patches solved it. The slab.c data structures and functions contain no quadratic components. I noticed the sizing loop in SlabContextCreate() and came up with a formula to determine chunksPerBlock similar to the one you arrived at. > Firstly, I've realized there's an issue when chunkSize gets too > large - once it exceeds blockSize, the SlabContextCreate() fails > as it's impossible to place a single chunk into the block. In > reorderbuffer, this may happen when the tuples (allocated in > tup_context) get larger than 8MB, as the context uses > SLAB_LARGE_BLOCK_SIZE (which is 8MB). > > But maybe there's a simpler solution - we may simply cap the > chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because > AllocSet handles those requests in a special way - for example > instead of tracking them in freelist, those chunks got freed > immediately. I like this approach because it fixes the performance problems with smaller allocations and doesn't change how larger allocations are handled. In slab.c it looks like a line in the top comments could be clearer. Perhaps this is what is meant.

< * (plus alignment), now wasting memory.
> * (plus alignment), not wasting memory.

In slab.c some lines are over 80 characters and could be folded. It would be nice to give each patch version a unique file name. Nice patch, I enjoyed reading it! Best, John John Gorman On Mon, Sep 26, 2016 at 10:10 PM, Tomas Vondra wrote: > Hi, > > Attached is v2 of the patch, updated based on the review. That means: > > - Better comment explaining how free chunks are tracked in Slab context. > > - Removed the unused SlabPointerIsValid macro. > > - Modified the comment before SlabChunkData, explaining how it relates > to StandardChunkHeader. > > - Replaced the two Assert() calls with elog(). > > - Implemented SlabCheck().
I've ended up with quite a few checks there, > checking pointers between the context, block and chunks, checks due > to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the > number of free chunks (bitmap, freelist vs. chunk header). > > - I've also modified SlabContextCreate() to compute chunksPerBlock a > bit more efficiently (use a simple formula instead of the loop, which > might be a bit too expensive for large blocks / small chunks). > > > I haven't done any changes to GenSlab, but I do have a few notes: > > Firstly, I've realized there's an issue when chunkSize gets too large - > once it exceeds blockSize, the SlabContextCreate() fails as it's impossible > to place a single chunk into the block. In reorderbuffer, this may happen > when the tuples (allocated in tup_context) get larger than 8MB, as the > context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB). > > For Slab the elog(ERROR) is fine as both parameters are controlled by the > developer directly, but GenSlab computes the chunkSize on the fly, so we > must not let it fail like that - that'd result in unpredictable failures, > which is not very nice. > > I see two ways to fix this. We may either increase the block size > automatically - e.g. instead of specifying specifying chunkSize and > blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and > then choose the smallest 2^k block large enough). For example with > chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the > closest 2^k block larger than 96000 bytes. > > But maybe there's a simpler solution - we may simply cap the chunkSize (in > GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those > requests in a special way - for example instead of tracking them in > freelist, those chunks got freed immediately. 
> > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi, Attached is v2 of the patch, updated based on the review. That means: - Better comment explaining how free chunks are tracked in Slab context. - Removed the unused SlabPointerIsValid macro. - Modified the comment before SlabChunkData, explaining how it relates to StandardChunkHeader. - Replaced the two Assert() calls with elog(). - Implemented SlabCheck(). I've ended up with quite a few checks there, checking pointers between the context, block and chunks, checks due to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the number of free chunks (bitmap, freelist vs. chunk header). - I've also modified SlabContextCreate() to compute chunksPerBlock a bit more efficiently (use a simple formula instead of the loop, which might be a bit too expensive for large blocks / small chunks). I haven't done any changes to GenSlab, but I do have a few notes: Firstly, I've realized there's an issue when chunkSize gets too large - once it exceeds blockSize, the SlabContextCreate() fails as it's impossible to place a single chunk into the block. In reorderbuffer, this may happen when the tuples (allocated in tup_context) get larger than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB). For Slab the elog(ERROR) is fine as both parameters are controlled by the developer directly, but GenSlab computes the chunkSize on the fly, so we must not let it fail like that - that'd result in unpredictable failures, which is not very nice. I see two ways to fix this. We may either increase the block size automatically - e.g. instead of specifying chunkSize and blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and then choose the smallest 2^k block large enough). For example with chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the closest 2^k block larger than 96000 bytes. But maybe there's a simpler solution - we may simply cap the chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT.
That's fine, because AllocSet handles those requests in a special way - for example instead of tracking them in freelist, those chunks get freed immediately. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-simple-slab-allocator-fixed-size-allocations.patch Description: binary/octet-stream 0002-generational-slab-auto-tuning-allocator.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 25/09/16 22:17, Tomas Vondra wrote: > On 09/25/2016 08:48 PM, Petr Jelinek wrote: > >> Slab: >> In general it seems understandable, the initial description helps to >> understand what's happening well enough. >> >> One thing I don't understand however is why the freelist is both >> array and doubly linked list and why there is new implementation of >> said doubly linked list given that we have dlist. >> > > Hmm, perhaps that should be explained better. > > In AllocSet, we only have a global freelist of chunks, i.e. we have a > list of free chunks for each possible size (there's 11 sizes, starting > with 8 bytes and then doubling the size). So freelist[0] is a list of > free 8B chunks, freelist[1] is a list of free 16B chunks, etc. > > In Slab, the freelist has two levels - first there's a bitmap on each > block (which is possible, as the chunks have constant size), tracking > which chunks of that particular block are free. This makes it trivial to > check that all chunks on the block are free, and free the whole block > (which is impossible with AllocSet). > > Second, the freelist at the context level tracks blocks with a given > number of free chunks - so freelist[0] tracks completely full blocks, > freelist[1] is a list of blocks with 1 free chunk, etc. This is used to > reuse space on almost full blocks first, in the hope that some of the > less full blocks will get completely empty (and freed to the OS). > > Is that clear? Ah okay, that makes sense. The documentation of this could be improved then, though, as it's all squashed into a single sentence that wasn't quite clear to me. > >>> +/* >>> + * SlabChunk >>> + * The prefix of each piece of memory in an SlabBlock >>> + * >>> + * NB: this MUST match StandardChunkHeader as defined by >>> utils/memutils.h. >>> + */ >> >> Is this true? Why? And if it is then why doesn't the SlabChunk >> actually match the StandardChunkHeader? > > It is true, a lot of stuff in the MemoryContext infrastructure relies on > that.
For example when you do pfree(ptr), we actually do something like > >header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader)) > > to get the chunk header - which includes a pointer to the memory context > and other useful stuff. > > This also means we can put additional fields before StandardChunkHeader > as that does not break this pointer arithmetic, i.e. SlabChunkData is > effectively defined like this: > > typedef struct SlabChunkData > { > /* block owning this chunk */ > void *block; > > /* standard header */ > StandardChunkHeader header; > } SlabChunkData; > Yes, but your struct then does not match StandardChunkHeader exactly, so it should be explained in more detail (aset.c, where this comment is also present, has a struct that matches StandardChunkHeader, so it's sufficient there). >> >>> +static void >>> +SlabCheck(MemoryContext context) >>> +{ >>> +/* FIXME */ >>> +} >> >> Do you plan to implement this interface? >> > > Yes, although I'm not sure what checks should go there. The only thing I > can think of right now is checking that the number of free chunks on a > block (according to the bitmap) matches the freelist index. > Yeah, this context does not seem like it needs too much checking. The freelist vs. free chunks check sounds ok to me. I guess GenSlab will then call the checks for the underlying contexts. >>> +#define SLAB_DEFAULT_BLOCK_SIZE 8192 >>> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) >> >> I am guessing this is based on max_cached_tuplebufs? Maybe these >> could be written with the same style? >> > > Not sure I understand what you mean by "based on"? I don't quite > remember how I came up with those constants, but I guess 8kB and 8MB > seemed like good values. > > Also, what style do you mean? I've used the same style as for ALLOCSET_* > constants in the same file. > I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more readable.
ALLOCSET_* does that too (with the exception of ALLOCSET_SEPARATE_THRESHOLD, though I have no idea why that one is different from the rest of the code). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PATCH: two slab-like memory allocators
On 09/25/2016 08:48 PM, Petr Jelinek wrote: Hi Tomas, On 02/08/16 17:44, Tomas Vondra wrote: This patch actually includes two new memory allocators (not one). Very brief summary (for more detailed explanation of the ideas, see comments at the beginning of slab.c and genslab.c): Slab * suitable for fixed-length allocations (other pallocs fail) * much simpler than AllocSet (no global freelist management etc.) * free space is tracked per block (using a simple bitmap) * which allows freeing the block once all chunks are freed (AllocSet will hold the memory forever, in the hope of reusing it) GenSlab --- * suitable for non-fixed-length allocations, but with chunks of mostly the same size (initially unknown, the context will tune itself) * a combination of AllocSet and Slab (or a sequence of Slab allocators) * the goal is to do most allocations in Slab context * there's always a single 'current' Slab context, and every now and then it's replaced with a new generation (with the chunk size computed from recent requests) * the AllocSet context is used for chunks too large for current Slab So it's just a wrapper around the other two allocators to make this use case easier to handle. Do you expect there will be use for the GenSlab eventually outside of the reorderbuffer? Yes, you might say it's just a wrapper around the other two allocators, but it *also* includes the logic of recomputing chunk size etc. I haven't thought about other places that might benefit from these new allocators very much - in general, it's useful for places that produce a stream of equally-sized items (GenSlab relaxes this), that are pfreed() in ~FIFO manner (i.e. roughly in the order of allocation). So none of this is meant as a universal replacement of AllocSet, but in the suitable cases the results seem really promising. For example for the simple test query in [1], the performance improvement is this:

 N |   master | patched
---+----------+--------
 1 |    100ms |   100ms
 5 |  15000ms |   350ms
10 | 146000ms |   700ms
20 |        ? |  1400ms

That's a fairly significant improvement, and the submitted version of the patches should perform even better (~2x, IIRC). I agree that it improves performance quite nicely and that reorderbuffer could use this. About the code. I am not quite sure that this needs to be split into two patches especially if 1/3 of the second patch is the removal of the code added by the first one and otherwise it's quite small and straightforward. That is unless you expect the GenSlab to not go in. I don't know - it seemed natural to first introduce the Slab, as it's easier to discuss it separately, and it works for 2 of the 3 contexts needed in reorderbuffer. GenSlab is an improvement of Slab, or rather based on it, so that it works for the third context. And it introduces some additional ideas (particularly the generational design, etc.) Of course, none of this means it has to be committed in two separate chunks, or that I don't expect GenSlab to get committed ... Slab: In general it seems understandable, the initial description helps to understand what's happening well enough. One thing I don't understand however is why the freelist is both array and doubly linked list and why there is new implementation of said doubly linked list given that we have dlist. Hmm, perhaps that should be explained better. In AllocSet, we only have a global freelist of chunks, i.e. we have a list of free chunks for each possible size (there's 11 sizes, starting with 8 bytes and then doubling the size). So freelist[0] is a list of free 8B chunks, freelist[1] is a list of free 16B chunks, etc. In Slab, the freelist has two levels - first there's a bitmap on each block (which is possible, as the chunks have constant size), tracking which chunks of that particular block are free. This makes it trivial to check that all chunks on the block are free, and free the whole block (which is impossible with AllocSet).
Second, the freelist at the context level tracks blocks with a given number of free chunks - so freelist[0] tracks completely full blocks, freelist[1] is a list of blocks with 1 free chunk, etc. This is used to reuse space on almost full blocks first, in the hope that some of the less full blocks will get completely empty (and freed to the OS). Is that clear? +/* + * SlabContext is our standard implementation of MemoryContext. + * Really? Meh, that's clearly a bogus comment. +/* + * SlabChunk + * The prefix of each piece of memory in an SlabBlock + * + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. + */ Is this true? Why? And if it is then why doesn't the SlabChunk actually match the StandardChunkHeader? It is true, a lot of stuff in MemoryContext infrastructure relies on that. For example when you do pfree(ptr), we actually do something like header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi Tomas, On 02/08/16 17:44, Tomas Vondra wrote: > > This patch actually includes two new memory allocators (not one). Very > brief summary (for more detailed explanation of the ideas, see comments > at the beginning of slab.c and genslab.c): > > Slab > > * suitable for fixed-length allocations (other pallocs fail) > * much simpler than AllocSet (no global freelist management etc.) > * free space is tracked per block (using a simple bitmap) > * which allows freeing the block once all chunks are freed (AllocSet > will hold the memory forever, in the hope of reusing it) > > GenSlab > --- > * suitable for non-fixed-length allocations, but with chunks of mostly > the same size (initially unknown, the context will tune itself) > * a combination of AllocSet and Slab (or a sequence of Slab allocators) > * the goal is to do most allocations in Slab context > * there's always a single 'current' Slab context, and every now and > then it's replaced with a new generation (with the chunk size computed > from recent requests) > * the AllocSet context is used for chunks too large for current Slab > So it's just a wrapper around the other two allocators to make this use case easier to handle. Do you expect there will be use for the GenSlab eventually outside of the reorderbuffer? > So none of this is meant as a universal replacement of AllocSet, but in > the suitable cases the results seem really promising. For example for > the simple test query in [1], the performance improvement is this: >
>  N |   master | patched
> ---+----------+--------
>  1 |    100ms |   100ms
>  5 |  15000ms |   350ms
> 10 | 146000ms |   700ms
> 20 |        ? |  1400ms
> > That's a fairly significant improvement, and the submitted version of > the patches should perform even better (~2x, IIRC). > I agree that it improves performance quite nicely and that reorderbuffer could use this. About the code.
I am not quite sure that this needs to be split into two patches, especially if 1/3 of the second patch is the removal of the code added by the first one and otherwise it's quite small and straightforward. That is unless you expect the GenSlab to not go in. Slab: In general it seems understandable, the initial description helps to understand what's happening well enough. One thing I don't understand however is why the freelist is both an array and a doubly linked list, and why there is a new implementation of said doubly linked list given that we have dlist. > +/* > + * SlabContext is our standard implementation of MemoryContext. > + * Really? > +/* > + * SlabChunk > + * The prefix of each piece of memory in an SlabBlock > + * > + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. > + */ Is this true? Why? And if it is then why doesn't the SlabChunk actually match the StandardChunkHeader? > +#define SlabPointerIsValid(pointer) PointerIsValid(pointer) What's the point of this given that it's defined in the .c file? > +static void * > +SlabAlloc(MemoryContext context, Size size) > +{ > + Slab set = (Slab) context; > + SlabBlock block; > + SlabChunk chunk; > + int idx; > + > + AssertArg(SlabIsValid(set)); > + > + Assert(size == set->chunkSize); I wonder if there should be stronger protection (ie, elog) for the size matching. > +static void * > +SlabRealloc(MemoryContext context, void *pointer, Size size) > +{ > + Slab set = (Slab) context; > + > + /* can't do actual realloc with slab, but let's try to be gentle */ > + if (size == set->chunkSize) > + return pointer; > + > + /* we can't really do repalloc with this allocator */ > + Assert(false); This IMHO should definitely be elog. > +static void > +SlabCheck(MemoryContext context) > +{ > + /* FIXME */ > +} Do you plan to implement this interface? > +#define SLAB_DEFAULT_BLOCK_SIZE 8192 > +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) I am guessing this is based on max_cached_tuplebufs?
Maybe these could be written with the same style? GenSlab: Since this is a relatively simple wrapper it looks mostly ok to me. The only issue I have here is that I am not quite sure about those FIXME functions (Free, Realloc, GetChunkSpace). It's slightly weird to call into mcxt but I guess the alternative there is to error out so this is probably preferable. I would want to hear other opinions here. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PATCH: two slab-like memory allocators
Hi, Back in the bug #14231 thread [1], dealing with performance issues in reorderbuffer due to an excessive number of expensive free() calls, I've proposed to resolve that with a custom slab-like memory allocator, suitable for fixed-size allocations. I'd like to put this into the next CF, and it's probably too invasive a change to count as a bugfix anyway. [1] https://www.postgresql.org/message-id/flat/20160706185502.1426.28143%40wrigleys.postgresql.org This patch actually includes two new memory allocators (not one). Very brief summary (for more detailed explanation of the ideas, see comments at the beginning of slab.c and genslab.c): Slab * suitable for fixed-length allocations (other pallocs fail) * much simpler than AllocSet (no global freelist management etc.) * free space is tracked per block (using a simple bitmap) * which allows freeing the block once all chunks are freed (AllocSet will hold the memory forever, in the hope of reusing it) GenSlab --- * suitable for non-fixed-length allocations, but with chunks of mostly the same size (initially unknown, the context will tune itself) * a combination of AllocSet and Slab (or a sequence of Slab allocators) * the goal is to do most allocations in Slab context * there's always a single 'current' Slab context, and every now and then it's replaced with a new generation (with the chunk size computed from recent requests) * the AllocSet context is used for chunks too large for current Slab So none of this is meant as a universal replacement of AllocSet, but in the suitable cases the results seem really promising. For example for the simple test query in [1], the performance improvement is this:

 N |   master | patched
---+----------+--------
 1 |    100ms |   100ms
 5 |  15000ms |   350ms
10 | 146000ms |   700ms
20 |        ? |  1400ms

That's a fairly significant improvement, and the submitted version of the patches should perform even better (~2x, IIRC). There's a bunch of TODOs - e.g.
handling of realloc() calls in the GenSlab, and probably things I haven't thought about. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 0001-simple-slab-allocator-fixed-size-allocations.patch Description: binary/octet-stream 0002-generational-slab-auto-tuning-allocator.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers