Re: [HACKERS] PATCH: two slab-like memory allocators

2017-04-02 Thread Petr Jelinek
Hi,

this patch is marked as committed in the CF application, but the second part
(the generational allocator) was AFAICS never committed.

Does anybody plan to push this forward?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-07 Thread David Rowley
On 28 February 2017 at 01:02, Andres Freund  wrote:
> Hi,
>
> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>> I'll work on getting slab committed first, and then review / edit /
>> commit generation.c later.  One first note there is that I'm wondering
>> if generation.c is a too generic filename.
>
> And pushed slab and its usage.  Will have a look at generation.c
> tomorrow.

Attached is a patch to fix the compiler warning on compilers that don't
understand that elog(ERROR) does not return.
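
The usual idiom for silencing such warnings is an unreachable return after
the elog() call - roughly this shape (a sketch only; the attached patch may
differ):

static void *
SlabRealloc(MemoryContext context, void *pointer, Size size)
{
    /* slab chunks are fixed-size, so realloc is unsupported */
    elog(ERROR, "slab allocator does not support realloc()");
    return NULL;                /* keep compiler quiet */
}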

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


slaballoc_warning_fix.patch
Description: Binary data



Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-07 Thread Tomas Vondra

On 03/07/2017 12:19 AM, Andres Freund wrote:
> On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote:
> > Attached is the last part of the patch series, rebased to current master
> > and adopting the new chunk header approach.
>
> Something seems to have gone awry while sending that - the attachment
> is a whopping 0 bytes...

Why are you complaining? That makes it much easier to review and commit!

But if you insist, here is the actual patch ;-)

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 8aac670..0bdc214 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -149,15 +149,6 @@ typedef struct ReorderBufferDiskChange
  */
 static const Size max_changes_in_memory = 4096;
 
-/*
- * We use a very simple form of a slab allocator for frequently allocated
- * objects, simply keeping a fixed number in a linked list when unused,
- * instead pfree()ing them. Without that in many workloads aset.c becomes a
- * major bottleneck, especially when spilling to disk while decoding batch
- * workloads.
- */
-static const Size max_cached_tuplebufs = 4096 * 2;		/* ~8MB */
-
 /* ---------------------------------------
  * primary reorderbuffer support routines
  * ---------------------------------------
@@ -248,6 +239,10 @@ ReorderBufferAllocate(void)
 			SLAB_DEFAULT_BLOCK_SIZE,
 			sizeof(ReorderBufferTXN));
 
+	buffer->tup_context = GenerationContextCreate(new_ctx,
+		   "Tuples",
+		   SLAB_LARGE_BLOCK_SIZE);
+
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);
 	hash_ctl.hcxt = buffer->context;
@@ -258,15 +253,12 @@ ReorderBufferAllocate(void)
 	buffer->by_txn_last_xid = InvalidTransactionId;
 	buffer->by_txn_last_txn = NULL;
 
-	buffer->nr_cached_tuplebufs = 0;
-
 	buffer->outbuf = NULL;
 	buffer->outbufsize = 0;
 
 	buffer->current_restart_decoding_lsn = InvalidXLogRecPtr;
 
 	dlist_init(&buffer->toplevel_by_lsn);
-	slist_init(&buffer->cached_tuplebufs);
 
 	return buffer;
 }
@@ -419,42 +411,12 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
 
 	alloc_len = tuple_len + SizeofHeapTupleHeader;
 
-	/*
-	 * Most tuples are below MaxHeapTupleSize, so we use a slab allocator for
-	 * those. Thus always allocate at least MaxHeapTupleSize. Note that tuples
-	 * generated for oldtuples can be bigger, as they don't have out-of-line
-	 * toast columns.
-	 */
-	if (alloc_len < MaxHeapTupleSize)
-		alloc_len = MaxHeapTupleSize;
-
-
-	/* if small enough, check the slab cache */
-	if (alloc_len <= MaxHeapTupleSize && rb->nr_cached_tuplebufs)
-	{
-		rb->nr_cached_tuplebufs--;
-		tuple = slist_container(ReorderBufferTupleBuf, node,
-								slist_pop_head_node(&rb->cached_tuplebufs));
-		Assert(tuple->alloc_tuple_size == MaxHeapTupleSize);
-#ifdef USE_ASSERT_CHECKING
-		memset(&tuple->tuple, 0xa9, sizeof(HeapTupleData));
-		VALGRIND_MAKE_MEM_UNDEFINED(&tuple->tuple, sizeof(HeapTupleData));
-#endif
-		tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
-#ifdef USE_ASSERT_CHECKING
-		memset(tuple->tuple.t_data, 0xa8, tuple->alloc_tuple_size);
-		VALGRIND_MAKE_MEM_UNDEFINED(tuple->tuple.t_data, tuple->alloc_tuple_size);
-#endif
-	}
-	else
-	{
-		tuple = (ReorderBufferTupleBuf *)
-			MemoryContextAlloc(rb->context,
-			   sizeof(ReorderBufferTupleBuf) +
-			   MAXIMUM_ALIGNOF + alloc_len);
-		tuple->alloc_tuple_size = alloc_len;
-		tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
-	}
+	tuple = (ReorderBufferTupleBuf *)
+		MemoryContextAlloc(rb->tup_context,
+		   sizeof(ReorderBufferTupleBuf) +
+		   MAXIMUM_ALIGNOF + alloc_len);
+	tuple->alloc_tuple_size = alloc_len;
+	tuple->tuple.t_data = ReorderBufferTupleBufData(tuple);
 
 	return tuple;
 }
@@ -468,21 +430,7 @@ ReorderBufferGetTupleBuf(ReorderBuffer *rb, Size tuple_len)
 void
 ReorderBufferReturnTupleBuf(ReorderBuffer *rb, ReorderBufferTupleBuf *tuple)
 {
-	/* check whether to put into the slab cache, oversized tuples never are */
-	if (tuple->alloc_tuple_size == MaxHeapTupleSize &&
-		rb->nr_cached_tuplebufs < max_cached_tuplebufs)
-	{
-		rb->nr_cached_tuplebufs++;
-		slist_push_head(&rb->cached_tuplebufs, &tuple->node);
-		VALGRIND_MAKE_MEM_UNDEFINED(tuple->tuple.t_data, tuple->alloc_tuple_size);
-		VALGRIND_MAKE_MEM_UNDEFINED(tuple, sizeof(ReorderBufferTupleBuf));
-		VALGRIND_MAKE_MEM_DEFINED(&tuple->node, sizeof(tuple->node));
-		VALGRIND_MAKE_MEM_DEFINED(&tuple->alloc_tuple_size, sizeof(tuple->alloc_tuple_size));
-	}
-	else
-	{
-		pfree(tuple);
-	}
+	pfree(tuple);
 }
 
 /*
diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index cd0e803..7263399 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Andres Freund
On 2017-03-02 22:51:09 +0100, Tomas Vondra wrote:
> Attached is the last part of the patch series, rebased to current master and
> adopting the new chunk header approach.

Something seems to have gone awry while sending that - the attachment
is a whopping 0 bytes...




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Andres Freund
Hi,

On 2017-03-06 23:32:30 +0100, Tomas Vondra wrote:
> > > The question however is whether this won't make the optimization 
> > > pointless.
> > > I also wonder how much we save by this optimization and how widely it's
> > > used? Can someone point me to some numbers?
> > 
> > I don't recall any recent numbers.  I'm more than a bit doubtful that it
> > really matters - it's only used for the results of aggregate/window
> > functions, and surely they've a good chunk of their own overhead...
> > 
> 
> And if the benefit is negligible, trying to keep the optimization might
> easily result in slowdown (compared to non-optimized code).

I doubt the branch is noticeable here, given that we're doing a memory
allocation otherwise.  Should also be decently predictable.

> But I'm puzzled why we haven't seen any reports of failures. I mean, doing
> sum(int4) is not a particularly extravagant thing; if there really is an
> issue, shouldn't we see a lot of reports? What are we missing?

Reports about what? False positives causing crashes / wrong results?  I
think it's quite unlikely to actually trigger this in practice, because
you need a properly aligned pointer, and then the preceding header has
to point to a bit pattern that's equal to the context - that's
presumably quite unlikely in practice.
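
For reference, the check boils down to something like this (a simplified
sketch of mcxt.c's MemoryContextContains() logic, not the exact source):

bool
MemoryContextContains(MemoryContext context, void *pointer)
{
    StandardChunkHeader *header;

    /* an unaligned pointer can't be a palloc'd chunk */
    if (pointer == NULL || pointer != (void *) MAXALIGN(pointer))
        return false;

    /*
     * Any MAXALIGNed pointer gets past that test, though, and then the
     * bytes preceding it - arbitrary memory, if the pointer is into the
     * middle of a struct - are blindly interpreted as a chunk header.
     */
    header = (StandardChunkHeader *)
        ((char *) pointer - STANDARDCHUNKHEADERSIZE);

    return header->context == context;
}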

Regards,

Andres




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Tomas Vondra

On 03/06/2017 08:08 PM, Andres Freund wrote:
> Hi,
>
> On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote:
> > On 03/06/2017 07:05 PM, Robert Haas wrote:
> > > On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund  wrote:
> > > > On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> > > > > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  wrote:
> > > > > > The issue was that on 32bit platforms the Datum returned by some
> > > > > > functions (int2int4_sum in this case) isn't actually a separately
> > > > > > allocated Datum, but rather just something embedded in a larger
> > > > > > struct.  That, combined with the following code:
> > > > > >     if (!peraggstate->resulttypeByVal && !*isnull &&
> > > > > >         !MemoryContextContains(CurrentMemoryContext,
> > > > > >                                DatumGetPointer(*result)))
> > > > > > seems somewhat problematic to me.  MemoryContextContains() can give
> > > > > > false positives when used on memory that's not a distinctly allocated
> > > > > > chunk, and if so, we violate memory lifetime rules.  It's quite
> > > > > > unlikely, given the required bit patterns, but nonetheless it's making
> > > > > > me somewhat uncomfortable.
> > > > > >
> > > > > > Do others think this isn't an issue and we can just live with it?
> > > > >
> > > > > I think it's 100% broken to call MemoryContextContains() on something
> > > > > that's not guaranteed to be a palloc'd chunk.
> > > >
> > > > I agree, but to me it seems the only fix would be to just yank out the
> > > > whole optimization?
> > >
> > > Dunno, haven't looked into it.
> >
> > I think it might be fixable by adding a flag into the chunk, with 'true' for
> > regular allocations, and 'false' for the optimized ones. And then only use
> > MemoryContextContains() for 'flag=true' chunks.
>
> I'm not quite following here.  We only get a Datum and the knowledge
> that it's a pass-by-ref argument, so we really don't know that much.  We
> could create an "EmbeddedDatum" type that has a preceding chunk header
> (appropriately for the version), that just gets zeroed out at start.  Is
> that what you mean?

Yes, that's roughly what I meant.

> > The question however is whether this won't make the optimization pointless.
> > I also wonder how much we save by this optimization and how widely it's
> > used? Can someone point me to some numbers?
>
> I don't recall any recent numbers.  I'm more than a bit doubtful that it
> really matters - it's only used for the results of aggregate/window
> functions, and surely they've a good chunk of their own overhead...

And if the benefit is negligible, trying to keep the optimization might
easily result in slowdown (compared to non-optimized code).

But I'm puzzled why we haven't seen any reports of failures. I mean, doing
sum(int4) is not a particularly extravagant thing; if there really is an
issue, shouldn't we see a lot of reports? What are we missing?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Andres Freund
Hi,

On 2017-03-06 19:49:56 +0100, Tomas Vondra wrote:
> On 03/06/2017 07:05 PM, Robert Haas wrote:
> > On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund  wrote:
> > > On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> > > > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  
> > > > wrote:
> > > > > The issue was that on 32bit platforms the Datum returned by some
> > > > > functions (int2int4_sum in this case) isn't actually a separately
> > > > > allocated Datum, but rather just something embedded in a larger
> > > > > struct.  That, combined with the following code:
> > > > >     if (!peraggstate->resulttypeByVal && !*isnull &&
> > > > >         !MemoryContextContains(CurrentMemoryContext,
> > > > >                                DatumGetPointer(*result)))
> > > > > seems somewhat problematic to me.  MemoryContextContains() can give
> > > > > false positives when used on memory that's not a distinctly allocated
> > > > > chunk, and if so, we violate memory lifetime rules.  It's quite
> > > > > unlikely, given the required bit patterns, but nonetheless it's making
> > > > > me somewhat uncomfortable.
> > > > > 
> > > > > Do others think this isn't an issue and we can just live with it?
> > > > 
> > > > I think it's 100% broken to call MemoryContextContains() on something
> > > > that's not guaranteed to be a palloc'd chunk.
> > > 
> > > I agree, but to me it seems the only fix would be to just yank out the
> > > whole optimization?
> > 
> > Dunno, haven't looked into it.
> > 
> 
> I think it might be fixable by adding a flag into the chunk, with 'true' for
> regular allocations, and 'false' for the optimized ones. And then only use
> MemoryContextContains() for 'flag=true' chunks.

I'm not quite following here.  We only get a Datum and the knowledge
that it's a pass-by-ref argument, so we really don't know that much.  We
could create an "EmbeddedDatum" type that has a preceding chunk header
(appropriately for the version), that just gets zeroed out at start.  Is
that what you mean?
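
(For illustration, a sketch of that idea with hypothetical names: a fake,
zeroed chunk header in front of the embedded value, so the context
comparison can never match a live context:)

typedef struct EmbeddedDatum
{
    StandardChunkHeader fakehdr;    /* zeroed; never equals a live context */
    /* any padding the real chunk layout implies would have to go here */
    int64       value;              /* the pass-by-ref payload */
} EmbeddedDatum;

static inline Datum
EmbeddedDatumInit(EmbeddedDatum *d, int64 val)
{
    memset(&d->fakehdr, 0, sizeof(d->fakehdr));
    d->value = val;
    return PointerGetDatum(&d->value);
}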


> The question however is whether this won't make the optimization pointless.
> I also wonder how much we save by this optimization and how widely it's
> used? Can someone point me to some numbers?

> I don't recall any recent numbers.  I'm more than a bit doubtful that it
really matters - it's only used for the results of aggregate/window
functions, and surely they've a good chunk of their own overhead...

Greetings,

Andres Freund




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Tomas Vondra

On 03/06/2017 07:05 PM, Robert Haas wrote:
> On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund  wrote:
> > On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> > > On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  wrote:
> > > > The issue was that on 32bit platforms the Datum returned by some
> > > > functions (int2int4_sum in this case) isn't actually a separately
> > > > allocated Datum, but rather just something embedded in a larger
> > > > struct.  That, combined with the following code:
> > > >     if (!peraggstate->resulttypeByVal && !*isnull &&
> > > >         !MemoryContextContains(CurrentMemoryContext,
> > > >                                DatumGetPointer(*result)))
> > > > seems somewhat problematic to me.  MemoryContextContains() can give
> > > > false positives when used on memory that's not a distinctly allocated
> > > > chunk, and if so, we violate memory lifetime rules.  It's quite
> > > > unlikely, given the required bit patterns, but nonetheless it's making
> > > > me somewhat uncomfortable.
> > > >
> > > > Do others think this isn't an issue and we can just live with it?
> > >
> > > I think it's 100% broken to call MemoryContextContains() on something
> > > that's not guaranteed to be a palloc'd chunk.
> >
> > I agree, but to me it seems the only fix would be to just yank out the
> > whole optimization?
>
> Dunno, haven't looked into it.

I think it might be fixable by adding a flag into the chunk, with 'true'
for regular allocations, and 'false' for the optimized ones. And then
only use MemoryContextContains() for 'flag=true' chunks.

The question however is whether this won't make the optimization
pointless. I also wonder how much we save by this optimization and how
widely it's used? Can someone point me to some numbers?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Robert Haas
On Mon, Mar 6, 2017 at 12:44 PM, Andres Freund  wrote:
> On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
>> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  wrote:
>> > The issue was that on 32bit platforms the Datum returned by some
>> > functions (int2int4_sum in this case) isn't actually a separately
>> > allocated Datum, but rather just something embedded in a larger
>> > struct.  That, combined with the following code:
>> >     if (!peraggstate->resulttypeByVal && !*isnull &&
>> >         !MemoryContextContains(CurrentMemoryContext,
>> >                                DatumGetPointer(*result)))
>> > seems somewhat problematic to me.  MemoryContextContains() can give
>> > false positives when used on memory that's not a distinctly allocated
>> > chunk, and if so, we violate memory lifetime rules.  It's quite
>> > unlikely, given the required bit patterns, but nonetheless it's making
>> > me somewhat uncomfortable.
>> >
>> > Do others think this isn't an issue and we can just live with it?
>>
>> I think it's 100% broken to call MemoryContextContains() on something
>> that's not guaranteed to be a palloc'd chunk.
>
> I agree, but to me it seems the only fix would be to just yank out the
> whole optimization?

Dunno, haven't looked into it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Andres Freund
On 2017-03-06 12:40:18 -0500, Robert Haas wrote:
> On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  wrote:
> > The issue was that on 32bit platforms the Datum returned by some
> > functions (int2int4_sum in this case) isn't actually a separately
> > allocated Datum, but rather just something embedded in a larger
> > struct.  That, combined with the following code:
> >     if (!peraggstate->resulttypeByVal && !*isnull &&
> >         !MemoryContextContains(CurrentMemoryContext,
> >                                DatumGetPointer(*result)))
> > seems somewhat problematic to me.  MemoryContextContains() can give
> > false positives when used on memory that's not a distinctly allocated
> > chunk, and if so, we violate memory lifetime rules.  It's quite
> > unlikely, given the required bit patterns, but nonetheless it's making
> > me somewhat uncomfortable.
> >
> > Do others think this isn't an issue and we can just live with it?
> 
> I think it's 100% broken to call MemoryContextContains() on something
> that's not guaranteed to be a palloc'd chunk.

I agree, but to me it seems the only fix would be to just yank out the
whole optimization?

- Andres




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-06 Thread Robert Haas
On Wed, Mar 1, 2017 at 5:55 PM, Andres Freund  wrote:
> The issue was that on 32bit platforms the Datum returned by some
> functions (int2int4_sum in this case) isn't actually a separately
> allocated Datum, but rather just something embedded in a larger
> struct.  That, combined with the following code:
>     if (!peraggstate->resulttypeByVal && !*isnull &&
>         !MemoryContextContains(CurrentMemoryContext,
>                                DatumGetPointer(*result)))
> seems somewhat problematic to me.  MemoryContextContains() can give
> false positives when used on memory that's not a distinctly allocated
> chunk, and if so, we violate memory lifetime rules.  It's quite
> unlikely, given the required bit patterns, but nonetheless it's making
> me somewhat uncomfortable.
>
> Do others think this isn't an issue and we can just live with it?

I think it's 100% broken to call MemoryContextContains() on something
that's not guaranteed to be a palloc'd chunk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-03 Thread Tomas Vondra

On 03/04/2017 02:58 AM, Andres Freund wrote:
> On 2017-03-01 22:19:30 -0800, Andres Freund wrote:
> > On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
> > > I've noticed two minor typos:
> > >
> > > 1) That is solved this by creating ...
> > >    - extra "this"
> > >
> > > 2) Given this, routines like pfree their corresponding context ...
> > >    - missing "find" or "determine"
> >
> > Will fix.
>
> And done.
>
> > > I also see you've explicitly mentioned the callbacks were added
> > > in 9.5. Doesn't that somewhat reintroduce the historical account?
> >
> > That section I just moved up, the version reference was there
> > before. I left it in, because it seemed new enough to still be
> > somewhat relevant; removed and added it, not sure what's better.
>
> Left that in place for now.

Yeah. I hadn't realized it was just moved a bit, and moreover it
probably makes sense to have some comments regarding differences between
current versions. So +1 to that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-03 Thread Andres Freund
On 2017-03-01 22:19:30 -0800, Andres Freund wrote:
> On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
> > I've noticed two minor typos:
> > 
> > 1) That is solved this by creating ...
> >- extra "this"
> > 
> > 2) Given this, routines like pfree their corresponding context ...
> >- missing "find" or "determine"
> 
> Will fix.

And done.

> > I also see you've explicitly mentioned the callbacks were added in 9.5.
> > Doesn't that somewhat reintroduce the historical account?
> 
> That section I just moved up, the version reference was there before. I
> left it in, because it seemed new enough to still be somewhat
> relevant; removed and added it, not sure what's better.

Left that in place for now.




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-02 Thread Tomas Vondra

On 03/01/2017 05:29 AM, Andres Freund wrote:
> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > - Andres, hoping the buildfarm turns greener
>
> Oh well, that didn't work. Investigating.

Attached is the last part of the patch series, rebased to current master
and adopting the new chunk header approach. Unlike Slab, this context
needs the whole AllocSet header (size, requested_size), and also the
block pointer, so no padding seems to be needed.

I've tested this on x86-64 and armv7l, and the test_decoding test suite
passes on both.

FWIW, I'm still not entirely happy with the name "Generation". I agree
with Andres that it's perhaps a bit too generic, but more importantly
the name might actually be a bit obsolete. There used to be generations
of chunks, but that's gone. Now it simply does not reuse the chunks at
all, and frees the blocks when they get empty.

It's not entirely FIFO though, because the transactions interleave, so
later blocks may be released first. But the "allocated close, freed
close" pattern is still there. So perhaps something like "TemporalSet"
would be a better name?
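
To illustrate the lifecycle (a minimal usage sketch, assuming the API from
the attached patch; the context name and sizes here are illustrative):

    MemoryContext gen;
    char       *a;
    char       *b;

    gen = GenerationContextCreate(CurrentMemoryContext,
                                  "demo",
                                  SLAB_LARGE_BLOCK_SIZE);

    a = MemoryContextAlloc(gen, 100);   /* carved off the current block */
    b = MemoryContextAlloc(gen, 200);   /* placed right after a */

    pfree(a);                           /* the chunk is not recycled ... */
    pfree(b);                           /* ... but the block is now empty
                                         * and gets freed as a whole */

    MemoryContextDelete(gen);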


Man, naming things is hard ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


gen-context.patch
Description: binary/octet-stream



Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-01 Thread Andres Freund
On 2017-03-02 04:47:13 +0100, Tomas Vondra wrote:
> On 03/01/2017 11:55 PM, Andres Freund wrote:
> > On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
> > > On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > > > - Andres, hoping the buildfarm turns greener
> > > 
> > > Oh well, that didn't work. Investigating.
> > 
> > The fix for that was fairly trivial, and the buildfarm has cooled down.
> > 
> > The issue was that on 32bit platforms the Datum returned by some
> > functions (int2int4_sum in this case) isn't actually a separately
> > allocated Datum, but rather just something embedded in a larger
> > struct.  That, combined with the following code:
> >     if (!peraggstate->resulttypeByVal && !*isnull &&
> >         !MemoryContextContains(CurrentMemoryContext,
> >                                DatumGetPointer(*result)))
> > seems somewhat problematic to me.  MemoryContextContains() can give
> > false positives when used on memory that's not a distinctly allocated
> > chunk, and if so, we violate memory lifetime rules.  It's quite
> > unlikely, given the required bit patterns, but nonetheless it's making
> > me somewhat uncomfortable.
> > 
> 
> I assume you're only using that code snippet as an example of code that
> might be broken by MemoryContextContains() false positives, right?

I'm mentioning that piece of code because it's what temporarily caused
all 32bit animals to fail, when I had made MemoryContextContains() less
forgiving.


> (I don't see how the slab allocator could interfere with aggregates, as it's
> only used for reorderbuffer.c).

Indeed, this is independent of slab.c. I just came across it because I
triggered crashes when shrinking the StandardChunkHeader to be just the
chunk's MemoryContext.


> > Do others think this isn't an issue and we can just live with it?
> > 
> 
> My understanding is all the places calling MemoryContextContains() assume
> they can't receive memory not allocated as a simple chunk by palloc(). If
> that's not the case, it's likely broken.

Yea, that's my conclusion too. Which means nodeAgg.c and nodeWindowAgg.c
are broken afaics, because of e.g. int2int4_sum()'s use of
Int64GetDatumFast() on sub-parts of larger allocations.
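
For reference, the pattern looks roughly like this (abridged from
numeric.c; on platforms where int64 is pass-by-reference,
Int64GetDatumFast() is just PointerGetDatum(&x)):

typedef struct Int8TransTypeData
{
    int64       count;
    int64       sum;
} Int8TransTypeData;

Datum
int2int4_sum(PG_FUNCTION_ARGS)
{
    ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
    Int8TransTypeData *transdata;

    transdata = (Int8TransTypeData *) ARR_DATA_PTR(transarray);

    if (transdata->count == 0)
        PG_RETURN_NULL();

    /* a pointer into the middle of transdata, not a palloc'd chunk */
    PG_RETURN_DATUM(Int64GetDatumFast(transdata->sum));
}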

- Andres




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-01 Thread Andres Freund
On 2017-03-02 04:36:23 +0100, Tomas Vondra wrote:
> I've noticed two minor typos:
> 
> 1) That is solved this by creating ...
>- extra "this"
> 
> 2) Given this, routines like pfree their corresponding context ...
>- missing "find" or "determine"

Will fix.

> I also see you've explicitly mentioned the callbacks were added in 9.5.
> Doesn't that somewhat reintroduce the historical account?

That section I just moved up, the version reference was there before. I
left it in, because it seemed new enough to still be somewhat
relevant; removed and added it, not sure what's better.

Greetings,

Andres Freund




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-01 Thread Tomas Vondra

On 03/01/2017 11:55 PM, Andres Freund wrote:
> On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
> > On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > > - Andres, hoping the buildfarm turns greener
> >
> > Oh well, that didn't work. Investigating.
>
> The fix for that was fairly trivial, and the buildfarm has cooled down.
>
> The issue was that on 32bit platforms the Datum returned by some
> functions (int2int4_sum in this case) isn't actually a separately
> allocated Datum, but rather just something embedded in a larger
> struct.  That, combined with the following code:
>     if (!peraggstate->resulttypeByVal && !*isnull &&
>         !MemoryContextContains(CurrentMemoryContext,
>                                DatumGetPointer(*result)))
> seems somewhat problematic to me.  MemoryContextContains() can give
> false positives when used on memory that's not a distinctly allocated
> chunk, and if so, we violate memory lifetime rules.  It's quite
> unlikely, given the required bit patterns, but nonetheless it's making
> me somewhat uncomfortable.

I assume you're only using that code snippet as an example of code that
might be broken by MemoryContextContains() false positives, right?

(I don't see how the slab allocator could interfere with aggregates, as
it's only used for reorderbuffer.c.)

> Do others think this isn't an issue and we can just live with it?

My understanding is all the places calling MemoryContextContains()
assume they can't receive memory not allocated as a simple chunk by
palloc(). If that's not the case, it's likely broken.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-01 Thread Tomas Vondra

On 03/01/2017 05:18 AM, Andres Freund wrote:
> On 2017-02-28 10:41:22 -0800, Andres Freund wrote:
> > Hi,
> >
> > On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
> > > *preliminary* patch attached. This needs a good bit of polishing
> > > (primarily comment work, verifying that valgrind works), but I'm too
> > > tired now.
> > >
> > > I'm not quite sure how to deal with mmgr/README - it's written as kind
> > > of a historical document, and the "Mechanisms to Allow Multiple Types of
> > > Contexts" is already quite out of date.  I think it'd be good to rip out
> > > all the historical references and just describe the current state, but
> > > I'm not really enthusiastic about tackling that :/
> >
> > While still not enthusiastic, I took a stab at doing so.  While still
> > not perfect, I do think this is an improvement.
> >
> > Is anybody uncomfortable going away from the current historical account
> > style?
>
> I've pushed these now. I'm not claiming that the README revision is
> perfect, but we can incrementally improve it further...

Thanks. I went through the README and it definitely looks better now.

I've noticed two minor typos:

1) That is solved this by creating ...
   - extra "this"

2) Given this, routines like pfree their corresponding context ...
   - missing "find" or "determine"

I also see you've explicitly mentioned the callbacks were added in 9.5.
Doesn't that somewhat reintroduce the historical account?



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-03-01 Thread Andres Freund
On 2017-02-28 20:29:36 -0800, Andres Freund wrote:
> On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> > - Andres, hoping the buildfarm turns greener
> 
> Oh well, that didn't work. Investigating.

The fix for that was fairly trivial, and the buildfarm has cooled down.

The issue was that on 32bit platforms the Datum returned by some
functions (int2int4_sum in this case) isn't actually a separately
allocated Datum, but rather just something embedded in a larger
struct.  That, combined with the following code:
    if (!peraggstate->resulttypeByVal && !*isnull &&
        !MemoryContextContains(CurrentMemoryContext,
                               DatumGetPointer(*result)))
seems somewhat problematic to me.  MemoryContextContains() can give
false positives when used on memory that's not a distinctly allocated
chunk, and if so, we violate memory lifetime rules.  It's quite
unlikely, given the required bit patterns, but nonetheless it's making
me somewhat uncomfortable.

Do others think this isn't an issue and we can just live with it?

Regards,

Andres




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-28 Thread Andres Freund
On 2017-02-28 20:18:35 -0800, Andres Freund wrote:
> - Andres, hoping the buildfarm turns greener

Oh well, that didn't work. Investigating.




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-28 Thread Andres Freund
On 2017-02-28 10:41:22 -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
> > *preliminary* patch attached. This needs a good bit of polishing
> > (primarily comment work, verifying that valgrind works), but I'm too
> > tired now.
> > 
> > I'm not quite sure how to deal with mmgr/README - it's written as kind
> > of a historical document, and the "Mechanisms to Allow Multiple Types of
> > Contexts" is already quite out of date.  I think it'd be good to rip out
> > all the historical references and just describe the current state, but
> > I'm not really enthusiastic about tackling that :/
> 
> While still not enthusiastic, I took a stab at doing so.  While still
> not perfect, I do think this is an improvement.
> 
> Is anybody uncomfortable going away from the current historical account
> style?

I've pushed these now.  I'm not claiming that the README revision is
perfect, but we can incrementally improve it further...

- Andres, hoping the buildfarm turns greener




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-28 Thread Andres Freund
Hi,

On 2017-02-27 23:44:20 -0800, Andres Freund wrote:
> *preliminary* patch attached. This needs a good bit of polishing
> (primarily comment work, verifying that valgrind works), but I'm too
> tired now.
> 
> I'm not quite sure how to deal with mmgr/README - it's written as kind
> of a historical document, and the "Mechanisms to Allow Multiple Types of
> Contexts" is already quite out of date.  I think it'd be good to rip out
> all the historical references and just describe the current state, but
> I'm not really enthusiastic about tackling that :/

While still not enthusiastic, I took a stab at doing so.  While still
not perfect, I do think this is an improvement.

Is anybody uncomfortable going away from the current historical account
style?

- Andres
From 5eef2abe5e593b6a9b072b58bbecadbe15689a01 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 28 Feb 2017 10:36:29 -0800
Subject: [PATCH 2/2] Overhaul mmgr/'s README.

The README was written as a "historical account", and that style
hasn't aged particularly well.  Rephrase it to describe the current
situation, instead of having various version specific comments.

This also updates the description of how allocated chunks are
associated with their corresponding context, the method of which has
changed with the previous commit.

Author: Andres Freund
Discussion: https://postgr.es/m/d15dff83-0b37-28ed-0809-95a5cc729...@2ndquadrant.com
---
 src/backend/utils/mmgr/README | 292 +++---
 1 file changed, 132 insertions(+), 160 deletions(-)

diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index f97d7653de..b83b29c268 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -1,15 +1,7 @@
 src/backend/utils/mmgr/README
 
-Notes About Memory Allocation Redesign
-======================================
-
-Up through version 7.0, Postgres had serious problems with memory leakage
-during large queries that process a lot of pass-by-reference data.  There
-was no provision for recycling memory until end of query.  This needed to be
-fixed, even more so with the advent of TOAST which allows very large chunks
-of data to be passed around in the system.  This document describes the new
-memory management system implemented in 7.1.
-
+Memory Context System Design Overview
+=====================================
 
 Background
 ----------
@@ -38,10 +30,10 @@ to or get more memory from the same context the chunk was originally
 allocated in.
 
 At all times there is a "current" context denoted by the
-CurrentMemoryContext global variable.  The backend macro palloc()
-implicitly allocates space in that context.  The MemoryContextSwitchTo()
-operation selects a new current context (and returns the previous context,
-so that the caller can restore the previous context before exiting).
+CurrentMemoryContext global variable.  palloc() implicitly allocates space
+in that context.  The MemoryContextSwitchTo() operation selects a new current
+context (and returns the previous context, so that the caller can restore the
+previous context before exiting).
 
 The main advantage of memory contexts over plain use of malloc/free is
 that the entire contents of a memory context can be freed easily, without
@@ -60,8 +52,10 @@ The behavior of palloc and friends is similar to the standard C library's
 malloc and friends, but there are some deliberate differences too.  Here
 are some notes to clarify the behavior.
 
-* If out of memory, palloc and repalloc exit via elog(ERROR).  They never
-return NULL, and it is not necessary or useful to test for such a result.
+* If out of memory, palloc and repalloc exit via elog(ERROR).  They
+never return NULL, and it is not necessary or useful to test for such
+a result.  With palloc_extended() that behavior can be overridden
+using the MCXT_ALLOC_NO_OOM flag.
 
 * palloc(0) is explicitly a valid operation.  It does not return a NULL
 pointer, but a valid chunk of which no bytes may be used.  However, the
@@ -71,28 +65,18 @@ error.  Similarly, repalloc allows realloc'ing to zero size.
 * pfree and repalloc do not accept a NULL pointer.  This is intentional.
 
 
-pfree/repalloc No Longer Depend On CurrentMemoryContext
---------------------------------------------------------
+The Current Memory Context
+--------------------------
 
-Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk
-whether it belongs to CurrentMemoryContext or not --- the chunk's owning
-context will be invoked to handle the operation, regardless.  This is a
-change from the old requirement that CurrentMemoryContext must be set
-to the same context the memory was allocated from before one can use
-pfree() or repalloc().
-
-There was some consideration of getting rid of CurrentMemoryContext entirely,
-instead requiring the target memory context for allocation to be specified
-explicitly.  But we decided that would be too much notational overhead ---
-we'd 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

*preliminary* patch attached. This needs a good bit of polishing
(primarily comment work, verifying that valgrind works), but I'm too
tired now.

I'm not quite sure how to deal with mmgr/README - it's written as kind
of a historical document, and the "Mechanisms to Allow Multiple Types of
Contexts" is already quite out of date.  I think it'd be good to rip out
all the historical references and just describe the current state, but
I'm not really enthusiastic about tackling that :/

On 2017-02-28 00:29:44 -0500, Tom Lane wrote:
> Not sure either, but suggest we add a StaticAssert asserting there's no
> padding; something along the lines of
>   offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) ==
>       MAXALIGN(sizeof(AllocSetChunkHeader))

Included.

With Craig's help I've verified that the patch as attached works on a
platform that's currently failing. Thanks!

- Andres
From a59c3200dd127feb0cb09a055250ff6401aee1aa Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 27 Feb 2017 23:32:22 -0800
Subject: [PATCH] Remove StandardChunkHeader for Slab's benefit.

---
 src/backend/utils/mmgr/aset.c|  51 -
 src/backend/utils/mmgr/mcxt.c| 121 +++
 src/backend/utils/mmgr/slab.c|  81 --
 src/include/utils/memutils.h |  56 +++---
 src/tools/pgindent/typedefs.list |   1 -
 5 files changed, 92 insertions(+), 218 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 8056c00ae4..31228ade19 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -97,20 +97,7 @@
  */
 
 #define ALLOC_BLOCKHDRSZ	MAXALIGN(sizeof(AllocBlockData))
-#define ALLOC_CHUNKHDRSZ	MAXALIGN(sizeof(AllocChunkData))
-
-/* Portion of ALLOC_CHUNKHDRSZ examined outside aset.c. */
-#define ALLOC_CHUNK_PUBLIC	\
-	(offsetof(AllocChunkData, size) + sizeof(Size))
-
-/* Portion of ALLOC_CHUNKHDRSZ excluding trailing padding. */
-#ifdef MEMORY_CONTEXT_CHECKING
-#define ALLOC_CHUNK_USED	\
-	(offsetof(AllocChunkData, requested_size) + sizeof(Size))
-#else
-#define ALLOC_CHUNK_USED	\
-	(offsetof(AllocChunkData, size) + sizeof(Size))
-#endif
+#define ALLOC_CHUNKHDRSZ	sizeof(struct AllocChunkData)
 
 typedef struct AllocBlockData *AllocBlock;		/* forward reference */
 typedef struct AllocChunkData *AllocChunk;
@@ -169,20 +156,27 @@ typedef struct AllocBlockData
 /*
  * AllocChunk
  *		The prefix of each piece of memory in an AllocBlock
- *
- * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
  */
 typedef struct AllocChunkData
 {
-	/* aset is the owning aset if allocated, or the freelist link if free */
-	void	   *aset;
 	/* size is always the size of the usable space in the chunk */
 	Size		size;
+
 #ifdef MEMORY_CONTEXT_CHECKING
 	/* when debugging memory usage, also store actual requested size */
 	/* this is zero in a free chunk */
 	Size		requested_size;
+
+#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
+	Size		padding;
 #endif
+
+#endif /* MEMORY_CONTEXT_CHECKING */
+
+	/* aset is the owning aset if allocated, or the freelist link if free */
+	void	   *aset;
+
+	/* there must not be any padding to reach a MAXALIGN boundary here! */
 }	AllocChunkData;
 
 /*
@@ -334,6 +328,10 @@ AllocSetContextCreate(MemoryContext parent,
 {
 	AllocSet	set;
 
+	StaticAssertStmt(offsetof(AllocChunkData, aset) + sizeof(MemoryContext) ==
+	 MAXALIGN(sizeof(AllocChunkData)),
+	 "padding calculation in AllocChunkData is wrong");
+
 	/*
 	 * First, validate allocation parameters.  (If we're going to throw an
 	 * error, we should do so before the context is created, not after.)  We
@@ -616,13 +614,14 @@ AllocSetAlloc(MemoryContext context, Size size)
 		AllocAllocInfo(set, chunk);
 
 		/*
-		 * Chunk header public fields remain DEFINED.  The requested
-		 * allocation itself can be NOACCESS or UNDEFINED; our caller will
-		 * soon make it UNDEFINED.  Make extra space at the end of the chunk,
-		 * if any, NOACCESS.
+		 * Chunk's context fields remain DEFINED.  The requested allocation
+		 * itself can be NOACCESS or UNDEFINED; our caller will soon make it
+		 * UNDEFINED.  Make extra space at the end of the chunk, if any,
+		 * NOACCESS.
 		 */
-		VALGRIND_MAKE_MEM_NOACCESS((char *) chunk + ALLOC_CHUNK_PUBLIC,
-		 chunk_size + ALLOC_CHUNKHDRSZ - ALLOC_CHUNK_PUBLIC);
+		VALGRIND_MAKE_MEM_NOACCESS((char *) chunk,
+								   chunk_size + ALLOC_CHUNKHDRSZ);
+		VALGRIND_MAKE_MEM_DEFINED(&chunk->aset, sizeof(MemoryContext));
 
 		return AllocChunkGetPointer(chunk);
 	}
@@ -709,7 +708,7 @@ AllocSetAlloc(MemoryContext context, Size size)
 chunk = (AllocChunk) (block->freeptr);
 
 /* Prepare to initialize the chunk header. */
-VALGRIND_MAKE_MEM_UNDEFINED(chunk, ALLOC_CHUNK_USED);
+VALGRIND_MAKE_MEM_UNDEFINED(chunk, sizeof(AllocChunkData));
 
 block->freeptr += (availchunk + ALLOC_CHUNKHDRSZ);
 availspace -= (availchunk + 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> Hm, that should be doable with something like
> #if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4
> which'd probably be better documentation than a macro that hides this
> (arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).

Not sure either, but suggest we add a StaticAssert asserting there's no
padding; something along the lines of
    offsetof(AllocSetChunkHeader, context) + sizeof(MemoryContext) ==
        MAXALIGN(sizeof(AllocSetChunkHeader))
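
Spelled out with the existing StaticAssertStmt() machinery, that could look
like this (a sketch; the patch above uses the equivalent check for
AllocChunkData):

StaticAssertStmt(offsetof(AllocSetChunkHeader, context) +
                 sizeof(MemoryContext) ==
                 MAXALIGN(sizeof(AllocSetChunkHeader)),
                 "padding calculation in AllocSetChunkHeader is wrong");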

regards, tom lane




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-27 22:57:24 -0500, Tom Lane wrote:
> If the slab allocator would be happier with just a MemoryContext pointer
> as chunk header, I think we should push in this direction rather than
> invent some short-term hack.

It would - it really doesn't need the size, because it's the same for
the whole context, so storing it per chunk is a waste of space.  Still
wondering if we should band-aid this till that's done.


> One could imagine redefining aset.c's chunk header along the lines of
> 
> typedef struct AllocSetChunkHeader
> {
>     Size          size;             /* size of data space allocated in chunk */
> #ifdef MEMORY_CONTEXT_CHECKING
>     Size          requested_size;   /* original request size */
> #if 32-bit-but-maxalign-is-8
>     Size          padding;          /* needed to avoid padding below */
> #endif
> #endif
>     MemoryContext context;          /* owning context */
>     /* there must not be any padding to reach a MAXALIGN boundary here! */
> } AllocSetChunkHeader;
> 
> where we'd possibly need some help from configure to implement that inner
> #if condition, but it seems doable enough.

Hm, that should be doable with something like
#if MAXIMUM_ALIGNOF > 4 && SIZEOF_VOID_P == 4

which'd probably be better documentation than a macro that hides this
(arguing internally whether SIZEOF_VOID_P or SIZEOF_SIZE_T is better).

Working on a patch now, will post but not push tonight.

Greetings,

Andres Freund




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> Independently of this, we really should redefine StandardChunkHeader to
> be only the MemoryContext.  There's no need to have size/requested_size
> part of StandardChunkHeader, given there's
> MemoryContextMethods->get_chunk_space().

Yeah, perhaps.  The trick would be to get things laid out so that the 
MemoryContext pointer is always immediately adjacent to the chunk data
(no padding between).

One could imagine redefining aset.c's chunk header along the lines of

typedef struct AllocSetChunkHeader
{
    Size          size;             /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
    Size          requested_size;   /* original request size */
#if 32-bit-but-maxalign-is-8
    Size          padding;          /* needed to avoid padding below */
#endif
#endif
    MemoryContext context;          /* owning context */
    /* there must not be any padding to reach a MAXALIGN boundary here! */
} AllocSetChunkHeader;

where we'd possibly need some help from configure to implement that inner
#if condition, but it seems doable enough.

If the slab allocator would be happier with just a MemoryContext pointer
as chunk header, I think we should push in this direction rather than
invent some short-term hack.

regards, tom lane




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-28 01:44:42 +0100, Tomas Vondra wrote:
> On 02/27/2017 06:42 PM, Andres Freund wrote:
> > Yea, I hadn't yet realized when writing that, that termite actually,
> > despite running on ppc64, compiles a 32bit postgres. Will thus
> > duplicate StandardChunkHeader's contents into slab.c :( - I don't
> > see an easy way around that...
>
> I've tried this - essentially copying the StandardChunkHeader's contents
> into SlabChunk, but that does not seem to do the trick, sadly. Per pahole,
> the structures then (at least on armv7l) look like this:
>
> struct SlabChunk {
>         void *            block;            /*  0   4 */
>         MemoryContext     context;          /*  4   4 */
>         Size              size;             /*  8   4 */
>         Size              requested_size;   /* 12   4 */
>
>         /* size: 16, cachelines: 1, members: 4 */
>         /* last cacheline: 16 bytes */
> };
>
> struct StandardChunkHeader {
>         MemoryContext     context;          /*  0   4 */
>         Size              size;             /*  4   4 */
>         Size              requested_size;   /*  8   4 */
>
>         /* size: 12, cachelines: 1, members: 3 */
>         /* last cacheline: 12 bytes */
> };

> So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so
> pfree() grabs the block pointer but thinks it's the context :-(

Hm. The only way I can think of to do achieve the right thing here would
be something like:

typedef struct StandardChunkHeader
{
    MemoryContext context;      /* owning context */
    Size          size;         /* size of data space allocated in chunk */
#ifdef MEMORY_CONTEXT_CHECKING
    /* when debugging memory usage, also store actual requested size */
    Size          requested_size;
#endif
    union
    {
        char     *data;
        /* ensure MAXALIGNed */
        int64     alignof_int64;
        double    alignof_double;
    } d;
} StandardChunkHeader;

typedef struct SlabChunk
{
    void *block;
    StandardChunkHeader header;
} SlabChunk;

That's not overly pretty, but also not absolutely disgusting.  Unifying
the padding calculations between allocators would be a nice side-effect.
Note we at least previously had such union/double tricks in the tree, via
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8


It might be a good idea to have configure define maxaligned_type instead
of including both int64/double (although it'll IIRC practically always
be double that's maxaligned).


Independently of this, we really should redefine StandardChunkHeader to
be only the MemoryContext.  There's no need to have size/requested_size
part of StandardChunkHeader, given there's
MemoryContextMethods->get_chunk_space().
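
A sketch of what that reduction would leave (chunk size then being
recovered through the owning context's methods):

typedef struct StandardChunkHeader
{
    MemoryContext context;      /* the only per-chunk state pfree() needs */
} StandardChunkHeader;

/* size queries would go through the context instead, roughly: */
/*     context->methods->get_chunk_space(context, pointer);    */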


> Not sure what to do about this - the only thing I can think of is
> splitting SlabChunk into two separate structures, and aligning them
> independently.
>
> The attached patch does that - it probably needs a bit more work on the
> comments to make it commit-ready, but it fixes the test_decoding tests on
> the rpi3 board I'm using for testing.

That'd work as well, although at the very least I'd want to add a
comment explaining the actual memory layout somewhere - this is a bit
too finicky to expect to get right the next time round.


Any preferences?


Greetings,

Andres Freund




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 06:42 PM, Andres Freund wrote:
> On 2017-02-27 12:27:48 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > The best theory I have so far is that slab.c's idea of
> > > StandardChunkHeader's size doesn't match what mcxt.c thinks it is
> > > (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> > > MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
> > > quite see how that'd cause the issue, since StandardChunkHeader's size
> > > should always be properly sized.
> >
> > Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
> > will contain 3 4-byte fields.  However, there are some such machines on
> > which MAXALIGN is 8.  For example, looking at termite's configure
> > output:
> >
> > checking size of void *... 4
> > checking size of size_t... 4
> > checking size of long... 4
> > checking alignment of short... 2
> > checking alignment of int... 4
> > checking alignment of long... 4
> > checking alignment of long long int... 8
> > checking alignment of double... 8
> >
> > axolotl's output looks similar.  I expect my old HPPA dinosaur
> > will show the failure as well.
>
> Yea, I hadn't yet realized when writing that, that termite actually,
> despite running on ppc64, compiles a 32bit postgres. Will thus
> duplicate StandardChunkHeader's contents into slab.c :( - I don't
> see an easy way around that...

I've tried this - essentially copying the StandardChunkHeader's contents
into SlabChunk, but that does not seem to do the trick, sadly. Per
pahole, the structures then (at least on armv7l) look like this:

struct SlabChunk {
        void *            block;            /*  0   4 */
        MemoryContext     context;          /*  4   4 */
        Size              size;             /*  8   4 */
        Size              requested_size;   /* 12   4 */

        /* size: 16, cachelines: 1, members: 4 */
        /* last cacheline: 16 bytes */
};

struct StandardChunkHeader {
        MemoryContext     context;          /*  0   4 */
        Size              size;             /*  4   4 */
        Size              requested_size;   /*  8   4 */

        /* size: 12, cachelines: 1, members: 3 */
        /* last cacheline: 12 bytes */
};

So SlabChunk happens to be perfectly aligned (MAXIMUM_ALIGNOF=8), and so
pfree() grabs the block pointer but thinks it's the context :-(

Not sure what to do about this - the only thing I can think of is
splitting SlabChunk into two separate structures, and aligning them
independently.

The attached patch does that - it probably needs a bit more work on the
comments to make it commit-ready, but it fixes the test_decoding tests
on the rpi3 board I'm using for testing.
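
Roughly, the trick is to pad the slab-specific prefix and the standard
header independently, so the header always ends exactly at the MAXALIGN
boundary in front of the user data - which is where pfree() looks. A sketch
of that layout (names hypothetical; the attached patch may differ in
detail):

typedef struct SlabChunkPrefix
{
    void       *block;          /* block owning this chunk */
} SlabChunkPrefix;

#define SLAB_PREFIX_SIZE    MAXALIGN(sizeof(SlabChunkPrefix))
#define SLAB_CHUNKHDRSZ     (SLAB_PREFIX_SIZE + \
                             MAXALIGN(sizeof(StandardChunkHeader)))

/*
 * Chunk layout:
 *   [SlabChunkPrefix][pad][StandardChunkHeader][pad][user data]
 * Each pad brings the next part to a MAXALIGN boundary, so looking back
 * MAXALIGN(sizeof(StandardChunkHeader)) bytes from the user data lands
 * on the header's context field, never on the block pointer.
 */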


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-fix.patch
Description: binary/octet-stream



Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 04:07 PM, Andres Freund wrote:
> On February 27, 2017 6:14:20 AM PST, Tomas Vondra wrote:
> > On 02/27/2017 01:02 PM, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
> > > > I'll work on getting slab committed first, and then review /
> > > > edit / commit generation.c later.  One first note there is that
> > > > I'm wondering if generation.c is a too generic filename.
> > >
> > > And pushed slab and its usage.  Will have a look at generation.c
> > > tomorrow.
> > >
> > > - Andres
> >
> > Gah. I don't want to annoy the person who just committed my patch, but
> > can you give more time when asking for feedback? I mean, sending a
> > modified patch on Friday midnight, and committing on Monday noon does
> > not really give much time to look at it.
>
> Hm. The changes IMO weren't controversial (or surprising - most of
> them I had announced previously); I announced when posting the review
> that I'd push the patch later that weekend. If I hadn't been tired
> after doing the review/editing I'd have just pushed right then and
> there.  It's hard to find time and attention, so not introducing a
> week of feedback time is quite worthwhile.  I listed the changes I
> made primarily for posterity's sake.  Most if not all committers make
> editorializing changes around commit, so that's not just me.
>
> If you specifically want I can try to give you more time to look at
> an edited patch, but that'll mean things move slower.  I won't
> promise not to make minor changes just before commit either way; I
> always do another round of review just before push.

I also agree the changes are not particularly controversial, but then
why ask for comments at all? I'm OK with a committer making final
tweaks and pushing without asking, but if you ask for comments, let's
give people time to actually respond.

I agree introducing weeks of delays would be silly, but I'm not asking
for that. I'm perfectly fine with two days for feedback, as long as it's
not a weekend + half of Monday.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 06:40 PM, Andres Freund wrote:

On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:

On 27/02/17 18:00, Andres Freund wrote:



FWIW I think the ppc64 machines are failing because of unrelated issue
(changes to integer timestamps). We should probably look at 32bit machines
first.


Don't think so - termite is ppc64 afaics, and the failure doesn't look
integer timestamp related (assert failure is clearly about this, and set
of changed commits *only* include slab related commits).



termite is ppc64 but with 4 byte pointer size according to configure so
it might be related to that perhaps?


Uh, ok. I checked the --configure options, but not the actual configure
output (blame -ENOCOFEE and jetlag).  The output makes it fairly likely
that my StandardChunkHeader theory is valid, so I'll work on a patch to
clean that up.



Thanks. I set up a rpi3 machine (amrv7l) that fails with the same issue, 
so if you need to test the patch, let me know.


While building, I've also noticed a bunch of warnings about string
formatting; attached is a patch that fixes those.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/mmgr/slab.c b/src/backend/utils/mmgr/slab.c
index a5e140e..c673bc3 100644
--- a/src/backend/utils/mmgr/slab.c
+++ b/src/backend/utils/mmgr/slab.c
@@ -207,7 +207,7 @@ SlabContextCreate(MemoryContext parent,
 
 	/* Make sure the block can store at least one chunk. */
 	if (blockSize - sizeof(SlabBlock) < fullChunkSize)
-		elog(ERROR, "block size %ld for slab is too small for %ld chunks",
+		elog(ERROR, "block size %zu for slab is too small for %zu chunks",
 			 blockSize, chunkSize);
 
 	/* Compute maximum number of chunks per block */
@@ -333,7 +333,7 @@ SlabAlloc(MemoryContext context, Size size)
 
 	/* make sure we only allow correct request size */
 	if (size != slab->chunkSize)
-		elog(ERROR, "unexpected alloc chunk size %ld (expected %ld)",
+		elog(ERROR, "unexpected alloc chunk size %zu (expected %zu)",
 			 size, slab->chunkSize);
 
 	/*
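For context, Size is a typedef for size_t, and size_t does not always have
the same width as long, so the z length modifier is the portable choice
here. A tiny standalone illustration of the point (not part of the patch):

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	size_t		blockSize = 8192;	/* same underlying type as Size */

	/* %zu always matches size_t; %ld only happens to work where long
	 * has the same width, and draws -Wformat warnings elsewhere */
	printf("block size %zu for slab\n", blockSize);
	return 0;
}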

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 12:27:48 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > The best theory I have so far is that slab.c's idea of
> > StandardChunkHeader's size doesn't match what mcxt.c think it is
> > (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> > MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
> > quite see how that'd cause the issue, since StandardChunkHeader's size
> > should always be properly sized.
> 
> Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
> will contain 3 4-byte fields.  However, there are some such machines on
> which MAXALIGN is 8.  For example, looking at termite's configure
> output:
> 
> checking size of void *... 4
> checking size of size_t... 4
> checking size of long... 4
> checking alignment of short... 2
> checking alignment of int... 4
> checking alignment of long... 4
> checking alignment of long long int... 8
> checking alignment of double... 8
> 
> axolotl's output looks similar.  I expect my old HPPA dinosaur
> will show the failure as well.

Yea, I hadn't yet realized when writing that that termite actually,
despite running on ppc64, compiles a 32bit postgres.  Will thus
duplicate StandardChunkHeader's contents into slab.c :( - I don't see
an easy way around that...
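To make the size mismatch concrete, here is a small standalone sketch; the
fields are stand-ins for the real declarations, assuming a 32-bit build
with MEMORY_CONTEXT_CHECKING and 8-byte MAXALIGN:

#include <stdio.h>
#include <stdint.h>

/* Stand-in for StandardChunkHeader on an ILP32 build with
 * MEMORY_CONTEXT_CHECKING: three 4-byte fields, sizeof() == 12. */
typedef struct ChunkHeaderSketch
{
	void	   *context;		/* owning context */
	uint32_t	size;			/* chunk size */
	uint32_t	requested_size; /* only with checking enabled */
} ChunkHeaderSketch;

#define MAXIMUM_ALIGNOF 8		/* e.g. 32-bit ppc: 8-byte long long/double */
#define MAXALIGN(LEN) \
	(((uintptr_t) (LEN) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

int main(void)
{
	/* On such a machine this prints 12 vs 16: mcxt.c steps back
	 * MAXALIGN(sizeof(header)) bytes from the user pointer, while
	 * embedding the bare struct places the header only sizeof(header)
	 * bytes before the data - hence the BadArgument traps. */
	printf("sizeof = %u, MAXALIGN(sizeof) = %u\n",
		   (unsigned) sizeof(ChunkHeaderSketch),
		   (unsigned) MAXALIGN(sizeof(ChunkHeaderSketch)));
	return 0;
}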


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> The best theory I have so far is that slab.c's idea of
> StandardChunkHeader's size doesn't match what mcxt.c think it is
> (because slab.c simply embeds StandardChunkHeader, but mcxt uses
> MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
> quite see how that'd cause the issue, since StandardChunkHeader's size
> should always be properly sized.

Uh, wrong.  On a 32-bit machine with debug enabled, StandardChunkHeader
will contain 3 4-byte fields.  However, there are some such machines on
which MAXALIGN is 8.  For example, looking at termite's configure
output:

checking size of void *... 4
checking size of size_t... 4
checking size of long... 4
checking alignment of short... 2
checking alignment of int... 4
checking alignment of long... 4
checking alignment of long long int... 8
checking alignment of double... 8

axolotl's output looks similar.  I expect my old HPPA dinosaur
will show the failure as well.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 18:04:41 +0100, Petr Jelinek wrote:
> On 27/02/17 18:00, Andres Freund wrote:
> > 
> >> FWIW I think the ppc64 machines are failing because of an unrelated issue
> >> (changes to integer timestamps). We should probably look at 32bit machines
> >> first.
> > 
> > Don't think so - termite is ppc64 afaics, and the failure doesn't look
> > integer timestamp related (the assert failure is clearly about this, and the
> > set of changed commits *only* includes slab-related commits).
> > 
> 
> termite is ppc64 but with 4 byte pointer size according to configure so
> it might be related to that perhaps?

Uh, ok. I checked the --configure options, but not the actual configure
output (blame -ENOCOFFEE and jetlag).  The output makes it fairly likely
that my StandardChunkHeader theory is valid, so I'll work on a patch to
clean that up.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
>> FWIW I think the ppc64 machines are failing because of an unrelated issue
>> (changes to integer timestamps). We should probably look at 32bit machines
>> first.

> Don't think so - termite is ppc64 afaics, and the failure doesn't look
> integer timestamp related (the assert failure is clearly about this, and the
> set of changed commits *only* includes slab-related commits).

There are a couple of animals that have --disable-integer-datetimes,
but those are failing at the configure stage, and were doing so
before today.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Petr Jelinek
On 27/02/17 18:00, Andres Freund wrote:
> 
>> FWIW I think the ppc64 machines are failing because of an unrelated issue
>> (changes to integer timestamps). We should probably look at 32bit machines
>> first.
> 
> Don't think so - termite is ppc64 afaics, and the failure doesn't look
> integer timestamp related (the assert failure is clearly about this, and the
> set of changed commits *only* includes slab-related commits).
> 

termite is ppc64 but with 4 byte pointer size according to configure so
it might be related to that perhaps?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-27 17:56:08 +0100, Tomas Vondra wrote:
> No, I don't, but I'll ping Craig. It's ~4AM in Australia, though, so
> it'll take time.

Did the same... ;)


> FWIW I think the ppc64 machines are failing because of an unrelated issue
> (changes to integer timestamps). We should probably look at 32bit machines
> first.

Don't think so - termite is ppc64 afaics, and the failure doesn't look
integer timestamp related (the assert failure is clearly about this, and the
set of changed commits *only* includes slab-related commits).

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 05:48 PM, Andres Freund wrote:

On 2017-02-27 07:55:32 -0800, Andres Freund wrote:

On 2017-02-27 10:32:25 -0500, Tom Lane wrote:

Andres Freund  writes:

And pushed slab and its usage.  Will have a look at generation.c
tomorrow.


Perhaps first you need to find out why so much of the buildfarm
is unhappy.


Will do, after a morning coffee.


Hm. Not entirely clear on what's going on yet. I've run the tests on
hydra (community ppc 64 machine), which is pretty similar to termite
which failed [1] with:
TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010)

The best theory I have so far is that slab.c's idea of
StandardChunkHeader's size doesn't match what mcxt.c think it is
(because slab.c simply embeds StandardChunkHeader, but mcxt uses
MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
quite see how that'd cause the issue, since StandardChunkHeader's size
should always be properly sized.

Tomas, do you have access to termite (which appears to be run by Craig,
under company mail)?



No, I don't, but I'll ping Craig. It's ~4AM in Australia, though, so
it'll take time.


FWIW I think the ppc64 machines are failing because of an unrelated issue
(changes to integer timestamps). We should probably look at 32bit 
machines first.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-27 15:11:40 +0100, Tomas Vondra wrote:
> > I've not yet reviewed the generational allocator yet, but during these
> > measurements I get:
> > postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', 
> > NULL, NULL);
> > WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in 
> > block 0x55d011ef10f0 exceeds 7234 allocated
> > LOCATION:  GenerationCheck, generation.c:693
> > WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in 
> > block 0x55d01023eba0 exceeds 65532 allocated
> > LOCATION:  GenerationCheck, generation.c:693
> > WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in 
> > block 0x55d00d7fb870 exceeds 65532 allocated
> > LOCATION:  GenerationCheck, generation.c:693
> > WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in 
> > block 0x55d00cde17b0 exceeds 65531 allocated
> > LOCATION:  GenerationCheck, generation.c:693
> > 
> > that seems to occur when there are currently in-progress transactions when
> > finishing decoding:
> > 
> ...
> > 
> > could it be that the test's condition is inverted?
> > 
> 
> Yeah, that seems like the culprit - the condition seems wrong. I wonder why
> I haven't seen it during my tests, though ...

I suspect it's because your tests only triggered a memory context reset
when it was empty... But I ran decoding while a concurrent write
transaction was ongoing...


> > I'll work on getting slab committed first, and then review / edit /
> > commit generation.c later.  One first note there is that I'm wondering
> > if generation.c is a too generic filename.

> Naming things is hard.

Indeed.  I was thinking of genalloc, but that might be understood as
general rather than generational...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 07:55:32 -0800, Andres Freund wrote:
> On 2017-02-27 10:32:25 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > And pushed slab and its usage.  Will have a look at generation.c
> > > tomorrow.
> > 
> > Perhaps first you need to find out why so much of the buildfarm
> > is unhappy.
> 
> Will do, after a morning coffee.

Hm. Not entirely clear on what's going on yet. I've run the tests on
hydra (community ppc 64 machine), which is pretty similar to termite
which failed [1] with:
TRAP: BadArgument("!(((context) != ((void *)0) && (((((const Node*)((context)))->type) == T_AllocSetContext) || ((((const Node*)((context)))->type) == T_SlabContext))))", File: "/home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/utils/mmgr/mcxt.c", Line: 1010)

The best theory I have so far is that slab.c's idea of
StandardChunkHeader's size doesn't match what mcxt.c think it is
(because slab.c simply embeds StandardChunkHeader, but mcxt uses
MAXALIGN(sizeof(StandardChunkHeader))).  That's not good, but I don't
quite see how that'd cause the issue, since StandardChunkHeader's size
should always be properly sized.

Tomas, do you have access to termite (which appears to be run by Craig,
under company mail)?

If not, I can push a "blind" fix, but I'd rather have more information.

Greetings,

Andres Freund

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=termite&dt=2017-02-27%2014%3A00%3A06


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
On 2017-02-27 10:32:25 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > And pushed slab and its usage.  Will have a look at generation.c
> > tomorrow.
> 
> Perhaps first you need to find out why so much of the buildfarm
> is unhappy.

Will do, after a morning coffee.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tom Lane
Andres Freund  writes:
> And pushed slab and its usage.  Will have a look at generation.c
> tomorrow.

Perhaps first you need to find out why so much of the buildfarm
is unhappy.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund


On February 27, 2017 6:14:20 AM PST, Tomas Vondra 
 wrote:
>On 02/27/2017 01:02 PM, Andres Freund wrote:
>> Hi,
>>
>> On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
>>> I'll work on getting slab committed first, and then review / edit /
>>> commit generation.c later.  One first note there is that I'm wondering
>>> if generation.c is a too generic filename.
>>
>> And pushed slab and its usage.  Will have a look at generation.c
>> tomorrow.
>>
>> - Andres
>>
>
>Gah. I don't want to annoy the person who just committed my patch, but
>can you give more time when asking for feedback? I mean, sending a
>modified patch on Friday midnight, and committing on Monday noon does
>not really give much time to look at it.

Hm. The changes IMO weren't controversial (or surprising - most of them I had
announced previously); I announced when posting the review that I'd push the
patch later that weekend. If I hadn't been tired after doing the
review/editing I'd have just pushed right then and there.  It's hard to find
time and attention, so not introducing a week of feedback time is quite
worthwhile.  I listed the changes I made primarily for posterity's sake.  Most
if not all committers make editorializing changes around commit, so that's not
just me.

If you specifically want I can try to give you more time to look at an edited
patch, but that'll mean things move slower.  I won't promise not to make minor
changes just before commit either way, I always do another round of review just
before push.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 01:02 PM, Andres Freund wrote:

Hi,

On 2017-02-27 03:17:32 -0800, Andres Freund wrote:

I'll work on getting slab committed first, and then review / edit /
commit generation.c later.  One first note there is that I'm wondering
if generation.c is a too generic filename.


And pushed slab and its usage.  Will have a look at generation.c
tomorrow.

- Andres



Gah. I don't want to annoy the person who just committed my patch, but can
you give more time when asking for feedback? I mean, sending a modified 
patch on Friday midnight, and committing on Monday noon does not really 
give much time to look at it.


The changes seem fine to me, thanks for spending time on this.

Thanks

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Tomas Vondra

On 02/27/2017 12:17 PM, Andres Freund wrote:

Hi,

On 2017-02-24 14:10:38 -0800, Andres Freund wrote:

I've not yet looked a lot at the next type of context - I want to get
this much committed first...

I plan to give this another pass sometime this weekend and then push
soon.


Before committing I wanted to make sure that
http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us
isn't a sufficient fix.

With the test of N=100 from this thread I measured both runtime and
memory usage (note that's peak virtual memory which includes 2GB of
shared_buffers and such), in assert enabled builds.

master: doesn't finish reasonably
master+doubly linked list fix:  9390.805 ms VmPeak: 10,969,424 kB
master+this thread: 6500.293 ms VmPeak:  2,969,528 kB

So the doubly-linked-list fix is great (and much more backpatchable),
but the patches in this thread are both better runtime *and* peak memory
usage wise.  So that seems like a clear call.



Nice, thanks for doing the test.


I've not yet reviewed the generational allocator yet, but during these
measurements I get:
postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', 
NULL, NULL);
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d011ef10f0 exceeds 7234 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d01023eba0 exceeds 65532 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d00d7fb870 exceeds 65532 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d00cde17b0 exceeds 65531 allocated
LOCATION:  GenerationCheck, generation.c:693

that seems to occur when there are currently in-progress transactions when
finishing decoding:


...


could it be that the test's condition is inverted?



Yeah, that seems like the culprit - the condition seems wrong. I wonder 
why I haven't seen it during my tests, though ...



I'll work on getting slab committed first, and then review / edit /
commit generation.c later.  One first note there is that I'm wondering
if generation.c is a too generic filename.



Naming things is hard.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-27 03:17:32 -0800, Andres Freund wrote:
> I'll work on getting slab committed first, and then review / edit /
> commit generation.c later.  One first note there is that I'm wondering
> if generation.c is a too generic filename.

And pushed slab and its usage.  Will have a look at generation.c
tomorrow.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-27 Thread Andres Freund
Hi,

On 2017-02-24 14:10:38 -0800, Andres Freund wrote:
> I've not yet looked a lot at the next type of context - I want to get
> this much committed first...
>
> I plan to give this another pass sometime this weekend and then push
> soon.

Before committing I wanted to make sure that
http://archives.postgresql.org/message-id/32354.1487977458%40sss.pgh.pa.us
isn't a sufficient fix.

With the test of N=100 from this thread I measured both runtime and
memory usage (note that's peak virtual memory which includes 2GB of
shared_buffers and such), in assert enabled builds.

master: doesn't finish reasonably
master+doubly linked list fix:  9390.805 ms VmPeak: 10,969,424 kB
master+this thread: 6500.293 ms VmPeak:  2,969,528 kB

So the doubly-linked-list fix is great (and much more backpatchable),
but the patches in this thread are both better runtime *and* peak memory
usage wise.  So that seems like a clear call.

I've not yet reviewed the generational allocator yet, but during these
measurements I get:
postgres[3970][1]=# select count(*) FROM pg_logical_slot_get_changes('ttt', 
NULL, NULL);
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d011ef10f0 exceeds 7234 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d01023eba0 exceeds 65532 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d00d7fb870 exceeds 65532 allocated
LOCATION:  GenerationCheck, generation.c:693
WARNING:  01000: problem in Generation Tuples: number of free chunks 0 in block 
0x55d00cde17b0 exceeds 65531 allocated
LOCATION:  GenerationCheck, generation.c:693

that seems to occur when there are currently in-progress transactions when
finishing decoding:

#0  GenerationCheck (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:692
#1  0x5629105d92db in GenerationReset (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:255
#2  0x5629105d93c6 in GenerationDelete (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/generation.c:287
#3  0x5629105e1e12 in MemoryContextDelete (context=0x5629129407c8)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:225
#4  0x5629105e1ee3 in MemoryContextDeleteChildren (context=0x562912940008)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:245
#5  0x5629105e1de0 in MemoryContextDelete (context=0x562912940008)
at /home/andres/src/postgresql/src/backend/utils/mmgr/mcxt.c:208
#6  0x5629103d5451 in ReorderBufferFree (rb=0x562912906320)
at 
/home/andres/src/postgresql/src/backend/replication/logical/reorderbuffer.c:278
#7  0x5629103cea4f in FreeDecodingContext (ctx=0x562912904310)
at /home/andres/src/postgresql/src/backend/replication/logical/logical.c:462
#8  0x5629103d03f0 in pg_logical_slot_get_changes_guts 
(fcinfo=0x7fffc2042e50, confirm=0 '\000', 

could it be that the test's condition is inverted?
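
To spell out the suspicion, here is a standalone sketch of such a sanity
check with the comparison the right way around; the identifiers are
invented, not the patch's actual code:

#include <stdio.h>

/* illustrative block bookkeeping - not the real generation.c types */
typedef struct GenBlockSketch
{
	int			nallocated;		/* chunks allocated on this block */
	int			nfree;			/* chunks freed again */
} GenBlockSketch;

static void
check_block(const char *name, GenBlockSketch *block)
{
	/* warn only on an impossible state: more chunks freed than allocated */
	if (block->nfree > block->nallocated)
		fprintf(stderr,
				"problem in %s: number of free chunks %d in block %p exceeds %d allocated\n",
				name, block->nfree, (void *) block, block->nallocated);

	/* an inverted comparison (nfree < nallocated) instead fires for
	 * every block still holding live chunks - matching the warnings
	 * above, which all report "0 free ... exceeds N allocated" */
}

int main(void)
{
	GenBlockSketch b = {65532, 0};

	check_block("Generation Tuples", &b);	/* silent with the correct test */
	return 0;
}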

I'll work on getting slab committed first, and then review / edit /
commit generation.c later.  One first note there is that I'm wondering
if generation.c is a too generic filename.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-24 Thread Andres Freund
Hi,


On 2017-02-21 01:43:46 +0100, Tomas Vondra wrote:
> Attached is v9 of this patch series. This addresses most of the points
> raised in the review, namely:

Cool, thanks.

> 3) get rid of the block-level bitmap tracking free chunks
>
> Instead of the bitmap, I've used a simple singly-linked list, using int32
> chunk indexes. Perhaps it could use the slist instead, but I'm not quite
> sure MAXALIGN is guaranteed to be at least the size of a pointer.

I'm pretty sure it's guaranteed to be >= sizeof(void*).  But this seems
ok, so ...


Attached are changes I made on the path to committing the patch. These
look large enough that I wanted to give you a chance to comment:

- The AllocFreeInfo changes aren't correct for 32bit systems with 64bit
  longs (like linux, unlike windows) - using %zu is better (z is code
  for size_t sized)
- Split off the reorderbuffer changes
- Removed SlabBlock/SlabChunk / renamed SlabBlockData
-
  +#ifdef MEMORY_CONTEXT_CHECKING
  +#define SLAB_CHUNK_USED \
  + (offsetof(SlabChunkData, requested_size) + sizeof(Size))
  +#else
  doesn't work anymore, because requested_size isn't a top-level
  field. I first redefined it as (without surrounding ifdef)
  #define SLAB_CHUNK_USED \
(offsetof(SlabChunkData, header) + sizeof(StandardChunkHeader))
  but I'm not really sure there's a whole lot of point in the define
  rather than just using sizeof() on the whole thing - there's no trailing
  padding?
- SLAB_CHUNK_PUBLIC and SLAB_BLOCKHDRSZ are unused?
- renamed 'set' variables (and comments) to slab.
- used castNode(SlabContext, context) instead of manual casts
- I rephrased
  + *
  + *   We cache indexes of the first empty chunk on each block 
(firstFreeChunk),
  + *   and freelist index for blocks with least free chunks (minFreeChunks), so
  + *   that we don't have to search the freelist and block on every SlabAlloc()
  + *   call, which is quite expensive.
so it's not referencing firstFreeChunk anymore, since that seems to
make less sense now that firstFreeChunk is essentially just the head
of the list of free chunks.
- added typedefs.list entries and pgindented slab.c
- "mark the chunk as unused (zero the bit)" referenced non-existant
   bitmap afaics.
- valgrind was triggering on
block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk);
  because that was previously marked as NOACCESS (via
  wipe_mem). Explicitly marking it as DEFINED solves that (sketched
  below, after this list).
- removed
  * XXX Perhaps we should not be gentle at all and simply fails in all cases,
  * to eliminate the (mostly pointless) uncertainty.
- you'd included MemoryContext tup_context; in 0002, but it's not really
  useful yet (and the comments above it in reorderbuffer.h don't apply)
- SlabFreeInfo/SlabAllocInfo didn't actually compile w/ HAVE_ALLOCINFO
  defined (pre StandardChunkHeader def)
- some minor stuff.
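
The Valgrind item above, sketched in isolation (this assumes the
memdebug.h wrappers around the memcheck client requests; the surrounding
slab.c code is abbreviated):

/* the body of a free chunk was wiped and marked NOACCESS by wipe_mem(),
 * but the allocator must read the embedded freelist index out of it;
 * re-marking just those bytes as DEFINED keeps memcheck quiet */
VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32));
block->firstFreeChunk = *(int32 *) SlabChunkGetPointer(chunk);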

I'm kinda inclined to drop SlabFreeInfo/SlabAllocInfo.

I've not yet looked a lot at the next type of context - I want to get
this much committed first...

I plan to give this another pass sometime this weekend and then push
soon.

- Andres
From cec3f8372137d2392ff7ac0ab1b2db11fc96e8b3 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 23 Feb 2017 22:35:44 -0800
Subject: [PATCH 1/3] Make useful infrastructure from aset.c generally
 available.

An upcoming patch introduces a new type of memory context. To avoid
duplicating debugging infrastructure with aset.c move useful pieces to
memdebug.[ch].

While touching aset.c, fix printf format AllocFree* debug macros.

Author: Tomas Vondra
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/b3b2245c-b37a-e1e5-ebc4-857c914bc...@2ndquadrant.com
---
 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c | 115 +-
 src/backend/utils/mmgr/memdebug.c |  93 ++
 src/include/utils/memdebug.h  |  48 
 4 files changed, 144 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c

diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 1842bae386..fc5f793b7f 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = aset.o dsa.o freepage.o mcxt.o portalmem.o
+OBJS = aset.o dsa.o freepage.o mcxt.o memdebug.o portalmem.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 4dfc3ec260..8056c00ae4 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -41,46 +41,6 @@
  *	chunks as chunks.  Anything "large" is passed off to malloc().  Change
  *	the number of freelists to change the small/large boundary.
  *
- *
- *	About CLOBBER_FREED_MEMORY:
- *
- *	If this symbol is defined, all freed memory is overwritten with 0x7F's.
- *	

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-20 Thread Tomas Vondra

Hi,

Attached is v9 of this patch series. This addresses most of the points 
raised in the review, namely:


1) change most 'debug' stuff to 'static inline' in memdebug.h

2) fixed and reworded a bunch of comments

3) get rid of the block-level bitmap tracking free chunks

Instead of the bitmap, I've used a simple singly-linked list, using 
int32 chunk indexes. Perhaps it could use the slist instead, but I'm not 
quite sure MAXALIGN is guaranteed to be at least the size of a pointer.
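
A minimal standalone sketch of that scheme, with invented names and layout
(the actual patch differs in details):

#include <stddef.h>
#include <stdint.h>

/* Each free chunk's first four bytes hold the int32 index of the next
 * free chunk, so the freelist costs nothing beyond the chunks
 * themselves (implying a minimum chunk size of 4 bytes). */
typedef struct SlabBlockSketch
{
	char	   *chunks;			/* nchunks * chunkSize bytes of storage */
	size_t		chunkSize;
	int32_t		firstFreeChunk; /* head of the freelist, -1 when empty */
} SlabBlockSketch;

static void *
chunk_ptr(SlabBlockSketch *b, int32_t idx)
{
	return b->chunks + (size_t) idx * b->chunkSize;
}

/* pop from the front of the freelist: O(1), no search */
static void *
freelist_pop(SlabBlockSketch *b)
{
	void	   *chunk;

	if (b->firstFreeChunk < 0)
		return NULL;			/* no free chunk on this block */
	chunk = chunk_ptr(b, b->firstFreeChunk);
	/* the next-free index lives in the chunk body while it is free */
	b->firstFreeChunk = *(int32_t *) chunk;
	return chunk;
}

/* push onto the front of the freelist: O(1) as well */
static void
freelist_push(SlabBlockSketch *b, void *chunk)
{
	int32_t		idx = (int32_t) (((char *) chunk - b->chunks) / (ptrdiff_t) b->chunkSize);

	*(int32_t *) chunk = b->firstFreeChunk; /* link to previous head */
	b->firstFreeChunk = idx;
}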


In any case, this seems to be working reasonably well - it saves a bit 
of code (but also made some code slightly more complex). Also, it seems 
to be a tad faster than v8 - after repeating the same benchmark as 
before, I get these numbers:


          master    slab-v8    slab-v9
   ------------------------------------
     1        50         28         25
     5     17500        180        160
    10        15        380        330
    20         ?        750        670

Although the results are quite noisy.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7c70a7bef4029dd7f10c7dc9ff0dd92a7bd2f966 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 20 Feb 2017 20:16:16 +0100
Subject: [PATCH 1/3] move common bits to memdebug

---
 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c | 115 +-
 src/backend/utils/mmgr/memdebug.c |  93 ++
 src/include/utils/memdebug.h  |  48 
 4 files changed, 144 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c

diff --git a/src/backend/utils/mmgr/Makefile b/src/backend/utils/mmgr/Makefile
index 1842bae..fc5f793 100644
--- a/src/backend/utils/mmgr/Makefile
+++ b/src/backend/utils/mmgr/Makefile
@@ -12,6 +12,6 @@ subdir = src/backend/utils/mmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = aset.o dsa.o freepage.o mcxt.o portalmem.o
+OBJS = aset.o dsa.o freepage.o mcxt.o memdebug.o portalmem.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 4dfc3ec..33b4d01 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -41,46 +41,6 @@
  *	chunks as chunks.  Anything "large" is passed off to malloc().  Change
  *	the number of freelists to change the small/large boundary.
  *
- *
- *	About CLOBBER_FREED_MEMORY:
- *
- *	If this symbol is defined, all freed memory is overwritten with 0x7F's.
- *	This is useful for catching places that reference already-freed memory.
- *
- *	About MEMORY_CONTEXT_CHECKING:
- *
- *	Since we usually round request sizes up to the next power of 2, there
- *	is often some unused space immediately after a requested data area.
- *	Thus, if someone makes the common error of writing past what they've
- *	requested, the problem is likely to go unnoticed ... until the day when
- *	there *isn't* any wasted space, perhaps because of different memory
- *	alignment on a new platform, or some other effect.  To catch this sort
- *	of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
- *	the requested space whenever the request is less than the actual chunk
- *	size, and verifies that the byte is undamaged when the chunk is freed.
- *
- *
- *	About USE_VALGRIND and Valgrind client requests:
- *
- *	Valgrind provides "client request" macros that exchange information with
- *	the host Valgrind (if any).  Under !USE_VALGRIND, memdebug.h stubs out
- *	currently-used macros.
- *
- *	When running under Valgrind, we want a NOACCESS memory region both before
- *	and after the allocation.  The chunk header is tempting as the preceding
- *	region, but mcxt.c expects to able to examine the standard chunk header
- *	fields.  Therefore, we use, when available, the requested_size field and
- *	any subsequent padding.  requested_size is made NOACCESS before returning
- *	a chunk pointer to a caller.  However, to reduce client request traffic,
- *	it is kept DEFINED in chunks on the free list.
- *
- *	The rounded-up capacity of the chunk usually acts as a post-allocation
- *	NOACCESS region.  If the request consumes precisely the entire chunk,
- *	there is no such region; another chunk header may immediately follow.  In
- *	that case, Valgrind will not detect access beyond the end of the chunk.
- *
- *	See also the cooperating Valgrind client requests in mcxt.c.
- *
  *-
  */
 
@@ -296,10 +256,10 @@ static const unsigned char LogTable256[256] =
  */
 #ifdef HAVE_ALLOCINFO
 #define AllocFreeInfo(_cxt, _chunk) \
-			fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+			fprintf(stderr, "AllocFree: %s: %p, %lu\n", \
 (_cxt)->header.name, (_chunk), (_chunk)->size)
 #define 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-14 Thread Tomas Vondra

On 02/14/2017 03:22 AM, Andres Freund wrote:

Hi,

On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:

On 02/11/2017 02:33 AM, Andres Freund wrote:

I have a hard time believing the cache efficiency of linked lists
(which may or may not be real in this case) outweighs this, but if you
want to try, be my guest.


I'm not following - why would there be overhead in anything for
allocations bigger than 4 (or maybe 8) bytes? You can store the list
(via chunk ids, not pointers) inside the chunks itself, where data
otherwise would be.  And I don't see why you'd need a doubly linked
list, as the only operations that are needed are to push to the front of
the list, and to pop from the front of the list - and both operations
are simple to do with a singly linked list?



Oh! I have not considered storing the chunk indexes (for linked lists) in
the chunk itself, which obviously eliminates the overhead concerns, and
you're right a singly-linked list should be enough.

There will be some minimum-chunk-size requirement, depending on the block
size/chunk size. I wonder whether it makes sense to try to be smart and make
it dynamic, so that we only require 1B or 2B for cases when only that many
chunks fit into a block, or just say that it's 4B and be done with it.


I doubt it's worth it - it seems likely that the added branch is more
noticeable overall than the possible savings of 3 bytes. Also, won't the
space be lost due to alignment *anyway*?
+   /* chunk, including SLAB header (both addresses nicely aligned) */
+   fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
use a plain slist - no point in being more careful than that.



Hmm, I think you're right.




I mean 2^32 chunks ought to be enough for anyone, right?


Yea, that seems enough; but given the alignment thing pointed out above,
I think we can just use plain pointers - and that definitely should be
enough :P



People in year 2078: Why the hell did they only use 32 bits? Wasn't it 
obvious we'll have tiny computers with 32EB of RAM? ;-)





I'm still not buying the cache efficiency argument, though. One of
the reasons is that the implementation prefers blocks with fewer
free chunks when handling palloc(), so pfree() is making the block
less likely to be chosen by the next palloc().


That'll possibly de-optimize L1, but for L2 usage the higher density
seems like it'll be a win. All this memory is only accessed by a
single backend, so packing as densely as possible is good.



If so, if you have e.g. 8 byte allocations and 64kb sized blocks,
you end up with an array of 1024 doubly linked lists, which'll
take up 64kb on its own. And there a certainly scenarios where
even bigger block sizes could make sense. That's both memory
overhead, and runtime overhead, because at reset-time we'll have
to check the whole array (which'll presumably largely be empty
lists). Resetting is a pretty common path...



True, but it's not entirely clear if resetting is common for the
paths where we use those new allocators.


That's fair enough. There's still the memory overhead, but I guess
we can also live with that.



Right. My ambition was not to develop another general-purpose memory 
context that would work perfectly for everything, but something that 
works (better than the current code) for places like reorderbuffer.





Also, if we accept that it might be a problem, what other solution do you
propose? I don't think just merging everything into a single list is a good
idea, for the reasons I explained before (it might make the resets somewhat
less expensive, but it'll make pfree() more expensive).

>>


Now that I think about it, a binary heap, as suggested elsewhere, isn't
entirely trivial to use for this - it's more or less trivial to "fix"
the heap after changing an element's value, but it's harder to find that
element first.

But a two-level list approach seems like it could work quite well -
basically a simplified skip-list.  A top-level list contains blocks
that each have a distinct #free, with sub-lists hanging off those
entries for all the other blocks with the same number of free chunks.

I think that'd be a better implementation, but I can understand if you
don't immediately want to go there.



I don't want to go there. I'm not all that interested in reorderbuffer,
to be honest, and this started more as a "Hold my beer!" hack, after a
midnight discussion with Petr, than a seriously meant patch. I've
already spent like 100x more time on it than I expected.





What might work is replacing the array with a list, though. So we'd have a
list of lists, which would eliminate the array overhead.


That seems likely to be significantly worse, because a) iteration is
more expensive b) accessing the relevant list to move between two
different "freecount" lists would be O(n).



Oh, right, I hadn't realized we won't know the current head of the
list, 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-13 Thread Andres Freund
Hi,

On 2017-02-11 14:40:18 +0100, Tomas Vondra wrote:
> On 02/11/2017 02:33 AM, Andres Freund wrote:
> > > I have a hard time believing the cache efficiency of linked lists
> > > (which may or may not be real in this case) outweighs this, but if you
> > > want to try, be my guest.
> > 
> > I'm not following - why would there be overhead in anything for
> > allocations bigger than 4 (or maybe 8) bytes? You can store the list
> > (via chunk ids, not pointers) inside the chunks itself, where data
> > otherwise would be.  And I don't see why you'd need a doubly linked
> > list, as the only operations that are needed are to push to the front of
> > the list, and to pop from the front of the list - and both operations
> > are simple to do with a singly linked list?
> > 
> 
> Oh! I have not considered storing the chunk indexes (for linked lists) in
> the chunk itself, which obviously eliminates the overhead concerns, and
> you're right a singly-linked list should be enough.
>
> There will be some minimum-chunk-size requirement, depending on the block
> size/chunk size. I wonder whether it makes sense to try to be smart and make
> it dynamic, so that we only require 1B or 2B for cases when only that many
> chunks fit into a block, or just say that it's 4B and be done with it.

I doubt it's worth it - it seems likely that the added branch is more
noticeable overall than the possible savings of 3 bytes. Also, won't the
space be lost due to alignment *anyway*?
+   /* chunk, including SLAB header (both addresses nicely aligned) */
+   fullChunkSize = MAXALIGN(sizeof(SlabChunkData) + MAXALIGN(chunkSize));

In that case I'd just Assert(MAXIMUM_ALIGNOF >= sizeof(slist_head)) and
use a plain slist - no point in being more careful than that.


> I mean 2^32 chunks ought to be enough for anyone, right?

Yea, that seems enough; but given the alignment thing pointed out above,
I think we can just use plain pointers - and that definitely should be
enough :P


> I'm still not buying the cache efficiency argument, though. One of the
> reasons is that the implementation prefers blocks with fewer free chunks
> when handling palloc(), so pfree() is making the block less likely to be
> chosen by the next palloc().

That'll possibly de-optimize L1, but for L2 usage the higher density
seems like it'll be a win.  All this memory is only accessed by a single
backend, so packing as densely as possible is good.


> > If so, if you have e.g. 8 byte allocations and 64kb sized blocks, you
> > end up with an array of 1024 doubly linked lists, which'll take up 64kb
> > on its own.  And there a certainly scenarios where even bigger block
> > sizes could make sense.   That's both memory overhead, and runtime
> > overhead, because at reset-time we'll have to check the whole array
> > (which'll presumably largely be empty lists).  Resetting is a pretty
> > common path...
> > 
> 
> True, but it's not entirely clear if resetting is common for the paths where
> we use those new allocators.

That's fair enough.  There's still the memory overhead, but I guess we
can also live with that.


> Also, if we accept that it might be a problem, what other solution do you
> propose? I don't think just merging everything into a single list is a good
> idea, for the reasons I explained before (it might make the resets somewhat
> less expensive, but it'll make pfree() more expensive).

Now that I think about it, a binary heap, as suggested elsewhere, isn't
entirely trivial to use for this - it's more or less trivial to "fix"
the heap after changing an element's value, but it's harder to find that
element first.

But a two-level list approach seems like it could work quite well -
basically a simplified skip-list.  A top-level list contains blocks
that each have a distinct #free, with sub-lists hanging off those
entries for all the other blocks with the same number of free chunks.

I think that'd be a better implementation, but I can understand if you
don't immediately want to go there.
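
A rough sketch of the shape such a simplified skip-list could take (names
invented purely for illustration):

/* One bucket per distinct free-chunk count; each bucket carries all
 * blocks sharing that count. */
typedef struct BlockEntry
{
	struct BlockEntry *next;	/* sibling block with the same nfree */
	int			nfree;
} BlockEntry;

typedef struct FreeBucket
{
	struct FreeBucket *next;	/* bucket for the next distinct nfree */
	int			nfree;
	BlockEntry *blocks;			/* blocks with exactly this many free chunks */
} FreeBucket;

/* Since palloc/pfree change a block's nfree by exactly one, moving a
 * block only ever targets a neighbouring bucket (created on demand),
 * so the move stays O(1) instead of walking one flat sorted list. */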


> What might work is replacing the array with a list, though. So we'd have a
> list of lists, which would eliminate the array overhead.

That seems likely to be significantly worse, because a) iteration is
more expensive b) accessing the relevant list to move between two
different "freecount" lists would be O(n).

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-11 Thread Tomas Vondra

On 02/11/2017 02:33 AM, Andres Freund wrote:

On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:

On 02/09/2017 10:37 PM, Andres Freund wrote:

Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:

 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c | 115 +
 src/backend/utils/mmgr/memdebug.c | 131 ++
 src/include/utils/memdebug.h  |  22 +++
 4 files changed, 156 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c


I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of them
in a header as static inlines (not randomize, but the rest).



Do you have any numbers to support that? AFAICS compilers got really good at
inlining stuff on their own.


Unless you use LTO, they can't inline across translation units. And
using LTO is slow enough for linking that it's not that much fun to use,
as it makes compile-edit-compile cycles essentially take as long as a
full rebuild.



From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 30 Nov 2016 15:36:23 +0100
Subject: [PATCH 2/3] slab allocator

---
 src/backend/replication/logical/reorderbuffer.c |  82 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/slab.c   | 803 
 src/include/nodes/memnodes.h|   2 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   9 +


I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.



I rather dislike patches that only add a bunch of code, without actually
using it anywhere.



But if needed, this is trivial to do at commit time - just don't
commit the reorderbuffer bits.


Meh.



+ * Each block includes a simple bitmap tracking which chunks are used/free.
+ * This makes it trivial to check if all chunks on the block are free, and
+ * eventually free the whole block (which is almost impossible with a 
global
+ * freelist of chunks, storing chunks from all blocks).


Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?



A block-level counter would be enough to decide if all chunks on the block
are free, but it's not sufficient to identify which chunks are free /
available for reuse.

The bitmap only has a single bit per chunk, so I find "potentially long-ish"
is a bit misleading. Any linked list implementation will require much more
per-chunk overhead - as the chunks are fixed-legth, it's possible to use
chunk index (instead of 64-bit pointers), to save some space. But with large
blocks / small chunks that's still at least 2 or 4 bytes per index, and
you'll need two (to implement doubly-linked list, to make add/remove
efficient).



For example assume 8kB block and 64B chunks, i.e. 128 chunks. With bitmap
that's 16B to track all free space on the block. Doubly linked list would
require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B.



I have a hard time believing the cache efficiency of linked lists
(which may or may not be real in this case) outweighs this, but if you
want to try, be my guest.


I'm not following - why would there be overhead in anything for
allocations bigger than 4 (or maybe 8) bytes? You can store the list
(via chunk ids, not pointers) inside the chunks itself, where data
otherwise would be.  And I don't see why you'd need a doubly linked
list, as the only operations that are needed are to push to the front of
the list, and to pop from the front of the list - and both operations
are simple to do with a singly linked list?



Oh! I have not considered storing the chunk indexes (for linked lists) 
in the chunk itself, which obviously eliminates the overhead concerns, 
and you're right a singly-linked list should be enough.


There will be some minimum-chunk-size requirement, depending on the 
block size/chunk size. I wonder whether it makes sense to try to be 
smart and make it dynamic, so that we only require 1B or 2B for cases 
when only that many chunks fit into a block, or just say that it's 4B 
and be done with it.


I mean 2^32 chunks ought to be enough for anyone, right?

I'm still not buying the cache efficiency argument, though. One of the 
reasons is that the implementation prefers blocks with fewer free chunks 
when 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-10 Thread Andres Freund
On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:
> > > + /* move the whole block to the right place in the freelist */
> > > + dlist_delete(&block->node);
> > > + dlist_push_head(&set->freelist[block->nfree], &block->node);
> > 
> > Hm.  What if we, instead of the array of doubly linked lists, just kept
> > a single linked list of blocks, and keep that list sorted by number of
> > free chunks?  Given that freeing / allocation never changes the number
> > of allocated chunks by more than 1, we'll never have to move an entry
> > far in that list to keep it sorted.
> > 
> 
> Only assuming that there'll be only few blocks with the same number of free
> chunks. If that's not the case, you'll have to walk many blocks to move the
> block to the right place in the list. The array of lists handles such cases
> way more efficiently, and I think we should keep it.

The proper data structure would probably be a heap.  Right now
binaryheap.h is fixed-size - probably not too hard to change.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-10 Thread Andres Freund
On 2017-02-11 02:13:59 +0100, Tomas Vondra wrote:
> On 02/09/2017 10:37 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
> > >  src/backend/utils/mmgr/Makefile   |   2 +-
> > >  src/backend/utils/mmgr/aset.c | 115 +
> > >  src/backend/utils/mmgr/memdebug.c | 131 
> > > ++
> > >  src/include/utils/memdebug.h  |  22 +++
> > >  4 files changed, 156 insertions(+), 114 deletions(-)
> > >  create mode 100644 src/backend/utils/mmgr/memdebug.c
> > 
> > I'm a bit loath to move these to a .c file - won't this likely make
> > these debugging tools even slower?  Seems better to put some of them
> > in a header as static inlines (not randomize, but the rest).
> > 
> 
> Do you have any numbers to support that? AFAICS compilers got really good at
> inlining stuff on their own.

Unless you use LTO, they can't inline across translation units. And
using LTO is slow enough for linking that it's not that much fun to use,
as it makes compile-edit-compile cycles essentially take as long as a
full rebuild.


> > > From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> > > From: Tomas Vondra 
> > > Date: Wed, 30 Nov 2016 15:36:23 +0100
> > > Subject: [PATCH 2/3] slab allocator
> > > 
> > > ---
> > >  src/backend/replication/logical/reorderbuffer.c |  82 +--
> > >  src/backend/utils/mmgr/Makefile |   2 +-
> > >  src/backend/utils/mmgr/slab.c   | 803 
> > > 
> > >  src/include/nodes/memnodes.h|   2 +-
> > >  src/include/nodes/nodes.h   |   1 +
> > >  src/include/replication/reorderbuffer.h |  15 +-
> > >  src/include/utils/memutils.h|   9 +
> > 
> > I'd like to see the reorderbuffer changes split into a separate commit
> > from the slab allocator introduction.
> > 
> 
> I rather dislike patches that only add a bunch of code, without actually
> using it anywhere.

> But if needed, this is trivial to do at commit time - just don't
> commit the reorderbuffer bits.

Meh.


> > > + *   Each block includes a simple bitmap tracking which chunks are 
> > > used/free.
> > > + *   This makes it trivial to check if all chunks on the block are 
> > > free, and
> > > + *   eventually free the whole block (which is almost impossible 
> > > with a global
> > > + *   freelist of chunks, storing chunks from all blocks).
> > 
> > Why is checking a potentially somewhat long-ish bitmap better than a
> > simple counter, or a "linked list" of "next free chunk-number" or such
> > (where free chunks simply contain the id of the subsequent chunk)?
> > Using a list instead of a bitmap would also make it possible to get
> > 'lifo' behaviour, which is good for cache efficiency.  A simple
> > chunk-number based singly linked list would only imply a minimum
> > allocation size of 4 - that seems perfectly reasonable?
> > 
> 
> A block-level counter would be enough to decide if all chunks on the block
> are free, but it's not sufficient to identify which chunks are free /
> available for reuse.
> 
> The bitmap only has a single bit per chunk, so I find "potentially long-ish"
> is a bit misleading. Any linked list implementation will require much more
> per-chunk overhead - as the chunks are fixed-length, it's possible to use
> chunk index (instead of 64-bit pointers), to save some space. But with large
> blocks / small chunks that's still at least 2 or 4 bytes per index, and
> you'll need two (to implement doubly-linked list, to make add/remove
> efficient).

> For example assume 8kB block and 64B chunks, i.e. 128 chunks. With bitmap
> that's 16B to track all free space on the block. Doubly linked list would
> require 1B per chunk index, 2 indexes per chunk. That's 128*2 = 256B.

> I have a hard time believing the cache efficiency of linked lists
> (which may or may not be real in this case) outweighs this, but if you
> want to try, be my guest.

I'm not following - why would there be overhead in anything for
allocations bigger than 4 (or maybe 8) bytes? You can store the list
(via chunk ids, not pointers) inside the chunks itself, where data
otherwise would be.  And I don't see why you'd need a doubly linked
list, as the only operations that are needed are to push to the front of
the list, and to pop from the front of the list - and both operations
are simple to do with a singly linked list?


> > Thirdly, isn't that approach going to result in a quite long freelists
> > array, when you have small items and a decent blocksize? That seems like
> > a fairly reasonable thing to do?
> > 
> 
> I'm confused. Why wouldn't that be reasonable. Or rather, what would be a
> more reasonable way?

If I understood correctly, you have an array of doubly linked lists.
A block is stored in the list at the index #block's-free-elements.  Is that
right?

If so, if 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-10 Thread Tomas Vondra

On 02/09/2017 10:37 PM, Andres Freund wrote:

Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:

 src/backend/utils/mmgr/Makefile   |   2 +-
 src/backend/utils/mmgr/aset.c | 115 +
 src/backend/utils/mmgr/memdebug.c | 131 ++
 src/include/utils/memdebug.h  |  22 +++
 4 files changed, 156 insertions(+), 114 deletions(-)
 create mode 100644 src/backend/utils/mmgr/memdebug.c


I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of them
in a header as static inlines (not randomize, but the rest).



Do you have any numbers to support that? AFAICS compilers got really
good at inlining stuff on their own.





From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Wed, 30 Nov 2016 15:36:23 +0100
Subject: [PATCH 2/3] slab allocator

---
 src/backend/replication/logical/reorderbuffer.c |  82 +--
 src/backend/utils/mmgr/Makefile |   2 +-
 src/backend/utils/mmgr/slab.c   | 803 
 src/include/nodes/memnodes.h|   2 +-
 src/include/nodes/nodes.h   |   1 +
 src/include/replication/reorderbuffer.h |  15 +-
 src/include/utils/memutils.h|   9 +


I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.



I rather dislike patches that only add a bunch of code, without actually 
using it anywhere. But if needed, this is trivial to do at commit time - 
just don't commit the reorderbuffer bits.







+/*-
+ *
+ * slab.c
+ *   SLAB allocator definitions.
+ *
+ * SLAB is a custom MemoryContext implementation designed for cases of
+ * equally-sized objects.
+ *
+ *
+ * Portions Copyright (c) 2016, PostgreSQL Global Development Group


Bump, before a committer forgets it.



OK.




+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and 
various
+ * optimizations. Firstly, we can get rid of the doubling and carve the 
blocks
+ * into chunks of exactly the right size (plus alignment), not wasting 
memory.


Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.



OK, will reword.




+ * Each block includes a simple bitmap tracking which chunks are used/free.
+ * This makes it trivial to check if all chunks on the block are free, and
+ * eventually free the whole block (which is almost impossible with a 
global
+ * freelist of chunks, storing chunks from all blocks).


Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?



A block-level counter would be enough to decide if all chunks on the 
block are free, but it's not sufficient to identify which chunks are 
free / available for reuse.


The bitmap only has a single bit per chunk, so I find "potentially 
long-ish" a bit misleading. Any linked list implementation will 
require much more per-chunk overhead - as the chunks are fixed-length, 
it's possible to use a chunk index (instead of 64-bit pointers) to save 
some space. But with large blocks / small chunks that's still at least 2 
or 4 bytes per index, and you'll need two (to implement a doubly-linked 
list, to make add/remove efficient).


For example assume 8kB block and 64B chunks, i.e. 128 chunks. With 
bitmap that's 16B to track all free space on the block. Doubly linked 
list would require 1B per chunk index, 2 indexes per chunk. That's 128*2 
= 256B.
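
Generalizing the arithmetic, for a block with N chunks the per-block
bookkeeping costs are roughly:

/*
 * bitmap:              N / 8 bytes               (1 bit per chunk)
 * doubly linked list:  2 * sizeof(index) * N     (prev + next per chunk)
 *
 * e.g. N = 128 with 1-byte indexes: 128 / 8 = 16B vs. 2 * 128 = 256B
 */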


I have a hard time believing that the cache efficiency of linked lists 
(which may or may not be real in this case) outweighs this, but if you 
want to try, be my guest.





+ * At the context level, we use 'freelist' to track blocks ordered by 
number
+ * of free chunks, starting with blocks having a single allocated chunk, 
and
+ * with completely full blocks on the tail.


Why that way round? Filling chunks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks?  Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?



Possibly. Naming things is hard.

>

Thirdly, isn't that approach going to result in a quite long freelists
array, when 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-02-09 Thread Andres Freund
Hi,

On 2016-12-13 01:45:13 +0100, Tomas Vondra wrote:
>  src/backend/utils/mmgr/Makefile   |   2 +-
>  src/backend/utils/mmgr/aset.c | 115 +
>  src/backend/utils/mmgr/memdebug.c | 131 
> ++
>  src/include/utils/memdebug.h  |  22 +++
>  4 files changed, 156 insertions(+), 114 deletions(-)
>  create mode 100644 src/backend/utils/mmgr/memdebug.c

I'm a bit loath to move these to a .c file - won't this likely make
these debugging tools even slower?  Seems better to put some of
them in a header as static inlines (not randomize, but the rest).


> From 43aaabf70b979b172fd659ef4d0ef129fd78d72d Mon Sep 17 00:00:00 2001
> From: Tomas Vondra 
> Date: Wed, 30 Nov 2016 15:36:23 +0100
> Subject: [PATCH 2/3] slab allocator
> 
> ---
>  src/backend/replication/logical/reorderbuffer.c |  82 +--
>  src/backend/utils/mmgr/Makefile |   2 +-
>  src/backend/utils/mmgr/slab.c   | 803 
> 
>  src/include/nodes/memnodes.h|   2 +-
>  src/include/nodes/nodes.h   |   1 +
>  src/include/replication/reorderbuffer.h |  15 +-
>  src/include/utils/memutils.h|   9 +

I'd like to see the reorderbuffer changes split into a separate commit
from the slab allocator introduction.




> +/*-
> + *
> + * slab.c
> + * SLAB allocator definitions.
> + *
> + * SLAB is a custom MemoryContext implementation designed for cases of
> + * equally-sized objects.
> + *
> + *
> + * Portions Copyright (c) 2016, PostgreSQL Global Development Group

Bump, before a committer forgets it.


> + * IDENTIFICATION
> + * src/backend/utils/mmgr/slab.c
> + *
> + *
> + *   The constant allocation size allows significant simplification and 
> various
> + *   optimizations. Firstly, we can get rid of the doubling and carve the 
> blocks
> + *   into chunks of exactly the right size (plus alignment), not wasting 
> memory.

Getting rid of it relative to what? I'd try to phrase it so these
comments stand on their own.


> + *   Each block includes a simple bitmap tracking which chunks are used/free.
> + *   This makes it trivial to check if all chunks on the block are free, and
> + *   eventually free the whole block (which is almost impossible with a 
> global
> + *   freelist of chunks, storing chunks from all blocks).

Why is checking a potentially somewhat long-ish bitmap better than a
simple counter, or a "linked list" of "next free chunk-number" or such
(where free chunks simply contain the id of the subsequent chunk)?
Using a list instead of a bitmap would also make it possible to get
'lifo' behaviour, which is good for cache efficiency.  A simple
chunk-number based singly linked list would only imply a minimum
allocation size of 4 - that seems perfectly reasonable?


> + *   At the context level, we use 'freelist' to track blocks ordered by 
> number
> + *   of free chunks, starting with blocks having a single allocated chunk, 
> and
> + *   with completely full blocks on the tail.

Why that way round? Filling chunks up as much as possible is good for
cache and TLB efficiency, and allows for earlier de-allocation of
partially used blocks?  Oh, I see you do that in the next comment,
but it still leaves me wondering.

Also, is this actually a list? It's more an array of lists, right?
I.e. it should be named freelists?

Thirdly, isn't that approach going to result in a quite long freelists
array, when you have small items and a decent blocksize? That seems like
a fairly reasonable thing to do?


> + *   This also allows various optimizations - for example when searching for
> + *   free chunk, the allocator reuses space from the most full blocks first,
> + *   in the hope that some of the less full blocks will get completely empty
> + *   (and returned back to the OS).

Might be worth mentioning tlb/cache efficiency too.


> + *   For each block, we maintain pointer to the first free chunk - this is 
> quite
> + *   cheap and allows us to skip all the preceding used chunks, eliminating
> + *   a significant number of lookups in many common usage patterns. In the worst
> + *   case this performs as if the pointer was not maintained.

Hm, so that'd be eliminated if we maintained a linked list of chunks (by
chunk number) and a free_chunk_cnt or such.


> +
> +#include "postgres.h"
> +
> +#include "utils/memdebug.h"
> +#include "utils/memutils.h"
> +#include "lib/ilist.h"

Move ilist up, above memdebug, so the list is alphabetically ordered.


> +/*
> + * SlabPointer
> + *   Aligned pointer which may be a member of an allocation set.
> + */
> +typedef void *SlabPointer;
> +typedef SlabContext *Slab;

I personally won't commit this with pointer hiding typedefs.  If
somebody else does, I can live with it, but for me it's in bad enough taste
that I won't.


> 

Re: [HACKERS] PATCH: two slab-like memory allocators

2017-01-31 Thread Michael Paquier
On Tue, Dec 13, 2016 at 10:32 AM, Petr Jelinek
 wrote:
> Okay, this version looks good to me, marked as RfC.

The patches still apply, moved to CF 2017-03 with same status: RfC.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-12 Thread Petr Jelinek
On 13/12/16 01:45, Tomas Vondra wrote:
> On 12/12/2016 11:39 PM, Tomas Vondra wrote:
>> On 12/12/2016 05:05 AM, Petr Jelinek wrote:
>>>
>>> I'd be happy with this patch now (as in committer ready) except that it
>>> does have some merge conflicts after the recent commits, so rebase is
>>> needed.
>>>
>>
>> Attached is a rebased version of the patch, resolving the Makefile merge
>> conflicts.
>>
> 
> Meh, managed to rebase a wrong branch, missing fix to the off-by-one
> error (fixed v6). Attached is v8, hopefully the correct one.
> 

Okay, this version looks good to me, marked as RfC.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-12 Thread Tomas Vondra

On 12/12/2016 11:39 PM, Tomas Vondra wrote:

On 12/12/2016 05:05 AM, Petr Jelinek wrote:


I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.



Attached is a rebased version of the patch, resolving the Makefile merge
conflicts.



Meh, managed to rebase a wrong branch, missing fix to the off-by-one 
error (fixed v6). Attached is v8, hopefully the correct one.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-move-common-bits-to-memdebug-v8.patch
Description: binary/octet-stream


0002-slab-allocator-v8.patch
Description: binary/octet-stream


0003-generational-context-v8.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-12 Thread Tomas Vondra

On 12/12/2016 05:05 AM, Petr Jelinek wrote:


I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.



Attached is a rebased version of the patch, resolving the Makefile merge 
conflicts.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v7.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-11 Thread Petr Jelinek
On 01/12/16 03:26, Tomas Vondra wrote:
> 
> Dne 11/27/2016 v 11:02 PM Andres Freund napsal(a):
>> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
>>> On 27/11/16 21:47, Andres Freund wrote:
 Hi,

>> +typedef struct SlabBlockData *SlabBlock;/* forward
>> reference */
>> +typedef struct SlabChunkData *SlabChunk;
>>
>> Can we please not continue hiding pointers behind typedefs? It's a
>> bad
>> pattern, and that it's fairly widely used isn't a good excuse to
>> introduce further usages of it.
>>
>
> Why is it a bad pattern?

 It hides what is passed by reference, and what by value, and it
 makes it
 a guessing game whether you need -> or . since you don't know whether
 it's a pointer or the actual object. All to save a * in parameter and
 variable declaration?...

>>>
>>> FWIW I don't like that pattern either although it's used in many
>>> parts of our code-base.
>>
>> But relatively few new ones, most of it is pretty old.
>>
> 
> I do agree it's not particularly pretty pattern, but in this case it's
> fairly isolated in the mmgr sources, and I quite value the consistency
> in this part of the code (i.e. that aset.c, slab.c and generation.c all
> use the same approach). So I haven't changed this.
> 
> The attached v7 fixes the off-by-one error in slab.c, causing failures
> in test_decoding isolation tests, and renames Gen to Generation, as
> proposed by Petr.
> 

I'd be happy with this patch now (as in committer ready) except that it
does have some merge conflicts after the recent commits, so rebase is
needed.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-12-01 Thread Haribabu Kommi
On Thu, Dec 1, 2016 at 1:26 PM, Tomas Vondra 
wrote:

>
>
> Dne 11/27/2016 v 11:02 PM Andres Freund napsal(a):
>
>> On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
>>
>>> On 27/11/16 21:47, Andres Freund wrote:
>>>
 Hi,

 +typedef struct SlabBlockData *SlabBlock;   /* forward
>> reference */
>> +typedef struct SlabChunkData *SlabChunk;
>>
>> Can we please not continue hiding pointers behind typedefs? It's a bad
>> pattern, and that it's fairly widely used isn't a good excuse to
>> introduce further usages of it.
>>
>>
> Why is it a bad pattern?
>

 It hides what is passed by reference, and what by value, and it makes it
 a guessing game whether you need -> or . since you don't know whether
 it's a pointer or the actual object. All to save a * in parameter and
 variable declaration?...


>>> FWIW I don't like that pattern either although it's used in many
>>> parts of our code-base.
>>>
>>
>> But relatively few new ones, most of it is pretty old.
>>
>>
> I do agree it's not particularly pretty pattern, but in this case it's
> fairly isolated in the mmgr sources, and I quite value the consistency in
> this part of the code (i.e. that aset.c, slab.c and generation.c all use
> the same approach). So I haven't changed this.
>
> The attached v7 fixes the off-by-one error in slab.c, causing failures in
> test_decoding isolation tests, and renames Gen to Generation, as proposed
> by Petr.
>

Moved to next CF with same status (needs review).

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-30 Thread Tomas Vondra



Dne 11/27/2016 v 11:02 PM Andres Freund napsal(a):

On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:

On 27/11/16 21:47, Andres Freund wrote:

Hi,


+typedef struct SlabBlockData *SlabBlock;   /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.



Why is it a bad pattern?


It hides what is passed by reference, and what by value, and it makes it
a guessing game whether you need -> or . since you don't know whether
it's a pointer or the actual object. All to save a * in parameter and
variable declaration?...



FWIW I don't like that pattern either although it's used in many
parts of our code-base.


But relatively few new ones, most of it is pretty old.



I do agree it's not particularly pretty pattern, but in this case it's 
fairly isolated in the mmgr sources, and I quite value the consistency 
in this part of the code (i.e. that aset.c, slab.c and generation.c all 
use the same approach). So I haven't changed this.


The attached v7 fixes the off-by-one error in slab.c, causing failures 
in test_decoding isolation tests, and renames Gen to Generation, as 
proposed by Petr.


regards
Tomas




slab-allocators-v7.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-27 Thread Andres Freund
On 2016-11-27 22:21:49 +0100, Petr Jelinek wrote:
> On 27/11/16 21:47, Andres Freund wrote:
> > Hi,
> > 
> >>> +typedef struct SlabBlockData *SlabBlock; /* forward reference */
> >>> +typedef struct SlabChunkData *SlabChunk;
> >>>
> >>> Can we please not continue hiding pointers behind typedefs? It's a bad
> >>> pattern, and that it's fairly widely used isn't a good excuse to
> >>> introduce further usages of it.
> >>>
> >>
> >> Why is it a bad pattern?
> > 
> > It hides what is passed by reference, and what by value, and it makes it
> > a guessing game whether you need -> or . since you don't know whether
> > it's a pointer or the actual object. All to save a * in parameter and
> > variable declaration?...
> > 
> 
> FWIW I don't like that pattern either although it's used in many parts
> of our code-base.

But relatively few new ones, most of it is pretty old.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-27 Thread Petr Jelinek
On 27/11/16 21:47, Andres Freund wrote:
> Hi,
> 
>>> +typedef struct SlabBlockData *SlabBlock;   /* forward reference */
>>> +typedef struct SlabChunkData *SlabChunk;
>>>
>>> Can we please not continue hiding pointers behind typedefs? It's a bad
>>> pattern, and that it's fairly widely used isn't a good excuse to
>>> introduce further usages of it.
>>>
>>
>> Why is it a bad pattern?
> 
> It hides what is passed by reference, and what by value, and it makes it
> a guessing game whether you need -> or . since you don't know whether
> it's a pointer or the actual object. All to save a * in parameter and
> variable declaration?...
> 

FWIW I don't like that pattern either although it's used in many parts
of our code-base.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-27 Thread Andres Freund
Hi,

> > +typedef struct SlabBlockData *SlabBlock;   /* forward reference */
> > +typedef struct SlabChunkData *SlabChunk;
> >
> > Can we please not continue hiding pointers behind typedefs? It's a bad
> > pattern, and that it's fairly widely used isn't a good excuse to
> > introduce further usages of it.
> >
>
> Why is it a bad pattern?

It hides what is passed by reference, and what by value, and it makes it
a guessing game whether you need -> or . since you don't know whether
it's a pointer or the actual object. All to save a * in parameter and
variable declaration?...

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-27 Thread Tomas Vondra

On 11/27/2016 07:25 PM, Petr Jelinek wrote:

On 15/11/16 01:44, Tomas Vondra wrote:

Attached is v6 of the patch series, fixing most of the points:

* common bits (valgrind/randomization/wipe) moved to memdebug.h/c

Instead of introducing a new header file, I've added the prototypes to
memdebug.h (which was already used for the valgrind stuff anyway), and
the implementations to a new memdebug.c file. Not sure what you meant by
"static inlines" though.


I think Andres wanted to put the implementation to the static inline
functions directly in the header (see parts of pg_list or how atomics
work), but I guess you way works too.



I see. Well turning that into static inlines just like in pg_list is 
possible. I guess the main reason is performance - for pg_list that 
probably makes sense, but the memory randomization/valgrind stuff is 
only ever used for debugging, which already does a lot of expensive 
stuff anyway.
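
For illustration, a header-based variant of one of these helpers might look
roughly like this (a sketch following the existing wipe-on-free pattern, not
the actual patch):

static inline void
wipe_mem(void *ptr, size_t size)
{
	/* mark writable, overwrite with a recognizable pattern, then poison */
	VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
	memset(ptr, 0x7F, size);
	VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
}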




I've however kept SlabContext->freelist as an array, because there may
be many blocks with the same number of free chunks, in which case moving
the block in the list would be expensive. This way it's simply
dlist_delete + dlist_push.



+1

I get mxact isolation test failures in test_decoding with this version
of patch:
  step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING:  problem in slab TXN: number of free chunks 33 in block
0x22beba0 does not match bitmap 34
  step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
  data
and
  step s0alter: ALTER TABLE do_write ADD column ts timestamptz;
  step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING:  problem in slab TXN: number of free chunks 33 in block
0x227c3f0 does not match bitmap 34
  step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
  data



D'oh! I believe this is a simple thinko in SlabCheck, which iterates 
over chunks like this:


for (j = 0; j <= slab->chunksPerBlock; j++)
...

which is of course an off-by-one error (and the 33 vs. 34 error message is 
consistent with this theory).
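
The fix is simply to make the upper bound exclusive:

    for (j = 0; j < slab->chunksPerBlock; j++)
        ...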




Also, let's just rename the Gen to Generation.



OK.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-27 Thread Petr Jelinek
On 15/11/16 01:44, Tomas Vondra wrote:
> Attached is v6 of the patch series, fixing most of the points:
> 
> * common bits (valgrind/randomization/wipe) moved to memdebug.h/c
> 
> Instead of introducing a new header file, I've added the prototypes to
> memdebug.h (which was already used for the valgrind stuff anyway), and
> the implementations to a new memdebug.c file. Not sure what you meant by
> "static inlines" though.

I think Andres wanted to put the implementation to the static inline
functions directly in the header (see parts of pg_list or how atomics
work), but I guess your way works too.

> 
> I've however kept SlabContext->freelist as an array, because there may
> be many blocks with the same number of free chunks, in which case moving
> the block in the list would be expensive. This way it's simply
> dlist_delete + dlist_push.
> 

+1

I get mxact isolation test failures in test_decoding with this version
of patch:
  step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING:  problem in slab TXN: number of free chunks 33 in block
0x22beba0 does not match bitmap 34
  step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
  data
and
  step s0alter: ALTER TABLE do_write ADD column ts timestamptz;
  step s0w: INSERT INTO do_write DEFAULT VALUES;
+ WARNING:  problem in slab TXN: number of free chunks 33 in block
0x227c3f0 does not match bitmap 34
  step s0start: SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', 'false');
  data


Also, let's just rename the Gen to Generation.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-14 Thread Tomas Vondra

Attached is v6 of the patch series, fixing most of the points:

* common bits (valgrind/randomization/wipe) moved to memdebug.h/c

Instead of introducing a new header file, I've added the prototypes to 
memdebug.h (which was already used for the valgrind stuff anyway), and 
the implementations to a new memdebug.c file. Not sure what you meant by 
"static inlines" though.


So the patch series now has three parts - 0001 with memdebug stuff, 0002 
with slab and 0003 with gen (still a poor name).


* removing AllocSet references from both new memory contexts

* using FLEXIBLE_ARRAY_ELEMENT in SlabContext

* using dlist instead of the custom linked list

I've however kept SlabContext->freelist as an array, because there may 
be many blocks with the same number of free chunks, in which case moving 
the block in the list would be expensive. This way it's simply 
dlist_delete + dlist_push.
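
To sketch what that bin move looks like (assuming a dlist_node embedded in 
the block header and these hypothetical field names), freeing one chunk 
relocates the block in O(1):

	block->nfree++;
	dlist_delete(&block->node);	/* unlink from freelist[nfree - 1] */
	dlist_push_head(&slab->freelist[block->nfree], &block->node);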


* use StandardChunkHeader instead of the common fields

* removing pointer to context from block header for both contexts

* fix format in FreeInfo/AllocInfo (including for AllocSet)

* improved a bunch of comments (bitmap size, chunksPerBlock formula)

* did a pgindent run on the patch

* implement the missing methods in Gen (Stats/Check)

* fix a few minor bugs in both contexts

I haven't done anything with hiding pointers behind typedefs, because I 
don't know what's so wrong about that.


I also haven't done anything with the bitmap access in SlabAlloc - I 
haven't found any reasonable case when it would be measurable, and I 
don't expect this to be even measurable in practice.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v6.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-13 Thread Tomas Vondra

On 11/12/2016 08:12 PM, Andres Freund wrote:

Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and 
various
+ * optimizations that are not possible in AllocSet. Firstly, we can get rid
+ * of the doubling and carve the blocks into chunks of exactly the right 
size
+ * (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can quite easily be made
without referring to a concrete reference to behaviour elsewhere.



Yeah, that's probably true.


+ *
+ * At the context level, we use 'freelist' array to track blocks grouped by
+ * number of free chunks. For example freelist[0] is a list of completely 
full
+ * blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?


So there'd be one huge list of blocks, sorted by the number of empty 
chunks? Hmm, that might work I guess.


I don't think the combination of large blocks with small allocations is 
particularly sensible, though - what exactly would be the benefit of 
such combination? I would even consider enforcing some upper limit on 
the number of chunks per block - say, 256, for example.





+ * About MEMORY_CONTEXT_CHECKING:
+ *
+ * Since we usually round request sizes up to the next power of 2, there
+ * is often some unused space immediately after a requested data
area.

I assume the "round up" stuff is copy-paste?



Yeah, sorry about that.



+ * Thus, if someone makes the common error of writing past what they've
+ * requested, the problem is likely to go unnoticed ... until the day when
+ * there *isn't* any wasted space, perhaps because of different memory
+ * ...
+ *
+ * See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.



Yes, I planned to do that for the next version of patch. Laziness.



+typedef struct SlabBlockData *SlabBlock;   /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.



Why is it a bad pattern?


+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+   MemoryContextData header;   /* Standard memory-context fields */
+   /* Allocation parameters for this context: */
+   SizechunkSize;  /* chunk size */
+   SizefullChunkSize;  /* chunk size including header and 
alignment */
+   SizeblockSize;  /* block size */
+   int chunksPerBlock; /* number of chunks per block */
+   int minFreeChunks;  /* min number of free chunks in 
any block */
+   int nblocks;/* number of blocks 
allocated */
+   /* Info about storage allocated in this context: */
+   SlabBlock   freelist[1];/* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.



Will fix, thanks.


+/*
+ * SlabBlockData
+ * Structure of a single block in SLAB allocator.
+ *
+ * slab: context owning this block

What do we need this for?



You're right the pointer to the owning context is unnecessary - there's 
nothing like "standard block header" and we already have the pointer in 
the standard chunk header. But maybe keeping the pointer at least with 
MEMORY_CONTEXT_CHECKING would be a good idea?



+ * prev, next: used for doubly-linked list of blocks in global freelist

I'd prefer using an embedded list here (cf. ilist.h).



Makes sense.


+/*
+ * SlabChunk
+ * The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ * However it's possible to add fields in front of the StandardChunkHeader fields,
+ * which is used to add pointer to the block.
+ */


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-11-12 Thread Andres Freund
Hi,

Subject: [PATCH 1/2] slab allocator

diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 6ad7e7d..520f295 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c

I'd rather have that in a separate followup commit...


+ * IDENTIFICATION
+ *   src/backend/utils/mmgr/slab.c
+ *
+ *
+ * The constant allocation size allows significant simplification and 
various
+ * optimizations that are not possible in AllocSet. Firstly, we can get rid
+ * of the doubling and carve the blocks into chunks of exactly the right 
size
+ * (plus alignment), not wasting memory.

References to AllocSet aren't necessarily a good idea, they'll quite
possibly get out of date. The argument can quite easily be made
without referring to a concrete reference to behaviour elsewhere.

+ *
+ * At the context level, we use 'freelist' array to track blocks grouped by
+ * number of free chunks. For example freelist[0] is a list of completely 
full
+ * blocks, freelist[1] is a block with a single free chunk, etc.

Hm. Those arrays are going to be quite large for small allocations w/
big blocks (an imo sensible combination). Maybe it'd make more sense to
model it as a linked list of blocks? Full blocks are at one end, empty
ones at the other?


+ * About MEMORY_CONTEXT_CHECKING:
+ *
+ * Since we usually round request sizes up to the next power of 2, there
+ * is often some unused space immediately after a requested data
area.

I assume the "round up" stuff is copy-paste?


+ * Thus, if someone makes the common error of writing past what they've
+ * requested, the problem is likely to go unnoticed ... until the day when
+ * there *isn't* any wasted space, perhaps because of different memory
+ * alignment on a new platform, or some other effect.  To catch this sort
+ * of problem, the MEMORY_CONTEXT_CHECKING option stores 0x7E just beyond
+ * the requested space whenever the request is less than the actual chunk
+ * size, and verifies that the byte is undamaged when the chunk is freed.
+ *
+ *
+ * About USE_VALGRIND and Valgrind client requests:
+ *
+ * Valgrind provides "client request" macros that exchange information with
+ * the host Valgrind (if any).  Under !USE_VALGRIND, memdebug.h stubs out
+ * currently-used macros.
+ *
+ * When running under Valgrind, we want a NOACCESS memory region both 
before
+ * and after the allocation.  The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to be able to examine the standard chunk header
+ * fields.  Therefore, we use, when available, the requested_size field and
+ * any subsequent padding.  requested_size is made NOACCESS before 
returning
+ * a chunk pointer to a caller.  However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.
+ *
+ * The rounded-up capacity of the chunk usually acts as a post-allocation
+ * NOACCESS region.  If the request consumes precisely the entire chunk,
+ * there is no such region; another chunk header may immediately follow.  
In
+ * that case, Valgrind will not detect access beyond the end of the chunk.
+ *
+ * See also the cooperating Valgrind client requests in mcxt.c.

I think we need a preliminary patch moving a lot of this into something
like mcxt_internal.h. Copying this comment, and a lot of the utility
functions, into every memory context implementation is a bad pattern.


+typedef struct SlabBlockData *SlabBlock;   /* forward reference */
+typedef struct SlabChunkData *SlabChunk;

Can we please not continue hiding pointers behind typedefs? It's a bad
pattern, and that it's fairly widely used isn't a good excuse to
introduce further usages of it.

+/*
+ * SlabContext is a specialized implementation of MemoryContext.
+ */
+typedef struct SlabContext
+{
+   MemoryContextData header;   /* Standard memory-context fields */
+   /* Allocation parameters for this context: */
+   SizechunkSize;  /* chunk size */
+   SizefullChunkSize;  /* chunk size including header and 
alignment */
+   SizeblockSize;  /* block size */
+   int chunksPerBlock; /* number of chunks per block */
+   int minFreeChunks;  /* min number of free chunks in 
any block */
+   int nblocks;/* number of blocks 
allocated */
+   /* Info about storage allocated in this context: */
+   SlabBlock   freelist[1];/* free lists (block-level) */

I assume this is a variable-length array? If so, that a) should be
documented b) use FLEXIBLE_ARRAY_MEMBER as length - not doing so
actually will cause compiler warnings and potential misoptimizations.

+/*
+ * SlabBlockData
+ * Structure 

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Jim Nasby

On 10/25/16 4:48 PM, Tomas Vondra wrote:

The main issue that bugs me is the name of the Gen allocator, but I
don't have a good naming ideas :( The basic characteristics of Gen is
that it does not reuse space released by pfree(), relying on the fact
that the whole block will become free. That should be reflected in the
name somehow, I guess.


OneTime? OneUse? OneShot? AllocOnce?

OneHitWonder? ;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Jim Nasby

On 10/1/16 7:34 PM, Tomas Vondra wrote:

+/* otherwise add it to the proper freelist bin */
Looks like something went missing... :)



Ummm? The patch contains this:

+/* otherwise add it to the proper freelist bin */
+if (set->freelist[block->nfree])
+set->freelist[block->nfree]->prev = block;
+
+block->next = set->freelist[block->nfree];
+set->freelist[block->nfree] = block;

Which does exactly the thing it should do. Or what is missing?


What's confusing is the "otherwise" right at the beginning of the function:

+static void
+add_to_freelist(Slab set, SlabBlock block)
+{
+   /* otherwise add it to the proper freelist bin */
+   if (set->freelist[block->nfree])
+   set->freelist[block->nfree]->prev = block;
+
+   block->next = set->freelist[block->nfree];
+   set->freelist[block->nfree] = block;
+}

Otherwise what? What's the other option?


(Haven't looked at the newer patch, so maybe this isn't an issue anymore.)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-25 Thread Tomas Vondra

On 10/23/2016 05:26 PM, Petr Jelinek wrote:

On 23/10/16 16:26, Tomas Vondra wrote:

On 10/22/2016 08:30 PM, Tomas Vondra wrote:
...
Moreover, the slab/gen allocators proposed here seem like a better
fit for reorderbuffer, e.g. because they release memory. I haven't
looked at sb_alloc too closely, but I think it behaves more like
AllocSet in this regard (i.e. keeping the memory indefinitely).



For reorderbuffer, from what I've seen in practice, I'd prefer
proper freeing to a 5% performance gain, as I've seen walsenders taking GBs
of memory due to reorderbuffer allocations that are never properly
freed.



Right.

>

About your actual patch. I do like both the Slab and the Gen allocators
and think that we should proceed with them for the moment. You
definitely need to rename the Gen one (don't ask me to what though) as
it sounds like "generic" and do some finishing touches but I think it's
the way to go. I don't see any point in GenSlab anymore.



Attached is a v5 of the patch that does this i.e. throws away the 
GenSlab allocator and modifies reorderbuffer in two steps.


First (0001) it adds Slab allocator for TXN/Change allocations, and 
keeps the local slab cache for TupleBuf allocations (with a separate 
AllocSet context).


Then (in 0002) it adds the Gen allocator for TupleBuf, removing the last 
bits of the local slab cache.


I do think this version is is as simple as it gets - there's not much 
more we could simplify / remove.


The main issue that bugs me is the name of the Gen allocator, but I 
don't have a good naming ideas :( The basic characteristics of Gen is 
that it does not reuse space released by pfree(), relying on the fact 
that the whole block will become free. That should be reflected in the 
name somehow, I guess.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v5.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-23 Thread Petr Jelinek
On 23/10/16 16:26, Tomas Vondra wrote:
> On 10/22/2016 08:30 PM, Tomas Vondra wrote:
>> On 10/20/2016 04:43 PM, Robert Haas wrote:
>>>
>>> ...
>>>
>>> The sb_alloc allocator I proposed a couple of years ago would work
>>> well for this case, I think.
>>>
>>
>> Maybe, but it does not follow the Memory Context design at all, if I
>> understand it correctly. I was willing to give it a spin anyway and see
>> how it compares to the two other allocators, but this is a significant
>> paradigm shift and certainly much larger step than what I proposed.
>>
>> I'm not even sure it's possible to implement a MemoryContext based on
>> the same ideas as sb_alloc(), because one of the important points of
>> sb_alloc design seems to be throwing away the chunk header. While that
>> may be possible, it would certainly affect the whole tree (not just the
>> reorderbuffer bit), and it'd require way more work.
>>
>> Moreover, the two allocators I proposed significantly benefit from the
>> "same lifespan" assumption. I don't think sb_alloc can do that.
>>
> 
> I've given the sb_alloc patch another try - essentially hacking it into
> reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's
> faster than the allocators discussed in this thread. Based on a few very
> quick tests on my laptop, the difference is usually ~5-10%.
> 
> That might seem like a significant improvement, but it's negligible
> compared to the "master -> slab/gen" improvement, which improves
> performance by orders of magnitude (at least for the tested cases).
> 
> Moreover, the slab/gen allocators proposed here seem like a better fit
> for reorderbuffer, e.g. because they release memory. I haven't looked at
> sb_alloc too closely, but I think it behaves more like AllocSet in this
> regard (i.e. keeping the memory indefinitely).
> 

For reorderbuffer, from what I've seen in practice, I'd prefer proper
freeing to a 5% performance gain, as I've seen walsenders taking GBs of memory
due to reorderbuffer allocations that are never properly freed.

About your actual patch. I do like both the Slab and the Gen allocators
and think that we should proceed with them for the moment. You
definitely need to rename the Gen one (don't ask me to what though) as
it sounds like "generic" and do some finishing touches but I think it's
the way to go. I don't see any point in GenSlab anymore.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-23 Thread Tomas Vondra

On 10/22/2016 08:30 PM, Tomas Vondra wrote:

On 10/20/2016 04:43 PM, Robert Haas wrote:

>>

...

The sb_alloc allocator I proposed a couple of years ago would work
well for this case, I think.



Maybe, but it does not follow the Memory Context design at all, if I
understand it correctly. I was willing to give it a spin anyway and see
how it compares to the two other allocators, but this is a significant
paradigm shift and certainly much larger step than what I proposed.

I'm not even sure it's possible to implement a MemoryContext based on
the same ideas as sb_alloc(), because one of the important points of
sb_alloc design seems to be throwing away the chunk header. While that
may be possible, it would certainly affect the whole tree (not just the
reorderbuffer bit), and it'd require way more work.

Moreover, the two allocators I proposed significantly benefit from the
"same lifespan" assumption. I don't think sb_alloc can do that.



I've given the sb_alloc patch another try - essentially hacking it into 
reorderbuffer, ignoring the issues mentioned yesterday. And yes, it's 
faster than the allocators discussed in this thread. Based on a few very 
quick tests on my laptop, the difference is usually ~5-10%.


That might seem like a significant improvement, but it's negligible 
compared to the "master -> slab/gen" improvement, which improves 
performance by orders of magnitude (at least for the tested cases).


Moreover, the slab/gen allocators proposed here seem like a better fit 
for reorderbuffer, e.g. because they release memory. I haven't looked at 
sb_alloc too closely, but I think it behaves more like AllocSet in this 
regard (i.e. keeping the memory indefinitely).


FWIW I'm not making any conclusions about sb_alloc benefits outside 
reorderbuffer.c - it might easily be worth pursuing, no doubt about 
that. The amount of remaining work, however, seems quite high.


Attached is the modified sb_alloc patch that I used - it's mostly v1 
with removed uses in nbtree etc. FWIW the patch does not implement 
sb_destroy_private_allocator (it's only defined in the header), which 
seems like a bug.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/test_freepage/Makefile b/contrib/test_freepage/Makefile
new file mode 100644
index 000..b482fe9
--- /dev/null
+++ b/contrib/test_freepage/Makefile
@@ -0,0 +1,17 @@
+# contrib/test_freepage/Makefile
+
+MODULES = test_freepage
+
+EXTENSION = test_freepage
+DATA = test_freepage--1.0.sql
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/test_freepage
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/test_freepage/test_freepage--1.0.sql b/contrib/test_freepage/test_freepage--1.0.sql
new file mode 100644
index 000..5d3191e
--- /dev/null
+++ b/contrib/test_freepage/test_freepage--1.0.sql
@@ -0,0 +1,15 @@
+/* contrib/test_freepage/test_freepage--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION test_freepage" to load this file. \quit
+
+CREATE FUNCTION init(size pg_catalog.int8) RETURNS pg_catalog.void
+	AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
+CREATE FUNCTION get(pages pg_catalog.int8) RETURNS pg_catalog.int8
+	AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
+CREATE FUNCTION inquire_largest() RETURNS pg_catalog.int8
+	AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
+CREATE FUNCTION put(first_page pg_catalog.int8, npages pg_catalog.int8)
+	RETURNS pg_catalog.void AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
+CREATE FUNCTION dump() RETURNS pg_catalog.text
+AS 'MODULE_PATHNAME' LANGUAGE C STRICT;
diff --git a/contrib/test_freepage/test_freepage.c b/contrib/test_freepage/test_freepage.c
new file mode 100644
index 000..074cf56
--- /dev/null
+++ b/contrib/test_freepage/test_freepage.c
@@ -0,0 +1,113 @@
+/*--
+ *
+ * test_freepage.c
+ *		Test harness code for free page manager.
+ *
+ * Copyright (C) 2013, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *		contrib/test_freepage/test_freepage.c
+ *
+ * -
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "miscadmin.h"
+#include "utils/builtins.h"
+#include "utils/freepage.h"
+
+PG_MODULE_MAGIC;
+PG_FUNCTION_INFO_V1(init);
+PG_FUNCTION_INFO_V1(get);
+PG_FUNCTION_INFO_V1(inquire_largest);
+PG_FUNCTION_INFO_V1(put);
+PG_FUNCTION_INFO_V1(dump);
+
+Datum		init(PG_FUNCTION_ARGS);
+Datum		get(PG_FUNCTION_ARGS);
+Datum		inquire_largest(PG_FUNCTION_ARGS);
+Datum		put(PG_FUNCTION_ARGS);
+Datum		dump(PG_FUNCTION_ARGS);
+
+char *space;
+FreePageManager *fpm;
+
+Datum
+init(PG_FUNCTION_ARGS)
+{
+	int64 size = PG_GETARG_INT64(0);
+	Size	

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-22 Thread Tomas Vondra

On 10/20/2016 04:43 PM, Robert Haas wrote:

On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek  wrote:

I agree though that the usability beyond the ReorderBuffer is limited
because everything that will want to use it for part of allocations will
get much more complicated by the fact that it will have to use two
different allocators.

I was wondering if rather than trying to implement new allocator we
should maybe implement palloc_fixed which would use some optimized
algorithm for fixed sized objects in our current allocator. The
advantage of that would be that we could for example use that for things
like ListCell easily (memory management of which I see quite often in
profiles).


The sb_alloc allocator I proposed a couple of years ago would work
well for this case, I think.



Maybe, but it does not follow the Memory Context design at all, if I 
understand it correctly. I was willing to give it a spin anyway and see 
how it compares to the two other allocators, but this is a significant 
paradigm shift and certainly much larger step than what I proposed.


I'm not even sure it's possible to implement a MemoryContext based on 
the same ideas as sb_alloc(), because one of the important points of 
sb_alloc design seems to be throwing away the chunk header. While that 
may be possible, it would certainly affect the whole tree (not just the 
reorderbuffer bit), and it'd require way more work.


Moreover, the two allocators I proposed significantly benefit from the 
"same lifespan" assumption. I don't think sb_alloc can do that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-20 Thread Robert Haas
On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek  wrote:
> I agree though that the usability beyond the ReorderBuffer is limited
> because everything that will want to use it for part of allocations will
> get much more complicated by the fact that it will have to use two
> different allocators.
>
> I was wondering if rather than trying to implement new allocator we
> should maybe implement palloc_fixed which would use some optimized
> algorithm for fixed sized objects in our current allocator. The
> advantage of that would be that we could for example use that for things
> like ListCell easily (memory management of which I see quite often in
> profiles).

The sb_alloc allocator I proposed a couple of years ago would work
well for this case, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-20 Thread Tomas Vondra

On 10/19/2016 02:51 PM, Tomas Vondra wrote:

...

>

Yeah. There are three contexts in reorder buffers:

- changes (fixed size)
- txns (fixed size)
- tuples (variable size)

The first two work perfectly fine with Slab.

The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.


And here is a fairly complete prototype of this idea, adding "Gen" 
generational memory context based only on the "similar lifespan" 
assumption (and abandoning the fixed-size assumption). It's much simpler 
than GenSlab (which it's supposed to replace), and abandoning the idea 
of composing two memory contexts also fixed the warts with some of the 
API methods (e.g. repalloc).
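
The core of the generational behaviour is just a pair of per-block counters 
(a sketch with hypothetical field names - freed chunks are never reused 
individually; the block is released once everything allocated from it has 
been pfree'd):

typedef struct GenBlock
{
	dlist_node	node;		/* link in the context's list of blocks */
	int			nchunks;	/* chunks allocated from this block */
	int			nfree;		/* chunks already pfree'd */
	char	   *freeptr;	/* bump pointer for new allocations */
	char	   *endptr;		/* end of the usable space */
} GenBlock;

/* in the free path: */
block->nfree++;
if (block->nfree == block->nchunks)
	free(block);			/* the whole block is dead, release it */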


I've also been thinking that perhaps "Gen" would be useful for all three 
contexts in ReorderBuffer - so I've done a quick test comparing the 
various combinations (using the test1() function used before).


          master   slabs   slabs+genslab   slabs+gen   gens

   50k     18700     210             220         190    190
  100k        16     380             470         350    350
  200k       N/A     750             920         740    679
  500k       N/A    2250            2240        1790   1740
 1000k       N/A    4600            5000        3910   3700

Where:

* master - 23ed2ba812117
* slabs  - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen

The results are a bit noisy, but I think it's clear the new Gen context 
performs well - it actually seems a bit faster than GenSlab, and using 
only Gen for all three contexts does not hurt performance.


This is most likely due to the trivial (practically absent) freespace 
management in Gen context, compared to both Slab and GenSlab. So the 
speed is not the only criteria - I haven't measured memory consumption, 
but I'm pretty sure there are cases where Slab consumes much less memory 
than Gen, thanks to reusing free space.


I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.

There's still a fair bit of work on this, particularly implementing the 
missing API methods in Gen - GenCheck() and GenStats(). As Robert 
pointed out, there's also quite a bit of duplicated code between the 
different memory contexts (randomization and valgrind-related), so this 
needs to be moved to a shared place.


I'm also thinking that we need better names, particularly for the Gen 
allocator. It's supposed to mean Generational, although there are no 
explicit generations anymore. Slab is probably OK - it does not match 
any of the existing kernel slab allocators exactly, but it follows the 
same principles, which is what matters.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v4.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-19 Thread Tomas Vondra


On 10/19/2016 12:27 AM, Petr Jelinek wrote:
> On 18/10/16 22:25, Robert Haas wrote:
>> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
>>  wrote:
>>> attached is v3 of the patches, with a few minor fixes in Slab, and much
>>> larger fixes in GenSlab.
>>>
>>> Slab (minor fixes)
>>> --
>>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
>>> still need to zero the free bitmap at the end of the block.
>>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
>>> why/how firstFreeChunk and minFreeChunks are maintained.
>>> - Fixed / improved a bunch of additional comments, based on feedback.
>>
>> I had a look at 0001 today, but it seems to me that it still needs
>> work.  It's still got a lot of remnants of where you've
>> copy-and-pasted aset.c.  I dispute this allegation:
>>
>> + * SlabContext is our standard implementation of MemoryContext.
>>
> 
> Are you looking at old version of the patch? I complained about this as
> well and Tomas has changed that.
> 
>> And then there's this:
>>
>> +#ifdef HAVE_ALLOCINFO
>> +#define SlabFreeInfo(_cxt, _chunk) \
>> +fprintf(stderr, "AllocFree: %s: %p, %d\n", \
>> +(_cxt)->header.name, (_chunk), (_chunk)->size)
>> +#define SlabAllocInfo(_cxt, _chunk) \
>> +fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
>> +(_cxt)->header.name, (_chunk), (_chunk)->size)
>>
>> Well, it's pretty stupid that AllocSetAlloc is reporting its name as
>> AllocAlloc, a thing that, as far as I can tell, is not real. But
>> having this new context type also pretend to be AllocAlloc is even
>> dumber.
> 
> You are definitely looking at old version.
> 

Yeah.

>>
>> More broadly, I'm not sure I like this design very much.  The whole
>> point of a slab context is that all of the objects are the same size.
>> I wouldn't find it too difficult to support this patch if we were
>> adding an allocator for fixed-size objects that was then being used to
>> allocate objects which are fixed size.  However, what we seem to be
>> doing is creating an allocator for fixed-size objects and then using
>> it for variable-size tuples.  That's really pretty weird.  Isn't the
>> root of this problem that aset.c is utterly terrible at handling large
>> number of allocations?  Maybe we should try to attack that problem
>> more directly.
> 
> It's used for TXNs which are fixed and some tuples (there is an
> assumption that the decoded tuples have more or less normal
> distribution).
> 

Yeah. There are three contexts in reorder buffers:

- changes (fixed size)
- txns (fixed size)
- tuples (variable size)

The first two work perfectly fine with Slab.

The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.

> I agree though that the usability beyond the ReorderBuffer is limited
> because everything that will want to use it for part of allocations will
> get much more complicated by the fact that it will have to use two
> different allocators.
>

It wasn't my (primary) goal to provide allocators usable outside
ReorderBuffer. I've intended to show that perhaps using AllocSet and
then trying to compensate for the pfree() issues is the wrong direction,
and that perhaps a different allocation strategy (exploiting the
ReorderBuffer specifics) would work much better. And I think the two
allocators prove that.

>
> I was wondering if, rather than trying to implement a new allocator, we
> should maybe implement palloc_fixed, which would use some optimized
> algorithm for fixed-size objects in our current allocator. The
> advantage of that would be that we could for example use that for things
> like ListCell easily (memory management of which I see quite often in
> profiles).
> 

I don't see how inventing palloc_fixed() solves any of the problems, and
I think it's going to be much more complicated than you think. The idea
of injecting this into AllocSet seems like a dead end to me, as the code
is already complex enough and it's likely to cause regressions no matter
what you do.

I prefer the idea of implementing separate specialized memory contexts.
If the bar is moved to "implement palloc_fixed()" or something like
that, someone else will have to do that - I'm not all that interested in
ReorderBuffer (this was the first time I actually saw that code), so my
motivation to spend much more time on this is rather small.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-18 Thread Petr Jelinek
On 18/10/16 22:25, Robert Haas wrote:
> On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra wrote:
>> attached is v3 of the patches, with a few minor fixes in Slab, and much
>> larger fixes in GenSlab.
>>
>> Slab (minor fixes)
>> --
>> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
>> still need to zero the free bitmap at the end of the block.
>> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
>> why/how firstFreeChunk and minFreeChunks are maintained.
>> - Fixed / improved a bunch of additional comments, based on feedback.
> 
> I had a look at 0001 today, but it seems to me that it still needs
> work.  It's still got a lot of remnants of where you've
> copy-and-pasted aset.c.  I dispute this allegation:
> 
> + * SlabContext is our standard implementation of MemoryContext.
> 

Are you looking at old version of the patch? I complained about this as
well and Tomas has changed that.

> And then there's this:
> 
> +#ifdef HAVE_ALLOCINFO
> +#define SlabFreeInfo(_cxt, _chunk) \
> +fprintf(stderr, "AllocFree: %s: %p, %d\n", \
> +(_cxt)->header.name, (_chunk), (_chunk)->size)
> +#define SlabAllocInfo(_cxt, _chunk) \
> +fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
> +(_cxt)->header.name, (_chunk), (_chunk)->size)
> 
> Well, it's pretty stupid that AllocSetAlloc is reporting its name as
> AllocAlloc, a thing that, as far as I can tell, is not real. But
> having this new context type also pretend to be AllocAlloc is even
> dumber.

You are definitely looking at old version.

> 
> More broadly, I'm not sure I like this design very much.  The whole
> point of a slab context is that all of the objects are the same size.
> I wouldn't find it too difficult to support this patch if we were
> adding an allocator for fixed-size objects that was then being used to
> allocate objects which are fixed size.  However, what we seem to be
> doing is creating an allocator for fixed-size objects and then using
> it for variable-size tuples.  That's really pretty weird.  Isn't the
> root of this problem that aset.c is utterly terrible at handling large
> numbers of allocations?  Maybe we should try to attack that problem
> more directly.

It's used for TXNs which are fixed and some tuples (there is an
assumption that the decoded tuples have more or less normal distribution).

I agree though that the usability beyond the ReorderBuffer is limited
because everything that will want to use it for part of allocations will
get much more complicated by the fact that it will have to use two
different allocators.

I was wondering if, rather than trying to implement a new allocator, we
should maybe implement palloc_fixed, which would use some optimized
algorithm for fixed-size objects in our current allocator. The
advantage of that would be that we could for example use that for things
like ListCell easily (memory management of which I see quite often in
profiles).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-18 Thread Robert Haas
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra wrote:
> attached is v3 of the patches, with a few minor fixes in Slab, and much
> larger fixes in GenSlab.
>
> Slab (minor fixes)
> --
> - Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
> still need to zero the free bitmap at the end of the block.
> - Renamed minFreeCount to minFreeChunks, added a few comments explaining
> why/how firstFreeChunk and minFreeChunks are maintained.
> - Fixed / improved a bunch of additional comments, based on feedback.

I had a look at 0001 today, but it seems to me that it still needs
work.  It's still got a lot of remnants of where you've
copy-and-pasted aset.c.  I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.

And all of this is just a direct copy-paste; I don't really want two copies:

+ *  When running under Valgrind, we want a NOACCESS memory region both before
+ *  and after the allocation.  The chunk header is tempting as the preceding
+ *  region, but mcxt.c expects to able to examine the standard chunk header
+ *  fields.  Therefore, we use, when available, the requested_size field and
+ *  any subsequent padding.  requested_size is made NOACCESS before returning
+ *  a chunk pointer to a caller.  However, to reduce client request traffic,
+ *  it is kept DEFINED in chunks on the free list.

And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+(_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+(_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real. But
having this new context type also pretend to be AllocAlloc is even
dumber.

+static void
+randomize_mem(char *ptr, size_t size)
+{
+    static int  save_ctr = 1;
+    size_t      remaining = size;
+    int         ctr;
+
+    ctr = save_ctr;
+    VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
+    while (remaining-- > 0)
+    {
+        *ptr++ = ctr;
+        if (++ctr > 251)
+            ctr = 1;
+    }
+    VALGRIND_MAKE_MEM_UNDEFINED(ptr - size, size);
+    save_ctr = ctr;
+}
+#endif   /* RANDOMIZE_ALLOCATED_MEMORY */

Another copy of this doesn't seem like a good idea, either.

More broadly, I'm not sure I like this design very much.  The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size.  However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples.  That's really pretty weird.  Isn't the
root of this problem that aset.c is utterly terrible at handling large
numbers of allocations?  Maybe we should try to attack that problem
more directly.

On a related note, the autodestruct thing is a weird hack that's only
necessary because of the hijinks already discussed in the previous
paragraph.  The context has no fixed lifetime; we're just trying to
find a way of coping with possibly-shifting tuple sizes over time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-05 Thread John Gorman
On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra wrote:

> For GenSlab the situation is less clear, as there probably are ways to make
> it work, but I'd vote to keep it simple for now, and simply do elog(ERROR)
> in the realloc() methods - both for Slab and GenSlab. The current use case
> (reorderbuffer) does not need that, and it seems like a can of worms to me.


Good plan. Realloc can be added later if there is an actual use case.


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-04 Thread Petr Jelinek
On 05/10/16 03:11, Tomas Vondra wrote:
> On 10/04/2016 09:44 PM, John Gorman wrote:
>>
>> Remind me again why we cannot realloc in place for sizes
>> smaller than chunkSize? GenSlab is happy to allocate smaller
>> sizes and put them into the fixed size chunks.
>>
>> Maybe SlabAlloc can be happy with sizes up to chunkSize.
>>
>> if (size <= set->chunkSize)
>>     return MemoryContextAlloc(set->slab, size);
>> else
>>     return MemoryContextAlloc(set->aset, size);
>>
> 
> That'd be certainly possible, but it seems a bit strange as the whole
> Slab is based on the idea that all chunks have the same size. Moreover,
> people usually realloc() to a larger chunk, so it does not really fix
> anything in practice.
> 
> In GenSlab, the situation is more complicated. But I don't like the
> coupling / moving chunks between contexts, etc.
> 
> If realloc() support is a hard requirement, it immediately rules out
> SlabContext() as an independent memory context. Which seems stupid, as
> independent Slab contexts are quite useful for reorderbuffer use case.
> 
> For GenSlab the situation is less clear, as there probably are ways to
> make it work, but I'd vote to keep it simple for now, and simply do
> elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
> current use case (reorderbuffer) does not need that, and it seems like a
> can of worms to me.
> 

Hmm, so this in practice means that the caller still has to know the
details of what chunks go where. I would prefer it if the realloc just
always failed, and "don't do realloc on GenSlab" were part of the spec of
that context; the randomness that you described originally is the main
problem IMHO. Maybe you could add a new "constructor" function for Aset
that would create an Aset which can't realloc, for use inside the GenSlab?

Alternative would be of course having the individual API calls behind
Aset and Slab exported and used by GenSlab directly instead of using
child contexts. Then all the calls would go to GenSlab which could
decide what to do (and move the chunks between the allocators).

But honestly given the usecases for GenSlab, I would at the moment
prefer just to have predictable error as it can be done more cleanly and
nobody needs the functionality so far, it can be revisited once we
actually do need it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-04 Thread Tomas Vondra

Hi,

attached is v3 of the patches, with a few minor fixes in Slab, and much 
larger fixes in GenSlab.


Slab (minor fixes)
--
- Removed the unnecessary memset() of new blocks in SlabAlloc(), 
although we still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining 
why/how firstFreeChunk and minFreeChunks are maintained.

- Fixed / improved a bunch of additional comments, based on feedback.

GenSlab
---
Fixed a bunch of bugs that made GenSlab utterly useless. Firstly, 
chunkSize was not stored in GenSlabContextCreate(), so this check in 
SlabAlloc()


    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, set->chunkSize);
    else
        return MemoryContextAlloc(set->aset, size);

always fell through to the set->aset case, not allocating stuff in the 
Slab at all.


Secondly, nallocations / nbytes counters were not updated at all, so the 
Slab was never recreated, so GenSlab was not really generational.


This only affected 1 of 3 contexts in ReorderBuffer, but apparently 
one of the "important ones" affecting performance.


Both issues are fixed in the attached v3, which also introduces two 
additional improvements discussed in this thread:


- The chunk size is limited by ALLOCSET_SEPARATE_THRESHOLD, as for large 
chunks AllocSet works just fine (thanks to keeping them out of free list 
etc.)


- Instead of specifying blockSize and chunkSize, GenSlabCreate() now 
accepts three parameters - minBlockSize, minChunkCount and chunkSize, 
and computes the minimum block size (>= minBlockSize), sufficient to 
store minChunkCount chunks, each chunkSize bytes. This works much better 
in the auto-tuning scenario.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


slab-allocators-v3.tgz
Description: application/compressed-tar

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-04 Thread Tomas Vondra

On 10/04/2016 09:44 PM, John Gorman wrote:

SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context, which is GenSlab. GenSlab
can send them to the sister aset context.



But Slab may also be used separately, not just as part of GenSlab 
(actually, reorderbuffer has two such contexts). That complicates things 
quite a bit, and it also seems a bit awkward, because:


(a) It'd require a flag in SlabContext (or perhaps a pointer to the 
second context), which introduces coupling between the contexts.


(b) SlabContext was meant to be extremely simple (based on the "single 
chunk size" idea), and this contradicts that a bit.


(c) It'd move chunks between the memory contexts in unpredictable ways 
(although the user should treat it as a single context, and not reset 
the parts independently for example).


>

This could handle all reallocs so there will be no surprises.



Yeah, but it's also

>

Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, size);
    else
        return MemoryContextAlloc(set->aset, size);



That'd be certainly possible, but it seems a bit strange as the whole 
Slab is based on the idea that all chunks have the same size. Moreover, 
people usually realloc() to a larger chunk, so it does not really fix 
anything in practice.


In GenSlab, the situation is more complicated. But I don't like the 
coupling / moving chunks between contexts, etc.


If realloc() support is a hard requirement, it immediately rules out 
SlabContext() as an independent memory context. Which seems stupid, as 
independent Slab contexts are quite useful for reorderbuffer use case.


For GenSlab the situation is less clear, as there probably are ways to 
make it work, but I'd vote to keep it simple for now, and simply do 
elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The 
current use case (reorderbuffer) does not need that, and it seems like a 
can of worms to me.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-04 Thread John Gorman
SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context, which is GenSlab. GenSlab
can send them to the sister aset context.

This could handle all reallocs so there will be no surprises.

Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, size);
    else
        return MemoryContextAlloc(set->aset, size);

On Sat, Oct 1, 2016 at 10:15 PM, Tomas Vondra wrote:

> On 10/02/2016 12:23 AM, John Gorman wrote:
>
>> I reproduced the quadratic pfree performance problem and verified that
>> these patches solved it.
>>
>> The slab.c data structures and functions contain no quadratic components.
>>
>> I noticed the sizing loop in SlabContextCreate() and came up with
>> a similar formula to determine chunksPerBlock that you arrived at.
>>
>> Firstly, I've realized there's an issue when chunkSize gets too
>>> large - once it exceeds blockSize, the SlabContextCreate() fails
>>> as it's impossible to place a single chunk into the block. In
>>> reorderbuffer, this may happen when the tuples (allocated in
>>> tup_context) get larger than 8MB, as the context uses
>>> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>>>
>>> But maybe there's a simpler solution - we may simply cap the
>>> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
>>> AllocSet handles those requests in a special way - for example
>>> instead of tracking them in the freelist, those chunks get freed
>>> immediately.
>>>
>>
>> I like this approach because it fixes the performance problems
>> with smaller allocations and doesn't change how larger
>> allocations are handled.
>>
>>
> One more comment about GenSlab, particularly about unpredictability of the
> repalloc() behavior. It works for "large" chunks allocated in the AllocSet
> part, and mostly does not work for "small" chunks allocated in the
> SlabContext. Moreover, the chunkSize changes over time, so for two chunks
> of the same size, repalloc() may work on one of them and fail on the other,
> depending on time of allocation.
>
> With SlabContext it's perfectly predictable - repalloc() call fails unless
> the chunk size is exactly the same as before (which is perhaps a bit
> pointless, but if we decide to fail even in this case it'll work 100% time).
>
> But with GenSlabContext it's unclear whether the call fails or not.
>
> I don't like this unpredictability - I'd much rather have consistent
> failures (making sure people don't do repalloc() with GenSlab). But I
> don't see a nice way to achieve that - the repalloc() call does not go
> through GenSlabRealloc() at all, but directly to SlabRealloc() or
> AllocSetRealloc().
>
> The best solution I can think of is adding an alternate version of
> AllocSetMethods, pointing to a different AllocSetRealloc implementation.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-02 Thread Michael Paquier
On Sun, Oct 2, 2016 at 10:15 AM, Tomas Vondra wrote:
> One more comment about GenSlab, particularly about unpredictability of the
> repalloc() behavior. It works for "large" chunks allocated in the AllocSet
> part, and mostly does not work for "small" chunks allocated in the
> SlabContext. Moreover, the chunkSize changes over time, so for two chunks of
> the same size, repalloc() may work on one of them and fail on the other,
> depending on time of allocation.
>
> With SlabContext it's perfectly predictable - repalloc() call fails unless
> the chunk size is exactly the same as before (which is perhaps a bit
> pointless, but if we decide to fail even in this case it'll work 100% time).
>
> But with GenSlabContext it's unclear whether the call fails or not.
>
> I don't like this unpredictability - I'd much rather have consistent
> failures (making sure people don't do repalloc() with GenSlab). But I
> don't see a nice way to achieve that - the repalloc() call does not go
> through GenSlabRealloc() at all, but directly to SlabRealloc() or
> AllocSetRealloc().
>
> The best solution I can think of is adding an alternate version of
> AllocSetMethods, pointing to a different AllocSetRealloc implementation.

You guys are still playing with this patch, so moved to next CF with
"waiting on author".
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread Tomas Vondra

On 10/02/2016 12:23 AM, John Gorman wrote:

I reproduced the quadratic pfree performance problem and verified that
these patches solved it.

The slab.c data structures and functions contain no quadratic components.

I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock that you arrived at.


Firstly, I've realized there's an issue when chunkSize gets too
large - once it exceeds blockSize, the SlabContextCreate() fails
as it's impossible to place a single chunk into the block. In
reorderbuffer, this may happen when the tuples (allocated in
tup_context) get larger than 8MB, as the context uses
SLAB_LARGE_BLOCK_SIZE (which is 8MB).

But maybe there's a simpler solution - we may simply cap the
chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
AllocSet handles those requests in a special way - for example
instead of tracking them in the freelist, those chunks get freed
immediately.


I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.



One more comment about GenSlab, particularly about unpredictability of 
the repalloc() behavior. It works for "large" chunks allocated in the 
AllocSet part, and mostly does not work for "small" chunks allocated in 
the SlabContext. Moreover, the chunkSize changes over time, so for two 
chunks of the same size, repalloc() may work on one of them and fail on 
the other, depending on time of allocation.


With SlabContext it's perfectly predictable - repalloc() call fails 
unless the chunk size is exactly the same as before (which is perhaps a 
bit pointless, but if we decide to fail even in this case it'll work 
100% time).


But with GenSlabContext it's unclear whether the call fails or not.

I don't like this unpredictability - I'd much rather have consistent 
failures (making sure people don't do repalloc() with GenSlab). But I 
don't see a nice way to achieve that - the repalloc() call does not go 
through GenSlabRealloc() at all, but directly to SlabRealloc() or 
AllocSetRealloc().


The best solution I can think of is adding an alternate version of 
AllocSetMethods, pointing to a different AllocSetRealloc implementation.
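
For illustration, a minimal sketch of that idea (the function name is mine,
not from the patch): an AllocSetMethods variant whose realloc entry always
errors out, so repalloc() on any GenSlab-owned chunk fails the same way no
matter which sub-context it landed in.

/*
 * Hypothetical realloc method for the AllocSet used inside GenSlab:
 * always error, so repalloc() behaves predictably for all chunks.
 */
static void *
AllocSetReallocDisabled(MemoryContext context, void *pointer, Size size)
{
    elog(ERROR, "repalloc is not supported on GenSlab chunks");
    return NULL;    /* unreachable; silences compilers that don't know
                     * elog(ERROR) does not return */
}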


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread Tomas Vondra

On 10/02/2016 12:23 AM, John Gorman wrote:

I reproduced the quadratic pfree performance problem and verified
that these patches solved it.

The slab.c data structures and functions contain no quadratic
components.

I noticed the sizing loop in SlabContextCreate() and came up with a
similar formula to determine chunksPerBlock that you arrived at.



;-)


Firstly, I've realized there's an issue when chunkSize gets too
large - once it exceeds blockSize, the SlabContextCreate() fails
as it's impossible to place a single chunk into the block. In
reorderbuffer, this may happen when the tuples (allocated in
tup_context) get larger than 8MB, as the context uses
SLAB_LARGE_BLOCK_SIZE (which is 8MB).

But maybe there's a simpler solution - we may simply cap the
chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
AllocSet handles those requests in a special way - for example
instead of tracking them in the freelist, those chunks get freed
immediately.


I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.



Right.


In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant.

< *  (plus alignment), now wasting memory.

*  (plus alignment), not wasting memory.


In slab.c some lines are over 80 characters and could be folded.

It would be nice to give each patch version a unique file name.



OK, will fix.


Nice patch, I enjoyed reading it!



;-)

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread Tomas Vondra

On 10/02/2016 01:53 AM, Jim Nasby wrote:

On 9/26/16 9:10 PM, Tomas Vondra wrote:

Attached is v2 of the patch, updated based on the review. That means:


+/* make sure the block can store at least one chunk (with 1B for a
bitmap)? */
(and the comment below it)

I find the question to be confusing... I think these would be better as

+/* make sure the block can store at least one chunk (with 1B for a
bitmap) */
and
+/* number of chunks can we fit into a block, including header and
bitmap */



Thanks, will rephrase the comments a bit.


I'm also wondering if a 1B freespace bitmap is actually useful. If
nothing else, there should probably be a #define for the initial
bitmap size.


That's not the point. The point is that we need to store at least one 
entry per block, with one bit in a bitmap. But we can't store just a 
single bit - we always allocate at least 1 byte. So it's more an 
explanation for the "1" literal in the check, nothing more.
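
In code, the check in question is essentially this (a sketch; SlabBlockData
stands for the block header type, and the bare 1 is the single mandatory
bitmap byte):

    /* sketch of the sanity check in SlabContextCreate(): the block must
     * fit its header, at least 1 bitmap byte, and at least one chunk */
    if (blockSize < sizeof(SlabBlockData) + 1 + chunkSize)
        elog(ERROR, "block size %zu too small for chunk size %zu",
             blockSize, chunkSize);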




+/* otherwise add it to the proper freelist bin */
Looks like something went missing... :)



Ummm? The patch contains this:

+   /* otherwise add it to the proper freelist bin */
+   if (set->freelist[block->nfree])
+       set->freelist[block->nfree]->prev = block;
+
+   block->next = set->freelist[block->nfree];
+   set->freelist[block->nfree] = block;

Which does exactly the thing it should do. Or what is missing?



Should zeroing the block in SlabAlloc be optional like it is with
palloc/palloc0?



Good catch. The memset(,0) should not be in SlabAlloc() at all, as the 
zeroing is handled in mctx.c, independently of the implementation.



+/*
+ * If there are no free chunks in any existing block, create a new
block
+ * and put it to the last freelist bucket.
+ */
+if (set->minFreeCount == 0)
Couldn't there be blocks that have a free count > minFreeCount? (In
which case you don't want to just alloc a new block, no?)

Nevermind, after reading further down I understand what's going on. I
got confused by "we've decreased nfree for a block with the minimum
count" until I got to the part that deals with minFreeCount = 0. I think
it'd be helpful if the "we've decreased nfree" comment mentioned that if
nfree is 0 we'll look for the correct value after updating the freelists.



Right. I think it'd be good to add an assert ensuring the minFreeCount 
value matches the block freelist. Or at least SlabCheck() could check this.



+block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
Did you consider using a pre-defined map of values to avoid the shift? I
know I've seen that somewhere in the code...

Patch 2...

Doesn't GenSlabReset() need to actually free something? If the call
magically percolates to the other contexts it'd be nice to note that in
the comment.


No, the other contexts are created as children of GenSlab, so the memory 
context infrastructure resets them automatically. I don't think this 
needs to be mentioned in the comments - it's a pretty basic part of the 
parent-child context relationship.


Thanks!

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread Jim Nasby

On 9/26/16 9:10 PM, Tomas Vondra wrote:

Attached is v2 of the patch, updated based on the review. That means:


+	/* make sure the block can store at least one chunk (with 1B for a 
bitmap)? */

(and the comment below it)

I find the question to be confusing... I think these would be better as

+	/* make sure the block can store at least one chunk (with 1B for a 
bitmap) */

and
+	/* number of chunks can we fit into a block, including header and 
bitmap */


I'm also wondering if a 1B freespace bitmap is actually useful. If 
nothing else, there should probably be a #define for the initial bitmap 
size.


+   /* otherwise add it to the proper freelist bin */
Looks like something went missing... :)


Should zeroing the block in SlabAlloc be optional like it is with 
palloc/palloc0?


+   /*
+* If there are no free chunks in any existing block, create a new block
+* and put it to the last freelist bucket.
+*/
+   if (set->minFreeCount == 0)
Couldn't there be blocks that have a free count > minFreeCount? (In 
which case you don't want to just alloc a new block, no?)


Nevermind, after reading further down I understand what's going on. I 
got confused by "we've decreased nfree for a block with the minimum 
count" until I got to the part that deals with minFreeCount = 0. I think 
it'd be helpful if the "we've decreased nfree" comment mentioned that if 
nfree is 0 we'll look for the correct value after updating the freelists.


+   block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
Did you consider using a pre-defined map of values to avoid the shift? I 
know I've seen that somewhere in the code...
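
For what it's worth, the pre-defined map version might look like this (a
sketch, not from the patch):

/* one single-bit mask per bit position, so setting a bit becomes a
 * table lookup instead of a shift */
static const unsigned char slab_bitmask[8] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

/* equivalent to: bitmapptr[idx / 8] |= (0x01 << (idx % 8)); */
static inline void
slab_bitmap_set(char *bitmapptr, int idx)
{
    bitmapptr[idx / 8] |= slab_bitmask[idx % 8];
}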


Patch 2...

Doesn't GenSlabReset() need to actually free something? If the call 
magically percolates to the other contexts it'd be nice to note that in 
the comment.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-10-01 Thread John Gorman
I reproduced the quadratic pfree performance problem and verified that
these patches solved it.

The slab.c data structures and functions contain no quadratic components.

I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock that you arrived at.

> Firstly, I've realized there's an issue when chunkSize gets too
> large - once it exceeds blockSize, the SlabContextCreate() fails
> as it's impossible to place a single chunk into the block. In
> reorderbuffer, this may happen when the tuples (allocated in
> tup_context) get larger than 8MB, as the context uses
> SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> But maybe there's a simpler solution - we may simply cap the
> chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
> AllocSet handles those requests in a special way - for example
> instead of tracking them in the freelist, those chunks get freed
> immediately.

I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.

In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant.

< *  (plus alignment), now wasting memory.
> *  (plus alignment), not wasting memory.

In slab.c some lines are over 80 characters and could be folded.

It would be nice to give each patch version a unique file name.

Nice patch, I enjoyed reading it!

Best, John

John Gorman

On Mon, Sep 26, 2016 at 10:10 PM, Tomas Vondra wrote:

> Hi,
>
> Attached is v2 of the patch, updated based on the review. That means:
>
> - Better comment explaining how free chunks are tracked in Slab context.
>
> - Removed the unused SlabPointerIsValid macro.
>
> - Modified the comment before SlabChunkData, explaining how it relates
>   to StandardChunkHeader.
>
> - Replaced the two Assert() calls with elog().
>
> - Implemented SlabCheck(). I've ended up with quite a few checks there,
>   checking pointers between the context, block and chunks, checks due
>   to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
>   number of free chunks (bitmap, freelist vs. chunk header).
>
> - I've also modified SlabContextCreate() to compute chunksPerBlock a
>   bit more efficiently (use a simple formula instead of the loop, which
>   might be a bit too expensive for large blocks / small chunks).
>
>
> I haven't done any changes to GenSlab, but I do have a few notes:
>
> Firstly, I've realized there's an issue when chunkSize gets too large -
> once it exceeds blockSize, the SlabContextCreate() fails as it's impossible
> to place a single chunk into the block. In reorderbuffer, this may happen
> when the tuples (allocated in tup_context) get larger than 8MB, as the
> context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).
>
> For Slab the elog(ERROR) is fine as both parameters are controlled by the
> developer directly, but GenSlab computes the chunkSize on the fly, so we
> must not let it fail like that - that'd result in unpredictable failures,
> which is not very nice.
>
> I see two ways to fix this. We may either increase the block size
> automatically - e.g. instead of specifying chunkSize and
> blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and
> then choose the smallest 2^k block large enough). For example with
> chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the
> closest 2^k block larger than 96000 bytes.
>
> But maybe there's a simpler solution - we may simply cap the chunkSize (in
> GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those
> requests in a special way - for example instead of tracking them in
> the freelist, those chunks get freed immediately.
>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-26 Thread Tomas Vondra

Hi,

Attached is v2 of the patch, updated based on the review. That means:

- Better comment explaining how free chunks are tracked in Slab context.

- Removed the unused SlabPointerIsValid macro.

- Modified the comment before SlabChunkData, explaining how it relates
  to StandardChunkHeader.

- Replaced the two Assert() calls with elog().

- Implemented SlabCheck(). I've ended up with quite a few checks there,
  checking pointers between the context, block and chunks, checks due
  to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
  number of free chunks (bitmap, freelist vs. chunk header).

- I've also modified SlabContextCreate() to compute chunksPerBlock a
  bit more efficiently (use a simple formula instead of the loop, which
  might be a bit too expensive for large blocks / small chunks).
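
For reference, the "simple formula" mentioned in the last point could look
roughly like this (my sketch; it assumes each chunk costs its own size plus
one bitmap bit):

/*
 * A block of blockSize bytes with a header of headerSize bytes offers
 * 8 * (blockSize - headerSize) bits of space; each chunk consumes
 * 8 * chunkSize bits of data plus 1 bitmap bit.
 */
static int
slab_chunks_per_block(Size blockSize, Size headerSize, Size chunkSize)
{
    return (int) ((8 * (blockSize - headerSize)) / (8 * chunkSize + 1));
}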


I haven't done any changes to GenSlab, but I do have a few notes:

Firstly, I've realized there's an issue when chunkSize gets too large - 
once it exceeds blockSize, the SlabContextCreate() fails as it's 
impossible to place a single chunk into the block. In reorderbuffer, 
this may happen when the tuples (allocated in tup_context) get larger 
than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).


For Slab the elog(ERROR) is fine as both parameters are controlled by 
the developer directly, but GenSlab computes the chunkSize on the fly, 
so we must not let it fail like that - that'd result in unpredictable 
failures, which is not very nice.


I see two ways to fix this. We may either increase the block size 
automatically - e.g. instead of specifying chunkSize and 
blockSize when creating the Slab, specify chunkSize and chunksPerBlock 
(and then choose the smallest 2^k block large enough). For example with 
chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's 
the closest 2^k block larger than 96000 bytes.
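
A sketch of that computation (the 8kB minimum is my assumption; header and
bitmap overhead are ignored for brevity):

static Size
slab_choose_block_size(Size chunkSize, int chunksPerBlock)
{
    Size        needed = chunkSize * (Size) chunksPerBlock;
    Size        blockSize = 8192;   /* assumed minimum block size */

    while (blockSize < needed)
        blockSize <<= 1;

    return blockSize;   /* chunkSize=96, chunksPerBlock=1000
                         * gives 131072 bytes = 128kB */
}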


But maybe there's a simpler solution - we may simply cap the chunkSize 
(in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles 
those requests in a special way - for example instead of tracking them 
in the freelist, those chunks get freed immediately.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-simple-slab-allocator-fixed-size-allocations.patch
Description: binary/octet-stream


0002-generational-slab-auto-tuning-allocator.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Petr Jelinek
On 25/09/16 22:17, Tomas Vondra wrote:
> On 09/25/2016 08:48 PM, Petr Jelinek wrote:
> 
>> Slab:
>> In general it seems understandable, the initial description helps to
>> understand what's happening well enough.
>>
>> One thing I don't understand however is why the freelist is both an
>> array and a doubly linked list, and why there is a new implementation
>> of said doubly linked list given that we have dlist.
>>
> 
> Hmm, perhaps that should be explained better.
> 
> In AllocSet, we only have a global freelist of chunks, i.e. we have a
> list of free chunks for each possible size (there's 11 sizes, starting
> with 8 bytes and then doubling the size). So freelist[0] is a list of
> free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
> 
> In Slab, the freelist has two levels - first there's a bitmap on each
> block (which is possible, as the chunks have constant size), tracking
> which chunks of that particular block are free. This makes it trivial to
> check that all chunks on the block are free, and free the whole block
> (which is impossible with AllocSet).
> 
> Second, the freelist at the context level tracks blocks with a given
> number of free chunks - so freelist[0] tracks completely full blocks,
> freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
> reuse space on almost full blocks first, in the hope that some of the
> less full blocks will get completely empty (and freed to the OS).
> 
> Is that clear?

Ah okay, makes sense. The documentation of this could be improved then,
though, as it's all squashed into a single sentence that wasn't quite
clear to me.

> 
>>> +/*
>>> + * SlabChunk
>>> + *The prefix of each piece of memory in an SlabBlock
>>> + *
>>> + * NB: this MUST match StandardChunkHeader as defined by
>>> utils/memutils.h.
>>> + */
>>
>> Is this true? Why? And if it is then why doesn't the SlabChunk
>> actually match the StandardChunkHeader?
> 
> It is true, a lot of stuff in MemoryContext infrastructure relies on
> that. For example when you do pfree(ptr), we actually do something like
> 
> header = (StandardChunkHeader *) (ptr - sizeof(StandardChunkHeader))
> 
> to get the chunk header - which includes pointer to the memory context
> and other useful stuff.
> 
> This also means we can put additional fields before StandardChunkHeader
> as that does not break this pointer arithmetic, i.e. SlabChunkData is
> effectively defined like this:
> 
> typedef struct SlabChunkData
> {
> /* block owning this chunk */
> void *block;
> 
> /* standard header */
> StandardChunkHeader header;
> } SlabChunkData;
> 

Yes, but your struct then does not match StandardChunkHeader exactly, so
it should be explained in more detail (aset.c, where this comment is
also present, has a struct that matches StandardChunkHeader, so it's
sufficient there).
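
To illustrate the arithmetic, this is roughly what pfree() does to find the
owning context (a sketch, not code from the patch):

#include "utils/memutils.h"     /* StandardChunkHeader */

/*
 * Step back from the user pointer to the standard header. Any extra
 * fields placed *before* the header (like SlabChunk's block pointer)
 * are simply never touched by the generic code.
 */
static MemoryContext
chunk_owning_context(void *ptr)
{
    StandardChunkHeader *header;

    header = (StandardChunkHeader *)
        ((char *) ptr - sizeof(StandardChunkHeader));
    return header->context;
}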

>>
>>> +static void
>>> +SlabCheck(MemoryContext context)
>>> +{
>>> +/* FIXME */
>>> +}
>>
>> Do you plan to implement this interface?
>>
> 
> Yes, although I'm not sure what checks should go there. The only thing I
> can think of right now is checking that the number of free chunks on a
> block (according to the bitmap) matches the freelist index.
> 

Yeah, this context does not seem like it needs too much checking. The
freelist vs free chunks check sounds ok to me. I guess GenSlab then will
call the checks for the underlying contexts.

>>> +#define SLAB_DEFAULT_BLOCK_SIZE    8192
>>> +#define SLAB_LARGE_BLOCK_SIZE      (8 * 1024 * 1024)
>>
>> I am guessing this is based on max_cached_tuplebufs? Maybe these
>> could be written with the same style?
>>
> 
> Not sure I understand what you mean by "based on"? I don't quite
> remember how I came up with those constants, but I guess 8kB and 8MB
> seemed like good values.
> 
> Also, what style you mean? I've used the same style as for ALLOCSET_*
> constants in the same file.
> 

I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more
readable.  ALLOCSET_* does that too (with the exception of
ALLOCSET_SEPARATE_THRESHOLD, though I have no idea why that one is
different from the rest of the code).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Tomas Vondra

On 09/25/2016 08:48 PM, Petr Jelinek wrote:

Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:


This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):

Slab

* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab
---
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination of AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab



So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will be use for the
GenSlab eventually outside of the reorderbuffer?



Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.

I haven't thought about other places that might benefit from these new 
allocators very much - in general, it's useful for places that produce a 
stream of equally-sized items (GenSlab relaxes this), that are pfreed() 
in ~FIFO manner (i.e. roughly in the order of allocation).




So none of this is meant as a universal replacement of AllocSet,
but in the suitable cases the results seem really promising. For
example for the simple test query in [1], the performance
improvement is this:

    N |   master |  patched
   -------------------------
    1 |    100ms |    100ms
    5 |  15000ms |    350ms
   10 | 146000ms |    700ms
   20 |        ? |   1400ms

That's a fairly significant improvement, and the submitted version
of the patches should perform even better (~2x, IIRC).



I agree that it improves performance quite nicely and that
reorderbuffer could use this.

About the code: I am not quite sure that this needs to be split into
two patches, especially if 1/3 of the second patch is the removal of
the code added by the first one and otherwise it's quite small and
straightforward. That is, unless you expect the GenSlab to not go in.



I don't know - it seemed natural to first introduce the Slab, as it's 
easier to discuss it separately, and it works for 2 of the 3 contexts 
needed in reorderbuffer.


GenSlab is an improvement of Slab, or rather based on it, so that it 
works for the third context. And it introduces some additional ideas 
(particularly the generational design, etc.)


Of course, none of this means it has to be committed in two separate 
chunks, or that I don't expect GenSlab to get committed ...



Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand however is why the freelist is both an
array and a doubly linked list, and why there is a new implementation of
said doubly linked list given that we have dlist.



Hmm, perhaps that should be explained better.

In AllocSet, we only have a global freelist of chunks, i.e. we have a 
list of free chunks for each possible size (there's 11 sizes, starting 
with 8 bytes and then doubling the size). So freelist[0] is a list of 
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.


In Slab, the freelist has two levels - first there's a bitmap on each 
block (which is possible, as the chunks have constant size), tracking 
which chunks of that particular block are free. This makes it trivial to 
check that all chunks on the block are free, and free the whole block 
(which is impossible with AllocSet).


Second, the freelist at the context level tracks blocks with a given 
number of free chunks - so freelist[0] tracks completely full blocks, 
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to 
reuse space on almost full blocks first, in the hope that some of the 
less full blocks will get completely empty (and freed to the OS).


Is that clear?
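
To make that concrete, here is a stripped-down sketch of the two levels
(field names are mine, and the per-block chunk count is fixed for brevity):

#define CHUNKS_PER_BLOCK    64      /* assumed constant for the sketch */

/* per-block level: a bitmap of free chunks plus a free count */
typedef struct SlabBlockSketch
{
    struct SlabBlockSketch *prev;   /* doubly linked within a bucket */
    struct SlabBlockSketch *next;
    int         nfree;              /* number of free chunks on this block */
    char        bitmap[CHUNKS_PER_BLOCK / 8];   /* 1 bit per chunk */
} SlabBlockSketch;

/*
 * context level: blocks bucketed by free count, so allocation can prefer
 * the fullest blocks (lowest non-zero bucket) and nearly-empty blocks get
 * a chance to drain completely and be freed back to the OS
 */
typedef struct SlabSketch
{
    SlabBlockSketch *freelist[CHUNKS_PER_BLOCK + 1];
} SlabSketch;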


+/*
+ * SlabContext is our standard implementation of MemoryContext.
+ *


Really?



Meh, that's clearly a bogus comment.


+/*
+ * SlabChunk
+ * The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */


Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?


It is true, a lot of stuff in MemoryContext infrastructure relies on 
that. For example when you do pfree(ptr), we actually do something like


   header = (StandardChunkHeader *) (ptr - sizeof(StandardChunkHeader))

to get the chunk header - which includes a pointer to the memory context
and other useful stuff.

This also means we can put additional fields before StandardChunkHeader,
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:

typedef struct SlabChunkData
{
    /* block owning this chunk */
    void *block;

    /* standard header */
    StandardChunkHeader header;
} SlabChunkData;

Re: [HACKERS] PATCH: two slab-like memory allocators

2016-09-25 Thread Petr Jelinek
Hi Tomas,

On 02/08/16 17:44, Tomas Vondra wrote:
> 
> This patch actually includes two new memory allocators (not one). Very
> brief summary (for more detailed explanation of the ideas, see comments
> at the beginning of slab.c and genslab.c):
> 
> Slab
> 
> * suitable for fixed-length allocations (other pallocs fail)
> * much simpler than AllocSet (no global freelist management etc.)
> * free space is tracked per block (using a simple bitmap)
> * which allows freeing the block once all chunks are freed (AllocSet
> will hold the memory forever, in the hope of reusing it)
> 
> GenSlab
> ---
> * suitable for non-fixed-length allocations, but with chunks of mostly
> the same size (initially unknown, the context will tune itself)
> * a combination of AllocSet and Slab (or a sequence of Slab allocators)
> * the goal is to do most allocations in Slab context
> * there's always a single 'current' Slab context, and every now and
> then it's replaced with a new generation (with the chunk size computed
> from recent requests)
> * the AllocSet context is used for chunks too large for current Slab
> 

So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will be use for the
GenSlab eventually outside of the reorderbuffer?

> So none of this is meant as a universal replacement of AllocSet, but in
> the suitable cases the results seem really promising. For example for
> the simple test query in [1], the performance improvement is this:
> 
>     N |   master |  patched
>    -------------------------
>     1 |    100ms |    100ms
>     5 |  15000ms |    350ms
>    10 | 146000ms |    700ms
>    20 |        ? |   1400ms
> 
> That's a fairly significant improvement, and the submitted version of
> the patches should perform even better (~2x, IIRC).
> 

I agree that it improves performance quite nicely and that reorderbuffer
could use this.

About the code: I am not quite sure that this needs to be split into two
patches, especially if 1/3 of the second patch is the removal of the code
added by the first one and otherwise it's quite small and
straightforward. That is, unless you expect the GenSlab to not go in.

Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand however is why the freelist is both an
array and a doubly linked list, and why there is a new implementation of
said doubly linked list given that we have dlist.

> +/*
> + * SlabContext is our standard implementation of MemoryContext.
> + *

Really?

> +/*
> + * SlabChunk
> + *   The prefix of each piece of memory in an SlabBlock
> + *
> + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
> + */

Is this true? Why? And if it is then why doesn't the SlabChunk actually
match the StandardChunkHeader?

> +#define SlabPointerIsValid(pointer) PointerIsValid(pointer)

What's the point of this given that it's defined in the .c file?


> +static void *
> +SlabAlloc(MemoryContext context, Size size)
> +{
> +    Slab        set = (Slab) context;
> +    SlabBlock   block;
> +    SlabChunk   chunk;
> +    int         idx;
> +
> + AssertArg(SlabIsValid(set));
> +
> + Assert(size == set->chunkSize);

I wonder if there should be stronger protection (i.e., elog) for the size
matching.


> +static void *
> +SlabRealloc(MemoryContext context, void *pointer, Size size)
> +{
> +    Slab        set = (Slab) context;
> +
> +    /* can't do actual realloc with slab, but let's try to be gentle */
> +    if (size == set->chunkSize)
> +        return pointer;
> +
> +    /* we can't really do repalloc with this allocator */
> +    Assert(false);

This IMHO should definitely be elog.


> +static void
> +SlabCheck(MemoryContext context)
> +{
> + /* FIXME */
> +}

Do you plan to implement this interface?

> +#define SLAB_DEFAULT_BLOCK_SIZE  8192
> +#define SLAB_LARGE_BLOCK_SIZE    (8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these could
be written with the same style?

GenSlab:

Since this is a relatively simple wrapper it looks mostly ok to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call
into mcxt, but I guess the alternative there is to error out, so this is
probably preferable. I would want to hear other opinions here.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PATCH: two slab-like memory allocators

2016-08-02 Thread Tomas Vondra

Hi,

Back in the bug #14231 thread [1], dealing with performance issues in 
reorderbuffer due to an excessive number of expensive free() calls, I 
proposed to resolve that with a custom slab-like memory allocator, 
suitable for fixed-size allocations. I'd like to put this into the next 
CF, and it's probably too invasive a change to count as a bugfix anyway.


[1] 
https://www.postgresql.org/message-id/flat/20160706185502.1426.28143%40wrigleys.postgresql.org


This patch actually includes two new memory allocators (not one). Very 
brief summary (for more detailed explanation of the ideas, see comments 
at the beginning of slab.c and genslab.c):


Slab

* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet 
will hold the memory forever, in the hope of reusing it)


GenSlab
---
* suitable for non-fixed-length allocations, but with chunks of mostly 
the same size (initially unknown, the context will tune itself)

* a combination of AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and 
then it's replaced with a new generation (with the chunk size computed 
from recent requests)

* the AllocSet context is used for chunks too large for current Slab
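
The dispatch itself is simple; roughly like this (a sketch based on the
snippets quoted elsewhere in this thread, with the exact types/signature
possibly differing from the patch):

/*
 * GenSlab allocation path: requests that fit the current generation's
 * chunk size go to the Slab, everything else to the companion AllocSet.
 */
static void *
GenSlabAlloc(GenSlab *set, Size size)
{
    if (size <= set->chunkSize)
        return MemoryContextAlloc(set->slab, set->chunkSize);

    return MemoryContextAlloc(set->aset, size);
}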

So none of this is meant as a universal replacement of AllocSet, but in 
the suitable cases the results seem really promising. For example for 
the simple test query in [1], the performance improvement is this:


    N |   master |  patched
   -------------------------
    1 |    100ms |    100ms
    5 |  15000ms |    350ms
   10 | 146000ms |    700ms
   20 |        ? |   1400ms

That's a fairly significant improvement, and the submitted version of 
the patches should perform even better (~2x, IIRC).


There's a bunch of TODOs - e.g. handling of realloc() calls in the 
GenSlab, and probably things I haven't thought about.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-simple-slab-allocator-fixed-size-allocations.patch
Description: binary/octet-stream


0002-generational-slab-auto-tuning-allocator.patch
Description: binary/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers