Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
Andres Freund  writes:
> On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
>> It strikes me that a way to optimize these cases even more would be to
>> postpone creating the context until it's actually needed.  That might
>> not always be a reasonable plan -- in particular, it occurs to me to
>> think that CurTransactionContext is probably so widely used that
>> changing anything about how it works would probably be really painful
>> -- but it might be possible in some cases.

> That's not generally easy without slowing things down though - e.g. we
> don't want to check for ExprContext's existence before every use; there
> can be billions of usages in a single analytics query. The branches (yea
> yea ;)) would show up as being noticeable.

Yeah.  Also, in most of these cases what we're doing with the context
is installing it as CurrentMemoryContext while we execute some random
code that might or might not need to palloc anything.  We can't just
set CurrentMemoryContext to null - for one thing, that would leave no
trace of how the context should get set up if it's needed.  You could
imagine installing some sort of placeholder, but really that's the
mechanism we've already got, in the case where we just make a context
header and no blocks.

>> Another idea I have is that perhaps we could arrange to reuse contexts
>> instead of destroying them and recreating them.

> I'm a bit doubtful that's going to help: maintaining that list isn't
> going to be free, and the lifetime and number of those contexts aren't
> always going to match up.

Actually, this seems like a really promising idea to me.  To the extent
that an empty context has standard parameters, they're all
interchangeable, so you could imagine that AllocSetDelete shoves it
onto a freelist after resetting it, instead of just giving it back to
libc.  I'm slightly worried about creating allocation islands that
way, but that problem is probably surmountable, if it's real at all.
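
Just to illustrate the shape of it, aset.c might grow something like the
following (all names below are invented for illustration, not taken from
any posted patch):

/*
 * Sketch only: recycle reset contexts that were created with the standard
 * parameters instead of handing their header block back to libc.
 */
#define MAX_FREE_CONTEXTS 16

static AllocSet context_freelist[MAX_FREE_CONTEXTS];
static int	context_freelist_len = 0;

/* called from AllocSetDelete() in place of free(), after resetting the set */
static bool
AllocSetStash(AllocSet set)
{
	if (context_freelist_len >= MAX_FREE_CONTEXTS ||
		set->initBlockSize != ALLOCSET_DEFAULT_INITSIZE ||
		set->maxBlockSize != ALLOCSET_DEFAULT_MAXSIZE)
		return false;			/* caller free()s it, as today */
	context_freelist[context_freelist_len++] = set;
	return true;
}

/* called from AllocSetContextCreate() before malloc()ing a fresh one */
static AllocSet
AllocSetUnstash(void)
{
	if (context_freelist_len == 0)
		return NULL;
	return context_freelist[--context_freelist_len];
}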

However, that seems like a different patch from what I'm working on
here.

regards, tom lane



Re: Rethinking MemoryContext creation

2017-12-12 Thread Andres Freund
On 2017-12-12 14:50:37 -0500, Robert Haas wrote:
> On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
> >     379 CurTransactionContext was never used
> >      24 CurTransactionContext was used
> >   66978 ExprContext was never used
> >   17364 ExprContext was used
> >   11139 SPI Exec was never used
> >    2421 SPI Exec was used
> >   38386 ginPlaceToPage temporary context was never used
> >    3348 ginPlaceToPage temporary context was used
> 
> It strikes me that a way to optimize these cases even more would be to
> postpone creating the context until it's actually needed.  That might
> not always be a reasonable plan -- in particular, it occurs to me to
> think that CurTransactionContext is probably so widely used that
> changing anything about how it works would probably be really painful
> -- but it might be possible in some cases.

That's not generally easy without slowing things down though - e.g. we
don't want to check for ExprContext's existence before every use; there
can be billions of usages in a single analytics query. The branches (yea
yea ;)) would show up as being noticeable.

There are a few places where we could probably reliably *detect* that
they're not needed, however.  E.g. plenty of executor nodes only need an
ExprContext when either a qual or a projection is needed; both are pretty
cheap to detect at ExecInitNode() time.
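
For illustration, the check could be roughly this at node-init time (all
names invented; "needs_projection" stands in for whatever test the
executor would really use, this isn't patch code):

static void
ExecMaybeAssignExprContext(EState *estate, PlanState *planstate,
						   bool needs_projection)
{
	/* Only pay for an ExprContext if this node can actually use one */
	if (planstate->plan->qual != NIL || needs_projection)
		ExecAssignExprContext(estate, planstate);
	else
		planstate->ps_ExprContext = NULL;	/* node never evaluates exprs */
}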


> Another idea I have is that perhaps we could arrange to reuse contexts
> instead of destroying them and recreating them.  For example, when
> asked to delete a context, we could instead push it on a linked list
> of old contexts, or only if the list isn't too long already, and when
> asked to create one, we could pop from the list.  Or we could keep
> around an array of, say, 1024 contexts that are never freed and only
> allocated dynamically when we run out.

I'm a bit doubtful that's going to help: maintaining that list isn't
going to be free, and the lifetime and number of those contexts aren't
always going to match up.

I think you're somewhat on to something however: I do think that
especially executor startup would have a good chance of avoiding
noticeable overhead by batching the allocation of all these tiny
contexts together - I just don't quite know how to do that, given how our
current initialization works.  The best thing to do would probably be to do two
walks during executor initialization, one to compute memory sizes, and a
second to initialize pre-requested memory that's laid out
serially... But obviously that's not a small change.
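
i.e., very roughly, and with every function name below invented purely to
sketch the shape:

	/* pass 1: each node type reports how much fixed space (including its
	 * little contexts) it is going to want */
	Size		need = EstimateExecutorSpace(plannedstmt->planTree);

	/* pass 2: one allocation, nodes carve their pieces out of it serially */
	char	   *arena = palloc(need);

	InitExecutorNodesFromArena(plannedstmt->planTree, estate, arena);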

Greetings,

Andres Freund



Re: Rethinking MemoryContext creation

2017-12-12 Thread Robert Haas
On Tue, Dec 12, 2017 at 2:30 PM, Tom Lane  wrote:
>     379 CurTransactionContext was never used
>      24 CurTransactionContext was used
>   66978 ExprContext was never used
>   17364 ExprContext was used
>   11139 SPI Exec was never used
>    2421 SPI Exec was used
>   38386 ginPlaceToPage temporary context was never used
>    3348 ginPlaceToPage temporary context was used

It strikes me that a way to optimize these cases even more would be to
postpone creating the context until it's actually needed.  That might
not always be a reasonable plan -- in particular, it occurs to me to
think that CurTransactionContext is probably so widely used that
changing anything about how it works would probably be really painful
-- but it might be possible in some cases.

Another idea I have is that perhaps we could arrange to reuse contexts
instead of destroying them and recreating them.  For example, when
asked to delete a context, we could instead push it on a linked list
of old contexts, or only if the list isn't too long already, and when
asked to create one, we could pop from the list.  Or we could keep
around an array of, say, 1024 contexts that are never freed and only
allocated dynamically when we run out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rethinking MemoryContext creation

2017-12-12 Thread Tom Lane
I wrote:
> I've not done any benchmarking on this yet, just confirmed that it
> compiles and passes check-world.

So once I'd done some benchmarking, I was a bit disappointed: I could
not measure any difference anymore in "pgbench -S", and poking into
other scenarios found cases that were actually slower, like

do $$
begin
  for i in 1..1000 loop
declare x int;
begin
  x := 42;
exception when others then null;
end;
  end loop;
end$$;

which went from about 12.4 seconds in HEAD to about 13.6 with my
v2 patch.  When I looked into the reason, I found that my earlier
blithe pronouncement that "optimizing for the unused-context case
seems like the wrong thing" was too simple.  In this example we
create and delete two memory contexts per loop (a subtransaction
context and an ExprContext) and neither of them receives any
palloc requests.  Some of the contexts created in a "pgbench -S"
scenario are the same way.  So in these examples, we were on the
losing side of the replace-a-palloc-with-a-malloc trade.

However, if we can predict which contexts are more likely not to
receive traffic, we can fix this by doing things the old way for
those contexts.  I instrumented AllocSetDelete to log whether
the target context had received any requests in its lifespan,
and aggregated the reports over a run of the core regression tests.
I found these cases where there were significantly more reports
of "context was never used" than "context was used":

    379 CurTransactionContext was never used
     24 CurTransactionContext was used
  66978 ExprContext was never used
  17364 ExprContext was used
    993 HashTableContext was never used
     25 HashTableContext was used
     88 Logical Replication Launcher sublist was never used
      2 Logical Replication Launcher sublist was used
  11139 SPI Exec was never used
   2421 SPI Exec was used
     36 SetOp was never used
      2 SetOp was used
    185 Statistics snapshot was never used
    104 Subplan HashTable Context was never used
     44 Subplan HashTable Context was used
    148 Subplan HashTable Temp Context was never used
   1481 Table function arguments was never used
     45 Table function arguments was used
     22 TableFunc per value context was never used
     52 Unique was never used
      4 Unique was used
    137 WindowAgg Aggregates was never used
      2 WindowAgg Aggregates was used
    127 WindowAgg Partition was never used
     12 WindowAgg Partition was used
    288 _bt_pagedel was never used
     14 _bt_pagedel was used
     35 event trigger context was never used
      3 event trigger context was used
  38386 ginPlaceToPage temporary context was never used
   3348 ginPlaceToPage temporary context was used
    454 ident parser context was never used
    229 tSRF function arguments was never used
     46 tSRF function arguments was used

A lot of these probably aren't worth bothering to optimize, because they just
don't occur often enough to move the needle.  (And in workloads where that
wasn't true, maybe the usage statistics would be different anyway.)  But
it looked to me like CurTransactionContext, ExprContext, HashTableContext,
SPI Exec, and ginPlaceToPage temporary context would be worth doing it
the old way for.

Accordingly, attached is a v3 patch that adds another flag to
AllocSetContextCreateExtended telling it to do things the old way
with the context header in TopMemoryContext.  (We can still optimize
context name copying as before.)  This fixes the subtransaction case
I showed above, bringing it to 10.7 sec which is noticeably better
than HEAD.  I now see maybe a 1 or 2 percent improvement in "pgbench -S",
which isn't much, but it turns out that that test case only involves
half a dozen context creations/deletions per transaction.  So probably
we aren't going to get any noticeable movement on that benchmark, and
it'd be brighter to look for test cases where more contexts are involved.
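
For those few context types the calls end up looking roughly like this
(the flag name below is made up for illustration; see the attached patch
for the real spelling):

	/* ginPlaceToPage's temporary context usually sees no pallocs at all, so
	 * ask for the old behavior: small header, no up-front keeper block. */
	tmpCxt = AllocSetContextCreateExtended(CurrentMemoryContext,
										   "ginPlaceToPage temporary context",
										   MEMCONTEXT_OPTION_MINIMAL_HEADER,
										   ALLOCSET_DEFAULT_SIZES);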

regards, tom lane

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 868c14e..adbbc44 100644
*** a/contrib/amcheck/verify_nbtree.c
--- b/contrib/amcheck/verify_nbtree.c
*** bt_check_every_level(Relation rel, bool 
*** 295,303 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_MINSIZE,
!  ALLOCSET_DEFAULT_INITSIZE,
!  ALLOCSET_DEFAULT_MAXSIZE);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
--- 295,301 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_SIZES);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 

Re: Rethinking MemoryContext creation

2017-12-11 Thread Simon Riggs
On 11 December 2017 at 17:38, Tom Lane  wrote:
> Simon Riggs  writes:
>> On 11 December 2017 at 16:27, Tom Lane  wrote:
>>> For a *very* large majority of the callers of AllocSetContextCreate,
>>> the context name is a simple C string constant, so we could just store
>>> the pointer to it and save the space and cycles required to copy it.
>
>> Why have the string at all in that case?
>
> Try reading a MemoryContextStats dump without it ...

I understood. I thought you were suggesting removing it in favour of a pointer.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane
Robert Haas  writes:
> On Mon, Dec 11, 2017 at 12:36 PM, Tom Lane  wrote:
>> [ thinks... ]  If we wanted to go that way, one thing we could do to
>> help extension authors (and ourselves) is to define the proposed
>> AllocSetContextCreate macro to include
>> 
>>  StaticAssertExpr(__builtin_constant_p(name))
>> 
>> on compilers that have __builtin_constant_p.  Now, that only helps
>> people using gcc and gcc-alikes, but that's a large fraction of
>> developers I should think.  (I tested this and it does seem to
>> correctly recognize string literals as constants.)

> I like that idea.  I think that would provide good protection not only
> for third-party developers but for core developers.

It turns out this is slightly more painful than I'd anticipated.
I tried to #define AllocSetContextCreate with five parameters,
but all of the call sites that use the size abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) blew up, because as far as
they were concerned there were only three parameters, since the
abstraction macros hadn't gotten expanded yet.

We can make it work by #defining AllocSetContextCreate with three
parameters

#define AllocSetContextCreate(parent, name, allocparams) ...

This approach means that you *must* use an abstraction macro when going
through AllocSetContextCreate; if you want to write out the parameters
longhand, you have to call AllocSetContextCreateExtended.  I do not
feel that this is a big loss, but there were half a dozen sites in our
code that were doing it the old way.  More significantly, since we
only introduced those macros in 9.6, I suspect that most extensions
are still doing it the old way and will get broken by this change.
It's not hard to fix, but the annoyance factor will probably be real.
I see no good way around it though: we can't use a static inline
function instead, because that will almost certainly break the
__builtin_constant_p test.
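
To be concrete, the wrapper comes out roughly like this (simplified here,
and the assertion message is illustrative rather than quoted from the
patch):

#ifdef HAVE__BUILTIN_CONSTANT_P
#define AllocSetContextCreate(parent, name, allocparams) \
	(StaticAssertExpr(__builtin_constant_p(name), \
					  "memory context names must be constant strings"), \
	 AllocSetContextCreateExtended(parent, name, 0, allocparams))
#else
#define AllocSetContextCreate(parent, name, allocparams) \
	AllocSetContextCreateExtended(parent, name, 0, allocparams)
#endif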

I did not bother with compatibility macros for SlabContext or
GenerationContext --- I really doubt any extension code is using
the former yet, and they couldn't be using the latter since it's new.

I've not done any benchmarking on this yet, just confirmed that it
compiles and passes check-world.

regards, tom lane

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 868c14e..adbbc44 100644
*** a/contrib/amcheck/verify_nbtree.c
--- b/contrib/amcheck/verify_nbtree.c
*** bt_check_every_level(Relation rel, bool 
*** 295,303 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_MINSIZE,
!  ALLOCSET_DEFAULT_INITSIZE,
!  ALLOCSET_DEFAULT_MAXSIZE);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
--- 295,301 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_SIZES);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);
  
  	/* Get true root block from meta-page */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 046898c..e93d740 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** AtStart_Memory(void)
*** 997,1007 
  	 */
  	if (TransactionAbortContext == NULL)
  		TransactionAbortContext =
! 			AllocSetContextCreate(TopMemoryContext,
!   "TransactionAbortContext",
!   32 * 1024,
!   32 * 1024,
!   32 * 1024);
  
  	/*
  	 * We shouldn't have a transaction context already.
--- 997,1008 
  	 */
  	if (TransactionAbortContext == NULL)
  		TransactionAbortContext =
! 			AllocSetContextCreateExtended(TopMemoryContext,
! 		  "TransactionAbortContext",
! 		  0,
! 		  32 * 1024,
! 		  32 * 1024,
! 		  32 * 1024);
  
  	/*
  	 * We shouldn't have a transaction context already.
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dd4a8d3..6e27856 100644
*** a/src/backend/catalog/partition.c
--- b/src/backend/catalog/partition.c
*** RelationBuildPartitionDesc(Relation rel)
*** 513,521 
  	}
  
  	/* Now build the actual relcache partition descriptor */
! 	rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
! 		  RelationGetRelationName(rel),
! 		  ALLOCSET_DEFAULT_SIZES);
  	oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);
  
  	result = (PartitionDescData *) palloc0(sizeof(PartitionDescData));
--- 513,522 
  	}
  
  	/* Now build the actual relcache partition descriptor */
! 	rel->rd_pdcxt = AllocSetContextCreateExtended(CacheMemoryContext,
!   RelationGetRelationName(rel),
!   MEMCONTEXT_OPTION_COPY_NAME,
!   

Re: Rethinking MemoryContext creation

2017-12-11 Thread Andres Freund
On 2017-12-11 13:09:42 -0500, Tom Lane wrote:
> Tomas Vondra  writes:
> > On 12/10/2017 04:42 PM, Tom Lane wrote:
> >> Peter Geoghegan  writes:
> >>> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
> >>>> Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
> >>>> although that number is a bit shaky since the run-to-run variation
> >>>> is a few percent anyway.
>
> > FWIW I've done some measurements, and while there is an improvement, it's
> > far from 5%. ...
> > So that's about 1.3% and 1.2% improvement. It seems fairly consistent,
> > but it might easily be due to differences in the layout of the binaries.
>
> Thanks for checking.  With these sorts of small-percentage improvements,
> I would not be surprised for platform-to-platform results to be different.
> At least you do see some improvement too.
>
> Let me code up the change to avoid copying constant name strings,
> and then we can see if that helps any.

FWIW:
+    5.37%  postgres  postgres          [.] hash_search_with_hash_value
+    2.94%  postgres  postgres          [.] AllocSetAlloc
+    2.68%  postgres  postgres          [.] _bt_compare
+    2.36%  postgres  postgres          [.] LWLockAcquire
+    2.13%  postgres  postgres          [.] PostgresMain
+    1.47%  postgres  postgres          [.] LWLockRelease
+    1.32%  postgres  libc-2.25.so      [.] _int_malloc
+    1.23%  postgres  libc-2.25.so      [.] vfprintf
+    1.22%  postgres  postgres          [.] LockAcquire
+    1.20%  postgres  postgres          [.] _bt_first
-    1.11%  postgres  libc-2.25.so      [.] strlen
        - strlen
           + 32.24% MemoryContextCreate
           + 16.97% pq_getmsgstring
           + 14.63% set_ps_display

So I'd expect it to help a small amount.

- Andres



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane
Tomas Vondra  writes:
> On 12/10/2017 04:42 PM, Tom Lane wrote:
>> Peter Geoghegan  writes:
>>> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
>>>> Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
>>>> although that number is a bit shaky since the run-to-run variation
>>>> is a few percent anyway.

> FWIW I've done some measurements, and while there is an improvement, it's
> far from 5%. ...
> So that's about 1.3% and 1.2% improvement. It seems fairly consistent,
> but it might easily be due to differences in the layout of the binaries.

Thanks for checking.  With these sorts of small-percentage improvements,
I would not be surprised for platform-to-platform results to be different.
At least you do see some improvement too.

Let me code up the change to avoid copying constant name strings,
and then we can see if that helps any.

regards, tom lane



Re: Rethinking MemoryContext creation

2017-12-11 Thread Robert Haas
On Mon, Dec 11, 2017 at 12:36 PM, Tom Lane  wrote:
> [ thinks... ]  If we wanted to go that way, one thing we could do to
> help extension authors (and ourselves) is to define the proposed
> AllocSetContextCreate macro to include
>
> StaticAssertExpr(__builtin_constant_p(name))
>
> on compilers that have __builtin_constant_p.  Now, that only helps
> people using gcc and gcc-alikes, but that's a large fraction of
> developers I should think.  (I tested this and it does seem to
> correctly recognize string literals as constants.)

I like that idea.  I think that would provide good protection not only
for third-party developers but for core developers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rethinking MemoryContext creation

2017-12-11 Thread Dagfinn Ilmari Mannsåker
Robert Haas  writes:

> On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 12/11/2017 05:27 PM, Tom Lane wrote:
>>>> However, unless we want to run around and touch all the ~ 150 calls
>>>> with constant arguments, we'd have to set things up so that the default
>>>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>>>> callers in extensions.  Not entirely sure if it's worth that --- any
>>>> thoughts?
>>
>>> I don't think silently breaking extensions is a particularly attractive
>>> option, so I guess we'll have to run around and tweak the ~150 calls.
>>
>> Meh.  I suppose that of the ~150 call sites, there are probably only
>> a dozen or two where it would actually make a performance difference,
>> so maybe this needn't be quite as invasive as I first thought.
>
> I think changing only a subset of the call sites is unappealing
> because, even though it may not make a measurable performance
> difference in other cases, it may get cargo-culted into some place
> where it does make a difference.

Would it be acceptable to only get this optimisation on compilers that
support __builtin_constant_p or similar?  If so, the wrapper macro could
use that to automatically pass the no-copy flag when called with a
literal string.
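
Something along these lines, perhaps (MEMCONTEXT_OPTION_COPY_NAME being the
flag from Tom's patch; the rest is just a sketch):

#ifdef HAVE__BUILTIN_CONSTANT_P
#define AllocSetContextCreate(parent, name, allocparams) \
	AllocSetContextCreateExtended(parent, name, \
								  __builtin_constant_p(name) ? \
								  0 : MEMCONTEXT_OPTION_COPY_NAME, \
								  allocparams)
#else
/* without the builtin we can't tell, so copy to stay safe */
#define AllocSetContextCreate(parent, name, allocparams) \
	AllocSetContextCreateExtended(parent, name, \
								  MEMCONTEXT_OPTION_COPY_NAME, allocparams)
#endif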

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane
Simon Riggs  writes:
> On 11 December 2017 at 16:27, Tom Lane  wrote:
>> For a *very* large majority of the callers of AllocSetContextCreate,
>> the context name is a simple C string constant, so we could just store
>> the pointer to it and save the space and cycles required to copy it.

> Why have the string at all in that case?

Try reading a MemoryContextStats dump without it ...

regards, tom lane



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tomas Vondra
On 12/11/2017 06:22 PM, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 12/11/2017 05:27 PM, Tom Lane wrote:
>>>> However, unless we want to run around and touch all the ~ 150 calls
>>>> with constant arguments, we'd have to set things up so that the default
>>>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>>>> callers in extensions.  Not entirely sure if it's worth that --- any
>>>> thoughts?
>>
>>> I don't think silently breaking extensions is a particularly attractive
>>> option, so I guess we'll have to run around and tweak the ~150 calls.
>>
>> Meh.  I suppose that of the ~150 call sites, there are probably only
>> a dozen or two where it would actually make a performance difference,
>> so maybe this needn't be quite as invasive as I first thought.
> 
> I think changing only a subset of the call sites is unappealing
> because, even though it may not make a measurable performance
> difference in other cases, it may get cargo-culted into some place
> where it does make a difference.
> 

Not sure. On the one hand, it might certainly be somewhat confusing
when we use two different methods to create memory contexts with C
string constants. On the other hand, I'm sure we have other similar
"local" optimizations.

I'd say "let's just tweak all the calls to use the new function" but I'm
not the person doing the leg work ...

> I also don't think silently breaking extensions is a terribly big deal
> in this case.  It seems likely that most extensions use static names
> just as most of our internal stuff does.  I'm going to guess that the
> number of extensions that will actually break as a result of a change
> in this area is probably very small - conceivably zero, and likely
> less than five.  I don't think we should be willing to uglify the core
> code too much for that level of breakage.
> 

I don't know how many extensions you maintain, but in my experience this
type of silent breakage is extremely annoying. I definitely prefer a
clean compile-time API breakage, for example.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rethinking MemoryContext creation

2017-12-11 Thread Simon Riggs
On 11 December 2017 at 16:27, Tom Lane  wrote:

> For a *very* large majority of the callers of AllocSetContextCreate,
> the context name is a simple C string constant, so we could just store
> the pointer to it and save the space and cycles required to copy it.

Why have the string at all in that case?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rethinking MemoryContext creation

2017-12-11 Thread Robert Haas
On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> On 12/11/2017 05:27 PM, Tom Lane wrote:
>>> However, unless we want to run around and touch all the ~ 150 calls
>>> with constant arguments, we'd have to set things up so that the default
>>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>>> callers in extensions.  Not entirely sure if it's worth that --- any
>>> thoughts?
>
>> I don't think silently breaking extensions is a particularly attractive
>> option, so I guess we'll have to run around and tweak the ~150 calls.
>
> Meh.  I suppose that of the ~150 call sites, there are probably only
> a dozen or two where it would actually make a performance difference,
> so maybe this needn't be quite as invasive as I first thought.

I think changing only a subset of the call sites is unappealing
because, even though it may not make a measurable performance
difference in other cases, it may get cargo-culted into some place
where it does make a difference.

I also don't think silently breaking extensions is a terribly big deal
in this case.  It seems likely that most extensions use static names
just as most of our internal stuff does.  I'm going to guess that the
number of extensions that will actually break as a result of a change
in this area is probably very small - conceivably zero, and likely
less than five.  I don't think we should be willing to uglify the core
code too much for that level of breakage.

But those are just my opinions.  I am glad you are working on this; I
noticed this problem before and thought of trying to do something
about it, but ran out of time and ideas.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane
Tomas Vondra  writes:
> On 12/11/2017 05:27 PM, Tom Lane wrote:
>> However, unless we want to run around and touch all the ~ 150 calls
>> with constant arguments, we'd have to set things up so that the default
>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>> callers in extensions.  Not entirely sure if it's worth that --- any
>> thoughts?

> I don't think silently breaking extensions is a particularly attractive
> option, so I guess we'll have to run around and tweak the ~150 calls.

Meh.  I suppose that of the ~150 call sites, there are probably only
a dozen or two where it would actually make a performance difference,
so maybe this needn't be quite as invasive as I first thought.

regards, tom lane



Re: Rethinking MemoryContext creation

2017-12-11 Thread Tomas Vondra


On 12/11/2017 05:27 PM, Tom Lane wrote:
> I wrote:
>> While fooling around with a different problem, I got annoyed at how slow
>> MemoryContext creation and deletion is.
> 
> Looking at this some more, there's another micro-optimization we could
> make, which is to try to get rid of the strlen+strcpy operations needed
> for the context name.  (And yes, I'm seeing those show up in profiles,
> to the tune of a couple percent of total runtime in some examples.)
> 
> For a *very* large majority of the callers of AllocSetContextCreate,
> the context name is a simple C string constant, so we could just store
> the pointer to it and save the space and cycles required to copy it.
> This would require providing a separate API for the few callers that are
> actually passing transient strings, but that's easy enough.  I envision
> AllocSetContextCreate becoming a wrapper macro for
> "AllocSetContextCreateExtended", which'd take a flag argument specifying
> whether the context name needs to be copied.
> 
> However, unless we want to run around and touch all the ~ 150 calls
> with constant arguments, we'd have to set things up so that the default
> behavior for AllocSetContextCreate is to not copy.  This risks breaking
> callers in extensions.  Not entirely sure if it's worth that --- any
> thoughts?
> 

I don't think silently breaking extensions is a particularly attractive
option, so I guess we'll have to run around and tweak the ~150 calls.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Rethinking MemoryContext creation

2017-12-10 Thread Tom Lane
Peter Geoghegan  writes:
> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
>> Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
>> although that number is a bit shaky since the run-to-run variation
>> is a few percent anyway.

> Is that with "-M prepared", too?

No, I didn't use that.

regards, tom lane



Rethinking MemoryContext creation

2017-12-09 Thread Tom Lane
While fooling around with a different problem, I got annoyed at how slow
MemoryContext creation and deletion is.  A big chunk of that seems to be
the palloc/pfree cycle needed to allocate the context header block in
TopMemoryContext.  After thinking about it for a bit, I decided that
it'd be smarter to include the context header in the context's first
malloc() allocation, thereby eliminating the palloc/pfree cycle
completely.  This is not completely cost-free, because it means that
we must allocate a context's first block ("keeper" block) immediately,
even if no pallocs ever get done in it.  However, optimizing for the
unused-context case seems like the wrong thing.
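
In aset.c terms, the creation path becomes roughly the following (heavily
simplified; initBlockSize is the caller's parameter, and error handling and
several header fields are elided):

	/* One malloc() covers the context header plus the keeper block. */
	Size		headerSize = MAXALIGN(sizeof(AllocSetContext));
	Size		firstBlockSize = headerSize + initBlockSize;
	AllocSetContext *set;
	AllocBlock	keeper;

	set = (AllocSetContext *) malloc(firstBlockSize);
	if (set == NULL)
		elog(ERROR, "out of memory");	/* real code reports more detail */

	/* keeper block starts right after the header, in the same allocation */
	keeper = (AllocBlock) (((char *) set) + headerSize);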

To do this, we need to move the responsibility for allocating and
releasing the context header into the individual context-type-specific
modules, which leads to a certain amount of code duplication, but not a
lot.  A big advantage is that we can simplify the overly-complicated
creation API that had the context module calling into mcxt.c which then
recursed back to the context module.  Further down the pike there might
be some additional win, since this gives the context module more control
over where to put the header (shared memory maybe?).

Another point worth making is that getting the context headers out of
TopMemoryContext reduces the cost of transiently having a lot of
contexts.  Since TopMemoryContext is never reset, and aset.c doesn't
give memory back to libc short of a reset, creating a lot of contexts
permanently bloats TopMemoryContext even if they all get deleted later.

I didn't spend a lot of effort on slab.c or generation.c, just having
them do a separate malloc and free for the context header.  If anyone's
really excited about that, they could be improved later.

To cut to the chase: I experimented with simply creating and deleting an
aset.c memory context in a tight loop, optionally with one palloc in the
context before deleting it.  On my machine, the time per loop looked like

                        HEAD       with patch

no palloc at all         9.2 us      10.3 us
palloc 100 bytes        18.5 us      11.6 us
palloc 10000 bytes      17.2 us      18.5 us

So the unused-context case does get a bit slower, as we trade off
a malloc/free cycle to save a palloc/pfree cycle.  The case where
we've made some use of the context gets considerably quicker though.
Also, the last line shows a situation where the lone palloc request
does not fit in the context's initial allocation so that we're forced
to make two malloc requests not one.  This is a bit slower too, but
not by much, and I think it's really an unusual case.  (As soon as
you've made any requests that do fit in the initial allocation,
you'd be back to the patch winning.)

Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
although that number is a bit shaky since the run-to-run variation
is a few percent anyway.

Thoughts, objections?  Anyone want to do some other benchmarking?

regards, tom lane

diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index 296fa19..7f08f5d 100644
*** a/src/backend/utils/mmgr/README
--- b/src/backend/utils/mmgr/README
*** every other context is a direct or indir
*** 177,184 
  here is essentially the same as "malloc", because this context will never
  be reset or deleted.  This is for stuff that should live forever, or for
  stuff that the controlling module will take care of deleting at the
! appropriate time.  An example is fd.c's tables of open files, as well as
! the context management nodes for memory contexts themselves.  Avoid
  allocating stuff here unless really necessary, and especially avoid
  running with CurrentMemoryContext pointing here.
  
--- 177,183 
  here is essentially the same as "malloc", because this context will never
  be reset or deleted.  This is for stuff that should live forever, or for
  stuff that the controlling module will take care of deleting at the
! appropriate time.  An example is fd.c's tables of open files.  Avoid
  allocating stuff here unless really necessary, and especially avoid
  running with CurrentMemoryContext pointing here.
  
*** a maximum block size.  Selecting smaller
*** 420,430 
  space in contexts that aren't expected to hold very much (an example
  is the relcache's per-relation contexts).
  
! Also, it is possible to specify a minimum context size.  If this
! value is greater than zero then a block of that size will be grabbed
! immediately upon context creation, and cleared but not released during
! context resets.  This feature is needed for ErrorContext (see above),
! but will most likely not be used for other contexts.
  
  We expect that per-tuple contexts will be reset frequently and typically
  will not allocate very much space per tuple cycle.  To make this usage
--- 419,428 
  space in contexts that aren't expected to hold very much (an example
  is the