Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-26 Thread Scott Marlowe
On Tue, Nov 25, 2014 at 8:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Hmm, I don't like the trend here.  For the repeat-1000x query, I get
 these reported execution times:

 8.4  360 ms
 9.0  365 ms
 9.1  440 ms
 9.2  510 ms
 9.3  550 ms
 9.4  570 ms
 head 570 ms

 I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
 in typcache.c, and found that that brought HEAD's runtime down to 460ms.

 I found some additional low-hanging fruit by comparing gprof call counts
 in 8.4 and HEAD:

 * OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4
 or 9.2, accounts for a depressingly large amount of palloc/pfree traffic.
 The implementation was quick-n-dirty to begin with, but workloads
 like this one call it often enough to make it a pain point.

 * plpgsql's setup_param_list() contributes another large fraction of
 added palloc/pfree traffic; this is evidently caused by the temporary
 bitmapset needed for its bms_first_member() loop, which was not there
 in 8.4 but is there in 9.2.

 I've been able to bring HEAD's runtime down to about 415 ms with the
 collection of more-or-less quick hacks attached.  None of them are
 ready to commit but I thought I'd post them for the record.

 After review of all this, I think the aspect of your example that is
 causing performance issues is that there are a lot of non-inline-able
 SQL-language function calls.  That's not a case that we've put much
 thought into lately.  I doubt we are going to get all the way back to
 where 8.4 was in the short term, because I can see that there is a
 significant amount of new computation associated with collation
 management during parsing (catcache lookups driven by get_typcollation,
 assign_collations_walker, etc).  The long-term answer to that is to
 improve the SQL-language function support so that it can cache the results
 of parsing the function body; we have surely got enough plancache support
 for that now, but no one's attempted to apply it in functions.c.

Thanks so much for the work on this. We won't be applying a patch in
prod but we can definitely get a feel for the change on some test
boxes.

And if I didn't say it, Thanks to Heikki for his advice. Huge
difference there too.

This is exactly why I love using Postgres so much. The community
support. No other software package has this kind of support.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Scott Marlowe
OK so there's a simple set of tree functions we use at work. They're
quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
a simple mix of sql and plpgsql functions which are at
http://pastebin.com/SXTnNhd5 and which I've attached.

Here's a test query:

select tree_ancestor_keys('00010001');

According to explain analyze on both 8.4 and 9.2 they have the same
plan. However, on the same machine the query is about 40% slower on
9.2. Note we're not hitting the disks, or even buffers here. It's pure
in memory plpsql and sql that we're running.

explain analyze select tree_ancestor_keys('00010001')
from generate_series(1,1000);

On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
same diff, it's just easier to see with the generate series forcing it
to run 1000 times to kind of even out the noise.)

On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
that difference seems to be there on all plpgsql and sql functions.

In our application, these tree functions get called millions and
millions of times a day, and a 40% performance penalty is a pretty big
deal.

We're already using the trick of telling the query planner that this
function will return 1 row with alter function rows 1 etc. That helps
a lot but it doesn't fix this underlying performance issue.

Server versions are 8.4.22 (last I think) and 9.2.9.

If anyone has any suggestions I'd love to hear them.
-- 
To understand recursion, one must first understand recursion.


z.sql
Description: application/sql

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Heikki Linnakangas

On 11/25/2014 10:36 PM, Scott Marlowe wrote:

OK so there's a simple set of tree functions we use at work. They're
quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
a simple mix of sql and plpgsql functions which are at
http://pastebin.com/SXTnNhd5 and which I've attached.

Here's a test query:

select tree_ancestor_keys('00010001');

According to explain analyze on both 8.4 and 9.2 they have the same
plan. However, on the same machine the query is about 40% slower on
9.2. Note we're not hitting the disks, or even buffers here. It's pure
in memory plpsql and sql that we're running.

explain analyze select tree_ancestor_keys('00010001')
from generate_series(1,1000);

On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
same diff, it's just easier to see with the generate series forcing it
to run 1000 times to kind of even out the noise.)

On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
that difference seems to be there on all plpgsql and sql functions.

In our application, these tree functions get called millions and
millions of times a day, and a 40% performance penalty is a pretty big
deal.

We're already using the trick of telling the query planner that this
function will return 1 row with alter function rows 1 etc. That helps
a lot but it doesn't fix this underlying performance issue.

Server versions are 8.4.22 (last I think) and 9.2.9.

If anyone has any suggestions I'd love to hear them.


I don't know why this regressed between those versions, but looking at 
the functions, there's some low-hanging fruit:


1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe 
duplicates are expected here, although I'm not 100% sure).


2. tree_ancestor_keys() calculates tree_level($1) every time it 
recurses. Would be cheaper to calculate once, and pass it as argument.


Put together:

CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer) 
RETURNS SETOF bit varying

LANGUAGE sql IMMUTABLE STRICT
AS $_$
  select tree_ancestor_key($1, $2)
  union all
  select tree_ancestor_keys($1, $2 + 1, $3)
  where $2  $3
$_$;

CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) 
RETURNS SETOF bit varying

LANGUAGE sql IMMUTABLE STRICT
AS $_$
  select tree_ancestor_keys($1, $2 + 1, tree_level($1))
$_$;

These changes make your test query go about 2x faster on my laptop, with 
git master. I'm sure you could optimize the functions further, but those 
at least seem like fairly safe and simple changes.


- Heikki



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 OK so there's a simple set of tree functions we use at work. They're
 quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
 a simple mix of sql and plpgsql functions which are at
 http://pastebin.com/SXTnNhd5 and which I've attached.

 Here's a test query:

 select tree_ancestor_keys('00010001');

 According to explain analyze on both 8.4 and 9.2 they have the same
 plan. However, on the same machine the query is about 40% slower on
 9.2. Note we're not hitting the disks, or even buffers here. It's pure
 in memory plpsql and sql that we're running.

 explain analyze select tree_ancestor_keys('00010001')
 from generate_series(1,1000);

Hmm, I don't like the trend here.  For the repeat-1000x query, I get
these reported execution times:

8.4 360 ms
9.0 365 ms
9.1 440 ms
9.2 510 ms
9.3 550 ms
9.4 570 ms
head570 ms

(This is in assert-enabled builds, I'm too lazy to rebuild all the
branches without that.)

oprofile isn't showing any smoking gun AFAICS; it seems like things
are just generally slower.  Still, we've not seen similar reports
elsewhere, so somehow this usage style is getting penalized in newer
branches compared to other cases.  If it were all on 9.2's head
I'd be suspicious of the plancache mechanism, but evidently that's
not it, or at least not the whole story.

HEAD profile entries above 1%:

samples  %image name   symbol name
7573  7.2448  postgres AllocSetAlloc
6059  5.7964  postgres SearchCatCache
3533  3.3799  postgres AllocSetCheck
3420  3.2718  postgres base_yyparse
2104  2.0128  postgres AllocSetFree
1613  1.5431  postgres palloc
1523  1.4570  postgres ExecMakeFunctionResultNoSets
1313  1.2561  postgres check_list_invariants
1241  1.1872  postgres palloc0
1213  1.1604  postgres pfree
1157  1.1069  postgres SPI_plan_get_cached_plan
1136  1.0868  postgres GetPrivateRefCountEntry
1098  1.0504  postgres sentinel_ok
1085  1.0380  postgres hash_any
1057  1.0112  postgres core_yylex
1053  1.0074  postgres expression_tree_walker
1046  1.0007  postgres hash_search_with_hash_value

8.4 profile entries above 1%:

samples  %image name   symbol name
1178210.3680  postgres AllocSetAlloc
7369  6.4846  postgres AllocSetCheck
5623  4.9482  postgres base_yyparse
4166  3.6660  postgres SearchCatCache
2671  2.3504  postgres ExecMakeFunctionResultNoSets
2060  1.8128  postgres MemoryContextAllocZeroAligned
2030  1.7864  postgres MemoryContextAlloc
1679  1.4775  postgres ExecEvalParam
1607  1.4141  postgres base_yylex
1389  1.2223  postgres check_list_invariants
1348  1.1862  postgres RevalidateCachedPlan
1341  1.1801  postgres AcquireExecutorLocks
1266  1.1141  postgres MemoryContextCreate
1256  1.1053  postgres hash_any
1255  1.1044  postgres expression_tree_walker
1202  1.0577  postgres expression_tree_mutator
1191  1.0481  postgres AllocSetReset

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Scott Marlowe
On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 11/25/2014 10:36 PM, Scott Marlowe wrote:

 OK so there's a simple set of tree functions we use at work. They're
 quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
 a simple mix of sql and plpgsql functions which are at
 http://pastebin.com/SXTnNhd5 and which I've attached.

 Here's a test query:

 select tree_ancestor_keys('00010001');

 According to explain analyze on both 8.4 and 9.2 they have the same
 plan. However, on the same machine the query is about 40% slower on
 9.2. Note we're not hitting the disks, or even buffers here. It's pure
 in memory plpsql and sql that we're running.

 explain analyze select tree_ancestor_keys('00010001')
 from generate_series(1,1000);

 On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
 same diff, it's just easier to see with the generate series forcing it
 to run 1000 times to kind of even out the noise.)

 On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
 that difference seems to be there on all plpgsql and sql functions.

 In our application, these tree functions get called millions and
 millions of times a day, and a 40% performance penalty is a pretty big
 deal.

 We're already using the trick of telling the query planner that this
 function will return 1 row with alter function rows 1 etc. That helps
 a lot but it doesn't fix this underlying performance issue.

 Server versions are 8.4.22 (last I think) and 9.2.9.

 If anyone has any suggestions I'd love to hear them.


 I don't know why this regressed between those versions, but looking at the
 functions, there's some low-hanging fruit:

 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe
 duplicates are expected here, although I'm not 100% sure).

 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses.
 Would be cheaper to calculate once, and pass it as argument.

 Put together:

 CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer) RETURNS
 SETOF bit varying
 LANGUAGE sql IMMUTABLE STRICT
 AS $_$
   select tree_ancestor_key($1, $2)
   union all
   select tree_ancestor_keys($1, $2 + 1, $3)
   where $2  $3
 $_$;

 CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) RETURNS
 SETOF bit varying
 LANGUAGE sql IMMUTABLE STRICT
 AS $_$
   select tree_ancestor_keys($1, $2 + 1, tree_level($1))
 $_$;

 These changes make your test query go about 2x faster on my laptop, with git
 master. I'm sure you could optimize the functions further, but those at
 least seem like fairly safe and simple changes.

Wow that made a huge difference. About a 50% increase across the
board. Sadly, 9.2 is still way behind 8.4 (see Tom's email)

Here's the results of 10 runs, 9.2 old functions, 9.2 new functions,
8.4 old functions, 8.4 new functions:

402.454 ms 217.718 ms 283.289 ms 167.108 ms
390.828 ms 219.644 ms 282.994 ms 165.524 ms
397.987 ms 216.864 ms 289.053 ms 170.821 ms
391.670 ms 220.943 ms 296.410 ms 164.190 ms
437.099 ms 233.360 ms 284.279 ms 183.919 ms
473.945 ms 291.199 ms 347.916 ms 168.300 ms
453.974 ms 231.350 ms 367.517 ms 158.717 ms
377.221 ms 226.697 ms 297.255 ms 164.196 ms
396.812 ms 262.638 ms 300.073 ms 161.325 ms
436.822 ms 227.489 ms 292.553 ms 179.194 ms
405.929 ms 233.461 ms 267.355 ms 162.847 ms

so about 400 versus about 220, and about 290 versus about 165 or so.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Tom Lane
I wrote:
 Scott Marlowe scott.marl...@gmail.com writes:
 OK so there's a simple set of tree functions we use at work. They're
 quite fast in 8.4 and they've gotten about 40% slower in 9.2.

 Hmm, I don't like the trend here.  For the repeat-1000x query, I get
 these reported execution times:

 8.4   360 ms
 9.0   365 ms
 9.1   440 ms
 9.2   510 ms
 9.3   550 ms
 9.4   570 ms
 head  570 ms

I found part of the issue: you're doing a lot of UNIONs on varbit
columns, and every time we parse one of those, get_sort_group_operators
asks the typcache about hash operators for the type.  typcache finds
out that varbit has no default hash opclass ... but *it doesn't cache
negative answers*.  So that means a physical lookup in pg_opclass every
time :-(.  That is actually the only bufmgr traffic induced by this
test query, once the catalog caches are loaded.  Versions before 9.1
don't have that hit because they didn't consider hashing for UNIONs.

I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
in typcache.c, and found that that brought HEAD's runtime down to 460ms.
I don't think I'd want to commit this in its current form, but with
some additions to flush the cache after pg_opclass updates it would
be a credible improvement.

So that probably explains the jump from 9.0 to 9.1.  Don't know yet
about the other lossage.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Tom Lane
Scott Marlowe scott.marl...@gmail.com writes:
 On Tue, Nov 25, 2014 at 1:58 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I don't know why this regressed between those versions, but looking at the
 functions, there's some low-hanging fruit:
 
 1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe
 duplicates are expected here, although I'm not 100% sure).
 
 2. tree_ancestor_keys() calculates tree_level($1) every time it recurses.
 Would be cheaper to calculate once, and pass it as argument.

 Wow that made a huge difference. About a 50% increase across the
 board. Sadly, 9.2 is still way behind 8.4 (see Tom's email)

Switching from UNION to UNION ALL would dodge the varbit hash-opclass
caching issue, I think.  But there's still something else going on.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Tom Lane
I wrote:
 Hmm, I don't like the trend here.  For the repeat-1000x query, I get
 these reported execution times:

 8.4  360 ms
 9.0  365 ms
 9.1  440 ms
 9.2  510 ms
 9.3  550 ms
 9.4  570 ms
 head 570 ms

 I made a quick-hack patch to suppress redundant GetDefaultOpclass calls
 in typcache.c, and found that that brought HEAD's runtime down to 460ms.

I found some additional low-hanging fruit by comparing gprof call counts
in 8.4 and HEAD:

* OverrideSearchPathMatchesCurrent(), which is not there at all in 8.4
or 9.2, accounts for a depressingly large amount of palloc/pfree traffic.
The implementation was quick-n-dirty to begin with, but workloads
like this one call it often enough to make it a pain point.

* plpgsql's setup_param_list() contributes another large fraction of
added palloc/pfree traffic; this is evidently caused by the temporary
bitmapset needed for its bms_first_member() loop, which was not there
in 8.4 but is there in 9.2.

I've been able to bring HEAD's runtime down to about 415 ms with the
collection of more-or-less quick hacks attached.  None of them are
ready to commit but I thought I'd post them for the record.

After review of all this, I think the aspect of your example that is
causing performance issues is that there are a lot of non-inline-able
SQL-language function calls.  That's not a case that we've put much
thought into lately.  I doubt we are going to get all the way back to
where 8.4 was in the short term, because I can see that there is a
significant amount of new computation associated with collation
management during parsing (catcache lookups driven by get_typcollation,
assign_collations_walker, etc).  The long-term answer to that is to
improve the SQL-language function support so that it can cache the results
of parsing the function body; we have surely got enough plancache support
for that now, but no one's attempted to apply it in functions.c.

regards, tom lane

diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 8c6c7fc..bf80563 100644
*** a/src/backend/utils/cache/typcache.c
--- b/src/backend/utils/cache/typcache.c
*** static HTAB *TypeCacheHash = NULL;
*** 77,82 
--- 77,84 
  #define TCFLAGS_CHECKED_FIELD_PROPERTIES	0x0010
  #define TCFLAGS_HAVE_FIELD_EQUALITY			0x0020
  #define TCFLAGS_HAVE_FIELD_COMPARE			0x0040
+ #define TCFLAGS_CHECKED_BTREE_OPCLASS		0x0080
+ #define TCFLAGS_CHECKED_HASH_OPCLASS		0x0100
  
  /* Private information to support comparisons of enum values */
  typedef struct
*** lookup_type_cache(Oid type_id, int flags
*** 227,249 
  	{
  		Oid			opclass;
  
! 		opclass = GetDefaultOpClass(type_id, BTREE_AM_OID);
! 		if (OidIsValid(opclass))
  		{
! 			typentry-btree_opf = get_opclass_family(opclass);
! 			typentry-btree_opintype = get_opclass_input_type(opclass);
  		}
  		/* If no btree opclass, we force lookup of the hash opclass */
  		if (typentry-btree_opf == InvalidOid)
  		{
  			if (typentry-hash_opf == InvalidOid)
  			{
! opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
! if (OidIsValid(opclass))
  {
! 	typentry-hash_opf = get_opclass_family(opclass);
! 	typentry-hash_opintype = get_opclass_input_type(opclass);
  }
  			}
  		}
  		else
--- 229,259 
  	{
  		Oid			opclass;
  
! 		if (!(typentry-flags  TCFLAGS_CHECKED_BTREE_OPCLASS))
  		{
! 			opclass = GetDefaultOpClass(type_id, BTREE_AM_OID);
! 			if (OidIsValid(opclass))
! 			{
! typentry-btree_opf = get_opclass_family(opclass);
! typentry-btree_opintype = get_opclass_input_type(opclass);
! 			}
! 			typentry-flags |= TCFLAGS_CHECKED_BTREE_OPCLASS;
  		}
  		/* If no btree opclass, we force lookup of the hash opclass */
  		if (typentry-btree_opf == InvalidOid)
  		{
  			if (typentry-hash_opf == InvalidOid)
  			{
! if (!(typentry-flags  TCFLAGS_CHECKED_HASH_OPCLASS))
  {
! 	opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
! 	if (OidIsValid(opclass))
! 	{
! 		typentry-hash_opf = get_opclass_family(opclass);
! 		typentry-hash_opintype = get_opclass_input_type(opclass);
! 	}
  }
+ typentry-flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
  			}
  		}
  		else
*** lookup_type_cache(Oid type_id, int flags
*** 268,278 
  	{
  		Oid			opclass;
  
! 		opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
! 		if (OidIsValid(opclass))
  		{
! 			typentry-hash_opf = get_opclass_family(opclass);
! 			typentry-hash_opintype = get_opclass_input_type(opclass);
  		}
  	}
  
--- 278,292 
  	{
  		Oid			opclass;
  
! 		if (!(typentry-flags  TCFLAGS_CHECKED_HASH_OPCLASS))
  		{
! 			opclass = GetDefaultOpClass(type_id, HASH_AM_OID);
! 			if (OidIsValid(opclass))
! 			{
! typentry-hash_opf = get_opclass_family(opclass);
! typentry-hash_opintype = get_opclass_input_type(opclass);
! 			}
! 			typentry-flags |= TCFLAGS_CHECKED_HASH_OPCLASS;
  		}
  	}
  
diff --git a/src/backend/catalog/namespace.c