On 14.04.2011 17:43, Tom Lane wrote:
> Greg Smith <g...@2ndquadrant.com> writes:
>> samples  %        image name  symbol name
>> 53548    6.7609   postgres    AllocSetAlloc
>> 32787    4.1396   postgres    MemoryContextAllocZeroAligned
>> 26330    3.3244   postgres    base_yyparse
>> 21723    2.7427   postgres    hash_search_with_hash_value
>> 20831    2.6301   postgres    SearchCatCache
>> 19094    2.4108   postgres    hash_seq_search
>> 18402    2.3234   postgres    hash_any
>> 15975    2.0170   postgres    AllocSetFreeIndex
>> 14205    1.7935   postgres    _bt_compare
>> 13370    1.6881   postgres    core_yylex
>> 10455    1.3200   postgres    MemoryContextAlloc
>> 10330    1.3042   postgres    LockAcquireExtended
>> 10197    1.2875   postgres    ScanKeywordLookup
>> 9312     1.1757   postgres    MemoryContextAllocZero
> Yeah, this is pretty typical ...
>> In this case you could just use prepared statements and get rid of all
>> the parser related overhead, which includes much of the allocations.
>> I don't know nearly enough about the memory allocator to comment on
>> whether it's possible to optimize it better for this case to relieve any
>> bottleneck.
> I doubt that it's possible to make AllocSetAlloc radically cheaper.
> I think the more likely route to improvement there is going to be to
> find a way to do fewer pallocs. For instance, if we had more rigorous
> rules about which data structures are read-only to which code, we could
> probably get rid of a lot of just-in-case tree copying that happens in
> the parser and planner.
>
> But at the same time, even if we could drive all palloc costs to zero,
> it would only make a 10% difference in this example. And this sort of
> fairly flat profile is what I see in most cases these days --- we've
> been playing performance whack-a-mole for long enough now that there
> isn't much low-hanging fruit left. For better or worse, the system
> design we've chosen just isn't amenable to minimal overhead for simple
> queries. I think a lot of this ultimately traces to the extensible,
> data-type-agnostic design philosophy. The fact that we don't know what
> an integer is until we look in pg_type, and don't know what an "="
> operator does until we look up its properties, is great from a flexibility
> point of view; but this sort of query is where the costs become obvious.
I think the general strategy for making this kind of query faster will be
to add various fastpaths that cache results and skip even more work.
There's one piece of very low-hanging fruit here, though. I profiled the
pgbench case with -M prepared, and found that, as in Greg Smith's profile,
hash_seq_search pops up quite high in the list. Those calls come from
LockReleaseAll(), where we scan the local lock hash to find all the locks
held. We specify the initial size of the local lock hash table as 128,
which is unnecessarily large for small queries like this. Reducing it to
8 slashed the time spent in hash_seq_search().
I think we should make that hash table smaller. It won't buy much,
somewhere between 1 and 5% in this test case, but it's very easy to do
and I don't see much downside: it's a local hash table, so it will grow
as needed.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers