On 14.04.2011 17:43, Tom Lane wrote:
> Greg Smith <g...@2ndquadrant.com> writes:
>> samples  %        image name               symbol name
>> 53548     6.7609  postgres                 AllocSetAlloc
>> 32787     4.1396  postgres                 MemoryContextAllocZeroAligned
>> 26330     3.3244  postgres                 base_yyparse
>> 21723     2.7427  postgres                 hash_search_with_hash_value
>> 20831     2.6301  postgres                 SearchCatCache
>> 19094     2.4108  postgres                 hash_seq_search
>> 18402     2.3234  postgres                 hash_any
>> 15975     2.0170  postgres                 AllocSetFreeIndex
>> 14205     1.7935  postgres                 _bt_compare
>> 13370     1.6881  postgres                 core_yylex
>> 10455     1.3200  postgres                 MemoryContextAlloc
>> 10330     1.3042  postgres                 LockAcquireExtended
>> 10197     1.2875  postgres                 ScanKeywordLookup
>> 9312      1.1757  postgres                 MemoryContextAllocZero

> Yeah, this is pretty typical ...

In this case you could just use prepared statements and get rid of all the parser-related overhead, which includes many of those allocations.
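For the record, a minimal libpq sketch of what I mean; the connection string, query, and loop count are just placeholders, and error handling is abbreviated:

    /* Parse and plan once with PQprepare, then reuse the statement
     * with PQexecPrepared: no parsing on subsequent executions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=pgbench");
        PGresult   *res;
        const char *params[1];
        int         i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            exit(1);
        }

        /* Parsed and planned only once, server-side */
        res = PQprepare(conn, "sel",
                        "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                        1, NULL);
        PQclear(res);

        for (i = 0; i < 10000; i++)
        {
            char aid[16];

            snprintf(aid, sizeof(aid), "%d", i + 1);
            params[0] = aid;
            /* No parsing here: just bind and execute */
            res = PQexecPrepared(conn, "sel", 1, params, NULL, NULL, 0);
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }

This is essentially what pgbench does with -M prepared.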

>> I don't know nearly enough about the memory allocator to comment on
>> whether it's possible to optimize it better for this case to relieve any
>> bottleneck.

> I doubt that it's possible to make AllocSetAlloc radically cheaper.
> I think the more likely route to improvement there is going to be to
> find a way to do fewer pallocs.  For instance, if we had more rigorous
> rules about which data structures are read-only to which code, we could
> probably get rid of a lot of just-in-case tree copying that happens in
> the parser and planner.
>
> But at the same time, even if we could drive all palloc costs to zero,
> it would only make a 10% difference in this example.  And this sort of
> fairly flat profile is what I see in most cases these days --- we've
> been playing performance whack-a-mole for long enough now that there
> isn't much low-hanging fruit left.  For better or worse, the system
> design we've chosen just isn't amenable to minimal overhead for simple
> queries.  I think a lot of this ultimately traces to the extensible,
> data-type-agnostic design philosophy.  The fact that we don't know what
> an integer is until we look in pg_type, and don't know what an "="
> operator does until we look up its properties, is great from a flexibility
> point of view; but this sort of query is where the costs become obvious.
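To illustrate the kind of lookup that means, here's a rough sketch from memory, not verbatim backend code; lookup_operator_proc is a made-up name, but this is more or less what get_opcode() in lsyscache.c does:

    /* Consult pg_operator (via the syscache) to learn what an
     * operator like "=" actually does.  Backend-internal code. */
    #include "postgres.h"
    #include "catalog/pg_operator.h"
    #include "utils/syscache.h"

    static Oid
    lookup_operator_proc(Oid opno)
    {
        HeapTuple   tup;
        Oid         result;

        tup = SearchSysCache1(OPEROID, ObjectIdGetDatum(opno));
        if (!HeapTupleIsValid(tup))
            elog(ERROR, "cache lookup failed for operator %u", opno);

        /* The implementing function is just another catalog attribute */
        result = ((Form_pg_operator) GETSTRUCT(tup))->oprcode;
        ReleaseSysCache(tup);
        return result;
    }

The syscache makes each individual lookup cheap, but as the SearchCatCache and hash_search_with_hash_value entries in the profile show, the lookups add up.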

I think the general strategy for making this kind of query faster will be to add various fastpaths that cache and skip even more work.

There's one very low-hanging fruit here, though. I profiled the pgbench case with -M prepared and found that, as in Greg Smith's profile, hash_seq_search pops up quite high on the list. Those calls come from LockReleaseAll(), where we scan the local lock hash to find all locks held. We specify the initial size of the local lock hash table as 128, which is unnecessarily large for small queries like this. Reducing it to 8 slashed the time spent in hash_seq_search().
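The scan in question looks roughly like this, paraphrasing LockReleaseAll() in src/backend/storage/lmgr/lock.c from memory:

    /* hash_seq_search() visits every bucket, occupied or not,
     * on every transaction end. */
    HASH_SEQ_STATUS status;
    LOCALLOCK  *locallock;

    hash_seq_init(&status, LockMethodLocalHash);
    while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
    {
        /* ... release the lock ... */
    }

So a simple query that only takes a lock or two still pays for walking all 128 buckets.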

I think we should make that hash table smaller. It won't buy much, somewhere between 1% and 5% in this test case, but it's very easy to do and I don't see much downside: it's a backend-local hash table, so it will still grow as needed.
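Concretely, the change would just be the initial-size argument where InitLocks() creates the table; a sketch from memory, with the surrounding code abbreviated:

    /* The nelem argument to hash_create() is only the initial size;
     * dynahash expands the table automatically once it fills up. */
    info.keysize = sizeof(LOCALLOCKTAG);
    info.entrysize = sizeof(LOCALLOCK);
    info.hash = tag_hash;

    LockMethodLocalHash = hash_create("LOCALLOCK hash",
                                      8,    /* was 128 */
                                      &info,
                                      HASH_ELEM | HASH_FUNCTION);

A session that takes more than 8 locks pays one or two cheap table expansions; every session stops paying for 120 empty buckets at each transaction end.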

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
