Re: [HACKERS] plan time of MASSIVE partitioning ...

Robert Haas Fri, 29 Oct 2010 10:17:11 -0700

On Fri, Oct 29, 2010 at 12:53 PM, Tom Lane <[email protected]> wrote:
> Boszormenyi Zoltan <[email protected]> writes:
>> On the other hand, if I use a similar test case to my original one
>> (i.e. the tables are much wider) then the query planning takes
>> 1.42 seconds in 9.1 with this patch instead of about 4.7 seconds
>> as we observed it using PostgreSQL 9.0.0. The beginning of the gprof
>> output now looks like this:
>
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls   s/call   s/call  name
>>  21.13      0.30     0.30   235091     0.00     0.00  SearchCatCache
>>   7.04      0.40     0.10  1507206     0.00     0.00  
>> hash_search_with_hash_value
>>   3.52      0.45     0.05  2308219     0.00     0.00  AllocSetAlloc
>
> Yeah, for me it looks even worse: oprofile shows about 77% of time in
> SearchCatCache.  I poked around a little and it seems that probably most
> of the time is going into searches of the STATRELATTINH syscache, which
> looks like this:
>
> $13 = {id = 41, cc_next = 0x2b43a60,
>  cc_relname = 0x7f6bc6ed2218 "pg_statistic", cc_reloid = 2619,
>  cc_indexoid = 2696, cc_relisshared = 0 '\000', cc_tupdesc = 0x7f6bc6ed11d8,
>  cc_ntup = 68922, cc_nbuckets = 1024, cc_nkeys = 3, cc_key = {1, 2, 3, 0},
>  ...
>
> Most of those entries are "negative" cache entries, since we don't have
> any actual stats in this toy example.
>
> I think that we probably should be very circumspect about believing that
> this example is still a good guide to what to optimize next; in
> particular, in a real-world example with real stats, I'm not sure that
> the hot spots will still be in the same places.  I'd advise loading up
> some real data and doing more profiling.
>
> However, if the hot spot does stay in SearchCatCache, I can't help
> noticing that those bucket chains are looking a bit overloaded ---
> sixty-plus entries per bucket ain't good.  Maybe it's time to teach
> catcache.c how to reorganize its hashtables once the load factor
> exceeds a certain level.  Or more drastically, maybe it should lose
> its private hashtable logic and use dynahash.c; I'm not sure at the
> moment if the private implementation has any important characteristics
> dynahash hasn't got.


I'm not sure what's happening in this particular case, but I seem to
remember poking at a case a while back where we were doing a lot of
repeated statistics lookups for the same columns.  If that's also the
the case here and if there is some way to avoid it (hang a pointer to
the stats off the node tree somewhere?) we might be able to cut down
on the number of hash probes, as an alternative to or in addition to
making them faster.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] plan time of MASSIVE partitioning ...

Reply via email to