The full set of conf changes in use during these tests is as follows:

default_statistics_target = 100         # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB              # pgtune wizard 2010-08-17
constraint_exclusion = on               # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9      # pgtune wizard 2010-08-17
effective_cache_size = 36GB             # sam
work_mem = 1500MB                       # pgtune wizard 2010-08-17
wal_buffers = 8MB                       # pgtune wizard 2010-08-17
#checkpoint_segments = 16               # pgtune wizard 2010-08-17
checkpoint_segments = 30                # sam
shared_buffers = 8GB                    # pgtune wizard 2010-08-17
max_connections = 80                    # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030                 # sam
cpu_index_tuple_cost = 0.0010           # sam
cpu_operator_cost = 0.0005              # sam
random_page_cost = 2.0                  # sam
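
For what it's worth, the planner-related settings in that list can also be
overridden per session for testing, without touching postgresql.conf (a
sketch; the values are just the ones above, and server-level settings like
shared_buffers, wal_buffers, and max_connections still require a restart):

    -- session-level overrides for experimentation
    SET work_mem = '1500MB';
    SET random_page_cost = 2.0;
    SET cpu_tuple_cost = 0.0030;
    SET cpu_index_tuple_cost = 0.0010;
    SET cpu_operator_cost = 0.0005;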


On Wed, Aug 18, 2010 at 11:50 PM, Samuel Gendler
<sgend...@ideasculptor.com> wrote:

> On Wed, Aug 18, 2010 at 11:14 PM, Samuel Gendler
> <sgend...@ideasculptor.com> wrote:
>
>> Please forgive the barrage of questions.  I'm just learning how to tune
>> things in postgres, and apparently I've still got a bit of a learning
>> curve to get over.  I have done a lot of reading, though, I swear.
>>
>> I've got two queries that are identical except for one condition, which
>> cuts the number of rows in half and also eliminates one partition from
>> consideration (the table is partitioned over time, and I cut the time
>> period in half).  The query plans are considerably different as a
>> result, and the net effect is that the fast query runs 5x faster than
>> the slow one.  I'm curious whether the alternate plan is inherently
>> faster, or whether this is just a case of the algorithm scaling worse
>> than linearly with the row count, which certainly wouldn't be
>> surprising.  The big win, for me, is that the fast plan's sort uses
>> vastly less memory.  The slow plan requires work_mem to be 1500MB just
>> to keep it at only 5x slower; with a more reasonable work_mem (400MB),
>> it degrades to something like 15x slower because it has to sort on disk.
>>
>> fast plan: http://explain.depesz.com/s/iZ
>> slow plan: http://explain.depesz.com/s/Dv2
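>>
>> To reproduce the on-disk sort behaviour, presumably it is enough to
>> lower work_mem for just the session before re-running the slow query (a
>> sketch; 400MB is the figure mentioned above):
>>
>>     -- per-session override; no server restart needed
>>     SET work_mem = '400MB';
>>     -- then re-run the EXPLAIN ANALYZE below and watch the Sort node
>>     -- spill to disk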
>>
>> query:
>>
>>
>> EXPLAIN ANALYZE SELECT
>>            t_lookup.display_name as group,
>>            to_char(t_fact.time, 'DD/MM HH24:MI') as category,
>>            substring(t_lookup.display_name from 1 for 20) as label,
>>            round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
>>        FROM
>>            portal.providers t_lookup,
>>            day_scale_radar_performance_fact t_fact
>>        WHERE
>>            t_fact.probe_type_num < 3
>>            and t_lookup.provider_id = t_fact.provider_id
>>            and t_lookup.provider_owner_customer_id =
>>                t_fact.provider_owner_customer_id
>>            and t_fact.provider_owner_customer_id = 0
>>            and t_fact.time between
>>                timezone('UTC', '2010-08-18 15:00:00') - interval '30 day'
>>                and timezone('UTC', '2010-08-18 15:00:00')
>>        GROUP BY
>>            t_fact.provider_owner_customer_id, t_fact.provider_id,
>>            t_lookup.display_name,
>>            t_fact.time
>>        ORDER BY
>>            t_fact.time
>>
>> table structure:
>>
>>        Table "perf_reporting.abstract_radar_performance_fact"
>>           Column           |            Type             | Modifiers
>> ----------------------------+-----------------------------+-----------
>> count                      | bigint                      | not null
>> total_ms                   | bigint                      | not null
>> time                       | timestamp without time zone | not null
>> market_num                 | integer                     | not null
>> country_num                | integer                     | not null
>> autosys_num                | integer                     | not null
>> provider_owner_zone_id     | integer                     | not null
>> provider_owner_customer_id | integer                     | not null
>> provider_id                | integer                     | not null
>> probe_type_num             | integer                     | not null
>>
>> with individual indexes on every column from time down to probe_type_num
>> on the child tables
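>>
>> For illustration, one of those per-column indexes presumably looks
>> something like this (the child-table and index names here are
>> hypothetical):
>>
>>     CREATE INDEX drpf_child_time_idx
>>         ON day_scale_radar_performance_fact_child (time);
>>     -- and likewise for market_num, country_num, autosys_num,
>>     -- provider_owner_zone_id, provider_owner_customer_id,
>>     -- provider_id, and probe_type_num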
>>
>> and
>>
>>
>>                             Table "portal.providers"
>>           Column           |            Type             |       Modifiers
>> ----------------------------+-----------------------------+------------------------
>> btime                      | timestamp without time zone | not null default now()
>> mtime                      | timestamp without time zone | not null default now()
>> version                    | integer                     | not null default 1
>> provider_id                | integer                     | not null
>> provider_owner_zone_id     | integer                     | not null
>> provider_owner_customer_id | integer                     | not null
>> provider_category_id       | integer                     | not null
>> name                       | character varying(255)      | not null
>> display_name               | character varying(255)      | not null
>>
>> with indexes on every column whose name ends in '_id'
>>
>>
> It gets more complicated:
>
> When I dropped to a query over 15 days instead of 30, I saw a huge
> improvement in performance (execution time of about 16 secs), the query
> plan for which is here:
>
> http://explain.depesz.com/s/iaf
>
> note: the query is identical to the one below, but with the interval
> changed to 15 days from 30 days, which also keeps the query within a single
> partition.  Note that the sort requires almost no memory and occurs after
> the aggregation.  I thought my problems were solved, since reducing the
> normal window over which queries are performed is something the app can
> tolerate.
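>
> For reference, the partition pruning at work here relies on
> constraint_exclusion = on plus a CHECK constraint on each child table.
> A sketch of one such partition, with a made-up name and made-up bounds:
>
>     CREATE TABLE day_scale_radar_performance_fact_2010_08 (
>         CHECK (time >= DATE '2010-08-01' AND time < DATE '2010-09-01')
>     ) INHERITS (day_scale_radar_performance_fact);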
>
> However, if I keep the same 15 day window (so row count is approximately
> the same), but change the time window start date by 2 days (still keeping
> the entire query within the same partition), I get a completely different
> query plan.  There is effectively no difference between the two queries
> other than the start date of the time window in the where clause, but one
> executes in twice the time (35 secs or thereabouts).
>
> http://explain.depesz.com/s/LA
>
> Just for completeness' sake, I changed the query so that it is still 15
> days but this time crosses a partition boundary.  The plan is very
> similar to the previous one and executes in about the same time (35 secs
> or so).
>
> http://explain.depesz.com/s/Aqw
>
> Statistics are up to date and were gathered with default_statistics_target
> = 100.
>
> Is there any way I can force the more efficient HashAggregate-then-sort
> plan instead of the sort-then-GroupAggregate plan?
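>
> I understand the planner's enable_* settings can be used per session to
> test whether that plan is even being considered; a diagnostic sketch,
> not a production fix:
>
>     SET enable_sort = off;   -- discourage sort-based plans this session
>     -- re-run the EXPLAIN ANALYZE, then restore the default:
>     RESET enable_sort;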
>
