The full set of conf changes that were in use during these tests is as follows:
default_statistics_target = 100      # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB           # pgtune wizard 2010-08-17
constraint_exclusion = on            # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9   # pgtune wizard 2010-08-17
effective_cache_size = 36GB          # sam
work_mem = 1500MB                    # pgtune wizard 2010-08-17
wal_buffers = 8MB                    # pgtune wizard 2010-08-17
#checkpoint_segments = 16            # pgtune wizard 2010-08-17
checkpoint_segments = 30             # sam
shared_buffers = 8GB                 # pgtune wizard 2010-08-17
max_connections = 80                 # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030              # sam
cpu_index_tuple_cost = 0.0010        # sam
cpu_operator_cost = 0.0005           # sam
random_page_cost = 2.0               # sam

On Wed, Aug 18, 2010 at 11:50 PM, Samuel Gendler <sgend...@ideasculptor.com> wrote:
> On Wed, Aug 18, 2010 at 11:14 PM, Samuel Gendler <sgend...@ideasculptor.com> wrote:
>
>> Please forgive the barrage of questions. I'm just learning how to tune
>> things in postgres and I've still got a bit of a learning curve to get
>> over, apparently. I have done a lot of reading, though, I swear.
>>
>> I've got two identical queries except for a change of one condition,
>> which cuts the number of rows in half - and which also has the effect of
>> eliminating one partition from consideration (the table is partitioned
>> over time, and I cut the time period in half). The query plans are
>> considerably different as a result. The net result is that the fast query
>> is 5x faster than the slow query. I'm curious whether the alternate query
>> plan is inherently faster, or whether it is just a case of the algorithm
>> scaling worse than linearly with the row count, which certainly wouldn't
>> be surprising. The big win, for me, is that the fast plan's sort uses
>> vastly less memory. The slow plan requires work_mem to be 1500MB just to
>> keep the slowdown at 5x. With a more reasonable work_mem (400MB), it
>> drops to something like 15x worse because it has to sort on disk.
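[Editor's note: the disk-sort claim above can be verified directly from EXPLAIN ANALYZE output, which reports each Sort node's method and memory use. A minimal sketch follows; the 400MB value is the one quoted in the thread, and the query placeholder stands in for the full query shown further down.]

```sql
-- Set work_mem for this session only; the server-wide setting is untouched.
SET work_mem = '400MB';

EXPLAIN ANALYZE
SELECT ... ;   -- the query from this thread

-- In the plan output, inspect the Sort node:
--   Sort Method:  quicksort  Memory: ...kB      <- the sort fit in work_mem
--   Sort Method:  external merge  Disk: ...kB   <- the sort spilled to disk
```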
>>
>> fast plan: http://explain.depesz.com/s/iZ
>> slow plan: http://explain.depesz.com/s/Dv2
>>
>> query:
>>
>> EXPLAIN ANALYZE SELECT
>>     t_lookup.display_name as group,
>>     to_char(t_fact.time, 'DD/MM HH24:MI') as category,
>>     substring(t_lookup.display_name from 1 for 20) as label,
>>     round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
>> FROM
>>     portal.providers t_lookup,
>>     day_scale_radar_performance_fact t_fact
>> WHERE
>>     t_fact.probe_type_num < 3
>>     and t_lookup.provider_id = t_fact.provider_id
>>     and t_lookup.provider_owner_customer_id = t_fact.provider_owner_customer_id
>>     and t_fact.provider_owner_customer_id = 0
>>     and t_fact.time between timezone('UTC', '2010-08-18 15:00:00') - interval '30 day'
>>                         and timezone('UTC', '2010-08-18 15:00:00')
>> GROUP BY
>>     t_fact.provider_owner_customer_id, t_fact.provider_id,
>>     t_lookup.display_name, t_fact.time
>> ORDER BY
>>     t_fact.time
>>
>> table structure:
>>
>>        Table "perf_reporting.abstract_radar_performance_fact"
>>            Column           |            Type             | Modifiers
>> ----------------------------+-----------------------------+-----------
>>  count                      | bigint                      | not null
>>  total_ms                   | bigint                      | not null
>>  time                       | timestamp without time zone | not null
>>  market_num                 | integer                     | not null
>>  country_num                | integer                     | not null
>>  autosys_num                | integer                     | not null
>>  provider_owner_zone_id     | integer                     | not null
>>  provider_owner_customer_id | integer                     | not null
>>  provider_id                | integer                     | not null
>>  probe_type_num             | integer                     | not null
>>
>> with individual indexes on everything from time to the bottom, on the
>> child tables
>>
>> and
>>
>>                      Table "portal.providers"
>>            Column           |            Type             |      Modifiers
>> ----------------------------+-----------------------------+------------------------
>>  btime                      | timestamp without time zone | not null default now()
>>  mtime                      | timestamp without time zone | not null default now()
>>  version                    | integer                     | not null default 1
>>  provider_id                | integer                     | not null
>>  provider_owner_zone_id     | integer                     | not null
>>  provider_owner_customer_id | integer                     | not null
>>  provider_category_id       | integer                     | not null
>>  name                       | character varying(255)      | not null
>>  display_name               | character varying(255)      | not null
>>
>> with indexes on every column whose name ends in '_id'
>>
>
> It gets more complicated:
>
> When I dropped to a query over 15 days instead of 30 days, I saw a huge
> bump in performance (about 16 secs), the query plan for which is here:
>
> http://explain.depesz.com/s/iaf
>
> Note: the query is identical to the one below, but with the interval
> changed to 15 days from 30 days, which also keeps the query within a
> single partition. Note that the sort requires almost no memory and occurs
> after the aggregation. I thought my problems were solved, since reducing
> the normal window over which queries are performed is something the app
> can tolerate.
>
> However, if I keep the same 15 day window (so the row count is
> approximately the same) but move the start date of the time window by 2
> days (still keeping the entire query within the same partition), I get a
> completely different query plan. There is effectively no difference
> between the two queries other than the start date of the time window in
> the where clause, but one executes in twice the time (35 secs or
> thereabouts).
>
> http://explain.depesz.com/s/LA
>
> Just for completeness' sake, I changed the query so that it is still 15
> days but this time crosses a partition boundary. The plan is very similar
> to the previous one and executes in about the same time (35 secs or so).
>
> http://explain.depesz.com/s/Aqw
>
> Statistics are up to date and were gathered with
> default_statistics_target = 100.
>
> Is there any way I can force the more efficient HashAggregate-then-sort
> plan instead of the sort-then-GroupAggregate plan?
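[Editor's note: for the closing question, one common diagnostic technique is to steer the planner per session with its `enable_*` settings. Setting `enable_sort = off` does not forbid sorts outright; it assigns them a very high cost, which pushes the planner toward HashAggregate where one is possible. This is a sketch for confirming whether the alternative plan really is cheaper, not a recommended production setting; the durable fix is usually adjusting work_mem, the cpu_*_cost parameters, or statistics so the planner chooses it on its own.]

```sql
BEGIN;
-- SET LOCAL applies only inside this transaction, so nothing leaks
-- into other sessions or later work on this connection.
SET LOCAL enable_sort = off;   -- penalize sort-then-GroupAggregate plans

EXPLAIN ANALYZE
SELECT ... ;                   -- the 15-day query from this thread

ROLLBACK;                      -- SET LOCAL reverts at transaction end anyway
```

If the HashAggregate plan wins under this setting, comparing its estimated cost against the default plan's cost in the two EXPLAIN outputs shows how far off the planner's cost model is for this workload.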