On Thu, Aug 16, 2012 at 5:20 PM, Josh Berkus <j...@agliodbs.com> wrote:
> I'm mainly looking for numbers for the 9.2 release. Like, "up to 14X
> speedup on data warehousing queries".
OK: I work in the multifamily industry and we have a large database which we are organizing into an on-demand OLAP-style data delivery project. The data is organized into a fact table like this:

(PropertyId, Floorplan, UnitType, TimeSeries, AggregationTypeCode, MetricCode, MetricValue)

where MetricValue contains a single numeric value for the calculated metric (say, average rent) and the other fields point at the identifying criteria for the metric: the property it pertains to, etc. TimeSeries represents a point in time: it's a string like 'Y2012M01', 'Y2011Q3', etc. The table is partitioned on a two-year basis; the 2010_2011 partition has 37 million records and is only expected to grow as we add new properties and metrics.

One of the important questions this table has to answer is to gather metric values like this:

SELECT PropertyId, AGG(MetricValue)
  FROM PropertyMetric
 WHERE Floorplan = w AND UnitType = x AND TimeSeries = y AND MetricCode = z;

That query might return 7k-15k records depending. It is extremely important for it to be fast, and cache warm-up is a huge issue for us, as we have various nightly processes that blow the cache out. The main interface to the function is a routine called GetChartData, which runs the query above N times looking back in time from a known point (typically 12 times) and returns arrays which get converted to JSON on the way out the door.

On the 9.2 database I organized one of the partitions (only) to utilize index-only scans and started capturing timings when I came in this morning with a completely cold cache; partition tables had been chilled as part of the nightly build. These are 100% real-world results. Output data is 3 parallel arrays of size 12.
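For anyone curious how the IOS-enabled partition was set up, here is a minimal sketch of the kind of covering index involved. The index name, the column order, and the avg() aggregate are my illustration, not the exact production definitions:

```sql
-- Hypothetical covering index: every column the query touches is in the
-- index, so 9.2 can answer it with an Index Only Scan and skip heap pages
-- that the visibility map marks all-visible.
CREATE INDEX property_metric_ios_idx
    ON PropertyMetric (Floorplan, UnitType, TimeSeries, MetricCode,
                       PropertyId, MetricValue);

-- The query can then be satisfied from index pages alone:
SELECT PropertyId, avg(MetricValue)
  FROM PropertyMetric
 WHERE Floorplan  = 'A1'
   AND UnitType   = '2BR'
   AND TimeSeries = 'Y2011Q3'
   AND MetricCode = 'AVG_RENT'
 GROUP BY PropertyId;
```

Note that 9.2 has no INCLUDE clause for non-key index columns (that arrived in version 11), so output-only columns simply trail the searched key columns.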
9.1 Run 1 (Cold), 12M partition:                  Time: 5147.000 ms
9.1 Run 2 (Warm), 12M partition:                  Time: 219.000 ms
9.2 Run 1 (Cold), 12M partition, index-only scan: Time: 257.000 ms
9.2 Run 2 (Warm), 12M partition, index-only scan: Time: 92.000 ms
9.1 Run 1 (Cold), 37M partition:                  Time: 22074.000 ms
9.1 Run 2 (Warm), 37M partition:                  Time: 435.000 ms
9.2 Run 1 (Cold), 37M partition (not IOS!):       Time: 7629.000 ms
9.2 Run 2 (Warm), 37M partition (not IOS!):       Time: 183.000 ms

The takeaway is that for this query I'm seeing end-to-end query execution improvement of 58% in the worst case (warm cache) and 20x or more in the best case when the cache is cold: that 22-second time is unpredictable and can run into the minutes as the index randomizes the I/O to the heap (the table is not clustered on this index, nor can it be). This best case is extremely important to us, as it is generally the first thing the user sees when firing up the application for the first time, and with 9.2 it's a night-and-day experience.

Here is the bottom line in 9.1 vs 9.2 IOS:

9.1:
ysanalysis=# explain (buffers, analyze) SELECT ...
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=219.979..219.979 rows=1 loops=1)
   Buffers: shared hit=176227
 Total runtime: 219.990 ms

9.2:
ysanalysis=# explain (buffers, analyze) SELECT ...
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=92.909..92.909 rows=1 loops=1)
   Buffers: shared hit=27198
 Total runtime: 92.921 ms

That's going to translate into less cache pressure and overall CPU efficiency. It's also a good technique for managing I/O patterns, especially for those of us who don't have the luxury of an all-SSD storage unit (my backend is a generally-good-but-you-never-know-performance enterprise SAN). Good knowledge of the database internals is helpful, and precise indexing is a must.
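One caveat for anyone trying to reproduce these numbers (a sketch with an illustrative partition name, not the exact commands I ran): an index-only scan only avoids heap fetches for pages the visibility map marks all-visible, so a freshly loaded partition needs a VACUUM before you see the full benefit, and EXPLAIN ANALYZE tells you how often the heap was still touched:

```sql
-- Set visibility-map bits so index-only scans can skip the heap
-- (table name here is hypothetical):
VACUUM (ANALYZE) property_metric_2010_2011;

-- Verify: the plan should show "Index Only Scan" with a low
-- "Heap Fetches" count, and EXPLAIN (BUFFERS) shows the reduced
-- shared-buffer traffic.
EXPLAIN (ANALYZE, BUFFERS)
SELECT PropertyId, MetricValue
  FROM property_metric_2010_2011
 WHERE Floorplan  = 'A1'
   AND UnitType   = '2BR'
   AND TimeSeries = 'Y2011Q3'
   AND MetricCode = 'AVG_RENT';
```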
merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers