Re: [PERFORM] AutoVacuum_NapTime
George Sexton wrote: If I'm cold starting the system, would it vacuum all 330 databases and then wait 720 minutes and then do them all again, or would it distribute the databases more or less evenly over the time period? The latter. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
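For reference, the interval being discussed is a single configuration setting; a quick way to inspect it (the 720-minute value below is just the figure from the question, not a recommendation):

```sql
-- in postgresql.conf (value illustrative, taken from the question above):
--   autovacuum_naptime = 720min
-- the autovacuum launcher aims to visit each database once per naptime
-- interval, staggering worker starts across it rather than batching them
SHOW autovacuum_naptime;
```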
Re: [PERFORM] SSD + RAID
Greg Smith wrote: Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a cache: Ron is correct that the problem of mainstream SATA drives accepting the cache flush command but not actually doing anything with it is long gone at this point. If you have a regular SATA drive, it almost certainly supports proper cache flushing. And if your whole software/storage stack understands all that, you should not end up with corrupted data just because there's a volatile write cache in there. OK, but I have a few questions. Is a write to the drive and a cache flush command the same? Which file systems implement both? I thought a write to the drive was always assumed to flush it to the platters, assuming the drive's cache is set to write-through. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do + If your life is a hard drive, Christ can be your backup. +
Re: [PERFORM] SSD + RAID
Ron Mayer wrote: Bruce Momjian wrote: Agreed, though I thought the problem was that SSDs lie about their cache flush like SATA drives do, or is there something I am missing? There's exactly one case I can find[1] where this century's IDE drives lied more than any other drive with a cache: Under 120GB Maxtor drives from late 2003 to early 2004. And it's apparently been worked around for years. Those drives claimed to support the FLUSH_CACHE_EXT feature (IDE command 0xEA), but did not support sending the 48-bit commands needed to issue the cache flushing command. And for that case a workaround for Linux was quickly identified by checking for *both* the support for 48-bit commands and support for the flush cache extension[2]. Beyond those 2004 drive + 2003 kernel systems, I think most of the rest of such reports have been various misfeatures in some of Linux's filesystems (like ext3, which only wants to send drives cache-flushing commands when inodes change[3]) and Linux software RAID misfeatures... and ISTM those would affect SSDs the same way they'd affect SATA drives. I think the point is not that drives lie about their write-back and write-through behavior, but rather that many SATA/IDE drives default to write-back, not write-through, and many administrators and file systems are not aware of this behavior. -- Bruce Momjian br...@momjian.us http://momjian.us
[PERFORM] shared_buffers
I'm reading the docs for 8.4.2, section 18.4.1 Memory. I'm trying to figure out what reasonable values for my usage would be. The doc says: shared_buffers (integer) Sets the amount of memory the database server uses for shared memory buffers. While circular definitions are always right, I'm kind of looking for some information like: "PostgreSQL uses shared memory buffers for ..." Could someone please explain what the role of shared buffers is? George Sexton MH Software, Inc. - Home of Connect Daily Web Calendar http://www.mhsoftware.com/ Voice: 303 438 9585
Re: [PERFORM] shared_buffers
George Sexton geor...@mhsoftware.com wrote: Could someone please explain what the role of shared buffers is? This Wiki page might be useful to you: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server The short answer (from that page) is: The shared_buffers configuration parameter determines how much memory is dedicated to PostgreSQL use for caching data. -Kevin
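As a hedged illustration of the rule of thumb from that Wiki page (the 25%-of-RAM figure and the 2GB value below are assumptions for a dedicated 8GB server, not prescriptions):

```sql
-- in postgresql.conf (changing it requires a server restart):
--   shared_buffers = 2GB   -- ~25% of RAM on a dedicated box is a common starting point
SHOW shared_buffers;       -- confirm what the running server is actually using
```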
Re: [PERFORM] SSD + RAID
Bruce Momjian wrote: Greg Smith wrote: If you have a regular SATA drive, it almost certainly supports proper cache flushing. OK, but I have a few questions. Is a write to the drive and a cache flush command the same? I believe they're different as of ATAPI-6 from 2001. Which file systems implement both? Seems ZFS and recent ext4 have thought these interactions out thoroughly. Find a slow ext4 that people complain about, and that's the one doing it right :-). Ext3 has some particularly odd annoyances where it flushes and waits for certain writes (ones involving inode changes) but doesn't bother to flush others (just data changes). As far as I can tell, with ext3 you need userspace utilities to make sure flushes occur when you need them. At one point I was tempted to try to put such userspace hacks into postgres. I know less about other file systems. Apparently the NTFS guys are aware of such stuff - but I don't know what kind of fsync equivalent you'd need to make it happen. Also worth noting - Linux's software RAID stuff (MD and LVM) needs to handle this right as well - and last I checked (sometime last year) the default setups didn't. I thought a write to the drive was always assumed to flush it to the platters, assuming the drive's cache is set to write-through. Apparently somewhere around here: http://www.t10.org/t13/project/d1410r3a-ATA-ATAPI-6.pdf they were separated in the IDE world.
[PERFORM] plpgsql plan cache
Hi, I am trying to get a select query in my plpgsql function to use an index, allowing an index scan instead of a seq scan. When running the query at the sql prompt, it works fine, but apparently the index is not used for the same query in the plpgsql function. The problem is not the data types of the parameters to the function or the query; they are identical. When I tried using EXECUTE in the plpgsql function, the index is used. I thought the query planner must have made a bad decision when I created the function the first time. I therefore tried to drop the function, disconnect from the sql client, reconnect (to get a new session), and create the function again. The function still runs slow though. I cannot understand why the index is not being used in the plpgsql function. I even tried to make a test function containing nothing more than the single query. Still the index is not being used. When running the same query at the sql prompt, the index is in use though. Is there a way to somehow clear the entire query plan cache, or even better, the cached plan for a particular plpgsql function? I'm grateful for any ideas. Best regards, Joel Jacobson
Re: [PERFORM] plpgsql plan cache
I cannot understand why the index is not being used when in the plpgsql function? I even tried to make a test function containing nothing more than the single query. Still the index is not being used. When running the same query in the sql prompt, the index is in use though.

Please post the following:
- EXPLAIN ANALYZE your query directly in psql
- PREPARE testq AS your query
- EXPLAIN ANALYZE EXECUTE testq( your parameters )
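As a sketch, the three steps above might look like this against a hypothetical table t(id integer, v numeric) - the table, column, and constant are placeholders, not from the thread:

```sql
-- 1. the query as the planner sees it with a known constant
EXPLAIN ANALYZE SELECT sum(v) FROM t WHERE id = 42;

-- 2. the same query planned with a parameter, as plpgsql plans it
PREPARE testq(integer) AS SELECT sum(v) FROM t WHERE id = $1;

-- 3. comparing this plan against step 1 shows what the function is doing
EXPLAIN ANALYZE EXECUTE testq(42);
DEALLOCATE testq;
```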
Re: [PERFORM] plpgsql plan cache
db=# \d FlagValueAccountingTransactions
          Table "public.flagvalueaccountingtransactions"
       Column        |           Type           | Modifiers
---------------------+--------------------------+---------------------------------------------------------------------------
 flagvalueid         | integer                  | not null
 eventid             | integer                  | not null
 transactionid       | integer                  | not null
 recorddate          | timestamp with time zone | not null
 debitaccountnumber  | integer                  | not null
 creditaccountnumber | integer                  | not null
 debitaccountname    | character varying        | not null
 creditaccountname   | character varying        | not null
 amount              | numeric                  | not null
 currency            | character(3)             | not null
 seqid               | integer                  | not null default nextval('seqflagvalueaccountingtransactions'::regclass)
 undone              | smallint                 |
 undoneseqid         | integer                  |
Indexes:
    "flagvalueaccountingtransactions_pkey" PRIMARY KEY, btree (seqid)
    "index_flagvalueaccountingtransactions_eventid" btree (eventid)
    "index_flagvalueaccountingtransactions_flagvalueid" btree (flagvalueid)
    "index_flagvalueaccountingtransactions_recorddate" btree (recorddate)

db=# EXPLAIN ANALYZE SELECT SUM(Amount) FROM FlagValueAccountingTransactions WHERE FlagValueID = 182903 AND (RecordDate >= '2008-10-21' AND RecordDate < '2008-10-22') AND CreditAccountName = 'CLIENT_BALANCES' AND Currency = 'SEK';

QUERY PLAN
Aggregate  (cost=1291.74..1291.75 rows=1 width=7) (actual time=1.812..1.812 rows=1 loops=1)
  ->  Index Scan using index_flagvalueaccountingtransactions_recorddate on flagvalueaccountingtransactions  (cost=0.00..1291.68 rows=25 width=7) (actual time=1.055..1.807 rows=1 loops=1)
        Index Cond: ((recorddate >= '2008-10-21 00:00:00+02'::timestamp with time zone) AND (recorddate < '2008-10-22 00:00:00+02'::timestamp with time zone))
        Filter: ((flagvalueid = 182903) AND ((creditaccountname)::text = 'CLIENT_BALANCES'::text) AND (currency = 'SEK'::bpchar))
Total runtime: 1.847 ms
(5 rows)

db=# PREPARE myplan (integer,date,date,varchar,char(3)) AS SELECT SUM(Amount) FROM FlagValueAccountingTransactions WHERE FlagValueID = $1 AND RecordDate >= $2 AND RecordDate < $3 AND DebitAccountName = $4 AND Currency = $5;
PREPARE

db=# EXPLAIN ANALYZE EXECUTE myplan(182903,'2008-10-21','2008-10-22','CLIENT_BALANCES','SEK');

QUERY PLAN
Aggregate  (cost=3932.75..3932.76 rows=1 width=7) (actual time=175.792..175.792 rows=1 loops=1)
  ->  Bitmap Heap Scan on flagvalueaccountingtransactions  (cost=2283.91..3932.74 rows=1 width=7) (actual time=175.747..175.767 rows=4 loops=1)
        Recheck Cond: ((recorddate >= $2) AND (recorddate < $3) AND (flagvalueid = $1))
        Filter: (((debitaccountname)::text = ($4)::text) AND (currency = $5))
        ->  BitmapAnd  (cost=2283.91..2283.91 rows=582 width=0) (actual time=175.714..175.714 rows=0 loops=1)
              ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_recorddate  (cost=0.00..395.97 rows=21536 width=0) (actual time=1.158..1.158 rows=3432 loops=1)
                    Index Cond: ((recorddate >= $2) AND (recorddate < $3))
              ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_flagvalueid  (cost=0.00..1887.69 rows=116409 width=0) (actual time=174.132..174.132 rows=1338824 loops=1)
                    Index Cond: (flagvalueid = $1)
Total runtime: 175.879 ms
(10 rows)

Hm, it is strange the query planner is using two different strategies for the same query?

On Feb 22, 2010, at 8:42 PM, Pierre C wrote: Please post the following: - EXPLAIN ANALYZE your query directly in psql - PREPARE testq AS your query - EXPLAIN ANALYZE EXECUTE testq( your parameters )
Re: [PERFORM] plpgsql plan cache
The planner knows that that particular date range is quite selective, so it doesn't have to BitmapAnd two indexes together. The problem is that a prepared statement asks the db to plan the query without knowing anything about the parameters. I think functions behave in exactly the same way. It's kind of a pain, but you can do your query with dynamic sql like on here: http://www.postgresql.org/docs/8.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN

On Mon, Feb 22, 2010 at 2:58 PM, Joel Jacobson j...@gluefinance.com wrote: Hm, it is strange the query planner is using two different strategies for the same query?
Re: [PERFORM] plpgsql plan cache
Actually, the planner was smart in using a bitmap index scan in the prepared query. Suppose you later EXECUTE that canned plan with a date range which covers, say, half of the table: the index scan would be a pretty bad choice, since it would have to access half the rows in the table in index order, which is potentially random disk IO. The Bitmap Index Scan is slower in your high-selectivity case, but it can withstand much more abuse on the parameters. PG supports the quite clever syntax of EXECUTE 'blah' USING params; you don't even need to mess with quoting.
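A minimal sketch of that EXECUTE ... USING approach, against the table from this thread (the function name, signature, and column choice are made up for illustration, not from the original code):

```sql
CREATE OR REPLACE FUNCTION sum_amount(p_flagvalueid integer,
                                      p_from timestamptz,
                                      p_to timestamptz)
RETURNS numeric AS $$
DECLARE
    result numeric;
BEGIN
    -- dynamic SQL is re-planned on each call, so the planner sees the
    -- actual parameter values; USING avoids any manual quoting
    EXECUTE 'SELECT SUM(Amount) FROM FlagValueAccountingTransactions
             WHERE FlagValueID = $1 AND RecordDate >= $2 AND RecordDate < $3'
    INTO result
    USING p_flagvalueid, p_from, p_to;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```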
Re: [PERFORM] SSD + RAID
On 02/22/2010 08:04 PM, Greg Smith wrote: Arjen van der Meijden wrote: That's weird. Intel's SSDs didn't have a write cache afaik: I asked Intel about this, and it turns out that the DRAM on the Intel drive isn't used for user data because of the risk of data loss; instead it is used as memory by the Intel SATA/flash controller for deciding exactly where to write data (I'm assuming for the wear leveling/reliability algorithms). http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403p=10 Read further down: Despite the presence of the external DRAM, both the Intel controller and the JMicron rely on internal buffers to cache accesses to the SSD... Intel's controller has a 256KB SRAM on-die. That's the problematic part: the Intel controllers have a volatile 256KB write cache stored deep inside the SSD controller, and issuing a standard SATA write cache flush command doesn't seem to clear it. Makes the drives troublesome for database use. I had read the above when posted, and then looked up SRAM. SRAM seems to suggest it will hold the data even after power loss, but only for a period of time. As long as power can be restored within a few minutes, it seemed like this would be ok? I can understand that an SSD might do unexpected things when it loses power all of a sudden. It will probably try to group writes to fill a single block (those blocks vary in size but are normally way larger than those of a normal spinning disk - values like 256 or 512KB), and it might lose data while waiting for a full block to accumulate, or perhaps it just couldn't complete a full block write due to the power failure. Although that behavior isn't really what you want, it would be incorrect to blame write caching for the behavior if the device doesn't even have a write cache ;) If you write data and that write call returns before the data hits disk, it's a write cache, period. And if that write cache loses its contents when power is lost, it's a volatile write cache that can cause database corruption. The fact that the one on the Intel devices is very small, basically just dealing with the block chunking behavior you describe, doesn't change either of those facts. The SRAM seems to suggest that it does not necessarily lose its contents if power is lost - it just doesn't say how long you have to plug it back in. Isn't this similar to a battery-backed cache or capacitor-backed cache? I'd love to have a better guarantee - but is SRAM really such a bad model? Cheers, mark
Re: [PERFORM] SSD + RAID
Ron Mayer wrote: I know less about other file systems. Apparently the NTFS guys are aware of such stuff - but don't know what kinds of fsync equivalent you'd need to make it happen. It's actually pretty straightforward - better than ext3. Windows with NTFS has been perfectly aware of how to do write-through on drives that support it when you execute _commit for some time: http://msdn.microsoft.com/en-us/library/17618685(VS.80).aspx If you switch the postgresql.conf setting to fsync_writethrough on Windows, it will execute _commit where it would execute fsync on other platforms, and that pushes through the drive's caches as it should (unlike fsync in many cases). More about this at http://archives.postgresql.org/pgsql-hackers/2005-08/msg00227.php and http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm (which also covers OS X). -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
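For concreteness, the postgresql.conf setting Greg is referring to can be sketched like this (a config fragment; the value is the one from his description, appropriate on Windows per the links above):

```sql
-- in postgresql.conf on Windows:
--   wal_sync_method = fsync_writethrough
SHOW wal_sync_method;   -- verify which sync method the running server uses
```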
Re: [PERFORM] SSD + RAID
On Mon, Feb 22, 2010 at 6:39 PM, Greg Smith g...@2ndquadrant.com wrote: Mark Mielke wrote: I had read the above when posted, and then looked up SRAM. SRAM seems to suggest it will hold the data even after power loss, but only for a period of time. As long as power can be restored within a few minutes, it seemed like this would be ok? The normal type of RAM everyone uses is DRAM, which requires constant refresh cycles to keep it working and is pretty power hungry as a result. Power gone, data gone an instant later. Actually, oddly enough, per bit stored dram is much lower power usage than sram, because it only has something like 2 transistors per bit, while sram needs something like 4 or 5 (it's been a couple decades since I took the classes on each). Even with the constant refresh, dram has a lower power draw than sram.
Re: [PERFORM] SSD + RAID
On Mon, Feb 22, 2010 at 7:21 PM, Scott Marlowe scott.marl...@gmail.com wrote: Even with the constant refresh, dram has a lower power draw than sram. Note that's power draw per bit. dram is usually much more densely packed (it can be, with fewer transistors per cell), so the individual chips for each may have similar power draws, while the dram will be 10 times as densely packed as the sram.
[PERFORM] Advice requested on structuring aggregation queries
Hi folks, I have an application which collects performance stats at time intervals, to which I am retro-fitting a table partitioning scheme in order to improve scalability. The original data is keyed by a 3-ary tuple of strings; to keep the row size down, in the new data model I'm actually storing 32-bit ints in Postgres. The new schema for each table looks like this: (a integer, b integer, c integer, ts timestamp without timezone, value double precision) with two indexes: (a, b, ts) and (b, ts). If the table had a primary key, it would be (a, b, c, ts), but I don't need one, and I don't want to carry column c in the index as it isn't necessary ... the number of distinct values of c for any (a,b) pair is small (less than 10). Queries that read the data for graphing are of the form: ... where a=constant and b=constant and ts between x and y. One of the things I need to do periodically is roll up this data, e.g. calculate min / max / average values of a given datum over a period for use in graphing at larger time scales. The rollup tables use the same schema as the raw data ones. There are about 60 different values of b, and for each such value there is exactly one type of rollup. The old code is doing the rollups in Postgres with 60 bulk insert into ... select statements, hence the need for the second index. For rollups which are intrinsic SQL functions and only depend on the value column, this is straightforward, e.g. to take the averages: insert into rollup_table select a, 17, c, '2009-02-22 19:00', avg(value) from data_table where b=17 and ts between '2009-02-22 18:00' and '2009-02-22 19:00' group by a, c; So far so good. This gets slightly more tricky if the rollup you want to do isn't a simple SQL function. I have two such cases: (a) the most recent value and (b) the modal (most common) value. In both cases, I've been doing this using a dependent subquery in the select clause, e.g.
for the most recent value: insert into rollup_table select a, 23, c, '2009-02-23 00:00', (select y.value from (select z.value, z.ts from data_table z where z.a=x.a and z.b=x.b and z.c=x.c) y order by y.ts desc limit 1) from data_table x where b=23 group by a, c; Due to the second index, this actually performs quite well, but it of course depends on doing a small index scan on the same data_table for each row of the outer select. I have a few questions: *Q1.* Is this the right place for the subquery, or would it be better to move it to the where clause, e.g. insert into rollup_table select a, 23, c, '2009-02-23 00:00', value from data_table x where b=23 and ts=(select max(y.ts) from data_table y where y.a=x.a and y.b=23 and y.c=x.c) group by a, c; or does the answer depend on row stats? Is there a rule of thumb on whether to prefer limit 1 vs. max? There will be between 20 and 300 rows with different timestamps for each (a,b,c) tuple. For better scalability, I am partitioning these tables by time. I am not using PG's table inheritance and triggers to do the partitioning, but instead dynamically generating the SQL and table names in the application code (Java). In most cases, the rollups will still happen from a single source data_table and I plan to continue using the existing SQL, but I have a few cases where the source data_table rows may actually come from two adjacent tables. For intrinsic SQL rollup functions like avg / min / max this is pretty easy: insert into rollup_table select a, 17, c, '... date ...', max(value) from ( select from data_table_1 where b=17 and ... union all select from data_table_2 where b=17 and ... ) group by a, c; *Q2.* Is there any benefit to doing rollups (max() and group by) in the inner queries inside the UNION ALL, or is it a wash? For now I'm expecting each inner select to produce 20-200 rows per unique (a,b,c) combination, and to be unioning no more than 2 tables at a time.
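Since avg() doesn't compose directly across partitions, here is a hedged sketch of doing it with sums and counts over two adjacent partition tables (table and column names follow the examples in this message; the timestamp literal and b=17 filter are illustrative):

```sql
-- composing avg() across two partition tables via sum/count;
-- the per-branch pre-aggregation is also the "inner rollup" variant asked about in Q2
insert into rollup_table
select a, 17, c, '2009-02-23 00:00', sum(s) / sum(n)
from (
    select a, c, sum(value) as s, count(*) as n
    from data_table_1 where b = 17 group by a, c
    union all
    select a, c, sum(value) as s, count(*) as n
    from data_table_2 where b = 17 group by a, c
) x
group by a, c;
```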
(I'm aware that max() and min() are composable, but that doing this with avg() and getting correct results involves using sum's and count's in the subqueries and doing a division in the outer select; AFAICT that's orthogonal to the performance considerations of whether to do the inner rollups or not). *Q3.* (You saw it coming) What really has me wrapped around the axle is how to do the most recent and mode rollups efficiently in this scenario. In the examples in Q1 above, in both cases the SQL references the data_table twice, which is fine if it's a real table, but in the partitioned scenario the data has to be pulled together at run time by the UNION ALL subquery, and I'd prefer not to execute it multiple times - I'm not much of a SQL wordsmith, but I can't see a way to write this in plain SQL without repeating the subquery, e.g. insert into rollup_table select a, 23, c, '2009-02-23 00:00', (select y.value from (select z.value, z.ts from *(... union all ...)* z where z.a=x.a and z.b=23 and z.c=x.c) y order by