Re: [PERFORM] AutoVacuum_NapTime

2010-02-22 Thread Alvaro Herrera
George Sexton wrote:

 If I'm cold starting the system, would it vacuum all 330 databases and then
 wait 720 minutes and then do them all again, or would it distribute the
 databases more or less evenly over the time period?

the latter
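
For context, a sketch of the relevant postgresql.conf setting, assuming the
launcher spaces the databases evenly across the naptime as described:

# with autovacuum_naptime = 720min and 330 databases, a worker would be
# launched roughly every 720 * 60 / 330 ~ 131 seconds, so each database
# is visited about once per naptime rather than all at once on startup
autovacuum_naptime = 720min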

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Bruce Momjian
Greg Smith wrote:
 Ron Mayer wrote:
  Bruce Momjian wrote:

  Agreed, though I thought the problem was that SSDs lie about their
  cache flush like SATA drives do, or is there something I am missing?
  
 
  There's exactly one case I can find[1] where this century's IDE
  drives lied more than any other drive with a cache:
 
 Ron is correct that the problem of mainstream SATA drives accepting the 
 cache flush command but not actually doing anything with it is long gone 
 at this point.  If you have a regular SATA drive, it almost certainly 
 supports proper cache flushing.  And if your whole software/storage 
 stack understands all that, you should not end up with corrupted data 
 just because there's a volatile write cache in there.

OK, but I have a few questions.  Are a write to the drive and a cache
flush command the same thing?  Which file systems implement both?  I thought
a write to the drive was always assumed to flush it to the platters,
assuming the drive's cache is set to write-through.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Bruce Momjian
Ron Mayer wrote:
 Bruce Momjian wrote:
  Agreed, though I thought the problem was that SSDs lie about their
  cache flush like SATA drives do, or is there something I am missing?
 
 There's exactly one case I can find[1] where this century's IDE
 drives lied more than any other drive with a cache:
 
   Under 120GB Maxtor drives from late 2003 to early 2004.
 
 and it's apparently been worked around for years.
 
 Those drives claimed to support the FLUSH_CACHE_EXT feature (IDE
 command 0xEA), but did not support sending 48-bit commands which
 was needed to send the cache flushing command.
 
 And for that case a workaround for Linux was quickly identified by
 checking for *both* the support for 48-bit commands and support for the
 flush cache extension[2].
 
 
 Beyond those 2004 drive + 2003 kernel systems, I think most of the rest
 of such reports have been various misfeatures in some of Linux's
 filesystems (like EXT3, which only wants to send drives cache-flushing
 commands when inodes change[3]) and Linux software raid misfeatures
 
 ...and ISTM those would affect SSDs the same way they'd affect SATA drives.

I think the point is not that drives lie about their write-back and
write-through behavior, but rather that many SATA/IDE drives default to
write-back, and not write-through, and many administrators and file
systems are not aware of this behavior.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] shared_buffers

2010-02-22 Thread George Sexton
I'm reading the docs for 8.4.2, section 18.4.1 Memory. I'm trying to figure
out what reasonable values for my usage would be.

The doc says:

shared_buffers (integer)

Sets the amount of memory the database server uses for shared memory
buffers.

While circular definitions are always right, I'm kind of looking for some
information like:

PostgreSQL uses shared memory buffers for 

Could someone please explain what the role of shared buffers is?

George Sexton
MH Software, Inc. - Home of Connect Daily Web Calendar
http://www.mhsoftware.com/
Voice: 303 438 9585
  




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers

2010-02-22 Thread Kevin Grittner
George Sexton geor...@mhsoftware.com wrote:
 
 Could someone please explain what the role of shared buffers is?
 
This Wiki page might be useful to you:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
The short answer (from that page) is:
 
The shared_buffers configuration parameter determines how much
memory is dedicated to PostgreSQL use for caching data.
 
-Kevin
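
As a rough illustration of what the cited wiki page suggests (a sketch, not
a recommendation for any particular workload), a common starting point on a
dedicated server is about 25% of system RAM:

# hypothetical postgresql.conf line for a dedicated 8 GB server,
# following the wiki's usual 25%-of-RAM starting point
shared_buffers = 2GB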

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Ron Mayer
Bruce Momjian wrote:
 Greg Smith wrote:
  If you have a regular SATA drive, it almost certainly 
 supports proper cache flushing
 
 OK, but I have a few questions.  Is a write to the drive and a cache
 flush command the same?

I believe they're different as of ATAPI-6 from 2001.

 Which file systems implement both?

Seems ZFS and recent ext4 have thought these interactions out
thoroughly.   Find a slow ext4 that people complain about, and
that's the one doing it right :-).

Ext3 has some particularly odd annoyances where it flushes and waits
for certain writes (ones involving inode changes) but doesn't bother
to flush others (just data changes).   As far as I can tell, with
ext3 you need userspace utilities to make sure flushes occur when
you need them.At one point I was tempted to try to put such
userspace hacks into postgres.

I know less about other file systems.  Apparently the NTFS guys
are aware of such stuff - but I don't know what kind of fsync equivalent
you'd need to make it happen.

Also worth noting - Linux's software raid stuff (MD and LVM)
needs to handle this right as well - and last I checked (sometime
last year) the default setups didn't.

  I thought a
 write to the drive was always assumed to flush it to the platters,
 assuming the drive's cache is set to write-through.

Apparently somewhere around here:
http://www.t10.org/t13/project/d1410r3a-ATA-ATAPI-6.pdf
they were separated in the IDE world.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] plpgsql plan cache

2010-02-22 Thread Joel Jacobson

Hi,

I am trying to make a select query in my plpgsql function use an
index, allowing an index scan instead of a seq scan.

When running the query at the sql prompt it works fine, but
apparently the index is not used for the same query in the plpgsql
function.


The problem is not the data types of the parameters to the function or
the query; they are identical.


When I tried using EXECUTE in the plpgsql function, the index is being  
used.


I thought the query planner must have made a bad decision when I  
created the function the first time.


I therefore tried to drop the function, disconnect from the sql
client, reconnect (to get a new session), and create the function again.

The function still runs slowly, though.

I cannot understand why the index is not being used in the
plpgsql function.
I even tried to make a test function containing nothing more than the
single query. Still the index is not being used.
When running the same query at the sql prompt, the index is in use,
though.


Is there a way to somehow clear the entire query plan cache, or even
better, the cached plan for a particular plpgsql function?


I'm grateful for any ideas.

Best regards,

Joel Jacobson


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Pierre C


I cannot understand why the index is not being used in the plpgsql
function.
I even tried to make a test function containing nothing more than the
single query. Still the index is not being used.
When running the same query at the sql prompt, the index is in use,
though.


Please post the following :

- EXPLAIN ANALYZE your query directly in psql
- PREPARE testq AS your query
- EXPLAIN ANALYZE EXECUTE testq( your parameters )

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Joel Jacobson

db=# \d FlagValueAccountingTransactions
          Table "public.flagvalueaccountingtransactions"
        Column        |           Type           |        Modifiers
 ---------------------+--------------------------+------------------
  flagvalueid         | integer                  | not null
  eventid             | integer                  | not null
  transactionid       | integer                  | not null
  recorddate          | timestamp with time zone | not null
  debitaccountnumber  | integer                  | not null
  creditaccountnumber | integer                  | not null
  debitaccountname    | character varying        | not null
  creditaccountname   | character varying        | not null
  amount              | numeric                  | not null
  currency            | character(3)             | not null
  seqid               | integer                  | not null default nextval('seqflagvalueaccountingtransactions'::regclass)
  undone              | smallint                 |
  undoneseqid         | integer                  |
Indexes:
    "flagvalueaccountingtransactions_pkey" PRIMARY KEY, btree (seqid)
    "index_flagvalueaccountingtransactions_eventid" btree (eventid)
    "index_flagvalueaccountingtransactions_flagvalueid" btree (flagvalueid)
    "index_flagvalueaccountingtransactions_recorddate" btree (recorddate)

db=# EXPLAIN ANALYZE SELECT SUM(Amount) FROM
FlagValueAccountingTransactions WHERE FlagValueID = 182903 AND
(RecordDate >= '2008-10-21' AND RecordDate < '2008-10-22') AND
CreditAccountName = 'CLIENT_BALANCES' AND Currency = 'SEK';


                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1291.74..1291.75 rows=1 width=7) (actual time=1.812..1.812 rows=1 loops=1)
   ->  Index Scan using index_flagvalueaccountingtransactions_recorddate on flagvalueaccountingtransactions  (cost=0.00..1291.68 rows=25 width=7) (actual time=1.055..1.807 rows=1 loops=1)
         Index Cond: ((recorddate >= '2008-10-21 00:00:00+02'::timestamp with time zone) AND (recorddate < '2008-10-22 00:00:00+02'::timestamp with time zone))
         Filter: ((flagvalueid = 182903) AND ((creditaccountname)::text = 'CLIENT_BALANCES'::text) AND (currency = 'SEK'::bpchar))
 Total runtime: 1.847 ms
(5 rows)

db=# PREPARE myplan (integer,date,date,varchar,char(3)) AS SELECT
SUM(Amount) FROM FlagValueAccountingTransactions WHERE FlagValueID = $1
AND RecordDate >= $2 AND RecordDate < $3 AND DebitAccountName = $4
AND Currency = $5;
PREPARE

db=# EXPLAIN ANALYZE EXECUTE
myplan(182903,'2008-10-21','2008-10-22','CLIENT_BALANCES','SEK');

                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3932.75..3932.76 rows=1 width=7) (actual time=175.792..175.792 rows=1 loops=1)
   ->  Bitmap Heap Scan on flagvalueaccountingtransactions  (cost=2283.91..3932.74 rows=1 width=7) (actual time=175.747..175.767 rows=4 loops=1)
         Recheck Cond: ((recorddate >= $2) AND (recorddate < $3) AND (flagvalueid = $1))
         Filter: (((debitaccountname)::text = ($4)::text) AND (currency = $5))
         ->  BitmapAnd  (cost=2283.91..2283.91 rows=582 width=0) (actual time=175.714..175.714 rows=0 loops=1)
               ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_recorddate  (cost=0.00..395.97 rows=21536 width=0) (actual time=1.158..1.158 rows=3432 loops=1)
                     Index Cond: ((recorddate >= $2) AND (recorddate < $3))
               ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_flagvalueid  (cost=0.00..1887.69 rows=116409 width=0) (actual time=174.132..174.132 rows=1338824 loops=1)
                     Index Cond: (flagvalueid = $1)
 Total runtime: 175.879 ms
(10 rows)



Hm, it is strange that the query planner is using two different
strategies for the same query?




On Feb 22, 2010, at 8:42 PM, Pierre C wrote:



I cannot understand why the index is not being used when in the  
plpgsql function?
I even tried to make a test function containing nothing more than  
the single query. Still the index is not being used.
When running the same query in the sql prompt, the index is in use  
though.


Please post the following :

- EXPLAIN ANALYZE your query directly in psql
- PREPARE testq AS your query
- EXPLAIN ANALYZE EXECUTE testq( your parameters )




Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Nikolas Everett
The planner knows that that particular date range is quite selective, so it
doesn't have to BitmapAnd two indexes together.

The problem is that a prepared statement asks the db to plan the query
without knowing anything about the parameters.  I think functions behave in
exactly the same way.  It's kind of a pain, but you can do your query with
dynamic sql as described here:
http://www.postgresql.org/docs/8.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
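
A minimal sketch of that workaround, with a hypothetical function name and
parameter list (Joel's real function isn't shown in the thread); because the
query string is built at run time, it is planned with the actual values on
every call:

CREATE OR REPLACE FUNCTION sum_client_balances(p_flagvalueid integer,
                                               p_from date,
                                               p_to date)
RETURNS numeric AS $$
DECLARE
    result numeric;
BEGIN
    -- dynamic SQL is re-planned per execution, so a selective date
    -- range can still get the plain index scan seen in psql
    EXECUTE 'SELECT SUM(Amount) FROM FlagValueAccountingTransactions'
         || ' WHERE FlagValueID = ' || p_flagvalueid
         || ' AND RecordDate >= '   || quote_literal(p_from::text)
         || ' AND RecordDate < '    || quote_literal(p_to::text)
         || ' AND CreditAccountName = ''CLIENT_BALANCES'''
         || ' AND Currency = ''SEK'''
    INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;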

On Mon, Feb 22, 2010 at 2:58 PM, Joel Jacobson j...@gluefinance.com wrote:

 db=# \d FlagValueAccountingTransactions
           Table "public.flagvalueaccountingtransactions"
         Column        |           Type           |        Modifiers
  ---------------------+--------------------------+------------------
   flagvalueid         | integer                  | not null
   eventid             | integer                  | not null
   transactionid       | integer                  | not null
   recorddate          | timestamp with time zone | not null
   debitaccountnumber  | integer                  | not null
   creditaccountnumber | integer                  | not null
   debitaccountname    | character varying        | not null
   creditaccountname   | character varying        | not null
   amount              | numeric                  | not null
   currency            | character(3)             | not null
   seqid               | integer                  | not null default nextval('seqflagvalueaccountingtransactions'::regclass)
   undone              | smallint                 |
   undoneseqid         | integer                  |
  Indexes:
      "flagvalueaccountingtransactions_pkey" PRIMARY KEY, btree (seqid)
      "index_flagvalueaccountingtransactions_eventid" btree (eventid)
      "index_flagvalueaccountingtransactions_flagvalueid" btree (flagvalueid)
      "index_flagvalueaccountingtransactions_recorddate" btree (recorddate)

 db=# EXPLAIN ANALYZE SELECT SUM(Amount) FROM
 FlagValueAccountingTransactions WHERE FlagValueID = 182903 AND (RecordDate
 >= '2008-10-21' AND RecordDate < '2008-10-22') AND CreditAccountName =
 'CLIENT_BALANCES' AND Currency = 'SEK';

                                             QUERY PLAN
 ------------------------------------------------------------------------------------------------------
  Aggregate  (cost=1291.74..1291.75 rows=1 width=7) (actual time=1.812..1.812 rows=1 loops=1)
    ->  Index Scan using index_flagvalueaccountingtransactions_recorddate on flagvalueaccountingtransactions  (cost=0.00..1291.68 rows=25 width=7) (actual time=1.055..1.807 rows=1 loops=1)
          Index Cond: ((recorddate >= '2008-10-21 00:00:00+02'::timestamp with time zone) AND (recorddate < '2008-10-22 00:00:00+02'::timestamp with time zone))
          Filter: ((flagvalueid = 182903) AND ((creditaccountname)::text = 'CLIENT_BALANCES'::text) AND (currency = 'SEK'::bpchar))
  Total runtime: 1.847 ms
 (5 rows)

 db=# PREPARE myplan (integer,date,date,varchar,char(3)) AS SELECT
 SUM(Amount) FROM FlagValueAccountingTransactions WHERE FlagValueID = $1 AND
 RecordDate >= $2 AND RecordDate < $3 AND DebitAccountName = $4 AND Currency
 = $5;
 PREPARE

 db=# EXPLAIN ANALYZE EXECUTE
 myplan(182903,'2008-10-21','2008-10-22','CLIENT_BALANCES','SEK');

                                             QUERY PLAN
 ------------------------------------------------------------------------------------------------------
  Aggregate  (cost=3932.75..3932.76 rows=1 width=7) (actual time=175.792..175.792 rows=1 loops=1)
    ->  Bitmap Heap Scan on flagvalueaccountingtransactions  (cost=2283.91..3932.74 rows=1 width=7) (actual time=175.747..175.767 rows=4 loops=1)
          Recheck Cond: ((recorddate >= $2) AND (recorddate < $3) AND (flagvalueid = $1))
          Filter: (((debitaccountname)::text = ($4)::text) AND (currency = $5))
          ->  BitmapAnd  (cost=2283.91..2283.91 rows=582 width=0) (actual time=175.714..175.714 rows=0 loops=1)
                ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_recorddate  (cost=0.00..395.97 rows=21536 width=0) (actual time=1.158..1.158 rows=3432 loops=1)
                      Index Cond: ((recorddate >= $2) AND (recorddate < $3))
                ->  Bitmap Index Scan on index_flagvalueaccountingtransactions_flagvalueid  (cost=0.00..1887.69 rows=116409 width=0) (actual time=174.132..174.132 rows=1338824 loops=1)
                      Index Cond: (flagvalueid = $1)
  Total runtime: 175.879 ms
 (10 rows)



 Hm, it is strange that the query planner is using two different strategies
 for the same query?



 On Feb 22, 2010, at 8:42 PM, Pierre C wrote:


 I cannot understand why the index is not being used when in the plpgsql
 function?

 I even tried to make a test function containing nothing more than the
 single query. Still the index is 

Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Pierre C


Actually, the planner was smart in using a bitmap index scan in the prepared
query. Suppose you later EXECUTE that canned plan with a date range which
covers, say, half of the table: the indexscan would be a pretty bad choice,
since it would have to access half the rows in the table in index order,
which is potentially random disk IO. Bitmap Index Scan is slower in your
high-selectivity case, but it can withstand much more abuse on the
parameters.


PG supports the quite clever syntax of EXECUTE 'blah' USING params; you
don't even need to mess with quoting.
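
A sketch of that form inside a plpgsql function (8.4+ syntax; the variable
names are invented for the example):

    -- parameters are passed out-of-line via USING, so no quoting is
    -- needed, and the statement is still planned with the real values
    EXECUTE 'SELECT SUM(Amount) FROM FlagValueAccountingTransactions
             WHERE FlagValueID = $1
               AND RecordDate >= $2 AND RecordDate < $3'
       INTO result
      USING p_flagvalueid, p_from, p_to;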


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Mark Mielke

On 02/22/2010 08:04 PM, Greg Smith wrote:

Arjen van der Meijden wrote:

That's weird. Intel's SSD's didn't have a write cache afaik:
I asked Intel about this and it turns out that the DRAM on the Intel 
drive isn't used for user data because of the risk of data loss, 
instead it is used as memory by the Intel SATA/flash controller for 
deciding exactly where to write data (I'm assuming for the wear 
leveling/reliability algorithms).

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=10


Read further down:

Despite the presence of the external DRAM, both the Intel controller 
and the JMicron rely on internal buffers to cache accesses to the 
SSD...Intel's controller has a 256KB SRAM on-die.


That's the problematic part:  the Intel controllers have a volatile 
256KB write cache stored deep inside the SSD controller, and issuing a 
standard SATA write cache flush command doesn't seem to clear it.  
Makes the drives troublesome for database use.


I had read the above when posted, and then looked up SRAM. What I read seems 
to suggest it will hold the data even after power loss, but only for a 
period of time. As long as power is restored within a few minutes, it 
seemed like this would be ok?


I can understand a SSD might do unexpected things when it loses power 
all of a sudden. It will probably try to group writes to fill a 
single block (and those blocks vary in size but are normally way 
larger than those of a normal spinning disk, they are values like 256 
or 512KB) and it might lose the data waiting until a full block can be 
written, or perhaps it just couldn't complete a full block write due 
to the power failure.
Although that behavior isn't really what you want, it would be 
incorrect to blame write caching for the behavior if the device 
doesn't even have a write cache ;)


If you write data and that write call returns before the data hits 
disk, it's a write cache, period.  And if that write cache loses its 
contents if power is lost, it's a volatile write cache that can cause 
database corruption.  The fact that the one on the Intel devices is 
very small, basically just dealing with the block chunking behavior 
you describe, doesn't change either of those facts.




The SRAM seems to suggest that it does not necessarily lose its contents 
if power is lost - it just doesn't say how long you have to plug it back 
in. Isn't this similar to a battery-backed cache or capacitor-backed cache?


I'd love to have a better guarantee - but is SRAM really such a bad model?

Cheers,
mark


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Greg Smith

Ron Mayer wrote:

I know less about other file systems.  Apparently the NTFS guys
are aware of such stuff - but don't know what kinds of fsync equivalent
you'd need to make it happen.
  


It's actually pretty straightforward--better than ext3.  Windows with 
NTFS has known for some time how to do write-through on drives that 
support it when you execute _commit: 
http://msdn.microsoft.com/en-us/library/17618685(VS.80).aspx


If you switch the postgresql.conf setting to fsync_writethrough on 
Windows, it will execute _commit where it would execute fsync on other 
platforms, and that pushes through the drive's caches as it should 
(unlike fsync in many cases).  More about this at 
http://archives.postgresql.org/pgsql-hackers/2005-08/msg00227.php and 
http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm (which 
also covers OS X).
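
For reference, a sketch of the corresponding postgresql.conf line (the
parameter that takes this value is wal_sync_method):

# on Windows this makes commits call _commit, and on OS X the
# F_FULLFSYNC fcntl, forcing writes through the drive's volatile cache
wal_sync_method = fsync_writethrough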


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 6:39 PM, Greg Smith g...@2ndquadrant.com wrote:
 Mark Mielke wrote:

 I had read the above when posted, and then looked up SRAM. SRAM seems to
 suggest it will hold the data even after power loss, but only for a period
 of time. As long as power can restore within a few minutes, it seemed like
 this would be ok?

 The normal type of RAM everyone uses is DRAM, which requires constant
 refresh cycles to keep it working and is pretty power hungry as a result.
  Power gone, data gone an instant later.

Actually, oddly enough, per bit stored, dram has much lower power usage
than sram, because it only has something like 2 transistors per bit,
while sram needs something like 4 or 5 (it's been a couple decades
since I took the classes on each).  Even with the constant refresh,
dram has a lower power draw than sram.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 7:21 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 6:39 PM, Greg Smith g...@2ndquadrant.com wrote:
 Mark Mielke wrote:

 I had read the above when posted, and then looked up SRAM. SRAM seems to
 suggest it will hold the data even after power loss, but only for a period
 of time. As long as power can restore within a few minutes, it seemed like
 this would be ok?

  The normal type of RAM everyone uses is DRAM, which requires constant
  refresh cycles to keep it working and is pretty power hungry as a result.
   Power gone, data gone an instant later.

  Actually, oddly enough, per bit stored, dram has much lower power usage
  than sram, because it only has something like 2 transistors per bit,
  while sram needs something like 4 or 5 (it's been a couple decades
  since I took the classes on each).  Even with the constant refresh,
  dram has a lower power draw than sram.

Note that's power draw per bit.  dram is usually much more densely
packed (it can be, since it uses fewer transistors per cell) so the
individual chips for each may have similar power draws while the dram will
be 10 times as densely packed as the sram.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Advice requested on structuring aggregation queries

2010-02-22 Thread Dave Crooke
Hi folks

I have an application which collects performance stats at time intervals, to
which I am retro-fitting a table partitioning scheme in order to improve
scalability.

The original data is keyed by a 3-ary tuple of strings; to keep the row
size down in the new data model, I'm actually storing 32-bit ints in
Postgres. The new schema for each table looks like this:

(a integer,
 b integer,
 c integer,
 ts timestamp without time zone,
 value double precision)

with two indexes: (a, b, ts) and (b, ts)

If the table had a primary key, it would be (a, b, c, ts), but I don't need
one, and I don't want to carry column c in the index as it isn't necessary
... the number of distinct values of c for any (a,b) pair is small (less
than 10). Queries that read the data for graphing are of the form:

... where a=constant and b=constant and ts between x and y

One of the things I need to do periodically is roll up this data, e.g.
calculate min / max / average values of a given datum over a period for use
in graphing at larger time scales. The rollup tables use the same schema as
the raw data ones.

There are about 60 different values of b, and for each such value there is
exactly one type of rollup. The old code is doing the rollups in Postgres
with 60 bulk insert into ... select statements, hence the need for the
second index.

For rollups which are intrinsic SQL functions, and only depend on the value
column, this is straightforward, e.g. to take the averages:

insert into rollup_table
select a, 17, c, '2009-02-22 19:00', avg(value)
from data_table
where b=17
and ts between '2009-02-22 18:00' and '2009-02-22 19:00'
group by a, c;

So far so good.

This gets slightly more tricky if the rollup you want to do isn't a simple
SQL function ... I have two such cases, (a) the most recent value and (b)
the modal (most common) value. In both cases, I've been doing this using a
dependent subquery in the select clause, e.g. for the most recent value:

insert into rollup_table
select a, 23, c, '2009-02-23 00:00',
   (select y.value from
      (select z.value, z.ts from data_table z where z.a=x.a and z.b=x.b and
z.c=x.c) y
    order by y.ts desc limit 1)
from data_table x
where b=23
group by a, c;

Due to the second index, this actually performs quite well, but it of course
depends on doing a small index scan on the same data_table for each row of
the outer select. I have a few questions:

*Q1.* Is this the right place for the subquery, or would it be better to
move it to the where clause, e.g.

insert into rollup_table
select a, 23, c, '2009-02-23 00:00', value
from data_table x
where b=23 and
   ts=(select max(y.ts) from data_table y where y.a=x.a and y.b=23 and
y.c=x.c)
group by a, c;

or does the answer depend on row stats? Is there a rule of thumb on whether
to prefer limit 1 vs. max? There will be between 20 and 300 rows with
different timestamps for each (a,b,c) tuple.
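
One alternative worth sketching here (not part of the original question, and
assuming the schema above is as described): PostgreSQL's DISTINCT ON can
pick the most recent row per group in a single scan, avoiding the dependent
subquery entirely:

insert into rollup_table
select distinct on (a, c)
       a, 23, c, '2009-02-23 00:00', value
from data_table
where b=23
order by a, c, ts desc;   -- the first row kept per (a, c) is the newest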


For better scalability, I am partitioning these tables by time ... I am not
using PG's table inheritance and triggers to do the partitioning, but
instead dynamically generating the SQL and table names in the application
code (Java). In most cases, the rollups will still happen from a single
source data_table and I plan to continue using the existing SQL, but I
have a few cases where the source data_table rows may actually come from
two adjacent tables.

For intrinsic SQL rollup functions like avg / min / max this is pretty easy:

insert into rollup_table
select a, 17, c, '... date ...', max(value) from (
   select ... from data_table_1 where b=17 and ...
union all
   select ... from data_table_2 where b=17 and ...
)
group by a, c;


*Q2.* Is there any benefit to doing rollups (max() and group by) in the
inner queries inside the UNION ALL, or is it a wash? For now I'm expecting
each inner select to produce 20-200 rows per unique (a,b,c) combination, and
to be unioning no more than 2 tables at a time. (I'm aware that max() and
min() are composable, but that doing this with avg() and getting correct
results involves using sums and counts in the subqueries and doing a
division in the outer select, but AFAICT that's orthogonal to the
performance considerations of whether to do the inner rollups or not).
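
For what it's worth, a sketch of that sum/count decomposition, reusing the
hypothetical table names from the example above:

insert into rollup_table
select a, 17, c, '... date ...', sum(s) / sum(n) from (
   select a, c, sum(value) as s, count(value) as n
   from data_table_1 where b=17 and ... group by a, c
 union all
   select a, c, sum(value) as s, count(value) as n
   from data_table_2 where b=17 and ... group by a, c
) u
group by a, c;   -- weighted average across both partitions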


*Q3.* (You saw it coming) What really has me wrapped around the axle is how
to do the most recent and mode rollups efficiently in this scenario ...
in the examples in Q1 above, in both cases the SQL references the
data_table twice, which is fine if it's a real table, but in the
partitioned scenario the data has to be pulled together at run time by the
UNION ALL subquery, and I'd prefer not to execute it multiple times - I'm
not much of a SQL wordsmith but I can't see a way to write this in plain
SQL without repeating the subquery, e.g.

insert into rollup_table
select a, 23, c, '2009-02-23 00:00',
   (select y.value from
  (select z.value, z.ts from *(... union all ...)* z where z.a=x.a and
z.b=23 and z.c=x.c) y
   order by