Coming from the MSSQL world, I'm used to the first step in optimization being: choose your clustered index, and choose it well. I see that PG has a one-shot CLUSTER command but doesn't support continuously updated clustered indexes. From newsgroup browsing, I infer that such an index is impossible given MVCC record versioning (happy to learn I'm wrong). I'd be curious to know what other people who've crossed this same bridge from MSSQL, Oracle, or Sybase to PG have devised when faced with the same desired performance gain: retrieving blocks of rows that share a partial key.
Just to let people know, after trying various options, this looks the most promising:
- segment the original table into four tables (call them A,B,C,D)
- all insertions go into A
- long-term data lives in B
- primary keys of all rows requested for deletion from B go into D -- no actual deletions are done against B; deletions against A happen as normal
- all queries are made against a view: the union of A and B, minus rows whose keys appear in D (NOT EXISTS)
- daily, merge A and B (WHERE NOT EXISTS ... D) into C, run CLUSTER on C, then swap names on B and C, and truncate A and D
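In case it helps anyone trying the same thing, here is a rough sketch of what the scheme above might look like in SQL. All table, column, and index names here are illustrative (it assumes a single-column primary key `id`), and the CLUSTER syntax varies by PG version:

```sql
-- Hypothetical sketch of the A/B/C/D scheme; names are illustrative.

-- The view all queries go through: live rows are A plus B,
-- minus anything whose key has been logged for deletion in D.
CREATE VIEW live AS
    SELECT * FROM a
    UNION ALL
    SELECT * FROM b
     WHERE NOT EXISTS (SELECT 1 FROM d WHERE d.id = b.id);

-- Daily maintenance, run off-hours:

-- 1. Merge A and B, minus D, into C.
TRUNCATE c;
INSERT INTO c
    SELECT * FROM a
    UNION ALL
    SELECT * FROM b
     WHERE NOT EXISTS (SELECT 1 FROM d WHERE d.id = b.id);

-- 2. Physically order C by its index.
--    (Older releases spell this CLUSTER c_pkey ON c;)
CLUSTER c USING c_pkey;

-- 3. Swap names on B and C, then reset A and D.
BEGIN;
ALTER TABLE b RENAME TO old_b;
ALTER TABLE c RENAME TO b;
ALTER TABLE old_b RENAME TO c;
TRUNCATE a;
TRUNCATE d;
COMMIT;
```

One caveat worth checking: PG views bind to the underlying tables by OID, not by name, so after the rename swap the view may still point at the old B; recreating the view inside the swap transaction sidesteps that.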
Not rocket science, but it seems to give the payback of normal clustering without locking the table for long periods. It also saves on VACUUM FULL time.
At present, we're only at 1M rows in B on this. More when I know it.
Advance warning on any gotchas with this approach would be much appreciated. Making a complete copy of (B) is a bit of an ouch.