On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov <[EMAIL PROTECTED]> wrote:
 I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real column to it, it will scroll
through the memory reasonably fast.

In Postgres, an update is effectively a delete plus an insert (that's how MVCC
works).  That means changing the data in one column rewrites ALL of the columns
for that row, and you end up with a table that's 50% dead space, which you then
have to vacuum.
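
For illustration, here's roughly what the single-table approach costs (table
and column names are made up, since your post doesn't give them):

    -- hypothetical names; adjust to your actual schema
    ALTER TABLE big_table ADD COLUMN score real;

    -- this rewrites every one of the 10^8 rows: each old row version
    -- becomes dead space until VACUUM reclaims it
    UPDATE big_table SET score = 0.0;

    VACUUM ANALYZE big_table;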

Sometimes if you have a "volatile" column that goes with several "static"
columns, you're far better off creating a second table for the volatile data,
duplicating the primary key in both tables.  In your case, it would mean the
difference between 10^8 inserts of (int, float), which are very fast, and what
you're doing now: 10^8 inserts and 10^8 deletes of
(int, int, smallint, date, float), followed by a big vacuum/analyze (also slow).
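
Something along these lines (again, names are invented for the example):

    -- the wide, static part, written once
    CREATE TABLE static_data (
        id  integer PRIMARY KEY,
        b   integer,
        c   smallint,
        d   date
    );

    -- the volatile column lives in its own narrow table
    CREATE TABLE volatile_data (
        id     integer PRIMARY KEY REFERENCES static_data(id),
        score  real
    );

    -- filling in the new value is now 10^8 narrow inserts, with no dead
    -- tuples left behind in the wide table
    INSERT INTO volatile_data (id, score)
        SELECT id, 0.0 FROM static_data;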

The downside of this design is that later on, it requires a join to fetch all
the data for each key.
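
Using the hypothetical tables above, getting a full row back would look like:

    SELECT s.id, s.b, s.c, s.d, v.score
      FROM static_data s
      JOIN volatile_data v USING (id);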

You do have a primary key on your data, right?  Or some sort of index?
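
If not, something along these lines would help both the lookups and the join
(names are placeholders again):

    -- either a primary key ...
    ALTER TABLE big_table ADD PRIMARY KEY (id);
    -- ... or, if the key column isn't unique, a plain index
    CREATE INDEX big_table_id_idx ON big_table (id);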

Craig

