The table is paprospect2, as below:
The columns pfeature_id..confidence and run_id_new (in red) are from an inherited table. Although the inheritance itself is probably not relevant here (correction welcome), I suspect it may be relevant that all existing rows were written before the table definition included run_id_new. p2thread_i_trigger is defined fires on insert only (not update).\d paprospect2 Column | Type | Modifiers -------------+---------+------------------------------------------------------------------- pfeature_id | integer | not null default nextval('unison.pfeature_pfeature_id_seq'::text) pseq_id | integer | not null pftype_id | integer | not null start | integer | stop | integer | confidence | real | run_id | integer | not null [snip 13 integer and real columns] run_id_new | integer | Indexes: paprospect2_redundant_alignment unique btree (pseq_id, "start", stop, run_id, pmodel_id), p2thread_p2params_id btree (run_id), p2thread_pmodel_id btree (pmodel_id) Foreign Key constraints: pftype_id_exists FOREIGN KEY (pftype_id) REFERENCES pftype(pftype_id) ON UPDATE CASCADE ON DELETE CASCADE, p2thread_pmodel_id_exists FOREIGN KEY (pmodel_id) REFERENCES pmprospect2(pmodel_id) ON UPDATE CASCADE ON DELETE CASCADE, pseq_id_exists FOREIGN KEY (pseq_id) REFERENCES pseq(pseq_id) ON UPDATE CASCADE ON DELETE CASCADE Triggers: p2thread_i_trigger
paprospect2 contains ~40M rows. The goal now is to migrate the data to the supertable-inherited column with
update paprospect2 set run_id_new=run_id;
The update's been running for 5 hours (unloaded dual 2.4 GHz Xeon w/2GB RAM, SCSI160 10K drive). There are no other jobs running. Load is ~1.2 and the update's using ~3-5% of the CPU.
This suggests that the update is I/O bound (duh) and vmstat supports this:$ ps -ostime,time,pcpu,cmd 28701 STIME TIME %CPU CMD 12:18 00:07:19 2.3 postgres: admin csb 128.137.116.213 UPDATE
$ vmstat 1 procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 0 1 0 0 11288 94632 3558960 0 0 14 6 12 21 1 0 6 0 1 0 0 12044 94632 3558956 0 0 0 972 332 16 0 1 99 0 1 0 0 11092 94632 3558932 0 0 16 4420 309 25 0 2 97 0 1 0 0 11456 94636 3558928 0 0 0 980 326 23 0 1 99 1 0 0 0 12340 94636 3558924 0 0 16 532 329 14 0 0 100 0 1 0 0 12300 94636 3558916 0 0 0 1376 324 16 1 0 99 0 1 0 0 12252 94636 3558904 0 0 16 1888 325 18 0 0 99 0 1 0 0 11452 94636 3558888 0 0 16 2864 324 23 1 1 98 0 1 0 0 12172 94636 3558884 0 0 0 940 320 12 0 1 99 0 1 0 0 12180 94636 3558872 0 0 16 1840 318 22 0 1 99 0 1 0 0 11588 94636 3558856 0 0 0 2752 312 16 1 2 97
Presumably the large number of blocks written (bo) versus blocks read (bi) reflects an enormous amount of bookkeeping that has to be done for MVCC, logging, perhaps rewriting a row for the new definition (a guess -- I don't know how this is handled), indicies, etc. There's no swapping and no processes are waiting. In short, it seems that this is ENTIRELY an I/O issue. Obviously, faster drives will help (but probably only by small factor).
Any ideas how I might speed this up? Presumably this is all getting wrapped in a transaction -- does that hurt me for such a large update?
Thanks,
Reece
Bonus diversionary topic: In case it's not obvious, the motivation for this is that the subtable (paprospect2) contains a column (run_id) whose definition I would like to migrate to the inherited table (i.e., the 'super-table'). Although postgresql permits adding a column to a supertable with the same name as an extant column in a subtable, it appears that such "merged definition" columns do not have the same properties as a typical inherited column. In particular, dropping the column from the supertable does not drop it from the subtable (but renaming it does change both names). Hmm.
-- Reece Hart, Ph.D. [EMAIL PROTECTED], http://www.gene.com/ Genentech, Inc. 650/225-6133 (voice), -5389 (fax) Bioinformatics and Protein Engineering 1 DNA Way, MS-93 http://www.in-machina.com/~reece/ South San Francisco, CA 94080-4990 [EMAIL PROTECTED], GPG: 0x25EC91A0 |