Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Erik Norvelle <[EMAIL PROTECTED]> writes:

> Here's the query I am running:
>
> update indethom
>   set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
>       sectref = (select clavis from s2.sectiones where
>           s2.sectiones.nomeoper = indethom.nomeoper
>           and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
>           and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
>           and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
>           and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);
>
> Here's the query plan:
>
>                                       QUERY PLAN
> ------------------------------------------------------------------------------
>  Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
>    SubPlan
>      ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
>            Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
>                (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
>                (refere4a = $7) AND (refere4b = $8))
> (4 rows)

Firstly, you might try running "vacuum full" on both tables. If there are tons of left-over dead records, they could be slowing down the update.

This isn't the fastest possible plan, but it's pretty good. You might be able to get it somewhat faster using the non-standard "from" clause on the update statement.
update indethom
   set sectref = clavis
  from sectiones
 where sectiones.nomeoper = indethom.nomeoper
   and sectiones.refere1a = indethom.refere1a and sectiones.refere1b = indethom.refere1b
   and sectiones.refere2a = indethom.refere2a and sectiones.refere2b = indethom.refere2b
   and sectiones.refere3a = indethom.refere3a and sectiones.refere3b = indethom.refere3b
   and sectiones.refere4a = indethom.refere4a and sectiones.refere4b = indethom.refere4b;

This might be able to use a merge join, which will take longer to get started because it has to sort both tables, but might finish faster.

You might also try paring the index down to just the two or three most useful columns. Is it common that something matches refere1a and refere1b but doesn't match the remaining columns? An 8-column index is a lot of overhead. I'm not sure how much that affects lookup times, but it might be substantial.

-- 
greg

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
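[Editorial note: for illustration, the index-paring suggestion above might look like the following sketch. The index name is invented, and whether three leading columns are selective enough depends on the actual data distribution.]

```sql
-- Hypothetical slimmer index: keep only the leading, most selective columns.
CREATE INDEX sectiones_short_ndx
    ON s2.sectiones (nomeoper, refere1a, refere1b);

-- Once it is in place and the planner uses it, the wide index could be dropped:
-- DROP INDEX s2.sectiones_ndx;
```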
Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Erik Norvelle <[EMAIL PROTECTED]> writes:

> update indethom
>   set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
>       sectref = (select clavis from s2.sectiones where
>           s2.sectiones.nomeoper = indethom.nomeoper
>           and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
>           and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
>           and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
>           and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

This is effectively forcing a nestloop-with-inner-indexscan join. You might be better off with

update indethom
  set query_counter = nextval('s2.query_counter_seq'),
      sectref = sectiones.clavis
  from s2.sectiones
  where s2.sectiones.nomeoper = indethom.nomeoper
    and s2.sectiones.refere1a = indethom.refere1a
    and s2.sectiones.refere1b = indethom.refere1b
    and s2.sectiones.refere2a = indethom.refere2a
    and s2.sectiones.refere2b = indethom.refere2b
    and s2.sectiones.refere3a = indethom.refere3a
    and s2.sectiones.refere3b = indethom.refere3b
    and s2.sectiones.refere4a = indethom.refere4a
    and s2.sectiones.refere4b = indethom.refere4b;

			regards, tom lane
[PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Folks:

I'm running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting). The table is called "indethom", and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called "s2.sectiones" which (naturally) contains a complete listing of all the sections of all the works of St. Thomas. I will then eliminate this information from the original "indethom" table, replacing it with the foreign key I am currently generating.

My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations. I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much.
Here's the query I am running:

update indethom
  set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
      sectref = (select clavis from s2.sectiones where
          s2.sectiones.nomeoper = indethom.nomeoper
          and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
          and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
          and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
          and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

Here's the query plan:

                                      QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
   SubPlan
     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
               (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
               (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note: I have just performed a VACUUM ANALYZE on the indethom table, as suggested by this listserve.
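[Editorial note: the shape of the correlated-subquery UPDATE above can be reproduced on toy data; the sketch below uses sqlite3 purely as a stand-in for PostgreSQL, with the tables trimmed to one reference column. The row values are invented.]

```python
# Toy reproduction of the correlated-subquery UPDATE pattern: one lookup
# into sectiones is performed per row of indethom being updated.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sectiones (nomeoper TEXT, refere1a TEXT, clavis INTEGER)")
cur.execute("CREATE TABLE indethom (nomeoper TEXT, refere1a TEXT, verbum TEXT, sectref INTEGER)")
cur.executemany("INSERT INTO sectiones VALUES (?, ?, ?)",
                [("sth", "1a", 1), ("sth", "2a", 2)])
cur.executemany("INSERT INTO indethom (nomeoper, refere1a, verbum) VALUES (?, ?, ?)",
                [("sth", "1a", "ens"), ("sth", "2a", "bonum")])

# The correlated subquery: for each indethom row, find the matching section key.
cur.execute("""
    UPDATE indethom
    SET sectref = (SELECT clavis FROM sectiones
                   WHERE sectiones.nomeoper = indethom.nomeoper
                     AND sectiones.refere1a = indethom.refere1a)
""")
print(cur.execute("SELECT verbum, sectref FROM indethom ORDER BY sectref").fetchall())
# → [('ens', 1), ('bonum', 2)]
```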
Here's the structure of the s2.sectiones table:

it=> \d s2.sectiones
     Table "s2.sectiones"
  Column  |     Type     | Modifiers
----------+--------------+-----------
 nomeoper | character(3) |
 refere1a | character(2) |
 refere1b | character(2) |
 refere2a | character(2) |
 refere2b | character(2) |
 refere3a | character(2) |
 refere3b | character(2) |
 refere4a | character(2) |
 refere4b | character(2) |
 clavis   | integer      |
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns not shown):

it=> \d indethom
            Table "public.indethom"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 numeoper      | smallint              | not null
 nomeoper      | character(3)          | not null
 editcrit      | character(1)          |
 refere1a      | character(2)          |
 refere1b      | character(2)          |
 refere2a      | character(2)          |
 refere2b      | character(2)          |
 refere3a      | character(2)          |
 refere3b      | character(2)          |
 refere4a      | character(2)          |
 refere4b      | character(2)          |
 refere5a      | character(2)          | not null
 refere5b      | smallint              | not null
 referen6      | smallint              | not null
 ... several columns skipped ...
 verbum        | character varying(22) | not null
 ... other columns skipped ...
 poslinop      | integer               | not null
 posverli      | smallint              | not null
 posverop      | integer               | not null
 clavis        | integer               | not null
 articref      | integer               |
 sectref       | integer               |
 query_counter | integer               |
Indexes: indethom_pkey primary key btree (clavis),
         indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b),
         indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
         verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle
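[Editorial note: to round out the thread, once sectref is fully populated, the normalization Erik describes could be finished along these lines. The constraint name is invented, and only the first reference pair is shown being dropped; the remaining refere* columns would follow the same pattern.]

```sql
-- Hypothetical follow-up after the backfill completes: enforce the new
-- relationship, then remove the now-redundant denormalized columns.
ALTER TABLE indethom
    ADD CONSTRAINT indethom_sectref_fkey
    FOREIGN KEY (sectref) REFERENCES s2.sectiones (clavis);

ALTER TABLE indethom DROP COLUMN refere1a;
ALTER TABLE indethom DROP COLUMN refere1b;
```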