Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Erik Norvelle <[EMAIL PROTECTED]> writes:

> Here's the query I am running:
>
> update indethom
>   set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
>       sectref = (select clavis from s2.sectiones where
>           s2.sectiones.nomeoper = indethom.nomeoper
>           and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
>           and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
>           and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
>           and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);
>
> Here's the query plan:
>
>                                       QUERY PLAN
> ------------------------------------------------------------------------------
>  Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
>    SubPlan
>      ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
>            Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
>                (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
>                (refere4a = $7) AND (refere4b = $8))
> (4 rows)

Firstly, you might try running "vacuum full" on both tables. If there are tons of left-over dead records, they could be slowing down the update.

This isn't the fastest possible plan, but it's pretty good. You might be able to get it somewhat faster using the non-standard "from" clause on the update statement.
update indethom
   set sectref = clavis
  from sectiones
 where sectiones.nomeoper = indethom.nomeoper
   and sectiones.refere1a = indethom.refere1a and sectiones.refere1b = indethom.refere1b
   and sectiones.refere2a = indethom.refere2a and sectiones.refere2b = indethom.refere2b
   and sectiones.refere3a = indethom.refere3a and sectiones.refere3b = indethom.refere3b
   and sectiones.refere4a = indethom.refere4a and sectiones.refere4b = indethom.refere4b;

This might be able to use a merge join, which will take longer to get started because it has to sort both tables, but might finish faster.

You might also try paring the index down to just the two or three most useful columns. Is it common that something matches refere1a and refere1b but doesn't match the remaining columns? An 8-column index is a lot of overhead. I'm not sure how much that affects lookup times, but it might be substantial.

-- 
greg

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
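[Editorial note: for illustration, the index-paring suggestion above might look like the following sketch. The index name is invented, and whether three leading columns are selective enough depends on the actual data distribution.]

```sql
-- Hypothetical slimmer index: keep only the leading, most selective columns.
CREATE INDEX sectiones_short_ndx
    ON s2.sectiones (nomeoper, refere1a, refere1b);

-- Once it is in place and the planner uses it, the wide index could be dropped:
-- DROP INDEX s2.sectiones_ndx;
```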
Re: [PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Erik Norvelle <[EMAIL PROTECTED]> writes:

> update indethom
>   set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
>       sectref = (select clavis from s2.sectiones where
>           s2.sectiones.nomeoper = indethom.nomeoper
>           and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
>           and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
>           and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
>           and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

This is effectively forcing a nestloop-with-inner-indexscan join. You might be better off with

update indethom
  set query_counter = nextval('s2.query_counter_seq'),
      sectref = sectiones.clavis
  from s2.sectiones
  where s2.sectiones.nomeoper = indethom.nomeoper
    and s2.sectiones.refere1a = indethom.refere1a
    and s2.sectiones.refere1b = indethom.refere1b
    and s2.sectiones.refere2a = indethom.refere2a
    and s2.sectiones.refere2b = indethom.refere2b
    and s2.sectiones.refere3a = indethom.refere3a
    and s2.sectiones.refere3b = indethom.refere3b
    and s2.sectiones.refere4a = indethom.refere4a
    and s2.sectiones.refere4b = indethom.refere4b;

			regards, tom lane
[PERFORM] Update performance ... is 200,000 updates per hour what I should expect?
Folks:

I'm running a query which is designed to generate a foreign key for a table of approx. 10 million records (I've mentioned this in an earlier posting). The table is called "indethom", and each row contains a single word from the works of St. Thomas Aquinas, along with grammatical data about the word form and (most importantly for my current problem) a set of columns identifying the particular work/section/paragraph that the word appears in.

This database is completely non-normalized, and I'm working on performing some basic normalization, beginning with creating a table called "s2.sectiones" which (naturally) contains a complete listing of all the sections of all the works of St. Thomas. I will then eliminate this information from the original "indethom" table, replacing it with the foreign key I am currently generating.

My question has to do with whether or not I am getting maximal speed out of PostgreSQL, or whether I need to perform further optimizations. I am currently getting about 200,000 updates per hour, and updating the entire 10 million rows thus requires 50 hours, which seems a bit much.
Here's the query I am running:

update indethom
  set query_counter = nextval('s2.query_counter_seq'),  -- Just for keeping track of how fast the query is running
      sectref = (select clavis from s2.sectiones where
          s2.sectiones.nomeoper = indethom.nomeoper
          and s2.sectiones.refere1a = indethom.refere1a and s2.sectiones.refere1b = indethom.refere1b
          and s2.sectiones.refere2a = indethom.refere2a and s2.sectiones.refere2b = indethom.refere2b
          and s2.sectiones.refere3a = indethom.refere3a and s2.sectiones.refere3b = indethom.refere3b
          and s2.sectiones.refere4a = indethom.refere4a and s2.sectiones.refere4b = indethom.refere4b);

Here's the query plan:

                                      QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on indethom  (cost=0.00..1310352.72 rows=10631972 width=212)
   SubPlan
     ->  Index Scan using sectiones_ndx on sectiones  (cost=0.00..6.03 rows=1 width=4)
           Index Cond: ((nomeoper = $0) AND (refere1a = $1) AND (refere1b = $2) AND
               (refere2a = $3) AND (refere2b = $4) AND (refere3a = $5) AND (refere3b = $6) AND
               (refere4a = $7) AND (refere4b = $8))
(4 rows)

Note: I have just performed a VACUUM ANALYZE on the indethom table, as suggested by this listserve.
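[Editorial note: the shape of the correlated-subquery UPDATE above can be reproduced on toy data; the sketch below uses sqlite3 purely as a stand-in for PostgreSQL, with the tables trimmed to one reference column. The row values are invented.]

```python
# Toy reproduction of the correlated-subquery UPDATE pattern: one lookup
# into sectiones is performed per row of indethom being updated.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sectiones (nomeoper TEXT, refere1a TEXT, clavis INTEGER)")
cur.execute("CREATE TABLE indethom (nomeoper TEXT, refere1a TEXT, verbum TEXT, sectref INTEGER)")
cur.executemany("INSERT INTO sectiones VALUES (?, ?, ?)",
                [("sth", "1a", 1), ("sth", "2a", 2)])
cur.executemany("INSERT INTO indethom (nomeoper, refere1a, verbum) VALUES (?, ?, ?)",
                [("sth", "1a", "ens"), ("sth", "2a", "bonum")])

# The correlated subquery: for each indethom row, find the matching section key.
cur.execute("""
    UPDATE indethom
    SET sectref = (SELECT clavis FROM sectiones
                   WHERE sectiones.nomeoper = indethom.nomeoper
                     AND sectiones.refere1a = indethom.refere1a)
""")
print(cur.execute("SELECT verbum, sectref FROM indethom ORDER BY sectref").fetchall())
# → [('ens', 1), ('bonum', 2)]
```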
Here's the structure of the s2.sectiones table:

it=> \d s2.sectiones
     Table "s2.sectiones"
  Column  |     Type     | Modifiers
----------+--------------+-----------
 nomeoper | character(3) |
 refere1a | character(2) |
 refere1b | character(2) |
 refere2a | character(2) |
 refere2b | character(2) |
 refere3a | character(2) |
 refere3b | character(2) |
 refere4a | character(2) |
 refere4b | character(2) |
 clavis   | integer      |
Indexes: sectiones_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b)

Finally, here is the structure of indethom (some non-relevant columns not shown):

it=> \d indethom
            Table "public.indethom"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 numeoper      | smallint              | not null
 nomeoper      | character(3)          | not null
 editcrit      | character(1)          |
 refere1a      | character(2)          |
 refere1b      | character(2)          |
 refere2a      | character(2)          |
 refere2b      | character(2)          |
 refere3a      | character(2)          |
 refere3b      | character(2)          |
 refere4a      | character(2)          |
 refere4b      | character(2)          |
 refere5a      | character(2)          | not null
 refere5b      | smallint              | not null
 referen6      | smallint              | not null
 ... several columns skipped ...
 verbum        | character varying(22) | not null
 ... other columns skipped ...
 poslinop      | integer               | not null
 posverli      | smallint              | not null
 posverop      | integer               | not null
 clavis        | integer               | not null
 articref      | integer               |
 sectref       | integer               |
 query_counter | integer               |
Indexes: indethom_pkey primary key btree (clavis),
         indethom_articulus_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b),
         indethom_sectio_ndx btree (nomeoper, refere1a, refere1b, refere2a, refere2b, refere3a, refere3b, refere4a, refere4b),
         verbum_ndx btree (verbum)

Thanks for your assistance!
-Erik Norvelle
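[Editorial note: to round out the thread, once sectref is fully populated, the normalization Erik describes could be finished along these lines. The constraint name is invented, and only the first reference pair is shown being dropped; the remaining refere* columns would follow the same pattern.]

```sql
-- Hypothetical follow-up after the backfill completes: enforce the new
-- relationship, then remove the now-redundant denormalized columns.
ALTER TABLE indethom
    ADD CONSTRAINT indethom_sectref_fkey
    FOREIGN KEY (sectref) REFERENCES s2.sectiones (clavis);

ALTER TABLE indethom DROP COLUMN refere1a;
ALTER TABLE indethom DROP COLUMN refere1b;
```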