I agree, it does seem a little slow. But I've found that in working with large tables, it's often significantly faster to create a new table (CREATE TABLE new_table AS SELECT ...) than it is to UPDATE an existing table. For all intents and purposes, an UPDATE really is a DELETE statement followed by an INSERT statement. That and updating any indexes makes UPDATEs extremely slow. So creating a new table from scratch is already going to be twice as fast as an update (since there is only an INSERT to perform) ... and you don't have to perform a VACUUM FULL afterwards either because the new table doesn't have any dead tuples in it!

I once worked with a large and wide table of 270million tuples. An update took over 2 hours to do. A CREATE TABLE AS statement did the job in 10 min.

-- Kevin

Paul Ramsey wrote:
It seems slower than I might expect. Is your table really wide? (100s
of columns, or some really big objects?) If the rows are all toasted
that would make updates and reads both a good deal slower.

P

On Tue, May 5, 2009 at 8:40 PM, Ben Madin <[email protected]> wrote:
G'day all,

I realise that there are probably many reasons, but I have a table with
about 4 million tuples reflecting start and end point id's over two years.

I haven't been able to workable join these on the fly, so I have added three
columns, one for the geometry of the start point, one for the geometry of
the end point, and one for the (direct) line between them.

I then updated the start and end point columns (took around 20 minutes for
each) by a UPDATE ... FROM ... JOIN syntax.

then this morning reduced the fillfactor of the table to 50% to try to make
writes easier and I used st_makeline to create the third geometry (the
line). the query plan didn't look very complex (unfortunately I didn't
explain analyze):

update nlis set line_geom = st_makeline(s_geom, d_geom) where s_geom is not
null and d_geom is not null;

                          QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on nlis  (cost=0.00..194364.48 rows=3654306 width=246)
  Filter: ((s_geom IS NOT NULL) AND (d_geom IS NOT NULL))
(2 rows)

UPDATE 3701850

duration: 1 743 563.733 ms

Is 30 minutes reasonable for this sort of query?

I have increased max_connection, shared_buffers, effective cache size etc
and kern.sysv.shmmax and kern.sysv.shmall and a few other tuning bits and
pieces based on information at
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server. This is
running on a 2.4 GHz Core Duo (Mac OS X) with 2GB RAM and I notice that the
postgres process seems to only variably run between 10 and 45% of total CPU.
Disk writes seemed to come in packets every few seconds, with nothing much
in between.

I could be worrying about nothing, but I see posts of people with millions
and millions of records, so I guess I'm just checking.

cheers

Ben

--

Ben Madin
REMOTE INFORMATION

t : +61 8 9192 5455
f : +61 8 9192 5535
m : 0448 887 220
Broome   WA   6725

[email protected]



                                                       Out here, it pays to
know...


_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Reply via email to