I agree, it does seem a little slow. But I've found that in working
with large tables, it's often significantly faster to create a new table
(CREATE TABLE new_table AS SELECT ...) than it is to UPDATE an existing
table. For all intents and purposes, an UPDATE really is a DELETE
statement followed by an INSERT statement. That and updating any
indexes makes UPDATEs extremely slow. So creating a new table from
scratch is already going to be twice as fast as an update (since there
is only an INSERT to perform) ... and you don't have to perform a VACUUM
FULL afterwards either because the new table doesn't have any dead
tuples in it!
I once worked with a large and wide table of 270million tuples. An
update took over 2 hours to do. A CREATE TABLE AS statement did the job
in 10 min.
-- Kevin
Paul Ramsey wrote:
It seems slower than I might expect. Is your table really wide? (100s
of columns, or some really big objects?) If the rows are all toasted
that would make updates and reads both a good deal slower.
P
On Tue, May 5, 2009 at 8:40 PM, Ben Madin <[email protected]> wrote:
G'day all,
I realise that there are probably many reasons, but I have a table with
about 4 million tuples reflecting start and end point id's over two years.
I haven't been able to workable join these on the fly, so I have added three
columns, one for the geometry of the start point, one for the geometry of
the end point, and one for the (direct) line between them.
I then updated the start and end point columns (took around 20 minutes for
each) by a UPDATE ... FROM ... JOIN syntax.
then this morning reduced the fillfactor of the table to 50% to try to make
writes easier and I used st_makeline to create the third geometry (the
line). the query plan didn't look very complex (unfortunately I didn't
explain analyze):
update nlis set line_geom = st_makeline(s_geom, d_geom) where s_geom is not
null and d_geom is not null;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on nlis (cost=0.00..194364.48 rows=3654306 width=246)
Filter: ((s_geom IS NOT NULL) AND (d_geom IS NOT NULL))
(2 rows)
UPDATE 3701850
duration: 1 743 563.733 ms
Is 30 minutes reasonable for this sort of query?
I have increased max_connection, shared_buffers, effective cache size etc
and kern.sysv.shmmax and kern.sysv.shmall and a few other tuning bits and
pieces based on information at
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server. This is
running on a 2.4 GHz Core Duo (Mac OS X) with 2GB RAM and I notice that the
postgres process seems to only variably run between 10 and 45% of total CPU.
Disk writes seemed to come in packets every few seconds, with nothing much
in between.
I could be worrying about nothing, but I see posts of people with millions
and millions of records, so I guess I'm just checking.
cheers
Ben
--
Ben Madin
REMOTE INFORMATION
t : +61 8 9192 5455
f : +61 8 9192 5535
m : 0448 887 220
Broome WA 6725
[email protected]
Out here, it pays to
know...
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users