I thought that post mentioned that the plan
was one statement in an iteration, and that the cache would have
been primed by a previous query checking whether there were any rows
to update.  If that was the case, it might be worthwhile to look at
the entire flow of an iteration.

This is the only SQL query in the code in question - the rest of the code manages the looping and commit. The code was copied to PgAdminIII and values written in for the WHERE clause. In order for me to validate that rows would have been updated, I had to run a SELECT with the same WHERE clause in PgAdminIII first to see how many rows would have qualified. But this was for testing purposes only. The SELECT statement does not exist in the code. The vast majority of the rows that will be processed will be updated as this is a backfill to synch the old rows with the values being filled into new columns now being inserted.

Also, if you ever responded with version and configuration
information, I missed it.

This is hosted on a new server the client set up so I am waiting for the exact OS and hardware config. PG Version is PostgreSQL 8.3.6, compiled by Visual C++ build 1400, OS appears to be Windows 2003 x64 Server.

More than anything, I am more concerned with the long-term use of the system. This particular challenge with the 500M row update is one thing, but I am concerned about the exceptional effort required to do this. Is it REALLY this exceptional to want to update 500M rows of data in this day and age? Or is the fact that we are considering dumping and restoring and dropping indexes, etc to do all an early warning that we don't have a solution that is scaled to the problem?

Config data follows (I am assuming commented values which I did not include are defaulted).

Carlo

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to