Harry Mantheakis wrote:
The mystery remains, for me: why updating 100,000 records could complete in as quickly as 5 seconds, whereas an attempt to update a million records was still running after 25 minutes before we killed it?

The way you were doing this originally, it was joining every record in table A against every record in table B, finding the matches (note the sequential scans on each in the query plan you showed). Having A * B possible matches there was using up a bunch of resources to line those two up for an efficient join, and it's possible that parts of that required spilling working data over to disk and other expensive operations. And you were guaranteeing that every row in each table was being processed in some way.

Now, when you only took a small slice of A instead, and a small slice of B to match, this was likely using an index and working with a lot less rows in total--only ones in B that mattered were considered, not every one in B. And each processing slice was working on less rows, making it more likely to fit in memory, and thus avoiding both slow spill to disk operation and work that was less likely to fit into the system cache.

I don't know exactly how much of each of these two components went into your large run-time difference, but I suspect both were involved. The way the optimizer switches to using a sequential scan when doing bulk operations is often the right move. But if it happens in a way that causes the set of data to be processed to become much larger than RAM, it can be a bad decision. The performance drop when things stop fitting in memory is not a slow one, it's like a giant cliff you fall off.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to