On Thu, Jan 9, 2014 at 12:21 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Jan 8, 2014 at 3:33 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>>> workloads, but it means that large SQL queries and pg_dump effectively
>>> do much the same work as VACUUM, generating huge amounts of I/O and
>>> WAL on the master, the cost and annoyance of which is experienced
>>> directly by the user. That is avoided on standbys.
>> On a pgbench workload, though, essentially all page cleanup happens as
>> a result of HOT cleanups, like >99.9%. It might be OK to have that
>> happen for write operations, but it would be a performance disaster if
>> updates didn't try to HOT-prune. Our usual argument for doing HOT
>> pruning even on SELECT cleanups is that not doing so pessimizes
>> repeated scans, but there are clearly cases that end up worse off as a
>> result of that decision.
> My recollection of the discussion when HOT was developed is that it works
> that way not because anyone thought it was beneficial, but simply because
> we didn't see an easy way to know when first fetching a page whether we're
> going to try to UPDATE some tuple on the page. (And we can't postpone the
> pruning, because the query will have tuple pointers into the page later.)
> Maybe we should work a little harder on passing that information down.
> It seems reasonable to me that SELECTs shouldn't be tasked with doing
> HOT pruning.
>> I'm not entirely wild about adding a parameter in this area because it
>> seems that we're increasingly choosing to further expose what arguably
>> ought to be internal implementation details.
> I'm -1 for a parameter as well, but I think that just stopping SELECTs
> from doing pruning at all might well be a win. It's at least worthy
> of some investigation.
Unfortunately, there's no categorical answer. You can come up with
workloads where HOT pruning on selects is a win; just create a bunch
of junk and then read the same pages lots of times in a row. And you
can also come up with workloads where it's a loss; create a bunch of
junk and then read those pages just once. I don't know how easy it's going
to be to set that parameter in a useful way for some particular
environment, and I think that's possibly an argument against having
it. But the argument that we don't need a parameter because one
behavior is best for everyone is not going to fly.
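
To put rough numbers on the two cases above, here is a toy cost model. It is only a sketch: the function names and the cost constants (scan_clean, junk_penalty, prune_cost) are made up for illustration and are not measurements of anything in PostgreSQL.

```python
# Toy model, NOT PostgreSQL code: all names and constants are hypothetical.
# scan_clean   = cost of reading an already-pruned page
# junk_penalty = extra cost of reading a page still full of dead tuples
# prune_cost   = one-time cost of pruning (extra I/O and WAL)

def total_cost(reads, prune_on_select,
               scan_clean=1.0, junk_penalty=0.5, prune_cost=2.0):
    """Total cost of reading one junk-filled page `reads` times."""
    if reads == 0:
        return 0.0
    if prune_on_select:
        # The first read sees the junk and pays for the pruning;
        # every later read sees a clean page.
        first = scan_clean + junk_penalty + prune_cost
        return first + (reads - 1) * scan_clean
    # Without pruning, every read pays the junk penalty.
    return reads * (scan_clean + junk_penalty)


def break_even_reads(junk_penalty=0.5, prune_cost=2.0):
    """Number of reads at which both policies cost the same.

    Pruning wins when prune_cost < (reads - 1) * junk_penalty,
    i.e. when reads > 1 + prune_cost / junk_penalty.
    """
    return 1 + prune_cost / junk_penalty


if __name__ == "__main__":
    for reads in (1, 5, 100):
        print(reads,
              total_cost(reads, prune_on_select=True),
              total_cost(reads, prune_on_select=False))
```

With these made-up constants, a single read is cheaper without pruning and a hundred reads are cheaper with it. The crossover depends entirely on the ratio of prune_cost to junk_penalty, which is workload-specific, and that is why neither behavior is best for everyone.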
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)