On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
>> I won't take responsibility for paying my neighbor's tax bill, but I
>> might take responsibility for picking up his mail while he's on
>> holiday.
>
> That makes it sound like this is an occasional, non-annoying thing.
>
> It's more like, whoever fetches the mail needs to fetch it for
> everybody. So we are slowing down one person disproportionately, while
> others fly through without penalty. There is no argument that one
> workload necessarily needs to perform that on behalf of the other
> workload.
Sure there is. It's called a tragedy of the commons - everybody acts in
their own selfish interest (it's not *my* responsibility to limit grazing
on public land, or prune this page that I'm not modifying) and as a result
some resource that everybody cares about (grass, system-wide I/O) gets
trashed to everyone's detriment. Purely selfish behavior can only be
justified here if we assume that the selfish actor intends to participate
in the system only once: I'm going to run one big reporting query which
must run as fast as possible, and then I'm getting on a space ship to
Mars. So if my refusal to do any pruning during that reporting query
causes lots of extra I/O on the system ten minutes from now, I don't care,
because I'll have left the playing field forever at that point.

As Heikki points out, any HOT pruning operation generates WAL and has a
CPU cost. However, pruning a page that is currently dirty *decreases* the
total volume of writes to the data files, whereas pruning a page that is
currently clean *increases* the total volume of writes to the data files.
In the first case, if we prune the page right now while it's still dirty,
we can't possibly cause any additional data-file writes, and we may save
one, because it's possible that someone else would later have pruned it
when it was clean and there was no other reason to dirty it. In the second
case, if we prune the page that is currently clean, it will become dirty.
That will cost us no additional I/O if the page is again modified before
it's written out, but otherwise it costs an additional data file write. I
think there's a big difference between those two cases.

Sure, from the narrow point of view of how much work it takes this scan to
process this page, it's always better not to prune. But if you make the
more realistic assumption that you will keep on issuing queries on the
system, then what you're doing to the overall system I/O load is pretty
important.
By the way, was anything ever done about this:

http://www.postgresql.org/message-id/20140929091343.ga4...@alap3.anarazel.de

That's just a workload that is 5/6th pgbench -S and 1/6th pgbench, which
is in no way an unrealistic workload, and showed a significant regression
with an earlier version of the patch. You seem very eager to commit this
patch after four months of inactivity, but I think this is a pretty
massive behavior change that deserves careful scrutiny before it goes in.
If we push something that changes longstanding behavior and can't even be
turned off, and it regresses behavior for a use case that common, our
users are going to come after us with pitchforks. That's not to say some
people won't be happy, but in my experience it takes a lot of happy users
to make up for getting stabbed with even one pitchfork.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers