Gregory Stark wrote:
> I'm also a bit concerned that *how many hint bits* isn't enough information to
> determine how important it is to write out the page.

Agreed, that doesn't seem like a very good metric to me either.

> Or how many *unhinted* xmin/xmax
> values were found? If HTSV can hint xmin for a tuple but finds xmax still in
> progress perhaps that's a good sign it's not worth dirtying the page?

I like that thought.

Overall, I feel that we should never dirty when setting a hint bit, just set the separate buffer flag to indicate that hint bits have been set. The decision to dirty and write out, or not, should be delayed until we're about to write/replace the buffer. That is, in bgwriter.
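
A toy sketch of that separation, in Python rather than PostgreSQL's C. The flag names and helper functions are hypothetical and exist only for illustration; they are not PostgreSQL's actual buffer flags or API.

```python
from dataclasses import dataclass

# Hypothetical flag names for this sketch; not PostgreSQL's real buffer flags.
BUF_DIRTY = 0x01        # page has real changes that must reach disk
BUF_HINT_DIRTY = 0x02   # only hint bits changed since the last write


@dataclass
class Buffer:
    flags: int = 0


def mark_dirty(buf: Buffer) -> None:
    """A real modification: the page must be written eventually."""
    buf.flags |= BUF_DIRTY


def mark_hint_bits_set(buf: Buffer) -> None:
    """Setting a hint bit never dirties the page; it only records the fact."""
    buf.flags |= BUF_HINT_DIRTY


def must_write(buf: Buffer) -> bool:
    """True only when the page carries real changes."""
    return bool(buf.flags & BUF_DIRTY)


def may_write(buf: Buffer) -> bool:
    """True when a write would save something, including hint bits only."""
    return bool(buf.flags & (BUF_DIRTY | BUF_HINT_DIRTY))
```

With this split, a page that is only "hint bits dirty" can be dropped without I/O, and the writer is free to decide later whether saving the hints is worth a write.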

How about this strategy:

1. First of all, before writing a dirty buffer, scan all tuples on the page and set all hint bits that can be set. This will hopefully save us from having to dirty the page again in the future, when another tuple on the page is accessed. This has been proposed before, and IIRC Tom has argued that it's a modularity violation for bgwriter to access the contents of pages like that, but I'm sure we can find a way to do it safely.

2. When bgwriter encounters a page that's marked as "hint bits dirty", write it only if *all* hint bits on the page have been, or can be, set. Dirtying a page before that point doesn't seem worthwhile, as the next access to a tuple that doesn't have all its hint bits set will have to dirty the page again.
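
The two steps above could be modelled roughly as follows. This is a simplified sketch under stated assumptions: the clog is a plain dict from transaction id to status, a tuple with no xmax is treated as needing no xmax hint, and all names (`HeapTuple`, `try_set_hints`, `should_write_hint_dirty_page`) are made up for illustration. Real PostgreSQL has further conditions on when a hint bit may be set that are ignored here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

COMMITTED, ABORTED, IN_PROGRESS = "committed", "aborted", "in_progress"


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int] = None   # None: no deleting transaction
    xmin_hinted: bool = False
    xmax_hinted: bool = False


def try_set_hints(tup: HeapTuple, clog: Dict[int, str]) -> None:
    """Set every hint bit whose transaction has a final status."""
    if not tup.xmin_hinted and clog[tup.xmin] != IN_PROGRESS:
        tup.xmin_hinted = True
    if (tup.xmax is not None and not tup.xmax_hinted
            and clog[tup.xmax] != IN_PROGRESS):
        tup.xmax_hinted = True


def fully_hinted(tup: HeapTuple) -> bool:
    return tup.xmin_hinted and (tup.xmax is None or tup.xmax_hinted)


def prepare_page_for_write(page: List[HeapTuple], clog: Dict[int, str]) -> None:
    """Step 1: before writing, set all hint bits that can be set."""
    for tup in page:
        try_set_hints(tup, clog)


def should_write_hint_dirty_page(page: List[HeapTuple],
                                 clog: Dict[int, str]) -> bool:
    """Step 2: write a hint-bits-dirty page only if every tuple is fully hinted."""
    prepare_page_for_write(page, clog)
    return all(fully_hinted(tup) for tup in page)
```

For example, a page holding one tuple whose deleting transaction is still in progress would be skipped, and would become writable once that transaction commits or aborts.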

Actually, I'd like to see some benchmarks on an even simpler strategy: never dirty a page merely because a hint bit has been set. It might work surprisingly well in practice: if a database is I/O bound, we don't care about the extra CPU work or lock contention of checking the clog. If it's CPU bound, the active pages that matter are in the buffer cache, and so are the hint bits for those pages.
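
To make the intuition concrete, here is a toy LRU model, not a benchmark. It assumes every transaction status is already final, so a hint can always be set on first access, and that evicting a page simply discards its in-memory hints without a write. `ToyCache` and its fields are hypothetical names.

```python
from collections import OrderedDict


class ToyCache:
    """LRU cache of pages; hint bits live only in the cached copy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> set of hinted tuple ids
        self.clog_lookups = 0

    def read_tuple(self, page_id, tuple_id) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)       # mark as recently used
        else:
            if len(self.pages) >= self.capacity:
                # Evict the least recently used page. Its hints are lost,
                # but no write happens, since hints never dirtied the page.
                self.pages.popitem(last=False)
            self.pages[page_id] = set()           # re-read from disk, unhinted
        hinted = self.pages[page_id]
        if tuple_id not in hinted:
            self.clog_lookups += 1                # pay for one clog check
            hinted.add(tuple_id)                  # hint set in memory only
```

When the working set fits in the cache, each tuple costs one clog lookup in total. When it does not, every re-read pays the lookup again, which is the CPU cost an I/O-bound system is argued not to care about.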

  Heikki Linnakangas

Sent via pgsql-patches mailing list