On Fri, Dec 23, 2011 at 11:14 AM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> Thoughts?

Those are good thoughts. Here's another random idea, which might be completely nuts. Maybe we could consider some kind of summarization of CLOG data, based on the idea that most transactions commit. We introduce the idea of a CLOG rollup page. On a CLOG rollup page, each bit represents the status of N consecutive XIDs. If the bit is set, that means all XIDs in that group are known to have committed. If it's clear, then we don't know, and must fall through to a regular CLOG lookup.

If you let N = 1024, then 8K of CLOG rollup data is enough to represent the status of 64 million transactions, which means that just a couple of pages could cover as much of the XID space as you probably need to care about. Also, you would need to replace CLOG summary pages in memory only very infrequently.

Backends could test the bit without any lock. If it's set, they do pg_read_barrier(), and then check the buffer label to make sure it's still the summary page they were expecting. If so, no CLOG lookup is needed. If the page has changed under us or the bit is clear, then we fall through to a regular CLOG lookup.

An obvious problem is that, if the abort rate is significantly different from zero, and especially if the aborts are randomly mixed in with commits rather than clustered together in small portions of the XID space, the CLOG rollup data would become useless. On the other hand, if you're doing 10k tps, you only need to have a window of a tenth of a second or so where everything commits in order to start getting some benefit, which doesn't seem like a stretch.

Perhaps the CLOG rollup data wouldn't even need to be kept on disk. We could simply have bgwriter (or bghinter) set the rollup bits in shared memory for new transactions, as it becomes possible to do so, and let lookups for XIDs prior to the last shutdown fall through to CLOG.
Or, if that's not appealing, we could reconstruct the data in memory by groveling through the CLOG pages - or maybe just set summary bits only for CLOG pages that actually get faulted in.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
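
To make the lock-free lookup described above a bit more concrete, here is a rough sketch in C. It is not code from any patch. The names RollupBuffer, rollup_known_committed(), rollup_set_group() and rollup_replace_page() are made up for illustration, a single in-memory rollup page is assumed, and C11 fences stand in for pg_read_barrier() on the reader side and for a matching write barrier on the writer side (the writer-side ordering is my assumption; the text above only describes the reader).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define ROLLUP_PAGE_BYTES    8192u                      /* one 8K page */
#define XIDS_PER_ROLLUP_BIT  1024u                      /* N from the text */
#define ROLLUP_BITS_PER_PAGE (ROLLUP_PAGE_BYTES * 8u)   /* 65536 bits */
/* 65536 bits * 1024 XIDs per bit = 67108864 XIDs (64 Mi) per rollup page */
#define XIDS_PER_ROLLUP_PAGE (ROLLUP_BITS_PER_PAGE * XIDS_PER_ROLLUP_BIT)

/* Hypothetical shared-memory slot holding one rollup page. */
typedef struct RollupBuffer
{
    _Atomic uint32_t page_no;                  /* the "buffer label" */
    _Atomic uint8_t  bits[ROLLUP_PAGE_BYTES];  /* 1 = whole group committed */
} RollupBuffer;

/*
 * Reader: returns true only if the rollup data proves xid committed.
 * A false result means "unknown"; the caller must do a regular CLOG lookup.
 */
static bool
rollup_known_committed(RollupBuffer *buf, uint32_t xid)
{
    uint32_t page_no = xid / XIDS_PER_ROLLUP_PAGE;
    uint32_t bitno = (xid % XIDS_PER_ROLLUP_PAGE) / XIDS_PER_ROLLUP_BIT;
    uint8_t  byte = atomic_load_explicit(&buf->bits[bitno / 8],
                                         memory_order_relaxed);

    if ((byte & (1u << (bitno % 8))) == 0)
        return false;           /* bit clear: we don't know */

    /* Stand-in for pg_read_barrier(): read the bit before the label. */
    atomic_thread_fence(memory_order_acquire);

    /* Trust the bit only if the slot still holds the page we expected. */
    return atomic_load_explicit(&buf->page_no, memory_order_relaxed) == page_no;
}

/*
 * Writer (e.g. a background process): mark the group of 1024 XIDs that
 * contains first_xid as all committed. The caller must have verified that.
 */
static void
rollup_set_group(RollupBuffer *buf, uint32_t first_xid)
{
    uint32_t bitno = (first_xid % XIDS_PER_ROLLUP_PAGE) / XIDS_PER_ROLLUP_BIT;

    assert(first_xid / XIDS_PER_ROLLUP_PAGE == atomic_load(&buf->page_no));
    atomic_fetch_or_explicit(&buf->bits[bitno / 8],
                             (uint8_t) (1u << (bitno % 8)),
                             memory_order_release);
}

/*
 * Writer: reuse the slot for a different rollup page. The label is changed
 * first, then a write barrier, then the bits are reset, so a reader that
 * sees any new bit contents also sees the new label.
 */
static void
rollup_replace_page(RollupBuffer *buf, uint32_t new_page_no)
{
    atomic_store_explicit(&buf->page_no, new_page_no, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (uint32_t i = 0; i < ROLLUP_PAGE_BYTES; i++)
        atomic_store_explicit(&buf->bits[i], 0, memory_order_relaxed);
}
```

A backend would call rollup_known_committed() first and go to the regular CLOG lookup whenever it returns false. Since a set bit only ever means "all committed", and a commit never becomes anything else, a stale read can only cost an extra CLOG lookup, not a wrong answer.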