On 17 November 2012 21:20, Jeff Davis <pg...@j-davis.com> wrote:

>> ISTM that we should tune that specifically by performing a VM lookup
>> for next 32 pages (or more), so we reduce the lookups well below 1 per
>> page. That way the overhead of using the VM will be similar to using
>> the PD_ALL_VISIBLE.
>
> That's another potential way to mitigate the effects during a scan, but
> it does add a little complexity. Right now, it share locks a buffer, and
> uses an array with one element for each tuple in the page. If
> PD_ALL_VISIBLE is set, then it marks all of the tuples *currently
> present* on the page as visible in the array, and then releases the
> share lock. Then, when reading the page, if another tuple is added
> (because we released the share lock and only have a pin), it doesn't
> matter because it's already invisible according to the array.
>
> With this approach, we'd need to keep a larger array to represent many
> pages. And it sounds like we'd need to share lock the pages ahead, and
> find out which items are currently present, in order to properly fill in
> the array. Not quite sure what to do there, but would require some more
> thought.

Hmm, that's too much and not really what I was thinking, but I concede
that was a little vague. No need for bigger arrays etc.

If we check the VM for the next N blocks, then we know that all completed
transactions are committed. Yes, the VM can change, but that is not a
problem.

What I mean is that we keep an array of boolean[N] that simply tracks
what the VM said last time we checked it. If that is true for a block
then we do special processing, similar to the current all-visible path
and yet different, described below.

What we want is to do a HeapTupleVisibility check that does not rely
on tuple hints AND yet avoids all clog access. So when we scan a
buffer in page mode and we know the VM said it was all visible, we
still check each tuple's visibility. If the xid is below snapshot xmin
then the xid is known committed and the tuple is visible to this scan
(not necessarily all scans). We know this because the VM said this page
was all-visible AFTER our snapshot was taken. If the tuple xid is within
the snapshot or greater than snapshot xmax then the tuple is invisible
to our snapshot and we don't need to check clog.

So once we know the VM said the page was all visible, we do not need to
check clog to establish visibility; we only need to check the tuple
xmin against our snapshot xmin.

So the VM can change under us and it doesn't matter. We don't need a
pin or lock on the VM, we just read it and let go. No race conditions,
no fuss.

The difference here is that we still need to check the visibility of
each tuple, but that can be a very cheap check and never involves clog,
nor does it dirty the page. Tuple access is reasonably expensive in
comparison with a clog-less check on tuple xmin against snapshot xmin,
so the extra work is negligible.
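
To make the idea concrete, here is a minimal sketch in C of the two
pieces: caching what the VM said for the next N blocks, and the
clog-less per-tuple check. It is an illustration only, not PostgreSQL
code. The names Xid, Snapshot, VM_LOOKAHEAD, vm_read_ahead and
tuple_visible_clogless are all hypothetical, the VM is modelled as a
plain bool array, and the sketch assumes no xid wraparound and ignores
deleted tuples (xmax on the tuple).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transaction id; xid wraparound is ignored in this sketch. */
typedef uint32_t Xid;

/* Hypothetical snapshot: only the two bounds the check needs. */
typedef struct
{
    Xid xmin;   /* every xid below this had completed at snapshot time */
    Xid xmax;   /* every xid at or above this is invisible */
} Snapshot;

/* How many blocks ahead we remember the VM answer for (assumed N = 32). */
#define VM_LOOKAHEAD 32

/*
 * Record what the VM said for blocks [start, start + VM_LOOKAHEAD).
 * The VM is modelled as one bool per block; no pin or lock is held
 * after this returns. Blocks past the end count as "not all-visible".
 */
static void
vm_read_ahead(const bool *vm, size_t vm_nblocks, size_t start,
              bool cached[VM_LOOKAHEAD])
{
    assert(vm != NULL);

    for (size_t i = 0; i < VM_LOOKAHEAD; i++)
    {
        size_t blk = start + i;

        cached[i] = (blk < vm_nblocks) && vm[blk];
    }
}

/*
 * Visibility check for a tuple on a block the VM reported as
 * all-visible after the snapshot was taken. No clog lookup and no
 * hint bits: an xid below snapshot xmin is treated as committed and
 * visible; anything within the snapshot range or beyond xmax is
 * treated as invisible.
 */
static bool
tuple_visible_clogless(Xid tuple_xmin, const Snapshot *snap)
{
    return tuple_xmin < snap->xmin;
}
```

A scan would call vm_read_ahead() once per VM_LOOKAHEAD blocks, and
for each block whose cached entry is true it would use
tuple_visible_clogless() for every tuple instead of the normal
visibility routine. Blocks whose cached entry is false take the
existing path.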

-- 
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services