On Sun, 2010-02-07 at 21:33 -0500, Tom Lane wrote:

> That last problem is easy to fix, but I'm not at all sure what to do
> about the scan interlock problem. Thoughts?

AFAICS the problem doesn't exist in normal running. _bt_page_recyclable() tests against RecentXmin, which includes the xmins of read-only transactions. So it doesn't matter if a read-only transaction that is earlier than the value of opaque->btpo.xact still exists when that value is set. If it is still there later, then the page cannot be reused.

A basic interlock approach can be put in place for Hot Standby. We just WAL-log the reuse of a btree page in _bt_getbuf(), just before we call _bt_pageinit(), using the transaction id that took that action. We can then conflict on that xid.

For the TODO, I'm wondering whether there's a way to allow the page to be reused earlier and have it all just work. That would allow us to recycle index blocks faster and avoid index bloat in the presence of long-lived transactions. Otherwise, fixing this for the normal case will accentuate index bloat.

It seems possible that a page can be reused and end up at exactly the same place in the index key space, so that the left link of the new page matches the right link of the page the scan just left. Most likely it would be in a different place entirely, so ignoring the issue would cause scans to stop earlier than they should and give an incomplete answer to a query. So we can't just re-check links to validate the page.

The only thing we actually need to record about the old page is its right link, so perhaps we can store the right link value in a central place, together with visibility information. Make that info WAL-logged so it is available on the standby also. That would allow us to find out whether we should read the page or use the right link info to move right.

We then store a recycled-by transaction id on the new page we are recycling. When we scan onto a new page we check to see whether the page has been recycled by a transaction that we consider still in progress.
If so, we consult the page-visibility info to see what the right link of the page was as far as our scan is concerned, then use that to continue our scan.

--
 Simon Riggs           www.2ndQuadrant.com
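
To make the recycled-by idea above a little more concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: every type and function name in it (Snapshot, RecycleTable, recycle_page, scan_next_block and so on) is hypothetical, the "central place" is just an in-memory array, and it ignores WAL logging, locking, xid wraparound and cleanup of old entries. It only illustrates the decision a scan would make when it lands on a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the real types; xid wraparound is ignored. */
typedef uint32_t TransactionId;
typedef uint32_t BlockNumber;

#define InvalidTransactionId ((TransactionId) 0)
#define InvalidBlockNumber   ((BlockNumber) 0xFFFFFFFF)
#define MAX_RECYCLED 64

/* Hypothetical snapshot: xids >= xmax or listed in xip are in progress. */
typedef struct Snapshot {
    TransactionId xmin;
    TransactionId xmax;
    const TransactionId *xip;
    size_t nxip;
} Snapshot;

/* Hypothetical central record: the right link a page had before reuse. */
typedef struct RecycledEntry {
    BlockNumber blkno;
    BlockNumber old_rightlink;
    TransactionId recycled_by;
} RecycledEntry;

typedef struct RecycleTable {
    RecycledEntry entries[MAX_RECYCLED];
    size_t n;
} RecycleTable;

/* Only the page fields this sketch needs. */
typedef struct Page {
    BlockNumber blkno;
    BlockNumber rightlink;
    TransactionId recycled_by;
} Page;

bool xid_in_progress(const Snapshot *snap, TransactionId xid)
{
    if (xid >= snap->xmax)
        return true;
    if (xid < snap->xmin)
        return false;
    for (size_t i = 0; i < snap->nxip; i++)
        if (snap->xip[i] == xid)
            return true;
    return false;
}

/* Remember the old right link centrally, then stamp the reused page. */
void recycle_page(RecycleTable *t, Page *page,
                  TransactionId xid, BlockNumber new_rightlink)
{
    assert(t->n < MAX_RECYCLED);
    t->entries[t->n++] = (RecycledEntry){ page->blkno, page->rightlink, xid };
    page->rightlink = new_rightlink;
    page->recycled_by = xid;
}

/*
 * Decide what a scan does on arriving at 'page'. If the recycling xid is
 * still in progress for this snapshot, the page contents are not ours:
 * set *read_page to false and return the recorded old right link.
 * Otherwise read the page and follow its current right link.
 */
BlockNumber scan_next_block(const RecycleTable *t, const Snapshot *snap,
                            const Page *page, bool *read_page)
{
    if (page->recycled_by != InvalidTransactionId &&
        xid_in_progress(snap, page->recycled_by))
    {
        *read_page = false;
        for (size_t i = t->n; i-- > 0;)
            if (t->entries[i].blkno == page->blkno &&
                t->entries[i].recycled_by == page->recycled_by)
                return t->entries[i].old_rightlink;
        return InvalidBlockNumber;  /* no record found */
    }
    *read_page = true;
    return page->rightlink;
}
```

With this shape, a scan whose snapshot still sees the recycling xid as running skips the page and moves right using the recorded link, while a scan with a newer snapshot reads the page normally. How such a table would be stored, WAL-logged and pruned is left open here, as it is in the text above.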