Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On 09/27/2014 09:36 AM, Peter Geoghegan wrote: On Fri, Sep 26, 2014 at 11:34 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have observed that this patch is in 'Needs Review' state for next CF. Do you expect any further review from myside? I think we can use text recommended by Heikki and after that if you feel something more is still required, then you can update the same and send an updated patch. I believe it should be in 'Waiting On Author' state, please do let me know if you feel otherwise. I don't think so. Thanks for the review! Ok, applied those extra paragraphs now, and marked as committed in the commitfest. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Mon, Nov 24, 2014 at 3:51 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Ok, applied those extra paragraphs now, and marked as committed in the commitfest. Thanks! -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 10:54 PM, Peter Geoghegan p...@heroku.com wrote: It isn't. It's a minor point, originally raised by Amit. I have observed that this patch is in 'Needs Review' state for next CF. Do you expect any further review from myside? I think we can use text recommended by Heikki and after that if you feel something more is still required, then you can update the same and send an updated patch. I believe it should be in 'Waiting On Author' state, please do let me know if you feel otherwise. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 26, 2014 at 11:34 PM, Amit Kapila amit.kapil...@gmail.com wrote: I have observed that this patch is in 'Needs Review' state for next CF. Do you expect any further review from myside? I think we can use text recommended by Heikki and after that if you feel something more is still required, then you can update the same and send an updated patch. I believe it should be in 'Waiting On Author' state, please do let me know if you feel otherwise. I don't think so. Thanks for the review! -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On 09/11/2014 11:47 PM, Peter Geoghegan wrote: On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: + Although it could be argued that Lehman and Yao isn't followed to the + letter because single pages are read locked as the tree is descended, + this is at least necessary to support deletion, a common requirement + which LY hardly acknowledge. Read locks also ensure that B-tree + pages are self-consistent (LY appear to assume atomic page reads and + writes). This is just duplicating the existing paragraph. I don't see the point of this. The point is to make clear the reason for the differences - evidently, Amit felt it was unclear why we don't follow LY. I am suggesting that LY's talk of having no read locks is unrealistic (it might be realistic in the 21st century, but that's a whole other story). IMHO the existing paragraph does a much better job explaining that: Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared. Postgres shares in-memory buffers among backends. As a result, we do page-level read locking on btree pages in order to guarantee that no record is modified while we are examining it. Amit: did you notice that paragraph in the README? If not, and now that you have read it, does that paragraph make things clear? If that's not enough, what do you think is missing? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 2:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Amit: did you notice that paragraph in the README? If not, and now that you have read it, does that paragraph make things clear? If that's not enough, what do you think is missing? Amit raised the fact that LY say that no read locks are required, so clearly he read that paragraph. We don't try and justify that right now, and trying to explain the whole point of LY without first explaining this satisfactorily seems odd. The current text reads: Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared. If they assume that they're unshared, uh, then why bother with all that locking stuff? Or does this mean that page reads and writes are (dubiously) assumed atomic without locks? If you take a step back, you can see how confusing that is. LY don't get around to explaining this, but it's pretty clear that this is what's going on. If LY did a better job of explaining their algorithm, they'd just say that the latch coupling (coupling, or crabbing, of what we'd call buffer locks) previously required is no longer required, but single shared locks are required. As I pointed out, it looks like someone figured out a way to make that true much, much later [1]. I think page-level MVCC designs might have also been used, but basically our interpretation of LY is the standard one. We cannot really be considered to have deviated from LY, since AFAICT everyone else went with the same interpretation. FWIW, in recent years Stonebraker has used latch coupling as an example of the (implicitly no longer acceptable) overhead of designs influenced by System R and ARIES. This is unfounded, but plenty of bright people still at least believe that latch coupling is common. [1] http://www.vldb.org/conf/2001/P181.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 12:57 PM, Peter Geoghegan p...@heroku.com wrote: On Fri, Sep 12, 2014 at 2:15 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Amit: did you notice that paragraph in the README? If not, and now that you have read it, does that paragraph make things clear? If that's not enough, what do you think is missing? Amit raised the fact that LY say that no read locks are required, so clearly he read that paragraph. We don't try and justify that right now, and trying to explain the whole point of LY without first explaining this satisfactorily seems odd. The current text reads: Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared. If they assume that they're unshared, uh, then why bother with all that locking stuff? Or does this mean that page reads and writes are (dubiously) assumed atomic without locks? If you take a step back, you can see how confusing that is. LY don't get around to explaining this, but it's pretty clear that this is what's going on. If LY did a better job of explaining their algorithm, they'd just say that the latch coupling (coupling, or crabbing, of what we'd call buffer locks) previously required is no longer required, but single shared locks are required. As I pointed out, it looks like someone figured out a way to make that true much, much later [1]. I think page-level MVCC designs might have also been used, but basically our interpretation of LY is the standard one. We cannot really be considered to have deviated from LY, since AFAICT everyone else went with the same interpretation. Gosh, I think you're making this way more complicated than it needs to be. My interpretation of the above statement was that they knew individual page reads and writes would need to be made atomic - probably using some form of simple locking - but omitted that from their pseudocode for clarity. Such elisions are common in computer science literature and don't typically detract from understanding. It is assumed that the implementor knows enough to avoid the trivial pitfalls of whatever is being discussed and therefore focuses on the higher-level algorithmic issues. If this is what we're arguing about, it's completely not worth the time we've spent on it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 10:10 AM, Robert Haas robertmh...@gmail.com wrote: Gosh, I think you're making this way more complicated than it needs to be. My interpretation of the above statement was that they knew individual page reads and writes would need to be made atomic - probably using some form of simple locking - but omitted that from their pseudocode for clarity. That clearly isn't the case. The introductory paragraph of LY says the following: Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. They clearly and prominently state that not needing read locks is a major advantage of their algorithm, which doesn't quite ring true. If this is what we're arguing about, it's completely not worth the time we've spent on it. It isn't. It's a minor point, originally raised by Amit. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
Peter Geoghegan p...@heroku.com wrote: The introductory paragraph of LY says the following: Our solution compares favorably with earlier solutions in that the locking scheme is simpler (no read-locks are used) and only a (small) constant number of nodes are locked by any update process at any given time. They clearly and prominently state that not needing read locks is a major advantage of their algorithm, which doesn't quite ring true. It's been a while since I read that paper, but my recollection is that they assumed that each process or thread looking at a buffer would have its own private copy of that buffer, which it could be sure nobody was changing (even if the master copy somewhere else was changing). Locking was only needed to prevent conflicting writes. Now, whether it is safe to assume that creating a process-local buffer and copying to it is cheaper than getting a lock seems dicey, but that seemed to be the implicit assumption. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 12:39 PM, Kevin Grittner kgri...@ymail.com wrote: It's been a while since I read that paper, but my recollection is that they assumed that each process or thread looking at a buffer would have its own private copy of that buffer, which it could be sure nobody was changing (even if the master copy somewhere else was changing). Locking was only needed to prevent conflicting writes. Now, whether it is safe to assume that creating a process-local buffer and copying to it is cheaper than getting a lock seems dicey, but that seemed to be the implicit assumption. That is one way to make reads atomic, but I don't recall any explicit mention of it. In 1981, I think page sizes were about the same as today, but 4K was a lot of memory. We could actually do this, with some work. I think that this has actually been implemented elsewhere, though. Note that LY have practically nothing to say about deletion - they simply suggest that it be done offline. It is really useful that we can recover from page splits as and when problems arise. That's really what I'd like to prominently convey - it is the whole point of LY. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Fri, Sep 12, 2014 at 2:45 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 09/11/2014 11:47 PM, Peter Geoghegan wrote: On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: + Although it could be argued that Lehman and Yao isn't followed to the + letter because single pages are read locked as the tree is descended, + this is at least necessary to support deletion, a common requirement + which LY hardly acknowledge. Read locks also ensure that B-tree + pages are self-consistent (LY appear to assume atomic page reads and + writes). This is just duplicating the existing paragraph. I don't see the point of this. The point is to make clear the reason for the differences - evidently, Amit felt it was unclear why we don't follow LY. I am suggesting that LY's talk of having no read locks is unrealistic (it might be realistic in the 21st century, but that's a whole other story). IMHO the existing paragraph does a much better job explaining that: Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared. Postgres shares in-memory buffers among backends. As a result, we do page-level read locking on btree pages in order to guarantee that no record is modified while we are examining it. Amit: did you notice that paragraph in the README? Yeah, I have noticed and even discussed in this thread that it might be better to add new text after that paragraph. If not, and now that you have read it, does that paragraph make things clear? If that's not enough, what do you think is missing? The paragraph is quite clear to me and the only thing, I have mentioned above is that if you (Peter) want to add anything new in that regard, then it is about an additional point why read locks are taken during traversal of tree which is something he is trying to say in his new patch as below: this is at least necessary to support deletion, However it could have been added in the existing text as well. Having said that, I think that was just a very minor thing which is generally obvious and was not at all the main feedback for patch. The main point was to explain about how does right-links at all levels and high-key helps to detect and recover from concurrent page splits which I think the patch has tried to explain, but honestly the wording you have posted (under heading The basic Lehman Yao Algorithm ) is more clear to me. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Tue, Sep 9, 2014 at 12:01 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: + Lehman and Yao don't require read locks, but assume that in-memory + copies of tree pages are unshared. This is the existing paragraph, just moved to different place in the README. That's right - it seemed to make just as much sense to talk about that here, while doing so establishes the context of talking about how what we do differs from the canonical algorithm (which I think is true of real-world LY implementations generally). Which makes the next paragraph easier to understand: + Although it could be argued that Lehman and Yao isn't followed to the + letter because single pages are read locked as the tree is descended, + this is at least necessary to support deletion, a common requirement + which LY hardly acknowledge. Read locks also ensure that B-tree + pages are self-consistent (LY appear to assume atomic page reads and + writes). This is just duplicating the existing paragraph. I don't see the point of this. The point is to make clear the reason for the differences - evidently, Amit felt it was unclear why we don't follow LY. I am suggesting that LY's talk of having no read locks is unrealistic (it might be realistic in the 21st century, but that's a whole other story). Even with these read locks, following LY obviates the need + of earlier schemes to hold multiple locks concurrently when descending + the tree as part of servicing index scans (pessimistic lock coupling). + The addition of right-links at all levels, as well as the addition of + a page high key allows detection and dynamic recovery from + concurrent page splits (that is, splits between unlocking an internal + page, and subsequently locking its child page during a descent). This explains in a few sentences what a LY B-tree looks like. The current README assumes that you're already familiar with what a LY tree looks like, or that you go read the paper mentioned at the top of the README. It might be a good idea to expand on that, and add an introduction like this so that an unfamiliar reader doesn't need to read the LY paper first. Is that the purpose of this patch? Please make it more explicit. Yes, although even LY don't get around to telling the reader why they should care, the mistake that many good papers make. We now know its significance, both in general and to Postgres. Framing the discussion like this aids understanding more than you'd think. And please make the sentences simpler - the above reads like a Shakespeare play. Out, damned lock! Something like: The basic Lehman Yao Algorithm Compared to a classic B-tree, LY adds a right-link pointer to each page, to the page's right sibling. It also adds a high key to each page, which is an upper bound on the keys that are allowed on that page. These two additions make it possible detect a concurrent page split, which allows the tree to be searched without holding any read locks (except to keep a single page from being modified while reading it). When a search follows a downlink to a child page, it compares the page's high key with the search key. If the search key is greater than the high key, the page must've been split concurrently, and you must follow the right-link to find the new page containing the key range you're looking for. This might need to be repeated, if the page has been split more than once. Differences to the Lehman Yao algorithm = (current Lehamn and Yao Algorithm and Insertions section) I think that's pretty much the same information you added, but it's in a separate section, with the clear purpose that it explains what a LY tree looks like. You can skip over it, if you have read the paper or are otherwise already familiar with it. It still assumes that you're familiar with B-trees in general. That seems fair enough - I'd just expand on why we don't completely avoid read locks, which LY suppose we can get away with. That is clearly a point of confusion. Anyway, I see that you had resurrected this in the commitfest app after three weeks of inactivity. I'm going to mark this back to Returned with Feedback. Please don't resurrect it again, this patch has received more than its fair share of attention. I didn't mean to suggest that it deserves much attention. I didn't know how to interpret the fact that you changed the status, since in fact not much additional work was required. I was busy throughout, for reasons that are perhaps obvious. I am not fussed about when this happens, but I really think we should get around to it. Instead, please help by signing up to review a patch. The commitfest progress is glacial at the moment, and we really need experienced reviewers like you to get closure to people's patches. I'm back from holidays now. I plan to look at the Tomas Vondra's patch. If you think I should look at some particular
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On 09/07/2014 05:58 AM, Peter Geoghegan wrote: + Lehman and Yao don't require read locks, but assume that in-memory + copies of tree pages are unshared. Postgres shares in-memory buffers + among backends. As a result, we do page-level read locking on btree + pages in order to guarantee that no record is modified while we are + examining it. This reduces concurrency but guarantees correct + behavior. An advantage is that when trading in a read lock for a + write lock, we need not re-read the page after getting the write lock. + Since we're also holding a pin on the shared buffer containing the + page, we know that buffer still contains the page and is up-to-date. This is the existing paragraph, just moved to different place in the README. + Although it could be argued that Lehman and Yao isn't followed to the + letter because single pages are read locked as the tree is descended, + this is at least necessary to support deletion, a common requirement + which LY hardly acknowledge. Read locks also ensure that B-tree + pages are self-consistent (LY appear to assume atomic page reads and + writes). This is just duplicating the existing paragraph. I don't see the point of this. Even with these read locks, following LY obviates the need + of earlier schemes to hold multiple locks concurrently when descending + the tree as part of servicing index scans (pessimistic lock coupling). + The addition of right-links at all levels, as well as the addition of + a page high key allows detection and dynamic recovery from + concurrent page splits (that is, splits between unlocking an internal + page, and subsequently locking its child page during a descent). When + a page is first locked (at every level of a descent servicing an index + scan), we consider the need to move right: if the scankey value is + less than (or sometimes less than or equal to) the page's existing + highkey value, a value which serves as an upper bound for values on + the page generally, then it must be necessary to move the scan to the + right-hand page on the same level. It's even possible that the scan + needs to move right more than once. Once the other session's + concurrent page split finishes, a downlink will be inserted into the + parent, and so assuming there are no further page splits, future index + scans using the same scankey value will not need to move right. LY + Trees are sometimes referred to as B-Link trees in the literature. This explains in a few sentences what a LY B-tree looks like. The current README assumes that you're already familiar with what a LY tree looks like, or that you go read the paper mentioned at the top of the README. It might be a good idea to expand on that, and add an introduction like this so that an unfamiliar reader doesn't need to read the LY paper first. Is that the purpose of this patch? Please make it more explicit. And please make the sentences simpler - the above reads like a Shakespeare play. Something like: The basic Lehman Yao Algorithm Compared to a classic B-tree, LY adds a right-link pointer to each page, to the page's right sibling. It also adds a high key to each page, which is an upper bound on the keys that are allowed on that page. These two additions make it possible detect a concurrent page split, which allows the tree to be searched without holding any read locks (except to keep a single page from being modified while reading it). When a search follows a downlink to a child page, it compares the page's high key with the search key. If the search key is greater than the high key, the page must've been split concurrently, and you must follow the right-link to find the new page containing the key range you're looking for. This might need to be repeated, if the page has been split more than once. Differences to the Lehman Yao algorithm = (current Lehamn and Yao Algorithm and Insertions section) I think that's pretty much the same information you added, but it's in a separate section, with the clear purpose that it explains what a LY tree looks like. You can skip over it, if you have read the paper or are otherwise already familiar with it. It still assumes that you're familiar with B-trees in general. Anyway, I see that you had resurrected this in the commitfest app after three weeks of inactivity. I'm going to mark this back to Returned with Feedback. Please don't resurrect it again, this patch has received more than its fair share of attention. Instead, please help by signing up to review a patch. The commitfest progress is glacial at the moment, and we really need experienced reviewers like you to get closure to people's patches. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Tue, Jul 22, 2014 at 10:49 PM, Peter Geoghegan p...@heroku.com wrote: Basically I think it will be better if you can explain in bit more detail that how does right-links at all levels and high-key helps to detect and recover from concurrent page splits. You might be right about that - perhaps I should go into more detail. I've gone into more detail on what a high key is, and how we arguably do not follow Lehman Yao to the letter because we still have read locks. LY's assumption of atomic page reads/writes, and cursory handling of deletion kind of make it inevitable that read locks are used, which I now imply. So nbtree isn't a substandard LY implementation - it's a realistic one, which only needs to hold a single read lock at a time when servicing index scans (or when finding a place for insertion). I guess Lehman Yao preferred to put forward the claim no read locks rather than only one read lock at a time on internal pages, even for insertion because there might be some uses of their algorithm where that is actually realistic. It is really a sympathetic way of spinning things to say that *no* read locks are used, though. -- Peter Geoghegan diff --git a/src/backend/access/nbtree/README b/src/backend/access/nbtree/README new file mode 100644 index 4820f76..8285050 *** a/src/backend/access/nbtree/README --- b/src/backend/access/nbtree/README *** use a simplified version of the deletion *** 11,16 --- 11,50 Shasha (V. Lanin and D. Shasha, A Symmetric Concurrent B-Tree Algorithm, Proceedings of 1986 Fall Joint Computer Conference, pp 380-389). + Lehman and Yao don't require read locks, but assume that in-memory + copies of tree pages are unshared. Postgres shares in-memory buffers + among backends. As a result, we do page-level read locking on btree + pages in order to guarantee that no record is modified while we are + examining it. This reduces concurrency but guarantees correct + behavior. An advantage is that when trading in a read lock for a + write lock, we need not re-read the page after getting the write lock. + Since we're also holding a pin on the shared buffer containing the + page, we know that buffer still contains the page and is up-to-date. + + Although it could be argued that Lehman and Yao isn't followed to the + letter because single pages are read locked as the tree is descended, + this is at least necessary to support deletion, a common requirement + which LY hardly acknowledge. Read locks also ensure that B-tree + pages are self-consistent (LY appear to assume atomic page reads and + writes). Even with these read locks, following LY obviates the need + of earlier schemes to hold multiple locks concurrently when descending + the tree as part of servicing index scans (pessimistic lock coupling). + The addition of right-links at all levels, as well as the addition of + a page high key allows detection and dynamic recovery from + concurrent page splits (that is, splits between unlocking an internal + page, and subsequently locking its child page during a descent). When + a page is first locked (at every level of a descent servicing an index + scan), we consider the need to move right: if the scankey value is + less than (or sometimes less than or equal to) the page's existing + highkey value, a value which serves as an upper bound for values on + the page generally, then it must be necessary to move the scan to the + right-hand page on the same level. It's even possible that the scan + needs to move right more than once. Once the other session's + concurrent page split finishes, a downlink will be inserted into the + parent, and so assuming there are no further page splits, future index + scans using the same scankey value will not need to move right. LY + Trees are sometimes referred to as B-Link trees in the literature. + The Lehman and Yao Algorithm and Insertions --- *** to be inserted has a choice whether or n *** 42,57 key could go on either page. (Currently, we try to find a page where there is room for the new key without a split.) - Lehman and Yao don't require read locks, but assume that in-memory - copies of tree pages are unshared. Postgres shares in-memory buffers - among backends. As a result, we do page-level read locking on btree - pages in order to guarantee that no record is modified while we are - examining it. This reduces concurrency but guarantees correct - behavior. An advantage is that when trading in a read lock for a - write lock, we need not re-read the page after getting the write lock. - Since we're also holding a pin on the shared buffer containing the - page, we know that buffer still contains the page and is up-to-date. - We support the notion of an ordered scan of an index as well as insertions, deletions, and simple lookups. A scan in the forward direction is no problem, we just use the right-sibling pointers that ---
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Thu, Jul 24, 2014 at 9:28 AM, Peter Geoghegan p...@heroku.com wrote: On Wed, Jul 23, 2014 at 8:52 PM, Amit Kapila amit.kapil...@gmail.com wrote: As such there is no problem in saying the way you have mentioned, but I feel it would be better if we can mention the mechanism of _bt_search() as quoted by you upthread in the first line. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. I guess I could say that too. Okay. One more point, why you think it is important to add this new text on top? I think adding new text after Lehman and Yao don't require read locks, .. paragraph is okay. I've added it to the top because it's really the most important point on Lehman and Yao. It's the _whole_ point. Consider how it's introduced here, for example: http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html Why should I bury the lead? I think even if you want to keep it at top, may be we could have another heading like : Concurrency Considerations with Lehman Yao Approach However, I think we can leave this point for Committer to decide. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Wed, Jul 23, 2014 at 11:19 AM, Peter Geoghegan p...@heroku.com wrote: On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila amit.kapil...@gmail.com wrote: Okay, but how does this justify to add below new text in README. + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). Actually I think putting it can lead to inconsistency in the README. Currently it indicates that our algorithm is different from LY w.r.t taking Read Locks and has given explanation for same. Not really. Firstly, that sentence acknowledges that there are read locks where LY assume there will not be. Even with these read locks references the first paragraph, where it is stated the Postgres B-Trees still acquire read locks while descending the tree. I think here you want to state that the difference in Postgres is as we are using L Y approach, it don't need to hold *multiple* read locks concurrently, and L Y approach which obviates this need is explained in second line (which indicates the importance of maintaining right-links and high-keys to detect and recover from page splits). As such there is no problem in saying the way you have mentioned, but I feel it would be better if we can mention the mechanism of _bt_search() as quoted by you upthread in the first line. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. One more point, why you think it is important to add this new text on top? I think adding new text after Lehman and Yao don't require read locks, .. paragraph is okay. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Wed, Jul 23, 2014 at 8:52 PM, Amit Kapila amit.kapil...@gmail.com wrote: As such there is no problem in saying the way you have mentioned, but I feel it would be better if we can mention the mechanism of _bt_search() as quoted by you upthread in the first line. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. I guess I could say that too. One more point, why you think it is important to add this new text on top? I think adding new text after Lehman and Yao don't require read locks, .. paragraph is okay. I've added it to the top because it's really the most important point on Lehman and Yao. It's the _whole_ point. Consider how it's introduced here, for example: http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html Why should I bury the lead? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila amit.kapil...@gmail.com wrote: There is a mention about the race condition where it needs to move right in another caller (_bt_search) of _bt_moveright() as well. /* * Race -- the page we just grabbed may have split since we read its * pointer in the parent (or metapage). If it has, we may need to * move right to its new sibling. Do that. .. Do you think there is more to what is already mentioned on top of second caller which we should add or you think if it is true for both, then it should be on top of _bt_moveright()? Well, maybe it is justified to mention it multiple times, so as to not break the reader's train of thought. I'm not sure. In general, I agree with you that we should mention about any advantage of the algorithm we are using and especially if it is significant. I think it will be better if can also mention how that advantage or use is realized in our implementation as we are already doing in README of nbtree. Right. It seems like the nbtree README is very shy about telling us what the point of all this extra work is. IMV that should be stated very prominently to aid understanding. A lot of people believe that we have to do lock coupling/crabbing when descending the tree. We do not. The locks acquired when descending a B-Tree in Postgres are pretty granular. One read buffer lock is held at a time in the process of servicing index scans. There are times during the descent of the tree when no buffer locks are held whatsoever. Moreover, (with some caveats) it doesn't really matter if a stale view of the page is seen during a descent (as it happens I've been trying to think of ways to further take advantage of this). That's pretty cool. If you only know one thing about how the nbtree code works, that should probably be it. The above indicates 2 things: a. L Y doesn't need to hold read locks concurrently. b. Advantage of right-links at all levels and high-key. As per my understanding, we are not following point (a) in our code, so what is the benefit we get by having a reference of same in README? The major reason that we don't completely avoid read locks, is, I suppose, the need for self-consistent pages (but also because it would break page deletion - I'm pretty sure that LY don't consider page deletion, and the page deletion logic is entirely based on the Lanin and Shasha paper and original research, but I didn't check). I think that the sentence Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared is intended to convey the point on the self-consistency of pages. Of course, Lehman and Yao must assume that the B-Tree is in some sense in shared memory. Otherwise, there wouldn't be much point to their elaborate locking protocol. :-) Isn't it better if we mention how the point (b) is used in our code and it's advantage together rather than keeping it at top of README? Maybe it deserves more prominent mention in the code too. Already README mentions in brief about right-link and how it is used as below: .. The scan must remember the page's right-link at the time it was scanned, since that is the page to move right to; if we move right to the current right-link then we'd re-scan any items moved by a page split. ... This is talking about how index scans interlock against VACUUM while going to the heap, by keeping a leaf page pinned (this prevents super exclusive lock acquisition). This is after the tree has been descended. This is really just a detail (albeit one that follows similar principles, since pages split right and it similarly exploits that fact), whereas the use of right links and high keys while descending the tree, and in particular the fact that the move right LY technique obviates the prior need for lock coupling is pretty much the whole point of LY. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
Peter Geoghegan p...@heroku.com writes: Right. It seems like the nbtree README is very shy about telling us what the point of all this extra work is. IIRC, the README was written on the assumption that you'd already read LY. If this patch is mostly about not assuming that, why not? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Wed, Jul 23, 2014 at 5:58 AM, Peter Geoghegan p...@heroku.com wrote: On Mon, Jul 21, 2014 at 9:55 PM, Amit Kapila amit.kapil...@gmail.com wrote: The above indicates 2 things: a. L Y doesn't need to hold read locks concurrently. b. Advantage of right-links at all levels and high-key. As per my understanding, we are not following point (a) in our code, so what is the benefit we get by having a reference of same in README? The major reason that we don't completely avoid read locks, is, I suppose, the need for self-consistent pages (but also because it would break page deletion - I'm pretty sure that LY don't consider page deletion, and the page deletion logic is entirely based on the Lanin and Shasha paper and original research, but I didn't check). I think that the sentence Lehman and Yao don't require read locks, but assume that in-memory copies of tree pages are unshared is intended to convey the point on the self-consistency of pages. Of course, Lehman and Yao must assume that the B-Tree is in some sense in shared memory. Otherwise, there wouldn't be much point to their elaborate locking protocol. :-) Okay, but how does this justify to add below new text in README. + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). Actually I think putting it can lead to inconsistency in the README. Currently it indicates that our algorithm is different from LY w.r.t taking Read Locks and has given explanation for same. Isn't it better if we mention how the point (b) is used in our code and it's advantage together rather than keeping it at top of README? Maybe it deserves more prominent mention in the code too. Already README mentions in brief about right-link and how it is used as below: .. The scan must remember the page's right-link at the time it was scanned, since that is the page to move right to; if we move right to the current right-link then we'd re-scan any items moved by a page split. ... This is talking about how index scans interlock against VACUUM while going to the heap, by keeping a leaf page pinned (this prevents super exclusive lock acquisition). This is after the tree has been descended. This is really just a detail (albeit one that follows similar principles, since pages split right and it similarly exploits that fact), whereas the use of right links and high keys while descending the tree, and in particular the fact that the move right LY technique obviates the prior need for lock coupling is pretty much the whole point of LY. In more concrete terms, _bt_search() releases and only then acquires read locks during a descent of the tree (by calling _bt_relandgetbuf()), and, perhaps counterintuitively, that's just fine. So don't you think that it needs bit more explanation than you have quoted in below text. + The addition of right-links at all levels, as well as the + addition of a page high key allows detection of, and dynamic + recovery from concurrent page splits (that is, splits between + unlocking an internal page, and subsequently locking its child page + during a descent). Basically I think it will be better if you can explain in bit more detail that how does right-links at all levels and high-key helps to detect and recover from concurrent page splits. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Tue, Jul 22, 2014 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC, the README was written on the assumption that you'd already read LY. If this patch is mostly about not assuming that, why not? LY made the same mistake that the authors of most influential papers make - they never get around to telling the reader why they should bother to read it. The paper is over 30 years old, and we now know that it's very influential, and the reasons why. I think that both the nbtree README and LY would be a lot more approachable with a high level introduction (arguably LY attempt this, but the way they go about it seems impenetrable, mostly consisting of esoteric references to other papers). Surely making that code more approachable is a worthy goal. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Tue, Jul 22, 2014 at 8:59 PM, Amit Kapila amit.kapil...@gmail.com wrote: Okay, but how does this justify to add below new text in README. + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). Actually I think putting it can lead to inconsistency in the README. Currently it indicates that our algorithm is different from LY w.r.t taking Read Locks and has given explanation for same. Not really. Firstly, that sentence acknowledges that there are read locks where LY assume there will not be. Even with these read locks references the first paragraph, where it is stated the Postgres B-Trees still acquire read locks while descending the tree. Secondly, I'm pretty sure that even Lehman and Yao realized that their apparent assumption that real implementations would not require read locks isn't realistic. Their handling of deletion seems perfunctory to me. They say In situations where excessive deletions cause the storage utilization of tree nodes to be unacceptably low, a batch reorganization or an underflow operation which locks the entire tree can be performed. I'm pretty sure that that sounded almost as bad in 1980 as it does now. We don't have a not quite LY implementation just because there are single read locks acquired while descending the tree. Prior schemes needed multiple *concurrent* exclusive locks. B-Trees were around for about 10 years before LY. There is reason to think that pretty much every practical implementation uses read locks for many years, because there is a well received 2001 paper [1] that describes a scheme where LY style B-link trees can *actually* be made to not require read locks, which discusses things like caching effects on contemporary hardware - it involves playing tricks with detecting and recovering from page level inconsistencies, IIRC. Furthermore, it references a scheme from the late 90s involving local copies of B-Link pages. I thought about pursuing something like that myself, but the cost of latching (buffer locking) B-Trees in PostgreSQL is likely to be reduced before too long anyway, which makes the general idea seem unenticing right now. Basically I think it will be better if you can explain in bit more detail that how does right-links at all levels and high-key helps to detect and recover from concurrent page splits. You might be right about that - perhaps I should go into more detail. [1] http://www.vldb.org/conf/2001/P181.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Tue, May 27, 2014 at 1:52 AM, Peter Geoghegan p...@heroku.com wrote: While talking to various people during pgCon, I was reminded that the nbtree README does a poor job of explaining the actual practical advantages of LY from a high level. The move right trick is initially mentioned only as an adjunct to a discussion of the special-case handling of root page splits, even though it's of huge significance. We also say this within nbtinsert.c: /* * If the page was split between the time that we surrendered our read * lock and acquired our write lock, then this page may no longer be the * right place for the key we want to insert. In this case, we need to * move right in the tree. See Lehman and Yao for an excruciatingly * precise description. */ (Why we need to say something about _bt_moveright() within _bt_doinsert() that is equally true of both _bt_moveright() callers isn't clear). There is a mention about the race condition where it needs to move right in another caller (_bt_search) of _bt_moveright() as well. /* * Race -- the page we just grabbed may have split since we read its * pointer in the parent (or metapage). If it has, we may need to * move right to its new sibling. Do that. .. Do you think there is more to what is already mentioned on top of second caller which we should add or you think if it is true for both, then it should be on top of _bt_moveright()? I think that this isn't enough. Attached patch proposes to add a small paragraph at the top of the nbtree README, to clarify the advantages of LY from a high level. I don't think it's appropriate to state the advantages of an algorithm in a README file generally, but in this instance it makes it easier to understand the algorithm. In general, I agree with you that we should mention about any advantage of the algorithm we are using and especially if it is significant. I think it will be better if can also mention how that advantage or use is realized in our implementation as we are already doing in README of nbtree. + + Even with these read locks, Lehman and Yao's approach obviates the + need of earlier schemes to hold multiple read locks concurrently when + descending the tree as part of servicing index scans (pessimistic lock + coupling). The addition of right-links at all levels, as well as the + addition of a page high key allows detection of, and dynamic + recovery from concurrent page splits (that is, splits between + unlocking an internal page, and subsequently locking its child page + during a descent). LY Trees are sometimes referred to as B-Link + trees in the literature. + The above indicates 2 things: a. L Y doesn't need to hold read locks concurrently. b. Advantage of right-links at all levels and high-key. As per my understanding, we are not following point (a) in our code, so what is the benefit we get by having a reference of same in README? Isn't it better if we mention how the point (b) is used in our code and it's advantage together rather than keeping it at top of README? Already README mentions in brief about right-link and how it is used as below: .. The scan must remember the page's right-link at the time it was scanned, since that is the page to move right to; if we move right to the current right-link then we'd re-scan any items moved by a page split. ... With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Mon, May 26, 2014 at 1:22 PM, Peter Geoghegan p...@heroku.com wrote: I think that this isn't enough. Attached patch proposes to add a small paragraph at the top of the nbtree README, to clarify the advantages of LY from a high level. I've added this to the ongoing commitfest. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
At 2014-07-02 17:24:06 -0700, p...@heroku.com wrote: I've added this to the ongoing commitfest. I don't actually understand why you were able to do that (seeing as this CF is no longer open for new patches). Trivial or not, I think at this point it should go into the next one. Or it should be handled outside the CF altogether. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Thu, Jul 3, 2014 at 11:40 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-07-02 17:24:06 -0700, p...@heroku.com wrote: I've added this to the ongoing commitfest. I don't actually understand why you were able to do that (seeing as this CF is no longer open for new patches). Trivial or not, I think at this point it should go into the next one. Or it should be handled outside the CF altogether. +1. Even if it is just a doc patch, it has been submitted after 6/15, which was the CF1 deadline. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
At 2014-07-03 11:59:21 +0900, michael.paqu...@gmail.com wrote: +1. Even if it is just a doc patch, it has been submitted after 6/15, which was the CF1 deadline. Well, to be fair, the original patch was posted to the list more than a month ago, and it should have been in this CF. But… it wasn't. And now after more than two weeks into this CF, I don't think it should be. I moved it to 2014-08. (Sorry, Peter.) -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stating the significance of Lehman Yao in the nbtree README
On Wed, Jul 2, 2014 at 8:14 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: Well, to be fair, the original patch was posted to the list more than a month ago, and it should have been in this CF. But… it wasn't. And now after more than two weeks into this CF, I don't think it should be. Is that how the rule is interpreted? Okay, I defer to you. I guess I've just never seen a situation where that distinction needed to be drawn come up before. I moved it to 2014-08. (Sorry, Peter.) I don't mind if no one looks at this until then. I agree that this is exactly the kind of thing that generally doesn't need to be handled through a commitfest submission. The only reason I added this to any commitfest was to avoid having it be forgotten about entirely, which there is a real danger of for something like this when it isn't handled quickly. I almost forgot about it myself. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers