On Fri, Feb 10, 2017 at 11:22 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Why can't we rely on _bt_walk_left?
>
> The reason is mentioned in comments, but let me try to explain with
> some example. When you reach that point of code, it means that either
> the current page (assume page number is 10) doesn't contain any
> matching items or it is a half-dead page, both of which indicates that
> we have to move to the previous page. Now, before checking if the
> current page contains matching items, we signal parallel machinery
> (via _bt_parallel_release) to allow workers to read the previous page
> (assume previous page number is 9). So it is quite possible that
> after deciding that current page (page number 10) doesn't contain any
> matching tuples if we directly move to the previous page (in this case
> it will be 9) by using _bt_walk_left, some other worker would have
> read page 9. In short, if we directly use _bt_walk_left(), then we
> are prone to returning some of the values twice as multiple workers
> can read the same page.

But ... the entire point of the seize-and-release stuff is to avoid this
problem. You're supposed to seize the scan, read the current page, walk
left, store the page you find in the scan, and then release the scan.
The entire point of that stuff is that when somebody's advancing the
scan to the next page, everybody else waits for them to get done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
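The seize/read/walk-left/store/release ordering described above can be modeled with a short Python sketch. It is an illustration under simplifying assumptions, not PostgreSQL's nbtree code: pages are plain integers whose left sibling is the next-lower number, and `SharedScan`, `seize`, `release`, `left_link`, and `run_backward_scan` are hypothetical names (they are not `_bt_parallel_release` or any other real function). What it shows is that because the worker holding the scan stores the next page *before* releasing, no other worker can walk left from the same page, so every page is read exactly once.

```python
import threading

P_NONE = -1  # hypothetical "no more pages" marker


class SharedScan:
    """Hypothetical stand-in for a shared parallel-scan descriptor.

    Only one worker may hold ("seize") the scan at a time. The holder reads
    its page, walks left, stores the page it found, and then releases.
    """

    def __init__(self, start_page):
        self._cond = threading.Condition()
        self._seized = False
        self._next_page = start_page

    def seize(self):
        """Wait until nobody is advancing the scan; return the page to read."""
        with self._cond:
            while self._seized:
                self._cond.wait()
            page = self._next_page
            if page != P_NONE:
                self._seized = True  # we are now the one advancing the scan
            return page

    def release(self, next_page):
        """Store the page found by walking left, then wake waiting workers."""
        with self._cond:
            self._next_page = next_page
            self._seized = False
            self._cond.notify_all()


def left_link(page):
    # Hypothetical layout: page i's left sibling is i - 1; page 0 has none.
    return page - 1 if page > 0 else P_NONE


def run_backward_scan(npages, nworkers):
    """Scan pages right to left with several workers; count reads per page."""
    scan = SharedScan(start_page=npages - 1 if npages > 0 else P_NONE)
    reads = [0] * npages  # only the worker holding the scan touches an entry

    def worker():
        while True:
            page = scan.seize()
            if page == P_NONE:
                return
            reads[page] += 1               # read the current page while holding the scan
            scan.release(left_link(page))  # store the left sibling, then release
            # Per-tuple work on the page (not modeled) could run here,
            # concurrently with the next worker advancing the scan.

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reads


if __name__ == "__main__":
    print(run_backward_scan(npages=20, nworkers=4))  # every page read exactly once
```

The duplicate-read problem from the quoted message would correspond to a worker calling `release` before it has determined the left sibling, leaving a window in which two workers can both start walking left from the same page.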