Let me re-summarize what's been done here to make sure I understand:
Each key in GIN has its own posting tree, which is itself a btree,
holding all of the tuples that contain that key. This posting tree is
really just a set of tuples, and searches always scan an entire
posting tree (because all the tuples contain the key, so all are a
Currently, the cleanup process requires blocking all reads and writes
in the posting list. I assume this is only a problem when a key is
common enough that its posting list is quite large (how large?).
This patch makes a couple changes to improve concurrency:
1. Vacuum leaves first, without removing any pages. Instead, just keep
track of pages which are completely empty (more generally, subtrees
that contain empty pages).
2. Free the empty pages in those subtrees. That requires blocking
reads and writes to that subtree, but not the entire posting list.
Blocking writes is easy: acquiring a cleanup lock on the root of any
subtree always blocks any writes to that subtree, because the write
keeps a pin on all pages in that path. Blocking reads is accomplished
by locking the page to be deleted, then locking/unlocking the page one
left of that (which may be outside the subtree?).
Maybe a basic question, but why is the posting list a btree at all,
and not, say a doubly-linked list?
Review of the code itself:
* Nice and simple, with a net line count of only +45.
* Remove commented-out code.
* In the README, "leaf-to-right" should be left-to-right; and "takinf"
should be "taking".
* When you say the leftmost page is never deleted; do you mean the
leftmost of the entire posting tree, or leftmost of the subtree that
you are removing pages from?
* In order to keep out concurrent reads, you need to lock/unlock the
left page while holding exclusive lock on the page being deleted, but
I didn't see how that happens exactly in the code. Where does that
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: