On Wed, Aug 27, 2014 at 7:08 PM, Tatsuo Ishii <is...@postgresql.org> wrote:
> While looking into a btree internal page using pg_filedump against an
> int4 index generated pgbench, I noticed that only item 2 has length 8,
> which indicates that the index tuple has only tuple header and has no
> index data. In my understanding this indicates that the item is used
> to represent a down link to a page. Question is, why the item is 2,
> not 1. I thought an index tuple indicating down link is always 1. Is
> this a sign that something goes wrong?

No. On a non-rightmost page, the "high key" item is physically first
(which is a bit odd, because it serves as an upper-bound invariant on
the items that the page stores, but it's convenient to do it that way
for other reasons). On an internal page that is also non-rightmost, the
second item is therefore the first "real" item, i.e. the item whose
offset P_FIRSTDATAKEY() returns. It carries a valid downlink, but its
key is just placeholder garbage, which is why pg_filedump shows only the
8-byte tuple header for item 2. The reason for that is noted above
_bt_compare():

 * CRUCIAL NOTE: on a non-leaf page, the first data key is assumed to be
 * "minus infinity": this routine will always claim it is less than the
 * scankey.  The actual key value stored (if any, which there probably isn't)
 * does not matter.  This convention allows us to implement the Lehman and
 * Yao convention that the first down-link pointer is before the first key.
 * See backend/access/nbtree/README for details.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
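
For readers who want to see the convention from the reply in code, here
is a minimal standalone sketch. It is not the nbtree source: SketchPage,
sketch_first_data_key() and sketch_compare() are hypothetical names made
up for illustration, keys are plain ints, and only the offsets P_HIKEY
and P_FIRSTKEY mirror the real macros. It shows why item 2 is the first
data item on a non-rightmost page, and why its stored key is never
examined on an internal page.

```c
#include <assert.h>
#include <stdbool.h>

/* Item offsets on a page are 1-based, as in PostgreSQL. */
typedef unsigned short OffsetNumber;

#define P_HIKEY    ((OffsetNumber) 1)  /* high key slot on non-rightmost pages */
#define P_FIRSTKEY ((OffsetNumber) 2)  /* first data item when a high key exists */

/* Hypothetical, simplified page descriptor (assumption, not a real struct). */
typedef struct SketchPage
{
    bool is_rightmost;  /* rightmost page on its level: stores no high key */
    bool is_leaf;       /* leaf page: every data item has a real key */
} SketchPage;

/*
 * Offset of the first "real" item. A rightmost page has no high key, so
 * its data starts at offset 1; every other page starts its data at 2.
 */
static OffsetNumber
sketch_first_data_key(const SketchPage *page)
{
    return page->is_rightmost ? P_HIKEY : P_FIRSTKEY;
}

/*
 * Compare scankey with the item at offnum. Returns <0, 0 or >0 when the
 * scankey is less than, equal to, or greater than the item. keys[] is
 * indexed by offset number; keys[0] is unused.
 */
static int
sketch_compare(const SketchPage *page, const int *keys,
               OffsetNumber offnum, int scankey)
{
    int itemkey;

    assert(offnum >= P_HIKEY);

    /*
     * On an internal page the first data key is treated as "minus
     * infinity": the scankey is always greater, and the stored key (if
     * any) is never read.
     */
    if (!page->is_leaf && offnum == sketch_first_data_key(page))
        return 1;

    itemkey = keys[offnum];
    return (scankey > itemkey) - (scankey < itemkey);
}
```

With an internal, non-rightmost page, sketch_first_data_key() yields 2,
and comparing any scankey against offset 2 reports "greater" no matter
what is stored there. That matches a dump where item 2 holds nothing but
the tuple header.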