On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote:
> Splitting a B-tree page is a two-stage process: First, the page is split,
> and then a downlink for the new right page is inserted into the parent
> (which might recurse to split the parent page, too). What happens if
> inserting the downlink fails for some reason? I tried that out, and it turns
> out that it's not nice.
>
> I used this to cause a failure:
>
> >--- a/src/backend/access/nbtree/nbtinsert.c
> >+++ b/src/backend/access/nbtree/nbtinsert.c
> >@@ -1669,6 +1669,8 @@ _bt_insert_parent(Relation rel,
> >             _bt_relbuf(rel, pbuf);
> >         }
> >
> >+        elog(ERROR, "fail!");
> >+
> >         /* get high key from left page == lowest key on new right page */
> >         ritem = (IndexTuple) PageGetItem(page,
> >                                          PageGetItemId(page, P_HIKEY));
>
> postgres=# create table foo (i int4 primary key);
> CREATE TABLE
> postgres=# insert into foo select generate_series(1, 10000);
> ERROR:  fail!
>
> That's not surprising. But when I removed that elog again and restarted the
> server, I still can't insert. The index is permanently broken:
>
> postgres=# insert into foo select generate_series(1, 10000);
> ERROR:  failed to re-find parent key in index "foo_pkey" for split pages 4/5
>
> In real life, you would get a failure like this e.g if you run out of memory
> or disk space while inserting the downlink to the parent. Although rare in
> practice, it's no fun if it happens.

Why doesn't the incomplete split mechanism prevent this? Because we do not
delay checkpoints on the primary and a checkpoint happened just before your
elog(ERROR) above?

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services