Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Heikki Linnakangas
Tom Lane wrote: I wrote: I've been analyzing Ed L's recent report of index corruption: http://archives.postgresql.org/pgsql-general/2006-10/msg01183.php Auch. That's nasty indeed. So I think the rule needs to be don't delete the rightmost child unless it's the only child, in which case you

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: I don't understand how this in the meantime thing works. I tried to work out a step-by-step example, could you take a look at it? See http://users.tkk.fi/~hlinnaka/pgsql/btree-deletion-bug/ [ looks at that for a bit... ] Yeah, you're right. Once

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Heikki Linnakangas
Tom Lane wrote: In theory, given a slow-enough-moving VACUUM process, this could happen even without a crash. So I think that means we have to go over to the other plan of locking everything all the way up to the top of the deletion before we start doing it --- and also, we'll need crash

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Tom Lane
I wrote: [ looks at that for a bit... ] Yeah, you're right. Once the deletion is completed, the F lower-bound key will disappear from the grandparent, which would restore consistency --- but we could have already delivered wrong search answers, so that won't do. On further reflection, I

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: What does the original research paper by Lanin Shasha say about this? Nothing very useful. The connection of our code to that paper is actually a bit tenuous: their approach to deletion is to make the target page's key space move left not right.

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Heikki Linnakangas
Tom Lane wrote: On further reflection, I think I understand why we've not realized the existence of this bug before: in fact, it *doesn't* lead to wrong search answers. I think the only visible consequence is exactly the failed to re-find parent key VACUUM error that Ed saw. The reason is that

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes: But now that I look at the original post by Ed, I don't see how the failed to re-find parent key error could result from the issue we've been talking about. The error message is printed when _bt_getstackbuf is unable to re-find an item in the

Re: [HACKERS] Nasty btree deletion bug

2006-10-26 Thread Tom Lane
I wrote: Right, but _bt_getstackbuf is working from a search stack created by a standard search for the victim page's high key. If that search descended through a page to the right of the victim page's actual parent, _bt_getstackbuf isn't able to recover. What I'm tempted to do, at least in

[HACKERS] Nasty btree deletion bug

2006-10-25 Thread Tom Lane
I've been analyzing Ed L's recent report of index corruption: http://archives.postgresql.org/pgsql-general/2006-10/msg01183.php (thanks to Ed for letting me have the actual index to study). I think I've figured out what happened to it. nbtree/README says : The notion of a half-dead page means

Re: [HACKERS] Nasty btree deletion bug

2006-10-25 Thread Tom Lane
I wrote: I've been analyzing Ed L's recent report of index corruption: http://archives.postgresql.org/pgsql-general/2006-10/msg01183.php After some thought, it still seems to me to be impractical to delete the rightmost child of a btree page while other children remain. Doing this would