Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Tom Lane
Heikki Linnakangas writes: > Yeah, it's a bit silly that each resource manager has to do that on > their own. It would be useful to have a memory context that was > automatically reset between each WAL record. In fact that should > probably be the default memory context you run the WAL redo rou

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Heikki Linnakangas
On 02/06/2014 01:54 AM, Peter Geoghegan wrote: On Thu, Jan 23, 2014 at 1:36 PM, Peter Geoghegan wrote: So while post-recovery callbacks no longer exist for any rmgr-managed-resource, 100% of remaining startup and cleanup callbacks concern the simple management of memory of AM-specific recovery

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-18 Thread Heikki Linnakangas
On 02/06/2014 06:42 AM, Peter Geoghegan wrote: I'm not sure about this: *** _bt_findinsertloc(Relation rel, *** 675,680 --- 701,707 static void _bt_insertonpg(Relation rel, Buffer buf, + Buffer cbuf,

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-14 Thread Heikki Linnakangas
On 03/14/2014 01:03 PM, Peter Geoghegan wrote: Ping? I committed the other patch this depends on now. I'll take another stab at this one next. - Heikki

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-03-14 Thread Peter Geoghegan
Ping? -- Peter Geoghegan

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
Some more thoughts: Please add comments above _bt_mark_page_halfdead(), a new routine from the dependency patch. I realize that this is substantially similar to part of how _bt_pagedel() used to work, but it's still incongruous. > ! Our approach is to create any missing downlinks on-they-fly, whe

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Tue, Feb 4, 2014 at 11:56 PM, Heikki Linnakangas wrote: > I also changed _bt_moveright to never return a write-locked buffer, when the > caller asked for a read-lock (an issue you pointed out earlier in this > thread). I think that _bt_moveright() looks good now. There is now bitrot, caused b

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Thu, Jan 23, 2014 at 1:36 PM, Peter Geoghegan wrote: > So while post-recovery callbacks no longer exist for any > rmgr-managed-resource, 100% of remaining startup and cleanup callbacks > concern the simple management of memory of AM-specific recovery > contexts (for GiST, GiN and SP-GiST). I ha

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-05 Thread Peter Geoghegan
On Tue, Feb 4, 2014 at 11:56 PM, Heikki Linnakangas wrote: >> Since, as I mentioned, _bt_finish_split() ultimately unlocks *and >> unpins*, it may not be the same buffer as before, so even with the >> refactoring there are race conditions. > > Care to elaborate? Or are you just referring to the mi

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-04 Thread Heikki Linnakangas
On 02/04/2014 02:40 AM, Peter Geoghegan wrote: On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas wrote: I refactored the loop in _bt_moveright to, well, not have that bug anymore. The 'page' and 'opaque' pointers are now fetched at the beginning of the loop. Did I miss something? I think so

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-02-03 Thread Peter Geoghegan
On Fri, Jan 31, 2014 at 9:09 AM, Heikki Linnakangas wrote: > I refactored the loop in _bt_moveright to, well, not have that bug anymore. > The 'page' and 'opaque' pointers are now fetched at the beginning of the > loop. Did I miss something? I think so, yes. You still aren't assigning the value r

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-31 Thread Heikki Linnakangas
On 01/30/2014 12:46 AM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan wrote: On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas wrote: I think I see some bugs in _bt_moveright(). If you examine _bt_finish_split() in detail, you'll see that it doesn't just drop the

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-29 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan wrote: > On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas > wrote: >>> I think I see some bugs in _bt_moveright(). If you examine >>> _bt_finish_split() in detail, you'll see that it doesn't just drop the >>> write buffer lock that the caller

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:58 AM, Heikki Linnakangas wrote: > Okay, promise not to laugh. I did write a bunch of hacks, to generate > graphviz .dot files from the btree pages, and render them into pictures. It > consist of multiple parts, all in the attached tarball. It's funny that you should sa

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Peter Geoghegan
On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas wrote: >> I think I see some bugs in _bt_moveright(). If you examine >> _bt_finish_split() in detail, you'll see that it doesn't just drop the >> write buffer lock that the caller will have provided (per its >> comments) - it also drops the buff

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-27 Thread Heikki Linnakangas
On 01/23/2014 11:36 PM, Peter Geoghegan wrote: The first thing I noticed about this patchset is that it completely expunges btree_xlog_startup(), btree_xlog_cleanup() and btree_safe_restartpoint(). The post-recovery cleanup that previously occurred to address both sets of problems (the problem ad

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2014-01-23 Thread Peter Geoghegan
On Thu, Nov 14, 2013 at 9:23 AM, Heikki Linnakangas wrote: > Ok, here's a new version of the patch to handle incomplete B-tree splits. I finally got around to taking a look at this. Unlike with the as-yet uncommitted "Race condition in b-tree page deletion" patch that Kevin looked at, which there

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-25 Thread Heikki Linnakangas
On 22.10.2013 19:55, Heikki Linnakangas wrote: I fixed the same problem in GiST a few years back, by making it tolerate missing downlinks, and inserting them lazily. The B-tree code tolerates them already on scans, but gets confused on insertion, as seen above. I propose that we use the same

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 16:38:05 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2013-10-22 15:24:40 -0400, Tom Lane wrote: > >> No, that's hardly a good idea. As Heikki says, that would amount to > >> converting an entirely foreseeable situation into a PANIC. > > > But IIUC this can currently lead

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Tom Lane
Andres Freund writes: > On 2013-10-22 15:24:40 -0400, Tom Lane wrote: >> No, that's hardly a good idea. As Heikki says, that would amount to >> converting an entirely foreseeable situation into a PANIC. > But IIUC this can currently lead to an index giving wrong answers, not > "just" fail at fur

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 15:24:40 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: > >> We could put a critical section around the whole recursion that inserts the > >> downlinks, so that you would get a PANIC and the incomplete split mechanism > >> w

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 22:24, Tom Lane wrote: I wonder whether Heikki's approach could be used to remove the need for the incomplete-split-fixup code altogether, thus eliminating a class of recovery failure possibilities. Yes. I intend to do that, too. - Heikki

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Tom Lane
Andres Freund writes: > On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: >> We could put a critical section around the whole recursion that inserts the >> downlinks, so that you would get a PANIC and the incomplete split mechanism >> would fix it at recovery. But that would hardly be an imp

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 21:29:13 +0300, Heikki Linnakangas wrote: > On 22.10.2013 21:25, Andres Freund wrote: > >On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: > >>Splitting a B-tree page is a two-stage process: First, the page is split, > >>and then a downlink for the new right page is inserted int

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 21:25, Andres Freund wrote: On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: Splitting a B-tree page is a two-stage process: First, the page is split, and then a downlink for the new right page is inserted into the parent (which might recurse to split the parent page, too).

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Andres Freund
On 2013-10-22 19:55:09 +0300, Heikki Linnakangas wrote: > Splitting a B-tree page is a two-stage process: First, the page is split, > and then a downlink for the new right page is inserted into the parent > (which might recurse to split the parent page, too). What happens if > inserting the downlin

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Peter Geoghegan
On Tue, Oct 22, 2013 at 10:30 AM, Heikki Linnakangas wrote: > I may be missing something, but there are already plenty of b-tree specific > flags. See BTP_* in nbtree.h. I'll just add another to that list. Based on your remarks, I thought that you were intent on directly using page level bits (pd

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
On 22.10.2013 20:27, Peter Geoghegan wrote: On Tue, Oct 22, 2013 at 9:55 AM, Heikki Linnakangas wrote: I propose that we use the same approach I used with GiST, and add a flag to the page header to indicate "the downlink hasn't been inserted yet". When insertion (or vacuum) bumps into a flagge

Re: [HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Peter Geoghegan
On Tue, Oct 22, 2013 at 9:55 AM, Heikki Linnakangas wrote: > I propose that we use the same approach I used with GiST, and add a flag to > the page header to indicate "the downlink hasn't been inserted yet". When > insertion (or vacuum) bumps into a flagged page, it can finish the > incomplete act

[HACKERS] Failure while inserting parent tuple to B-tree is not fun

2013-10-22 Thread Heikki Linnakangas
Splitting a B-tree page is a two-stage process: First, the page is split, and then a downlink for the new right page is inserted into the parent (which might recurse to split the parent page, too). What happens if inserting the downlink fails for some reason? I tried that out, and it turns out
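For readers following the thread, here is a minimal toy sketch in C of the idea discussed above: a split is done in two stages, and a flag on the left page records that the downlink for its new right sibling has not been inserted yet, so that a later inserter can finish the job. This is not PostgreSQL code. `ToyPage`, `TOY_INCOMPLETE_SPLIT`, `toy_split()` and `toy_finish_split()` are hypothetical names invented for illustration, and the model ignores locking, WAL, and recursive parent splits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not PostgreSQL's. */
#define TOY_INCOMPLETE_SPLIT 0x01u  /* right sibling has no downlink yet */
#define TOY_MAX_DOWNLINKS 8

typedef struct ToyPage
{
    unsigned        flags;
    struct ToyPage *right;                        /* right-sibling link */
    struct ToyPage *downlinks[TOY_MAX_DOWNLINKS]; /* used on parent pages */
    int             ndownlinks;
} ToyPage;

/*
 * Stage 1: link the new right page in as a sibling and flag the left
 * page. If stage 2 never runs, the flag records what is still missing.
 */
void
toy_split(ToyPage *left, ToyPage *new_right)
{
    new_right->right = left->right;
    left->right = new_right;
    left->flags |= TOY_INCOMPLETE_SPLIT;
}

/*
 * Stage 2: insert the missing downlink into the parent and clear the
 * flag. Any inserter that bumps into a flagged page can call this first.
 * Returns false if the toy parent is full (real code would have to
 * split the parent instead).
 */
bool
toy_finish_split(ToyPage *parent, ToyPage *left)
{
    if (!(left->flags & TOY_INCOMPLETE_SPLIT))
        return true;            /* nothing to finish */

    assert(left->right != NULL);
    if (parent->ndownlinks >= TOY_MAX_DOWNLINKS)
        return false;

    parent->downlinks[parent->ndownlinks++] = left->right;
    left->flags &= ~TOY_INCOMPLETE_SPLIT;
    return true;
}
```

The point of the sketch is only that the flag makes the intermediate state visible on the page itself, so finishing the split does not depend on the backend that started it.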