Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-07-04 Thread Andres Freund
Hi,

On 2018-06-27 12:16:29 -0700, Andres Freund wrote:
> I think we should backpatch those checks - it's a fairly nasty situation
> for users to not be able to even drop an index without going to single
> user mode.

Did that back to 9.4 - before that page deletion and splits worked
differently enough that it seems higher risk to add the checks there.

Greetings,

Andres Freund



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Peter Geoghegan
On Wed, Jun 27, 2018 at 1:11 PM, Andres Freund  wrote:
> Well, I don't really want to generally do better. Just be able to check
> for interrupts ;)

That's what I meant. :-)


-- 
Peter Geoghegan



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Andres Freund
On 2018-06-27 13:02:25 -0700, Peter Geoghegan wrote:
> On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund  wrote:
> > A related question is if it matters - without complicating the code I
> > don't see how we could release all the locks in that loop. Therefore no
> > interrupts can be accepted.  I hope I'm missing something?
> 
> I agree.
> 
> In general, page deletion is the most complicated part of nbtree
> concurrency, by far (if we just had the basic L, the concurrency
> aspects would be far easier to grasp). Doing better in
> _bt_unlink_halfdead_page() seems extremely difficult, and very
> unlikely to be worthwhile.

Well, I don't really want to generally do better. Just be able to check
for interrupts ;)

Greetings,

Andres Freund



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Peter Geoghegan
On Wed, Jun 27, 2018 at 12:52 PM, Andres Freund  wrote:
> A related question is if it matters - without complicating the code I
> don't see how we could release all the locks in that loop. Therefore no
> interrupts can be accepted.  I hope I'm missing something?

I agree.

In general, page deletion is the most complicated part of nbtree
concurrency, by far (if we just had the basic L, the concurrency
aspects would be far easier to grasp). Doing better in
_bt_unlink_halfdead_page() seems extremely difficult, and very
unlikely to be worthwhile.

-- 
Peter Geoghegan



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Andres Freund
Hi,

On 2018-06-27 12:49:56 -0700, Peter Geoghegan wrote:
> On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund  wrote:
> > It's possibly wrong that it's unreachable - I've just not managed to get
> > there. If somebody has an idea how to build a reproducible case to reach
> > it...
> 
> Set a breakpoint in _bt_unlink_halfdead_page() after the initial
> "LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page
> split in the left sibling by carefully inserting values that belong
> within its part of the key space? I would probably attempt this with
> an index on text, so that I could contrive as many key values that
> belong on the left sibling as needed.

A related question is if it matters - without complicating the code I
don't see how we could release all the locks in that loop. Therefore no
interrupts can be accepted.  I hope I'm missing something?

Greetings,

Andres Freund



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Peter Geoghegan
On Wed, Jun 27, 2018 at 12:18 PM, Andres Freund  wrote:
> It's possibly wrong that it's unreachable - I've just not managed to get
> there. If somebody has an idea how to build a reproducible case to reach
> it...

Set a breakpoint in _bt_unlink_halfdead_page() after the initial
"LockBuffer(leafbuf, BUFFER_LOCK_UNLOCK);", and then provoke a page
split in the left sibling by carefully inserting values that belong
within its part of the key space? I would probably attempt this with
an index on text, so that I could contrive as many key values that
belong on the left sibling as needed.

You're probably looking at a test-case that doesn't involve a
multi-level deletion, where the leaf and target page are the same. Not
sure if you want to reproduce the multi-level deletion case
independently.


-- 
Peter Geoghegan



Re: Allow cancelling VACUUM of nbtrees with corrupted right links

2018-06-27 Thread Andres Freund
Hi,

On 2018-06-27 12:16:29 -0700, Andres Freund wrote:
> I couldn't see how to fix the _bt_unlink_halfdead_page() right-sib loop,
> because we always hold a lock. But given that that loop appears to
> mostly be dead code, that doesn't seem too bad?

It's possibly wrong that it's unreachable - I've just not managed to get
there. If somebody has an idea how to build a reproducible case to reach
it...

- Andres