On Thu, Nov 16, 2017 at 12:29 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Thu, Nov 16, 2017 at 7:08 AM, Masahiko Sawada <sawada.m...@gmail.com>
> wrote:
> > Agreed, that's better. Attached updated patch.
> > Also I've added this to the next CF so as not to forget.
>
> Committed and back-patched.  While I'm fairly sure this is a correct
> fix, post-commit review from someone who knows GIN better than I do
> would be a great idea.
>
> I am also clear on what the consequences of this bug are.  It seems
> like it could harm insert performance by making us wait when we
> shouldn't, but can it cause corruption?  That I'm not sure about.  If
> there's already a cleanup of the pending list in progress, how do
> things go wrong?
>
>

It can cause corruption.  Attached is a test case to show it.  It is based
on my usual crash-recovery test, but to get corruption I had to turn off
the crash-injection (which means the test can be run on an unpatched
server) and increase gin_pending_list_limit.

On 8 CPU, it took anywhere from 15 minutes to 8 hours to see corruption,
which presents as something like this in the script output:

child abnormal exit update did not update 1 row: key 8117 updated 2 at
count.pl line 193.\n  at count.pl line 201.

Your commit has apparently fixed it, but I will run some more extensive
tests.

The source of the corruption is this:

A tuple is inserted into the pending list.  It is quickly updated again, or
deleted, and then passes the dead-to-all horizon, so is eligible to vacuum.

A user back end takes the lock to start cleaning the pending list.

A vacuum back end attempts the lock, but due to the bug, it does so
conditionally and then decides it is done when that fails, and starts the
index vacuum proper.

The user back end reaches the dead tuple in the pending list, and inserts
into the main part of the index, in a part that the vacuum has already
passed over.

The vacuum goes back to the table and marks the tid slot as re-usable, even
though there is still an index tuple pointing to it.

Cheers,

Jeff

Attachment: count.pl
Description: Binary data

Attachment: do.sh
Description: Bourne shell script

Reply via email to