On Fri, 2007-06-01 at 16:58 +0400, Teodor Sigaev wrote:
> >> <2007-06-01 13:11:29.365 CEST:%> DEBUG: 00000: Ressource manager (13)
> >> has partial state information
> > To me, this points clearly to there being an improperly completed action
> > in resource manager 13. (GIN) In summary, it appears that there may be
> > an issue with the GIN code for WAL recovery and this is affecting the
> > Warm Standby.
>
> Hmm. I found that gin_xlog_cleanup doesn't reset the incomplete_splits list.
> Could that be the cause of the bug?
Hi Teodor,
Hmm, well, the list should be empty by that point anyway. That code is
only executed at the end of xlog replay, not half-way through as we are
seeing.
There are two possibilities:
1. There are some incomplete splits, pointing to a likely bug in GIN
2. There are so many index splits that we are never able to make a
successful restartpoint using the current mechanism. That would not be a
bug in GIN, but it would be a problem with how restartpoints interact
with GIN (and possibly other index types).
When we wrote this I thought (2) would be a problem, but it's not shown
up to be so for btrees (yet, I guess). I have some ideas if it's (2).
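To make the two cases concrete, here is a minimal sketch of the bookkeeping
involved. It is a standalone toy model, not the actual GIN code: the names
push_split, forget_split, safe_restartpoint and oldest_pending_lsn are made
up for illustration, and a plain linked list stands in for the backend List.
The assumption is that a split is remembered when its WAL record is replayed
and forgotten when the matching parent insert is replayed, and that a
restartpoint is only allowed while nothing is pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a pending split; NOT the real ginIncompleteSplit. */
typedef struct PendingSplit
{
    uint32_t    left_blk;
    uint32_t    right_blk;
    uint64_t    lsn;            /* WAL position where the split was replayed */
    struct PendingSplit *next;
} PendingSplit;

static PendingSplit *pending = NULL;

/* Replaying a split record: remember it until the parent insert arrives. */
void
push_split(uint32_t left, uint32_t right, uint64_t lsn)
{
    PendingSplit *s = malloc(sizeof *s);

    assert(s != NULL);
    s->left_blk = left;
    s->right_blk = right;
    s->lsn = lsn;
    s->next = pending;
    pending = s;
}

/* Replaying the parent insert: the matching split is now complete. */
bool
forget_split(uint32_t left, uint32_t right)
{
    for (PendingSplit **p = &pending; *p != NULL; p = &(*p)->next)
    {
        if ((*p)->left_blk == left && (*p)->right_blk == right)
        {
            PendingSplit *done = *p;

            *p = done->next;
            free(done);
            return true;
        }
    }
    return false;               /* no such split was pending */
}

/* A restartpoint is only safe when no split is half-finished. */
bool
safe_restartpoint(void)
{
    return pending == NULL;
}

/* Oldest LSN among the pending splits, or 0 if there are none. */
uint64_t
oldest_pending_lsn(void)
{
    uint64_t    oldest = 0;

    for (PendingSplit *s = pending; s != NULL; s = s->next)
    {
        if (oldest == 0 || s->lsn < oldest)
            oldest = s->lsn;
    }
    return oldest;
}
```

In this model, case (1) would look like an entry whose LSN never changes
from one check to the next, while case (2) would look like a list that is
never empty but whose oldest LSN keeps advancing. That reading is an
assumption about how the two cases would show up, not something tested.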
The attached patch should show which of these it is. I'll dress it up a
little better so we have a debug option on this. Please note I've not
tested this patch myself, so Frank, if you don't mind me splatting
something at you, we'll see what we see.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Index: src/backend/access/gin/ginxlog.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/access/gin/ginxlog.c,v
retrieving revision 1.6
diff -c -r1.6 ginxlog.c
*** src/backend/access/gin/ginxlog.c 5 Jan 2007 22:19:21 -0000 1.6
--- src/backend/access/gin/ginxlog.c 1 Jun 2007 13:35:05 -0000
***************
*** 26,37 ****
BlockNumber leftBlkno;
BlockNumber rightBlkno;
BlockNumber rootBlkno;
} ginIncompleteSplit;
static List *incomplete_splits;
static void
! pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno, BlockNumber rightBlkno, BlockNumber rootBlkno)
{
ginIncompleteSplit *split;
--- 26,39 ----
BlockNumber leftBlkno;
BlockNumber rightBlkno;
BlockNumber rootBlkno;
+ XLogRecPtr lsn;
} ginIncompleteSplit;
static List *incomplete_splits;
static void
! pushIncompleteSplit(RelFileNode node, BlockNumber leftBlkno,
! BlockNumber rightBlkno, BlockNumber rootBlkno, XLogRecPtr lsn)
{
ginIncompleteSplit *split;
***************
*** 43,48 ****
--- 45,51 ----
split->leftBlkno = leftBlkno;
split->rightBlkno = rightBlkno;
split->rootBlkno = rootBlkno;
+ split->lsn = lsn;
incomplete_splits = lappend(incomplete_splits, split);
***************
*** 324,330 ****
UnlockReleaseBuffer(rootBuf);
}
else
! pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno);
UnlockReleaseBuffer(rbuffer);
UnlockReleaseBuffer(lbuffer);
--- 327,333 ----
UnlockReleaseBuffer(rootBuf);
}
else
! pushIncompleteSplit(data->node, data->lblkno, data->rblkno, data->rootBlkno, lsn);
UnlockReleaseBuffer(rbuffer);
UnlockReleaseBuffer(lbuffer);
***************
*** 600,605 ****
--- 603,623 ----
gin_safe_restartpoint(void)
{
if (incomplete_splits)
+ {
+ ListCell *l;
+ int nsplits = list_length(incomplete_splits);
+
+ elog(LOG,"GIN incomplete splits=%d", nsplits);
+ if (nsplits < 10)
+ {
+ foreach(l, incomplete_splits)
+ {
+ ginIncompleteSplit *split = (ginIncompleteSplit *) lfirst(l);
+ elog(LOG,"GIN incomplete split root:%u l:%u r:%u at redo %X/%X",
+ split->rootBlkno, split->leftBlkno, split->rightBlkno, split->lsn.xlogid, split->lsn.xrecoff);
+ }
+ }
return false;
+ }
return true;
}