On 01/24/2017 04:47 PM, Konstantin Knizhnik wrote:
As I already mentioned, we built Postgres with LOCK_DEBUG , so we can
inspect lock owner. Backend is waiting for itself!
Now please look at two frames in this stack trace marked with red.
XLogInsertRecord is setting WALInsert locks at the beginning of the
function:
if (isLogSwitch)
WALInsertLockAcquireExclusive();
else
WALInsertLockAcquire();
WALInsertLockAcquire just selects random item from WALInsertLocks array
and exclusively locks:
if (lockToTry == -1)
lockToTry = MyProc->pgprocno % NUM_XLOGINSERT_LOCKS;
MyLockNo = lockToTry;
immed = LWLockAcquire(&WALInsertLocks[MyLockNo].l.lock, LW_EXCLUSIVE);
Then, following the stack trace, AdvanceXLInsertBuffer calls
WaitXLogInsertionsToFinish:
/*
* Now that we have an up-to-date LogwrtResult value, see if we
* still need to write it or if someone else already did.
*/
if (LogwrtResult.Write < OldPageRqstPtr)
{
/*
* Must acquire write lock. Release WALBufMappingLock
first,
* to make sure that all insertions that we need to
wait for
* can finish (up to this same position). Otherwise we risk
* deadlock.
*/
LWLockRelease(WALBufMappingLock);
WaitXLogInsertionsToFinish(OldPageRqstPtr);
LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
It releases WALBufMappingLock but not WAL insert locks!
Finally in WaitXLogInsertionsToFinish tries to wait for all locks:
for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
{
XLogRecPtr insertingat = InvalidXLogRecPtr;
do
{
/*
* See if this insertion is in progress. LWLockWait will
wait for
* the lock to be released, or for the 'value' to be set by a
* LWLockUpdateVar call. When a lock is initially
acquired, its
* value is 0 (InvalidXLogRecPtr), which means that we
don't know
* where it's inserting yet. We will have to wait for it. If
* it's a small insertion, the record will most likely fit
on the
* same page and the inserter will release the lock without
ever
* calling LWLockUpdateVar. But if it has to sleep, it will
* advertise the insertion point with LWLockUpdateVar before
* sleeping.
*/
if (LWLockWaitForVar(&WALInsertLocks[i].l.lock,
&WALInsertLocks[i].l.insertingAt,
insertingat, &insertingat))
And here we stuck!
Interesting.. What should happen here is that for the backend's own
insertion slot, the "insertingat" value should be greater than the
requested flush point ('upto' variable). That's because before
GetXLogBuffer() calls AdvanceXLInsertBuffer(), it updates the backend's
insertingat value, to the position that it wants to insert to. And
AdvanceXLInsertBuffer() only calls WaitXLogInsertionsToFinish() with
value smaller than what was passed as the 'upto' argument.
The comment to WaitXLogInsertionsToFinish says:
* Note: When you are about to write out WAL, you must call this function
* *before* acquiring WALWriteLock, to avoid deadlocks. This function might
* need to wait for an insertion to finish (or at least advance to next
* uninitialized page), and the inserter might need to evict an old WAL
buffer
* to make room for a new one, which in turn requires WALWriteLock.
Which contradicts to the observed stack trace.
Not AFAICS. In the stack trace you showed, the backend is not holding
WALWriteLock. It would only acquire it after the
WaitXLogInsertionsToFinish() call finished.
I wonder if it is really synchronization bug in xlog.c or there is
something wrong in this stack trace and it can not happen in case of
normal work?
Yeah, hard to tell. Something is clearly wrong..
This line in the stack trace is suspicious:
WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"
AdvanceXLInsertBuffer() should only ever call
WaitXLogInsertionsToFinish() with an xlog position that points to a page
bounary, but that upto value points to the middle of a page.
Perhaps the value stored in the stack trace is not what the caller
passed, but it was updated because it was past the 'reserveUpto' value?
That would explain the "request to flush past end
of generated WAL" notices you saw in the log. Now, why would that
happen, I have no idea.
If you can and want to provide me access to the system, I could have a
look myself. I'd also like to see if the attached additional Assertions
will fire.
- Heikki
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f5d603066..a2ea03506a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1940,6 +1940,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
* already written out.
*/
OldPageRqstPtr = XLogCtl->xlblocks[nextidx];
+ Assert(OldPageRqstPtr <= XLogCtl->InitializedUpTo);
if (LogwrtResult.Write < OldPageRqstPtr)
{
/*
@@ -1970,6 +1971,10 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool
opportunistic)
*/
LWLockRelease(WALBufMappingLock);
+ /* we should exit the loop before reaching a
page beyond the
+ * requested 'upto' position. */
+ Assert(OldPageRqstPtr <= upto || opportunistic);
+
WaitXLogInsertionsToFinish(OldPageRqstPtr);
LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers