Hi hackers,
I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is
defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug
seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.
## Reproduing a crash
You can reproduce a crash by this way:
git co a0841ecd2518d4505b96132b764b918ab5d21ad4
git clean -dfx
./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
make check
In my environment, the following messages appeared.
( omit... )
../../../src/test/regress/pg_regress --inputdir=.
--temp-install=./tmp_check --top-builddir=../../..--dlpath=.
--schedule=./parallel_schedule
== creating temporary installation==
== initializing database system ==
pg_regress: initdb failed
and initdb.log contained the following messages.
reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in
/tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main
142: shacq 0 exacq 1 blk 0 spindelay 0
( omit... )
PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
ok
loading system objects' descriptions ... TRAP:
FailedAssertion(!(CritSectionCount == 0 || (context) == ErrorContext ||
(MyAuxProcType == CheckpointerProcess)), File: mcxt.c, Line: 594)
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory /tmp/pghead/src/test/regress/./tmp_check/data not
removed at user's request
## The cause of crash
The failing assertion is for prohibiting memory allocation in a critical
section, which is introduced by commit 4a170ee9 on 2014-04-04.
In my understanding, the root cause of the assertion failure is on-demand
allocation of lwlock_stats entry. For each LWLock, a lwlock_stats entry is
created at the first invocation of LWLockAcquire using MemoryContextAlloc. If
the first invocation is in a critical section, the assertion fails.
For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function
was the problem.
I also confirmed the assertion failure on starting postgres on a correctly
initialized database. In this case, locking CheckpointerCommLock in
AbsorbFsyncRequests function was the problem.
## A solution
In order to avoid memory allocation during critical sections, lwlock_stats hash
table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end
of CreateLWLocks, InitProcess and InitAuxiliaryProcess.
With this patch, all regression tests can be passed so far, but I think this
patch is not perfect because it does not cover LWLocks outside of
MainLWLockArray. I'm not sure where is the right place to initialize
lwlock_stats entries for that locks. So I feel it needs some refinements by
you hackers.
From f96708d14ab0073abd95c463eaf8d60db42f411d Mon Sep 17 00:00:00 2001
From: Yuto Hayamizu y.hayam...@gmail.com
Date: Tue, 20 May 2014 16:19:56 +0900
Subject: [PATCH] pre-populating lwlock_stats entries
---
src/backend/storage/lmgr/lwlock.c | 21 +
src/backend/storage/lmgr/proc.c |8
src/include/storage/lwlock.h |4
3 files changed, 33 insertions(+)
diff --git a/src/backend/storage/lmgr/lwlock.c
b/src/backend/storage/lmgr/lwlock.c
index d23ac62..fc97ae0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -359,8 +359,29 @@ CreateLWLocks(void)
MainLWLockTranche.array_base = MainLWLockArray;
MainLWLockTranche.array_stride = sizeof(LWLockPadded);
LWLockRegisterTranche(0, MainLWLockTranche);
+
+#ifdef LWLOCK_STATS
+ InitializeLWLockStats();
+#endif
}
+#ifdef LWLOCK_STATS
+void
+InitializeLWLockStats(void)
+{
+ int numLocks = NumLWLocks();
+ int id;
+ LWLockPadded*lock;
+
+ if (MyProc == NULL) return;
+
+ /* Initialize all lwlock_stats entries of MainLWLockArray */
+ for (id = 0, lock = MainLWLockArray; id numLocks; id++, lock++)
+ {
+ get_lwlock_stats_entry(lock-lock);
+ }
+}
+#endif
/*
* LWLockAssign - assign a dynamically-allocated LWLock number
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 266b0da..e09cbf8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -415,6 +415,10 @@ InitProcess(void)
* the deadlock checker.
*/