Re: [HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled

2014-08-07 Thread Fujii Masao
On Fri, May 23, 2014 at 9:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU y.hayam...@gmail.com wrote:
 The failing assertion is for prohibiting memory allocation in a critical 
 section, which is introduced by commit 4a170ee9 on 2014-04-04.

This issue is still in open item list for 9.4. But the commit which had
caused this issue was reverted by d27d493. So I removed this from the
Open Item list.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled

2014-05-23 Thread Robert Haas
On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU y.hayam...@gmail.com wrote:
 The failing assertion is for prohibiting memory allocation in a critical 
 section, which is introduced by commit 4a170ee9 on 2014-04-04.

 In my understanding, the root cause of the assertion failure is on-demand 
 allocation of lwlock_stats entry.  For each LWLock, a lwlock_stats entry is 
 created at the first invocation of LWLockAcquire using MemoryContextAlloc.  
 If the first invocation is in a critical section, the assertion fails.

 For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush 
 function was the problem.
 I also confirmed the assertion failure on starting postgres on a correctly 
 initialized database. In this case, locking CheckpointerCommLock in 
 AbsorbFsyncRequests function was the problem.

 ## A solution

 In order to avoid memory allocation during critical sections, lwlock_stats 
 hash table should be populated at the initialization of each process.
 The attached patch populate lwlock_stats entries of MainLWLockArray at the 
 end of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

 With this patch, all regression tests can be passed so far, but I think this 
 patch is not perfect because it does not cover LWLocks outside of 
 MainLWLockArray.  I'm not sure where is the right place to initialize 
 lwlock_stats entries for that locks.  So I feel it needs some refinements by 
 you hackers.

Prior to my commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f, which
introduced LWLockTranche, we used to allocate enough storage for all
of the LWLOCK_STATS entries in the system; that commit changed things
so that we allocate entries for particular LWLocks on an as-needed
basis.  Although that wasn't the main point of that patch, I thought
it was a nice idea, since it might save you quite a bit of memory if
you have a lot of backends that don't touch very many LWLocks.  But
maybe we need to give up on that in view of this report.

I don't think we should adopt the approach proposed in this patch,
though, because if we're going to preallocate all of the entries
anyway there's little reason to use a hash table instead of an array.
If we're going to go with the approach of preallocating all the
entries, maybe we should change the definition of LWLockTranche to
include the number of lwlocks in the tranche.  We could then add
another array parallel to LWLockTrancheArray which would point to an
appropriately-sized array of lwlock_stats objects for each tranche.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] HEAD crashes with assertion and LWLOCK_STATS enabled

2014-05-20 Thread Yuto HAYAMIZU
Hi hackers,

I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is 
defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug 
seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.

## Reproduing a crash

You can reproduce a crash by this way:

git co a0841ecd2518d4505b96132b764b918ab5d21ad4
git clean -dfx
./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
make check

In my environment, the following messages appeared.

( omit... )
../../../src/test/regress/pg_regress --inputdir=. 
--temp-install=./tmp_check --top-builddir=../../..--dlpath=.  
--schedule=./parallel_schedule
== creating temporary installation==
== initializing database system   ==

pg_regress: initdb failed

and initdb.log contained the following messages.

reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in 
/tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 
142: shacq 0 exacq 1 blk 0 spindelay 0
( omit... )
PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
ok
loading system objects' descriptions ... TRAP: 
FailedAssertion(!(CritSectionCount == 0 || (context) == ErrorContext || 
(MyAuxProcType == CheckpointerProcess)), File: mcxt.c, Line: 594)
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory /tmp/pghead/src/test/regress/./tmp_check/data not 
removed at user's request

## The cause of crash

The failing assertion is for prohibiting memory allocation in a critical 
section, which is introduced by commit 4a170ee9 on 2014-04-04.

In my understanding, the root cause of the assertion failure is on-demand 
allocation of lwlock_stats entry.  For each LWLock, a lwlock_stats entry is 
created at the first invocation of LWLockAcquire using MemoryContextAlloc.  If 
the first invocation is in a critical section, the assertion fails.

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function 
was the problem.
I also confirmed the assertion failure on starting postgres on a correctly 
initialized database. In this case, locking CheckpointerCommLock in 
AbsorbFsyncRequests function was the problem.

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash 
table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end 
of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this 
patch is not perfect because it does not cover LWLocks outside of 
MainLWLockArray.  I'm not sure where is the right place to initialize 
lwlock_stats entries for that locks.  So I feel it needs some refinements by 
you hackers.
From f96708d14ab0073abd95c463eaf8d60db42f411d Mon Sep 17 00:00:00 2001
From: Yuto Hayamizu y.hayam...@gmail.com
Date: Tue, 20 May 2014 16:19:56 +0900
Subject: [PATCH] pre-populating lwlock_stats entries

---
 src/backend/storage/lmgr/lwlock.c |   21 +
 src/backend/storage/lmgr/proc.c   |8 
 src/include/storage/lwlock.h  |4 
 3 files changed, 33 insertions(+)

diff --git a/src/backend/storage/lmgr/lwlock.c 
b/src/backend/storage/lmgr/lwlock.c
index d23ac62..fc97ae0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -359,8 +359,29 @@ CreateLWLocks(void)
MainLWLockTranche.array_base = MainLWLockArray;
MainLWLockTranche.array_stride = sizeof(LWLockPadded);
LWLockRegisterTranche(0, MainLWLockTranche);
+
+#ifdef LWLOCK_STATS
+   InitializeLWLockStats();
+#endif
 }
 
+#ifdef LWLOCK_STATS
+void
+InitializeLWLockStats(void)
+{
+   int numLocks = NumLWLocks();
+   int id;
+   LWLockPadded*lock;
+
+   if (MyProc == NULL) return;
+
+   /* Initialize all lwlock_stats entries of MainLWLockArray */
+   for (id = 0, lock = MainLWLockArray; id  numLocks; id++, lock++)
+   {
+   get_lwlock_stats_entry(lock-lock);
+   }
+}
+#endif
 
 /*
  * LWLockAssign - assign a dynamically-allocated LWLock number
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 266b0da..e09cbf8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -415,6 +415,10 @@ InitProcess(void)
 * the deadlock checker.
 */