In this thread back in January
we saw evidence that contention for SInvalLock was a problem,
and we (or at least I) concluded that the sinval messaging system
needed a nontrivial rewrite to get rid of its habit of sending everyone
to read the queue at the same time.  Here is the proposed rewrite.

The major changes are:

* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding.  The reset-everyone
approach forced unnecessary cache-rebuilding work in backends that
weren't actually behind.
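
To illustrate the idea (the names ProcState, nextMsgNum, and resetState echo sinvaladt.c conventions, but the structure here is a simplified stand-in, not the patch's actual code): when the queue must discard its oldest messages, only backends whose read pointer still precedes the new tail get marked for reset.

```c
#include <stdbool.h>

typedef struct
{
    int  nextMsgNum;   /* next queue position this backend will read */
    bool resetState;   /* true => backend must rebuild its caches */
} ProcState;

/*
 * Truncate the queue so that newMinMsgNum becomes the oldest retained
 * message.  Only backends that haven't yet read past that point are
 * reset; everyone else keeps their caches.
 */
static void
truncate_queue(ProcState *procs, int n, int newMinMsgNum)
{
    for (int i = 0; i < n; i++)
    {
        if (procs[i].nextMsgNum < newMinMsgNum)
        {
            procs[i].resetState = true;         /* this one fell too far behind */
            procs[i].nextMsgNum = newMinMsgNum; /* skip the lost messages */
        }
    }
}
```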

* When we observe backend(s) falling well behind, we signal SIGUSR1
to only one backend, the one that is furthest behind and doesn't
already have a signal outstanding for it.  When it finishes catching
up, it will "pass on" the SIGUSR1 to the next-furthest-back guy,
if there is one that is far enough behind to justify a signal.
This changes the former behavior where every backend in the system
might try to hit the queue at the same time into a "daisy chain"
where only one backend is catching up at a time; furthermore,
backends that aren't well behind aren't forced to catch up too.
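
The selection step of the daisy chain might be sketched like this (again a simplified illustration with made-up names, not the patch's code; the actual signaling would go through SIGUSR1 delivery, omitted here):

```c
#include <stdbool.h>

typedef struct
{
    int  nextMsgNum;  /* how far this backend has read */
    bool signaled;    /* catchup signal sent and not yet serviced */
} ProcState;

/*
 * Pick at most one backend to wake: the furthest-behind one that is at
 * least minLag messages behind and has no signal already outstanding.
 * Returns its index, or -1 if nobody qualifies.  When the chosen
 * backend finishes catching up, it runs this same logic to pass the
 * signal on down the chain.
 */
static int
pick_backend_to_signal(const ProcState *procs, int n,
                       int maxMsgNum, int minLag)
{
    int best = -1;
    int bestLag = minLag - 1;

    for (int i = 0; i < n; i++)
    {
        int lag = maxMsgNum - procs[i].nextMsgNum;

        if (!procs[i].signaled && lag > bestLag)
        {
            best = i;
            bestLag = lag;
        }
    }
    return best;
}
```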

* We don't attempt to clean out dead messages after every
message-receipt operation; rather, we do it on the insertion side,
and only when the queue fullness passes certain thresholds.
(As presented, the thresholds are 8/16, 9/16, 10/16, etc., but this
could be adjusted by changing a couple of #defines.)  This greatly
reduces the amount of contention for exclusive hold on SInvalLock.
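
A minimal sketch of the ratcheting-threshold idea (the #define names and the constant 4096 are invented for illustration; in the real patch the threshold would presumably be recomputed after each cleanup pass shrinks the queue):

```c
#include <stdbool.h>

#define MAXNUMMESSAGES   4096
#define CLEANUP_MIN      (MAXNUMMESSAGES / 2)   /* first threshold: 8/16 */
#define CLEANUP_QUANTUM  (MAXNUMMESSAGES / 16)  /* later steps: +1/16 each */

/*
 * Called after inserting a message.  *threshold starts at CLEANUP_MIN
 * and ratchets upward, so the expensive cleanup pass (which needs
 * SInvalLock exclusively) runs only when fullness crosses 8/16, 9/16,
 * 10/16, ... rather than after every message receipt.
 */
static bool
time_to_cleanup(int numMsgs, int *threshold)
{
    if (numMsgs < *threshold)
        return false;
    *threshold += CLEANUP_QUANTUM;  /* wait for the next step next time */
    return true;
}
```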

* Readers attempt to pull multiple messages (as presented, up to 32)
out of the queue for each acquisition of SInvalLock.  Although they
take the lock in shared mode and thus aren't contending at the
LWLock level, I thought this was important to do to reduce contention
for the lock's spinlock --- the prevalence of s_lock() in the "before"
oprofile convinces me that there was just too much traffic through
the lock itself.
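
The batched read might look roughly like this (types and names are simplified stand-ins, not the patch's declarations; the point is one lock acquisition amortized over up to 32 messages):

```c
#define MAXNUMMESSAGES    4096
#define MAX_MSGS_PER_READ 32    /* the "up to 32" batch limit */

typedef struct { int id; } SharedInvalidationMessage;    /* stand-in payload */

/*
 * Copy up to MAX_MSGS_PER_READ pending messages into the caller's local
 * array, advancing the backend's read pointer.  The caller holds
 * SInvalLock in shared mode around this, then releases it and processes
 * out[0..n-1] without holding the lock at all.
 */
static int
read_messages(const SharedInvalidationMessage *buffer,
              int maxMsgNum,        /* queue head (next free slot) */
              int *nextMsgNum,      /* this backend's read pointer */
              SharedInvalidationMessage *out)
{
    int n = 0;

    while (*nextMsgNum < maxMsgNum && n < MAX_MSGS_PER_READ)
        out[n++] = buffer[(*nextMsgNum)++ % MAXNUMMESSAGES];

    return n;
}
```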

Note: PMSIGNAL_WAKEN_CHILDREN is now unused and should be removed,
but I didn't include that change in the patch.

The test case I'm using looks like this:

$ cat createdrop.sql
create temp table register_reqs(full_key integer, register_index integer);
$ pgbench -n -f createdrop.sql -c 40 -t 1000 bench

On my machine CVS HEAD gets about 485 transactions/sec on this; with the
patch that improves to 700 tps.

oprofile says that the routines above 1% of CPU in CVS HEAD are

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) 
with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name               symbol name
363116   11.2433  postgres                 CatalogCacheIdInvalidate
316608    9.8032  postgres                 s_lock
305881    9.4711  postgres                 LWLockAcquire
272717    8.4442  postgres                 LWLockRelease
234812    7.2706  postgres                 CatalogCacheFlushRelation
215445    6.6709  postgres                 hash_search_with_hash_value
78544     2.4320  postgres                 _bt_compare
62007     1.9199  postgres                 SIGetDataEntry
48614     1.5052  postgres                 AllocSetAlloc
48033     1.4873  postgres                 XLogInsert
41280     1.2782  postgres                 PinBuffer
40659     1.2589  postgres                 LocalExecuteInvalidationMessage
36906     1.1427  postgres                 ResetCatalogCache
35708     1.1056  postgres                 hash_any
34151     1.0574  postgres                 PrepareToInvalidateCacheTuple

and with the patch

CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) 
with a unit mask of 0x01 (mandatory) count 100000
samples  %        image name               symbol name
550342   13.5424  postgres                 CatalogCacheIdInvalidate
362684    8.9246  postgres                 hash_search_with_hash_value
199291    4.9040  postgres                 LWLockAcquire
198277    4.8790  postgres                 CatalogCacheFlushRelation
149716    3.6841  postgres                 _bt_compare
114006    2.8054  postgres                 LWLockRelease
103138    2.5379  postgres                 XLogInsert
84471     2.0786  postgres                 PrepareToInvalidateCacheTuple
78464     1.9308  postgres                 LocalExecuteInvalidationMessage
75336     1.8538  postgres                 PinBuffer
74837     1.8415  postgres                 AllocSetAlloc
63764     1.5691  postgres                 SIGetDataEntries
60468     1.4879  postgres                 hash_any
51662     1.2713              memcpy
43999     1.0827              memcmp
43407     1.0681  postgres                 _bt_checkpage

so this is quite successful at eliminating spinlock-level contention
and does a reasonable job at reducing the amount of LWLock processing.

Comments?  Does anyone have more realistic test cases for this problem,
or want to fiddle with the tuning #defines in sinvaladt.c?

                        regards, tom lane

Attachment: bin7h80XxstUn.bin
Description: sinval-rewrite-1.patch.gz
