In this thread back in January (http://archives.postgresql.org/pgsql-performance/2008-01/msg00001.php) we saw evidence that contention for SInvalLock was a problem, and we (or at least I) concluded that the sinval messaging system needed a nontrivial rewrite to get rid of its habit of sending everyone to read the queue at the same time. Here is the proposed rewrite.
The major changes are:

* When the queue overflows, we only issue a cache reset to the specific
backend or backends that still haven't read the oldest message, rather
than resetting everyone as in the original coding. The reset-everyone
approach forced unnecessary cache-rebuilding work in backends that
weren't actually behind.

* When we observe backend(s) falling well behind, we signal SIGUSR1 to
only one backend: the one that is furthest behind and doesn't already
have a signal outstanding for it. When it finishes catching up, it will
"pass on" the SIGUSR1 to the next-furthest-back guy, if there is one
that is far enough behind to justify a signal. This changes the former
behavior, where every backend in the system might try to hit the queue
at the same time, into a "daisy chain" where only one backend is
catching up at a time; furthermore, backends that aren't far behind
aren't forced to catch up along with it. (A sketch of this signaling
logic appears after the test case below.)

* We don't attempt to clean out dead messages after every
message-receipt operation; rather, we do it on the insertion side, and
only when the queue fullness passes certain thresholds. (As presented,
the thresholds are 8/16, 9/16, 10/16, etc., but this could be adjusted
by changing a couple of #defines.) This greatly reduces the amount of
contention for exclusive hold on SInvalLock. (See the threshold sketch
below.)

* Readers attempt to pull multiple messages (as presented, up to 32) out
of the queue for each acquisition of SInvalLock. Although they take the
lock in shared mode and thus aren't contending at the LWLock level, I
thought this was important to do to reduce contention for the lock's
spinlock --- the prevalence of s_lock() in the "before" oprofile
convinces me that there was just too much traffic through the lock
itself. (See the batch-read sketch below.)

Note: PMSIGNAL_WAKEN_CHILDREN is now unused and should be removed, but I
didn't include that change in the patch.

The test case I'm using looks like this:

    $ cat createdrop.sql
    create temp table register_reqs(full_key integer, register_index integer)
        WITHOUT OIDS ON COMMIT DROP;

    $ pgbench -n -f createdrop.sql -c 40 -t 1000 bench

On my machine CVS HEAD gets about 485 transactions/sec on this; with the
patch that improves to 700 tps.
oprofile says that the routines above 1% of CPU in CVS HEAD are:

    CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
    Counted GLOBAL_POWER_EVENTS events (time during which processor is not
    stopped) with a unit mask of 0x01 (mandatory) count 100000

    samples  %        image name   symbol name
    363116   11.2433  postgres     CatalogCacheIdInvalidate
    316608    9.8032  postgres     s_lock
    305881    9.4711  postgres     LWLockAcquire
    272717    8.4442  postgres     LWLockRelease
    234812    7.2706  postgres     CatalogCacheFlushRelation
    215445    6.6709  postgres     hash_search_with_hash_value
    78544     2.4320  postgres     _bt_compare
    62007     1.9199  postgres     SIGetDataEntry
    48614     1.5052  postgres     AllocSetAlloc
    48033     1.4873  postgres     XLogInsert
    41280     1.2782  postgres     PinBuffer
    40659     1.2589  postgres     LocalExecuteInvalidationMessage
    36906     1.1427  postgres     ResetCatalogCache
    35708     1.1056  postgres     hash_any
    34151     1.0574  postgres     PrepareToInvalidateCacheTuple

and with the patch:

    CPU: P4 / Xeon with 2 hyper-threads, speed 2792.99 MHz (estimated)
    Counted GLOBAL_POWER_EVENTS events (time during which processor is not
    stopped) with a unit mask of 0x01 (mandatory) count 100000

    samples  %        image name   symbol name
    550342   13.5424  postgres     CatalogCacheIdInvalidate
    362684    8.9246  postgres     hash_search_with_hash_value
    199291    4.9040  postgres     LWLockAcquire
    198277    4.8790  postgres     CatalogCacheFlushRelation
    149716    3.6841  postgres     _bt_compare
    114006    2.8054  postgres     LWLockRelease
    103138    2.5379  postgres     XLogInsert
    84471     2.0786  postgres     PrepareToInvalidateCacheTuple
    78464     1.9308  postgres     LocalExecuteInvalidationMessage
    75336     1.8538  postgres     PinBuffer
    74837     1.8415  postgres     AllocSetAlloc
    63764     1.5691  postgres     SIGetDataEntries
    60468     1.4879  postgres     hash_any
    51662     1.2713  libc-2.7.so  memcpy
    43999     1.0827  libc-2.7.so  memcmp
    43407     1.0681  postgres     _bt_checkpage

so this is quite successful at eliminating spinlock-level contention and
does a reasonable job at reducing the amount of LWLock processing.

Comments? Does anyone have more realistic test cases for this problem, or
want to fiddle with the tuning #defines in sinvaladt.c?

			regards, tom lane
Attachment: sinval-rewrite-1.patch.gz