Simon,

In the 16/16 (16 buffer partitions/16 lock partitions) test, the WALInsertLock lock had 14643080 acquisition attempts and 12057678 successful acquisitions on the lock. That's 2585402 retries on the lock. That is to say that PGSemaphoreLock was invoked 2585402 times.
In the 128/128 test, the WALInsertLock lock had 14991208 acquisition attempts and 12324765 successful acquisitions. That's 2666443 retries. The 128/128 test attempted 348128 more lock acquisitions than the 16/16 test and retried 81041 more times than the 16/16 test. We attribute the rise in WALInsertLock lock accesses to the reduction in time spent acquiring the BufMapping and LockMgr partition locks. Does this seem reasonable?

The overhead of any monitoring is of great concern to us. We've tried both clock_gettime() and gettimeofday() calls. They both seem to have the same overhead, ~1 us/call (measured against the TSC of the CPU), and both seem to be accurate. We realize this can be a delicate point, so we would be happy to rerun any tests with a different timing mechanism.

David

-----Original Message-----
From: Simon Riggs [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 13, 2006 2:22 AM
To: Tom Lane
Cc: Strong, David; PostgreSQL-development
Subject: Re: [HACKERS] Lock partitions

On Tue, 2006-09-12 at 12:40 -0400, Tom Lane wrote:
> "Strong, David" <[EMAIL PROTECTED]> writes:
> > When using 16 buffer and 16 lock partitions, we see that BufMapping
> > takes 809 seconds to acquire locks and 174 seconds to release locks. The
> > LockMgr takes 362 seconds to acquire locks and 26 seconds to release
> > locks.
>
> > When using 128 buffer and 128 lock partitions, we see that BufMapping
> > takes 277 seconds (532 seconds improvement) to acquire locks and 78
> > seconds (96 seconds improvement) to release locks. The LockMgr takes 235
> > seconds (127 seconds improvement) to acquire locks and 22 seconds (4
> > seconds improvement) to release locks.
>
> While I don't see any particular penalty to increasing
> NUM_BUFFER_PARTITIONS, increasing NUM_LOCK_PARTITIONS carries a very
> significant penalty (increasing PGPROC size as well as the work needed
> during LockReleaseAll, which is executed at every transaction end).
> I think 128 lock partitions is probably verging on the ridiculous
> ... particularly if your benchmark only involves touching half a dozen
> tables. I'd be more interested in comparisons between 4 and 16 lock
> partitions. Also, please vary the two settings independently rather
> than confusing the issue by changing them both at once.

Good thinking, David. Even if 128 is fairly high, it does seem worth exploring higher values; I was just stuck in "fewer == better" thoughts.

> > With the improvements in the various locking times, one might expect an
> > improvement in the overall benchmark result. However, a 16 partition run
> > produces a result of 198.74 TPS and a 128 partition run produces a
> > result of 203.24 TPS.
>
> > Part of the time saved from BufMapping and LockMgr partitions is
> > absorbed into the WALInsertLock lock. For a 16 partition run, the total
> > time to lock/release the WALInsertLock lock is 5845 seconds. For 128
> > partitions, the WALInsertLock lock takes 6172 seconds, an increase of
> > 327 seconds. Perhaps we have our WAL configured incorrectly?
>
> I fear this throws your entire measurement procedure into question. For
> a fixed workload the number of acquisitions of WALInsertLock ought to be
> fixed, so you shouldn't see any more contention for WALInsertLock if the
> transaction rate didn't change materially.

David's results were to do with lock acquire/release time, not the number of acquisitions, so that in itself doesn't make me doubt these measurements. Perhaps we can ask whether there was a substantially different number of lock acquisitions? As Tom says, that would be an issue.

It seems reasonable that, having relieved the bottleneck on the BufMapping and LockMgr locks, we would then queue longer on the next bottleneck, WALInsertLock. So again, those tests seem reasonable to me so far.

These seem to be the beginnings of accurate wait time analysis, so I'm listening closely. Are you using a lightweight timer?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com