[HACKERS] Final Thoughts for 8.3 on LWLocking and Scalability

Simon Riggs Tue, 11 Sep 2007 02:49:32 -0700

I've completed a review of all of the LWLocking in the backends. This is
documented in the enclosed file. I would propose that we use this as
comments in lwlock.h or in the README, if people agree.


A number of points emerge from that analysis:

1. The ProcArrayLock is acquired exclusively by only one remaining
operation: XidCacheRemoveRunningXids(). Reducing things to that level is
brilliant work, Florian and Tom. After analysis, I am still concerned:
subxact abort could now be starved out by a large number of shared
holders, and once it does acquire the lock, the shared requestors may
be starved in turn, as described in point (4) here:
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php
I no longer want to solve it in the way described there, but have a
solution described in a separate post on -hackers. The original solution
still seems valid, but if we can solve it another way we should.

2. CountActiveBackends() searches the whole of the proc array, even
though it could stop as soon as it has counted commit_siblings active
backends. Stopping once the heuristic has been determined seems like the
best thing to do. A small patch to implement this is attached.
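
To illustrate the early exit in point 2, here is a minimal sketch. It is not the attached patch: MockProc, its fields and min_active_backends() are hypothetical stand-ins for the real proc array, and the "active" test (in a transaction, not waiting on a lock, not ourselves) is an assumption about what the heuristic counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one proc array entry. */
typedef struct MockProc
{
    int  pid;      /* 0 means the slot is not a live backend */
    bool in_xact;  /* true if a transaction is in progress */
    bool waiting;  /* true if blocked waiting on a lock */
} MockProc;

/*
 * Return true as soon as min_active other active backends have been
 * seen, instead of counting every entry in the array.
 */
static bool
min_active_backends(const MockProc *procs, size_t nprocs,
                    int my_pid, int min_active)
{
    int count = 0;

    assert(procs != NULL || nprocs == 0);

    if (min_active <= 0)
        return true;            /* nothing to look for */

    for (size_t i = 0; i < nprocs; i++)
    {
        const MockProc *p = &procs[i];

        if (p->pid == 0 || p->pid == my_pid)
            continue;           /* unused slot, or ourselves */
        if (!p->in_xact || p->waiting)
            continue;           /* idle or blocked: not active */
        if (++count >= min_active)
            return true;        /* heuristic decided, stop scanning */
    }
    return false;
}
```

A caller that only needs to know whether commit_siblings has been reached would pass that value as min_active and never look at the remaining entries once the answer is known.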

3. ReceiveSharedInvalidMessages() takes a Shared lock on SInvalLock,
then takes an Exclusive lock later in the same routine to perform
SIDelExpiredDataEntries(). The latter routine examines data that it
hasn't touched to see if it can delete anything. If it finds anything
other than its own consumed message it will only be because it beat
another backend in the race to delete a message it just consumed. So
most callers of SIDelExpiredDataEntries() will do nothing at all, after
having queued for an X lock. I can't see the sense in that, but maybe
there is some deeper purpose? ISTM that we should only attempt to clean
the queue when it fills, during SIInsertDataEntry(), which it already
does. We want to avoid continually re-triggering postmaster signals, but
we should do that anyway with a "yes-I-already-did-that" flag, rather
than by eager cleaning of the queue, which just defers a postmaster
signal storm, but does not prevent it.

4. WALWriteLock is acquired in Shared mode by bgwriter when it runs
GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the
Shared request will queue like everybody else. WALWriteLock queue length
can be long, so the bgwriter can get stuck for much longer than
bgwriter_delay when it makes this call; this happens only when
archive_timeout > 0 so probably has never shown up in any performance
testing. XLogWrite takes info_lck also, so we can move the
lastSegSwitchTime behind that lock instead. That way bgwriter need never
wait on I/O, just spin for access to info_lck. Minor change.
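
A rough sketch of the change proposed in point 4, assuming the switch time is kept in shared state guarded by a spinlock. MockXLogCtl and the mock_* functions are hypothetical, and a C11 atomic_flag stands in for info_lck; the real code would use the existing spinlock primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the shared WAL control structure. */
typedef struct MockXLogCtl
{
    atomic_flag info_lck;              /* spinlock guarding the field below */
    time_t      last_seg_switch_time;  /* time of the last segment switch */
} MockXLogCtl;

/* Busy-wait until the flag is ours; held only for a few instructions. */
static void
mock_spin_acquire(atomic_flag *lck)
{
    while (atomic_flag_test_and_set_explicit(lck, memory_order_acquire))
        ;
}

static void
mock_spin_release(atomic_flag *lck)
{
    atomic_flag_clear_explicit(lck, memory_order_release);
}

/* Writer side: called by whoever performs the segment switch. */
static void
mock_record_seg_switch(MockXLogCtl *ctl, time_t when)
{
    assert(ctl != NULL);
    mock_spin_acquire(&ctl->info_lck);
    ctl->last_seg_switch_time = when;
    mock_spin_release(&ctl->info_lck);
}

/* Reader side: the bgwriter never queues behind a WAL write here. */
static time_t
mock_get_last_seg_switch_time(MockXLogCtl *ctl)
{
    time_t result;

    assert(ctl != NULL);
    mock_spin_acquire(&ctl->info_lck);
    result = ctl->last_seg_switch_time;
    mock_spin_release(&ctl->info_lck);
    return result;
}
```

The reader holds the spinlock only long enough to copy one value, so its wait is bounded by other short critical sections rather than by the length of the WALWriteLock queue.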

5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(),
but I can't find a caller of that anywhere in core or contrib. Can those
now be removed?

6. David Strong talked about doing some testing to see if
NUM_BUFFER_PARTITIONS should be increased above 16. We don't have any
further information on that. Should we increase the value to 32 or 64? A
minor increase seems safe and should provide the most gain without
decreasing performance for lower numbers of CPUs.
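
For point 6, a small sketch of why the partition count matters, assuming a buffer-mapping partition is chosen by taking the buffer tag's hash code modulo the number of partitions. The function name and the hash values used with it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a hash code to a partition number.  Two lookups contend for the
 * same partition lock only when they map to the same partition.
 */
static int
mock_partition_for(uint32_t hashcode, int npartitions)
{
    assert(npartitions > 0);
    return (int) (hashcode % (uint32_t) npartitions);
}
```

As an example, hash codes 3 and 19 both land in partition 3 when there are 16 partitions, but in partitions 3 and 19 when there are 32, so those two lookups no longer share a lock. Whether that gain shows up in practice is exactly what the testing would need to establish.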

7. VACUUM has many contention points within it, so HOT should avoid the
annoyance of having to run VACUUM repeatedly on very small
heavily-updated tables.

I haven't further analysed the SLRU locks, since nothing much has
changed there recently and they were already pretty efficient, IIRC.

I'm working on patches for 1-4. We've moved far in recent weeks, so it
seems like we should finish the job.

Comments?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

        BufFreelistLock,
                        /* X - all - each time we allocate a new buffer for data block I/O
                         * Never held across I/O
                         */
        ShmemIndexLock,                         
                        /* X - all - create/attach to shared memory
                         * Never held across I/O
                         */
        OidGenLock,
                        /* X - all - each GetNewOid() and each GetNewRelFileNode()
                         * S - bgwriter - acquired during checkpoint
                         * Writes WAL record every 8192 OIDs, so vanishing chance
                         * of being held across I/O
                         */
        XidGenLock,
                        /* X - all - for each GetNewTransactionId()
                         *                      check whether we need to call ExtendClog or ExtendSubtrans
                         *                      could be held across I/O if clog or subtrans buffers
                         *                      have a dirty LRU page
                         * S - all - for each ReadNewTransactionId()
*5                       *                      called by GetNextXidAndEpoch(),
                         *                      once per VACUUM of each relation
                         *                      once per start of autovacuum worker
                         * X - all - for each SetTransactionIdLimit()
                         *                      called after each VACUUM of whole database and
                         *                      at EOXact if we update catalogs and write relcache file
                         * X - bgwriter - acquired during checkpoint
                         */
        ProcArrayLock,  
                        /* X - all - adding/removing procs from procarray
                         *                      backend start or exit, two phase commits
*1                       * X - all - XidCacheRemoveRunningXids()
                         * S - all - TransactionIdIsInProgress(), TransactionIdIsActive(),
                         *                      GetOldestXmin(), GetSnapshotData(), GetTransactionsInCommit(),
                         *                      HaveTransactionsInCommit(), BackendPidGetProc(),
                         *                      BackendXidGetProc(), GetCurrentVirtualXids(),
                         *                      CountDBBackends(), CountUserBackends(),
*2                       *                      CheckOtherDBBackends(), CountActiveBackends()
                         */
        SInvalLock,
                        /* X - all - backend startup or exit
                         * X - all - send SInval message
                         * S - all - receive SInval message
*3                       * X - all - release dead SInval messages
                         */
        FreeSpaceLock,
                        /* X - access to the FSM to reuse a block, record freespace
                         * X - held during VACUUM to record free space, maybe rearrange FSM
                         * Never held across I/O, except at database startup/shutdown
                         */
        WALInsertLock,
                        /* X - insert data into WAL buffers
                         * Holder may acquire WALWriteLock if WAL buffers full
                         */
        WALWriteLock,
                        /* X - any - write WAL buffers to disk - Always held across I/O
*4                       * S - bgwriter - each loop checks GetLastSegSwitchTime()
                         * Holder may conditionally acquire WALInsertLock to perform
                         * piggyback I/O on WAL
                         */
        ControlFileLock,
                        /* X - any - must be held to read/write from Control file
                         * Always held across I/O
                         */ 
        CheckpointLock,
                        /* X - bgwriter - must be held to perform CreateCheckpoint
                         * Holder always acquires WALInsertLock, XidGenLock, OidGenLock,
                         * ProcArrayLock and ControlFileLock
                         */
        CLogControlLock,
        SubtransControlLock,
        MultiXactGenLock,
        MultiXactOffsetControlLock,
        MultiXactMemberControlLock,
                        /* SLRU locks
                         */
        RelCacheInitLock,
        BgWriterCommLock,
        TwoPhaseStateLock,
        TablespaceCreateLock,
        BtreeVacuumLock,
        AddinShmemInitLock,
        AutovacuumLock,
        AutovacuumScheduleLock,
        SyncScanLock,
                        /* X - any - once per large SeqScan, plus conditionally once
                         *                              per ~16 blocks, during ss_report_location()
                         */
        /* Individual lock IDs end here */
        FirstBufMappingLock,
*6      FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

        /* must be last except for MaxDynamicLWLock: */
        NumFixedLWLocks = FirstLockMgrLock + NUM_LOCK_PARTITIONS,

        MaxDynamicLWLock = 1000000000
