Re: pgsql: Improve performance of subsystems on top of SLRU
FWIW there's a stupid bug in 0002, which is fixed here. I'm writing a simple test for it. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Now I have my system running, not a byte was off the shelf; It rarely breaks and when it does I fix the code myself. It's stable, clean and elegant, and lightning fast as well, And it doesn't cost a nickel, so Bill Gates can go to hell." >From ecf613092a3cbc4c5112d30af7eea55b668decec Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Mon, 4 Mar 2024 11:49:01 +0100 Subject: [PATCH v3] Rework redundant loop in subtrans.c --- src/backend/access/transam/subtrans.c | 29 +++ 1 file changed, 7 insertions(+), 22 deletions(-) diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c index dc9566fb51..50bb1d8cfc 100644 --- a/src/backend/access/transam/subtrans.c +++ b/src/backend/access/transam/subtrans.c @@ -311,7 +311,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID) FullTransactionId nextXid; int64 startPage; int64 endPage; - LWLock *prevlock; + LWLock *prevlock = NULL; LWLock *lock; /* @@ -324,42 +324,27 @@ StartupSUBTRANS(TransactionId oldestActiveXID) nextXid = TransamVariables->nextXid; endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid)); - prevlock = SimpleLruGetBankLock(SubTransCtl, startPage); - LWLockAcquire(prevlock, LW_EXCLUSIVE); - while (startPage != endPage) + for (;;) { lock = SimpleLruGetBankLock(SubTransCtl, startPage); - - /* - * Check if we need to acquire the lock on the new bank then release - * the lock on the old bank and acquire on the new bank. - */ if (prevlock != lock) { - LWLockRelease(prevlock); + if (prevlock) +LWLockRelease(prevlock); LWLockAcquire(lock, LW_EXCLUSIVE); prevlock = lock; } (void) ZeroSUBTRANSPage(startPage); + if (startPage == endPage) + break; + startPage++; /* must account for wraparound */ if (startPage > TransactionIdToPage(MaxTransactionId)) startPage = 0; } - lock = SimpleLruGetBankLock(SubTransCtl, startPage); - - /* - * Check if we need to acquire the lock on the new bank then release the - * lock on the old bank and acquire on the new bank. - */ - if (prevlock != lock) - { - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - } - (void) ZeroSUBTRANSPage(startPage); LWLockRelease(lock); } -- 2.39.2
Re: pgsql: Improve performance of subsystems on top of SLRU
On 2024-Mar-03, Tom Lane wrote: > This is certainly simpler, but I notice that it holds the current > LWLock across the line > > ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); > > where the old code did not. Could the palloc take long enough that > holding the lock is bad? Hmm, I guess most of the time it shouldn't be much of a problem (if the length is small so we can palloc without malloc'ing); but it could be in the worst case. But the fix is simple: just release the lock before. There's no correctness argument for holding it all the way down. I was just confused about how the original code worked. > Also, with this coding the "lock = NULL;" assignment just before > "goto retry" is a dead store. Not sure if Coverity or other static > analyzers would whine about that. Oh, right. I removed that. On 2024-Mar-04, Dilip Kumar wrote: > I am not sure about the other changes, I mean that makes the code much > simpler but now we are not releasing the 'MultiXactOffsetCtl' related > bank lock, and later in the following loop, we are comparing that lock > against 'MultiXactMemberCtl' related bank lock. This makes code > simpler because now in the loop we are sure that we are always holding > the lock but I do not like comparing the bank locks for 2 different > SLRUs, although there is no problem as there would not be a common > lock address, True. This can be addressed in the same way Tom's first comment is: just release the lock before entering the second loop, and setting lock to NULL. This brings the code to a similar state than before, except that the additional LWLock * variables are in a tighter scope. That's in 0001. Now, I had a look at the other users of slru.c and noticed in subtrans.c that StartupSUBTRANS we have some duplicate code that I think could be rewritten by making the "while" block test the condition at the end instead of at the start; changed that in 0002. I'll leave this one for later, because I want to add some test code for it -- right now it's pretty much test-uncovered code. I also looked at slru.c for uses of shared->bank_locks and noticed a couple that could be made simpler. That's 0003 here. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "If it is not right, do not do it. If it is not true, do not say it." (Marcus Aurelius, Meditations) >From a6be7cac5de83a36e056147f3e81bea2eb1096bd Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sun, 3 Mar 2024 15:20:36 +0100 Subject: [PATCH v2 1/3] rework locking code in GetMultiXactIdMembers Per Coverity --- src/backend/access/transam/multixact.c | 53 +++--- 1 file changed, 22 insertions(+), 31 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index cd476b94fa..83b578dced 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1247,14 +1247,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members, MultiXactOffset offset; int length; int truelength; - int i; MultiXactId oldestMXact; MultiXactId nextMXact; MultiXactId tmpMXact; MultiXactOffset nextOffset; MultiXactMember *ptr; LWLock *lock; - LWLock *prevlock = NULL; debug_elog3(DEBUG2, "GetMembers: asked for %u", multi); @@ -1361,18 +1359,9 @@ retry: pageno = MultiXactIdToOffsetPage(multi); entryno = MultiXactIdToOffsetEntry(multi); - /* - * If this page falls under a different bank, release the old bank's lock - * and acquire the lock of the new bank. - */ + /* Acquire the bank lock for the page we need. */ lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (lock != prevlock) - { - if (prevlock != NULL) - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - prevlock = lock; - } + LWLockAcquire(lock, LW_EXCLUSIVE); slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno]; @@ -1407,17 +1396,19 @@ retry: if (pageno != prev_pageno) { + LWLock *newlock; + /* * Since we're going to access a different SLRU page, if this page * falls under a different bank, release the old bank's lock and * acquire the lock of the new bank. */ - lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (prevlock != lock) + newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); + if (newlock != lock) { -LWLockRelease(prevlock); -LWLockAcquire(lock, LW_EXCLUSIVE); -prevlock = lock; +LWLockRelease(lock); +LWLockAcquire(newlock, LW_EXCLUSIVE); +lock = newlock; } slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact); } @@ -1429,8 +1420,7 @@ retry: if (nextMXOffset == 0) { /* Corner case 2: next multixact is still being filled in */ - LWLockRelease(prevlock); - prevlock = NULL; + LWLockRelease(lock);
Re: pgsql: Improve performance of subsystems on top of SLRU
On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera wrote: > > On 2024-Feb-28, Alvaro Herrera wrote: > > > Improve performance of subsystems on top of SLRU > > Coverity had the following complaint about this commit: > > > *** CID NNN: Control flow issues (DEADCODE) > /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: > 1375 in GetMultiXactIdMembers() > 1369 * and acquire the lock of the new bank. > 1370 */ > 1371lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); > 1372if (lock != prevlock) > 1373{ > 1374if (prevlock != NULL) > >>> CID 1592913: Control flow issues (DEADCODE) > >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". > 1375LWLockRelease(prevlock); > 1376LWLockAcquire(lock, LW_EXCLUSIVE); > 1377prevlock = lock; > 1378} > 1379 > 1380slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, > multi); > > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. > > I'll have a look at the other places where we use this "prevlock" coding > pattern tomorrow. + /* Acquire the bank lock for the page we need. */ lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (lock != prevlock) - { - if (prevlock != NULL) - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - prevlock = lock; - } + LWLockAcquire(lock, LW_EXCLUSIVE); This part is definitely an improvement. I am not sure about the other changes, I mean that makes the code much simpler but now we are not releasing the 'MultiXactOffsetCtl' related bank lock, and later in the following loop, we are comparing that lock against 'MultiXactMemberCtl' related bank lock. This makes code simpler because now in the loop we are sure that we are always holding the lock but I do not like comparing the bank locks for 2 different SLRUs, although there is no problem as there would not be a common lock address, anyway, I do not have any strong objection to what you have done here. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pgsql: Improve performance of subsystems on top of SLRU
Alvaro Herrera writes: > And I think it's correct that this is somewhat bogus, or at least > confusing: the only way to have control back here on line 1371 after > having executed once is via the "goto retry" line below; and there we > release "prevlock" and set it to NULL beforehand, so it's impossible for > prevlock to be NULL. Looking closer I think this code is all confused, > so I suggest to rework it as shown in the attached patch. This is certainly simpler, but I notice that it holds the current LWLock across the line ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); where the old code did not. Could the palloc take long enough that holding the lock is bad? Also, with this coding the "lock = NULL;" assignment just before "goto retry" is a dead store. Not sure if Coverity or other static analyzers would whine about that. regards, tom lane
Re: pgsql: Improve performance of subsystems on top of SLRU
On 2024-Feb-28, Alvaro Herrera wrote: > Improve performance of subsystems on top of SLRU Coverity had the following complaint about this commit: *** CID NNN: Control flow issues (DEADCODE) /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 1375 in GetMultiXactIdMembers() 1369 * and acquire the lock of the new bank. 1370 */ 1371lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); 1372if (lock != prevlock) 1373{ 1374if (prevlock != NULL) >>> CID 1592913: Control flow issues (DEADCODE) >>> >>> Execution cannot reach this statement: "LWLockRelease(prevlock);". >>> 1375LWLockRelease(prevlock); 1376LWLockAcquire(lock, LW_EXCLUSIVE); 1377prevlock = lock; 1378} 1379 1380slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); And I think it's correct that this is somewhat bogus, or at least confusing: the only way to have control back here on line 1371 after having executed once is via the "goto retry" line below; and there we release "prevlock" and set it to NULL beforehand, so it's impossible for prevlock to be NULL. Looking closer I think this code is all confused, so I suggest to rework it as shown in the attached patch. I'll have a look at the other places where we use this "prevlock" coding pattern tomorrow. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ >From a82fa380c7ff5a468ca920f4c06e626946bc8ecf Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Sun, 3 Mar 2024 15:20:36 +0100 Subject: [PATCH] rework locking code in GetMultiXactIdMembers Per Coverity --- src/backend/access/transam/multixact.c | 52 ++ 1 file changed, 20 insertions(+), 32 deletions(-) diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index 9b81506145..ec446949a9 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -1250,14 +1250,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members, MultiXactOffset offset; int length; int truelength; - int i; MultiXactId oldestMXact; MultiXactId nextMXact; MultiXactId tmpMXact; MultiXactOffset nextOffset; MultiXactMember *ptr; LWLock *lock; - LWLock *prevlock = NULL; debug_elog3(DEBUG2, "GetMembers: asked for %u", multi); @@ -1364,18 +1362,9 @@ retry: pageno = MultiXactIdToOffsetPage(multi); entryno = MultiXactIdToOffsetEntry(multi); - /* - * If this page falls under a different bank, release the old bank's lock - * and acquire the lock of the new bank. - */ + /* Acquire the bank lock for the page we need. */ lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (lock != prevlock) - { - if (prevlock != NULL) - LWLockRelease(prevlock); - LWLockAcquire(lock, LW_EXCLUSIVE); - prevlock = lock; - } + LWLockAcquire(lock, LW_EXCLUSIVE); slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi); offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno]; @@ -1410,17 +1399,19 @@ retry: if (pageno != prev_pageno) { + LWLock *newlock; + /* * Since we're going to access a different SLRU page, if this page * falls under a different bank, release the old bank's lock and * acquire the lock of the new bank. */ - lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); - if (prevlock != lock) + newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno); + if (newlock != lock) { -LWLockRelease(prevlock); -LWLockAcquire(lock, LW_EXCLUSIVE); -prevlock = lock; +LWLockRelease(lock); +LWLockAcquire(newlock, LW_EXCLUSIVE); +lock = newlock; } slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact); } @@ -1432,8 +1423,8 @@ retry: if (nextMXOffset == 0) { /* Corner case 2: next multixact is still being filled in */ - LWLockRelease(prevlock); - prevlock = NULL; + LWLockRelease(lock); + lock = NULL; CHECK_FOR_INTERRUPTS(); pg_usleep(1000L); goto retry; @@ -1442,14 +1433,11 @@ retry: length = nextMXOffset - offset; } - LWLockRelease(prevlock); - prevlock = NULL; - ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember)); truelength = 0; prev_pageno = -1; - for (i = 0; i < length; i++, offset++) + for (int i = 0; i < length; i++, offset++) { TransactionId *xactptr; uint32 *flagsptr; @@ -1462,18 +1450,19 @@ retry: if (pageno != prev_pageno) { + LWLock *newlock; + /* * Since we're going to access a different SLRU page,