Re: pgsql: Improve performance of subsystems on top of SLRU

2024-03-04 Thread Alvaro Herrera
FWIW there's a stupid bug in 0002, which is fixed here.  I'm writing a
simple test for it.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."
>From ecf613092a3cbc4c5112d30af7eea55b668decec Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 4 Mar 2024 11:49:01 +0100
Subject: [PATCH v3] Rework redundant loop in subtrans.c

---
 src/backend/access/transam/subtrans.c | 29 +++
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/subtrans.c b/src/backend/access/transam/subtrans.c
index dc9566fb51..50bb1d8cfc 100644
--- a/src/backend/access/transam/subtrans.c
+++ b/src/backend/access/transam/subtrans.c
@@ -311,7 +311,7 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	FullTransactionId nextXid;
 	int64		startPage;
 	int64		endPage;
-	LWLock	   *prevlock;
+	LWLock	   *prevlock = NULL;
 	LWLock	   *lock;
 
 	/*
@@ -324,42 +324,27 @@ StartupSUBTRANS(TransactionId oldestActiveXID)
 	nextXid = TransamVariables->nextXid;
 	endPage = TransactionIdToPage(XidFromFullTransactionId(nextXid));
 
-	prevlock = SimpleLruGetBankLock(SubTransCtl, startPage);
-	LWLockAcquire(prevlock, LW_EXCLUSIVE);
-	while (startPage != endPage)
+	for (;;)
 	{
 		lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-		/*
-		 * Check if we need to acquire the lock on the new bank then release
-		 * the lock on the old bank and acquire on the new bank.
-		 */
 		if (prevlock != lock)
 		{
-			LWLockRelease(prevlock);
+			if (prevlock)
+LWLockRelease(prevlock);
 			LWLockAcquire(lock, LW_EXCLUSIVE);
 			prevlock = lock;
 		}
 
 		(void) ZeroSUBTRANSPage(startPage);
+		if (startPage == endPage)
+			break;
+
 		startPage++;
 		/* must account for wraparound */
 		if (startPage > TransactionIdToPage(MaxTransactionId))
 			startPage = 0;
 	}
 
-	lock = SimpleLruGetBankLock(SubTransCtl, startPage);
-
-	/*
-	 * Check if we need to acquire the lock on the new bank then release the
-	 * lock on the old bank and acquire on the new bank.
-	 */
-	if (prevlock != lock)
-	{
-		LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-	}
-	(void) ZeroSUBTRANSPage(startPage);
 	LWLockRelease(lock);
 }
 
-- 
2.39.2



Re: pgsql: Improve performance of subsystems on top of SLRU

2024-03-04 Thread Alvaro Herrera
On 2024-Mar-03, Tom Lane wrote:

> This is certainly simpler, but I notice that it holds the current
> LWLock across the line
> 
>   ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
> 
> where the old code did not.  Could the palloc take long enough that
> holding the lock is bad?

Hmm, I guess most of the time it shouldn't be much of a problem (if the
length is small so we can palloc without malloc'ing); but it could be in
the worst case.  But the fix is simple: just release the lock before.
There's no correctness argument for holding it all the way down.  I was
just confused about how the original code worked.

> Also, with this coding the "lock = NULL;" assignment just before
> "goto retry" is a dead store.  Not sure if Coverity or other static
> analyzers would whine about that.

Oh, right.  I removed that.

On 2024-Mar-04, Dilip Kumar wrote:

> I am not sure about the other changes, I mean that makes the code much
> simpler but now we are not releasing the 'MultiXactOffsetCtl' related
> bank lock, and later in the following loop, we are comparing that lock
> against 'MultiXactMemberCtl' related bank lock. This makes code
> simpler because now in the loop we are sure that we are always holding
> the lock but I do not like comparing the bank locks for 2 different
> SLRUs, although there is no problem as there would not be a common
> lock address,

True.  This can be addressed in the same way Tom's first comment is:
just release the lock before entering the second loop, and setting lock
to NULL.  This brings the code to a similar state than before, except
that the additional LWLock * variables are in a tighter scope.  That's
in 0001.


Now, I had a look at the other users of slru.c and noticed in subtrans.c
that StartupSUBTRANS we have some duplicate code that I think could be
rewritten by making the "while" block test the condition at the end
instead of at the start; changed that in 0002.  I'll leave this one for
later, because I want to add some test code for it -- right now it's
pretty much test-uncovered code.


I also looked at slru.c for uses of shared->bank_locks and noticed a
couple that could be made simpler.  That's 0003 here.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"If it is not right, do not do it.
If it is not true, do not say it." (Marcus Aurelius, Meditations)
>From a6be7cac5de83a36e056147f3e81bea2eb1096bd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Sun, 3 Mar 2024 15:20:36 +0100
Subject: [PATCH v2 1/3] rework locking code in GetMultiXactIdMembers

Per Coverity
---
 src/backend/access/transam/multixact.c | 53 +++---
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index cd476b94fa..83b578dced 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1247,14 +1247,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactOffset offset;
 	int			length;
 	int			truelength;
-	int			i;
 	MultiXactId oldestMXact;
 	MultiXactId nextMXact;
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
 	LWLock	   *lock;
-	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1361,18 +1359,9 @@ retry:
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
-	/*
-	 * If this page falls under a different bank, release the old bank's lock
-	 * and acquire the lock of the new bank.
-	 */
+	/* Acquire the bank lock for the page we need. */
 	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-	if (lock != prevlock)
-	{
-		if (prevlock != NULL)
-			LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-		prevlock = lock;
-	}
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
@@ -1407,17 +1396,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-			if (prevlock != lock)
+			newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (newlock != lock)
 			{
-LWLockRelease(prevlock);
-LWLockAcquire(lock, LW_EXCLUSIVE);
-prevlock = lock;
+LWLockRelease(lock);
+LWLockAcquire(newlock, LW_EXCLUSIVE);
+lock = newlock;
 			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
 		}
@@ -1429,8 +1420,7 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(prevlock);
-			prevlock = NULL;
+			LWLockRelease(lock);
 			

Re: pgsql: Improve performance of subsystems on top of SLRU

2024-03-03 Thread Dilip Kumar
On Mon, Mar 4, 2024 at 1:56 AM Alvaro Herrera  wrote:
>
> On 2024-Feb-28, Alvaro Herrera wrote:
>
> > Improve performance of subsystems on top of SLRU
>
> Coverity had the following complaint about this commit:
>
> 
> *** CID NNN:  Control flow issues  (DEADCODE)
> /srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c:
>  1375 in GetMultiXactIdMembers()
> 1369 * and acquire the lock of the new bank.
> 1370 */
> 1371lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
> 1372if (lock != prevlock)
> 1373{
> 1374if (prevlock != NULL)
> >>> CID 1592913:  Control flow issues  (DEADCODE)
> >>> Execution cannot reach this statement: "LWLockRelease(prevlock);".
> 1375LWLockRelease(prevlock);
> 1376LWLockAcquire(lock, LW_EXCLUSIVE);
> 1377prevlock = lock;
> 1378}
> 1379
> 1380slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, 
> multi);
>
> And I think it's correct that this is somewhat bogus, or at least
> confusing: the only way to have control back here on line 1371 after
> having executed once is via the "goto retry" line below; and there we
> release "prevlock" and set it to NULL beforehand, so it's impossible for
> prevlock to be NULL.  Looking closer I think this code is all confused,
> so I suggest to rework it as shown in the attached patch.
>
> I'll have a look at the other places where we use this "prevlock" coding
> pattern tomorrow.


+ /* Acquire the bank lock for the page we need. */
  lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
- if (lock != prevlock)
- {
- if (prevlock != NULL)
- LWLockRelease(prevlock);
- LWLockAcquire(lock, LW_EXCLUSIVE);
- prevlock = lock;
- }
+ LWLockAcquire(lock, LW_EXCLUSIVE);

This part is definitely an improvement.

I am not sure about the other changes, I mean that makes the code much
simpler but now we are not releasing the 'MultiXactOffsetCtl' related
bank lock, and later in the following loop, we are comparing that lock
against 'MultiXactMemberCtl' related bank lock. This makes code
simpler because now in the loop we are sure that we are always holding
the lock but I do not like comparing the bank locks for 2 different
SLRUs, although there is no problem as there would not be a common
lock address, anyway, I do not have any strong objection to what you
have done here.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: pgsql: Improve performance of subsystems on top of SLRU

2024-03-03 Thread Tom Lane
Alvaro Herrera  writes:
> And I think it's correct that this is somewhat bogus, or at least
> confusing: the only way to have control back here on line 1371 after
> having executed once is via the "goto retry" line below; and there we
> release "prevlock" and set it to NULL beforehand, so it's impossible for
> prevlock to be NULL.  Looking closer I think this code is all confused,
> so I suggest to rework it as shown in the attached patch.

This is certainly simpler, but I notice that it holds the current
LWLock across the line

ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));

where the old code did not.  Could the palloc take long enough that
holding the lock is bad?

Also, with this coding the "lock = NULL;" assignment just before
"goto retry" is a dead store.  Not sure if Coverity or other static
analyzers would whine about that.

regards, tom lane




Re: pgsql: Improve performance of subsystems on top of SLRU

2024-03-03 Thread Alvaro Herrera
On 2024-Feb-28, Alvaro Herrera wrote:

> Improve performance of subsystems on top of SLRU

Coverity had the following complaint about this commit:


*** CID NNN:  Control flow issues  (DEADCODE)
/srv/coverity/git/pgsql-git/postgresql/src/backend/access/transam/multixact.c: 
1375 in GetMultiXactIdMembers()
1369 * and acquire the lock of the new bank.
1370 */
1371lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
1372if (lock != prevlock)
1373{
1374if (prevlock != NULL)
>>> CID 1592913:  Control flow issues  (DEADCODE)   
>>>   
>>> Execution cannot reach this statement: "LWLockRelease(prevlock);".  
>>>   
1375LWLockRelease(prevlock);
1376LWLockAcquire(lock, LW_EXCLUSIVE);
1377prevlock = lock;
1378}
1379
1380slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, 
multi);

And I think it's correct that this is somewhat bogus, or at least
confusing: the only way to have control back here on line 1371 after
having executed once is via the "goto retry" line below; and there we
release "prevlock" and set it to NULL beforehand, so it's impossible for
prevlock to be NULL.  Looking closer I think this code is all confused,
so I suggest to rework it as shown in the attached patch.

I'll have a look at the other places where we use this "prevlock" coding
pattern tomorrow.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From a82fa380c7ff5a468ca920f4c06e626946bc8ecf Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Sun, 3 Mar 2024 15:20:36 +0100
Subject: [PATCH] rework locking code in GetMultiXactIdMembers

Per Coverity
---
 src/backend/access/transam/multixact.c | 52 ++
 1 file changed, 20 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9b81506145..ec446949a9 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1250,14 +1250,12 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
 	MultiXactOffset offset;
 	int			length;
 	int			truelength;
-	int			i;
 	MultiXactId oldestMXact;
 	MultiXactId nextMXact;
 	MultiXactId tmpMXact;
 	MultiXactOffset nextOffset;
 	MultiXactMember *ptr;
 	LWLock	   *lock;
-	LWLock	   *prevlock = NULL;
 
 	debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
 
@@ -1364,18 +1362,9 @@ retry:
 	pageno = MultiXactIdToOffsetPage(multi);
 	entryno = MultiXactIdToOffsetEntry(multi);
 
-	/*
-	 * If this page falls under a different bank, release the old bank's lock
-	 * and acquire the lock of the new bank.
-	 */
+	/* Acquire the bank lock for the page we need. */
 	lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-	if (lock != prevlock)
-	{
-		if (prevlock != NULL)
-			LWLockRelease(prevlock);
-		LWLockAcquire(lock, LW_EXCLUSIVE);
-		prevlock = lock;
-	}
+	LWLockAcquire(lock, LW_EXCLUSIVE);
 
 	slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, multi);
 	offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
@@ -1410,17 +1399,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page, if this page
 			 * falls under a different bank, release the old bank's lock and
 			 * acquire the lock of the new bank.
 			 */
-			lock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
-			if (prevlock != lock)
+			newlock = SimpleLruGetBankLock(MultiXactOffsetCtl, pageno);
+			if (newlock != lock)
 			{
-LWLockRelease(prevlock);
-LWLockAcquire(lock, LW_EXCLUSIVE);
-prevlock = lock;
+LWLockRelease(lock);
+LWLockAcquire(newlock, LW_EXCLUSIVE);
+lock = newlock;
 			}
 			slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, true, tmpMXact);
 		}
@@ -1432,8 +1423,8 @@ retry:
 		if (nextMXOffset == 0)
 		{
 			/* Corner case 2: next multixact is still being filled in */
-			LWLockRelease(prevlock);
-			prevlock = NULL;
+			LWLockRelease(lock);
+			lock = NULL;
 			CHECK_FOR_INTERRUPTS();
 			pg_usleep(1000L);
 			goto retry;
@@ -1442,14 +1433,11 @@ retry:
 		length = nextMXOffset - offset;
 	}
 
-	LWLockRelease(prevlock);
-	prevlock = NULL;
-
 	ptr = (MultiXactMember *) palloc(length * sizeof(MultiXactMember));
 
 	truelength = 0;
 	prev_pageno = -1;
-	for (i = 0; i < length; i++, offset++)
+	for (int i = 0; i < length; i++, offset++)
 	{
 		TransactionId *xactptr;
 		uint32	   *flagsptr;
@@ -1462,18 +1450,19 @@ retry:
 
 		if (pageno != prev_pageno)
 		{
+			LWLock	   *newlock;
+
 			/*
 			 * Since we're going to access a different SLRU page,