RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-07-26 Thread Chengchao Yu
  #22 0x5630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224
Although ENOSPC is avoided by space reservation in PG, other error codes can still be returned by the OS and produce this stack.

> Ok, we are fixing this. The proposed patch lets LWLockReleaseAll() be called
> before InitBufferPoolBackend() by registering the former after the latter
> into the on_shmem_exit list. Even if it works, I think it's neither clean nor
> safe to register multiple order-sensitive callbacks.

Actually, I think the order of the callbacks mirrors the order in which the 
components were initialized. In patch v2, the specific registration position 
was required so that the cascade exception case also works.
However, we can discuss this, since all we want to ensure is that LWLocks are 
released before AbortBufferIO().
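For context on why registration order matters: on_shmem_exit() callbacks run in reverse order of registration, so registering the lock-release callback after InitBufferPoolBackend() (which registers AtProcExit_Buffers()) makes it run before AtProcExit_Buffers(). The standalone C sketch below is a toy model of that LIFO behaviour only; the callback names merely mirror the real ones, and the registration order is an assumption taken from the earlier fix-deadlock.patch (AtProcExit_Buffers, then ReleaseLWLocks just before StartupXLOG(), then ShutdownXLOG).

```c
#include <assert.h>
#include <string.h>

/* Toy LIFO exit-callback list in the spirit of on_shmem_exit(); not PostgreSQL code. */
typedef void (*exit_callback)(void);

#define MAX_EXIT_CALLBACKS 8
static exit_callback exit_list[MAX_EXIT_CALLBACKS];
static int exit_index = 0;
static char exit_log[128];             /* records the order callbacks ran in */

static void register_exit(exit_callback cb)
{
    assert(exit_index < MAX_EXIT_CALLBACKS);
    exit_list[exit_index++] = cb;
}

static void log_step(const char *name)
{
    assert(strlen(exit_log) + strlen(name) + 2 <= sizeof(exit_log));
    strcat(exit_log, name);
    strcat(exit_log, ";");
}

/* Stand-ins for the real callbacks: each one only records that it ran. */
static void at_proc_exit_buffers(void) { log_step("AtProcExit_Buffers"); }
static void release_lwlocks(void)      { log_step("ReleaseLWLocks"); }
static void shutdown_xlog(void)        { log_step("ShutdownXLOG"); }

/* Run the callbacks newest-first, the way shmem_exit() walks its list. */
static void run_exit_callbacks(void)
{
    while (--exit_index >= 0)
        exit_list[exit_index]();
    exit_index = 0;
}

/* Register in the assumed order and return the order they actually ran in. */
static const char *exit_order_demo(void)
{
    exit_log[0] = '\0';
    exit_index = 0;
    register_exit(at_proc_exit_buffers);   /* via InitBufferPoolBackend() */
    register_exit(release_lwlocks);        /* just before StartupXLOG() */
    register_exit(shutdown_xlog);          /* just after StartupXLOG() */
    run_exit_callbacks();
    return exit_log;
}
```

Calling exit_order_demo() yields "ShutdownXLOG;ReleaseLWLocks;AtProcExit_Buffers;", i.e. the lock-release callback sits between the two, which is exactly the kind of order-sensitivity the quoted review objects to.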

> And the only caller of it is shmem_exit. Moreover, all other call sites call
> LWLockReleaseAll() just before calling it. If that's the case, why don't we
> just release all LWLocks in shmem_exit or in AtProcExit_Buffers before
> calling AbortBufferIO()? I think it's sufficient that AtProcExit_Buffers
> calls it at the beginning. (The comment for the function needs editing.)

Putting LWLockReleaseAll() in AtProcExit_Buffers() is OK; however, putting it 
in shmem_exit() does not work for the cascade exception case.
In fact, putting LWLockReleaseAll() in AtProcExit_Buffers() was my first idea, 
but since other parts of the PG code base prefer to release the locks in other 
callbacks (e.g. the ShutdownAuxiliaryProcess() callback when IsUnderPostmaster 
is true), I followed the same style in patch v2.
But after revisiting the decision, I agree with you, because:
1. It looks cleaner in the code.
2. It avoids trouble if someone forgets to register, or wrongly orders, the 
additional callback.
3. Calling LWLockReleaseAll() a second time is cheap, so it does not add 
noticeable overhead to AtProcExit_Buffers().
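A minimal sketch of the v3 idea, under stated assumptions: the buffer's I/O lock is modeled as a non-reentrant flag, and a second acquire by the same process is reported as failure instead of blocking forever (the real LWLockAcquire() would sleep on its semaphore, as in the stack traces in this thread). All names (ToyLWLock, toy_acquire(), toy_release_all(), and so on) are hypothetical; this is a toy model, not the v3 patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (not PostgreSQL code): a non-reentrant lock is a "held" flag.
 * Re-acquiring a lock this process already holds would sleep forever in
 * the real LWLockAcquire(); here we report it by returning false.
 */
typedef struct
{
    bool held;
} ToyLWLock;

#define TOY_MAX_HELD 4
static ToyLWLock *toy_held_locks[TOY_MAX_HELD];
static int toy_num_held = 0;

static ToyLWLock toy_io_lock;          /* the buffer's I/O lock */

static bool toy_acquire(ToyLWLock *lock)
{
    if (lock->held)
        return false;                  /* self-deadlock in the real system */
    assert(toy_num_held < TOY_MAX_HELD);
    lock->held = true;
    toy_held_locks[toy_num_held++] = lock;
    return true;
}

/* Like LWLockReleaseAll(): releases everything, a no-op if nothing is held. */
static void toy_release_all(void)
{
    while (toy_num_held > 0)
        toy_held_locks[--toy_num_held]->held = false;
}

/* Stand-in for AbortBufferIO(): it must re-acquire the I/O lock. */
static bool toy_abort_buffer_io(void)
{
    if (!toy_acquire(&toy_io_lock))
        return false;
    toy_release_all();
    return true;
}

/* Stand-in for AtProcExit_Buffers(); release_first models the v3 change. */
static bool toy_at_proc_exit_buffers(bool release_first)
{
    if (release_first)
        toy_release_all();
    return toy_abort_buffer_io();
}

/* An IO error is raised while the I/O lock is held, then exit cleanup runs. */
static bool simulate_exit_after_io_error(bool release_first)
{
    toy_io_lock.held = false;
    toy_num_held = 0;
    if (!toy_acquire(&toy_io_lock))    /* StartBufferIO() stand-in takes the lock */
        return false;
    /* ... the read/write fails and FATAL leads to proc_exit() ... */
    return toy_at_proc_exit_buffers(release_first);
}
```

simulate_exit_after_io_error(false) returns false (the toy AbortBufferIO() would wait on a lock its own process holds), while simulate_exit_after_io_error(true) returns true. Because toy_release_all() does nothing when no lock is held, the extra call at the start of the exit callback is harmless, in line with point 3.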

Thus, I have updated the patch (v3) according to your suggestions. Could you 
review it again?
Please let me know if you have more suggestions or feedback.

Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/



fix-deadlock-v3.patch
Description: fix-deadlock-v3.patch


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-03-03 Thread Chengchao Yu
Hi Amit,

Greetings! Thank you so much for your previous feedback!

The old patch no longer applies to the latest master branch, so I have rebased 
it; it now applies to the latest master without conflicts.

By the way, now that the commitfest entry has been created 
(https://commitfest.postgresql.org/22/2003/), are there places that could be 
improved? Could you give some suggestions? Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


fix-deadlock-v2.patch
Description: fix-deadlock-v2.patch


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-18 Thread Chengchao Yu
Thank you so much, Amit! I have registered the patch in the commitfest:
https://commitfest.postgresql.org/22/2003/

Please let me know if you have more suggestions. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-Original Message-
From: Amit Kapila  
Sent: Friday, February 1, 2019 6:58 PM
To: Chengchao Yu 
Cc: Thomas Munro ; Pg Hackers 
; Prabhat Tripathi ; Sunil 
Kamath ; Michal Primke ; 
TEJA Mupparti 
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs

On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu  wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedback! Apologies, I just saw both messages.
>
> > We generally reserve the space in a relation before attempting to write, so 
> > not sure how you are able to hit the disk full situation via mdwrite.  If 
> > you see the description of the function, that also indicates same.
>
> Absolutely agree, this isn’t a PG issue. The issue manifests for us at
> Microsoft because our own storage layer treats mdextend() actions as only
> setting the length of the file. We have a workaround, and no change is
> needed in Postgres.
>
> > I am not telling that mdwrite can never lead to error, but just trying to 
> > understand the issue you actually faced.  I haven't read your proposed 
> > solution yet, let's first try to establish the problem you are facing.
>
> We see transient IO errors while reading a block, after which the PG
> instance deadlocks in single user mode until we kill it. The stack trace
> below shows the behavior: mdread() fails while the buffer's LWLock is held.
> Because no callback releases the lock in single user mode, AbortBufferIO()
> tries to acquire the same lock again and waits for it indefinitely.
>

I think you can register your patch for next CF [1] so that we don't forget 
about it.

[1] - https://commitfest.postgresql.org/22/

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix-deadlock.patch
Description: fix-deadlock.patch


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-01 Thread Chengchao Yu
Hi Amit, Thomas,

Thank you very much for your feedback! Apologies, I just saw both messages.

> We generally reserve the space in a relation before attempting to write, so 
> not sure how you are able to hit the disk full situation via mdwrite.  If you 
> see the description of the function, that also indicates same.

Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft 
because our own storage layer treats mdextend() actions as only setting the 
length of the file. We have a workaround, and no change is needed in Postgres.
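To illustrate the difference being described: on most filesystems, extending a file by writing zero-filled blocks makes the filesystem allocate space up front (which, as far as we understand, is how the space reservation in mdextend() works), whereas only setting the file length, for example with ftruncate(), typically creates a sparse file, so a later write can still fail with ENOSPC. The POSIX C sketch below is a hypothetical demonstration, not PostgreSQL or Azure storage code; the 8192-byte block size mirrors PostgreSQL's default BLCKSZ, and the exact allocation figures depend on the filesystem.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ  8192   /* PostgreSQL's default block size */
#define TOY_NBLOCKS 4

/*
 * Extend a new temporary file to TOY_NBLOCKS blocks, either by writing
 * zero-filled blocks (write_zeros != 0) or by only setting its length
 * with ftruncate(). Stores the resulting file size in *size_out and
 * returns the bytes the filesystem reports as allocated, or -1 on error.
 */
static long long extend_and_measure(int write_zeros, long long *size_out)
{
    static char zeros[TOY_BLCKSZ];         /* zero-initialized */
    char path[] = "/tmp/extend-demo-XXXXXX";
    struct stat st;
    int rc = 0;

    assert(size_out != NULL);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* file goes away once fd is closed */

    if (write_zeros)
    {
        for (int i = 0; i < TOY_NBLOCKS && rc == 0; i++)
            if (pwrite(fd, zeros, TOY_BLCKSZ, (off_t) i * TOY_BLCKSZ) != TOY_BLCKSZ)
                rc = -1;
    }
    else if (ftruncate(fd, (off_t) TOY_NBLOCKS * TOY_BLCKSZ) != 0)
        rc = -1;

    if (rc == 0 && fstat(fd, &st) != 0)
        rc = -1;
    close(fd);
    if (rc != 0)
        return -1;

    *size_out = (long long) st.st_size;
    /* st_blocks counts 512-byte units on Linux; POSIX leaves the unit open. */
    return (long long) st.st_blocks * 512;
}
```

Both variants report the same file size, but on common Linux filesystems such as ext4 or tmpfs the ftruncate() variant usually reports no allocated blocks, while the zero-write variant reports at least 32768 bytes. Filesystems that compress or deduplicate zeros may report less for both.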

> I am not telling that mdwrite can never lead to error, but just trying to 
> understand the issue you actually faced.  I haven't read your proposed 
> solution yet, let's first try to establish the problem you are facing.

We see transient IO errors while reading a block, after which the PG instance 
deadlocks in single user mode until we kill it. The stack trace below shows the 
behavior: mdread() fails while the buffer's LWLock is held. Because no callback 
releases the lock in single user mode, AbortBufferIO() tries to acquire the 
same lock again and waits for it indefinitely.

Here is the stack trace:

0a 0004`8080cc30 0004`80dcf917 postgres!PGSemaphoreLock+0x65 [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
0b 0004`8080cc90 0004`80db025c postgres!LWLockAcquire+0x137 [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
0c 0004`8080ccd0 0004`80db25db postgres!AbortBufferIO+0x2c [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
0d 0004`8080cd20 0004`80dbce36 postgres!AtProcExit_Buffers+0xb [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
0e 0004`8080cd50 0004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
0f 0004`8080cd80 0004`80dbccfd postgres!proc_exit_prepare+0x4d [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
10 0004`8080cdb0 0004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
11 0004`8080cde0 0004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
12 0004`8080ce50 0004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]
13 0004`8080cea0 0004`80daeb70 postgres!ReadBuffer_common+0x2c9 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 897]
14 0004`8080cf30 0004`80b81322 postgres!ReadBufferWithoutRelcache+0x60 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 694]
15 0004`8080cf90 0004`80db9cbb postgres!XLogReadBufferExtended+0x142 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c @ 513]
16 0004`8080cff0 0004`80b2f53a postgres!XLogRecordPageWithFreeSpace+0xbb [d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c @ 254]
17 0004`8080d030 0004`80b6eb94 postgres!heap_xlog_insert+0x36a [d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c @ 8491]
18 0004`8080f0d0 0004`80f0a13f postgres!StartupXLOG+0x1f84 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c @ 7480]
19 0004`8080fbf0 0004`80de121e postgres!InitPostgres+0x12f [d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c @ 656]
1a 0004`8080fcd0 0004`80c92c31 postgres!PostgresMain+0x25e [d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c @ 3881]
1b 0004`8080fed0 0004`80f51df3 postgres!main+0x491 [d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235]

Please let us know if you have more feedback. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


-Original Message-
From: Thomas Munro  
Sent: Thursday, January 24, 2019 2:32 PM
To: Amit Kapila 
Cc: Chengchao Yu ; Pg Hackers 
; Prabhat Tripathi ; Sunil 
Kamath ; Michal Primke 
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs

On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila  wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu  wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including 
> > disk full, random remote disk IO failures) happens in single user mode. We 
> > found the issue exists on both Linux and Windows in multiple postgres 
> > versions.
> >
> > 3.   Because the unable to write relation data scenario is difficult to 
> > hit naturally even reserved space is turned off, I have prepared a small 
> > patch (see attachment “emulate-error.patch”) to force an error when PG 
> > tries to write data to relation files. We can just apply the patch and 
> > there is no need to put efforts flooding data to disk any more.
>
> I 

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-02 Thread Chengchao Yu
Greetings,

Happy new year!

We would like to follow up again on this issue and the fix proposal. Could 
someone give some suggestions on the fix proposal, or other ideas to fix this 
issue?

Looking forward to your feedback!


Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2018-12-19 Thread Chengchao Yu
Greetings,

We would just like to follow up on this issue and the fix proposal. We really 
would like to have this issue fixed in PG. Could someone give some suggestions 
on the fix proposal, or other ideas to fix this issue?

Looking forward to your feedback!


Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/

From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers' 
Cc: Prabhat Tripathi ; Sunil Kamath 
; Michal Primke 
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs


Greetings,

Recently, we have hit several deadlocks when an IO failure (including disk full 
and random remote disk IO failures) occurs in single user mode. We found that 
the issue exists on both Linux and Windows, in multiple PostgreSQL versions.

Here are the steps to reproduce it on Linux (the Windows repro is similar):


1.   Get the latest PostgreSQL code, then build and install the executables.

$ git clone https://git.postgresql.org/git/postgresql.git
$ cd postgresql
$ PGROOT=$(pwd)
$ git checkout REL_11_STABLE
$ mkdir build
$ cd build
$ ../configure --prefix=/path/to/postgres
$ make && make install


2.   Run initdb to initialize a PG data directory.

$ /path/to/postgres/bin/initdb -D /path/to/data


3.   Because the "unable to write relation data" scenario is difficult to hit 
naturally, even with space reservation turned off, I have prepared a small patch 
(see attachment "emulate-error.patch") that forces an error whenever PG tries to 
write data to relation files. With the patch applied, there is no need to fill 
the disk with data.

$ cd $PGROOT
$ git apply /path/to/emulate-error.patch
$ cd build
$ make && make install
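The emulate-error.patch attachment is not reproduced in this thread. As a rough illustration of the fault-injection technique it describes, here is a standalone C sketch of a write routine that fails on demand. toy_mdwrite() and emulate_write_error are hypothetical names; only the message text is taken from the logs below, and the real patch may inject the error differently.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fault-injection switch; not a real PostgreSQL setting. */
static bool emulate_write_error = false;

/*
 * Toy stand-in for a relation-block write. When fault injection is on,
 * fail with EIO before touching the file, mimicking an IO error from the
 * OS. Returns 0 on success, or -1 with errno set.
 */
static int toy_mdwrite(int fd, const void *buf, size_t len, off_t offset)
{
    assert(len > 0);
    if (emulate_write_error)
    {
        fprintf(stderr, "Emulate exception in mdwrite() when writing to disk\n");
        errno = EIO;                   /* set after fprintf(), which may touch errno */
        return -1;
    }

    ssize_t written = pwrite(fd, buf, len, offset);
    if (written < 0)
        return -1;                     /* errno set by pwrite() */
    if ((size_t) written != len)
    {
        errno = ENOSPC;                /* treat a short write as "disk full" */
        return -1;
    }
    return 0;
}
```

With the switch on, every call fails with EIO regardless of the file descriptor, which is what makes the deadlock reproducible without actually filling the disk.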


4.   Connect to the newly initialized database cluster in single user mode, 
create a table, insert some data into it, and then either run a checkpoint or 
directly send EOF. We then hit the deadlock, and the process does not exit 
until we kill it.

Do a checkpoint explicitly:

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> checkpoint;
> EOF

PostgreSQL stand-alone backend 11.1
backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247
2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247

Or directly send EOF:

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> EOF

PostgreSQL stand-alone backend 11.1
backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation base/12368/1247


5.   Moreover, when we try to recover the database in single user mode, we hit 
the issue again: the process neither brings up the database nor exits.

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true
2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC
2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut down; automatic recovery in progress
2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40
2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: wanted 24, got 0
2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68
2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log time 2018-11-29 02:58:49.856663+00
2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation base/12368/1247



Analysis:

So, what happened? There are actually two types of deadlock, with the same root 
cause. Let's first look at the scenario in step #5. There, the deadlock happens 
when a disk IO failure occurs inside StartupXLOG(). If we attach a debugger to 
the PG process, we see that it is stuck acquiring the buffer's LWLock in 
AbortBufferIO():



void
AbortBufferIO(void)
{
	BufferDesc *buf = InProgressBuf;

	if (buf)
	{
		uint32		buf_state;

		/*
		 * Since LWLockReleaseAll has already been called, we're not holding
		 * the buffer's io_in_progress_lock. We have to re-acquire 

[PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2018-11-30 Thread Chengchao Yu
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 62baaf0ab3..d74e8aa1d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
 static void PerformAuthentication(Port *port);
 static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
 static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
 static void ShutdownPostgres(int code, Datum arg);
 static void StatementTimeoutHandler(void);
 static void LockTimeoutHandler(void);
@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 		 * way, start up the XLOG machinery, and register to have it closed
 		 * down at exit.
 		 */
+		on_shmem_exit(ReleaseLWLocks, 0);
 		StartupXLOG();
 		on_shmem_exit(ShutdownXLOG, 0);
 	}
@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)
 	heap_close(relsetting, AccessShareLock);
 }
 
+/*
+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ *  (1) Exceptions thrown during StartupXLOG()
+ *  (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+	LWLockReleaseAll();
+}
+
 /*
  * Backend-shutdown callback.  Do cleanup that we want to be sure happens
  * before all the supporting modules begin to nail their doors shut via



The fix proposal is also attached to this email in file "fix-deadlock.patch".
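The ordering requirement in the patch comment can be modeled in a few lines of C. This is a toy model under assumptions: as we understand PostgreSQL's shmem_exit(), each callback is popped from the list before it is invoked, so when a callback fails with FATAL, the re-entered exit path resumes with the remaining callbacks instead of rerunning the failed one. Here setjmp()/longjmp() stand in for ereport(FATAL), and every name below is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (not PostgreSQL code) of a cascading FATAL during exit.
 * longjmp() stands in for ereport(FATAL) re-entering the exit path.
 */
typedef void (*exit_cb)(void);

#define MAX_CB 8
static exit_cb cb_list[MAX_CB];
static int cb_index = 0;
static jmp_buf *fatal_target;          /* where a toy FATAL jumps to */

static bool io_lock_held;              /* the buffer's non-reentrant I/O lock */
static bool would_deadlock;            /* AbortBufferIO() would wait forever */

static void register_cb(exit_cb cb)
{
    assert(cb_index < MAX_CB);
    cb_list[cb_index++] = cb;
}

/*
 * Pop each entry *before* calling it, so a callback that raises FATAL is
 * not run again; the exit path resumes with the remaining callbacks.
 */
static void run_exit_cbs(void)
{
    jmp_buf resume;

    fatal_target = &resume;
    (void) setjmp(resume);             /* a toy FATAL lands back here */
    while (--cb_index >= 0)
        cb_list[cb_index]();
    cb_index = 0;
    fatal_target = NULL;
}

static void toy_fatal(void) { longjmp(*fatal_target, 1); }

static void toy_at_proc_exit_buffers(void)
{
    if (io_lock_held)                  /* AbortBufferIO() re-acquires the lock */
        would_deadlock = true;
}

static void toy_release_lwlocks(void) { io_lock_held = false; }

static void toy_shutdown_xlog(void)
{
    io_lock_held = true;               /* shutdown checkpoint starts a buffer write */
    toy_fatal();                       /* ... and the write fails */
}

/* Returns true if exit completes without the toy self-deadlock. */
static bool cascade_demo(bool with_release_callback)
{
    io_lock_held = false;
    would_deadlock = false;
    cb_index = 0;
    register_cb(toy_at_proc_exit_buffers);
    if (with_release_callback)
        register_cb(toy_release_lwlocks);
    register_cb(toy_shutdown_xlog);
    run_exit_cbs();
    return !would_deadlock;
}
```

cascade_demo(false) returns false: the toy ShutdownXLOG() fails while holding the I/O lock, and the toy AtProcExit_Buffers() then finds it still held. cascade_demo(true) returns true because the lock-release callback, registered between the two, runs in between.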



Please let us know if you have any suggestions on this issue and the fix.

Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/



emulate-error.patch
Description: emulate-error.patch


fix-deadlock.patch
Description: fix-deadlock.patch