Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2020-01-31 Thread Amit Kapila

On Mon, Nov 25, 2019 at 1:17 PM Michael Paquier  wrote:
>
> On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
> > The only difference is in the last line where for me it gives
> > assertion failure when trying to do ReleaseAuxProcessResources.  Below
> > is the callstack:
>
> No need for Windows on this one and I have reproduced easily the same
> trace as Amit.  The patch has been moved to next CF.  Chengchao, could
> you provide an update please?
>

I have marked this patch as "Returned with feedback" because it has
been a long time since the author last responded.  Feel free to submit
a new patch for the next CF.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-11-24 Thread Michael Paquier

On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
> The only difference is in the last line where for me it gives
> assertion failure when trying to do ReleaseAuxProcessResources.  Below
> is the callstack:

No need for Windows on this one and I have reproduced easily the same
trace as Amit.  The patch has been moved to next CF.  Chengchao, could
you provide an update please?
--
Michael



Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-09-09 Thread Amit Kapila

On Sat, Jul 27, 2019 at 6:22 AM Chengchao Yu  wrote:
>
> Thus, I have updated the patch v3 according to your suggestions. Could you 
> help to review again?
> Please let me know should you have more suggestions or feedbacks.
>

I have looked into this patch, and I don't think it fixes the
problem.  I tried the commands you suggested in single-user mode:
create table, insert, and then checkpoint.  What I see is almost the
same behavior you described in one of the earlier emails, with a
slight difference that makes me think the fix you are proposing is
not correct.  Below is what you wrote:

"The second type is in Step #4. At the time when “checkpoint” SQL
command is being executed, PG has already set up the before_shmem_exit
callback ShutdownPostgres(), which releases all lw-locks given
transaction or sub-transaction is on-going. So after the first IO
error, the buffer page’s lw-lock gets released successfully. However,
later ShutdownXLOG() is invoked, and PG tries to flush buffer pages
again, which results in the second IO error. Different from the first
time, this time, all the previous executed before/on_shmem_exit
callbacks are not invoked again due to the decrease of the indexes. So
lw-locks for buffer pages are not released when PG tries to get the
same buffer lock in AbortBufferIO(), and then PG process gets stuck."
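
The "decrease of the indexes" in the quoted text can be modeled roughly as
below. This is a standalone toy, not PostgreSQL source: exit_list,
run_exit_callbacks(), flush_cb() and the other names are hypothetical, and a
nested call stands in for the error path that re-enters proc_exit(). It only
illustrates that a callback which has already run is not run again when exit
processing is re-entered.

```c
#include <assert.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback) (int code);

static exit_callback exit_list[MAX_CALLBACKS];
static int	exit_index = 0;		/* number of callbacks still pending */
static int	held_locks = 0;		/* toy stand-in for held lw-locks */
static int	release_calls = 0;	/* how often the release callback ran */

static void
register_exit_callback(exit_callback fn)
{
	assert(exit_index < MAX_CALLBACKS);
	exit_list[exit_index++] = fn;
}

/*
 * Run pending callbacks in reverse registration order.  The index is
 * decremented before each call, so a re-entrant invocation continues
 * below the callback that is currently running and never revisits
 * callbacks that were already started.
 */
static void
run_exit_callbacks(int code)
{
	while (--exit_index >= 0)
		exit_list[exit_index] (code);
	exit_index = 0;
}

/* Toy counterpart of a "release all locks" exit callback. */
static void
release_locks_cb(int code)
{
	(void) code;
	held_locks = 0;
	release_calls++;
}

/* Toy counterpart of a flush at exit: takes a lock, then the write fails. */
static void
flush_cb(int code)
{
	held_locks++;				/* lock taken for the write */
	run_exit_callbacks(code);	/* the error re-enters exit processing */
}

/* Returns the number of locks still held after exit processing. */
static int
simulate_exit(void)
{
	exit_index = 0;
	held_locks = 0;
	release_calls = 0;
	register_exit_callback(flush_cb);			/* registered first, runs last */
	register_exit_callback(release_locks_cb);	/* registered last, runs first */
	run_exit_callbacks(1);
	return held_locks;
}
```

In this toy, simulate_exit() leaves one lock held: the release callback runs
once, before the failing flush, and the re-entered loop has nothing left to
run that would release the lock taken afterwards.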

The only difference is in the last line where for me it gives
assertion failure when trying to do ReleaseAuxProcessResources.  Below
is the callstack:

  postgres.exe!ExceptionalCondition(const char *
conditionName=0x00db0c78, const char * errorType=0x00db0c68, const
char * fileName=0x00db0c18, int lineNumber=1722)  Line 55 C
  postgres.exe!UnpinBuffer(BufferDesc * buf=0x052a104c, bool
fixOwner=true)  Line 1722 + 0x2f bytes C
  postgres.exe!ReleaseBuffer(int buffer=96)  Line 3367 + 0x17 bytes C
  postgres.exe!ResourceOwnerReleaseInternal(ResourceOwnerData *
owner=0x0141f6e8, 
phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool
isTopLevel=true)  Line 526 + 0x9 bytes C
  postgres.exe!ResourceOwnerRelease(ResourceOwnerData *
owner=0x0141f6e8, 
phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool
isTopLevel=true)  Line 484 + 0x17 bytes C
  postgres.exe!ReleaseAuxProcessResources(bool isCommit=false)  Line
861 + 0x15 bytes C
> postgres.exe!ReleaseAuxProcessResourcesCallback(int code=1, unsigned int arg=0)  Line 881 + 0xa bytes C
  postgres.exe!shmem_exit(int code=1)  Line 272 + 0x1f bytes C
  postgres.exe!proc_exit_prepare(int code=1)  Line 194 + 0x9 bytes C
  postgres.exe!proc_exit(int code=1)  Line 107 + 0x9 bytes C
  postgres.exe!errfinish(int dummy=0, ...)  Line 538 + 0x7 bytes C
  postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber
forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 713 + 0x4c bytes C
  postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140,
ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 587 + 0x24 bytes C
  postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c,
SMgrRelationData * reln=0x0147e140)  Line 2759 + 0x1d bytes C
  postgres.exe!SyncOneBuffer(int buf_id=95, bool
skip_recently_used=false, WritebackContext * wb_context=0x012ccea0)
Line 2402 + 0xb bytes C
  postgres.exe!BufferSync(int flags=5)  Line 1992 + 0x15 bytes C
  postgres.exe!CheckPointBuffers(int flags=5)  Line 2586 + 0x9 bytes C
  postgres.exe!CheckPointGuts(unsigned __int64
checkPointRedo=22933176, int flags=5)  Line 8991 + 0x9 bytes C
  postgres.exe!CreateCheckPoint(int flags=5)  Line 8780 + 0x11 bytes C
  postgres.exe!ShutdownXLOG(int code=1, unsigned int arg=0)  Line 8333
+ 0x7 bytes C
  postgres.exe!shmem_exit(int code=1)  Line 272 + 0x1f bytes C
  postgres.exe!proc_exit_prepare(int code=1)  Line 194 + 0x9 bytes C
  postgres.exe!proc_exit(int code=1)  Line 107 + 0x9 bytes C
  postgres.exe!errfinish(int dummy=0, ...)  Line 538 + 0x7 bytes C
  postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber
forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 713 + 0x4c bytes C
  postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140,
ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char *
buffer=0x0542dd00, bool skipFsync=false)  Line 587 + 0x24 bytes C
  postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c,
SMgrRelationData * reln=0x0147e140)  Line 2759 + 0x1d bytes C
  postgres.exe!SyncOneBuffer(int buf_id=95, bool
skip_recently_used=false, WritebackContext * wb_context=0x012ce580)
Line 2402 + 0xb bytes C
  postgres.exe!BufferSync(int flags=44)  Line 1992 + 0x15 bytes C
  postgres.exe!CheckPointBuffers(int flags=44)  Line 2586 + 0x9 bytes C
  postgres.exe!CheckPointGuts(unsigned __int64
checkPointRedo=22933176, int flags=44)  Line 8991 + 0x9 bytes C
  postgres.exe!CreateCheckPoint(int flags=44)  Line 8780 + 0x11 bytes C

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-07-26 Thread Chengchao Yu

Hi Kyotaro,

Thank you so much for your valued feedback and suggestions!

> I assume that we are in a consensus about the problem we are to  fix here.
> 
> > 0a 0004`8080cc30 0004`80dcf917 postgres!PGSemaphoreLock+0x65
> > [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
> > 0b 0004`8080cc90 0004`80db025c postgres!LWLockAcquire+0x137
> > [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
> > 0c 0004`8080ccd0 0004`80db25db postgres!AbortBufferIO+0x2c
> > [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
> > 0d 0004`8080cd20 0004`80dbce36 postgres!AtProcExit_Buffers+0xb
> > [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
> > 0e 0004`8080cd50 0004`80dbd1bd postgres!shmem_exit+0xf6
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
> > 0f 0004`8080cd80 0004`80dbccfd postgres!proc_exit_prepare+0x4d
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
> > 10 0004`8080cdb0 0004`80ef9e74 postgres!proc_exit+0xd
> > [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
> > 11 0004`8080cde0 0004`80ddb6ef postgres!errfinish+0x204
> > [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
> > 12 0004`8080ce50 0004`80db0f59 postgres!mdread+0x12f
> > [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]

Yes, this is one of the two situations we want to fix. The other situation is a
cascading exception case like the following.

  #0  0x7f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, 
abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at 
../sysdeps/unix/sysv/linux/futex-internal.h:205
  #1  do_futex_wait (sem=sem(at)entry=0x7f0fd14c81b8, abstime=0x0) at 
sem_waitcommon.c:111
  #2  0x7f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, 
abstime=0x0) at sem_waitcommon.c:181
  #3  0x5630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at 
pg_sema.c:316
  #4  0x5630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, 
mode=LW_EXCLUSIVE) at 
/path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243
  #5  0x5630d47cd087 in AbortBufferIO () at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988
  #6  0x5630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473
  #7  0x5630d47dbc32 in shmem_exit (code=1) at 
/path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
  #8  0x5630d47dba5e in proc_exit_prepare (code=1) at 
/path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
  #9  0x5630d47db9c6 in proc_exit (code=1) at 
/path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
  #10 0x5630d49811bc in errfinish (dummy=0) at 
/path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
  #11 0x5630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, 
blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at 
/path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
  #12 0x5630d4804716 in smgrwrite (reln=0x5630d6588c68, 
forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at 
/path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
  #13 0x5630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, 
reln=0x5630d6588c68) at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
  #14 0x5630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, 
wb_context=0x7ffccc371a70) at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
  #15 0x5630d47cab00 in BufferSync (flags=6) at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
  #16 0x5630d47cb57f in CheckPointBuffers (flags=6) at 
/path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
  #17 0x5630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) 
at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
  #18 0x5630d44a62cf in CreateCheckPoint (flags=6) at 
/path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
  #19 0x5630d44a45e3 in StartupXLOG () at 
/path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723
  #20 0x5630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", 
dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, 
override_allow_connections=false)
  at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636
  #21 0x5630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, 
dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at 
/path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810
  #22 0x5630d4695e8b in main (argc=7, argv=0x5630d6534d20) at 
/path/to/postgres/source/build/../src/backend/main/main.c:224
Though ENOSPC is avoided by reservation in PG,

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-07-24 Thread Kyotaro Horiguchi

Sorry in advance for the link-breaking message forced by Gmail.

https://www.postgresql.org/message-id/flat/cy4pr2101mb0804ce9836e582c0702214e8aa...@cy4pr2101mb0804.namprd21.prod.outlook.com

I assume that we have a consensus about the problem we are to fix
here.

> 0a 0004`8080cc30 0004`80dcf917 postgres!PGSemaphoreLock+0x65 
> [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
> 0b 0004`8080cc90 0004`80db025c postgres!LWLockAcquire+0x137 
> [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
> 0c 0004`8080ccd0 0004`80db25db postgres!AbortBufferIO+0x2c 
> [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
> 0d 0004`8080cd20 0004`80dbce36 postgres!AtProcExit_Buffers+0xb 
> [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
> 0e 0004`8080cd50 0004`80dbd1bd postgres!shmem_exit+0xf6 
> [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
> 0f 0004`8080cd80 0004`80dbccfd postgres!proc_exit_prepare+0x4d 
> [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
> 10 0004`8080cdb0 0004`80ef9e74 postgres!proc_exit+0xd 
> [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
> 11 0004`8080cde0 0004`80ddb6ef postgres!errfinish+0x204 
> [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
> 12 0004`8080ce50 0004`80db0f59 postgres!mdread+0x12f 
> [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]

Ok, we are fixing this. The proposed patch causes LWLockReleaseAll()
to be called before the exit callback that InitBufferPoolBackend()
registers (AtProcExit_Buffers), by registering the former after the
latter in the on_shmem_exit list. Even if it works, I think it's
neither clean nor safe to register multiple order-sensitive callbacks.

AtProcExit_Buffers has the following comment:

> * During backend exit, ensure that we released all shared-buffer locks and
> * assert that we have no remaining pins.

And its only caller is shmem_exit. Moreover, all other call sites of
AbortBufferIO() call LWLockReleaseAll() just before calling it. If
that's the case, why don't we just release all LWLocks in shmem_exit
or in AtProcExit_Buffers before calling AbortBufferIO()? I think it's
sufficient for AtProcExit_Buffers to call LWLockReleaseAll() at the
beginning. (The comment for the function needs editing.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-03-03 Thread Chengchao Yu

Hi Amit,

Greetings! Thank you so much for your previous feedback!

It seems the old patch is broken against the latest master branch, so I have
rebased it. It now applies to the latest master without conflicts.

By the way, now that the commitfest entry has been created
(https://commitfest.postgresql.org/22/2003/),
are there any places that could be improved? Could you give some suggestions?
Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/

fix-deadlock-v2.patch
Description: fix-deadlock-v2.patch

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-18 Thread Chengchao Yu

Thank you so much Amit! I have created the patch below:
https://commitfest.postgresql.org/22/2003/

Please let me know should you have more suggestions. Thank you!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


fix-deadlock.patch
Description: fix-deadlock.patch

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-01 Thread Amit Kapila

On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu  wrote:
>
> Hi Amit, Thomas,
>
> Thank you very much for your feedbacks! Apologizes but I just saw both 
> messages.
>
> > We generally reserve the space in a relation before attempting to write, so 
> > not sure how you are able to hit the disk full situation via mdwrite.  If 
> > you see the description of the function, that also indicates same.
>
> Absolutely agree, this isn’t a PG issue. Issue manifest for us at Microsoft 
> due to our own storage layer which treat mdextend() actions as setting length 
> of the file only. We have a workaround, and any change isn’t needed for 
> Postgres.
>
> > I am not telling that mdwrite can never lead to error, but just trying to 
> > understand the issue you actually faced.  I haven't read your proposed 
> > solution yet, let's first try to establish the problem you are facing.
>
> We see transient IO errors reading a block where PG instance gets dead-lock 
> in single user mode until we kill the instance. The stack trace below shows 
> the behavior which is when mdread() failed with buffer holding its lw-lock. 
> This happens because in single user mode there is no call back to release the 
> lock and when AbortBufferIO() tries to acquire the same lock again, it will 
> wait for the lock indefinitely.
>

I think you can register your patch for next CF [1] so that we don't
forget about it.

[1] - https://commitfest.postgresql.org/22/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-02-01 Thread Chengchao Yu

Hi Amit, Thomas,

Thank you very much for your feedback! Apologies, but I only just saw both messages.

> We generally reserve the space in a relation before attempting to write, so 
> not sure how you are able to hit the disk full situation via mdwrite.  If you 
> see the description of the function, that also indicates same.

Absolutely agreed, this isn't a PG issue. The issue manifests for us at
Microsoft because our own storage layer treats mdextend() actions as only
setting the length of the file. We have a workaround, and no change is needed
in Postgres for that.

> I am not telling that mdwrite can never lead to error, but just trying to 
> understand the issue you actually faced.  I haven't read your proposed 
> solution yet, let's first try to establish the problem you are facing.

We see transient IO errors when reading a block, after which the PG instance
deadlocks in single user mode until we kill it. The stack trace below shows
this behavior, which occurs when mdread() fails while the buffer's lw-lock is
still held. This happens because in single user mode there is no callback to
release the lock, so when AbortBufferIO() tries to acquire the same lock
again, it waits for the lock indefinitely.

Here is the stack trace:

0a 0004`8080cc30 0004`80dcf917 postgres!PGSemaphoreLock+0x65 
[d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158] 
0b 0004`8080cc90 0004`80db025c postgres!LWLockAcquire+0x137 
[d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234] 
0c 0004`8080ccd0 0004`80db25db postgres!AbortBufferIO+0x2c 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995] 
0d 0004`8080cd20 0004`80dbce36 postgres!AtProcExit_Buffers+0xb 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479] 
0e 0004`8080cd50 0004`80dbd1bd postgres!shmem_exit+0xf6 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262] 
0f 0004`8080cd80 0004`80dbccfd postgres!proc_exit_prepare+0x4d 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188] 
10 0004`8080cdb0 0004`80ef9e74 postgres!proc_exit+0xd 
[d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141] 
11 0004`8080cde0 0004`80ddb6ef postgres!errfinish+0x204 
[d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624] 
12 0004`8080ce50 0004`80db0f59 postgres!mdread+0x12f 
[d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806] 
13 0004`8080cea0 0004`80daeb70 postgres!ReadBuffer_common+0x2c9 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 897] 
14 0004`8080cf30 0004`80b81322 postgres!ReadBufferWithoutRelcache+0x60 
[d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 694] 
15 0004`8080cf90 0004`80db9cbb postgres!XLogReadBufferExtended+0x142 
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c @ 513] 
16 0004`8080cff0 0004`80b2f53a 
postgres!XLogRecordPageWithFreeSpace+0xbb 
[d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c @ 254] 
17 0004`8080d030 0004`80b6eb94 postgres!heap_xlog_insert+0x36a 
[d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c @ 8491] 
18 0004`8080f0d0 0004`80f0a13f postgres!StartupXLOG+0x1f84 
[d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c @ 7480] 
19 0004`8080fbf0 0004`80de121e postgres!InitPostgres+0x12f 
[d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c @ 656] 
1a 0004`8080fcd0 0004`80c92c31 postgres!PostgresMain+0x25e 
[d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c @ 3881] 
1b 0004`8080fed0 0004`80f51df3 postgres!main+0x491 
[d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235] 

Please let us know if you have more feedback. Thank you!
 
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-24 Thread Thomas Munro

On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila  wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu  wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including 
> > disk full, random remote disk IO failures) happens in single user mode. We 
> > found the issue exists on both Linux and Windows in multiple postgres 
> > versions.
> >
> > 3.   Because the unable to write relation data scenario is difficult to 
> > hit naturally even reserved space is turned off, I have prepared a small 
> > patch (see attachment “emulate-error.patch”) to force an error when PG 
> > tries to write data to relation files. We can just apply the patch and 
> > there is no need to put efforts flooding data to disk any more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
> BlockNumber blocknum,
> nbytes,
> BLCKSZ);
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("Emulate exception in mdwrite() when writing to disk")));
> +
>
> We generally reserve the space in a relation before attempting to
> write, so not sure how you are able to hit the disk full situation via
> mdwrite.  If you see the description of the function, that also
> indicates same.

Presumably ZFS or BTRFS or something more exotic could still get
ENOSPC here, and of course any filesystem could give us EIO here
(because the disk is on fire or the remote NFS server is rebooting due
to an automatic Windows update).
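
As a generic illustration of how such failures surface to a caller, here is a
small POSIX sketch. It is not taken from md.c: write_block_at() is a
hypothetical helper, and treating a short write as ENOSPC is an assumption
made for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): write one block at a
 * given offset.  Returns 0 on success, otherwise an errno value.
 */
static int
write_block_at(int fd, const void *buf, size_t len, off_t offset)
{
	ssize_t		written;

	assert(buf != NULL && len > 0);

	errno = 0;
	written = pwrite(fd, buf, len, offset);
	if (written < 0)
		return errno;			/* for example EIO, ENOSPC or EBADF */
	if ((size_t) written != len)
		return ENOSPC;			/* short write: assume out of space */
	return 0;
}
```

Whatever errno comes back, the point relevant to this thread is that the
caller can see a failure on a block that already exists, so the error path
has to cope with any locks held at that moment.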

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-19 Thread Amit Kapila

On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu  wrote:
>
>
> Recently, we hit a few occurrences of deadlock when IO failure (including 
> disk full, random remote disk IO failures) happens in single user mode. We 
> found the issue exists on both Linux and Windows in multiple postgres 
> versions.
>
>
> 3.   Because the unable to write relation data scenario is difficult to 
> hit naturally even reserved space is turned off, I have prepared a small 
> patch (see attachment “emulate-error.patch”) to force an error when PG tries 
> to write data to relation files. We can just apply the patch and there is no 
> need to put efforts flooding data to disk any more.
>
>

I have one question related to the way you have tried to emulate the error.

@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+

We generally reserve the space in a relation before attempting to
write, so I am not sure how you are able to hit the disk-full situation
via mdwrite.  The description of the function indicates the same:

/*
 * mdwrite() -- Write the supplied block at the appropriate location.
 *
 * This is to be used only for updating already-existing blocks of a
 * relation (ie, those before the current EOF).  To extend a relation,
 * use mdextend().
 */

I am not saying that mdwrite can never fail, but I am just trying to
understand the issue you actually faced.  I haven't read your proposed
solution yet; let's first establish the problem you are facing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-02 Thread Chengchao Yu

Greetings,

Happy new year!

We would like to follow up again on this issue and the fix proposal. Could
someone give some suggestions on the fix proposal, or other ideas for fixing
this issue?

Looking forward to your feedback!

Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/


From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers'
Cc: Prabhat Tripathi; Sunil Kamath; Michal Primke
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs


Greetings,



Recently, we hit a few occurrences of deadlock when IO failure (including disk 
full, random remote disk IO failures) happens in single user mode. We found the 
issue exists on both Linux and Windows in multiple postgres versions.



Here are the steps to repro on Linux (as Windows repro is similar):


1.   Get latest PostgreSQL code, build and install the executables.

$ git clone https://git.postgresql.org/git/postgresql.git
$ cd postgresql
$ PGROOT=$(pwd)
$ git checkout REL_11_STABLE
$ mkdir build
$ cd build
$ ../configure --prefix=/path/to/postgres
$ make && make install

2.   Run initdb to initialize a PG database folder.

$ /path/to/postgres/bin/initdb -D /path/to/data


3.   Because the scenario of being unable to write relation data is difficult
to hit naturally, even with reserved space turned off, I have prepared a small
patch (see attachment "emulate-error.patch") to force an error when PG tries to
write data to relation files. We can just apply the patch, and there is no need
to spend effort flooding the disk with data any more.

$ cd $PGROOT
$ git apply /path/to/emulate-error.patch
$ cd build
$ make && make install


4.   Connect to the newly initialized database cluster in single user mode,
create a table, insert some data into it, and then either run a checkpoint or
send EOF directly. The deadlock then occurs, and the process will not exit
until we kill it.

Do a checkpoint explicitly:

$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
> create table t1(a int);
> insert into t1 values (1), (2), (3);
> checkpoint;
> EOF

PostgreSQL stand-alone backend 11.1
backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247
2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;

2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation base/12368/1247



Or directly give an EOF:



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF



PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate 
exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relati

RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2018-12-19 Thread Chengchao Yu

Greetings,

I would just like to follow up on this issue and the proposed fix. We would 
really like to see this fixed in PG. Could someone offer suggestions on the 
proposed fix, or other ideas for fixing the issue?

Looking forward to your feedback!


Best regards,

--

Chengchao Yu

Software Engineer | Microsoft | Azure Database for PostgreSQL

https://azure.microsoft.com/en-us/services/postgresql/

From: Chengchao Yu
Sent: Friday, November 30, 2018 1:00 PM
To: 'Pg Hackers' 
Cc: Prabhat Tripathi; Sunil Kamath; Michal Primke
Subject: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO 
Failure Occurs


Greetings,



Recently, we hit a few occurrences of deadlock when IO failure (including disk 
full, random remote disk IO failures) happens in single user mode. We found the 
issue exists on both Linux and Windows in multiple postgres versions.



Here are the steps to repro on Linux (as Windows repro is similar):


1.   Get latest PostgreSQL code, build and install the executables.



$ git clone https://git.postgresql.org/git/postgresql.git

$ cd postgresql

$ PGROOT=$(pwd)

$ git checkout REL_11_STABLE

$ mkdir build

$ cd build

$ ../configure --prefix=/path/to/postgres

$ make && make install


2.   Run initdb to initialize a PG database folder.



$ /path/to/postgres/bin/initdb -D /path/to/data


3.   Because the scenario of being unable to write relation data is difficult 
to hit naturally, even when reserved space is turned off, I have prepared a 
small patch (see attachment "emulate-error.patch") to force an error when PG 
tries to write data to relation files. We can just apply the patch, and there 
is no longer any need to flood the disk with data.



$ cd $PGROOT

$ git apply /path/to/emulate-error.patch

$ cd build

$ make && make install


4.   Connect to the newly initialized database cluster in single-user mode, 
create a table, insert some data into it, and then either run a checkpoint or 
send EOF directly. We then hit the deadlock, and the process does not exit 
until we kill it.



Do a checkpoint explicitly:



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> checkpoint;

> EOF



PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:  Emulate 
exception in mdwrite() when writing to disk

2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation 
base/12368/1247

2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;



2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite() when 
writing to disk

2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation 
base/12368/1247



Or directly give an EOF:



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF

> create table t1(a int);

> insert into t1 values (1), (2), (3);

> EOF



PostgreSQL stand-alone backend 11.1

backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:  Emulate 
exception in mdwrite() when writing to disk

2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation 
base/12368/1247


5.   Moreover, when we try to recover the database in single-user mode, we 
hit the issue again, and the process neither brings up the database nor exits.



$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c 
exit_on_error=true

2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was 
interrupted; last known up at 2018-11-29 02:58:49 UTC

2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly shut 
down; automatic recovery in progress

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40

2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at 0/1684B90: 
wanted 24, got 0

2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68

2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was at log 
time 2018-11-29 02:58:49.856663+00

2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite() when 
writing to disk

2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation 
base/12368/1247



Analyses:



So, what happened? There are actually two kinds of deadlock, both with the 
same root cause. Let's first look at the scenario in step #5. Here, the 
deadlock happens when a disk IO failure occurs inside StartupXLOG(). If we 
attach a debugger to the PG process, we see that the process is stuck 
acquiring the buffer's LWLock in AbortBufferIO().



void
AbortBufferIO(void)
{
    BufferDesc *buf = InProgressBuf;

    if (buf)
    {
        uint32      buf_state;

        /*
         * Since LWLockReleaseAll has already been called, we're not holding
         * the buffer's io_in_progress_lock. We have to re-acquire it so that
         * we can use TerminateBufferIO. Anyone who's

Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2018-12-01 Thread Jesse Zhang

Hey Chengyu,
How did you set up your GDB to get "p" to pretty-print a Postgres list?

Cheers,
Jesse

On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu  wrote:

> Greetings,
>
>
>
> Recently, we hit a few occurrences of deadlock when IO failure (including
> disk full, random remote disk IO failures) happens in single user mode. We
> found the issue exists on both Linux and Windows in multiple postgres
> versions.
>
>
>
> Here are the steps to repro on Linux (as Windows repro is similar):
>
>
>
> 1.   Get latest PostgreSQL code, build and install the executables.
>
>
>
> $ git clone https://git.postgresql.org/git/postgresql.git
>
> $ cd postgresql
>
> $ PGROOT=$(pwd)
>
> $ git checkout REL_11_STABLE
>
> $ mkdir build
>
> $ cd build
>
> $ ../configure --prefix=/path/to/postgres
>
> $ make && make install
>
>
>
> 2.   Run initdb to initialize a PG database folder.
>
>
>
> $ /path/to/postgres/bin/initdb -D /path/to/data
>
>
>
> 3.   Because the scenario of being unable to write relation data is
> difficult to hit naturally, even when reserved space is turned off, I have
> prepared a small patch (see attachment “emulate-error.patch”) to force an
> error when PG tries to write data to relation files. We can just apply the
> patch, and there is no longer any need to flood the disk with data.
>
>
>
> $ cd $PGROOT
>
> $ git apply /path/to/emulate-error.patch
>
> $ cd build
>
> $ make && make install
>
>
>
> 4.   Connect to the newly initialized database cluster in single-user
> mode, create a table, insert some data into it, and then either run a
> checkpoint or send EOF directly. We then hit the deadlock, and the process
> does not exit until we kill it.
>
>
>
> Do a checkpoint explicitly:
>
>
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
>
> > create table t1(a int);
>
> > insert into t1 values (1), (2), (3);
>
> > checkpoint;
>
> > EOF
>
>
>
> PostgreSQL stand-alone backend 11.1
>
> backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL:
> Emulate exception in mdwrite() when writing to disk
>
> 2018-11-29 02:55:27.891 UTC [18806] CONTEXT:  writing block 8 of relation
> base/12368/1247
>
> 2018-11-29 02:55:27.891 UTC [18806] STATEMENT:  checkpoint;
>
>
>
> 2018-11-29 02:55:27.900 UTC [18806] FATAL:  Emulate exception in mdwrite()
> when writing to disk
>
> 2018-11-29 02:55:27.900 UTC [18806] CONTEXT:  writing block 8 of relation
> base/12368/1247
>
>
>
> Or directly give an EOF:
>
>
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
>
> > create table t1(a int);
>
> > insert into t1 values (1), (2), (3);
>
> > EOF
>
>
>
> PostgreSQL stand-alone backend 11.1
>
> backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL:
> Emulate exception in mdwrite() when writing to disk
>
> 2018-11-29 02:45:24.438 UTC [18149] CONTEXT:  writing block 8 of relation
> base/12368/1247
>
>
>
> 5.   Moreover, when we try to recover the database in single-user mode,
> we hit the issue again, and the process neither brings up the database
> nor exits.
>
>
>
> $ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c
> exit_on_error=true
>
> 2018-11-29 02:59:33.257 UTC [19058] LOG:  database system shutdown was
> interrupted; last known up at 2018-11-29 02:58:49 UTC
>
> 2018-11-29 02:59:33.485 UTC [19058] LOG:  database system was not properly
> shut down; automatic recovery in progress
>
> 2018-11-29 02:59:33.500 UTC [19058] LOG:  redo starts at 0/1672E40
>
> 2018-11-29 02:59:33.500 UTC [19058] LOG:  invalid record length at
> 0/1684B90: wanted 24, got 0
>
> 2018-11-29 02:59:33.500 UTC [19058] LOG:  redo done at 0/1684B68
>
> 2018-11-29 02:59:33.500 UTC [19058] LOG:  last completed transaction was
> at log time 2018-11-29 02:58:49.856663+00
>
> 2018-11-29 02:59:33.547 UTC [19058] FATAL:  Emulate exception in mdwrite()
> when writing to disk
>
> 2018-11-29 02:59:33.547 UTC [19058] CONTEXT:  writing block 8 of relation
> base/12368/1247
>
>
>
> Analyses:
>
>
>
> So, what happened? There are actually two kinds of deadlock, both with the
> same root cause. Let’s first look at the scenario in step #5. Here, the
> deadlock happens when a disk IO failure occurs inside StartupXLOG(). If we
> attach a debugger to the PG process, we see that the process is stuck
> acquiring the buffer’s LWLock in AbortBufferIO().
>
>
>
> void
> AbortBufferIO(void)
> {
>     BufferDesc *buf = InProgressBuf;
>
>     if (buf)
>     {
>         uint32      buf_state;
>
>         /*
>          * Since LWLockReleaseAll has already been called, we're not holding
>          * the buffer's io_in_progress_lock. We have to re-acquire it so that
>          * we can use TerminateBufferIO. Anyone who's executing WaitIO on the
>          * buffer will be in a busy spin until we succeed in doing this.
>          */
>         LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);
>
>
>
>
