Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION

2017-11-02 Thread Peter Eisentraut
On 10/24/17 13:13, Konstantin Knizhnik wrote:
> The reason of this deadlock seems to be clear: ALTER SUBSCRIPTION starts 
> transaction at one node and tries to create slot at other node, which waiting 
> for completion of all active transaction while building scnapshpot.
> Is there any way to avoid this deadlock?

I don't see a way to avoid it in general, unless we come up with a novel
way of creating replication slots.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION

2017-10-31 Thread Masahiko Sawada
On Tue, Oct 24, 2017 at 7:13 PM, Konstantin Knizhnik
 wrote:
> Parallel execution of ALTER SUBSCRIPTION REFRESH PUBLICATION at several
> nodes may cause deadlock:
>
> knizhnik  1480  0.0  0.1 1417532 16496 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16589 sync 16720
> waiting
> knizhnik  1481  0.0  0.1 1417668 17668 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(36378) idle
> knizhnik  1484  0.0  0.1 1406952 16676 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56194) ALTER SUBSCRIPTION waiting
> knizhnik  1486  0.0  0.1 1410040 21268 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(36386) idle
> knizhnik  1487  0.0  0.1 1417532 16736 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16589 sync 16774
> knizhnik  1489  0.0  0.1 1410040 21336 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(36388) idle
> knizhnik  1510  0.0  0.1 1406952 16820 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56228) ALTER SUBSCRIPTION waiting
> knizhnik  1530  0.0  0.1 1406952 16736 ?   Ss   20:01   0:00 postgres:
> knizhnik postgres 127.0.0.1(56260) ALTER SUBSCRIPTION waiting
> knizhnik  1534  0.0  0.0 1417504 14360 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16590
> knizhnik  1537  0.0  0.1 1409164 16920 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(45054) idle
> knizhnik  1545  0.0  0.0 1417344 14576 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592
> knizhnik  1546  0.0  0.1 1409168 16588 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56274) idle
> knizhnik  1547  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16606
> knizhnik  1549  0.0  0.0 1409004 14512 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56276) idle in transaction waiting
> knizhnik  1553  0.0  0.0 1417348 14540 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592
> knizhnik  1554  0.0  0.1 1409168 16596 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56280) idle
> knizhnik  1555  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16660
> knizhnik  1556  0.0  0.0 1409004 14520 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56282) idle in transaction waiting
> knizhnik  1558  0.0  0.0 1417348 14460 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16696
> knizhnik  1560  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56286) idle in transaction waiting
> knizhnik  1562  0.0  0.0 1417348 14084 ?   Ss   20:01   0:00 postgres:
> bgworker: logical replication worker for subscription 16592 sync 16732
> knizhnik  1567  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 postgres:
> wal sender process knizhnik 127.0.0.1(56288) idle in transaction waiting
>
> knizhnik@knizhnik:~$ gdb -p 1556
> (gdb) bt
> #0  0x7f7b991569b3 in __epoll_wait_nocancel () at
> ../sysdeps/unix/syscall-template.S:84
> #1  0x0087185a in WaitEventSetWaitBlock (set=0x246d1e8,
> cur_timeout=-1, occurred_events=0x73a43580, nevents=1) at latch.c:1048
> #2  0x00871716 in WaitEventSetWait (set=0x246d1e8, timeout=-1,
> occurred_events=0x73a43580, nevents=1, wait_event_info=50331652) at
> latch.c:1000
> #3  0x00870e6c in WaitLatchOrSocket (latch=0x7f7b975daba4,
> wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385
> #4  0x00870d3e in WaitLatch (latch=0x7f7b975daba4, wakeEvents=1,
> timeout=0, wait_event_info=50331652) at latch.c:339
> #5  0x0088995c in ProcSleep (locallock=0x24764d0,
> lockMethodTable=0xc17420 ) at proc.c:1255
> #6  0x008828cb in WaitOnLock (locallock=0x24764d0, owner=0x2517c78)
> at lock.c:1702
> #7  0x0088157e in LockAcquireExtended (locktag=0x73a439a0,
> lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
> '\001') at lock.c:998
> #8  0x00880ba8 in LockAcquire (locktag=0x73a439a0, lockmode=5,
> sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
> #9  0x0087f8dc in XactLockTableWait (xid=624, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:587
> #10 0x0082f524 in SnapBuildWaitSnapshot (running=0x246d1b0,
> cutoff=632) at snapbuild.c:1400
> #11 0x0082f2f6 in SnapBuildFindSnapshot (builder=0x254b550,
> lsn=24410008, running=0x246d1b0) at snapbuild.c:1311
> #12 0x0082ee67 in SnapBuildProcessRunningXacts (builder=0x254b550,
> lsn=24410008, running=0x246d1b0) at 

[HACKERS] Deadlock in ALTER SUBSCRIPTION REFRESH PUBLICATION

2017-10-24 Thread Konstantin Knizhnik
Parallel execution of ALTER SUBSCRIPTION REFRESH PUBLICATION at several 
nodes may cause deadlock:


knizhnik  1480  0.0  0.1 1417532 16496 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16589 
sync 16720    waiting
knizhnik  1481  0.0  0.1 1417668 17668 ?   Ss   20:01   0:00 
postgres: knizhnik postgres 127.0.0.1(36378) idle
knizhnik  1484  0.0  0.1 1406952 16676 ?   Ss   20:01   0:00 
postgres: knizhnik postgres 127.0.0.1(56194) ALTER SUBSCRIPTION waiting
knizhnik  1486  0.0  0.1 1410040 21268 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(36386) idle
knizhnik  1487  0.0  0.1 1417532 16736 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16589 
sync 16774
knizhnik  1489  0.0  0.1 1410040 21336 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(36388) idle
knizhnik  1510  0.0  0.1 1406952 16820 ?   Ss   20:01   0:00 
postgres: knizhnik postgres 127.0.0.1(56228) ALTER SUBSCRIPTION waiting
knizhnik  1530  0.0  0.1 1406952 16736 ?   Ss   20:01   0:00 
postgres: knizhnik postgres 127.0.0.1(56260) ALTER SUBSCRIPTION waiting
knizhnik  1534  0.0  0.0 1417504 14360 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16590
knizhnik  1537  0.0  0.1 1409164 16920 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(45054) idle
knizhnik  1545  0.0  0.0 1417344 14576 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592
knizhnik  1546  0.0  0.1 1409168 16588 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56274) idle
knizhnik  1547  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592 
sync 16606
knizhnik  1549  0.0  0.0 1409004 14512 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56276) idle in 
transaction waiting
knizhnik  1553  0.0  0.0 1417348 14540 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592
knizhnik  1554  0.0  0.1 1409168 16596 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56280) idle
knizhnik  1555  0.0  0.0 1417348 14332 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592 
sync 16660
knizhnik  1556  0.0  0.0 1409004 14520 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56282) idle in 
transaction waiting
knizhnik  1558  0.0  0.0 1417348 14460 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592 
sync 16696
knizhnik  1560  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56286) idle in 
transaction waiting
knizhnik  1562  0.0  0.0 1417348 14084 ?   Ss   20:01   0:00 
postgres: bgworker: logical replication worker for subscription 16592 
sync 16732
knizhnik  1567  0.0  0.0 1409004 14516 ?   Ss   20:01   0:00 
postgres: wal sender process knizhnik 127.0.0.1(56288) idle in 
transaction waiting


knizhnik@knizhnik:~$ gdb -p 1556
(gdb) bt
#0  0x7f7b991569b3 in __epoll_wait_nocancel () at 
../sysdeps/unix/syscall-template.S:84
#1  0x0087185a in WaitEventSetWaitBlock (set=0x246d1e8, cur_timeout=-1, 
occurred_events=0x73a43580, nevents=1) at latch.c:1048
#2  0x00871716 in WaitEventSetWait (set=0x246d1e8, timeout=-1, 
occurred_events=0x73a43580, nevents=1, wait_event_info=50331652) at 
latch.c:1000
#3  0x00870e6c in WaitLatchOrSocket (latch=0x7f7b975daba4, 
wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385
#4  0x00870d3e in WaitLatch (latch=0x7f7b975daba4, wakeEvents=1, 
timeout=0, wait_event_info=50331652) at latch.c:339
#5  0x0088995c in ProcSleep (locallock=0x24764d0, lockMethodTable=0xc17420 
) at proc.c:1255
#6  0x008828cb in WaitOnLock (locallock=0x24764d0, owner=0x2517c78) at 
lock.c:1702
#7  0x0088157e in LockAcquireExtended (locktag=0x73a439a0, 
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 
'\001') at lock.c:998
#8  0x00880ba8 in LockAcquire (locktag=0x73a439a0, lockmode=5, 
sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688
#9  0x0087f8dc in XactLockTableWait (xid=624, rel=0x0, ctid=0x0, 
oper=XLTW_None) at lmgr.c:587
#10 0x0082f524 in SnapBuildWaitSnapshot (running=0x246d1b0, cutoff=632) 
at snapbuild.c:1400
#11 0x0082f2f6 in SnapBuildFindSnapshot (builder=0x254b550, 
lsn=24410008, running=0x246d1b0) at snapbuild.c:1311
#12 0x0082ee67 in SnapBuildProcessRunningXacts (builder=0x254b550, 
lsn=24410008, running=0x246d1b0) at snapbuild.c:1106
#13 0x0081c458 in DecodeStandbyOp (ctx=0x25334b0, buf=0x73a43b10) 
at decode.c:314
#14 0x0081bf39 in LogicalDecodingProcessRecord (ctx=0x25334b0,