subject:"\[HACKERS\] Issues with logical replication"

Re: [HACKERS] Issues with logical replication

2017-10-09 Thread Stas Kelvich


> On 2 Oct 2017, at 19:59, Petr Jelinek  wrote:
> 
>> 
>> Program terminated with signal SIGABRT, Aborted.
>> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> 56../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> (gdb) bt
>> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
>> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
>> #1  0x7f3608f92028 in __GI_abort () at abort.c:89
>> #2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30
>> "!(((xid) != ((TransactionId) 0)))",
>> errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
>> lineNumber=582) at assert.c:54
>> #3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
>> oper=XLTW_None) at lmgr.c:582
>> #4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
>> cutoff=898498) at snapbuild.c:1400
>> #5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
>> lsn=798477760, running=0x2749198) at snapbuild.c:1311
>> #6  0x0081c339 in SnapBuildProcessRunningXacts
>> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
> 
> 
> Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid
> transaction ids but we specifically skip those in
> GetRunningTransactionData(). Can you check pg_waldump output for that
> record (lsn is shown in the backtrace)?

  I investigated this case and it seems that XactLockTableWait() in 
SnapBuildWaitSnapshot()
not always work as expected. XactLockTableWait() waits on LockAcquire() for xid 
to be 
completed and if we finally got this lock but transaction is still in progress 
then such xid
assumed to be a subxid. However LockAcquire/LockRelease cycle can happen after 
transaction
set xid, but before XactLockTableInsert().

Namely following history happened for xid 5225 and lead to crash:

[backend] LOG:  AssignTransactionId: XactTopTransactionId = 5225
   [walsender] LOG:  LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, 
xid[0]=5225
   [walsender] LOG:  XactLockTableWait: LockAcquire 5225
   [walsender] LOG:  XactLockTableWait: LockRelease 5225
[backend] LOG:  AssignTransactionId: LockAcquire ExclusiveLock 5225
   [walsender] LOG:  TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < 
xid=5225 => true
[backend] LOG:  CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
[backend] LOG:  CommitTransaction: ResourceOwnerRelease locks xid=5225

I’ve quickly checked other usages of XactLockTableWait() and it seems that it 
is used mostly with
xids from heap so tx definetly set it lock somewhere in the past.

Not sure what it the best approach to handle that. May be write running xacts 
only if they already
set their lock?

Also attaching pgbench script that can reproduce failure. I run it over usual 
pgbench database
in 10 threads. It takes several minutes to crash.



reload.pgb
Description: Binary data

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issues with logical replication

2017-10-03 Thread Masahiko Sawada

On Wed, Oct 4, 2017 at 8:35 AM, Petr Jelinek
 wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>>   RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at master two updates inside one transaction:
>>>
>>> postgres=# begin;
>>> BEGIN
>>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>>> UPDATE 1
>>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>>> UPDATE 1
>>> postgres=# commit;
>>> 
>>>
>>> and in replica log we see:
>>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>>> worker for subscription "sub" has started
>>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>>> tuple
>>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>>> replication worker for subscription 16403 (PID 2954) exited with exit
>>> code 1
>>>
>>> Error happens in trigger.c:
>>>
>>> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>> lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>>> trigger.c:3103
>>> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>> fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>>> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>> at execReplication.c:461
>>> #6  0x00820894 in apply_handle_update (s=0x7ffc442163b0) at
>>> worker.c:736
>>
>> We have locked the same tuple in RelationFindReplTupleByIndex() just
>> before this gets called and didn't get the same error. I guess we do
>> something wrong with snapshots. Will need to investigate more.
>>
>
> Okay, so it's not snapshot. It's the fact that we don't set the
> es_output_cid in replication worker which GetTupleForTrigger is using
> when locking the tuple. Attached one-liner fixes it.
>

Thank you for the patch.
This bug can happen even without the trigger and I confirmed tgat the
bug is fixed by the patch. I think the patch fixed it properly.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Issues with logical replication

2017-10-03 Thread Petr Jelinek

On 02/10/17 18:59, Petr Jelinek wrote:
>>
>> Now fix the trigger function:
>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>> BEGIN
>>   RETURN NEW;
>> END $$ LANGUAGE plpgsql;
>>
>> And manually perform at master two updates inside one transaction:
>>
>> postgres=# begin;
>> BEGIN
>> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
>> UPDATE 1
>> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
>> UPDATE 1
>> postgres=# commit;
>> 
>>
>> and in replica log we see:
>> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
>> worker for subscription "sub" has started
>> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
>> tuple
>> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
>> replication worker for subscription 16403 (PID 2954) exited with exit
>> code 1
>>
>> Error happens in trigger.c:
>>
>> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
>> trigger.c:3103
>> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
>> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
>> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>>     at execReplication.c:461
>> #6  0x00820894 in apply_handle_update (s=0x7ffc442163b0) at
>> worker.c:736
> 
> We have locked the same tuple in RelationFindReplTupleByIndex() just
> before this gets called and didn't get the same error. I guess we do
> something wrong with snapshots. Will need to investigate more.
> 

Okay, so it's not snapshot. It's the fact that we don't set the
es_output_cid in replication worker which GetTupleForTrigger is using
when locking the tuple. Attached one-liner fixes it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index b58d2e1008..d6a5cf7088 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -204,6 +204,8 @@ create_estate_for_relation(LogicalRepRelMapEntry *rel)
 	estate->es_num_result_relations = 1;
 	estate->es_result_relation_info = resultRelInfo;
 
+	estate->es_output_cid = GetCurrentCommandId(true);
+
 	/* Triggers might need a slot */
 	if (resultRelInfo->ri_TrigDesc)
 		estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Issues with logical replication

2017-10-02 Thread Konstantin Knizhnik


I have faced two issues with logical replication.
Repro scenario:

1. start two Postgres instances (I start both at local machine).
2. Initialize pgbench tables at both instances:
pgbench -i -s 10 postgres
3. Create publication of pgbench_accounts table at one node:
create publication pub for table pgbench_accounts;
4. Create subscription at another node:
delete from pgbench_accounts;
CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost 
port=5432 sslmode=disable' publication pub;
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS 
TRIGGER AS $$

BEGIN
--  RETURN NEW;
END $$ LANGUAGE plpgsql;
   CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE 
ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE 
replication_trigger_proc();

   ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at primary node:
pgbench -T 1000 -P 2 -c 10 postgres


Please notice commented "return new" statement. Invocation of this 
function cause error and in log we see repeated messages:


2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table 
synchronization worker for subscription "sub", table "pgbench_accounts" 
has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of 
trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function 
replication_trigger_proc()

COPY pgbench_accounts, line 1: "110 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical 
replication worker for subscription 17264 sync 17251 (PID 32129) exited 
with exit code 1

...


After few minutes of work primary node (with publication) is crashed 
with the following stack trace:


Program terminated with signal SIGABRT, Aborted.
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56

56../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56

#1  0x7f3608f92028 in __GI_abort () at abort.c:89
#2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 
"!(((xid) != ((TransactionId) 0)))",
errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", 
lineNumber=582) at assert.c:54
#3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, 
oper=XLTW_None) at lmgr.c:582
#4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, 
cutoff=898498) at snapbuild.c:1400
#5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, 
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x0081c339 in SnapBuildProcessRunningXacts 
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7  0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, 
buf=0x7ffd301858d0) at decode.c:314
#8  0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, 
record=0x27efb30) at decode.c:117
#9  0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) 
at logical.c:458
#10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at 
walsender.c:934

#11 0x008246ee in exec_replication_command (
cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT 
\"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at 
walsender.c:1511
#12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, 
dbname=0x275b578 "postgres", username=0x272b800 "knizhnik")

at postgres.c:4086
#13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x007ef0cb in BackendStartup (port=0x27532a0) at 
postmaster.c:4029

#15 0x007eb68b in ServerLoop () at postmaster.c:1753
#16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at 
postmaster.c:1361

#17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228


Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
  RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;


and in replcas log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply 
worker for subscription "sub" has started

2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical 
replication worker for subscription 16403 (PID 2954) exited with exit code 1


Error happens in trigger.c:
#3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, 
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at 
trigger.c:3103
#4  0x0069b259 in ExecBRUpdateTriggers

Re: [HACKERS] Issues with logical replication

2017-10-02 Thread Petr Jelinek

On 02/10/17 17:49, Konstantin Knizhnik wrote:
> I have faced two issues with logical replication.
> Reproducing scenario:
> 
> 1. start two Postgres instances (I start both at local machine).
> 2. Initialize pgbench tables at both instances:
>     pgbench -i -s 10 postgres
> 3. Create publication of pgbench_accounts table at one node:
>     create publication pub for table pgbench_accounts;
> 4. Create subscription at another node:
>     delete from pgbench_accounts;
>     CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost
> port=5432 sslmode=disable' publication pub;
>     CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS
> TRIGGER AS $$
>     BEGIN
>     --  RETURN NEW;
>     END $$ LANGUAGE plpgsql;
>    CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE
> ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE
> replication_trigger_proc();
>    ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
> 5. Start pgbench at primary node:
>     pgbench -T 1000 -P 2 -c 10 postgres
> 
> 
> Please notice commented "return new" statement. Invocation of this
> function cause error and in log we see repeated messages:
> 
> 2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table
> synchronization worker for subscription "sub", table "pgbench_accounts"
> has started
> 2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of
> trigger procedure without RETURN
> 2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function
> replication_trigger_proc()
>     COPY pgbench_accounts, line 1: "1    1    0 "
> 2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical
> replication worker for subscription 17264 sync 17251 (PID 32129) exited
> with exit code 1
> ...
> 
> 
> After few minutes of work primary node (with publication) is crashed
> with the following stack trace:
> 
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f3608f92028 in __GI_abort () at abort.c:89
> #2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30
> "!(((xid) != ((TransactionId) 0)))",
>     errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c",
> lineNumber=582) at assert.c:54
> #3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0,
> oper=XLTW_None) at lmgr.c:582
> #4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198,
> cutoff=898498) at snapbuild.c:1400
> #5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910,
> lsn=798477760, running=0x2749198) at snapbuild.c:1311
> #6  0x0081c339 in SnapBuildProcessRunningXacts
> (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106


Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid
transaction ids but we specifically skip those in
GetRunningTransactionData(). Can you check pg_waldump output for that
record (lsn is shown in the backtrace)?

> 
> Now fix the trigger function:
> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> BEGIN
>   RETURN NEW;
> END $$ LANGUAGE plpgsql;
> 
> And manually perform at master two updates inside one transaction:
> 
> postgres=# begin;
> BEGIN
> postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
> UPDATE 1
> postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
> UPDATE 1
> postgres=# commit;
> 
> 
> and in replica log we see:
> 2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply
> worker for subscription "sub" has started
> 2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible
> tuple
> 2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical
> replication worker for subscription 16403 (PID 2954) exited with exit
> code 1
> 
> Error happens in trigger.c:
> 
> #3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
>     lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at
> trigger.c:3103
> #4  0x0069b259 in ExecBRUpdateTriggers (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac,
>     fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748
> #5  0x006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50,
> epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240)
>     at execReplication.c:461
> #6  0x00820894 in apply_handle_update (s=0x7ffc442163b0) at
> worker.c:736

We have locked the same tuple in RelationFindReplTupleByIndex() just
before this gets called and didn't get the same error. I guess we do
something wrong with snapshots. Will need to investigate more.

-- 
  Petr Jelinek

[HACKERS] Issues with logical replication

2017-10-02 Thread Konstantin Knizhnik


I have faced two issues with logical replication.
Reproducing scenario:

1. start two Postgres instances (I start both at local machine).
2. Initialize pgbench tables at both instances:
pgbench -i -s 10 postgres
3. Create publication of pgbench_accounts table at one node:
create publication pub for table pgbench_accounts;
4. Create subscription at another node:
delete from pgbench_accounts;
CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost 
port=5432 sslmode=disable' publication pub;
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS 
TRIGGER AS $$

BEGIN
--  RETURN NEW;
END $$ LANGUAGE plpgsql;
   CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE 
ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE 
replication_trigger_proc();

   ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger;
5. Start pgbench at primary node:
pgbench -T 1000 -P 2 -c 10 postgres


Please notice commented "return new" statement. Invocation of this 
function cause error and in log we see repeated messages:


2017-10-02 16:47:46.764 MSK [32129] LOG:  logical replication table 
synchronization worker for subscription "sub", table "pgbench_accounts" 
has started
2017-10-02 16:47:46.771 MSK [32129] ERROR:  control reached end of 
trigger procedure without RETURN
2017-10-02 16:47:46.771 MSK [32129] CONTEXT:  PL/pgSQL function 
replication_trigger_proc()

COPY pgbench_accounts, line 1: "110 "
2017-10-02 16:47:46.772 MSK [28020] LOG:  worker process: logical 
replication worker for subscription 17264 sync 17251 (PID 32129) exited 
with exit code 1

...


After few minutes of work primary node (with publication) is crashed 
with the following stack trace:


Program terminated with signal SIGABRT, Aborted.
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56

56../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x7f3608f8ec37 in __GI_raise (sig=sig@entry=6) at 
../nptl/sysdeps/unix/sysv/linux/raise.c:56

#1  0x7f3608f92028 in __GI_abort () at abort.c:89
#2  0x009f5740 in ExceptionalCondition (conditionName=0xbf6b30 
"!(((xid) != ((TransactionId) 0)))",
errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", 
lineNumber=582) at assert.c:54
#3  0x0086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, 
oper=XLTW_None) at lmgr.c:582
#4  0x0081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, 
cutoff=898498) at snapbuild.c:1400
#5  0x0081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, 
lsn=798477760, running=0x2749198) at snapbuild.c:1311
#6  0x0081c339 in SnapBuildProcessRunningXacts 
(builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106
#7  0x0080a1c7 in DecodeStandbyOp (ctx=0x27ef870, 
buf=0x7ffd301858d0) at decode.c:314
#8  0x00809ce1 in LogicalDecodingProcessRecord (ctx=0x27ef870, 
record=0x27efb30) at decode.c:117
#9  0x0080ddf9 in DecodingContextFindStartpoint (ctx=0x27ef870) 
at logical.c:458
#10 0x00823968 in CreateReplicationSlot (cmd=0x27483a8) at 
walsender.c:934

#11 0x008246ee in exec_replication_command (
cmd_string=0x27b9520 "CREATE_REPLICATION_SLOT 
\"sub_17264_sync_17251\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at 
walsender.c:1511
#12 0x0088eb44 in PostgresMain (argc=1, argv=0x275b738, 
dbname=0x275b578 "postgres", username=0x272b800 "knizhnik")

at postgres.c:4086
#13 0x007ef9a1 in BackendRun (port=0x27532a0) at postmaster.c:4357
#14 0x007ef0cb in BackendStartup (port=0x27532a0) at 
postmaster.c:4029

#15 0x007eb68b in ServerLoop () at postmaster.c:1753
#16 0x007eac77 in PostmasterMain (argc=3, argv=0x2729670) at 
postmaster.c:1361

#17 0x00728552 in main (argc=3, argv=0x2729670) at main.c:228


Now fix the trigger function:
CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
BEGIN
  RETURN NEW;
END $$ LANGUAGE plpgsql;

And manually perform at master two updates inside one transaction:

postgres=# begin;
BEGIN
postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26;
UPDATE 1
postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26;
UPDATE 1
postgres=# commit;


and in replica log we see:
2017-10-02 18:40:26.094 MSK [2954] LOG:  logical replication apply 
worker for subscription "sub" has started
2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible 
tuple
2017-10-02 18:40:26.102 MSK [2882] LOG:  worker process: logical 
replication worker for subscription 16403 (PID 2954) exited with exit 
code 1


Error happens in trigger.c:

#3  0x0069bddb in GetTupleForTrigger (estate=0x2e36b50, 
epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac,
lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at 
trigger.c:3103
#4  0x0069b259 in ExecBRUpdateTriggers

Re: [HACKERS] Issues with logical replication

Re: [HACKERS] Issues with logical replication

Re: [HACKERS] Issues with logical replication

[HACKERS] Issues with logical replication

Re: [HACKERS] Issues with logical replication

[HACKERS] Issues with logical replication

6 matches

Site Navigation

Mail list logo

Footer information