On Thursday, May 20, 2021 9:59 PM Amit Langote <[email protected]> wrote:
> Here are updated/divided patches.
Thanks for your updates.
However, I've detected segmentation faults caused by the patch, which can
happen during 100_bugs.pl in src/test/subscription.
This happens in more than one run out of ten.
The problem looks like a timing issue and was already introduced by v3.
I also reproduced the failure with v5 applied on HEAD, while unpatched HEAD
doesn't reproduce it even when I executed 100_bugs.pl 200 times in a tight loop.
I used the same base commit (4f586fe2) for all checks. The logs below are
from v3.
* The failure message from the TAP test:
# Postmaster PID for node "twoways" is 5015
Waiting for replication conn testsub's replay_lsn to pass pg_current_wal_lsn() on twoways
# poll_query_until timed out executing this query:
# SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'testsub';
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# psql: error: connection to server on socket "/tmp/cs8dhFOtZZ/.s.PGSQL.59345" failed: No such file or directory
# Is the server running locally and accepting connections on that socket?
timed out waiting for catchup at t/100_bugs.pl line 148.
The failure produces a core file; its backtrace is below.
My first guess at the cause is that an invalidation message arrives between
the point where the entry is obtained from hash_search() in
get_rel_sync_entry() and the point where the map is set by
convert_tuples_by_name() in maybe_send_schema(), so the invalidation callback
tries to free tuple descriptors that have never been set in the entry.
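To spell out the sequence I suspect (just a sketch; apart from the line quoted
in frame #0, the identifiers come from my reading of the patch and may not
match it exactly):

/* In get_rel_sync_entry(), simplified */
entry = (RelationSyncEntry *) hash_search(RelationSyncCache,
                                          (void *) &relid,
                                          HASH_ENTER, &found);
/*
 * Window: for a new entry, entry->map has not been initialized yet.
 * maybe_send_schema() only sets it later via convert_tuples_by_name().
 * If an invalidation message is executed inside this window, the
 * callback below runs against the unset field.
 */

/* In rel_sync_cache_relation_cb(), per frame #0 of the backtrace */
FreeTupleDesc(entry->map->indesc);	/* pgoutput.c:1225, dereferences garbage */

That would also fit the intermittent, timing-dependent nature of the failure.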
* Core file backtrace:
Core was generated by `postgres: twoways: walsender k5user [local] START_REPLICATION '.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f93b38b8c2b in rel_sync_cache_relation_cb (arg=0, relid=16388) at pgoutput.c:1225
1225            FreeTupleDesc(entry->map->indesc);
(gdb) bt
#0  0x00007f93b38b8c2b in rel_sync_cache_relation_cb (arg=0, relid=16388) at pgoutput.c:1225
#1  0x0000000000ae21f0 in LocalExecuteInvalidationMessage (msg=0x21d1de8) at inval.c:601
#2  0x00000000008dbd6e in ReorderBufferExecuteInvalidations (nmsgs=4, msgs=0x21d1db8) at reorderbuffer.c:3232
#3  0x00000000008da70a in ReorderBufferProcessTXN (rb=0x21d1a40, txn=0x2210b58, commit_lsn=25569096, snapshot_now=0x21d1e10, command_id=1, streaming=false) at reorderbuffer.c:2294
#4  0x00000000008dae56 in ReorderBufferReplay (txn=0x2210b58, rb=0x21d1a40, xid=748, commit_lsn=25569096, end_lsn=25569216, commit_time=674891531661619, origin_id=0, origin_lsn=0) at reorderbuffer.c:2591
#5  0x00000000008daede in ReorderBufferCommit (rb=0x21d1a40, xid=748, commit_lsn=25569096, end_lsn=25569216, commit_time=674891531661619, origin_id=0, origin_lsn=0) at reorderbuffer.c:2615
#6  0x00000000008cae06 in DecodeCommit (ctx=0x21e6880, buf=0x7fffb9383db0, parsed=0x7fffb9383c10, xid=748, two_phase=false) at decode.c:744
#7  0x00000000008ca1ed in DecodeXactOp (ctx=0x21e6880, buf=0x7fffb9383db0) at decode.c:278
#8  0x00000000008c9e76 in LogicalDecodingProcessRecord (ctx=0x21e6880, record=0x21e6c80) at decode.c:142
#9  0x0000000000901fcc in XLogSendLogical () at walsender.c:2876
#10 0x0000000000901327 in WalSndLoop (send_data=0x901f30 <XLogSendLogical>) at walsender.c:2306
#11 0x00000000008ffd5f in StartLogicalReplication (cmd=0x219aff8) at walsender.c:1206
#12 0x00000000009006ae in exec_replication_command (cmd_string=0x2123c20 "START_REPLICATION SLOT \"pg_16400_sync_16392_6964617299612181363\" LOGICAL 0/182D058 (proto_version '2', publication_names '\"testpub\"')") at walsender.c:1646
#13 0x000000000096ffae in PostgresMain (argc=1, argv=0x7fffb93840d0, dbname=0x214ef18 "d1", username=0x214eef8 "k5user") at postgres.c:4482
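If the guess above is right, initializing the map pointer as soon as the entry
is created and guarding the free should close the window. A rough, untested
sketch (again based on my reading of the patch, so the surrounding code may
differ):

/* In get_rel_sync_entry(), right after HASH_ENTER returns a new entry */
if (!found)
	entry->map = NULL;	/* make the entry safe for invalidation callbacks */

/* In rel_sync_cache_relation_cb() */
if (entry->map != NULL)
{
	FreeTupleDesc(entry->map->indesc);
	FreeTupleDesc(entry->map->outdesc);
	free_conversion_map(entry->map);	/* frees the map itself, not the descs */
}
entry->map = NULL;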
I'll update when I get more information.
Best Regards,
Takamichi Osumi