Also, what is the proper way to locate and remove these replication slots as mentioned in this log?
On Mon, May 11, 2015 at 11:36 PM, Wayne E. Seguin <wayneeseg...@gmail.com> wrote: > Also, > > Is there a way to remove these things from the init target node easier? > > d= p=504 a=ERROR: 55000: previous init failed, manual cleanup is required > d= p=504 a=DETAIL: Found bdr.bdr_nodes entry for bdr > (6147869128174526660,1,16908,) with state=i in remote bdr.bdr_nodes > d= p=504 a=HINT: Remove all replication identifiers and slots > corresponding to this node from the init target node then drop and recreate > this database and try again > > > > On Mon, May 11, 2015 at 11:33 PM, Wayne E. Seguin <wayneeseg...@gmail.com> > wrote: > >> To the wonderful BDR folks, >> >> I have a 5 node pg+bdr cluster that I've spun up. I have verified that >> each node can connect via psql to each other node. I am having an issue >> creating the bdr group and having the nodes join each other. >> >> The steps I am taking are as follows (I used pg compiled using getting >> started guide instructions): >> 1. initdb >> 2. start postgres >> 3. create database pgbdr >> 4. create extension btree_gist (actually I'm doing this on template1 for >> my purposes) >> 5. create extension bdr >> 6. on node 0 I am doing: >> SELECT bdr.bdr_group_create( >> local_node_name := 'pgbdr0', >> node_external_dsn := 'host=10.244.2.2 port=5432 user=postgres >> dbname=pgbdr' >> ); >> 7. on nodes 1-4 I am doing (adjusted for the nodes IP): >> SELECT bdr.bdr_group_join( >> local_node_name := 'pgbdr1', >> node_external_dsn := 'host=10.244.2.6 port=5432 user=postgres >> dbname=pgbdr', >> join_using_dsn := 'host=10.244.2.2 port=5432 user=postgres >> dbname=pgbdr' >> ); >> >> Sometimes this works and all nodes are joined, but it isn't consistent >> and I'm not sure at the moment how to debug why. >> >> From the most recent attempt on node 0: >> >> pgbdr=# SELECT * FROM bdr.bdr_nodes; >> node_sysid | node_timeline | node_dboid | node_status | >> node_name | node_local_dsn | >> node_init_from_dsn >> >> ---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+-------------------- >> 6147857338297172162 | 1 | 16908 | r | rdpg0 >> | host=10.244.2.2 port=5432 user=postgres dbname=pgbdr | >> (1 row) >> >> From the most recent attempt on node 1: >> >> pgbdr=# SELECT * FROM bdr.bdr_nodes; >> node_sysid | node_timeline | node_dboid | node_status | >> node_name | node_local_dsn | >> node_init_from_d >> sn >> >> ---------------------+---------------+------------+-------------+-----------+------------------------------------------------------+---------------------------------- >> -------------------- >> 6147857345042854082 | 1 | 16908 | i | rdpg1 >> | host=10.244.2.6 port=5432 user=postgres dbname=pgbdr | host=10.244.2.2 >> port=5432 user=po >> stgres dbname=pgbdr >> (1 row) >> >> I have pasted some relevant logs from node 0 further below. >> >> Even if I use 'SELECT bdr.bdr_node_join_wait_for_ready();' it will >> usually just hang. >> >> How can I debug why the nodes are not connecting to each other find out >> where they getting stuck trying to join? >> >> ~Wayne >> >> Wayne E. Seguin >> wayneeseg...@gmail.com >> wayneeseguin on irc.freenode.net >> http://twitter.com/wayneeseguin/ >> https://github.com/wayneeseguin/ >> >> >> >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: >> received replication command: IDENTIFY_SYSTEM >> [171/773] >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: >> received replication command: CREATE_REPLICATION_SLOT >> "bdr_16908_6147857345042854082_1_16908__" >> LOGICAL bdr >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DEBUG: 00000: >> searching for logical decoding starting point, starting at 0/18E87A0 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> DecodingContextFindStartpoint, logical.c:469 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000: >> logical decoding found consistent point at 0/18E87A0 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init DETAIL: There >> are no running transactions. >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> SnapBuildFindSnapshot, snapbuild.c:1271 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 00000: >> exported logical decoding snapshot: "0000071E-1" with 0 transaction IDs >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> SnapBuildExportSnapshot, snapbuild.c:604 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: >> received replication command: IDENTIFY_SYSTEM >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: >> received replication command: CREATE_REPLICATION_SLOT >> "bdr_16908_6147857344392790209_1_16908__" >> LOGICAL bdr >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DEBUG: 00000: >> searching for logical decoding starting point, starting at 0/18EC970 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> DecodingContextFindStartpoint, logical.c:469 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: >> logical decoding found initial starting point at 0/18EC970 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: 1 >> transaction needs to finish. >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> SnapBuildFindSnapshot, snapbuild.c:1338 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: >> received replication command: IDENTIFY_SYSTEM >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: >> received replication command: CREATE_REPLICATION_SLOT >> "bdr_16908_6147857340000055490_1_16908__" >> LOGICAL bdr >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOG: 08006: >> unexpected EOF on client connection with an open transaction >> d=pgbdr p=412 a=bdr (6147857345042854082,1,16908,):init LOCATION: >> SocketBackend, postgres.c:344 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DEBUG: 00000: >> searching for logical decoding starting point, starting at 0/18EC9F0 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> DecodingContextFindStartpoint, logical.c:469 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: >> logical decoding found initial starting point at 0/18EC9F0 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: 1 >> transaction needs to finish. >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> SnapBuildFindSnapshot, snapbuild.c:1338 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: >> logical decoding found consistent point at 0/18ECA30 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init DETAIL: >> Transaction ID 1822 finished; no more running transactions. >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> SnapBuildEndTxn, snapbuild.c:907 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: >> logical decoding found consistent point at 0/18ECA30 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init DETAIL: >> Transaction ID 1822 finished; no more running transactions. >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> SnapBuildEndTxn, snapbuild.c:907 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 00000: >> exported logical decoding snapshot: "00000720-1" with 0 transaction IDs >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> SnapBuildExportSnapshot, snapbuild.c:604 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 00000: >> exported logical decoding snapshot: "0000071F-1" with 1 transaction ID >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> SnapBuildExportSnapshot, snapbuild.c:604 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: >> received replication command: IDENTIFY_SYSTEM >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: >> received replication command: CREATE_REPLICATION_SLOT >> "bdr_16908_6147857340923527359_1_16908__" >> LOGICAL bdr >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> exec_replication_command, walsender.c:1291 >> d= p=392 a=DEBUG: 00000: started 0 elections >> d= p=392 a=LOCATION: bdr_sequencer_start_elections, bdr_seq.c:688 >> d= p=392 a=DEBUG: 00000: started 0 votes >> d= p=392 a=LOCATION: bdr_sequencer_vote, bdr_seq.c:629 >> d= p=392 a=DEBUG: 00000: tallied 0 elections >> d= p=392 a=LOCATION: bdr_sequencer_tally, bdr_seq.c:755 >> d= p=392 a=DEBUG: 00000: checked 0 sequences for filling >> d= p=392 a=LOCATION: bdr_sequencer_fill_sequences, bdr_seq.c:1034 >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOG: 08006: >> unexpected EOF on client connection with an open transaction >> d=pgbdr p=422 a=bdr (6147857340000055490,1,16908,):init LOCATION: >> SocketBackend, postgres.c:344 >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOG: 08006: >> unexpected EOF on client connection with an open transaction >> d=pgbdr p=417 a=bdr (6147857344392790209,1,16908,):init LOCATION: >> SocketBackend, postgres.c:344 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DEBUG: 00000: >> searching for logical decoding starting point, starting at 0/18ECA60 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> DecodingContextFindStartpoint, logical.c:469 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: >> logical decoding found initial starting point at 0/18ECA60 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: 2 >> transactions need to finish. >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> SnapBuildFindSnapshot, snapbuild.c:1338 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: >> logical decoding found consistent point at 0/18ECAD0 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init DETAIL: >> Transaction ID 1823 finished; no more running transactions. >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> SnapBuildEndTxn, snapbuild.c:907 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 00000: >> exported logical decoding snapshot: "00000721-1" with 0 transaction IDs >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> SnapBuildExportSnapshot, snapbuild.c:604 >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOG: 08006: >> unexpected EOF on client connection with an open transaction >> d=pgbdr p=426 a=bdr (6147857340923527359,1,16908,):init LOCATION: >> SocketBackend, postgres.c:344 >> > > > > -- > ~Wayne > > Wayne E. Seguin > wayneeseg...@gmail.com > wayneeseguin on irc.freenode.net > http://twitter.com/wayneeseguin/ > https://github.com/wayneeseguin/ > -- ~Wayne Wayne E. Seguin wayneeseg...@gmail.com wayneeseguin on irc.freenode.net http://twitter.com/wayneeseguin/ https://github.com/wayneeseguin/