Wow. Yes, this looks a lot like the tmblue 9/10/16 event. I’ll be very interested to take a look at that patch. Haven’t forgotten the 2.2.6 patches I have in flight. Need to finish some other work and pull those off the back burner… Thanks,
Tom ( On 2/22/18, 6:06 PM, "Steve Singer" <st...@ssinger.info> wrote: On Thu, 22 Feb 2018, Tignor, Tom wrote: Looks like? http://lists.slony.info/pipermail/slony1-general/2016-September/013331.html I can't remember if that was what prompted http://lists.slony.info/pipermail/slony1-hackers/2016-December/000560.html https://github.com/ssinger/slony1-engine/tree/bug375 I can't seem to find a reason why this wasn't committed. > > > > Hello slony1 community, > > We have a head scratcher here. It appears a DROP NODE command was not fully processed. The > command was issued and confirmed on all our nodes at approximately 2018-02-21 19:19:50 UTC. When we went to > restore it over two hours later, all replication stopped on an sl_event constraint violation. Investigation > showed a SYNC event for the dropped node with a timestamp of just a few seconds before the drop. I believe this > is a first for us. The DROP NODE command is supposed to remove all state for the dropped node. Is that right? Is > there a potential race condition somewhere which could leave behind state? > > Thanks in advance, > > > > ---- master log replication freeze error ---- > > 2018-02-21 21:38:52 UTC [5775] ERROR remoteWorkerThread_8: "insert into "_ams_cluster".sl_event (ev_origin, > ev_seqno, ev_timestamp, ev_snapshot, ev\ > > _type ) values ('8', '5002075962', '2018-02-21 19:19:41.958719+00', '87044110:87044110:', 'SYNC'); insert > into "_ams_cluster".sl_confirm (con_origi\ > > n, con_received, con_seqno, con_timestamp) values (8, 1, '5002075962', now()); select > "_ams_cluster".logApplySaveStats('_ams_cluster', 8, '0.139 s'::inter\ > > val); commit transaction;" PGRES_FATAL_ERROR ERROR: duplicate key value violates unique constraint > "sl_event-pkey" > > DETAIL: Key (ev_origin, ev_seqno)=(8, 5002075962) already exists. > > 2018-02-21 21:38:52 UTC [13649] CONFIG slon: child terminated signal: 9; pid: 5775, current worker pid: 5775 > > 2018-02-21 21:38:52 UTC [13649] CONFIG slon: restart of worker in 10 seconds > > ---- master log replication freeze error ---- > > > > > > ---- master DB leftover event ---- > > a...@ams6.cmb.netmgmt:~$ psql -U akamai -d ams > > psql (9.1.24) > > Type "help" for help. > > > > ams=# select * from sl_event_bak; > > ev_origin | ev_seqno | ev_timestamp | ev_snapshot | ev_type | ev_data1 | ev_data2 | > ev_data3 | ev_data4 | ev_data5 | ev_data6 | ev_ > > data7 | ev_data8 > > -----------+------------+-------------------------------+--------------------+---------+----------+----------+- > ---------+----------+----------+----------+---- > > ------+---------- > > 8 | 5002075962 | 2018-02-21 19:19:41.958719+00 | 87044110:87044110: | SYNC | | | > | | | | > > | > > (1 row) > > > > ams=# > > ---- master DB leftover event ---- > > > > ---- master log drop node record ---- > > 2018-02-21 19:19:50 UTC [22582] CONFIG disableNode: no_id=8 > > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=4 li_receiver=1 li_provider=4 > > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=7 li_receiver=1 li_provider=7 > > 2018-02-21 19:19:50 UTC [22582] CONFIG storeListen: li_origin=3 li_receiver=1 li_provider=3 > > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: update provider configuration > > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 4 terminated > > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: disconnecting from data provider 4 > > 2018-02-21 19:19:50 UTC [22582] CONFIG remoteWorkerThread_4: connection for provider 7 terminated > > ---- master log drop node record ---- > > > > ---- replica log drop node record ---- > > 2018-02-21 19:19:51 UTC [22650] WARN remoteWorkerThread_1: got DROP NODE for local node ID > > NOTICE: Slony-I: Please drop schema "_ams_cluster" > > 2018-02-21 19:19:53 UTC [22650] INFO remoteWorkerThread_7: SYNC 5001868819 done in 2.153 seconds > > NOTICE: drop cascades to 243 other objects > > DETAIL: drop cascades to table _ams_cluster.sl_node > > drop cascades to table _ams_cluster.sl_nodelock > > drop cascades to table _ams_cluster.sl_set > > drop cascades to table _ams_cluster.sl_setsync > > drop cascades to table _ams_cluster.sl_table > > drop cascades to table _ams_cluster.sl_sequence > > ---- replica log drop node record ---- > > > > > > Tom ☺ > > > > > > > _______________________________________________ Slony1-general mailing list Slony1-general@lists.slony.info http://lists.slony.info/mailman/listinfo/slony1-general