Hi all,
PostgreSQL v 9.1.5 - 9.1.6
Slony version 2.1.0
I'm having an issue that's occurred twice now. I have 4 node slony
cluster, and one of the operations is to drop a node from replication,
do maintenance on it, then add it back to replication.
Node 1 = master
Node 2 = slave
Node 3 = slave -> dropped then readded
Node 4 = slave
I dropped node 2 from replication, then a while later added it back. I
have my own set of scripts that add it to replication all automatically.
They have been used dozens of times over the past few months without
issue. But recently, I've had the issue where after adding back the node
(in this example, node 3), node 4 suddenly stops syncing. I've attached
a snip of the log for node 4 in this email. The main thing I'm seeing is
this:
Node 4 (the one that stopped syncing) was proccessing the following SYNC
events
5000030355
5000030357
5000030358
5000030359
Then at this point it receives the event recognizing the new node
addition to the slony cluster.
then we have the error where it's attempting to insert the event
5000000032 (see log attached).
Are these sequential event id's supposed to be related?
I looked at the sl_event on the node 4 box that's erroring, and did the
following query:
select now() - ev_timestamp, ev_seqno from _slony.sl_event where
ev_origin = 4 order by 1;
(how long ago was this event
?column? | ev_seqno
-----------------+------------
01:48:30.7277 | 5000000031
01:48:40.731095 | 5000000030
01:48:50.734454 | 5000000029
01:49:00.735438 | 5000000028
01:49:10.728117 | 5000000027
01:49:20.731387 | 5000000026
01:49:30.734199 | 5000000025
01:49:40.726768 | 5000000024
01:49:50.730152 | 5000000023
01:50:00.732747 | 5000000022
01:50:10.736126 | 5000000021
01:50:20.737394 | 5000000020
01:50:30.733839 | 5000000019
01:50:40.736928 | 5000000018
01:50:50.739931 | 5000000017
01:51:00.742365 | 5000000016
01:51:10.745724 | 5000000015
01:51:20.748913 | 5000000014
01:51:30.751388 | 5000000013
01:51:40.754439 | 5000000012
01:51:50.757796 | 5000000011
01:52:00.760079 | 5000000010
01:52:10.763595 | 5000000009
01:52:20.762106 | 5000000008
01:52:30.764838 | 5000000007
01:52:40.768154 | 5000000006
01:52:50.771847 | 5000000005
01:53:00.775925 | 5000000004
01:53:38.2839 | 5000000003
01:53:38.422047 | 5000000002
01:53:38.435689 | 5000000001
01:55:29.173397 | 5000000032
NOTE: At the moment I took this snapshot, the st_lag_time for this node
was '01:55:28.411043', which falls between the last two timestamps above.
What strikes me as 'interesting' is that 5000000032 (the one it's
erroring on) happened way before the rest of them timestamp wise. So I'm
thinking that perhaps it was leftover from the previous 'node' before I
dropped it, and it was never cleaned up?
The last time this happened, I deleted that offending row myself
(desperate, i know to usually never touch the slony schema data), and it
actually did completely break slony replication. It's a smaller set so
wasn't too hard to set up again from scratch.
So I'm going to do some testing to see if it's due to lack of cleanup,
but thought I'd throw this out there. Is there anything else that could
possibly cause this?
thanks in advance,
- Brian F
2012-09-27 12:11:44 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:11:44 EDT INFO remoteWorkerThread_1: SYNC 5000030350 done in
0.330 seconds
2012-09-27 12:11:50 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:11:50 EDT INFO remoteWorkerThread_1: SYNC 5000030351 done in
0.332 seconds
2012-09-27 12:11:56 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:11:56 EDT INFO remoteWorkerThread_1: SYNC 5000030352 done in
0.371 seconds
2012-09-27 12:11:56 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:11:57 EDT INFO remoteWorkerThread_1: SYNC 5000030353 done in
0.371 seconds
2012-09-27 12:12:12 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:12 EDT INFO remoteWorkerThread_1: SYNC 5000030354 done in
0.330 seconds
2012-09-27 12:12:18 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:19 EDT INFO remoteWorkerThread_1: SYNC 5000030355 done in
0.371 seconds
2012-09-27 12:12:31 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:31 EDT INFO remoteWorkerThread_1: SYNC 5000030357 done in
0.371 seconds
2012-09-27 12:12:42 EDT WARN storePath: unknown node ID 4 - event pending
2012-09-27 12:12:42 EDT CONFIG storeNode: no_id=4 no_comment='<event pending>'
2012-09-27 12:12:42 EDT CONFIG storePath: pa_server=4 pa_client=2
pa_conninfo="dbname=mydbprod host=10.2.40.101 port=5432 user=slony1"
pa_connretry=10
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:42 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:42 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_1: SYNC 5000030358 done in
0.372 seconds
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_1: SYNC 5000030359 done in
0.329 seconds
2012-09-27 12:12:43 EDT CONFIG storeNode: no_id=4 no_comment='Server 4' -
update node
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:43 EDT CONFIG enableNode: no_id=4
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_4: thread starts
2012-09-27 12:12:43 EDT INFO remoteListenThread_4: thread starts
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:43 EDT CONFIG version for "dbname=mydbprod port=5432
host=10.3.40.100 user=slony1" is 90104
2012-09-27 12:12:43 EDT CONFIG remoteWorkerThread_4: update provider
configuration
2012-09-27 12:12:43 EDT INFO remoteWorkerThread_1: syncing set 1 with 6
table(s) from provider 1
2012-09-27 12:12:44 EDT CONFIG version for "dbname=mydbprod host=10.2.40.101
port=5432 user=slony1" is 90106
2012-09-27 12:12:44 EDT INFO remoteWorkerThread_1: SYNC 5000030363 done in
0.333 seconds
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_4: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_4: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=3
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=4 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=3 li_receiver=2
li_provider=4
2012-09-27 12:12:44 EDT CONFIG storeListen: li_origin=1 li_receiver=2
li_provider=1
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_3: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_4: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: update provider
configuration
2012-09-27 12:12:44 EDT CONFIG remoteWorkerThread_1: added active set 1 to
provider 1
2012-09-27 12:12:49 EDT ERROR remoteWorkerThread_4: "insert into
"_slony".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_snapshot,
ev_type ) values ('4', '5000000032', '2012-09-27 12:10:48.994257-04',
'2394:2394:', 'SYNC'); insert into "_slony".sl_confirm (con_origin,
con_received, con_seqno, con_timestamp) values (4, 2, '5000000032', now());
commit transaction;" PGRES_FATAL_ERROR ERROR: duplicate key value violates
unique constraint "sl_event-pkey"
DETAIL: Key (ev_origin, ev_seqno)=(4, 5000000032) already exists.
2012-09-27 12:12:49 EDT CONFIG slon: child terminated signal: 9; pid: 7470,
current worker pid: 7470
2012-09-27 12:12:49 EDT CONFIG slon: restart of worker in 10 seconds
_______________________________________________
Slony1-general mailing list
[email protected]
http://lists.slony.info/mailman/listinfo/slony1-general