Hi,

As a follow-up to my previous post about aging sl_confirm entries, I *did* do a move_set from node 4 to node 1 about 6 days ago. Is there any reason why the slon cleanup cycle didn't pick up these confirmations and delete them? Perhaps it is a bug of some sort?
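For anyone who wants to spot this kind of row quickly, here is a minimal Python sketch that flags stale Origin->Receiver pairs taken from the "Summary of sl_confirm aging" section of the report quoted below. The row tuples, the function names, and the one-hour threshold are my own illustrative assumptions; none of them come from Slony or from test_slony_state-dbi.pl.

```python
import re
from datetime import timedelta

# Matches ages as printed in the report, e.g. "00:19:00" or "6 days 01:52:00".
AGE_RE = re.compile(r"(?:(\d+) days? )?(\d+):(\d{2}):(\d{2})")


def parse_age(text):
    """Convert an age string from the report into a timedelta."""
    match = AGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised age: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
    )


def stale_confirms(rows, threshold=timedelta(hours=1)):
    """Return (origin, receiver) pairs whose latest SYNC is older than threshold.

    rows is an iterable of (origin, receiver, age_of_latest_sync) tuples.
    The one-hour default is an arbitrary, hypothetical cut-off.
    """
    return [(o, r) for o, r, age in rows if parse_age(age) > threshold]


# A few rows copied by hand from the quoted report.
rows = [
    (1, 2, "00:00:00"),
    (4, 1, "00:00:00"),
    (4, 2, "6 days 01:52:00"),
    (4, 5, "6 days 01:52:00"),
]
print(stale_confirms(rows))  # [(4, 2), (4, 5)]
```

Only the two pairs with origin 4 and a 6-day-old latest SYNC are reported, which matches the rows that stood out in the script output.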
In any case, I deleted the rows in sl_confirm, so the test_slony_state-dbi.pl script doesn't list these anomalies anymore. Has anyone else encountered this, or does anyone have an explanation for it?

--Richard

On Mar 30, 2007, at 12:17 PM, Richard Yen wrote:

> Hi all,
>
> I've recently been experiencing climbing lags, followed by a sudden
> drop, at random times during the day. I understand that for some
> people a ~40 event lag isn't much, but it's quite unusual for my
> cluster.
>
> I run a 4-node cluster (1 provider, 3 subscribers), and it appears
> that at random times, the event lag climbs up to ~40, and then
> suddenly drops to 0. Load on all nodes is < 1.0 during these times,
> so I don't suspect that it's hardware or configuration. That leaves
> me with no explanation of what's happening that causes these "lag
> spikes."
>
> Tried running test_slony_state-dbi.pl, and found the following output:
>
> ===BEGIN LOG===
> Tests for node 1 - DSN = dbname=tii host=tii-db1.oaktown.iparadigms.com user=slony password=3l3phant
> ========================================
> pg_listener info:
> Pages: 9
> Tuples: 1
>
> Size Tests
> ================================================
> sl_log_1      1918   26082.000000
> sl_log_2         0       0.000000
> sl_seqlog       20    1543.000000
>
> Listen Path Analysis
> ===================================================
> No problems found with sl_listen
>
> --------------------------------------------------------------------------------
> Summary of event info
> Origin   Min SYNC   Max SYNC   Min SYNC Age   Max SYNC Age
> ================================================================================
>      2    2277006    2277401       00:00:00       00:19:00   0
>      1    2999671    3001970       00:00:00       00:19:00   0
>      5     516048     516088       00:00:00       00:20:00   0
>      4     173746     174140       00:00:00       00:19:00   0
>
> --------------------------------------------------------------------------------
> Summary of sl_confirm aging
> Origin  Receiver  Min SYNC  Max SYNC  Age of latest SYNC  Age of eldest SYNC
> ================================================================================
>      1         2   2999672   3001969            00:00:00            00:19:00  0
>      1         4   2999678   3001969            00:00:00            00:19:00  0
>      1         5   2999671   3001962            00:00:00            00:19:00  0
>      2         1   2277006   2277401            00:00:00            00:19:00  0
>      2         4   2277006   2277401            00:00:00            00:19:00  0
>      2         5   2277006   2277400            00:00:00            00:19:00  0
>      4         1    173746    174140            00:00:00            00:19:00  0
>      4         2   6030310   6030310     6 days 01:52:00     6 days 01:52:00  1
>      4         5   6030307   6030307     6 days 01:52:00     6 days 01:52:00  1
>      5         1    516048    516088            00:00:00            00:20:00  0
>      5         2    516048    516088            00:00:00            00:20:00  0
>      5         4    516048    516088            00:00:00            00:20:00  0
>
> --------------------------------------------------------------------------------
>
> Listing of old open connections
> Database  PID  User  Query Age  Query
> ================================================================================
> ===END OF LOG===
>
> If you notice, the lines for Origin->Receiver on 4->2 and 4->5 have
> some old SYNCs. These nodes (2 and 5) are the ones I experience the
> "lag spikes" on. The other subscriber, node 4, doesn't experience
> lag spikes at all. This report is similar for every node in the
> test_slony_state-dbi.pl script, so I'm kind of perplexed.
>
> Wondering if anyone would be able to interpret this for me and
> provide any help/advice.
>
> Thanks a lot!
> --Richard
> _______________________________________________
> Slony1-general mailing list
> [email protected]
> http://gborg.postgresql.org/mailman/listinfo/slony1-general
