Thanks @liviuchircu - sadly GitHub isn't letting me comment at the moment (bug) so coming back here.
Checking my active masters I am indeed seeing:

`INFO:usrloc:replicate_ucontact_update: All destinations in cluster: 25 are down or probing`

Browsing the repo I can see that patch 40b7422 is not included in any tagged releases at this time, so I will amend the `ping_timeout` as suggested.

Thanks for the invaluable insight.

On Wed, 19 Feb 2020 at 14:29, Liviu Chircu <[email protected]> wrote:

> Hi, Callum!
>
> On 18.02.2020 15:56, Callum Guy wrote:
>
> > INFO:usrloc:receive_ucontact_insert: failed to fetch local urecord - creating new one (ci: '[email protected]')
> > INFO:usrloc:receive_ucontact_update: failed to fetch local urecord - create new record and contact (ci: '[email protected]')
> > INFO:usrloc:receive_ucontact_update: failed to fetch local urecord - create new record and contact (ci: '[email protected]')
> >
> > Is this a simple matter of sessions timing out on the backup and being removed before the next registration on the primary instance?
>
> Exactly. The UA re-REGISTERs too close to the expiration point, creating a race condition between the two nodes. By the time the backup processes the contact refresh packet, its contact is long gone, so it has nothing to match it against. Hence this harmless INFO message.
>
> > Secondly I am graphing registration counts based on the following CLI request:
> >
> > opensips-cli -x mi ul_dump brief=1 | grep AOR | wc -l
> >
> > Several times a day I see a small drop in the backup registrations (lighter red line) before a resync operation as shown here:
> >
> > Can anyone help to explain what's going on here? There aren't any log messages at this time, however I do see the backup node reporting a primary node ping loss ~1 hour before the dip:
> >
> > 2020-02-18T11:38:37.786021+00:00 opensips[91789]: INFO:clusterer:do_action_trans_2: Ping reply not received, node [13] is down
> > 2020-02-18T11:38:38.795838+00:00 opensips[91804]: INFO:clusterer:handle_internal_msg: Node [13] is UP
>
> This is 100% identical to what Alexey reported in #1976 [1]. Please try to incorporate at least one of my proposed solutions -- this should make it much harder for the link between the nodes to go down.
>
> Best regards,
>
> [1]: https://github.com/OpenSIPS/opensips/issues/1976
>
> --
> Liviu Chircu
> www.twitter.com/liviuchircu | www.opensips-solutions.com
>
> OpenSIPS Summit, Amsterdam, May 2020
> www.opensips.org/events
> OpenSIPS Bootcamp, Miami, March 2020
> www.opensips.org/training
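
To line up the ping losses quoted above with the dips in the registration graph, a small script can pair each "is down" message with the next "is UP" message for the same node. This is only a minimal sketch: it assumes the syslog lines are unwrapped (one event per line) and look exactly like the two clusterer lines quoted in this thread. The names `link_flaps`, `DOWN` and `UP` are made up for illustration and are not part of OpenSIPS.

```python
import re
from datetime import datetime

# Patterns modelled on the two clusterer log lines quoted in this thread
# (assumption: other OpenSIPS versions may word these messages differently).
DOWN = re.compile(r"^(\S+) .*Ping reply not received, node \[(\d+)\] is down")
UP = re.compile(r"^(\S+) .*Node \[(\d+)\] is UP")


def link_flaps(lines):
    """Return (node_id, down_at, up_at, seconds) for each down/up pair."""
    down_since = {}  # node id -> timestamp of the first unanswered ping
    flaps = []
    for line in lines:
        m = DOWN.search(line)
        if m:
            # Keep the earliest "down" timestamp if the message repeats.
            down_since.setdefault(m.group(2), datetime.fromisoformat(m.group(1)))
            continue
        m = UP.search(line)
        if m and m.group(2) in down_since:
            start = down_since.pop(m.group(2))
            end = datetime.fromisoformat(m.group(1))
            flaps.append((m.group(2), start, end, (end - start).total_seconds()))
    return flaps


if __name__ == "__main__":
    sample = [
        "2020-02-18T11:38:37.786021+00:00 opensips[91789]: "
        "INFO:clusterer:do_action_trans_2: Ping reply not received, node [13] is down",
        "2020-02-18T11:38:38.795838+00:00 opensips[91804]: "
        "INFO:clusterer:handle_internal_msg: Node [13] is UP",
    ]
    for node, start, end, secs in link_flaps(sample):
        print(f"node {node}: down {start.isoformat()} -> up {end.isoformat()} ({secs:.3f}s)")
```

Run against the full log rather than the two sample lines, the flap timestamps can then be compared with the times at which the backup's AOR count drops.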
