Hi Sergio,

Can you please share the syslog and mds.log of all the nodes?

Thanks & Regards
Mohan Kanakam | 91-8333082448
Senior Software Engineer
High Availability Solutions
www.GetHighAvailability.com
Get High Availability Today !
NJ, USA: 1 508-507-6507 | Hyderabad, India: 91 798-992-5293

-----Original Message-----
From: Mohan Kanakam [mailto:mo...@gethighavailability.com]
Sent: 10 February 2022 23:23
To: 'Sérgio Marques'; 'opensaf-users@lists.sourceforge.net'
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Thanks for the information. Can you please share with us the steps (kind of checkpoint, frequency of checkpoint, etc.) to reproduce the issue in our lab? Can you also share the ckptd and ckptnd traces (you can enable/disable them at runtime using "kill -USR2 <ckptnd_pid/ckptd_pid>") and the application API calls?

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:sergio-l-marq...@alticelabs.com]
Sent: 10 February 2022 20:17
To: Mohan Kanakam; opensaf-users@lists.sourceforge.net
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Mohan,

Thanks for your quick answer.

I have a cluster with 2 controller and 2 payload boards. All of the nodes are using the latest OpenSAF version, 5.22.01. Some checkpoints are created by the payload and controller applications. I can see these errors every 3 to 10 seconds on the slave controller card.

Is there a way to debug this issue and find out exactly where the messages are being lost? Please feel free to ask for more info/logs/tests.

Thanks and regards,
Sérgio Marques

-----Original Message-----
From: Mohan Kanakam <mo...@gethighavailability.com>
Sent: 9 February 2022 17:48
To: Sérgio Marques <sergio-l-marq...@alticelabs.com>; opensaf-users@lists.sourceforge.net
Subject: RE: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi Sergio,

Can you please let us know the OpenSAF version being used? Since I don't know the test scenario or the use case, I am giving a generic answer.

If ckptnd sends a sync message (checkpoint) to the active ckptd and waits for its reply, and the active ckptd is then stopped, ckptnd will never get a reply to those messages; the context of the sync messages becomes invalid and the messages stay unanswered.
It looks like there are a few such messages pending at ckptd, waiting to be replied to. If the active controller reboots, the active ckptd is stopped, and all the sync messages awaiting a reply at ckptnd may get this error because the context of the sync messages is lost.

To me, the messages look genuine, but it may be a concern that a few of ckptnd's checkpoint messages are not being responded to and are lost. So the application (if running on a payload) needs to send them again.

Thanks & Regards
Mohan Kanakam

-----Original Message-----
From: Sérgio Marques [mailto:sergio-l-marq...@alticelabs.com]
Sent: 09 February 2022 14:50
To: opensaf-users@lists.sourceforge.net
Subject: [users] MDS_SND_RCV: Invalid Sync CTXT Len

Hi,

In my system, some "MDS_SND_RCV: Invalid Sync CTXT Len" events are being registered in mds.log. Are these errors normal? I'm asking because I'm experiencing very weird problems when rebooting the active controller node.

Thanks in advance,
Sérgio Marques

<141>1 2022-01-23T02:24:44.936356Z OLT2T4-UNICOM-2 osaflcknd 2151 mds.log [meta sequenceId="347"] MDTM: svc down event for svc_id = GLA(3), subscri. by svc_id = GLND(4) pwe_id=1 Adest = <nodeid[0x20c0f]:osaflcknd[2151]>
<141>1 2022-01-23T02:24:44.936699Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4505"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPND(17) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptnd[2169]>
<141>1 2022-01-23T02:24:44.936703Z OLT2T4-UNICOM-2 osafckptd 2210 mds.log [meta sequenceId="750"] MDTM: svc down event for svc_id = CPA(18), subscri. by svc_id = CPD(16) pwe_id=1 Adest = <nodeid[0x20c0f]:osafckptd[2210]>
<141>1 2022-01-23T02:24:44.937203Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="743"] MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<141>1 2022-01-23T02:24:44.937479Z OLT2T4-UNICOM-2 osafimmnd 1588 mds.log [meta sequenceId="744"] MDTM: svc down event for svc_id = IMMA_OI(27), subscri. by svc_id = IMMND(25) pwe_id=1 Adest = <nodeid[0x20c0f]:osafimmnd[1588]>
<139>1 2022-01-23T02:24:46.71289Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4506"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.726201Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4507"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.73951Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4508"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.75349Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4509"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.768767Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4510"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.777688Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4511"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.788105Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4512"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.801785Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4513"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.815353Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4514"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.824419Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4515"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.833325Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4516"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.842368Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4517"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.851617Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4518"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.862324Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4519"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.876586Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4520"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:46.887585Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4521"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.764874Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4522"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.776664Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4523"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.788544Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4524"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.799566Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4525"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.806417Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4526"] MDS_SND_RCV: Invalid Sync CTXT Len
<139>1 2022-01-23T02:24:48.81328Z OLT2T4-UNICOM-2 osafckptnd 2169 mds.log [meta sequenceId="4527"] MDS_SND_RCV: Invalid Sync CTXT Len

_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users