Hello,

We are using OpenSAF version 5.1.0. Our cluster uses TCP as the transport mechanism, with the OpenSAF multicast feature enabled. We would appreciate answers to the following questions:

1. Please provide a link or any document that gives details on how the OpenSAF MDS layer works.

2. osafntfd ER ntfs_mds_msg_send FAILED

Trace of the problem:

    Sep 3 19:57:26.107676 osafntfd [11558:NtfClient.cc:0202] << notificationReceived
    Sep 3 19:57:26.107679 osafntfd [11558:NtfClient.cc:0147] >> notificationReceived: 108 2
    Sep 3 19:57:26.107685 osafntfd [11558:NtfFilter.cc:0464] >> checkFilter
    Sep 3 19:57:26.107711 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
    Sep 3 19:57:26.107721 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
    Sep 3 19:57:26.107726 osafntfd [11558:NtfFilter.cc:0071] T8 numNotificationClassIds: 0
    Sep 3 19:57:26.107729 osafntfd [11558:NtfFilter.cc:0056] T8 num EventTypes: 1
    Sep 3 19:57:26.107732 osafntfd [11558:NtfFilter.cc:0060] T2 EventTypes matches
    Sep 3 19:57:26.107735 osafntfd [11558:NtfFilter.cc:0187] T8 num notificationObjects: 0
    Sep 3 19:57:26.107738 osafntfd [11558:NtfFilter.cc:0202] T8 num NotifyingObjects: 0
    Sep 3 19:57:26.107741 osafntfd [11558:NtfFilter.cc:0223] T2 hdfilter matches
    Sep 3 19:57:26.107745 osafntfd [11558:NtfFilter.cc:0087] T8 numSi: 0
    Sep 3 19:57:26.107748 osafntfd [11558:NtfFilter.cc:0471] << checkFilter
    Sep 3 19:57:26.107751 osafntfd [11558:NtfClient.cc:0184] T2 NtfClient::notificationReceived notification 723 matches subscription 0, client 108
    Sep 3 19:57:26.107756 osafntfd [11558:NtfNotification.cc:0105] T1 Subscription 0 added to list in notification 723 client 108, subscriptionList size is 1
    Sep 3 19:57:26.107761 osafntfd [11558:NtfSubscription.cc:0211] >> sendNotification
    Sep 3 19:57:26.107764 osafntfd [11558:NtfSubscription.cc:0222] T3 send_notification_lib called, client 108, notification 723
    Sep 3 19:57:26.107768 osafntfd [11558:ntfs_com.c:0284] >> send_notification_lib
    Sep 3 19:57:26.107771 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
    Sep 3 19:57:26.107774 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
    Sep 3 19:57:26.107777 osafntfd [11558:ntfs_com.c:0286] T3 client id: 108, not_id: 723
    Sep 3 19:57:26.107781 osafntfd [11558:mds_c_sndrcv.c:0396] >> mds_send
    Sep 3 19:57:26.107785 osafntfd [11558:mds_c_sndrcv.c:0403] << mds_send
    Sep 3 19:57:26.107788 osafntfd [11558:mds_c_sndrcv.c:0681] >> mds_mcm_send
    Sep 3 19:57:26.107791 osafntfd [11558:mds_c_sndrcv.c:0916] >> mcm_pvt_normal_svc_snd
    Sep 3 19:57:26.107794 osafntfd [11558:mds_c_sndrcv.c:0956] >> mcm_pvt_normal_snd_process_common
    Sep 3 19:57:26.107800 osafntfd [11558:mds_c_sndrcv.c:1699] >> mds_mcm_process_disc_queue_checks
    Sep 3 19:57:26.107804 osafntfd [11558:mds_c_sndrcv.c:1740] TR in else if sub_info->tmr_flag !- true
    Sep 3 19:57:26.107813 osafntfd [11558:mds_c_sndrcv.c:1747] TR MDS_SND_RCV:Subscription exists but no timer running
    Sep 3 19:57:26.107816 osafntfd [11558:mds_c_sndrcv.c:1749] TR MDS_SND_RCV :L mds_mcm_process_disc_queue_checks
    Sep 3 19:57:26.107819 osafntfd [11558:mds_c_sndrcv.c:1750] << mds_mcm_process_disc_queue_checks
    Sep 3 19:57:26.107900 osafntfd [11558:mds_c_sndrcv.c:1048] << mcm_pvt_normal_snd_process_common
    Sep 3 19:57:26.107937 osafntfd [11558:mds_c_sndrcv.c:0937] >> mcm_pvt_normal_svc_snd
    Sep 3 19:57:26.107941 osafntfd [11558:mds_c_sndrcv.c:0846] << mds_mcm_send
    Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED
    Sep 3 19:57:26.108160 osafntfd [11558:ntfs_com.c:0308] ER ntfs_mds_msg_send to ntfa failed rc: 2
    Sep 3 19:57:26.108165 osafntfd [11558:NtfNotification.cc:0142] T1 Removing subscription 0 client 108 from notification 723, subscriptionList size is 0
    Sep 3 19:57:26.108169 osafntfd [11558:ntfs_com.c:0503] >> sendNotConfirmUpdate: client: 108, subId: 0, notId: 723

a. The trace shows that a notification is received for client id 108, and an mds_send is then attempted for the same client id, but it fails because there is no timer running.

b. What does a client represent? An OpenSAF process? An SU? A component?

c. What is the purpose of sending a message back after receipt of the notification? Since the message is not sent, it is discarded, and this does not seem to have any impact on the cluster.

d. Hardcoded timers are defined in mds_main.c:

    uint32_t MDS_QUIESCED_TMR_VAL = 80;
    uint32_t MDS_AWAIT_ACTIVE_TMR_VAL = 18000;
    uint32_t MDS_SUBSCRIPTION_TMR_VAL = 500;
    uint32_t MDTM_REASSEMBLE_TMR_VAL = 500;
    uint32_t MDTM_CACHED_EVENTS_TMR_VAL = 24000;

Could each one of these be explained? Can any of these be increased? If yes, what effect would that have?

3. Are there limits on the size of a cluster, the number of SGs, the number of SUs, or the number of components per SU? Testing shows that as the number of SUs/components increases, errors appear, and become more frequent, when starting the cluster and node. Some errors seen across different starts:

    Sep 6 16:16:09 host--s1-h2 osafimmnd[3119]: NO ERR_BAD_HANDLE: Admin owner 19 does not exist
    Sep 6 17:57:33 host--s1-h1 osafimmnd[2238]: WA MDS Send Failed to service:IMMND rc:2
    Sep 6 17:57:33 host--s1-h1 osafimmnd[2238]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
    Sep 6 17:57:33 host--s1-h1 osafimmnd[2238]: WA Error code 2 returned for message type 21 - ignoring

a. Some of our SUs have more than 30 components.

b. All of the components of the cluster perform an IMM search on specific SUs to learn the state of certain 2N SGs, so that they can work with the active SU of those SGs.

c. All components register for notifications so that they can react to the 2N SG state changes.

4. A warning that the IMMND client went down is seen on all nodes of the cluster:

    Sep 6 16:16:09 host--s1-h1 osafimmnd[31600]: WA IMMND - Client went down so no response
    Sep 6 16:16:09 host--s1-h1 osafimmnd[31600]: NO ERR_BAD_HANDLE: Admin owner 19 does not exist

a. What went down? What is a "client"? The IMMND neither died nor rebooted.

b. Can these messages be expanded to include more detail, so that we can understand what actually occurred?
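
For anyone reading the trace in question 2, the entries appear to follow the layout "date time process [pid:file:line] tag message". Below is a minimal sketch, in Python, of filtering such a trace down to its ER entries. The regular expression is inferred only from the lines quoted in this post, and the names TraceEntry, parse_trace_line and errors_only are hypothetical helpers, not part of OpenSAF.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the quoted trace lines only:
#   Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED
TRACE_RE = re.compile(
    r"^(?P<date>\w{3}\s+\d+)\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<proc>\S+)\s+\[(?P<pid>\d+):(?P<file>[^:\]]+):(?P<line>\d+)\]\s+"
    r"(?P<tag>\S+)\s*(?P<msg>.*)$"
)


class TraceEntry(NamedTuple):
    """One parsed trace line (hypothetical helper type)."""
    time: str
    proc: str
    pid: int
    file: str
    line: int
    tag: str
    msg: str


def parse_trace_line(text: str) -> Optional[TraceEntry]:
    """Return a TraceEntry, or None if the line does not match the assumed layout."""
    m = TRACE_RE.match(text.strip())
    if m is None:
        return None
    return TraceEntry(
        time=m["time"],
        proc=m["proc"],
        pid=int(m["pid"]),
        file=m["file"],
        line=int(m["line"]),
        tag=m["tag"],
        msg=m["msg"],
    )


def errors_only(lines: Iterable[str]) -> List[TraceEntry]:
    """Keep only the entries whose tag is ER; unparseable lines are skipped."""
    parsed = (parse_trace_line(text) for text in lines)
    return [entry for entry in parsed if entry is not None and entry.tag == "ER"]


# Three lines copied from the trace in question 2.
sample = [
    "Sep 3 19:57:26.107941 osafntfd [11558:mds_c_sndrcv.c:0846] << mds_mcm_send",
    "Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED",
    "Sep 3 19:57:26.108160 osafntfd [11558:ntfs_com.c:0308] ER ntfs_mds_msg_send to ntfa failed rc: 2",
]

for entry in errors_only(sample):
    print(entry.time, f"{entry.file}:{entry.line}", entry.msg)
```

The same filter could be pointed at a full trace file by passing the open file object to errors_only. The syslog lines in questions 3 and 4 use a different layout and would need their own pattern.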