Hi,

You can find the MDS documentation here: https://sourceforge.net/p/opensaf/internal-docs/ci/default/tree/programmers_reference/

From your osafntfd trace, notification 723 was sent by clientId=2. There is an NTF subscriber with clientId=108, and notification 723 matches its subscription criteria, so the notification is "forwarded" to clientId=108.
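To make the forwarding step concrete, here is a toy model of what the trace shows: checkFilter runs the subscriber's filter against the notification (criteria with a count of 0 impose no restriction, while the one event type matches), and each matching subscription leads to one send towards the subscribing client. This is a hypothetical simplification in plain C, not the osafntfd code: the names ntf_filter, ntf_subscription, ntf_notification and route_notification are made up, and only the event-type criterion is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the osafntfd implementation.
 * Only the event-type criterion of a subscription filter is modelled. */
typedef struct {
    const int *event_types;   /* event types the subscriber accepts */
    size_t num_event_types;   /* 0 means "no restriction on event type" */
} ntf_filter;

typedef struct {
    unsigned client_id;        /* subscribing NTF client, e.g. 108 */
    unsigned subscription_id;  /* e.g. 0 */
    ntf_filter filter;
} ntf_subscription;

typedef struct {
    unsigned notification_id;  /* e.g. 723 */
    unsigned sender_client_id; /* e.g. 2 */
    int event_type;
} ntf_notification;

static bool filter_matches(const ntf_filter *f, const ntf_notification *n)
{
    assert(f != NULL && n != NULL);
    if (f->num_event_types == 0)
        return true; /* an empty criterion accepts any event type */
    for (size_t i = 0; i < f->num_event_types; i++) {
        if (f->event_types[i] == n->event_type)
            return true;
    }
    return false;
}

/* Stores the client id of every matching subscription in `targets`
 * (capacity `max_targets`) and returns how many were stored. In this model
 * each stored entry stands for one send from the NTF server to a client. */
static size_t route_notification(const ntf_notification *n,
                                 const ntf_subscription *subs, size_t num_subs,
                                 unsigned *targets, size_t max_targets)
{
    size_t count = 0;
    for (size_t i = 0; i < num_subs && count < max_targets; i++) {
        if (filter_matches(&subs[i].filter, n))
            targets[count++] = subs[i].client_id;
    }
    return count;
}
```

Because every matching subscription produces its own send, the number of sends per notification grows with the number of subscribers, which matters when many components subscribe to the same notifications.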

In NTF, a client can be a sender, a reader, or a subscriber. Every successful saNtfInitialize() call creates a client, so a single process may hold many NTF clients. I haven't tried calling saNtfInitialize() several times to create multiple clients in one process, but I think it should work.

As for the MDS error "Subscription exists but no timer running": one possibility is that the timer MDS_SUBSCRIPTION_TMR_VAL is a bit too short, so it expires before the MDTM_LIB_UP_TYPE event from DTM reaches MDS.
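To illustrate that race, here is a toy state model of the check that logs "Subscription exists but no timer running". It is a hypothetical simplification in plain C, not the MDS source: the names peer_view, try_send and the SEND_* results are made up, times are arbitrary milliseconds, and SEND_WAIT stands for "hold the message until the peer comes up or the timer expires".

```c
#include <stdbool.h>

/* Hypothetical model of the sender's view of one peer service. */
typedef struct {
    bool subscribed;        /* a subscription to the peer already exists */
    unsigned timer_expiry;  /* time (ms) at which the subscription timer fires */
    bool dest_up;           /* has the peer's "up" event reached MDS yet? */
} peer_view;

typedef enum { SEND_OK, SEND_WAIT, SEND_FAIL } send_result;

/* One send attempt at time `now`; `timer_ms` is the subscription timer. */
static send_result try_send(peer_view *p, unsigned now, unsigned timer_ms)
{
    if (p->dest_up)
        return SEND_OK;                   /* destination known: send at once */
    if (!p->subscribed) {
        p->subscribed = true;             /* first attempt: subscribe ... */
        p->timer_expiry = now + timer_ms; /* ... and start the timer */
        return SEND_WAIT;
    }
    if (now < p->timer_expiry)
        return SEND_WAIT;                 /* timer still running: keep waiting */
    return SEND_FAIL;                     /* subscription exists, no timer running */
}
```

In this model, if the up event needs longer than the subscription timer, every attempt after expiry fails until the up event finally arrives; a longer timer only helps if DTM is slow rather than not delivering the event at all.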

If increasing the timer does not help, I think you can turn on the DTM trace, enable the MDS debug log (export MDS_LOG_LEVEL=5 in ntfd.conf), and check whether the MDTM_LIB_UP_TYPE event is created in DTM and whether it actually reaches MDS.
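When experimenting with the timer values from mds_main.c quoted in the question below, it helps to know what durations they stand for. This sketch ASSUMES, and you should verify it against your OpenSAF source, that the values are counted in 10 ms ticks; under that assumption MDS_SUBSCRIPTION_TMR_VAL = 500 would be 5 seconds and MDS_AWAIT_ACTIVE_TMR_VAL = 18000 would be 180 seconds. ASSUMED_MDS_TICK_MS, mds_timer, ticks_to_ms and print_mds_timers are hypothetical names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: one tick = 10 ms. Not confirmed here; check your source. */
#define ASSUMED_MDS_TICK_MS 10u

typedef struct {
    const char *name;
    uint32_t ticks;   /* value as written in mds_main.c */
} mds_timer;

static const mds_timer mds_timers[] = {
    {"MDS_QUIESCED_TMR_VAL", 80},
    {"MDS_AWAIT_ACTIVE_TMR_VAL", 18000},
    {"MDS_SUBSCRIPTION_TMR_VAL", 500},
    {"MDTM_REASSEMBLE_TMR_VAL", 500},
    {"MDTM_CACHED_EVENTS_TMR_VAL", 24000},
};

static uint32_t ticks_to_ms(uint32_t ticks)
{
    assert(ticks <= UINT32_MAX / ASSUMED_MDS_TICK_MS); /* avoid overflow */
    return ticks * ASSUMED_MDS_TICK_MS;
}

/* Prints each timer with its duration under the assumed tick length. */
static void print_mds_timers(FILE *out)
{
    for (size_t i = 0; i < sizeof mds_timers / sizeof mds_timers[0]; i++)
        fprintf(out, "%-28s %6" PRIu32 " ticks = %" PRIu32 " ms\n",
                mds_timers[i].name, mds_timers[i].ticks,
                ticks_to_ms(mds_timers[i].ticks));
}
```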

/Minh

On 7/9/19 10:21 pm, opensaf-users-requ...@lists.sourceforge.net wrote:
Send Opensaf-users mailing list submissions to
        opensaf-users@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/opensaf-users
or, via email, send a message with subject or body 'help' to
        opensaf-users-requ...@lists.sourceforge.net

You can reach the person managing the list at
        opensaf-users-ow...@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Opensaf-users digest..."


Today's Topics:

    1. Issues concerning opensaf with TCP (William R Elliott)


----------------------------------------------------------------------

Message: 1
Date: Fri, 6 Sep 2019 21:07:22 +0000
From: William R Elliott <william.elli...@netcracker.com>
To: "opensaf-users@lists.sourceforge.net"
        <opensaf-users@lists.sourceforge.net>, Lisa Ann Lentz-Liddell
        <lisa.a.lentz-lidd...@netcracker.com>, David S Thompson
        <david.thomp...@netcracker.com>
Subject: [users] Issues concerning opensaf with TCP
Message-ID: <884dcf77544b4d459ecca7f4c438a...@netcracker.com>
Content-Type: text/plain; charset="us-ascii"

Hello,

We are using OpenSAF version 5.1.0.  We have a cluster using TCP as the transport
mechanism, with the OpenSAF multicast feature enabled.
We would appreciate answers to the following questions:

1.       Please provide a link to any document that describes how the
OpenSAF MDS layer works.

2.       osafntfd logs "ER ntfs_mds_msg_send FAILED". Trace of the problem:

Sep 3 19:57:26.107676 osafntfd [11558:NtfClient.cc:0202] << notificationReceived
Sep 3 19:57:26.107679 osafntfd [11558:NtfClient.cc:0147] >> notificationReceived: 108 2
Sep 3 19:57:26.107685 osafntfd [11558:NtfFilter.cc:0464] >> checkFilter
Sep 3 19:57:26.107711 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
Sep 3 19:57:26.107721 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
Sep 3 19:57:26.107726 osafntfd [11558:NtfFilter.cc:0071] T8 numNotificationClassIds: 0
Sep 3 19:57:26.107729 osafntfd [11558:NtfFilter.cc:0056] T8 num EventTypes: 1
Sep 3 19:57:26.107732 osafntfd [11558:NtfFilter.cc:0060] T2 EventTypes matches
Sep 3 19:57:26.107735 osafntfd [11558:NtfFilter.cc:0187] T8 num notificationObjects: 0
Sep 3 19:57:26.107738 osafntfd [11558:NtfFilter.cc:0202] T8 num NotifyingObjects: 0
Sep 3 19:57:26.107741 osafntfd [11558:NtfFilter.cc:0223] T2 hdfilter matches
Sep 3 19:57:26.107745 osafntfd [11558:NtfFilter.cc:0087] T8 numSi: 0
Sep 3 19:57:26.107748 osafntfd [11558:NtfFilter.cc:0471] << checkFilter
Sep 3 19:57:26.107751 osafntfd [11558:NtfClient.cc:0184] T2 NtfClient::notificationReceived notification 723 matches subscription 0, client 108
Sep 3 19:57:26.107756 osafntfd [11558:NtfNotification.cc:0105] T1 Subscription 0 added to list in notification 723 client 108, subscriptionList size is 1
Sep 3 19:57:26.107761 osafntfd [11558:NtfSubscription.cc:0211] >> sendNotification
Sep 3 19:57:26.107764 osafntfd [11558:NtfSubscription.cc:0222] T3 send_notification_lib called, client 108, notification 723
Sep 3 19:57:26.107768 osafntfd [11558:ntfs_com.c:0284] >> send_notification_lib
Sep 3 19:57:26.107771 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
Sep 3 19:57:26.107774 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
Sep 3 19:57:26.107777 osafntfd [11558:ntfs_com.c:0286] T3 client id: 108, not_id: 723
Sep 3 19:57:26.107781 osafntfd [11558:mds_c_sndrcv.c:0396] >> mds_send
Sep 3 19:57:26.107785 osafntfd [11558:mds_c_sndrcv.c:0403] << mds_send
Sep 3 19:57:26.107788 osafntfd [11558:mds_c_sndrcv.c:0681] >> mds_mcm_send
Sep 3 19:57:26.107791 osafntfd [11558:mds_c_sndrcv.c:0916] >> mcm_pvt_normal_svc_snd
Sep 3 19:57:26.107794 osafntfd [11558:mds_c_sndrcv.c:0956] >> mcm_pvt_normal_snd_process_common
Sep 3 19:57:26.107800 osafntfd [11558:mds_c_sndrcv.c:1699] >> mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107804 osafntfd [11558:mds_c_sndrcv.c:1740] TR in else if sub_info->tmr_flag !- true
Sep 3 19:57:26.107813 osafntfd [11558:mds_c_sndrcv.c:1747] TR MDS_SND_RCV:Subscription exists but no timer running
Sep 3 19:57:26.107816 osafntfd [11558:mds_c_sndrcv.c:1749] TR MDS_SND_RCV :L mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107819 osafntfd [11558:mds_c_sndrcv.c:1750] << mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107900 osafntfd [11558:mds_c_sndrcv.c:1048] << mcm_pvt_normal_snd_process_common
Sep 3 19:57:26.107937 osafntfd [11558:mds_c_sndrcv.c:0937] >> mcm_pvt_normal_svc_snd
Sep 3 19:57:26.107941 osafntfd [11558:mds_c_sndrcv.c:0846] << mds_mcm_send
Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED
Sep 3 19:57:26.108160 osafntfd [11558:ntfs_com.c:0308] ER ntfs_mds_msg_send to ntfa failed rc: 2
Sep 3 19:57:26.108165 osafntfd [11558:NtfNotification.cc:0142] T1 Removing subscription 0 client 108 from notification 723, subscriptionList size is 0
Sep 3 19:57:26.108169 osafntfd [11558:ntfs_com.c:0503] >> sendNotConfirmUpdate: client: 108, subId: 0, notId: 723

a.      The traces show that a notification is received for client id 108 and
then an mds_send is attempted for the same client id, but it fails because
there is no timer running.
b.      What does a client represent?  An OpenSAF process?  An SU?  A component?
c.      What is the purpose of sending a message back after receipt of the
notification?  Since the message is never sent, it is discarded, and this does
not seem to have any impact on the cluster.
d.      Hardcoded timers are defined in mds_main.c:

uint32_t MDS_QUIESCED_TMR_VAL = 80;
uint32_t MDS_AWAIT_ACTIVE_TMR_VAL = 18000;
uint32_t MDS_SUBSCRIPTION_TMR_VAL = 500;
uint32_t MDTM_REASSEMBLE_TMR_VAL = 500;
uint32_t MDTM_CACHED_EVENTS_TMR_VAL = 24000;

Could each one of these be explained? Can any of these be increased? If yes, 
what effect would that have?


3.       Are there limits on the size of a cluster, the number of SGs, the
number of SUs, or the number of components per SU?  Testing shows that as the
number of SUs/components increases, errors appear, and become more frequent,
when starting the cluster and nodes.  Some errors seen across different starts:

Sep  6 16:16:09 host--s1-h2 osafimmnd[3119]: NO ERR_BAD_HANDLE: Admin owner 19 does not exist

Sep  6 17:57:33 host--s1-h1 osafimmnd[2238]: WA MDS Send Failed to service:IMMND rc:2
Sep  6 17:57:33 host--s1-h1 osafimmnd[2238]: ER Problem in sending to peer IMMND over MDS. Discarding admin op reply.
Sep  6 17:57:33 host--s1-h1 osafimmnd[2238]: WA Error code 2 returned for message type 21 - ignoring

a.      Some of our SUs have more than 30 components.
b.      All of the components of the cluster perform an IMM search on specific
SUs to determine the state of certain 2N SGs, so that they can work with the
active SU of those SGs.
c.      All components register for notifications so that they can react to
2N SG state changes.

4.  The warning "IMMND - Client went down so no response" is seen on all nodes of the cluster:

Sep  6 16:16:09 host--s1-h1 osafimmnd[31600]: WA IMMND - Client went down so no response
Sep  6 16:16:09 host--s1-h1 osafimmnd[31600]: NO ERR_BAD_HANDLE: Admin owner 19 does not exist

a)      What went down?  What is a "client" here?  The IMMND itself did not die or reboot.
b)      Can these messages be expanded to include more detail to truly 
understand what occurred?




________________________________
The information transmitted herein is intended only for the person or entity to 
which it is addressed and may contain confidential, proprietary and/or 
privileged material. Any review, retransmission, dissemination or other use of, 
or taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and delete the material from any computer.


------------------------------



------------------------------

Subject: Digest Footer

_______________________________________________
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users


------------------------------

End of Opensaf-users Digest, Vol 72, Issue 1
********************************************



