Re: [users] Issues concerning opensaf with TCP

2019-09-10 Thread Lisa Ann Lentz-Liddell
Inline [ll]
Thanks.

-----Original Message-----
From: Nguyen Minh Vu [mailto:vu.m.ngu...@dektech.com.au]
Sent: Tuesday, September 10, 2019 12:53 AM
To: William R Elliott; opensaf-users@lists.sourceforge.net; Lisa Ann Lentz-Liddell; David S Thompson
Subject: Re: [users] Issues concerning opensaf with TCP

Hi,

Please see my responses to questions #3 and #4.

Regards, Vu

On 9/7/19 4:07 AM, William R Elliott wrote:
> Hello,
>
> We are using opensaf version 5.1.0.  We have a cluster using tcp as a 
> transport mechanism with opensaf multicast feature enabled.
> We would appreciate answers to the following questions:
>
> 1.   Please provide a link or any document that gives details on how the 
> opensaf mds layer works.
>
> 2.   osafntfd ER ntfs_mds_msg_send FAILED  - Trace of the problem.
>
> Sep 3 19:57:26.107676 osafntfd [11558:NtfClient.cc:0202] << notificationReceived
> Sep 3 19:57:26.107679 osafntfd [11558:NtfClient.cc:0147] >> notificationReceived: 108 2
> Sep 3 19:57:26.107685 osafntfd [11558:NtfFilter.cc:0464] >> checkFilter
> Sep 3 19:57:26.107711 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
> Sep 3 19:57:26.107721 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
> Sep 3 19:57:26.107726 osafntfd [11558:NtfFilter.cc:0071] T8 numNotificationClassIds: 0
> Sep 3 19:57:26.107729 osafntfd [11558:NtfFilter.cc:0056] T8 num EventTypes: 1
> Sep 3 19:57:26.107732 osafntfd [11558:NtfFilter.cc:0060] T2 EventTypes matches
> Sep 3 19:57:26.107735 osafntfd [11558:NtfFilter.cc:0187] T8 num notificationObjects: 0
> Sep 3 19:57:26.107738 osafntfd [11558:NtfFilter.cc:0202] T8 num NotifyingObjects: 0
> Sep 3 19:57:26.107741 osafntfd [11558:NtfFilter.cc:0223] T2 hdfilter matches
> Sep 3 19:57:26.107745 osafntfd [11558:NtfFilter.cc:0087] T8 numSi: 0
> Sep 3 19:57:26.107748 osafntfd [11558:NtfFilter.cc:0471] << checkFilter
> Sep 3 19:57:26.107751 osafntfd [11558:NtfClient.cc:0184] T2 NtfClient::notificationReceived notification 723 matches subscription 0, client 108
> Sep 3 19:57:26.107756 osafntfd [11558:NtfNotification.cc:0105] T1 Subscription 0 added to list in notification 723 client 108, subscriptionList size is 1
> Sep 3 19:57:26.107761 osafntfd [11558:NtfSubscription.cc:0211] >> sendNotification
> Sep 3 19:57:26.107764 osafntfd [11558:NtfSubscription.cc:0222] T3 send_notification_lib called, client 108, notification 723
> Sep 3 19:57:26.107768 osafntfd [11558:ntfs_com.c:0284] >> send_notification_lib
> Sep 3 19:57:26.107771 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
> Sep 3 19:57:26.107774 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
> Sep 3 19:57:26.10 osafntfd [11558:ntfs_com.c:0286] T3 client id: 108, not_id: 723
> Sep 3 19:57:26.107781 osafntfd [11558:mds_c_sndrcv.c:0396] >> mds_send
> Sep 3 19:57:26.107785 osafntfd [11558:mds_c_sndrcv.c:0403] << mds_send
> Sep 3 19:57:26.107788 osafntfd [11558:mds_c_sndrcv.c:0681] >> mds_mcm_send
> Sep 3 19:57:26.107791 osafntfd [11558:mds_c_sndrcv.c:0916] >> mcm_pvt_normal_svc_snd
> Sep 3 19:57:26.107794 osafntfd [11558:mds_c_sndrcv.c:0956] >> mcm_pvt_normal_snd_process_common
> Sep 3 19:57:26.107800 osafntfd [11558:mds_c_sndrcv.c:1699] >> mds_mcm_process_disc_queue_checks
> Sep 3 19:57:26.107804 osafntfd [11558:mds_c_sndrcv.c:1740] TR in else if sub_info->tmr_flag !- true
> Sep 3 19:57:26.107813 osafntfd [11558:mds_c_sndrcv.c:1747] TR MDS_SND_RCV:Subscription exists but no timer running
> Sep 3 19:57:26.107816 osafntfd [11558:mds_c_sndrcv.c:1749] TR MDS_SND_RCV :L mds_mcm_process_disc_queue_checks
> Sep 3 19:57:26.107819 osafntfd [11558:mds_c_sndrcv.c:1750] << mds_mcm_process_disc_queue_checks
> Sep 3 19:57:26.107900 osafntfd [11558:mds_c_sndrcv.c:1048] << mcm_pvt_normal_snd_process_common
> Sep 3 19:57:26.107937 osafntfd [11558:mds_c_sndrcv.c:0937] >> mcm_pvt_normal_svc_snd
> Sep 3 19:57:26.107941 osafntfd [11558:mds_c_sndrcv.c:0846] << mds_mcm_send
> Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED
> Sep 3 19:57:26.108160 osafntfd [11558:ntfs_com.c:0308] ER ntfs_mds_msg_send to ntfa failed rc: 2
> Sep 3 19:57:26.108165 osafntfd [11558:NtfNotification.cc:0142] T1 Removing subscription 0 client 108 from notification 723, subscriptionList size is 0
> Sep 3 19:57:26.108169 osafntfd [11558:ntfs_com.c:0503] >> sendNotConfirmUpdate: client: 108, subId: 0, notId: 723
>
> a.  The traces show that a notification is received for client id 108 and
> then an mds_send is attempted for the same client id, but it fails because there is
> no timer ru

Re: [users] Issues concerning opensaf with TCP

2019-09-09 Thread Nguyen Minh Vu

Hi,

Please see my responses to questions #3 and #4.

Regards, Vu

On 9/7/19 4:07 AM, William R Elliott wrote:

Hello,

We are using opensaf version 5.1.0.  We have a cluster using tcp as a transport 
mechanism with opensaf multicast feature enabled.
We would appreciate answers to the following questions:

1.   Please provide a link or any document that gives details on how the 
opensaf mds layer works.

2.   osafntfd ER ntfs_mds_msg_send FAILED  - Trace of the problem.

Sep 3 19:57:26.107676 osafntfd [11558:NtfClient.cc:0202] << notificationReceived
Sep 3 19:57:26.107679 osafntfd [11558:NtfClient.cc:0147] >> notificationReceived: 108 2
Sep 3 19:57:26.107685 osafntfd [11558:NtfFilter.cc:0464] >> checkFilter
Sep 3 19:57:26.107711 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
Sep 3 19:57:26.107721 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
Sep 3 19:57:26.107726 osafntfd [11558:NtfFilter.cc:0071] T8 numNotificationClassIds: 0
Sep 3 19:57:26.107729 osafntfd [11558:NtfFilter.cc:0056] T8 num EventTypes: 1
Sep 3 19:57:26.107732 osafntfd [11558:NtfFilter.cc:0060] T2 EventTypes matches
Sep 3 19:57:26.107735 osafntfd [11558:NtfFilter.cc:0187] T8 num notificationObjects: 0
Sep 3 19:57:26.107738 osafntfd [11558:NtfFilter.cc:0202] T8 num NotifyingObjects: 0
Sep 3 19:57:26.107741 osafntfd [11558:NtfFilter.cc:0223] T2 hdfilter matches
Sep 3 19:57:26.107745 osafntfd [11558:NtfFilter.cc:0087] T8 numSi: 0
Sep 3 19:57:26.107748 osafntfd [11558:NtfFilter.cc:0471] << checkFilter
Sep 3 19:57:26.107751 osafntfd [11558:NtfClient.cc:0184] T2 NtfClient::notificationReceived notification 723 matches subscription 0, client 108
Sep 3 19:57:26.107756 osafntfd [11558:NtfNotification.cc:0105] T1 Subscription 0 added to list in notification 723 client 108, subscriptionList size is 1
Sep 3 19:57:26.107761 osafntfd [11558:NtfSubscription.cc:0211] >> sendNotification
Sep 3 19:57:26.107764 osafntfd [11558:NtfSubscription.cc:0222] T3 send_notification_lib called, client 108, notification 723
Sep 3 19:57:26.107768 osafntfd [11558:ntfs_com.c:0284] >> send_notification_lib
Sep 3 19:57:26.107771 osafntfd [11558:ntfsv_mem.c:0769] >> ntfsv_get_ntf_header
Sep 3 19:57:26.107774 osafntfd [11558:ntfsv_mem.c:0790] << ntfsv_get_ntf_header
Sep 3 19:57:26.10 osafntfd [11558:ntfs_com.c:0286] T3 client id: 108, not_id: 723
Sep 3 19:57:26.107781 osafntfd [11558:mds_c_sndrcv.c:0396] >> mds_send
Sep 3 19:57:26.107785 osafntfd [11558:mds_c_sndrcv.c:0403] << mds_send
Sep 3 19:57:26.107788 osafntfd [11558:mds_c_sndrcv.c:0681] >> mds_mcm_send
Sep 3 19:57:26.107791 osafntfd [11558:mds_c_sndrcv.c:0916] >> mcm_pvt_normal_svc_snd
Sep 3 19:57:26.107794 osafntfd [11558:mds_c_sndrcv.c:0956] >> mcm_pvt_normal_snd_process_common
Sep 3 19:57:26.107800 osafntfd [11558:mds_c_sndrcv.c:1699] >> mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107804 osafntfd [11558:mds_c_sndrcv.c:1740] TR in else if sub_info->tmr_flag !- true
Sep 3 19:57:26.107813 osafntfd [11558:mds_c_sndrcv.c:1747] TR MDS_SND_RCV:Subscription exists but no timer running
Sep 3 19:57:26.107816 osafntfd [11558:mds_c_sndrcv.c:1749] TR MDS_SND_RCV :L mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107819 osafntfd [11558:mds_c_sndrcv.c:1750] << mds_mcm_process_disc_queue_checks
Sep 3 19:57:26.107900 osafntfd [11558:mds_c_sndrcv.c:1048] << mcm_pvt_normal_snd_process_common
Sep 3 19:57:26.107937 osafntfd [11558:mds_c_sndrcv.c:0937] >> mcm_pvt_normal_svc_snd
Sep 3 19:57:26.107941 osafntfd [11558:mds_c_sndrcv.c:0846] << mds_mcm_send
Sep 3 19:57:26.108141 osafntfd [11558:ntfs_mds.c:1290] ER ntfs_mds_msg_send FAILED
Sep 3 19:57:26.108160 osafntfd [11558:ntfs_com.c:0308] ER ntfs_mds_msg_send to ntfa failed rc: 2
Sep 3 19:57:26.108165 osafntfd [11558:NtfNotification.cc:0142] T1 Removing subscription 0 client 108 from notification 723, subscriptionList size is 0
Sep 3 19:57:26.108169 osafntfd [11558:ntfs_com.c:0503] >> sendNotConfirmUpdate: client: 108, subId: 0, notId: 723

a.  The traces show that a notification is received for client id 108 and 
then an mds_send is attempted for the same client id, but it fails because 
no timer is running.
b.  What does a client represent?  An opensaf process?  An SU?  A component?
c.  What is the purpose of sending a message back after receipt of the 
notification?  Since the message is never sent, it is discarded, and this does 
not seem to have any impact on the cluster.
d.  Hardcoded timers are defined in mds_main.c:

uint32_t MDS_QUIESCED_TMR_VAL = 80;
uint32_t MDS_AWAIT_ACTIVE_TMR_VAL = 18000;
uint32_t MDS_SUBSCRIPTION_TMR_VAL = 500;
uint32_t MDTM_REASSEMBLE_TMR_VAL = 500;
uint32_t MDTM_CACHED_EVENTS_TMR_VAL = 24000;

Could each one of these be explained? Can any of these be increased? If yes, 
what effect would that have?
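
For anyone triaging a trace like the one under question 2, here is a minimal sketch in C of how one entry could be split into its fields so that the ER entries (such as the ntfs_mds_msg_send failure) can be picked out automatically. It assumes the layout seen in the trace, "<timestamp> <daemon> [<pid>:<file>:<line>] <tag> <message>", and that wrapped lines have been joined first. The names trace_entry, parse_trace_line and is_error_entry are made up for this illustration and are not part of opensaf.

```c
#include <stdio.h>
#include <string.h>

/* One parsed trace entry (hypothetical type, for illustration only). */
struct trace_entry {
    int  pid;        /* process id inside the brackets, e.g. 11558 */
    char file[64];   /* source file, e.g. "ntfs_mds.c" */
    int  line;       /* source line, e.g. 1290 (leading zeros dropped) */
    char tag[4];     /* ">>", "<<", "ER", "TR", "T1", ... */
    char msg[256];   /* remainder of the entry */
};

/* Parse one unwrapped trace line. Returns 1 on success, 0 otherwise. */
int parse_trace_line(const char *s, struct trace_entry *e)
{
    const char *lb = strchr(s, '[');   /* start of "[pid:file:line]" */
    int consumed = 0;

    if (lb == NULL)
        return 0;

    e->msg[0] = '\0';
    /* %n records how many characters were read up to the message text. */
    if (sscanf(lb, "[%d:%63[^:]:%d] %3s %n",
               &e->pid, e->file, &e->line, e->tag, &consumed) < 4)
        return 0;

    if (consumed > 0) {
        snprintf(e->msg, sizeof e->msg, "%s", lb + consumed);
        e->msg[strcspn(e->msg, "\r\n")] = '\0';   /* drop a trailing newline */
    }
    return 1;
}

/* True when the entry carries the ER tag seen in the trace. */
int is_error_entry(const struct trace_entry *e)
{
    return strcmp(e->tag, "ER") == 0;
}
```

Feeding each joined line of the trace to parse_trace_line and keeping the entries for which is_error_entry returns true would leave only the two ER lines from ntfs_mds.c and ntfs_com.c. The buffer sizes are arbitrary choices; longer messages are truncated rather than overflowing.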


3.   Are there limits on the size of a cluster, the number of SGs, the 
number of SUs, or the number of components per SU?   Testing shows that as the 
number of SUs/components increases, the
erro