Re: [devel] [PATCH 1 of 1] mds: Remove use of mutex at TIPC receive (part 1) V3 [#2132]

Hans Nordeback Sun, 11 Dec 2016 23:26:24 -0800

Hi Mahesh,

Yes, removing the global lock from the receive function would be good.


V6 of the patch is your suggestion with a modified loop-break condition.

The loop now breaks when the recv_ctr threshold is reached, so that the 
global lock is released.

The global lock is held for too long and the other threads are 
not able to get it?

The second break condition is that FD_DSOCK or FD_TMRFD has been signaled.
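
A minimal sketch of these two break conditions, not the actual V6 code. The function names drain_data_socket and priority_event_pending are hypothetical, and the zero-timeout poll inside the loop is only an assumption about how "has been signaled" could be checked, since revents changes only when poll is called. MAX_RECV_THRESHOLD uses the value from the patch.

```c
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_RECV_THRESHOLD 1024 /* value taken from the patch under review */

/* Hypothetical helper: true if any priority descriptor is readable or
 * hung up. A fresh zero-timeout poll is used so that revents reflects
 * the current state rather than the state at the outer poll call. */
static bool priority_event_pending(struct pollfd *prio, nfds_t nprio)
{
	int rc;

	do {
		rc = poll(prio, nprio, 0);
	} while (rc == -1 && errno == EINTR);
	if (rc <= 0)
		return false;
	for (nfds_t i = 0; i < nprio; i++) {
		if (prio[i].revents & (POLLIN | POLLHUP))
			return true;
	}
	return false;
}

/* Hypothetical helper: drains data_fd without blocking. Stops when the
 * socket is empty, when max_msgs messages have been read, or when a
 * priority descriptor fires. Returns the number of messages consumed. */
static unsigned int drain_data_socket(int data_fd, struct pollfd *prio,
				      nfds_t nprio, unsigned int max_msgs)
{
	char buf[2048];
	unsigned int recv_ctr = 0;

	while (recv_ctr < max_msgs) {
		ssize_t n = recv(data_fd, buf, sizeof(buf), MSG_DONTWAIT);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			break; /* EAGAIN/EWOULDBLOCK: drained; else give up */
		}
		recv_ctr++;
		if (priority_event_pending(prio, nprio))
			break;
	}
	return recv_ctr;
}
```

A caller would pass MAX_RECV_THRESHOLD as max_msgs and release the global lock once the function returns. The extra poll costs one system call per message, which is the price of seeing fresh revents inside the loop.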

/Regards HansN


On 12/12/2016 07:47 AM, A V Mahesh wrote:
> Hi HansN,
>
> On 12/12/2016 11:52 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> On 12/8/2016 2:20 PM, Hans Nordeback wrote:
>>> And checking pfd in the FD_BSRSOCK loop will probably give the old 
>>> behaviour 
>>
> I understand that it will give the old behavior. Even a lower value of 
> MAX_RECV_THRESHOLD will not resolve the issue completely; maybe we 
> need to remove the FD_BSRSOCK loop and just limit the patch to lock 
> removal.
>
> -AVM
>> In practice it is breaking the loop based on the current status (see 
>> below). Am I missing anything?
>>
>> ==========================================================================================================================================
>>  
>>
>>
>> <133>1 2016-12-12T11:40:37.926189+05:30 SC-1 immcfg 21413 0 - BEGIN 
>> MDS LOGGING| ARCHW=a|64bit=1
>> <133>1 2016-12-12T11:40:37.926446+05:30 SC-1 osafimmnd 21153 77 - 
>> MDTM: svc up event for svc_id = IMMA_OM(26), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <nodeid[1]:osafimmnd[21153]>
>> <133>1 2016-12-12T11:40:37.926601+05:30 SC-1 immcfg 21413 1 - MDTM: 
>> install_tipc : svc_id = IMMA_OM(26), vdest=65535
>> <133>1 2016-12-12T11:40:37.926757+05:30 SC-1 immcfg 21413 2 - MDTM: 
>> svc up event for svc_id = IMMND(25), subscri. by svc_id = IMMA_OM(26) 
>> pwe_id=1 Adest = <rem_node[3]:dest_tipc_id_ref[1031798788]>
>> <133>1 2016-12-12T11:40:37.926781+05:30 SC-1 immcfg 21413 3 - MDTM: 
>> svc up event for svc_id = IMMND(25), subscri. by svc_id = IMMA_OM(26) 
>> pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3630301200]>
>> <133>1 2016-12-12T11:40:37.926793+05:30 SC-1 immcfg 21413 4 - MDTM: 
>> svc up event for svc_id = IMMND(25), subscri. by svc_id = IMMA_OM(26) 
>> pwe_id=1 Adest = <rem_node[4]:dest_tipc_id_ref[805175300]>
>> <133>1 2016-12-12T11:40:37.926812+05:30 SC-1 immcfg 21413 5 - MDTM: 
>> svc up event for svc_id = IMMND(25), subscri. by svc_id = IMMA_OM(26) 
>> pwe_id=1 Adest = <nodeid[1]:immcfg[21413]>
>> *<131>1 2016-12-12T11:40:37.947514+05:30 SC-1 osafimmnd 21153 78 - 
>> MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..**
>> **<131>1 2016-12-12T11:40:38.036485+05:30 SC-1 osafimmd 21142 42 - 
>> MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..*
>> <133>1 2016-12-12T11:40:38.06639+05:30 SC-1 osafimmnd 21153 79 - 
>> MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <nodeid[1]:osafimmnd[21153]>
>> <133>1 2016-12-12T11:40:38.066509+05:30 SC-1 immcfg 21413 6 - MDTM: 
>> uninstall_tipc : svc_id = IMMA_OM(26),vdest_id = 65535
>> <133>1 2016-12-12T11:40:38.081814+05:30 SC-1 immcfg 21416 0 - BEGIN 
>> MDS LOGGING| ARCHW=a|64bit=1
>> <133>1 2016-12-12T11:40:38.082067+05:30 SC-1 osafimmnd 21153 80 - 
>> MDTM: svc up event for svc_id = IMMA_OM(26
>>
>>
>> <133>1 2016-12-12T11:40:33.157767+05:30 SC-1 osafclmd 21183 77 - 
>> MDTM: svc down event for svc_id = CLMA(35), subscri. by svc_id = 
>> CLMS(34) pwe_id=1 Adest = <nodeid[1]:osafclmd[21183]>
>> <133>1 2016-12-12T11:40:33.157777+05:30 SC-1 osafclmd 21183 78 - 
>> MDTM: svc up event for svc_id = CLMA(35), subscri. by svc_id = 
>> CLMS(34) pwe_id=1 Adest = <nodeid[1]:osafclmd[21183]>
>> <133>1 2016-12-12T11:40:33.160249+05:30 SC-1 osafclmd 21183 79 - 
>> MDTM: svc down event for svc_id = CLMA(35), subscri. by svc_id = 
>> CLMS(34) pwe_id=1 Adest = <nodeid[1]:osafclmd[21183]>
>> <133>1 2016-12-12T11:40:33.160667+05:30 SC-1 osafsmfd 21219 38 - 
>> MDTM: uninstall_tipc : svc_id = CLMA(35),vdest_id = 65535
>> <133>1 2016-12-12T11:40:33.169483+05:30 SC-1 osafimmnd 21153 38 - 
>> MDTM: svc down event for svc_id = IMMA_OM(26), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <nodeid[1]:osafimmnd[21153]>
>> *<131>1 2016-12-12T11:40:33.169554+05:30 SC-1 osafimmnd 21153 39 - 
>> MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..*
>> <133>1 2016-12-12T11:40:33.169608+05:30 SC-1 osaflogd 21163 22 - 
>> MDTM: uninstall_tipc : svc_id = IMMA_OM(26),vdest_id = 65535
>> <133>1 2016-12-12T11:40:33.169767+05:30 SC-1 osafimmnd 21153 40 - 
>> MDTM: svc up event for svc_id = IMMA_OM(26), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <nodeid[1]:osafimmnd[21153]>
>> <133>1 2016-12-12T11:40:33.1699+05:30 SC-1 osaflogd 21163 23 - MDTM: 
>> install_tipc : svc_id = IMMA_OM(26), vdest=65535
>> <133>1 2016-12-12T11:40:33.169964+05:30 SC-1 osaflogd 21163 24 - 
>> MDTM: svc up event for svc_id = IMMND(25), subscri. by svc
>>
>> <133>1 2016-12-12T11:40:24.536395+05:30 SC-1 osafimmnd 21153 2 - 
>> MDTM: svc up event for svc_id = IMMD(24), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3630301197]>
>> <133>1 2016-12-12T11:40:24.536495+05:30 SC-1 osafimmnd 21153 3 - 
>> MDTM: svc up event for svc_id = IMMD(24), subscri. by svc_id = 
>> IMMND(25) pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3630301197]>
>> <133>1 2016-12-12T11:40:24.538935+05:30 SC-1 osafclmna 21114 9 - 
>> MDTM: svc up event for svc_id = CLMNA(36), subscri. by svc_id = 
>> CLMNA(36) pwe_id=1 Adest = <rem_node[3]:dest_tipc_id_ref[1031798791]>
>> *<131>1 2016-12-12T11:40:24.539112+05:30 SC-1 osafclmna 21114 10 - 
>> MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..*
>> <133>1 2016-12-12T11:40:24.539584+05:30 SC-1 osafclmna 21114 11 - 
>> MDTM: svc down event for svc_id = RDE(33), subscri. by svc_id = 
>> CLMNA(36) pwe_id=1 Adest = <nodeid[1]:osafclmna[21114]>
>> <133>1 2016-12-12T11:40:24.539892+05:30 SC-1 osafrded 21123 5 - MDTM: 
>> uninstall_tipc : svc_id = RDE(33),vdest_id = 65535
>> <133>1 2016-12-12T11:40:32.61317+05:30 SC-1 osaflogd 21163 0 - BEGIN 
>> MDS LOGGING| ARCHW=a|64bit=1
>> <133>1 2016-12-12T11:40:32.645023+05:30 SC-1 osafntfd 21173 0 - BEGIN 
>> MDS LOGGING| ARCHW=a|64bit=1
>>
>>
>> <133>1 2016-12-12T11:40:33.131777+05:30 SC-1 osafclmd 21183 60 - MDTM: 
>> svc up event for svc_id = IMMND(25), subscri. by svc_id = IMMA_OI(27) 
>> pwe_id=1 Adest = <rem_node[4]:dest_tipc_id_ref[805175300]>
>> <133>1 2016-12-12T11:40:33.131782+05:30 SC-1 osafclmd 21183 61 - 
>> MDTM: svc up event for svc_id = IMMND(25), subscri. by svc_id = 
>> IMMA_OI(27) pwe_id=1 Adest = <nodeid[1]:osafclmd[21183]>
>> <133>1 2016-12-12T11:40:33.132853+05:30 SC-1 osafntfd 21173 41 - 
>> MDTM: svc up event for svc_id = NTFA(29), subscri. by svc_id = 
>> NTFS(28) pwe_id=1 Adest = <nodeid[1]:osafntfd[21173]>
>> *<131>1 2016-12-12T11:40:33.13311+05:30 SC-1 osafntfd 21173 42 - 
>> MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..*
>> <133>1 2016-12-12T11:40:33.133373+05:30 SC-1 osafamfd 21193 45 - 
>> MDTM: install_tipc : svc_id = NTFA(29), vdest=65535
>> <133>1 2016-12-12T11:40:33.133457+05:30 SC-1 osafamfd 21193 46 - 
>> MDTM: svc up event for svc_id = NTFS(28), subscri. by svc_id = 
>> NTFA(29) pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3630301206]>
>> <133>1 2016-12-12T11:40:33.133512+05:30 SC-1 osafamfd 21193 47 - 
>> MDTM: svc up event for svc_id = NTFS(28), subscri. by svc_id = 
>> NTFA(29) pwe_id=1 Adest = <rem_node[2]:dest_tipc_id_ref[3630301206]>
>>
>> =======================================================================================================================
>>  
>>
>>
>> -AVM
>>
>> On 12/12/2016 9:19 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Before suggesting this I tested it, and it was breaking the data 
>>> loop of FD_BSRSOCK whenever a NEW control event occurred on FD_DSOCK.
>>> I will retest and send you the logs.
>>>
>>> Sorry, I was on leave, so my reply got delayed.
>>>
>>> -AVM
>>>
>>> On 12/8/2016 2:20 PM, Hans Nordeback wrote:
>>>> Hi Mahesh,
>>>>
>>>> On a closer look there are some questions about the suggestion below.
>>>>
>>>> The pfd structure is only updated after a new poll call, i.e. it is 
>>>> the kernel that updates this structure. Checking pfd in the 
>>>> FD_BSRSOCK loop will therefore probably give the old behaviour of 
>>>> breaking the loop after one iteration. I think we have to consider 
>>>> reverting to the counter and switching to the mailbox instead of 
>>>> the callbacks.
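
The point about pfd can be illustrated with a small stand-alone sketch. The function name revents_is_stale_until_repoll is hypothetical, and an AF_UNIX datagram socket pair stands in for the TIPC sockets as an assumption, so that the example runs without TIPC. It only shows that revents keeps its old value after new data arrives until poll is called again.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if revents stays unchanged when new data
 * arrives and is updated only by the next poll call, 0 if not, and -1
 * on a setup error. */
static int revents_is_stale_until_repoll(void)
{
	int sv[2];
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
		return -1;

	struct pollfd pfd = { .fd = sv[1], .events = POLLIN, .revents = 0 };

	/* First poll: nothing is queued, so poll reports no ready fds. */
	if (poll(&pfd, 1, 0) == 0) {
		short before = pfd.revents;

		if (send(sv[0], "x", 1, 0) == 1) {
			/* Data is now queued, but nobody touched pfd. */
			short stale = pfd.revents;

			/* Only a new poll call refreshes revents. */
			int rc = poll(&pfd, 1, 0);
			short fresh = pfd.revents;

			ok = (before == 0 && stale == 0 && rc == 1 &&
			      (fresh & POLLIN) != 0);
		}
	}
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

A check on pfd inside the receive loop therefore sees only what the last poll call reported, which matches the concern raised here.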
>>>>
>>>> /Regards HansN
>>>>
>>>>
>>>>
>>>> On 12/07/2016 01:32 PM, Hans Nordeback wrote:
>>>>> Hi Mahesh,
>>>>>
>>>>> Great, the suggestion below looks good. I will send out a V4 shortly.
>>>>>
>>>>> /Thanks HansN
>>>>>
>>>>>
>>>>> On 12/07/2016 05:29 AM, A V Mahesh wrote:
>>>>>> Hi HansN,
>>>>>>
>>>>>> I did some investigation. My first observation is that the issues 
>>>>>> occur because of delayed processing of Control (FD_DSOCK = POLLIN) 
>>>>>> and Timer (FD_TMRFD = POLLIN) events, and of the Discovery Socket 
>>>>>> losing connectivity with the topology events (FD_DSOCK = POLLHUP). 
>>>>>> This happens when we are held up in the Data message processing 
>>>>>> loop of 1024 messages, so the fix is to break out of the Data 
>>>>>> message processing loop when priority control, timer or 
>>>>>> connectivity-lost events are pending.
>>>>>>
>>>>>> So we need to replace the key logic of `message counter and break 
>>>>>> out of the receive loop after we have received 1024 messages, 
>>>>>> even if there are still more messages on the socket` with 
>>>>>> alternate logic. The new logic will also address the 
>>>>>> out-of-order issue caused by the MDS library.
>>>>>>
>>>>>> Please make the following changes, rebase, and send a new version 
>>>>>> of the patch, so that I can continue testing and we can 
>>>>>> conclude this patch.
>>>>>> If you are busy with some other ticket, please let me know and I 
>>>>>> will drive the patch.
>>>>>>
>>>>>> 1) Replace this with the below:
>>>>>> ===============================================================================================
>>>>>>
>>>>>> On 11/22/2016 3:07 PM, Hans Nordeback wrote:
>>>>>> > +                        mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>> > +                    }
>>>>>> > +                    if (++recv_ctr > MAX_RECV_THRESHOLD) {
>>>>>> > +                        break;
>>>>>> > +                    }
>>>>>> ===============================================================================================
>>>>>>
>>>>>> ===============================================================================================
>>>>>>
>>>>>> + mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>> +                                       }
>>>>>> +                                       if ((pfd[FD_DSOCK].revents == POLLIN) ||
>>>>>> + (pfd[FD_DSOCK].revents & POLLHUP) ||
>>>>>> + (pfd[FD_TMRFD].revents == POLLIN)) {
>>>>>> + m_MDS_LOG_INFO("MDTM: break (pfd[FD_BSRSOCK].revents & POLLIN) Loop..");
>>>>>> +                                               break;
>>>>>> +                        }
>>>>>> ===============================================================================================
>>>>>>  
>>>>>>
>>>>>>
>>>>>> 2) Replace this with the below:
>>>>>> ===============================================================================================
>>>>>>  
>>>>>>
>>>>>> On 11/22/2016 3:07 PM, Hans Nordeback wrote:
>>>>>>  -                        assert(0);
>>>>>>  +                    }
>>>>>>  +                    if (++recv_ctr > MAX_RECV_THRESHOLD) {
>>>>>>  +                        break;
>>>>>>                       }
>>>>>> ===============================================================================================
>>>>>>  
>>>>>>
>>>>>>
>>>>>> ===============================================================================================
>>>>>>  
>>>>>>
>>>>>> - assert(0);
>>>>>> +                                       }
>>>>>> +                                       if (pfd[FD_DSOCK].revents 
>>>>>> & POLLHUP) {
>>>>>> + m_MDS_LOG_INFO("MDTM: break (pfd[FD_DSOCK].revents == POLLIN) 
>>>>>> Loop..");
>>>>>> +                                               break;
>>>>>>                                         }
>>>>>> ===============================================================================================
>>>>>>  
>>>>>>
>>>>>>
>>>>>> 3) Remove  variables `+#define MAX_RECV_THRESHOLD 1024` & `+  
>>>>>> unsigned int recv_ctr = 0`
>>>>>>
>>>>>> -AVM
>>>>>>
>>>>>>
>>>>>> On 11/23/2016 11:07 AM, A V Mahesh wrote:
>>>>>>> Hi HansN
>>>>>>>
>>>>>>> In my testing with concurrent applications that send large data 
>>>>>>> and a large number of up/down events, the cluster goes for a 
>>>>>>> reboot because some MDS applications do not get a chance to 
>>>>>>> process within the expected time (see the logs below), as I 
>>>>>>> expected,
>>>>>>> for example `osafamfnd[18568]: ER AMF director heart beat 
>>>>>>> timeout, generating core for amfd` &
>>>>>>> `osaffmd[20377]: Rebooting OpenSAF NodeId = 0 EE Name = No EE 
>>>>>>> Mapped, Reason: Activation timer supervision expired: no ACTIVE 
>>>>>>> assignment received within the time limit,`.
>>>>>>> Unfortunately, no MDS logs are being logged at the time the 
>>>>>>> event occurs (logging got stuck some time earlier), 
>>>>>>> so please hold this patch. I will find a way to root-cause the 
>>>>>>> problem by doing different kinds of testing.
>>>>>>>
>>>>>>> ==============================================================================================
>>>>>>>  
>>>>>>>
>>>>>>>
>>>>>>> Nov 23 10:46:27 SC-2 osafimmnd[18324]: NO Ccb 4807 COMMITTED 
>>>>>>> (immcfg_PL-4_32406)
>>>>>>> Nov 23 10:46:27 SC-2 osafimmnd[18324]: NO Ccb 4808 COMMITTED 
>>>>>>> (immcfg_PL-4_32409)
>>>>>>> Nov 23 10:46:27 SC-2 osafimmnd[18324]: NO Ccb 4809 COMMITTED 
>>>>>>> (immcfg_PL-4_32412)
>>>>>>> Nov 23 10:46:27 SC-2 osafimmnd[18324]: NO Ccb 4810 COMMITTED 
>>>>>>> (immcfg_PL-4_32415)
>>>>>>> Nov 23 10:46:27 SC-2 osafimmnd[18324]: NO Ccb 4811 COMMITTED 
>>>>>>> (immcfg_PL-4_32418)
>>>>>>> Nov 23 10:46:27 SC-2 osafamfnd[18568]: ER AMF director heart 
>>>>>>> beat timeout, generating core for amfd
>>>>>>> Nov 23 10:46:28 SC-2 osafamfnd[18568]: Rebooting OpenSAF NodeId 
>>>>>>> = 131599 EE Name = , Reason: AMF director heart beat timeout, 
>>>>>>> OwnNodeId = 131599, SupervisionTime = 60
>>>>>>> Nov 23 10:46:28 SC-2 opensaf_reboot: Rebooting local node; 
>>>>>>> timeout=60
>>>>>>> Nov 23 10:46:29 SC-2 osafimmnd[18324]: WA MDS Send Failed
>>>>>>> Nov 23 10:46:29 SC-2 osafimmnd[18324]: WA Error code 2 returned 
>>>>>>> for message type 16 - ignoring
>>>>>>> Nov 23 10:47:09 SC-2 syslog-ng[1256]: syslog-ng starting up; 
>>>>>>> version='2.0.9'
>>>>>>> Nov 23 10:47:10 SC-2 ifup:     lo
>>>>>>>
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 241 COMMITTED 
>>>>>>> (immcfg_PL-4_18950)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 242 COMMITTED 
>>>>>>> (immcfg_PL-4_18953)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 243 COMMITTED 
>>>>>>> (immcfg_PL-4_18956)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 244 COMMITTED 
>>>>>>> (immcfg_PL-4_18959)
>>>>>>> Nov 23 10:46:22 SC-1 osaffmd[20377]: Rebooting OpenSAF NodeId = 
>>>>>>> 0 EE Name = No EE Mapped, Reason: Activation timer supervision 
>>>>>>> expired: no ACTIVE assignment received within the time limit, 
>>>>>>> OwnNodeId = 131343, SupervisionTime = 60
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 245 COMMITTED 
>>>>>>> (immcfg_PL-4_18962)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 246 COMMITTED 
>>>>>>> (immcfg_PL-4_18965)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 247 COMMITTED 
>>>>>>> (immcfg_PL-4_18968)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 255 COMMITTED 
>>>>>>> (immcfg_PL-4_18989)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 256 COMMITTED 
>>>>>>> (immcfg_PL-4_18992)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 257 COMMITTED 
>>>>>>> (immcfg_PL-4_18995)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 258 COMMITTED 
>>>>>>> (immcfg_PL-4_18998)
>>>>>>> Nov 23 10:46:22 SC-1 osafimmnd[20398]: NO Ccb 251 COMMITTED 
>>>>>>> (immcfg_SC-1_20807)
>>>>>>> Nov 23 10:46:22 SC-1 opensaf_reboot: Rebooting local node; 
>>>>>>> timeout=60
>>>>>>>
>>>>>>> ==============================================================================================
>>>>>>>  
>>>>>>>
>>>>>>>
>>>>>>> -AVM
>>>>>>>
>>>>>>>
>>>>>>> On 11/22/2016 3:07 PM, Hans Nordeback wrote:
>>>>>>>> osaf/libs/core/mds/mds_dt_tipc.c | 389 
>>>>>>>> +++++++++++++++++++++-----------------
>>>>>>>>   1 files changed, 216 insertions(+), 173 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> This first part changes the socket descriptors to non-blocking 
>>>>>>>> at receive.
>>>>>>>> This is done to read all incoming messages before returning to 
>>>>>>>> the next poll call.
>>>>>>>>
>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c 
>>>>>>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> @@ -142,6 +142,8 @@ static MDS_SUBTN_REF_VAL handle;
>>>>>>>>   static uint16_t num_subscriptions;
>>>>>>>>     uint32_t mdtm_global_frag_num;
>>>>>>>> +
>>>>>>>> +#define MAX_RECV_THRESHOLD 1024
>>>>>>>> /*********************************************************
>>>>>>>>       Function NAME: mdtm_tipc_init
>>>>>>>> @@ -541,15 +543,13 @@ static uint32_t mdtm_destroy_rcv_task(vo
>>>>>>>>   ssize_t recvfrom_connectionless (int sd, void *buf, size_t 
>>>>>>>> nbytes, int flags,
>>>>>>>>           struct sockaddr *from, socklen_t *addrlen)
>>>>>>>>   {
>>>>>>>> -    struct msghdr msg;
>>>>>>>> -    struct iovec iov;
>>>>>>>> +    struct msghdr msg = {0};
>>>>>>>> +    struct iovec iov = {0};
>>>>>>>>       char anc_buf[CMSG_SPACE(8) + CMSG_SPACE(1024) + 
>>>>>>>> CMSG_SPACE(12)];
>>>>>>>>       struct cmsghdr *anc;
>>>>>>>>       int has_addr;
>>>>>>>>       int anc_data[2];
>>>>>>>>   -    ssize_t sz;
>>>>>>>> -
>>>>>>>>       has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>>>         iov.iov_base = buf;
>>>>>>>> @@ -562,55 +562,63 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>       msg.msg_control = anc_buf;
>>>>>>>>       msg.msg_controllen = sizeof(anc_buf);
>>>>>>>>   -    sz = recvmsg(sd, &msg, flags);
>>>>>>>> -    if (sz >= 0) {
>>>>>>>> -        anc = CMSG_FIRSTHDR(&msg);
>>>>>>>> -        if (anc == NULL) {
>>>>>>>> -            m_MDS_LOG_DBG("MDTM: size: %d  anc is NULL", 
>>>>>>>> (int)sz);
>>>>>>>> +    while (true) {
>>>>>>>> +        ssize_t sz = recvmsg(sd, &msg, flags);
>>>>>>>> +        if (sz == -1) {
>>>>>>>> +            if (errno == EAGAIN || errno == EWOULDBLOCK) {
>>>>>>>> +                return sz;
>>>>>>>> +            } else if (errno == EINTR) {
>>>>>>>> +                continue;
>>>>>>>> +            } else {
>>>>>>>> +                /* -1 indicates connection termination 
>>>>>>>> connectionless this not possible */
>>>>>>>> +                osaf_abort(sz);
>>>>>>>> +            }
>>>>>>>>           }
>>>>>>>> -        while (anc != NULL) {
>>>>>>>> -
>>>>>>>> -            /* Receipt of a normal data message never creates 
>>>>>>>> the TIPC_ERRINFO
>>>>>>>> -               and TIPC_RETDATA objects, and only creates the 
>>>>>>>> TIPC_DESTNAME object
>>>>>>>> -               if the message was sent using a TIPC name or 
>>>>>>>> name sequence as the
>>>>>>>> -               destination rather than a TIPC port ID*/
>>>>>>>> -            if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>> -                anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) 
>>>>>>>> + 0));
>>>>>>>> -                if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>> -                    LOG_CR("MDTM: undelivered message 
>>>>>>>> condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> -                    m_MDS_LOG_CRITICAL("MDTM: undelivered 
>>>>>>>> message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> +        if (sz >= 0) {
>>>>>>>> +            anc = CMSG_FIRSTHDR(&msg);
>>>>>>>> +            if (anc == NULL) {
>>>>>>>> +                m_MDS_LOG_DBG("MDTM: size: %d  anc is NULL", 
>>>>>>>> (int)sz);
>>>>>>>> +            }
>>>>>>>> +            while (anc != NULL) {
>>>>>>>> +
>>>>>>>> +                /* Receipt of a normal data message never 
>>>>>>>> creates the TIPC_ERRINFO
>>>>>>>> +                   and TIPC_RETDATA objects, and only creates 
>>>>>>>> the TIPC_DESTNAME object
>>>>>>>> +                   if the message was sent using a TIPC name 
>>>>>>>> or name sequence as the
>>>>>>>> +                   destination rather than a TIPC port ID*/
>>>>>>>> +                if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>> +                    anc_data[0] = *((unsigned 
>>>>>>>> int*)(CMSG_DATA(anc) + 0));
>>>>>>>> +                    if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>> +                        LOG_CR("MDTM: undelivered message 
>>>>>>>> condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> +                        m_MDS_LOG_CRITICAL("MDTM: undelivered 
>>>>>>>> message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> +                    } else {
>>>>>>>> +                        /* TIPC_ERRINFO - TIPC error code 
>>>>>>>> associated with a returned data message or a connection 
>>>>>>>> termination message */
>>>>>>>> +                        m_MDS_LOG_DBG("MDTM: undelivered 
>>>>>>>> message condition ancillary data: TIPC_ERRINFO err : %d", 
>>>>>>>> anc_data[0]);
>>>>>>>> +                    }
>>>>>>>> +                } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>> +                    /* If we set TIPC_DEST_DROPPABLE off 
>>>>>>>> message (configure TIPC to return rejected messages to the 
>>>>>>>> sender )
>>>>>>>> +                       we will hit this when we implement MDS 
>>>>>>>> retransmit lost messages, can be replaced with flow control 
>>>>>>>> logic */
>>>>>>>> +                    /* TIPC_RETDATA -The contents of a 
>>>>>>>> returned data message */
>>>>>>>> +                    LOG_CR("MDTM: undelivered message 
>>>>>>>> condition ancillary data: TIPC_RETDATA");
>>>>>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered 
>>>>>>>> message condition ancillary data: TIPC_RETDATA");
>>>>>>>> +                } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>> +                    if (sz == 0) {
>>>>>>>> +                        m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unknown condition. Ignoring");
>>>>>>>> +                    }
>>>>>>>>                   } else {
>>>>>>>> -                    /* TIPC_ERRINFO - TIPC error code 
>>>>>>>> associated with a returned data message or a connection 
>>>>>>>> termination message */
>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: undelivered message 
>>>>>>>> condition ancillary data: TIPC_ERRINFO err : %d", anc_data[0]);
>>>>>>>> +                    m_MDS_LOG_INFO("MDTM: unrecognized 
>>>>>>>> ancillary data type %u\n", anc->cmsg_type);
>>>>>>>> +                    if (sz == 0) {
>>>>>>>> +                        m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unkown condition. Ignoring");
>>>>>>>> +                    }
>>>>>>>>                   }
>>>>>>>> -            } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>> -                /* If we set TIPC_DEST_DROPPABLE off message 
>>>>>>>> (configure TIPC to return rejected messages to the sender )
>>>>>>>> -                   we will hit this when we implement MDS 
>>>>>>>> retransmit lost messages, can be replaced with flow control 
>>>>>>>> logic */
>>>>>>>> -                /* TIPC_RETDATA -The contents of a returned 
>>>>>>>> data message */
>>>>>>>> -                LOG_CR("MDTM: undelivered message condition 
>>>>>>>> ancillary data: TIPC_RETDATA");
>>>>>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>>>>>>>> condition ancillary data: TIPC_RETDATA");
>>>>>>>> -            } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>> -                if (sz == 0) {
>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unknown condition. Ignoring");
>>>>>>>> -                }
>>>>>>>> -            } else {
>>>>>>>> -                m_MDS_LOG_INFO("MDTM: unrecognized ancillary 
>>>>>>>> data type %u\n",    anc->cmsg_type);
>>>>>>>> -                if (sz == 0) {
>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unkown condition. Ignoring");
>>>>>>>> -                }
>>>>>>>> +
>>>>>>>> +                anc = CMSG_NXTHDR(&msg, anc);
>>>>>>>>               }
>>>>>>>>   -            anc = CMSG_NXTHDR(&msg, anc);
>>>>>>>> +            if (has_addr)
>>>>>>>> +                *addrlen = msg.msg_namelen;
>>>>>>>>           }
>>>>>>>> -
>>>>>>>> -        if (has_addr)
>>>>>>>> -            *addrlen = msg.msg_namelen;
>>>>>>>> -    } else {
>>>>>>>> -        /* -1 indicates connection termination connectionless 
>>>>>>>> this not possible */
>>>>>>>> -        abort();
>>>>>>>> +        return sz;
>>>>>>>>       }
>>>>>>>> -
>>>>>>>> -    return sz;
>>>>>>>>   }
>>>>>>>> /*********************************************************
>>>>>>>> @@ -627,91 +635,117 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>> *********************************************************/
>>>>>>>>   static uint32_t mdtm_process_recv_events(void)
>>>>>>>>   {
>>>>>>>> +    enum {
>>>>>>>> +        FD_DSOCK = 0,
>>>>>>>> +        FD_BSRSOCK,
>>>>>>>> +        FD_TMRFD,
>>>>>>>> +        NUM_FDS
>>>>>>>> +    };
>>>>>>>> +
>>>>>>>>       /*
>>>>>>>>          STEP 1: Poll on the BSRsock and Dsock to get the events
>>>>>>>>          if data is received process the received data
>>>>>>>>          if discovery events are received , process the 
>>>>>>>> discovery events
>>>>>>>>        */
>>>>>>>>   -    while (1) {
>>>>>>>> +    while (true) {
>>>>>>>> +        unsigned int recv_ctr = 0;
>>>>>>>>           unsigned int pollres;
>>>>>>>> -        struct pollfd pfd[3];
>>>>>>>> +        struct pollfd pfd[NUM_FDS] = {{0}};
>>>>>>>>           struct tipc_event event;
>>>>>>>>   -        pfd[0].fd = tipc_cb.Dsock;
>>>>>>>> -        pfd[0].events = POLLIN;
>>>>>>>> -        pfd[1].fd = tipc_cb.BSRsock;
>>>>>>>> -        pfd[1].events = POLLIN;
>>>>>>>> -        pfd[2].fd = tipc_cb.tmr_fd;
>>>>>>>> -        pfd[2].events = POLLIN;
>>>>>>>> -
>>>>>>>> -        pfd[0].revents = pfd[1].revents = pfd[2].revents = 0;
>>>>>>>> -
>>>>>>>> -        pollres = poll(pfd, 3, MDTM_TIPC_POLL_TIMEOUT);
>>>>>>>> +        pfd[FD_DSOCK].fd = tipc_cb.Dsock;
>>>>>>>> +        pfd[FD_DSOCK].events = POLLIN;
>>>>>>>> +        pfd[FD_BSRSOCK].fd = tipc_cb.BSRsock;
>>>>>>>> +        pfd[FD_BSRSOCK].events = POLLIN;
>>>>>>>> +        pfd[FD_TMRFD].fd = tipc_cb.tmr_fd;
>>>>>>>> +        pfd[FD_TMRFD].events = POLLIN;
>>>>>>>> +
>>>>>>>> +        pfd[FD_DSOCK].revents = pfd[FD_BSRSOCK].revents = 
>>>>>>>> pfd[FD_TMRFD].revents = 0;
>>>>>>>> +
>>>>>>>> +        pollres = poll(pfd, NUM_FDS, MDTM_TIPC_POLL_TIMEOUT);
>>>>>>>>             if (pollres > 0) {    /* Check for EINTR and 
>>>>>>>> discard */
>>>>>>>>               memset(&event, 0, sizeof(event));
>>>>>>>> osaf_mutex_lock_ordie(&gl_mds_library_mutex);
>>>>>>>> -            if (pfd[0].revents == POLLIN) {
>>>>>>>> -                if (recv(tipc_cb.Dsock, &event, sizeof(event), 
>>>>>>>> 0) != sizeof(event)) {
>>>>>>>> -                    m_MDS_LOG_ERR("Unable to capture the recd 
>>>>>>>> event .. Continuing  err :%s", strerror(errno));
>>>>>>>> - osaf_mutex_unlock_ordie(&gl_mds_library_mutex);
>>>>>>>> -                    continue;
>>>>>>>> -                } else {
>>>>>>>> -                    if (NTOHL(event.event) == TIPC_PUBLISHED) {
>>>>>>>> -
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: Published: ");
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port 
>>>>>>>> id <0x%08x:%u>\n",
>>>>>>>> - NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> - NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> - NTOHL(event.port.ref));
>>>>>>>> -
>>>>>>>> -                        if ( NCSCC_RC_SUCCESS != 
>>>>>>>> mdtm_process_discovery_events(TIPC_PUBLISHED,
>>>>>>>> -                                event)) {
>>>>>>>> -                            m_MDS_LOG_INFO("MDTM: Published 
>>>>>>>> Event processing status: F");
>>>>>>>> +            if (pfd[FD_DSOCK].revents == POLLIN) {
>>>>>>>> +                recv_ctr = 0;
>>>>>>>> +                while (true) {
>>>>>>>> +                    ssize_t rc = recv(tipc_cb.Dsock, &event, 
>>>>>>>> sizeof(event), MSG_DONTWAIT);
>>>>>>>> +                    if (rc == -1) {
>>>>>>>> +                        if (errno == EAGAIN || errno == 
>>>>>>>> EWOULDBLOCK) {
>>>>>>>> +                            break;
>>>>>>>> +                        } else if (errno == EINTR) {
>>>>>>>> +                            continue;
>>>>>>>> +                        } else {
>>>>>>>> +                            m_MDS_LOG_ERR("Unable to capture 
>>>>>>>> the recd event .. Continuing  err :%s", strerror(errno));
>>>>>>>> +                            break;
>>>>>>>>                           }
>>>>>>>> -                    } else if (NTOHL(event.event) == 
>>>>>>>> TIPC_WITHDRAWN) {
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: Withdrawn: ");
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port 
>>>>>>>> id <0x%08x:%u>\n",
>>>>>>>> - NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> - NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> - NTOHL(event.port.ref));
>>>>>>>> -
>>>>>>>> -                        if ( NCSCC_RC_SUCCESS != 
>>>>>>>> mdtm_process_discovery_events(TIPC_WITHDRAWN,
>>>>>>>> -                                event)) {
>>>>>>>> -                            m_MDS_LOG_INFO("MDTM: Withdrawn 
>>>>>>>> event processing status: F");
>>>>>>>> +                    } else {
>>>>>>>> +                        if (rc != sizeof(event)) {
>>>>>>>> +                            m_MDS_LOG_ERR("Unable to capture 
>>>>>>>> the recd event .. Continuing  err :%s", strerror(errno));
>>>>>>>> +                            break;
>>>>>>>> +                        } else {
>>>>>>>> +                            if (NTOHL(event.event) == 
>>>>>>>> TIPC_PUBLISHED) {
>>>>>>>> +
>>>>>>>> + m_MDS_LOG_INFO("MDTM: Published: ");
>>>>>>>> + m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port id <0x%08x:%u>\n",
>>>>>>>> + NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> + NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> + NTOHL(event.port.ref));
>>>>>>>> +
>>>>>>>> +                                if (NCSCC_RC_SUCCESS != 
>>>>>>>> mdtm_process_discovery_events(TIPC_PUBLISHED,
>>>>>>>> + event)) {
>>>>>>>> + m_MDS_LOG_INFO("MDTM: Published Event processing status: F");
>>>>>>>> +                                }
>>>>>>>> +                            } else if (NTOHL(event.event) == 
>>>>>>>> TIPC_WITHDRAWN) {
>>>>>>>> + m_MDS_LOG_INFO("MDTM: Withdrawn: ");
>>>>>>>> + m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port id <0x%08x:%u>\n",
>>>>>>>> + NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> + NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> + NTOHL(event.port.ref));
>>>>>>>> +
>>>>>>>> +                                if (NCSCC_RC_SUCCESS != 
>>>>>>>> mdtm_process_discovery_events(TIPC_WITHDRAWN,
>>>>>>>> + event)) {
>>>>>>>> + m_MDS_LOG_INFO("MDTM: Withdrawn event processing status: F");
>>>>>>>> +                                }
>>>>>>>> +                            } else if (NTOHL(event.event) == 
>>>>>>>> TIPC_SUBSCR_TIMEOUT) {
>>>>>>>> +                                /* As the timeout passed in 
>>>>>>>> infinite, No need to check for the Timeout */
>>>>>>>> + m_MDS_LOG_ERR("MDTM: Timeou Event");
>>>>>>>> + m_MDS_LOG_INFO("MDTM: Timeou Event");
>>>>>>>> + m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port id <0x%08x:%u>\n",
>>>>>>>> + NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> + NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> + NTOHL(event.port.ref));
>>>>>>>> +                            } else {
>>>>>>>> + m_MDS_LOG_ERR("MDTM: Unknown Event");
>>>>>>>> +                                /* This should never come */
>>>>>>>> +                                osafassert(0);
>>>>>>>> +                            }
>>>>>>>>                           }
>>>>>>>> -                    } else if (NTOHL(event.event) == 
>>>>>>>> TIPC_SUBSCR_TIMEOUT) {
>>>>>>>> -                        /* As the timeout passed in infinite, 
>>>>>>>> No need to check for the Timeout */
>>>>>>>> -                        m_MDS_LOG_ERR("MDTM: Timeou Event");
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: Timeou Event");
>>>>>>>> -                        m_MDS_LOG_INFO("MDTM: <%u,%u,%u> port 
>>>>>>>> id <0x%08x:%u>\n",
>>>>>>>> - NTOHL(event.s.seq.type), NTOHL(event.found_lower),
>>>>>>>> - NTOHL(event.found_upper), NTOHL(event.port.node),
>>>>>>>> - NTOHL(event.port.ref));
>>>>>>>> -                    } else {
>>>>>>>> -                        m_MDS_LOG_ERR("MDTM: Unknown Event");
>>>>>>>> -                        /* This should never come */
>>>>>>>> -                        assert(0);
>>>>>>>> +                    }
>>>>>>>> +                    if (++recv_ctr > MAX_RECV_THRESHOLD) {
>>>>>>>> +                        break;
>>>>>>>>                       }
>>>>>>>>                   }
>>>>>>>> -            } else if (pfd[0].revents & POLLHUP) {
>>>>>>>> +            }
>>>>>>>> +            if (pfd[FD_DSOCK].revents & POLLHUP) {
>>>>>>>>                   /* This value is returned when the number of 
>>>>>>>> subscriptions made cross the tipc max_subscr limit,
>>>>>>>>                      so no more connection to the tipc 
>>>>>>>> topserver is present(viz no more up/down events),
>>>>>>>>                      so abort and exit the process */
>>>>>>>> -                   m_MDS_LOG_CRITICAL("MDTM: POLLHUP returned 
>>>>>>>> on Discovery Socket, No. of subscriptions=%d",
>>>>>>>> -                     num_subscriptions);
>>>>>>>> -                abort();    /* This means, the process is use 
>>>>>>>> less as
>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: POLLHUP returned on 
>>>>>>>> Discovery Socket, No. of subscriptions=%d",
>>>>>>>> +                           num_subscriptions);
>>>>>>>> +                osaf_abort(pfd[FD_DSOCK].revents); /* This 
>>>>>>>> means, the process is use less as
>>>>>>>>                              it has lost the connectivity with 
>>>>>>>> the topology server
>>>>>>>>                              and will not be able to receive 
>>>>>>>> any UP/DOWN events */
>>>>>>>>   -            } else if (pfd[1].revents & POLLIN) {
>>>>>>>> +            }
>>>>>>>> +            if (pfd[FD_BSRSOCK].revents & POLLIN) {
>>>>>>>>                     /* Data Received */
>>>>>>>>                     uint8_t *inbuf = tipc_cb.recvbuf;
>>>>>>>>                   uint8_t *data;    /* Used for decoding */
>>>>>>>> -                uint16_t recd_bytes = 0;
>>>>>>>>   #ifdef MDS_CHECKSUM_ENABLE_FLAG
>>>>>>>>                   uint16_t old_checksum = 0;
>>>>>>>>                   uint16_t new_checksum = 0;
>>>>>>>> @@ -719,78 +753,87 @@ static uint32_t mdtm_process_recv_events
>>>>>>>>                   struct sockaddr_tipc client_addr;
>>>>>>>>                   socklen_t alen = sizeof(client_addr);
>>>>>>>>   -                uint16_t recd_buf_len = 0;
>>>>>>>>                   m_MDS_LOG_INFO("MDTM: Data received: 
>>>>>>>> Processing data ");
>>>>>>>>   -                recd_bytes = 
>>>>>>>> recvfrom_connectionless(tipc_cb.BSRsock, inbuf, 
>>>>>>>> TIPC_MAX_USER_MSG_SIZE, 0,
>>>>>>>> -                        (struct sockaddr *)&client_addr, &alen);
>>>>>>>> -                if (recd_bytes == 0) {
>>>>>>>> -                    m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unknown/hack condition. Ignoring");
>>>>>>>> - osaf_mutex_unlock_ordie(&gl_mds_library_mutex);
>>>>>>>> -                    continue;
>>>>>>>> +                recv_ctr = 0;
>>>>>>>> +                while (true) {
>>>>>>>> +                    uint16_t recd_buf_len = 0;
>>>>>>>> +                    ssize_t recd_bytes = 
>>>>>>>> recvfrom_connectionless(tipc_cb.BSRsock, inbuf, 
>>>>>>>> TIPC_MAX_USER_MSG_SIZE, MSG_DONTWAIT,
>>>>>>>> +                                         (struct sockaddr 
>>>>>>>> *)&client_addr, &alen);
>>>>>>>> +                    if (recd_bytes == -1) {
>>>>>>>> +                        m_MDS_LOG_DBG("MDTM: no more data to 
>>>>>>>> read");
>>>>>>>> +                        break;
>>>>>>>> +                    }
>>>>>>>> +                    if (recd_bytes == 0) {
>>>>>>>> +                        m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>>>>>> received on sock, abnormal/unknown/hack condition. Ignoring");
>>>>>>>> +                        break;
>>>>>>>> +                    }
>>>>>>>> +                    data = inbuf;
>>>>>>>> +
>>>>>>>> +                    recd_buf_len = ncs_decode_16bit(&data);
>>>>>>>> +
>>>>>>>> +                    if (pfd[FD_BSRSOCK].revents & POLLERR) {
>>>>>>>> +                        m_MDS_LOG_ERR("MDTM: Error 
>>>>>>>> Recd:tipc_id=<0x%08x:%u>:errno=0x%08x",
>>>>>>>> + client_addr.addr.id.node, client_addr.addr.id.ref, errno);
>>>>>>>> +                    } else if (recd_buf_len == recd_bytes) {
>>>>>>>> +                        uint64_t tipc_id = 0;
>>>>>>>> +                        uint32_t buff_dump = 0;
>>>>>>>> +                        tipc_id = 
>>>>>>>> ((uint64_t)client_addr.addr.id.node) << 32; /* 
>>>>>>>> TIPC_ID=<NODE,REF> */
>>>>>>>> +                        tipc_id |= client_addr.addr.id.ref;
>>>>>>>> +
>>>>>>>> +#ifdef MDS_CHECKSUM_ENABLE_FLAG
>>>>>>>> +                        if (inbuf[2] == 1) {
>>>>>>>> +                            old_checksum = ((uint16_t)inbuf[3] 
>>>>>>>> << 8 | inbuf[4]);
>>>>>>>> +                            inbuf[3] = 0;
>>>>>>>> +                            inbuf[4] = 0;
>>>>>>>> +                            new_checksum = 
>>>>>>>> mds_checksum(recd_bytes, inbuf);
>>>>>>>> +
>>>>>>>> +                            if (old_checksum != new_checksum) {
>>>>>>>> +                                m_MDS_LOG_ERR
>>>>>>>> + ("CHECKSUM-MISMATCH:recvd_on_sock=%zd, Tipc_id=0x%llu, Adest 
>>>>>>>> = <%08x,%u>",
>>>>>>>> +                                     recd_bytes, tipc_id,
>>>>>>>> + m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.
>>>>>>>> + id.node),
>>>>>>>> + client_addr.addr.id.ref);
>>>>>>>> + mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>>>> +                                osaf_abort();
>>>>>>>> +                                break;
>>>>>>>> +                            }
>>>>>>>> + mdtm_process_recv_data(&inbuf[5], recd_bytes - 5, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> +                            if (buff_dump) {
>>>>>>>> +                                m_MDS_LOG_ERR
>>>>>>>> + ("RECV_DATA_PROCESS:recvd_on_sock=%zd, Tipc_id=0x%llu, Adest 
>>>>>>>> = <%08x,%u>",
>>>>>>>> +                                     recd_bytes, tipc_id,
>>>>>>>> + m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.
>>>>>>>> + id.node),
>>>>>>>> + client_addr.addr.id.ref);
>>>>>>>> + mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>>>> +                            }
>>>>>>>> +                        } else {
>>>>>>>> + mdtm_process_recv_data(&inbuf[5], recd_bytes - 5, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> +                        }
>>>>>>>> +#else
>>>>>>>> + mdtm_process_recv_data(&inbuf[2], recd_bytes - 2, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> +#endif
>>>>>>>> +                    } else {
>>>>>>>> +                        uint64_t tipc_id;
>>>>>>>> +                        tipc_id = 
>>>>>>>> ((uint64_t)client_addr.addr.id.node) << 32; /* 
>>>>>>>> TIPC_ID=<NODE,REF> */
>>>>>>>> +                        tipc_id |= client_addr.addr.id.ref;
>>>>>>>> +
>>>>>>>> +                        /* Log message that we are dropping 
>>>>>>>> the data */
>>>>>>>> +                        m_MDS_LOG_ERR
>>>>>>>> + ("LEN-MISMATCH:recvd_on_sock=%zd, size_in_mds_hdr=%d, 
>>>>>>>> Tipc_id= %"PRIu64", Adest = <%08x,%u>",
>>>>>>>> +                             recd_bytes, recd_buf_len, tipc_id,
>>>>>>>> + 
>>>>>>>> m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.id.node),
>>>>>>>> + client_addr.addr.id.ref);
>>>>>>>> +                        mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>>>> +                    }
>>>>>>>> +                    if (++recv_ctr > MAX_RECV_THRESHOLD) {
>>>>>>>> +                        break;
>>>>>>>> +                    }
>>>>>>>>                   }
>>>>>>>> -                data = inbuf;
>>>>>>>> -
>>>>>>>> -                recd_buf_len = ncs_decode_16bit(&data);
>>>>>>>> -
>>>>>>>> -                if (pfd[1].revents & POLLERR) {
>>>>>>>> -                    m_MDS_LOG_ERR("MDTM: Error 
>>>>>>>> Recd:tipc_id=<0x%08x:%u>:errno=0x%08x",
>>>>>>>> - client_addr.addr.id.node, client_addr.addr.id.ref, errno);
>>>>>>>> -                } else if (recd_buf_len == recd_bytes) {
>>>>>>>> -                    uint64_t tipc_id = 0;
>>>>>>>> -                    uint32_t buff_dump = 0;
>>>>>>>> -                    tipc_id = 
>>>>>>>> ((uint64_t)client_addr.addr.id.node) << 32; /* 
>>>>>>>> TIPC_ID=<NODE,REF> */
>>>>>>>> -                    tipc_id |= client_addr.addr.id.ref;
>>>>>>>> -
>>>>>>>> -#ifdef MDS_CHECKSUM_ENABLE_FLAG
>>>>>>>> -                    if (inbuf[2] == 1) {
>>>>>>>> -                        old_checksum = ((uint16_t)inbuf[3] << 
>>>>>>>> 8 | inbuf[4]);
>>>>>>>> -                        inbuf[3] = 0;
>>>>>>>> -                        inbuf[4] = 0;
>>>>>>>> -                        new_checksum = 
>>>>>>>> mds_checksum(recd_bytes, inbuf);
>>>>>>>> -
>>>>>>>> -                        if (old_checksum != new_checksum) {
>>>>>>>> -                            m_MDS_LOG_ERR
>>>>>>>> - ("CHECKSUM-MISMATCH:recvd_on_sock=%d, Tipc_id=0x%llu, Adest = 
>>>>>>>> <%08x,%u>",
>>>>>>>> -                                 recd_bytes, tipc_id,
>>>>>>>> - m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.
>>>>>>>> - id.node),
>>>>>>>> - client_addr.addr.id.ref);
>>>>>>>> -                            mds_buff_dump(inbuf, recd_bytes, 
>>>>>>>> 100);
>>>>>>>> -                            abort();
>>>>>>>> - osaf_mutex_unlock_ordie(&gl_mds_library_mutex);
>>>>>>>> -                            continue;
>>>>>>>> -                        }
>>>>>>>> - mdtm_process_recv_data(&inbuf[5], recd_bytes - 5, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> -                        if (buff_dump) {
>>>>>>>> -                            m_MDS_LOG_ERR
>>>>>>>> - ("RECV_DATA_PROCESS:recvd_on_sock=%d, Tipc_id=0x%llu, Adest = 
>>>>>>>> <%08x,%u>",
>>>>>>>> -                                 recd_bytes, tipc_id,
>>>>>>>> - m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.
>>>>>>>> - id.node),
>>>>>>>> - client_addr.addr.id.ref);
>>>>>>>> -                            mds_buff_dump(inbuf, recd_bytes, 
>>>>>>>> 100);
>>>>>>>> -                        }
>>>>>>>> -                    } else {
>>>>>>>> - mdtm_process_recv_data(&inbuf[5], recd_bytes - 5, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> -                    }
>>>>>>>> -#else
>>>>>>>> - mdtm_process_recv_data(&inbuf[2], recd_bytes - 2, tipc_id, 
>>>>>>>> &buff_dump);
>>>>>>>> -#endif
>>>>>>>> -                } else {
>>>>>>>> -                    uint64_t tipc_id;
>>>>>>>> -                    tipc_id = 
>>>>>>>> ((uint64_t)client_addr.addr.id.node) << 32; /* 
>>>>>>>> TIPC_ID=<NODE,REF> */
>>>>>>>> -                    tipc_id |= client_addr.addr.id.ref;
>>>>>>>> -
>>>>>>>> -                    /* Log message that we are dropping the 
>>>>>>>> data */
>>>>>>>> -                    m_MDS_LOG_ERR
>>>>>>>> - ("LEN-MISMATCH:recvd_on_sock=%d, size_in_mds_hdr=%d, Tipc_id= 
>>>>>>>> %"PRIu64", Adest = <%08x,%u>",
>>>>>>>> -                         recd_bytes, recd_buf_len, tipc_id,
>>>>>>>> - 
>>>>>>>> m_MDS_GET_NCS_NODE_ID_FROM_TIPC_NODE_ID(client_addr.addr.id.node),
>>>>>>>> -                         client_addr.addr.id.ref);
>>>>>>>> -                    mds_buff_dump(inbuf, recd_bytes, 100);
>>>>>>>> -                }
>>>>>>>> -            } else if (pfd[2].revents == POLLIN) {
>>>>>>>> +            }
>>>>>>>> +            if (pfd[FD_TMRFD].revents == POLLIN) {
>>>>>>>>                   m_MDS_LOG_INFO("MDTM: Processing Timer 
>>>>>>>> mailbox events\n");
>>>>>>>>                     /* Check if destroy-event has been 
>>>>>>>> processed */
>>>>>>>> @@ -1396,7 +1439,7 @@ uint32_t mds_mdtm_svc_install_tipc(PW_EN
>>>>>>>>             if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, 
>>>>>>>> TIPC_IMPORTANCE, &imp, sizeof(imp)) != 0) {
>>>>>>>>               m_MDS_LOG_ERR("MDTM: Can't set socket option 
>>>>>>>> TIPC_IMP err :%s\n", strerror(errno));
>>>>>>>> -            assert(0);
>>>>>>>> +            osafassert(0);
>>>>>>>>           } else {
>>>>>>>>               m_MDS_LOG_INFO("MDTM: Successfully set socket 
>>>>>>>> option TIPC_IMP, svc_id = %s(%d)",
>>>>>>>>                get_svc_names(svc_id), svc_id);
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
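
For readers following the thread, the bounded receive loop in the quoted patch can be illustrated in isolation. The sketch below is not the MDS code: drain_bounded() and its max_msgs parameter are hypothetical names, plain recv() stands in for recvfrom_connectionless(), and the per-message processing is left as a comment. It only shows the shape discussed above: read with MSG_DONTWAIT until the socket is empty, and give up after a threshold so that other descriptors get serviced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of MDS: drains pending datagrams from fd
 * without blocking. Stops when recv() reports no data (or any other error),
 * on a zero-length read, or once the counter exceeds max_msgs.
 * Returns the number of datagrams read in this pass. */
static int drain_bounded(int fd, unsigned char *buf, size_t buflen, int max_msgs)
{
    int handled = 0;

    assert(buf != NULL && max_msgs >= 0);

    while (true) {
        ssize_t n = recv(fd, buf, buflen, MSG_DONTWAIT);
        if (n == -1) {
            /* Typically EAGAIN/EWOULDBLOCK: nothing left to read. */
            break;
        }
        if (n == 0) {
            /* Zero-length read: treated as abnormal and ends the pass. */
            break;
        }

        /* A real caller would decode and process buf[0..n-1] here. */

        /* Same form as the patch's check: with "++ctr > max" the pass
         * handles at most max_msgs + 1 datagrams before breaking. */
        if (++handled > max_msgs) {
            break;
        }
    }
    return handled;
}
```

A caller would invoke this once per POLLIN wake-up and then return to poll(); any datagrams left behind when the threshold is hit simply cause POLLIN to be reported again on the next poll() call.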


_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
