Hi Mahesh,

Please see response below with [HansN]
/Thanks HansN

-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 23 augusti 2016 08:25
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; mathi.naic...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN

Please see response below with [AVM]

-AVM

On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> please see comments below.
>
> /Thanks HansN
>
>
> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>> Hi HansN,
>>
>> Let us fist discuss the error handling and abort, then we can come 
>> back to interpretation of  TIPC currently  does permit  OR does not 
>> permit an application to send a multicast message with the 
>> "destination droppable" setting disabled.
>>
>> Let us disable TIPC_DEST_DROPPABLE, so that  TIPC will try to return 
>> an undelivered multicast message to its sender and we can  determine 
>> issue is  because of TIPC_ERR_OVERLOAD, this helps in debugging , so 
>> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the 
>> problem.
>>
>> But still we need to abort(), the reason for that is current MDS 
>> implementations doesn't have flow control logic ( no retry because of 
>> error ) , so Application like AMF can go wrong and cluster will go 
>> into unstable/recoverble state.
>>
> [HansN] In the current implementation messages are dropped silently 
> and no abort is done.
[AVM]  I can see  abort(); in current code , you mean abort(); is not working 
and application(amf) is not existing ?
[HansN] In case of TIPC_DROPPABLE=true and messages are dropped, 
(TIPC_ERR_OVERLOAD)  no abort is be performed, e.g 
amfd detects this in the msg sanity chk and logs "invalid msg id ..." 
============================================================================
if (anc->cmsg_type == TIPC_ERRINFO) {
     /* TIPC_ERRINFO - TIPC error code associated with a returned data message 
or a connection termination message  so abort */
     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
data: TIPC_ERRINFO abort err :%s", strerror(errno) );
*abort();*
} else if (anc->cmsg_type == TIPC_RETDATA) {
     /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return 
rejected messages to the sender )
        we will hit this when we implement MDS retransmit lost messages abort 
can be replaced with flow control logic*/
     for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
         m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
         cptr++;
     }
     /* TIPC_RETDATA -The contents of a returned data message  so abort */
     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary
data: TIPC_RETDATA abort err :%s", strerror(errno) );
*abort();*
}
============================================================================
> This patch enables logging
> when packages are dropped to help in debugging. I don't agree that we 
> should also introduce abort, but instead:
> 1) Implement a solution to handle dropped packages, ticket #1960
[AVM]  This is nothing but flow control implementation in MDS, this is future 
enhancement

> 2) Investigate why packages may be dropped, the receiving MDS thread 
> is a real time thread and should be able to consume a large amount of 
> incoming messages.
> E.g. is the receiving MDS thread "live hanging" due to locks, file I/O 
> etc?
>> This was the reason we haven't gone for it while addressing Ticket
>> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>> So currently we don't have any advantage of disabling 
>> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>>
>> -AVM
>>
>>
>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>   osaf/libs/core/mds/mds_dt_tipc.c |  32
>>> +++++++++++++++++++++++++-------
>>>   1 files changed, 25 insertions(+), 7 deletions(-)
>>>
>>>
>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c
>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>                   m_MDS_LOG_INFO("MDTM: Successfully set default 
>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>           }
>>>   +        int droppable = 0;
>>> +        if (setsockopt(tipc_cb.BSRsock, SOL_TIPC,
>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>> +                LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero
>>> err :%s\n", strerror(errno));
>>> +                m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE
>>> to zero err :%s\n", strerror(errno));
>>> +                osafassert(0);
>>> +        } else {
>>> +                m_MDS_LOG_NOTIFY("MDTM: Successfully set
>>> TIPC_DEST_DROPPABLE to zero");
>>> +        }
>>> +
>>>       return NCSCC_RC_SUCCESS;
>>>   }
>>>   @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>       unsigned char *cptr;
>>>       int i;
>>>       int has_addr;
>>> +    int anc_data[2];
>>> +
>>>       ssize_t sz;
>>>         has_addr = (from != NULL) && (addrlen != NULL); @@ -591,19 
>>> +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>                  if the message was sent using a TIPC name or name 
>>> sequence as the
>>>                  destination rather than a TIPC port ID So abort for 
>>> TIPC_ERRINFO and TIPC_RETDATA*/
>>>               if (anc->cmsg_type == TIPC_ERRINFO) {
>>> -                /* TIPC_ERRINFO - TIPC error code associated with a 
>>> returned data message or a connection termination message  so abort */
>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>>> condition ancillary data: TIPC_ERRINFO abort err :%s",
>>> strerror(errno) );
>>> -                abort();
>>> +                anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>> +                if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>> +                    LOG_CR("MDTM: undelivered message condition
>>> ancillary data: TIPC_ERR_OVERLOAD");
>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>> condition ancillary data: TIPC_ERR_OVERLOAD");
>>> +                } else {
>>> +                    /* TIPC_ERRINFO - TIPC error code associated
>>> with a returned data message or a connection termination message  so 
>>> abort */
>>> +                    LOG_CR("MDTM: undelivered message condition
>>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>> condition ancillary data: TIPC_ERRINFO abort err : %d", 
>>> anc_data[0]);
>>> +                }
>>>               } else if (anc->cmsg_type == TIPC_RETDATA) {
>>> -                /* If we set TIPC_DEST_DROPPABLE off messge 
>>> (configure TIPC to return rejected messages to the sender )
>>> +                /* If we set TIPC_DEST_DROPPABLE off message
>>> (configure TIPC to return rejected messages to the sender )
>>>                      we will hit this when we implement MDS 
>>> retransmit lost messages  abort can be replaced with flow control 
>>> logic*/
>>>                   for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>> -                    m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", 
>>> *cptr);
>>> +                    LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>> +                    m_MDS_LOG_CRITICAL("MDTM: returned byte
>>> 0x%02x\n", *cptr);
>>>                       cptr++;
>>>                   }
>>>                   /* TIPC_RETDATA -The contents of a returned data 
>>> message  so abort */
>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message 
>>> condition ancillary data: TIPC_RETDATA abort err :%s",
>>> strerror(errno) );
>>> -                abort();
>>> +                LOG_CR("MDTM: undelivered message condition
>>> ancillary data: TIPC_RETDATA");
>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>> condition ancillary data: TIPC_RETDATA");
>>>               } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>                   if (sz == 0) {
>>>                       m_MDS_LOG_DBG("MDTM: recd bytes=0 on received 
>>> on sock, abnormal/unknown  condition. Ignoring");
>>
>


------------------------------------------------------------------------------
_______________________________________________
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to