Hi Mahesh,

Yes, this is my understanding too. If TIPC_DROPPABLE = true, TIPC may drop
messages silently when the receiving socket buffer is full, and it does not
return any ancillary message.
If TIPC_DROPPABLE = false, TIPC may still drop the message, but it sends an
ancillary message to inform the sender about TIPC_ERR_OVERLOAD.
Correcting this and adding an abort is not backward compatible, as some services
already handle flow control in some way, so we should only log when messages
are dropped.
Regarding ticket #1960, there are other solutions than introducing flow control
in MDS, e.g. exposing an option that lets the service choose between
connection-oriented and connectionless transport.
In one case, the problem with dropped messages seems to be related to intensive
MDS logging (done by MDS itself).
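
To make the two cases concrete, here is a rough sketch (not the MDS code) of
how a receiver could tell an overload report apart from other returned-message
errors by walking the ancillary data of an already received message. The names
classify_tipc_ancillary(), set_dest_droppable() and enum drop_info are made up
for this example. It assumes, as the TIPC_ERRINFO handling in the patch does,
that the first 32-bit word of the TIPC_ERRINFO data holds the TIPC error code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical result type for this sketch; it is not an MDS type. */
enum drop_info { DROP_NONE = 0, DROP_OVERLOAD, DROP_OTHER_ERROR };

/* Ask TIPC to return (droppable == 0) or silently discard (droppable != 0)
 * messages that cannot be delivered. Returns the setsockopt() result. */
int set_dest_droppable(int sd, int droppable)
{
	return setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
			  &droppable, sizeof(droppable));
}

/* Walk the control messages of a msghdr filled in by recvmsg() and report
 * whether TIPC attached an error code. If err_out is non-NULL it receives
 * the raw TIPC error code. Assumption: the error code is the first 32-bit
 * word of the TIPC_ERRINFO ancillary data. */
enum drop_info classify_tipc_ancillary(struct msghdr *msg, uint32_t *err_out)
{
	struct cmsghdr *anc;

	for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
		uint32_t err;

		if (anc->cmsg_level != SOL_TIPC || anc->cmsg_type != TIPC_ERRINFO)
			continue;
		if (anc->cmsg_len < CMSG_LEN(sizeof(err)))
			continue; /* too short to carry an error code */

		/* memcpy avoids unaligned access into the control buffer */
		memcpy(&err, CMSG_DATA(anc), sizeof(err));
		if (err_out != NULL)
			*err_out = err;
		return (err == TIPC_ERR_OVERLOAD) ? DROP_OVERLOAD : DROP_OTHER_ERROR;
	}
	return DROP_NONE; /* no TIPC_ERRINFO: nothing was reported */
}
```

With TIPC_DEST_DROPPABLE left at true no TIPC_ERRINFO is expected to arrive, so
a check like this would always give DROP_NONE. Only with the option set to
false can the overload case be logged, which is what the patch does.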

/Thanks HansN
-----Original Message-----
From: A V Mahesh [mailto:mahesh.va...@oracle.com] 
Sent: den 23 augusti 2016 11:27
To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
<anders.wid...@ericsson.com>; mathi.naic...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]

Hi HansN,

It seems I am missing something; please allow me to understand.

If I understand your observation correctly:

With the current OpenSAF code (this #1957 patch NOT applied),
TIPC_DROPPABLE=true by default. When running OpenSAF with that binary and
TIPC_ERR_OVERLOAD occurs, TIPC does not deliver TIPC_ERRINFO or
TIPC_RETDATA, and the following code in the function
recvfrom_connectionless() is not hit. Is my understanding right?

=============================================================================================================

if (anc->cmsg_type == TIPC_ERRINFO) {
     /* TIPC_ERRINFO - TIPC error code associated with a returned data message
        or a connection termination message  so abort */
     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
     abort();
} else if (anc->cmsg_type == TIPC_RETDATA) {
     /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return
        rejected messages to the sender )
        we will hit this when we implement MDS retransmit lost messages abort
        can be replaced with flow control logic*/
     for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
         m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
         cptr++;
     }
     /* TIPC_RETDATA -The contents of a returned data message  so abort */
     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
     abort();
}

=============================================================================================================

-AVM


On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
> Hi Mahesh,
>
> Please see response below with [HansN] /Thanks HansN
>
> -----Original Message-----
> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
> Sent: den 23 augusti 2016 08:25
> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell 
> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
> Cc: opensaf-devel@lists.sourceforge.net
> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>
> Hi HansN
>
> Please see response below with [AVM]
>
> -AVM
>
> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>> Hi Mahesh,
>>
>> please see comments below.
>>
>> /Thanks HansN
>>
>>
>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>> Let us first discuss the error handling and abort; then we can come
>>> back to the interpretation of whether TIPC currently does or does not
>>> permit an application to send a multicast message with the
>>> "destination droppable" setting disabled.
>>>
>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return
>>> an undelivered multicast message to its sender and we can determine
>>> whether the issue is caused by TIPC_ERR_OVERLOAD. This helps in
>>> debugging, so that the application may increase SO_SNDBUF/SO_RCVBUF
>>> to reduce the problem.
>>>
>>> But we still need to abort(). The reason is that the current MDS
>>> implementation doesn't have flow control logic (no retry on
>>> error), so an application like AMF can go wrong and the cluster will
>>> go into an unstable/unrecoverable state.
>>>
>> [HansN] In the current implementation messages are dropped silently 
>> and no abort is done.
> [AVM]  I can see abort(); in the current code. Do you mean abort(); is not
> working and the application (amf) is not exiting?
> [HansN] In the case of TIPC_DROPPABLE=true, when messages are dropped
> (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd detects this in the
> msg sanity check and logs "invalid msg id ..."
> ============================================================================
> if (anc->cmsg_type == TIPC_ERRINFO) {
>       /* TIPC_ERRINFO - TIPC error code associated with a returned data
>          message or a connection termination message  so abort */
>       m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>       abort();
> } else if (anc->cmsg_type == TIPC_RETDATA) {
>       /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return
>          rejected messages to the sender )
>          we will hit this when we implement MDS retransmit lost messages
>          abort can be replaced with flow control logic*/
>       for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>           m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>           cptr++;
>       }
>       /* TIPC_RETDATA -The contents of a returned data message  so abort */
>       m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>       abort();
> }
> ============================================================================
>> This patch enables logging
>> when packages are dropped to help in debugging. I don't agree that we 
>> should also introduce abort, but instead:
>> 1) Implement a solution to handle dropped packages, ticket #1960
> [AVM]  This is nothing but a flow control implementation in MDS; this is a
> future enhancement.
>
>> 2) Investigate why packages may be dropped, the receiving MDS thread 
>> is a real time thread and should be able to consume a large amount of 
>> incoming messages.
>> E.g. is the receiving MDS thread "live hanging" due to locks, file 
>> I/O etc?
>>> This was the reason we haven't gone for it while addressing Ticket
>>> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>> So currently we don't have any advantage of disabling 
>>> TIPC_DEST_DROPPABLE and not allowing multicast  messages.
>>>
>>> -AVM
>>>
>>>
>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>    osaf/libs/core/mds/mds_dt_tipc.c |  32
>>>> +++++++++++++++++++++++++-------
>>>>    1 files changed, 25 insertions(+), 7 deletions(-)
>>>>
>>>>
>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c
>>>> b/osaf/libs/core/mds/mds_dt_tipc.c
>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>                    m_MDS_LOG_INFO("MDTM: Successfully set default 
>>>> socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>            }
>>>>    +        int droppable = 0;
>>>> +        if (setsockopt(tipc_cb.BSRsock, SOL_TIPC,
>>>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>> +                LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to 
>>>> + zero
>>>> err :%s\n", strerror(errno));
>>>> +                m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE
>>>> to zero err :%s\n", strerror(errno));
>>>> +                osafassert(0);
>>>> +        } else {
>>>> +                m_MDS_LOG_NOTIFY("MDTM: Successfully set
>>>> TIPC_DEST_DROPPABLE to zero");
>>>> +        }
>>>> +
>>>>        return NCSCC_RC_SUCCESS;
>>>>    }
>>>>    @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>        unsigned char *cptr;
>>>>        int i;
>>>>        int has_addr;
>>>> +    int anc_data[2];
>>>> +
>>>>        ssize_t sz;
>>>>          has_addr = (from != NULL) && (addrlen != NULL);
>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>                   if the message was sent using a TIPC name or name 
>>>> sequence as the
>>>>                   destination rather than a TIPC port ID So abort 
>>>> for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>                if (anc->cmsg_type == TIPC_ERRINFO) {
>>>> -                /* TIPC_ERRINFO - TIPC error code associated with a
>>>> returned data message or a connection termination message  so abort */
>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>> condition ancillary data: TIPC_ERRINFO abort err :%s",
>>>> strerror(errno) );
>>>> -                abort();
>>>> +                anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>> +                if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>> +                    LOG_CR("MDTM: undelivered message condition
>>>> ancillary data: TIPC_ERR_OVERLOAD");
>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>> condition ancillary data: TIPC_ERR_OVERLOAD");
>>>> +                } else {
>>>> +                    /* TIPC_ERRINFO - TIPC error code associated
>>>> with a returned data message or a connection termination message  
>>>> so abort */
>>>> +                    LOG_CR("MDTM: undelivered message condition
>>>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>> +                    m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>> condition ancillary data: TIPC_ERRINFO abort err : %d", 
>>>> anc_data[0]);
>>>> +                }
>>>>                } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>> -                /* If we set TIPC_DEST_DROPPABLE off messge
>>>> (configure TIPC to return rejected messages to the sender )
>>>> +                /* If we set TIPC_DEST_DROPPABLE off message
>>>> (configure TIPC to return rejected messages to the sender )
>>>>                       we will hit this when we implement MDS 
>>>> retransmit lost messages  abort can be replaced with flow control 
>>>> logic*/
>>>>                    for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>> -                    m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n",
>>>> *cptr);
>>>> +                    LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>> +                    m_MDS_LOG_CRITICAL("MDTM: returned byte
>>>> 0x%02x\n", *cptr);
>>>>                        cptr++;
>>>>                    }
>>>>                    /* TIPC_RETDATA -The contents of a returned data 
>>>> message  so abort */
>>>> -                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>> condition ancillary data: TIPC_RETDATA abort err :%s",
>>>> strerror(errno) );
>>>> -                abort();
>>>> +                LOG_CR("MDTM: undelivered message condition
>>>> ancillary data: TIPC_RETDATA");
>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message
>>>> condition ancillary data: TIPC_RETDATA");
>>>>                } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>                    if (sz == 0) {
>>>>                        m_MDS_LOG_DBG("MDTM: recd bytes=0 on 
>>>> received on sock, abnormal/unknown  condition. Ignoring");


