Hi Mahesh, Please see response below with [HansN] /Thanks HansN
-----Original Message----- From: A V Mahesh [mailto:mahesh.va...@oracle.com] Sent: den 23 augusti 2016 08:25 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell <anders.wid...@ericsson.com>; mathi.naic...@oracle.com Cc: opensaf-devel@lists.sourceforge.net Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957] Hi HansN Please see response below with [AVM] -AVM On 8/23/2016 11:41 AM, Hans Nordebäck wrote: > Hi Mahesh, > > please see comments below. > > /Thanks HansN > > > On 08/23/2016 07:21 AM, A V Mahesh wrote: >> Hi HansN, >> >> Let us fist discuss the error handling and abort, then we can come >> back to interpretation of TIPC currently does permit OR does not >> permit an application to send a multicast message with the >> "destination droppable" setting disabled. >> >> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to return >> an undelivered multicast message to its sender and we can determine >> issue is because of TIPC_ERR_OVERLOAD, this helps in debugging , so >> that application may increased SO_SNDBUF/SO_RCVBUF to reduce the >> problem. >> >> But still we need to abort(), the reason for that is current MDS >> implementations doesn't have flow control logic ( no retry because of >> error ) , so Application like AMF can go wrong and cluster will go >> into unstable/recoverble state. >> > [HansN] In the current implementation messages are dropped silently > and no abort is done. [AVM] I can see abort(); in current code , you mean abort(); is not working and application(amf) is not existing ? [HansN] In case of TIPC_DROPPABLE=true and messages are dropped, (TIPC_ERR_OVERLOAD) no abort is be performed, e.g amfd detects this in the msg sanity chk and logs "invalid msg id ..." ============================================================================ if (anc->cmsg_type == TIPC_ERRINFO) { /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) ); *abort();* } else if (anc->cmsg_type == TIPC_RETDATA) { /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender ) we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/ for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr); cptr++; } /* TIPC_RETDATA -The contents of a returned data message so abort */ m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) ); *abort();* } ============================================================================ > This patch enables logging > when packages are dropped to help in debugging. I don't agree that we > should also introduce abort, but instead: > 1) Implement a solution to handle dropped packages, ticket #1960 [AVM] This is nothing but flow control implementation in MDS, this is future enhancement > 2) Investigate why packages may be dropped, the receiving MDS thread > is a real time thread and should be able to consume a large amount of > incoming messages. > E.g. is the receiving MDS thread "live hanging" due to locks, file I/O > etc? >> This was the reason we haven't gone for it while addressing Ticket >> #1227 (https://sourceforge.net/p/opensaf/mailman/message/33207717/) >> So currently we don't have any advantage of disabling >> TIPC_DEST_DROPPABLE and not allowing multicast messages. >> >> -AVM >> >> >> On 8/18/2016 2:43 PM, Hans Nordeback wrote: >>> osaf/libs/core/mds/mds_dt_tipc.c | 32 >>> +++++++++++++++++++++++++------- >>> 1 files changed, 25 insertions(+), 7 deletions(-) >>> >>> >>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c >>> b/osaf/libs/core/mds/mds_dt_tipc.c >>> --- a/osaf/libs/core/mds/mds_dt_tipc.c >>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c >>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid, >>> m_MDS_LOG_INFO("MDTM: Successfully set default >>> socket option TIPC_IMP = %d", TIPCIMPORTANCE); >>> } >>> + int droppable = 0; >>> + if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, >>> TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) { >>> + LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero >>> err :%s\n", strerror(errno)); >>> + m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE >>> to zero err :%s\n", strerror(errno)); >>> + osafassert(0); >>> + } else { >>> + m_MDS_LOG_NOTIFY("MDTM: Successfully set >>> TIPC_DEST_DROPPABLE to zero"); >>> + } >>> + >>> return NCSCC_RC_SUCCESS; >>> } >>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd, >>> unsigned char *cptr; >>> int i; >>> int has_addr; >>> + int anc_data[2]; >>> + >>> ssize_t sz; >>> has_addr = (from != NULL) && (addrlen != NULL); @@ -591,19 >>> +602,26 @@ ssize_t recvfrom_connectionless (int sd, >>> if the message was sent using a TIPC name or name >>> sequence as the >>> destination rather than a TIPC port ID So abort for >>> TIPC_ERRINFO and TIPC_RETDATA*/ >>> if (anc->cmsg_type == TIPC_ERRINFO) { >>> - /* TIPC_ERRINFO - TIPC error code associated with a >>> returned data message or a connection termination message so abort */ >>> - m_MDS_LOG_CRITICAL("MDTM: undelivered message >>> condition ancillary data: TIPC_ERRINFO abort err :%s", >>> strerror(errno) ); >>> - abort(); >>> + anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0)); >>> + if (anc_data[0] == TIPC_ERR_OVERLOAD) { >>> + LOG_CR("MDTM: undelivered message condition >>> ancillary data: TIPC_ERR_OVERLOAD"); >>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>> condition ancillary data: TIPC_ERR_OVERLOAD"); >>> + } else { >>> + /* TIPC_ERRINFO - TIPC error code associated >>> with a returned data message or a connection termination message so >>> abort */ >>> + LOG_CR("MDTM: undelivered message condition >>> ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]); >>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>> condition ancillary data: TIPC_ERRINFO abort err : %d", >>> anc_data[0]); >>> + } >>> } else if (anc->cmsg_type == TIPC_RETDATA) { >>> - /* If we set TIPC_DEST_DROPPABLE off messge >>> (configure TIPC to return rejected messages to the sender ) >>> + /* If we set TIPC_DEST_DROPPABLE off message >>> (configure TIPC to return rejected messages to the sender ) >>> we will hit this when we implement MDS >>> retransmit lost messages abort can be replaced with flow control >>> logic*/ >>> for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) { >>> - m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", >>> *cptr); >>> + LOG_CR("MDTM: returned byte 0x%02x\n", *cptr); >>> + m_MDS_LOG_CRITICAL("MDTM: returned byte >>> 0x%02x\n", *cptr); >>> cptr++; >>> } >>> /* TIPC_RETDATA -The contents of a returned data >>> message so abort */ >>> - m_MDS_LOG_CRITICAL("MDTM: undelivered message >>> condition ancillary data: TIPC_RETDATA abort err :%s", >>> strerror(errno) ); >>> - abort(); >>> + LOG_CR("MDTM: undelivered message condition >>> ancillary data: TIPC_RETDATA"); >>> + m_MDS_LOG_CRITICAL("MDTM: undelivered message >>> condition ancillary data: TIPC_RETDATA"); >>> } else if (anc->cmsg_type == TIPC_DESTNAME) { >>> if (sz == 0) { >>> m_MDS_LOG_DBG("MDTM: recd bytes=0 on received >>> on sock, abnormal/unknown condition. Ignoring"); >> > ------------------------------------------------------------------------------ _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel