Hi HansN,

Do you have any tips on how to create an overload case?
I would like to test and observe the TIPC_DEST_DROPPABLE enabled and disabled cases.

-AVM

On 9/1/2016 9:12 AM, A V Mahesh wrote:
> Hi HansN,
>
> Sorry for the delay.
>
> I will test it and get back to you soon.
>
> -AVM
>
>
> On 8/31/2016 4:29 PM, Hans Nordebäck wrote:
>> Hi Mahesh,
>> Any updates on this?
>>
>> /Regards HansN
>>
>> -----Original Message-----
>> From: Anders Widell
>> Sent: 25 August 2016 13:11
>> To: A V Mahesh <mahesh.va...@oracle.com>; Hans Nordebäck
>> <hans.nordeb...@ericsson.com>; mathi.naic...@oracle.com
>> Cc: opensaf-devel@lists.sourceforge.net
>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>
>> Hi!
>>
>> This is what the TIPC user documentation says about TIPC_DEST_DROPPABLE:
>> "This option governs the handling of messages sent by the socket if
>> the message cannot be delivered to its destination, either because
>> the receiver is congested or because the specified receiver does not
>> exist.
>> If enabled, the message is discarded; otherwise the message is
>> returned to the sender."
>>
>> This is what the TIPC user documentation says about the return value
>> from the recvmsg() system call: "When used with a connectionless
>> socket, a return value of 0 indicates the arrival of a returned data
>> message that was originally sent by this socket."
>>
>> I think the documentation is pretty clear. If you set
>> TIPC_DEST_DROPPABLE to true, the receiver can discard messages e.g.
>> when the receive buffer is full. The sender will not be notified in
>> this case. If TIPC_DEST_DROPPABLE is set to false, the message will
>> be returned to the sender in case of a full receive buffer. The
>> sender knows that it has received such a returned message when the
>> recvmsg() call returns zero.
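As a rough sketch of the check described above (not the MDS code): the helper below walks the ancillary data that recvmsg() fills in and reports whether the message is one of our own that TIPC returned, and whether the reason was TIPC_ERR_OVERLOAD. The names rx_kind and classify_rx are made up for illustration. The sketch assumes <linux/tipc.h> is available and that TIPC_ERRINFO carries two 32-bit words (the error code, then the length of the returned data).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/* Hypothetical result type for illustration; not part of TIPC or MDS. */
enum rx_kind {
    RX_NORMAL,            /* no TIPC_ERRINFO object: ordinary inbound data */
    RX_RETURNED_OVERLOAD, /* message came back: receiver was congested */
    RX_RETURNED_OTHER     /* message came back for another TIPC error */
};

/*
 * Inspect the ancillary data of a msghdr already filled in by recvmsg().
 * If a TIPC_ERRINFO object is present, store its error code in *err_out
 * (when err_out is not NULL) and say whether it was TIPC_ERR_OVERLOAD.
 */
static enum rx_kind classify_rx(struct msghdr *msg, uint32_t *err_out)
{
    struct cmsghdr *anc;
    uint32_t info[2]; /* assumed layout: [0] = error code, [1] = data length */

    for (anc = CMSG_FIRSTHDR(msg); anc != NULL; anc = CMSG_NXTHDR(msg, anc)) {
        if (anc->cmsg_level != SOL_TIPC || anc->cmsg_type != TIPC_ERRINFO)
            continue;
        if (anc->cmsg_len < CMSG_LEN(sizeof(info)))
            continue; /* too short to hold both words, skip it */

        /* memcpy avoids relying on the alignment of CMSG_DATA() */
        memcpy(info, CMSG_DATA(anc), sizeof(info));
        if (err_out != NULL)
            *err_out = info[0];
        return info[0] == TIPC_ERR_OVERLOAD ? RX_RETURNED_OVERLOAD
                                            : RX_RETURNED_OTHER;
    }
    return RX_NORMAL;
}
```

A caller would run this right after recvmsg() on the connectionless socket, with msg_control pointing at a buffer large enough for the TIPC ancillary objects, and could then log or retry instead of aborting.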
>>
>> regards,
>> Anders Widell
>>
>> On 08/25/2016 11:30 AM, A V Mahesh wrote:
>>> Hi HansN,
>>>
>>>
>>> On 8/23/2016 5:22 PM, Hans Nordebäck wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> Yes, this is my understanding too: if TIPC_DROPPABLE = true, tipc may
>>>> drop messages silently when the receive sock buffer is full, and
>>>> does not return any ancillary message.
>>>> If TIPC_DROPPABLE = false, tipc may drop the message but will send an
>>>> ancillary message to inform about TIPC_ERR_OVERLOAD.
>>> [AVM]
>>>
>>> My observation and understanding are different. Based on the TIPC code
>>> and the Linux TIPC 2.0 Programmer's Guide, the TIPC_ERR_OVERLOAD error
>>> is returned when TIPC is unable to enqueue an incoming message on the
>>> receiving socket's receive queue, irrespective of whether
>>> TIPC_DEST_DROPPABLE is enabled or disabled.
>>>
>>> The only difference between TIPC_DEST_DROPPABLE enabled and disabled
>>> is this: if TIPC_DEST_DROPPABLE is enabled, the message is discarded,
>>> the size returned by recvmsg() is ZERO and the application gets errors;
>>> if TIPC_DEST_DROPPABLE is disabled, the message is returned to the
>>> sender, which means the size returned by recvmsg() is the size of the
>>> data the user sent, and the application gets errors.
>>>
>>> I did check the TIPC code and documentation and I have not found any
>>> evidence that the TIPC_ERR_OVERLOAD error code is sent only if
>>> TIPC_DEST_DROPPABLE = false.
>>>
>>> Even while testing #1227
>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/) my
>>> observation and understanding was that an individual TIPC socket is
>>> only allowed to queue up OVERLOAD_LIMIT_BASE/2 messages of the lowest
>>> importance level before it starts rejecting them.
>>> Once a socket's receive queue length exceeds the maximum limit value,
>>> the receiving socket sends out a reject message with the
>>> TIPC_ERR_OVERLOAD error code with cmsg_type
>>> TIPC_ERRINFO/TIPC_RETDATA, and the TIPC code and the Linux TIPC 2.0
>>> Programmer's Guide confirm the same.
>>>
>>> tipc/socket.c
>>> =======================================================
>>> /* Reject message if there isn't room to queue it */
>>>
>>> recv_q_len = (u32)atomic_read(&tipc_queue_size);
>>> if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>>>     if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>>>         return TIPC_ERR_OVERLOAD;
>>> }
>>> recv_q_len = skb_queue_len(&sk->sk_receive_queue);
>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
>>>     if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
>>>         return TIPC_ERR_OVERLOAD;
>>> }
>>> =======================================================
>>>
>>>
>>> 2.1.17. setsockopt() of TIPC 2.0 Programmer's Guide
>>> =======================================================
>>> TIPC_DEST_DROPPABLE
>>> This option governs the handling of messages sent by the socket if the
>>> message cannot be delivered to its destination, either because the
>>> receiver is congested or because the specified receiver does not
>>> exist. If enabled, the message is discarded; otherwise the message is
>>> returned to the sender.
>>>
>>> By default, this option is disabled for SOCK_SEQPACKET and SOCK_STREAM
>>> socket types, and enabled for SOCK_RDM and SOCK_DGRAM. This
>>> arrangement ensures proper teardown of failed connections when
>>> connection-oriented data transfer is used, without increasing the
>>> complexity of connectionless data transfer.
>>>
>>> TIPC_SRC_DROPPABLE
>>> This option governs the handling of messages sent by the socket if
>>> link congestion occurs. If enabled, the message is discarded;
>>> otherwise the system queues the message for later transmission.
>>> By default, this option is disabled for SOCK_SEQPACKET, SOCK_STREAM,
>>> and SOCK_RDM socket types (resulting in "reliable" data transfer), and
>>> enabled for SOCK_DGRAM (resulting in "unreliable" data transfer).
>>> =======================================================
>>>
>>> Now I will try to create an OVERLOAD case and will update you soon
>>> with my latest observations.
>>>
>>> -AVM
>>>
>>>> Correcting this and adding an abort is not backward compatible, as
>>>> some services already handle flow control in some way; only log when
>>>> packages are dropped.
>>>> Regarding ticket #1960, there are other solutions than introducing
>>>> flow control in MDS, e.g. exposing an option to the service to choose
>>>> connection-oriented or connectionless.
>>>> The problem with dropped messages seems in one case to be related to
>>>> intensive logging by MDS.
>>>>
>>>> /Thanks HansN
>>>> -----Original Message-----
>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>> Sent: 23 August 2016 11:27
>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>
>>>> Hi HansN,
>>>>
>>>> It seems I am missing something; please help me understand.
>>>>
>>>> If I correctly understand your observation:
>>>>
>>>> With the current OpenSAF code (this #1957 patch NOT applied),
>>>> TIPC_DROPPABLE=true by default. When TIPC_ERR_OVERLOAD occurs while
>>>> running OpenSAF with that binary, TIPC does not deliver TIPC_ERRINFO
>>>> or TIPC_RETDATA, and the following code in
>>>> recvfrom_connectionless() is never hit. Is my understanding right?
>>>>
>>>> =============================================================================
>>>>
>>>>
>>>> *if (anc->cmsg_type == TIPC_ERRINFO) {*
>>>>     /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>        data message or a connection termination message so abort */
>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>     *abort();*
>>>> *} else if (anc->cmsg_type == TIPC_RETDATA) {*
>>>>     /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to
>>>>        return rejected messages to the sender )
>>>>        we will hit this when we implement MDS retransmit lost
>>>>        messages abort can be replaced with flow control logic*/
>>>>     for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>         m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>         cptr++;
>>>>     }
>>>>     /* TIPC_RETDATA -The contents of a returned data message so
>>>>        abort */
>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>     *abort();*
>>>> }
>>>>
>>>> =============================================================================
>>>>
>>>>
>>>> -AVM
>>>>
>>>>
>>>> On 8/23/2016 1:08 PM, Hans Nordebäck wrote:
>>>>> Hi Mahesh,
>>>>>
>>>>> Please see response below with [HansN] /Thanks HansN
>>>>>
>>>>> -----Original Message-----
>>>>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>>>>> Sent: 23 August 2016 08:25
>>>>> To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Anders Widell
>>>>> <anders.wid...@ericsson.com>; mathi.naic...@oracle.com
>>>>> Cc: opensaf-devel@lists.sourceforge.net
>>>>> Subject: Re: [PATCH 1 of 1] MDS: Log TIPC dropped messages [#1957]
>>>>>
>>>>> Hi HansN
>>>>>
>>>>> Please see response below with [AVM]
>>>>>
>>>>> -AVM
>>>>>
>>>>> On 8/23/2016 11:41 AM, Hans Nordebäck wrote:
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> please see comments below.
>>>>>>
>>>>>> /Thanks HansN
>>>>>>
>>>>>>
>>>>>> On 08/23/2016 07:21 AM, A V Mahesh wrote:
>>>>>>> Hi HansN,
>>>>>>>
>>>>>>> Let us first discuss the error handling and abort; then we can
>>>>>>> come back to the interpretation of whether TIPC currently does or
>>>>>>> does not permit an application to send a multicast message with
>>>>>>> the "destination droppable" setting disabled.
>>>>>>>
>>>>>>> Let us disable TIPC_DEST_DROPPABLE, so that TIPC will try to
>>>>>>> return an undelivered multicast message to its sender and we can
>>>>>>> determine whether the issue is caused by TIPC_ERR_OVERLOAD. This
>>>>>>> helps in debugging, so that the application may increase
>>>>>>> SO_SNDBUF/SO_RCVBUF to reduce the problem.
>>>>>>>
>>>>>>> But we still need to abort(). The reason is that the current MDS
>>>>>>> implementation doesn't have flow control logic (no retry on
>>>>>>> error), so an application like AMF can go wrong and the cluster
>>>>>>> will go into an unstable/recoverable state.
>>>>>>>
>>>>>> [HansN] In the current implementation messages are dropped silently
>>>>>> and no abort is done.
>>>>> [AVM] I can see abort(); in the current code. Do you mean abort();
>>>>> is not working and the application (amf) is not exiting?
>>>>> [HansN] In case of TIPC_DROPPABLE=true and messages are dropped
>>>>> (TIPC_ERR_OVERLOAD), no abort is performed; e.g. amfd detects this
>>>>> in the msg sanity chk and logs "invalid msg id ..."
>>>>> ============================================================================
>>>>> if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>     /* TIPC_ERRINFO - TIPC error code associated with a returned
>>>>>        data message or a connection termination message so abort */
>>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>     *abort();*
>>>>> } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>     /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC
>>>>>        to return rejected messages to the sender )
>>>>>        we will hit this when we implement MDS retransmit lost
>>>>>        messages abort can be replaced with flow control logic*/
>>>>>     for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>         m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>         cptr++;
>>>>>     }
>>>>>     /* TIPC_RETDATA -The contents of a returned data message so
>>>>>        abort */
>>>>>     m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>     *abort();*
>>>>> }
>>>>> ============================================================================
>>>>>> This patch enables logging
>>>>>> when packages are dropped to help in debugging. I don't agree that
>>>>>> we should also introduce abort, but instead:
>>>>>> 1) Implement a solution to handle dropped packages, ticket #1960
>>>>> [AVM] This is nothing but flow control implementation in MDS, this
>>>>> is future enhancement
>>>>>
>>>>>> 2) Investigate why packages may be dropped, the receiving MDS
>>>>>> thread is a real time thread and should be able to consume a large
>>>>>> amount of incoming messages.
>>>>>> E.g. is the receiving MDS thread "live hanging" due to locks, file
>>>>>> I/O etc?
>>>>>>> This was the reason we haven't gone for it while addressing Ticket
>>>>>>> #1227
>>>>>>> (https://sourceforge.net/p/opensaf/mailman/message/33207717/)
>>>>>>> So currently we don't have any advantage of disabling
>>>>>>> TIPC_DEST_DROPPABLE and not allowing multicast messages.
>>>>>>>
>>>>>>> -AVM
>>>>>>>
>>>>>>>
>>>>>>> On 8/18/2016 2:43 PM, Hans Nordeback wrote:
>>>>>>>>  osaf/libs/core/mds/mds_dt_tipc.c |  32 +++++++++++++++++++++++++-------
>>>>>>>>  1 files changed, 25 insertions(+), 7 deletions(-)
>>>>>>>>
>>>>>>>>
>>>>>>>> diff --git a/osaf/libs/core/mds/mds_dt_tipc.c b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> --- a/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> +++ b/osaf/libs/core/mds/mds_dt_tipc.c
>>>>>>>> @@ -320,6 +320,15 @@ uint32_t mdtm_tipc_init(NODE_ID nodeid,
>>>>>>>>          m_MDS_LOG_INFO("MDTM: Successfully set default socket option TIPC_IMP = %d", TIPCIMPORTANCE);
>>>>>>>>      }
>>>>>>>> +    int droppable = 0;
>>>>>>>> +    if (setsockopt(tipc_cb.BSRsock, SOL_TIPC, TIPC_DEST_DROPPABLE, &droppable, sizeof(droppable)) != 0) {
>>>>>>>> +        LOG_ER("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>> +        m_MDS_LOG_ERR("MDTM: Can't set TIPC_DEST_DROPPABLE to zero err :%s\n", strerror(errno));
>>>>>>>> +        osafassert(0);
>>>>>>>> +    } else {
>>>>>>>> +        m_MDS_LOG_NOTIFY("MDTM: Successfully set TIPC_DEST_DROPPABLE to zero");
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>      return NCSCC_RC_SUCCESS;
>>>>>>>>  }
>>>>>>>> @@ -563,6 +572,8 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>      unsigned char *cptr;
>>>>>>>>      int i;
>>>>>>>>      int has_addr;
>>>>>>>> +    int anc_data[2];
>>>>>>>> +
>>>>>>>>      ssize_t sz;
>>>>>>>>      has_addr = (from != NULL) && (addrlen != NULL);
>>>>>>>> @@ -591,19 +602,26 @@ ssize_t recvfrom_connectionless (int sd,
>>>>>>>>             if the message was sent using a TIPC name or name sequence as the
>>>>>>>>             destination rather than a TIPC port ID So
>>>>>>>>             abort for TIPC_ERRINFO and TIPC_RETDATA*/
>>>>>>>>          if (anc->cmsg_type == TIPC_ERRINFO) {
>>>>>>>> -            /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err :%s", strerror(errno) );
>>>>>>>> -            abort();
>>>>>>>> +            anc_data[0] = *((unsigned int*)(CMSG_DATA(anc) + 0));
>>>>>>>> +            if (anc_data[0] == TIPC_ERR_OVERLOAD) {
>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERR_OVERLOAD");
>>>>>>>> +            } else {
>>>>>>>> +                /* TIPC_ERRINFO - TIPC error code associated with a returned data message or a connection termination message so abort */
>>>>>>>> +                LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_ERRINFO abort err : %d", anc_data[0]);
>>>>>>>> +            }
>>>>>>>>          } else if (anc->cmsg_type == TIPC_RETDATA) {
>>>>>>>> -            /* If we set TIPC_DEST_DROPPABLE off messge (configure TIPC to return rejected messages to the sender )
>>>>>>>> +            /* If we set TIPC_DEST_DROPPABLE off message (configure TIPC to return rejected messages to the sender )
>>>>>>>>                 we will hit this when we implement MDS retransmit lost messages abort can be replaced with flow control logic*/
>>>>>>>>              for (i = anc->cmsg_len - sizeof(*anc); i > 0; i--) {
>>>>>>>> -                m_MDS_LOG_DBG("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>> +                LOG_CR("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>> +                m_MDS_LOG_CRITICAL("MDTM: returned byte 0x%02x\n", *cptr);
>>>>>>>>                  cptr++;
>>>>>>>>              }
>>>>>>>>              /* TIPC_RETDATA -The contents of a returned data message so abort */
>>>>>>>> -            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA abort err :%s", strerror(errno) );
>>>>>>>> -            abort();
>>>>>>>> +            LOG_CR("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>> +            m_MDS_LOG_CRITICAL("MDTM: undelivered message condition ancillary data: TIPC_RETDATA");
>>>>>>>>          } else if (anc->cmsg_type == TIPC_DESTNAME) {
>>>>>>>>              if (sz == 0) {
>>>>>>>>                  m_MDS_LOG_DBG("MDTM: recd bytes=0 on received on sock, abnormal/unknown condition. Ignoring");
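For trying the TIPC_DEST_DROPPABLE option outside OpenSAF, a small probe like the following may help. It is only a sketch: probe_dest_droppable is a hypothetical helper, it uses a throwaway SOCK_RDM socket rather than tipc_cb.BSRsock, and it simply reports failure if AF_TIPC sockets are not available (for example when the tipc kernel module is not loaded).

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/tipc.h>

/*
 * Open a connectionless TIPC socket, set TIPC_DEST_DROPPABLE to `droppable`
 * (0 = return undeliverable messages to the sender, 1 = discard them) and
 * read the option back into *readback.
 * Returns 0 on success, -1 with errno set if any step fails.
 */
static int probe_dest_droppable(int droppable, int *readback)
{
    socklen_t len = sizeof(*readback);
    int rc = -1;
    int saved_errno;
    int sd;

    assert(readback != NULL);

    sd = socket(AF_TIPC, SOCK_RDM, 0);
    if (sd < 0)
        return -1; /* e.g. EAFNOSUPPORT when TIPC is not available */

    if (setsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE,
                   &droppable, sizeof(droppable)) == 0 &&
        getsockopt(sd, SOL_TIPC, TIPC_DEST_DROPPABLE, readback, &len) == 0)
        rc = 0;

    saved_errno = errno; /* keep the errno of the failing call across close() */
    close(sd);
    errno = saved_errno;
    return rc;
}
```

Calling probe_dest_droppable(0, &value) and then checking value mirrors what the patch does in mdtm_tipc_init(), but on a socket that is closed again immediately, so it can be run on a test node without touching MDS.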