> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: 29 October 2013 12:25
> To: Hans Feldt
> Cc: [email protected]
> Subject: Re: assert in MDS (using TCP)
> 
> Hi Hans,
> 
> On 10/28/2013 3:18 PM, Hans Feldt wrote:
> > I have not been able to reproduce it yet. But from reading the code it is
> > obvious that there is a queue of max 200 messages when using MDS/TCP.
> > If that queue gets longer, you get an assert in the library! Such a queue
> > does not exist in the MDS/TIPC case. Could you explain why it is there?
> 
> [AVM]
> 
> - a send() on a non-blocking socket can fail with EAGAIN or EWOULDBLOCK,
> - or with EINTR.
> 
> The idea is to retry later (by queueing) when the above errors occur.

But TCP already has a socket buffer to fill; this queue is just extra space
that adds complexity. I think you should remove it completely, or just return
an error instead of asserting.

An assert means there is a bug in the MDS library, and this is not a bug,
just a temporary overload.
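A minimal sketch of the "return an error instead of assert" alternative, assuming a bounded FIFO with the same 200-message limit as DTM_INTRANODE_UNSENT_MSG. `struct unsent_queue`, `unsent_queue_add` and the RC_* codes are hypothetical stand-ins, not the real MDS API. Unlike the quoted library code, the limit is checked before the counter is incremented, so a rejected message does not leave the count out of step with the list.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for NCSCC_RC_SUCCESS / NCSCC_RC_FAILURE. */
enum { RC_SUCCESS = 1, RC_FAILURE = 2 };

/* Mirrors DTM_INTRANODE_UNSENT_MSG (200) from mds_dt_tcp_disc.h. */
#define UNSENT_QUEUE_MAX 200

struct unsent_msg {
    struct unsent_msg *next;
    size_t len;
    uint8_t data[];             /* copy of the bytes that could not be sent */
};

struct unsent_queue {           /* hypothetical; zero-initialise before use */
    struct unsent_msg *head;
    struct unsent_msg *tail;
    unsigned count;
};

/* Append a copy of buf to the queue. When the queue is full (or malloc
 * fails) the queue is left untouched and RC_FAILURE is returned, so the
 * caller can report a temporary overload instead of the library asserting. */
int unsent_queue_add(struct unsent_queue *q, const void *buf, size_t len)
{
    if (q->count >= UNSENT_QUEUE_MAX)
        return RC_FAILURE;      /* overload, not a library bug */

    struct unsent_msg *m = malloc(sizeof(*m) + len);
    if (m == NULL)
        return RC_FAILURE;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, buf, len);

    if (q->tail == NULL)
        q->head = m;            /* first message in an empty queue */
    else
        q->tail->next = m;
    q->tail = m;
    q->count++;
    return RC_SUCCESS;
}
```

The caller can then log the overload (through the MDS log rather than syslog) and pass the failure up the stack instead of terminating the process.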

> I think the following (currently missing) check should be added to the code
> you pointed out:
> ==============================================================================
>   if ((send_len == -1) || (send_len != bufflen)) {
> 
>         /* Only queue if the socket returns EAGAIN, EWOULDBLOCK or EINTR */
>         if (errno == EWOULDBLOCK || errno == EINTR || errno == EAGAIN) {
>              syslog(LOG_ERR, "MDTM : Send() Failed with err :%s, adding to unsent msg queue", strerror(errno));

I guess you should use the MDS log and not syslog.

>              return mds_mdtm_queue_add_unsent_msg(tcp_buffer, bufflen);
>          }
>          else {
>               syslog(LOG_ERR, "MDTM : Send() Failed with err :%s", strerror(errno));
>               return NCSCC_RC_FAILURE;
>          }
> }
> ==============================================================================
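One detail in the check above: when send() returns a short count (0 <= send_len < bufflen) it has not failed, so errno does not describe that call, and queuing the whole bufflen would resend bytes the peer already has on a stream socket. A minimal sketch that keeps the two cases apart, assuming a non-blocking socket; `classify_send` and `enum send_outcome` are hypothetical names, not part of MDS.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome of one non-blocking send() attempt. */
enum send_outcome {
    SEND_COMPLETE,  /* every byte was handed to the kernel */
    SEND_QUEUE,     /* queue the unsent part and wait for POLLOUT */
    SEND_FATAL      /* real error: report a failure to the caller */
};

/* Classify the result of send(fd, buf, len, 0).
 * 'sent' is send()'s return value and 'err' is errno captured right after
 * the call; err is only consulted when sent == -1. On SEND_QUEUE,
 * *unsent_offset tells the caller where the unsent bytes start. */
enum send_outcome classify_send(ssize_t sent, size_t len, int err,
                                size_t *unsent_offset)
{
    if (sent >= 0) {
        if ((size_t)sent == len)
            return SEND_COMPLETE;
        *unsent_offset = (size_t)sent;  /* short write: queue only the tail */
        return SEND_QUEUE;
    }
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) {
        *unsent_offset = 0;             /* nothing sent: queue the whole buffer */
        return SEND_QUEUE;
    }
    return SEND_FATAL;
}
```

With SEND_QUEUE the caller would queue `bufflen - unsent_offset` bytes starting at `tcp_buffer + unsent_offset`; with SEND_FATAL it would log and return a failure code.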
> 
> 
> >
> > To me this is a bug and should be fixed. MDS could, for example, return
> > an error code instead of asserting.
> > Do you agree?
> [AVM]
> If you think the above error situations are not that rare,
> then we could perhaps remove the assert.
> 
> 
> -AVM
> >
> > Thanks,
> > Hans
> >
> > On 10/15/2013 12:59 PM, A V Mahesh wrote:
> >> Hi Hans,
> >>
> >> Assuming you tried this in a UML environment, can you share the
> >> backtrace?
> >> BTW, I am not able to reproduce the problem even when writing a
> >> burst of a larger number of records.
> >> Perhaps LOG is a slow receiver; I will get back to you once you share
> >> the backtrace.
> >>
> >> -AVM
> >>
> >>
> >> but I am not able to reproduce the log server crash.
> >> Can you please suggest the exact logtest test case that writes a burst
> >> of 700 records with a 5 us interval?
> >>
> >> On 10/14/2013 6:52 PM, Hans Feldt wrote:
> >>>
> >>> Hi,
> >>>
> >>> Running the OpenSAF test program "logtest" against the latest OpenSAF
> >>> configured with MDS/TCP crashes the log server on the
> >>> assert in mds_mdtm_queue_add_unsent_msg():
> >>>
> >>>> ++tcp_cb->mdtm_tcp_unsent_counter; /* Increment the counter to keep a tab on the number of messages */
> >>>>     if (tcp_cb->mdtm_tcp_unsent_counter <= DTM_INTRANODE_UNSENT_MSG) {
> >>>>         if (NULL == hdr && NULL == tail) {
> >>>>             tcp_cb->mds_mdtm_msg_unsent_hdr = tmp;
> >>>>             tcp_cb->mds_mdtm_msg_unsent_tail = tmp;
> >>>>         } else {
> >>>>             tail->next = tmp;
> >>>>             tcp_cb->mds_mdtm_msg_unsent_tail = tmp;
> >>>>
> >>>>             /* Change the poll from POLLIN to POLLOUT */
> >>>>             pfd[0].events = pfd[0].events | POLLOUT;
> >>>>         }
> >>>>     } else {
> >>>>         syslog(LOG_ERR, " MDTM unsent message is more!=%d", DTM_INTRANODE_UNSENT_MSG);
> >>>>         assert(0);
> >>>>         return NCSCC_RC_FAILURE;
> >>>>     }
> >>>
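The POLLOUT flag enabled in the quoted code is what lets the queue drain later: once poll() reports the socket writable, the queued messages are retried in order. A minimal sketch of that step, assuming a non-blocking TCP socket and a hypothetical `struct pending` list (not the real MDS structures). The sketch does not free sent messages, so their storage is assumed to be managed by the caller.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical queued message: 'off' counts bytes already written. */
struct pending {
    struct pending *next;
    const char *data;
    size_t len;
    size_t off;
};

/* Call when poll() reports POLLOUT on fd. Writes queued messages in
 * order until the queue is empty or the socket would block again.
 * POLLOUT is cleared from *events only once the queue is empty, so
 * poll() stops waking us up for writability nobody needs.
 * Returns 0 on success or would-block, -1 on a real send() error. */
int drain_on_pollout(int fd, struct pending **head, short *events)
{
    while (*head != NULL) {
        struct pending *m = *head;
        ssize_t n = send(fd, m->data + m->off, m->len - m->off, 0);
        if (n == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                return 0;       /* still congested: keep POLLOUT set */
            return -1;          /* real error: let the caller report it */
        }
        m->off += (size_t)n;
        if (m->off < m->len)
            return 0;           /* short write: socket buffer full again */
        *head = m->next;        /* message fully sent, unlink it */
    }
    *events &= ~POLLOUT;        /* queue empty: wait only for POLLIN */
    return 0;
}
```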
> >>> $ grep DTM_INTRANODE_UNSENT_MSG include/*
> >>> include/mds_dt_tcp_disc.h:#define DTM_INTRANODE_UNSENT_MSG 200
> >>>
> >>> mds_mdtm_unsent_queue_add_send() is the only place where
> >>> mds_mdtm_queue_add_unsent_msg() is called.
> >>>
> >>> mds_mdtm_unsent_queue_add_send() can return an error code, but none of
> >>> its callers check it! I guess it should then return void and
> >>> abort internally.
> >>>
> >>> Can you explain what is going on?
> >>>
> >>> Thanks,
> >>> hans
> >>
> >>
> >>


_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
