Hi Charles --

Ah, the code in the trunk is different in this case from the code I had
(which was from the 2.0.1 bundle); in particular, the enter_congestion
handling is different. So now I am seeing the TCP requests resent as
apparently expected, and the 2.0.1-based patch I had isn't needed after
all -- except the part for Solaris that Oliver already put into the
trunk (thanks Oliver!).

Thanks for the help.

-Scott

On Fri, 2007-10-26 at 12:07, Charles P Wright wrote:
> Scott,
> 
> Thanks for your debugging and analysis, and for not just blindly accepting 
> my explanation.  It is easy to read what you think should happen (or what 
> you intended to happen) into code that you wrote, rather than just 
> accepting it.
> 
> [EMAIL PROTECTED] wrote on 10/26/2007 11:23:18 AM:
> > > However, for #2 this approach can create problems.  The only reason
> > > that I have the buffers for partial messages is that we must resend
> > > partial messages, otherwise the TCP socket can get corrupted.
> > > However, for messages which we did not send at all, we don't need to
> > > keep them around; they can just be "lost" like any other UDP packet.
> > > If we keep them around, we'll end up with an infinitely growing
> > > amount of memory when we start reaching overload.
> > 
> > It doesn't make sense to me that the TCP side would intentionally lose
> > data. I understand that the protocol needs to deal with UDP and lost
> > packets, but one of the primary reasons for selecting TCP is so that
> > data won't be lost. 
> Yes, the TCP side should never lose a packet; if it does, it is a bug 
> that needs to be corrected.
>  
> > In the case of saving a particular message on a 0-byte write, it's only
> > a few bytes more than saving a partial message. Or is it the case that
> > several messages would get saved to pending messages in that case (as
> > opposed to really only one partially-written message)? I see where that
> > could cause a potential issue, but I still don't think that dropping the
> > TCP data is the right thing to do.
> The problem isn't with the individual packet; it is that every single 
> call will stick data into that buffer.  We can only ever have a single 
> partial message, but we can have thousands of full messages that would 
> just get stuffed into the buffer, so we really need to schedule the 
> calls.
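> 
> Roughly, the idea looks like this (just a sketch with made-up names like 
> try_send_message() and buffer_remainder(), not the actual trunk code, and 
> assuming the socket is already non-blocking):
> 
> #include <sys/socket.h>
> #include <errno.h>
> 
> /* Hypothetical helper that queues the unsent tail of a message. */
> void buffer_remainder(int fd, const char *data, size_t len);
> 
> /* Returns 0 if the message is fully handled, -1 if the caller should
>    leave msg_index alone and retry the whole message later. */
> int try_send_message(int fd, const char *buf, size_t len) {
>   ssize_t sent = send(fd, buf, len, 0);
>   if (sent == (ssize_t)len) {
>     return 0;                  /* whole message written */
>   }
>   if (sent > 0) {
>     /* Partial write: the tail must be kept and flushed before any
>        other message, or the TCP byte stream gets corrupted. */
>     buffer_remainder(fd, buf + sent, len - (size_t)sent);
>     return 0;
>   }
>   if (sent < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
>     /* Nothing was written: no point buffering thousands of these;
>        just let the call retry on its next run. */
>     return -1;
>   }
>   return -1;                   /* real error; the caller decides */
> }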
> 
> > > Have you observed a non-blocking write return zero?  At least during
> > > my testing it would return -1 and then an EWOULDBLOCK (or EAGAIN),
> > > which would cause the call to be rescheduled.  Of course, if this
> > > lasts forever you eventually need to fail the send, which is what
> > > -send_timeout or timeout="XXX" on a <send> attribute does.
> > 
> > I said it returns 0, but I meant that it returns -1/EAGAIN (too long
> > since I've looked at C/C++ code :-). But the program logic in either
> > case (a return of <=0) is the same in the existing code, and in the
> > existing code, it doesn't reschedule the call -- it just drops that data
> > request, and the call eventually fails/times out. Which is what you
> > implied you expected in the previous paragraph, so I'm confused whether
> > you think this is working correctly or not.
> 
> > It occurred to me that the intent might be to reschedule sending the
> > data, because the send_message() return value is propagated up the stack,
> > and the call::run() method says:
> > 
> > if(send_status == -1) { /* Would Block on TCP */
> > 
> > But the send_status is 0 when TCP blocked (it comes ultimately from
> > enter_congestion(), which always returns 0). But I'm also not sure that
> > returning the correct status (-1) would actually reschedule the call;
> In the trunk the enter_congestion does return -1:
> /* This socket is congested; mark it as such and add it to the poll
>    files. */
> int enter_congestion(struct sipp_socket *socket, int again) {
>   socket->ss_congested = true;
> 
>   TRACE_MSG((s,"Problem %s on socket  %d and poll_idx  is %d \n",
>         again == EWOULDBLOCK ? "EWOULDBLOCK" : "EAGAIN",
>         socket->ss_fd, socket->ss_pollidx));
> 
>   pollfiles[socket->ss_pollidx].events |= POLLOUT;
> 
>   nb_net_cong++;
>   return -1;
> }
> 
> This probably is what is causing some confusion.
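> 
> For completeness, the other half of that mechanism lives in the main 
> poll loop; from memory it is roughly something like this (a paraphrase, 
> not a verbatim copy of the trunk):
> 
> /* sock is the sipp_socket that owns this pollfd entry (however the
>    trunk actually looks it up). */
> if (pollfiles[poll_idx].revents & POLLOUT) {
>   /* Writable again: stop watching POLLOUT and clear the congestion
>      flag so calls blocked on this socket can send on their next run. */
>   pollfiles[poll_idx].events &= ~POLLOUT;
>   sock->ss_congested = false;
> }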
> 
> > I
> > went down that path, and it didn't seem to work (the message is still
> > never resent), and then it occurred to me that simply queuing the unsent
> > message as if it were a partial message would accomplish the same goal.
> It is done a bit indirectly.
> 
>     msg_snd = send_scene(msg_index, &send_status);
>     if(send_status == -1 && errno == EWOULDBLOCK) {
>       if (incr_cseq) --cseq;
>       /* Have we set the timeout yet? */
>       if (send_timeout) {
>         /* If we have actually timed out. */
>           ...
>       } else if (scenario[msg_index]->timeout) {
>         /* Initialize the send timeout to the per message timeout. */
>         send_timeout = clock_tick + scenario[msg_index]->timeout;
>       } else if (defl_send_timeout) {
>         /* Initialize the send timeout to the global timeout. */
>         send_timeout = clock_tick + defl_send_timeout;
>       }
>       return true; /* No step, nothing done, retry later */
> 
> The call returns true, so it stays around, but the msg_index has never 
> been advanced.  When the call is run next time, it will try to retransmit 
> the message.
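> 
> In sketch form (hypothetical helper names, not the literal call::run() 
> from the trunk), the per-run flow amounts to:
> 
> bool run_once(void) {
>   int send_status = 0;
>   send_scene(msg_index, &send_status);
>   if (send_status == -1 && errno == EWOULDBLOCK) {
>     if (send_timeout && clock_tick > send_timeout) {
>       return abort_call();   /* hypothetical: -send_timeout or
>                                 timeout="XXX" has expired */
>     }
>     return true;             /* keep the call; msg_index was NOT
>                                 advanced, so the next run retransmits
>                                 the same message */
>   }
>   msg_index++;               /* sent OK: advance to the next message */
>   return true;
> }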
> 
> Can you try your test with the trunk?
> 
> Charles

