Scott,

Thanks for your debugging and analysis, and for not just blindly accepting
my explanation.  It is easy to read what you think should happen (or
intended to happen) into code that you wrote, rather than just accepting
what it actually does.

[EMAIL PROTECTED] wrote on 10/26/2007 11:23:18 AM:
> > However, for #2 this approach can create problems.  The only reason
> > that I have the buffers for partial messages is that we must resend
> > partial messages, otherwise the TCP socket can get corrupted.  However,
> > for messages which we did not send at all, we don't need to keep them
> > around; they can just be "lost" like any other UDP packet.  If we keep
> > them around, we'll end up with an infinitely growing amount of memory
> > when we start reaching overload.
> 
> It doesn't make sense to me that the TCP side would intentionally lose
> data. I understand that the protocol needs to deal with UDP and lost
> packets, but one of the primary reasons for selecting TCP is so that
> data won't be lost. 
Yes, the TCP side should never lose a packet; if it does, that is a bug
that needs to be corrected.
 
> In the case of saving a particular message on a 0-byte write, it's only
> a few bytes more than saving a partial message. Or is it the case that
> several messages would get saved to pending messages in that case (as
> opposed to really only one partially-written message)? I see where that
> could cause a potential issue, but I still don't think that dropping the
> TCP data is the right thing to do.
The problem isn't with the individual packet; it is that every single call
would stick its data into that buffer.  We can only ever have a single
partial message per socket, but we could have thousands of full messages
that would just get stuffed into the buffer, so we really need to
reschedule the calls instead.
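
Roughly, the distinction looks like this (a sketch only; send_or_schedule()
and buffer_remaining() are made-up names, not SIPp's actual functions):

#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

enum SendResult { Sent, Buffered, RetryLater };

/* buffer_remaining() is a hypothetical helper, not a SIPp function. */
void buffer_remaining(int fd, const char *data, size_t len);

SendResult send_or_schedule(int fd, const char *msg, size_t len) {
  ssize_t rc = send(fd, msg, len, MSG_DONTWAIT);
  if (rc == (ssize_t)len) {
    return Sent;                 /* the whole message went out */
  }
  if (rc >= 0) {
    /* Partial write: the TCP stream has already advanced, so the
     * remaining bytes must be kept and sent before anything else goes
     * out on this socket. */
    buffer_remaining(fd, msg + rc, len - rc);
    return Buffered;
  }
  if (errno == EWOULDBLOCK || errno == EAGAIN) {
    /* Nothing was written: do not buffer the whole message; the caller
     * leaves msg_index untouched and the scheduler retries the call. */
    return RetryLater;
  }
  return RetryLater;             /* real error handling elided */
}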

> > Have you observed a non-blocking write return zero?  At least during my
> > testing it would return -1 and then an EWOULDBLOCK (or EAGAIN), which
> > would cause the call to be rescheduled.  Of course, if this lasts
> > forever you eventually need to fail the send, which is what
> > -send_timeout or timeout="XXX" on a <send> attribute is for.
> 
> I said it returns 0, but I meant that it returns -1/EAGAIN (too long
> since I've looked at C/C++ code :-). But the program logic in either
> case (a return of <=0) is the same in  the existing code, and in the
> existing code, it doesn't reschedule the call -- it just drops that data
> request, and the call eventually fails/times out. Which is what you
> implied you expected in the previous paragraph, so I'm confused whether
> you think this is working correctly or not.

> It occurred to me that the intent might be to reschedule sending the
> data, because the send_message() return call is propagated up the stack,
> and the call::run() method says:
> 
> if(send_status == -1) { /* Would Block on TCP */
> 
> But the send_status is 0 when TCP blocked (it comes ultimately from
> enter_congestion(), which always returns 0). But I'm also not sure that
> returning the correct status (-1) would actually reschedule the call;
In the trunk, enter_congestion() does return -1:
/* This socket is congested; mark it as such and add it to the poll files. */
int enter_congestion(struct sipp_socket *socket, int again) {
  socket->ss_congested = true;

  TRACE_MSG((s, "Problem %s on socket %d and poll_idx is %d\n",
        again == EWOULDBLOCK ? "EWOULDBLOCK" : "EAGAIN",
        socket->ss_fd, socket->ss_pollidx));

  pollfiles[socket->ss_pollidx].events |= POLLOUT;

  nb_net_cong++;
  return -1;
}

This probably is what is causing some confusion.
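
For completeness, the POLLOUT that enter_congestion() requests is the other
half of the picture: once poll() reports the socket writable again, the
buffered partial message can be flushed and the congestion flag cleared.
Roughly this pattern (again only a sketch with made-up names, not the
actual trunk code):

#include <poll.h>

struct sipp_socket_sketch {       /* stand-in for struct sipp_socket */
  int  ss_fd;
  int  ss_pollidx;
  bool ss_congested;
};

/* flush_pending() is a hypothetical helper: write out any buffered
 * partial message, returning 0 once everything has been sent. */
int flush_pending(struct sipp_socket_sketch *socket);

void on_pollout(struct sipp_socket_sketch *socket, struct pollfd *pollfiles) {
  if (pollfiles[socket->ss_pollidx].revents & POLLOUT) {
    if (flush_pending(socket) == 0) {      /* partial message fully written */
      socket->ss_congested = false;        /* congestion is over */
      pollfiles[socket->ss_pollidx].events &= ~POLLOUT;
    }
  }
}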

> I
> went down that path, and it didn't seem to work (the message is still
> never resent), and then it occurred to me that simply queuing the unsent
> message as if it were a partial message would accomplish the same goal.
It is done a bit indirectly.

    msg_snd = send_scene(msg_index, &send_status);
    if(send_status == -1 && errno == EWOULDBLOCK) {
      if (incr_cseq) --cseq;
      /* Have we set the timeout yet? */
      if (send_timeout) {
        /* If we have actually timed out. */
          ...
      } else if (scenario[msg_index]->timeout) {
        /* Initialize the send timeout to the per message timeout. */
        send_timeout = clock_tick + scenario[msg_index]->timeout;
      } else if (defl_send_timeout) {
        /* Initialize the send timeout to the global timeout. */
        send_timeout = clock_tick + defl_send_timeout;
      }
      return true; /* No step, nothing done, retry later */

The call returns true, so it stays around, but the msg_index has never 
been advanced.  When the call is run next time, it will try to retransmit 
the message.
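
And if the socket never becomes writable, the send_timeout set above is
what eventually ends it: once clock_tick passes send_timeout, the send is
failed instead of being retried again.  The elided branch boils down to
roughly this shape (a sketch only; the actual trunk code in that branch
may differ, and abort_call() is just a stand-in name):

      if (send_timeout && clock_tick > send_timeout) {
        /* Blocked for longer than -send_timeout (or the per-message
         * timeout="XXX") allows: give up on this send and fail the
         * call rather than retrying again. */
        abort_call();
        return false;
      }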

Can you try your test with the trunk?

Charles
