Hi Charles --
Ah, the code in the trunk differs here from the code I had (which was
from the 2.0.1 bundle); in particular, the enter_congestion handling is
different. With the trunk I now see the TCP requests resent as
expected, so the 2.0.1-based patch I had isn't needed after all --
except for the Solaris part that Oliver already put into the trunk
(thanks Oliver!).
Thanks for the help.
-Scott
On Fri, 2007-10-26 at 12:07, Charles P Wright wrote:
> Scott,
>
> Thanks for your debugging and analysis, and for not just blindly
> accepting my explanation. It is easy to read what you think should
> happen (or intended to happen) into code that you wrote, rather than
> what it actually does.
>
> [EMAIL PROTECTED] wrote on 10/26/2007 11:23:18 AM:
> > > However, for #2 this approach can create problems. The only reason
> > > that I have the buffers for partial messages is that we must resend
> > > partial messages, otherwise the TCP socket can get corrupted.
> > > However, for messages which we did not send at all, we don't need
> > > to keep them around; they can just be "lost" like any other UDP
> > > packet. If we keep them around, we'll end up with an infinitely
> > > growing amount of memory when we start reaching overload.
> >
> > It doesn't make sense to me that the TCP side would intentionally lose
> > data. I understand that the protocol needs to deal with UDP and lost
> > packets, but one of the primary reasons for selecting TCP is so that
> > data won't be lost.
> Yes, the TCP side should never lose a packet; if it does, that is a
> bug that needs to be corrected.
>
> > In the case of saving a particular message on a 0-byte write, it's only
> > a few bytes more than saving a partial message. Or is it the case that
> > several messages would get saved to pending messages in that case (as
> > opposed to really only one partially-written message)? I see where that
> > could cause a potential issue, but I still don't think that dropping the
> > TCP data is the right thing to do.
> The problem isn't with the individual packet; it is that every single
> call will stick data into that buffer. There can only ever be a single
> partial message, but there can be thousands of full messages that
> would just get stuffed into the buffer, so we really need to schedule
> the calls.
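
To make that distinction concrete, here is a minimal sketch of the two
cases, assuming a non-blocking socket (pending_tail, pending_len, and
send_or_buffer() are illustrative names, not SIPp's actual code):

    /* Sketch only: a partial write leaves the TCP byte stream
     * mid-message, so its unwritten tail must be saved and flushed
     * first -- and at most one such tail can exist per socket.  A
     * write that sent nothing can simply be retried whole later;
     * copying every blocked message into a queue instead would grow
     * without bound under overload. */
    #include <errno.h>
    #include <string.h>
    #include <unistd.h>

    static char   pending_tail[65536]; /* tail of at most ONE message */
    static size_t pending_len = 0;

    /* Returns 0 when fully sent, -1 when the caller must retry later. */
    int send_or_buffer(int fd, const char *msg, size_t len) {
        ssize_t rc = write(fd, msg, len);
        if (rc == (ssize_t)len)
            return 0;                   /* fully sent */
        if (rc > 0) {                   /* partial write */
            memcpy(pending_tail, msg + rc, len - rc);
            pending_len = len - rc;     /* flush before any new send */
            return -1;
        }
        if (rc < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
            return -1;                  /* nothing sent: retry whole message */
        return -1;                      /* rc == 0 (rare) or a real error */
    }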
>
> > > Have you observed a non-blocking write return zero? At least
> > > during my testing it would return -1 and then an EWOULDBLOCK (or
> > > EAGAIN), which would cause the call to be rescheduled. Of course,
> > > if this lasts forever you eventually need to fail the send, which
> > > is what -send_timeout or timeout="XXX" on a <send> attribute does.
> >
> > I said it returns 0, but I meant that it returns -1/EAGAIN (too long
> > since I've looked at C/C++ code :-). But in the existing code the
> > program logic is the same in either case (a return of <= 0): it
> > doesn't reschedule the call -- it just drops that send request, and
> > the call eventually fails/times out. That is what you implied you
> > expected in the previous paragraph, so I'm confused about whether
> > you think this is working correctly or not.
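
For reference, the POSIX convention under discussion, as a small sketch
(classify_write() is a hypothetical helper, not SIPp code):

    /* A non-blocking write() reports "would block" as -1 with errno
     * set to EWOULDBLOCK or EAGAIN; it does not return 0, so a return
     * of <= 0 should not be treated as a single case. */
    #include <errno.h>
    #include <unistd.h>

    /* 1 = some bytes sent, 0 = retry later, -1 = hard error. */
    int classify_write(int fd, const char *buf, size_t len) {
        ssize_t rc = write(fd, buf, len);
        if (rc > 0)
            return 1;  /* some or all bytes went out */
        if (rc < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
            return 0;  /* socket buffer full: reschedule, don't drop */
        return -1;     /* real error (e.g. EPIPE): fail the call */
    }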
>
> > It occurred to me that the intent might be to reschedule sending the
> > data, because the send_message() return value is propagated up the
> > stack, and the call::run() method says:
> >
> > if(send_status == -1) { /* Would Block on TCP */
> >
> > But send_status is 0 when the TCP send blocks (it comes ultimately
> > from enter_congestion(), which always returns 0). I'm also not sure
> > that returning the correct status (-1) would actually reschedule the
> > call;
> In the trunk, enter_congestion() does return -1:
>
> /* This socket is congested; mark it as such and add it to the poll
>    files. */
> int enter_congestion(struct sipp_socket *socket, int again) {
>   socket->ss_congested = true;
>
>   TRACE_MSG((s, "Problem %s on socket %d and poll_idx is %d\n",
>              again == EWOULDBLOCK ? "EWOULDBLOCK" : "EAGAIN",
>              socket->ss_fd, socket->ss_pollidx));
>
>   pollfiles[socket->ss_pollidx].events |= POLLOUT;
>
>   nb_net_cong++;
>   return -1;
> }
>
> This is probably what is causing some confusion.
>
> > I went down that path, and it didn't seem to work (the message is
> > still never resent), and then it occurred to me that simply queuing
> > the unsent message as if it were a partial message would accomplish
> > the same goal.
> It is done a bit indirectly.
>
> msg_snd = send_scene(msg_index, &send_status);
> if (send_status == -1 && errno == EWOULDBLOCK) {
>   if (incr_cseq) --cseq;
>   /* Have we set the timeout yet? */
>   if (send_timeout) {
>     /* If we have actually timed out. */
>     ...
>   } else if (scenario[msg_index]->timeout) {
>     /* Initialize the send timeout to the per-message timeout. */
>     send_timeout = clock_tick + scenario[msg_index]->timeout;
>   } else if (defl_send_timeout) {
>     /* Initialize the send timeout to the global timeout. */
>     send_timeout = clock_tick + defl_send_timeout;
>   }
>   return true; /* No step, nothing done, retry later */
>
> The call returns true, so it stays around, but the msg_index has never
> been advanced. When the call is run next time, it will try to retransmit
> the message.
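
Condensed, the retry path looks roughly like this (sock_t and
run_call() are illustrative stand-ins for sipp_socket and call::run(),
not the trunk's real API):

    /* enter_congestion() set POLLOUT on the socket's pollfd; once
     * poll() reports the socket writable again, congestion is cleared
     * and the call re-runs with its OLD msg_index, which amounts to a
     * retransmission of the blocked message. */
    #include <poll.h>
    #include <stdbool.h>

    typedef struct { int fd; int pollidx; bool congested; } sock_t;

    extern struct pollfd pollfiles[];
    void run_call(void *call); /* re-runs the call's current message */

    void handle_writable(sock_t *s, void *call) {
        if (pollfiles[s->pollidx].revents & POLLOUT) {
            s->congested = false;                 /* undo enter_congestion */
            pollfiles[s->pollidx].events &= ~POLLOUT;
            run_call(call); /* msg_index unchanged => same message resent */
        }
    }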
>
> Can you try your test with the trunk?
>
> Charles