On Tue, 2009-11-10 at 02:43 -0500, Dale Worley wrote:
> XX-6065 is a can of worms:  There are a number of problems that can
> produce the symptom (a phone rings forever), and a number of symptoms
> that can arise from the principal cause (a phone does not respond to an
> INVITE with a 1xx response quickly enough).

This is a great summary of the whole mess (Dale has nicely emptied the
can of worms and sorted them for us by size and type!).

I propose that we divide our response to all this into three separate
work items (issues): one for each of #2, #3, and #4 below.

> Background:  The following sipx-dev threads contain discussion of some
> of the underlying design questions regarding the request resend
> schedule.  The particular message pointed to for each thread is one I
> consider to be particularly relevant:
> 
> http://list.sipfoundry.org/archive/sipx-dev/msg15757.html
> Subject: [sipX-dev] Improving the resend schedule
> 
> 
> http://list.sipfoundry.org/archive/sipxecs-commit/msg20957.html
> Subject: [SFtrack] Issue Comment Edited: (XECS-1589) SipXecs does not
> gracefully handle broken TCP streams
> 
> http://list.sipfoundry.org/archive/sipx-dev/msg15640.html
> Subject: Re: [sipX-dev] Retransmission of messages over TCP
> 
> 
> Component problems/solutions:
> 
> 1.  As far as I know, all SIP phones will ring forever if they receive
> an INVITE but no CANCEL.  This is really silly, as there is no situation
> where a phone should ring more than, say, 15 minutes.  So we should
> suggest to phone vendors that the phone have a "dead-man" expiration of
> 15 minutes or so.

#1 and #5 are not really ours to fix... the best we can do is point them
out to phone vendors and hope for the best.

> 2.  To circumvent (1), *every* INVITE generated by sipXecs should have
> an Expires header.  Fortunately, every transaction in the proxy has an
> expiration time, so we can have the proxy add an Expires header to every
> INVITE that does not already have one without changing the functionality
> in all non-error cases.  To avoid having both ends of the transaction
> cancel it at the same time (causing clutter in traces), the added
> Expires value should be 1 or 2 seconds longer than the expiration in the
> proxy transaction.

#2 seems like a reasonable idea, and might help.  I would make that
extra time more like 5 seconds.  We got into this can of worms because
things are not responding in reasonable times, so let's not ask for a
new race condition.  I think this change has significant potential to
alter the behavior of the system as a whole, so it should be done
separately from everything else, and we should watch for changes
afterward.
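
To make the rule in #2 concrete, here is a minimal sketch in Python.  It
is illustrative only and is not sipXecs code: the function name
add_expires, the dict representation of the headers, and the 5-second
default margin are all assumptions made for the example.  The only
behavior taken from the discussion is "add Expires only when the INVITE
has none, and make it slightly longer than the proxy's own transaction
expiration".

```python
def add_expires(headers, transaction_expires_s, margin_s=5):
    """Return a copy of `headers` with an Expires header added if absent.

    headers               -- dict of header name -> value (hypothetical model)
    transaction_expires_s -- the proxy transaction's own expiration, in seconds
    margin_s              -- extra time so both ends do not cancel at once
                             (assumed default; 1-2 and 5 were both suggested)
    """
    out = dict(headers)
    # SIP header names are case-insensitive, so compare in lower case.
    if any(name.lower() == "expires" for name in out):
        return out  # never override an Expires the caller already set
    out["Expires"] = str(transaction_expires_s + margin_s)
    return out
```

With a 20-second proxy transaction and the assumed 5-second margin, an
INVITE without Expires would go out with Expires: 25, while an INVITE
that already carries the header is left untouched.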

> 3.  If the proxy receives a 1xx response to an INVITE that it has sent,
> but the transaction in the proxy has already (internally) canceled that
> leg, the proxy should send a CANCEL.  Currently, if no 1xx was received
> for the INVITE, the proxy will (correctly) not send a CANCEL, but if a
> 1xx is received subsequently, the proxy will not then send a CANCEL.
> (Kathy E. tells me that at least part of the machinery needed for this
> is present in SipTransaction; it is possible that I am requesting the
> intended behavior, but that there is an outright bug in sipXtack about
> this.)

This seems perfectly safe, and should be as easy to change as anything
in SipTransaction ever is (that is to say, not very).
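
The decision being asked for in #3 can be written down as a small
predicate.  This is a sketch in Python, not the SipTransaction logic:
the function name and its three parameters are hypothetical, and the
real state machine tracks far more than this.

```python
def should_send_cancel(leg_canceled, status_code, cancel_already_sent):
    """Decide whether a response arriving on an INVITE leg calls for a CANCEL.

    leg_canceled        -- the proxy has already (internally) canceled this leg
    status_code         -- status code of the response that just arrived
    cancel_already_sent -- a CANCEL has already gone out for this leg
    """
    is_provisional = 100 <= status_code < 200
    # A CANCEL is only valid once a 1xx has been seen, so a late 1xx on a
    # leg we already gave up on is exactly the moment to send it.
    return leg_canceled and is_provisional and not cancel_already_sent
```

A 180 arriving on a canceled leg with no CANCEL sent yet returns True;
the same 180 on a live leg, or a final response such as 200, returns
False and is handled by the normal response path.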

> 4.  The timeout for resending to a non-responding destination should be
> increased.  The current timeout is 1.5 seconds.  This appears to have
> been chosen only because we thought it was sufficient for the slowest
> networks that we expected sipXecs to be used on.  (See the discussions
> referenced above.)  The new value needs to be discussed, but values up
> to 6 seconds have been suggested.  The constraints are that the value
> should be high enough to avoid incorrectly marking destinations as
> unresponsive, but low enough to not annoy users in HA systems with
> nonresponding components.  (IIRC, the typical HA worst case is 3 times
> the time-out:  once for a proxy, once for a registrar, and once for a
> target application server.)
> 
> Note that this would be done by increasing the number of resend cycles,
> rather than increasing the length of the first resend cycle (the T1
> value), because T1 is set based on the most common case (where RTT < 100
> msec).

This should be a very easy change, and we should do this one first.  I
strongly suspect that this alone will solve the problem that Mark first
reported (which doesn't mean that the other changes above are not also
worthwhile improvements).
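
For a sense of the numbers in #4, here is a rough Python sketch (again
not sipXtack code; all function names are made up for the example).  It
assumes a first resend interval (T1) of 100 msec and that each later
interval doubles the previous one with no cap.  Neither assumption is
confirmed in this thread, although four such cycles do sum to the 1.5
seconds quoted above.

```python
def resend_intervals_ms(t1_ms, cycles):
    """Intervals of a doubling resend schedule: t1, 2*t1, 4*t1, ..."""
    return [t1_ms * (2 ** i) for i in range(cycles)]


def total_timeout_ms(t1_ms, cycles):
    """Time until a destination is declared unresponsive."""
    return sum(resend_intervals_ms(t1_ms, cycles))


def ha_worst_case_ms(t1_ms, cycles, hops=3):
    """Worst case when `hops` components in a row fail to respond.

    hops=3 follows the proxy / registrar / application server case
    mentioned in the quoted text.
    """
    return hops * total_timeout_ms(t1_ms, cycles)
```

Under these assumptions, 4 cycles give 1500 msec, 5 give 3100 msec, and
6 give 6300 msec.  With three unresponsive hops, the worst case grows
from 4.5 seconds at 4 cycles to about 18.9 seconds at 6 cycles, which is
the trade-off the new value has to balance.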

> 5.  If phones take more than 100 msec to respond to an INVITE, they are
> out of specification.  If response times are much more than that, we
> should prod the phone vendors to investigate the problem.  (This is
> really a "soft real-time" problem, and that is a known problem domain.)

See my response to #1 above.

