On Tue, 2009-11-10 at 02:43 -0500, Dale Worley wrote:
> XX-6065 is a can of worms: There are a number of problems that can
> produce the symptom (a phone rings forever), and a number of symptoms
> that can arise from the principal cause (a phone does not respond to an
> INVITE with a 1xx response quickly enough).

This is a great summary of the whole mess (Dale has nicely emptied the
can of worms and sorted them for us by size and type!). I propose that
we divide our response to all this into three separate work items
(issues): one for each of #2, #3, and #4 below.

> Background: The following sipx-dev threads contain discussion of some
> of the underlying design questions regarding the request resend
> schedule. The particular message pointed to for each thread is one I
> consider to be particularly relevant:
>
> http://list.sipfoundry.org/archive/sipx-dev/msg15757.html
> Subject: [sipX-dev] Improving the resend schedule
>
> http://list.sipfoundry.org/archive/sipxecs-commit/msg20957.html
> Subject: [SFtrack] Issue Comment Edited: (XECS-1589) SipXecs does not
> gracefully handle broken TCP streams
>
> http://list.sipfoundry.org/archive/sipx-dev/msg15640.html
> Subject: Re: [sipX-dev] Retransmission of messages over TCP
>
> Component problems/solutions:
>
> 1. As far as I know, all SIP phones will ring forever if they receive
> an INVITE but no CANCEL. This is really silly, as there is no situation
> where a phone should ring more than, say, 15 minutes. So we should
> suggest to phone vendors that the phone have a "dead-man" expiration of
> 15 minutes or so.

#1 and #5 are not really ours to fix... the best we can do is point
them out to phone vendors and hope for the best.

> 2. To circumvent (1), *every* INVITE generated by sipXecs should have
> an Expires header. Fortunately, every transaction in the proxy has an
> expiration time, so we can have the proxy add an Expires header to every
> INVITE that does not already have one without changing the functionality
> in all non-error cases. To avoid having both ends of the transaction
> cancel it at the same time (causing clutter in traces), the added
> Expires value should be 1 or 2 seconds longer than the expiration in the
> proxy transaction.

#2 seems like a reasonable idea, and might help.
I would make that extra time more like 5 seconds - we got into this can
of worms because things are not responding in reasonable times: let's
not ask for a new race condition. There is, I think, significant
potential here for changes in the behavior of the system as a whole
with this, so it should be done separately from everything else, and we
should watch for changes afterward.

> 3. If the proxy receives a 1xx response to an INVITE that it has sent,
> but the transaction in the proxy has already (internally) canceled that
> leg, the proxy should send a CANCEL. Currently, if no 1xx was received
> for the INVITE, the proxy will (correctly) not send a CANCEL, but if a
> 1xx is received subsequently, the proxy will not then send a CANCEL.
> (Kathy E. tells me that at least part of the machinery needed for this
> is present in SipTransaction; it is possible that I am requesting the
> intended behavior, but that there is an outright bug in sipXtack about
> this.)

This seems perfectly safe, and should be as easy to change as anything
in SipTransaction ever is (that is to say, not very).

> 4. The timeout for resending to a non-responding destination should be
> increased. The current timeout is 1.5 seconds. This appears to have
> been chosen only because we thought it was sufficient for the slowest
> networks that we expected sipXecs to be used on. (See the discussions
> referenced above.) The new value needs to be discussed, but values up
> to 6 seconds have been suggested. The constraints are that the value
> should be high enough to avoid incorrectly marking destinations as
> unresponsive, but low enough to not annoy users in HA systems with
> nonresponding components. (IIRC, the typical HA worst case is 3 times
> the time-out: once for a proxy, once for a registrar, and once for a
> target application server.)
>
> Note that this would be done by increasing the number of resend cycles,
> rather than increasing the length of the first resend cycle (the T1
> value), because T1 is set based on the most common case (where RTT < 100
> msec).

This should be a very easy change, and we should do this one first. I
strongly suspect that this alone will solve the problem that Mark first
reported (which doesn't mean that the other changes above are not also
worthwhile improvements).

> 5. If phones take more than 100 msec to respond to an INVITE, they are
> out of specification. If response times are much more than that, we
> should prod the phone vendors to investigate the problem. (This is
> really a "soft real-time" problem, and that is a known problem domain.)

See above.
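
To put rough numbers on item 4, here is a back-of-the-envelope sketch in
Python. It assumes T1 = 100 ms and that each resend interval simply
doubles the previous one with no cap; neither is stated outright in the
thread, and the function names are made up for illustration. Under those
assumptions four cycles add up to the 1.5 seconds quoted as the current
timeout, and two more cycles land a little above 6 seconds.

```python
def resend_schedule(t1_ms, cycles):
    """Wait (in ms) after each transmission, doubling from t1_ms.

    Assumption: plain doubling with no upper cap on the interval.
    """
    return [t1_ms * 2 ** i for i in range(cycles)]


def total_timeout_ms(t1_ms, cycles):
    """Time until the destination is given up on: the sum of all waits."""
    return sum(resend_schedule(t1_ms, cycles))


def ha_worst_case_ms(t1_ms, cycles, hops=3):
    """HA worst case from item 4: one full timeout each for a proxy,
    a registrar, and a target application server (hops=3)."""
    return hops * total_timeout_ms(t1_ms, cycles)


if __name__ == "__main__":
    T1_MS = 100  # assumed value, not taken from the sipXecs source
    for cycles in range(4, 7):
        print(cycles,
              resend_schedule(T1_MS, cycles),
              total_timeout_ms(T1_MS, cycles),
              ha_worst_case_ms(T1_MS, cycles))
```

If the assumptions hold, each extra cycle roughly doubles both the
single-hop timeout and the HA worst case, which is the trade-off to
weigh when picking the new value.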

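
For item 3, the behavior being asked for can be sketched as a tiny state
model. This is only an illustration in Python of the rule "remember the
cancel, and send CANCEL once a 1xx shows up"; the class and method names
are hypothetical and have nothing to do with the real SipTransaction
code. What happens when a 2xx arrives on a canceled leg is deliberately
not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class InviteLeg:
    """Hypothetical model of one outgoing INVITE leg in a proxy."""
    got_provisional: bool = False   # a 1xx has been received
    cancel_requested: bool = False  # the proxy has given up on this leg
    cancel_sent: bool = False
    finished: bool = False          # a final response has been received
    sent: list = field(default_factory=list)  # messages "sent", for inspection

    def cancel(self):
        """Proxy decides (internally) to cancel this leg."""
        self.cancel_requested = True
        self._maybe_send_cancel()

    def on_response(self, status):
        """Handle a response to the INVITE on this leg."""
        if 100 <= status < 200:
            self.got_provisional = True
            self._maybe_send_cancel()
        elif status >= 200:
            self.finished = True

    def _maybe_send_cancel(self):
        # CANCEL goes out only after a 1xx has been seen, at most once,
        # and never after a final response.
        if (self.cancel_requested and self.got_provisional
                and not self.cancel_sent and not self.finished):
            self.sent.append("CANCEL")
            self.cancel_sent = True


if __name__ == "__main__":
    leg = InviteLeg()
    leg.cancel()          # no 1xx yet, so nothing is sent
    leg.on_response(180)  # the late 1xx triggers the deferred CANCEL
    print(leg.sent)
```

The point of keeping cancel_requested as separate state is that the
decision to cancel and the permission to send CANCEL arrive at different
times, in either order.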