Kenneth,

The behavior proposed in RFC 3263 for a SIP UA to locate redundantly deployed SIP servers results in long call setup delays when a server fails. For a UA with a high traffic rate, the behavior in RFC 3263 does not scale well.

I bumped into the same set of problems two years ago in our attempt to implement RFC 3263 for an IP-PBX that hosts 1500+ phones. A server address that has experienced communication difficulty (e.g., connection setup failure, request retransmission failure, 503, etc.) should be tagged as "failed". Once a server address is tagged as "failed" in a failed call setup attempt, should any subsequent call try the same address? IMHO, I would say no. For a UA that represents a large number of endpoints, each server address resolved from a NAPTR/SRV/A query should be shared between calls initiated from multiple endpoints.

How would one know when a failed server comes back in service? Discarding it from the cache won't work, because there is then no way to bring it back. You may want to keep it in the cache and probe it periodically for a predefined window of time. If the server responds to a probe within the window, the server address can be tagged as "in service", so that the next call will use it.

You may also want to consider including a mechanism that can handle DNS server provisioning changes, which typically take days to propagate through the DNS network.

/Guang

Kenneth Soerensen <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
04/10/2008 04:33 AM
To [email protected]
cc
Subject [Sip-implementors] RFC3263 DNS SRV and fail over

Hi

We are discussing how to deal with this scenario: A SIP UA is communicating with two redundant SIP servers, a primary and a secondary. The addresses and priorities of these servers are obtained through DNS SRV. The primary server is failing and the UA needs to communicate with the secondary server.

As I understand RFC3263, the UA must try to communicate with the primary server for every new transaction and then fail over to the secondary. However, this will introduce very long call setup delays, as RFC3261 specifies a timeout of 32 seconds (timer B = 64 x T1). A solution for this could be to reduce the timeout, but is that a good idea?
On the other hand RFC3263 page 4 states:

--------------
The identity of the available server would ideally be cached for some amount of time in order to reduce call setup delays of subsequent calls. The client cannot query a failed server continuously to determine when it becomes available again, since this does not scale. Furthermore, the availability state must eventually be flushed in order to redistribute load to recovered elements when they come back online.
--------------

This indicates that it would be a good idea to remember that the primary server is unavailable for some time. This could reduce the call setup delays while using the secondary server. However, the UA would not switch back to the primary server as soon as possible.

To make the situation even more complicated, our UA contains up to 1000 wireless endpoints. This could enable us to use the information gathered by one endpoint for the rest of the endpoints.

What is the preferred way to handle this problem?

Thank you
/Kenneth

_______________________________________________
Sip-implementors mailing list
[email protected]
https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors
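
For illustration only, here is a minimal Python sketch of the shared "failed" / "in service" tagging discussed in this thread. It is not taken from either poster's implementation. The class name `ServerHealthCache`, its method names, and the 30 s probe interval and 600 s window are all assumptions made up for the example. It also assumes that a failed tag which outlives the window is simply flushed, in line with the RFC3263 remark that availability state must eventually be flushed. Sending the probe itself (for example an OPTIONS request) is left to the caller.

```python
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (host, port) as resolved from DNS


class ServerHealthCache:
    """Failure state shared by all endpoints behind one UA (hypothetical helper)."""

    def __init__(self, probe_interval: float = 30.0, probe_window: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe_interval = probe_interval  # seconds between probes (assumed value)
        self._probe_window = probe_window      # seconds a "failed" tag is kept (assumed value)
        self._clock = clock
        self._lock = threading.Lock()
        # address -> (time it was tagged failed, time of the last probe)
        self._failed: Dict[Address, Tuple[float, float]] = {}

    def mark_failed(self, addr: Address) -> None:
        """Tag addr after a connect failure, a transaction timeout, a 503, etc."""
        now = self._clock()
        with self._lock:
            # Keep the first failure time so repeated failures do not extend the window.
            self._failed.setdefault(addr, (now, now))

    def mark_in_service(self, addr: Address) -> None:
        """Clear the failed tag, e.g. after the server answered a probe."""
        with self._lock:
            self._failed.pop(addr, None)

    def _flush_expired(self, now: float) -> None:
        # Caller must hold the lock. Drops tags older than the probe window.
        expired = [a for a, (failed_at, _) in self._failed.items()
                   if now - failed_at >= self._probe_window]
        for a in expired:
            del self._failed[a]

    def select(self, candidates: List[Address]) -> Optional[Address]:
        """Return the first candidate (in SRV preference order) not tagged failed."""
        with self._lock:
            self._flush_expired(self._clock())
            for addr in candidates:
                if addr not in self._failed:
                    return addr
        return None  # every candidate is currently tagged failed

    def due_for_probe(self) -> List[Address]:
        """Return failed addresses not probed within the last probe_interval."""
        now = self._clock()
        due: List[Address] = []
        with self._lock:
            self._flush_expired(now)
            for addr, (failed_at, last_probe) in list(self._failed.items()):
                if now - last_probe >= self._probe_interval:
                    self._failed[addr] = (failed_at, now)  # record this probe attempt
                    due.append(addr)
        return due
```

In this sketch every endpoint calls `select()` with the SRV-ordered address list before starting a transaction, so a failure seen by one endpoint spares the others the 32 second timeout. A single background task would call `due_for_probe()` periodically, which keeps the probe rate independent of the call rate. What to do when `select()` returns `None` is left open here.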
