Kenneth

The behavior proposed in RFC 3263 for a SIP UA to locate redundantly 
deployed SIP servers results in long call setup delays when a server 
fails. For a UA with a high traffic rate, that behavior does not scale 
well.

I bumped into the same set of problems two years ago in our attempt to 
implement 3263 for an IP-PBX that hosts 1500+ phones.

A server address that has experienced communication difficulty (e.g., 
connection setup failure, request retransmission failure, a 503 response) 
should be tagged as "failed". Once a server address is tagged as "failed" 
in a failed call setup attempt, should any subsequent call try the same 
address? IMHO, no. For a UA that represents a large number of endpoints, 
each server address resolved from the NAPTR/SRV/A queries should be 
shared between calls initiated from multiple endpoints.
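
To make the shared "failed" tag concrete, here is a minimal Python sketch. 
It is only an illustration, not what we shipped: the names ServerEntry and 
SharedServerTable are made up for this example, the host names are 
placeholders, and SRV weights are ignored for brevity (only priority is 
used).

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class ServerEntry:
    host: str
    port: int
    priority: int            # SRV priority: lower value is preferred
    failed: bool = False
    failed_at: float = 0.0   # monotonic time of the last "failed" tag


class SharedServerTable:
    """Hypothetical table shared by every endpoint behind one UA."""

    def __init__(self, entries):
        self._lock = threading.Lock()
        self._entries = sorted(entries, key=lambda e: e.priority)

    def _set_state(self, host, port, failed):
        with self._lock:
            for e in self._entries:
                if (e.host, e.port) == (host, port):
                    e.failed = failed
                    e.failed_at = time.monotonic() if failed else 0.0

    def mark_failed(self, host, port):
        # Called after a connect failure, retransmission failure, 503, etc.
        self._set_state(host, port, True)

    def mark_in_service(self, host, port):
        self._set_state(host, port, False)

    def pick(self):
        """Return the most preferred address not tagged "failed", or None."""
        with self._lock:
            for e in self._entries:
                if not e.failed:
                    return (e.host, e.port)
        return None


table = SharedServerTable([
    ServerEntry("sip2.example.com", 5060, priority=20),
    ServerEntry("sip1.example.com", 5060, priority=10),
])
print(table.pick())                          # primary is preferred
table.mark_failed("sip1.example.com", 5060)  # one endpoint's call failed
print(table.pick())                          # every later call skips it
```

Because the table is shared, a failure seen by one endpoint steers the 
calls of all other endpoints to the secondary without each of them paying 
the timeout again.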

How would one know when a failed server comes back in service? Discarding 
it from the cache won't work, because then there is no way to bring it 
back. You may want to keep it in the cache and probe it periodically for 
a predefined window of time. If the server responds to a probe within the 
window, the server address can be tagged as "in service" so that the next 
call will use it.
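
The probing idea could look roughly like the sketch below. Everything in 
it is an assumption made for illustration: the names FailedServer and 
run_probes, the 30 second interval and the 600 second window are invented 
here, and send_probe is a hook the caller supplies (a SIP OPTIONS request 
is one common choice). What to do with an entry once the window has 
expired is a policy decision this sketch leaves open; it simply stops 
probing.

```python
from dataclasses import dataclass


@dataclass
class FailedServer:
    address: tuple           # (host, port)
    failed_at: float         # time the address was tagged "failed"
    next_probe_at: float     # earliest time of the next probe
    in_service: bool = False


def run_probes(servers, send_probe, now, interval=30.0, window=600.0):
    """Probe each failed server that is due; return recovered addresses.

    send_probe(address) -> bool is a caller-supplied, hypothetical hook.
    interval and window are illustrative values in seconds.
    """
    recovered = []
    for s in servers:
        if s.in_service:
            continue
        if now - s.failed_at > window:
            continue         # window is over: stop probing this entry
        if now < s.next_probe_at:
            continue         # not due yet
        if send_probe(s.address):
            s.in_service = True          # the next call may use it again
            recovered.append(s.address)
        else:
            s.next_probe_at = now + interval
    return recovered
```

A timer in the UA would call run_probes with the current time; because 
only one probe per failed address is sent per interval, the probe load 
does not grow with the number of endpoints or calls.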

You may also want to consider including a mechanism that can handle DNS 
server provisioning changes, which typically take days to propagate 
through the DNS network.

/Guang





Kenneth Soerensen <[EMAIL PROTECTED]> 
Sent by: [EMAIL PROTECTED]
04/10/2008 04:33 AM

To: [email protected]
Subject: [Sip-implementors] RFC3263 DNS SRV and fail over






Hi

We are discussing how to deal with this scenario:

A SIP UA is communicating with two redundant SIP servers - a primary and
a secondary. The addresses and priorities of these servers are obtained
through DNS SRV.

The primary server is failing and the UA needs to communicate with the
secondary server.

As I understand RFC3263, the UA must try to communicate with the primary
server for every new transaction and only then fail over to the
secondary. However, this will introduce very long call setup delays, as
RFC3261 specifies a timeout of 32 seconds (Timer B = 64 x T1). A solution
for this could be to reduce the timeout, but is that a good idea?
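
For reference, the 32 second figure follows from the RFC3261 default of 
T1 = 500 ms. The short Python sketch below only restates that arithmetic; 
the function name worst_case_setup_delay is made up for illustration, and 
it assumes each unreachable server is tried until Timer B fires before 
the next one is attempted.

```python
T1 = 0.5              # seconds; RFC3261 default for T1
TIMER_B = 64 * T1     # INVITE transaction timeout


def worst_case_setup_delay(unreachable_servers_tried_first):
    """Delay before the first working server is even contacted."""
    return unreachable_servers_tried_first * TIMER_B


print(TIMER_B)                    # 32.0
print(worst_case_setup_delay(1))  # 32.0, paid by every new call
```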

On the other hand RFC3263 page 4 states:

--------------
   The identity of the available server would ideally be cached for some
   amount of time in order to reduce call setup delays of subsequent
   calls.  The client cannot query a failed server continuously to
   determine when it becomes available again, since this does not scale.
   Furthermore, the availability state must eventually be flushed in
   order to redistribute load to recovered elements when they come back
   online.
--------------

This indicates that it would be a good idea to remember that the primary
server is unavailable for some time. This could reduce the call setup
delays while using the secondary server. However, the UA would not
switch back to the primary server as soon as possible.

To make the situation even more complicated, our UA contains up to 1000
wireless endpoints. This could enable us to use the information gathered
by one endpoint for the rest of the endpoints.

What is the preferred way to handle this problem?

Thank you

/Kenneth
_______________________________________________
Sip-implementors mailing list
[email protected]
https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors
