For polling, you want to make sure you get a SIP-level response, not just 
a transport-level one. So an OPTIONS request is one alternative.
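
A minimal sketch of such a probe, in Python. The function names, the 
"probe" user part, and the rule that a 503 does not count as "in service" 
are my assumptions for illustration (the 503 rule follows the failure list 
in my earlier mail); sending the datagram and timing out are left to your 
own transport code.

```python
import uuid

def build_options_probe(server_host, server_port, local_host, local_port, user="probe"):
    """Build a minimal SIP OPTIONS request (RFC 3261) for sending over UDP."""
    # A branch must start with the RFC 3261 magic cookie "z9hG4bK".
    branch = "z9hG4bK" + uuid.uuid4().hex
    lines = [
        f"OPTIONS sip:{server_host}:{server_port} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{server_host}:{server_port}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
        "", "",  # blank line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")

def sip_status(datagram):
    """Return the status code if the datagram is a SIP response, else None."""
    first_line = datagram.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return int(parts[1])
    return None

def server_in_service(datagram):
    """Assumed policy: any SIP response except 503 means the server is back."""
    status = sip_status(datagram)
    return status is not None and status != 503
```

The point of parsing the status line is that a reply to a ping or an 
accepted TCP connection says nothing about whether the SIP stack itself is 
answering.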

Every DNS query result has a TTL. You may want to use the TTL to determine 
its life span, or you may want to apply a predefined life span (much 
smaller than the TTL) to each DNS query result. Once the life span has 
passed, the DNS query result needs to be removed from your cache, and a new 
DNS query is performed for the same server when the next outgoing call is 
processed. If the server is still failing, the same cycle repeats. The 
value of this idea is that it allows the UA to detect server failure and 
optimize call traffic handling by eliminating call setup attempts on 
servers that are already known to have failed.
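
Roughly, the cache could look like the sketch below. The class and method 
names are made up for illustration, the DNS lookup itself is left out, and 
the injectable clock is only there so the expiry logic can be exercised 
without waiting.

```python
import time

class ServerCache:
    """Cache of resolved server addresses with a bounded life span."""

    def __init__(self, max_life=None, clock=time.monotonic):
        self.max_life = max_life  # predefined life span in seconds, or None to use the TTL
        self.clock = clock
        self.entries = {}         # domain -> (expires_at, addresses, failed addresses)

    def store(self, domain, addresses, ttl):
        """Store a fresh DNS result; its life span never exceeds the TTL."""
        life = ttl if self.max_life is None else min(ttl, self.max_life)
        self.entries[domain] = (self.clock() + life, list(addresses), set())

    def mark_failed(self, domain, address):
        """Tag an address as failed so that later calls skip it."""
        entry = self.entries.get(domain)
        if entry is not None:
            entry[2].add(address)

    def usable(self, domain):
        """Return the addresses not tagged as failed, in their stored order.

        Returns None when there is no entry or its life span has passed,
        which tells the caller to perform a new DNS query.
        """
        entry = self.entries.get(domain)
        if entry is None:
            return None
        expires_at, addresses, failed = entry
        if self.clock() >= expires_at:
            del self.entries[domain]  # forget the failure state as well
            return None
        return [a for a in addresses if a not in failed]
```

Because the failure tags live inside the entry, they are flushed together 
with the DNS result, so a recovered server is tried again after the next 
query without any extra bookkeeping.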

/Guang
 



Kenneth Soerensen <[EMAIL PROTECTED]> 
04/11/2008 09:00 AM

To
[email protected]
cc
[EMAIL PROTECTED]
Subject
Re: [Sip-implementors] RFC3263 DNS SRV and fail over






Hi

It is exactly these problems we are struggling with.

You suggest remembering the state of servers and then periodically
polling to check if they have come back into service. How would you
perform this polling - using an OPTIONS or REGISTER request?

Maybe a good solution would be to remember the state for some time and
then just forget it. The next ordinary request will then be used to
detect if the server is still failing?

/Kenneth


On Thu, 2008-04-10 at 09:18 -0400, [EMAIL PROTECTED] wrote:
> 
> Kenneth 
> 
> The proposed behavior in RFC 3263 for a SIP UA to locate redundantly
> deployed SIP servers results in a long delay in call setup when
> there is a server failure. For a UA with a high traffic rate, the
> behavior in 3263 does not scale well. 
> 
> I bumped into the same set of problems two years ago in our attempt to
> implement 3263 for an IP-PBX that hosts 1500+ phones. 
> 
> For a server address that has experienced communication difficulty,
> e.g., connection setup failure, request retransmission failure, 503,
> etc., the server address should be tagged as "failed". Once a server
> address is tagged as "failed" in a failed call setup attempt, should
> any subsequent call try the same address? IMHO, no. For a UA that
> represents a large number of endpoints, each server address resolved
> from a NAPTR/SRV/A query should be shared between calls initiated from
> multiple endpoints. 
> 
> How would one know when a failed server comes back into service?
> Discarding it from the cache won't work because there is then no way
> to bring it back. You may want to keep it in the cache and probe it
> periodically for a predefined window of time. If the server responds
> to a probe within the window, the server address can be tagged as
> "in service", so that the next call will use it. 
> 
> You may also want to consider including a mechanism that can handle
> DNS server provisioning changes, which typically take days to
> propagate through the DNS network. 
> 
> /Guang 
> 
> 
> 
> 
> Kenneth Soerensen
> <[EMAIL PROTECTED]> 
> Sent by:
> [EMAIL PROTECTED] 
> 
> 04/10/2008 04:33 AM 
> 
> 
>                To
> [email protected] 
>                cc
> 
>           Subject
> [Sip-implementors] RFC3263 DNS SRV and fail over
> 
> 
> 
> 
> 
> 
> 
> 
> Hi
> 
> We are discussing how to deal with this scenario:
> 
> A SIP UA is communicating with two redundant SIP servers - a primary
> and a secondary. The addresses and priorities of these servers are
> obtained through DNS SRV.
> 
> The primary server is failing and the UA needs to communicate with the
> secondary server.
> 
> As I understand RFC3263, the UA must try to communicate with the
> primary server for every new transaction and then fail over to the
> secondary. However, this will introduce very long call setup delays as
> RFC3261 specifies a timeout of 32 seconds (timer B = 64 x T1). A
> solution for this could be to reduce the timeout, but is that a good
> idea?
> 
> On the other hand RFC3263 page 4 states:
> 
> --------------
>   The identity of the available server would ideally be cached for some
>   amount of time in order to reduce call setup delays of subsequent
>   calls.  The client cannot query a failed server continuously to
>   determine when it becomes available again, since this does not scale.
>   Furthermore, the availability state must eventually be flushed in
>   order to redistribute load to recovered elements when they come back
>   online.
> --------------
> 
> This indicates that it would be a good idea to remember that the
> primary server is unavailable for some time. This could reduce the
> call setup delays while using the secondary server. However, the UA
> would not switch back to the primary server as soon as possible.
> 
> To make the situation even more complicated, our UA contains up to
> 1000 wireless endpoints. This could enable us to use the information
> gathered by one endpoint for the rest of the endpoints.
> 
> What is the preferred way to handle this problem?
> 
> Thank you
> 
> /Kenneth
> _______________________________________________
> Sip-implementors mailing list
> [email protected]
> https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors
> 

