Hi,

As a starter for your exploration, a few key points:

(1) When we talk about stateful proxies, we (and the standards) mean that they 
are transaction-stateful, not something like "dialog-stateful". 

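A transaction consists of a SIP request, zero or more provisional (1xx) replies 
(where applicable), and a final dispositive reply (2xx - 6xx). ACK is a little 
special: an ACK to a non-2xx final reply belongs to the INVITE transaction, 
while an ACK to a 2xx is end-to-end and is not part of it. 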

This is what a stateful proxy has memory of, and, aside from conferring a 
slight performance benefit[1], it is needed to implement things like failover 
timers. Can't have a timeout if you aren't tracking something.
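
To make the timer point concrete, here is a minimal sketch in Python of the 
kind of record a transaction-stateful proxy has to keep. The class and field 
names are invented for illustration and are not Kamailio's; the timeout value 
is whatever your configuration would use.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transaction:
    """Hypothetical per-transaction record; names are illustrative only."""
    branch: str                      # Via branch parameter identifying the transaction
    started_at: float                # time the request was forwarded
    provisional: List[int] = field(default_factory=list)
    final: Optional[int] = None      # first final reply (2xx - 6xx), if any

    def on_reply(self, status: int) -> None:
        if 100 <= status < 200:
            self.provisional.append(status)
        elif 200 <= status < 700 and self.final is None:
            self.final = status

    def timed_out(self, now: float, timeout: float) -> bool:
        # Only answerable because the proxy remembered when the request left.
        return self.final is None and now - self.started_at >= timeout
```

A stateless relay keeps no such record, so it has nothing to hang a failover 
timer on.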

(2) Neither transaction state, nor dialog state, nor any other kind of state, 
is required to route a SIP message, with the exception of a CANCEL (see below). 

Thus, this formulation is actually quite incorrect: "then use stateful 
responses to direct client back to same node for subsequent messages in a 
dialog."

You do not need state to route in-dialog requests to the correct place, as 
these are routed via the route set (Record-Route / Route headers) carried in 
the SIP request itself. You do not need state to route replies to the correct 
place, whether to in-dialog requests or any other kind of request, because this 
is done through the 'Via' header stack.
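
As a rough illustration, the two routing decisions can be written as pure 
functions of the message alone. This is a simplified sketch in Python, not 
Kamailio code: the function names are made up, a Via is reduced to a bare 
"host:port" string (real Via headers also carry branch, received and rport 
parameters), and strict routing is ignored.

```python
def next_hop_for_reply(via_stack, own_address):
    """Pick the next hop for a reply from its Via stack (topmost first).

    The proxy strips its own topmost Via and sends the reply to the
    address in the Via underneath it. No stored state is consulted.
    """
    if not via_stack or via_stack[0] != own_address:
        raise ValueError("topmost Via is not ours")
    remaining = via_stack[1:]
    if not remaining:
        raise ValueError("no Via left to route the reply to")
    return remaining[0], remaining


def next_hop_for_in_dialog_request(route_set, request_uri):
    """Pick the next hop for an in-dialog request (loose routing).

    route_set holds the Route header URIs, topmost first, after the
    proxy has removed its own entry. If none remain, the request goes
    to the Request-URI.
    """
    return route_set[0] if route_set else request_uri
```

Neither function takes anything but the contents of the message and the 
proxy's own address, which is the whole point.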

So, everything that is needed to route SIP requests and replies within some 
sort of context that persists for some amount of time (SIP calls this a dialog) 
can actually be found in the content of SIP messages themselves. 

You can easily show this by using a stateful-only (i.e. TM) configuration and 
restarting Kamailio in the middle of a pending or established call. Try it and 
watch the capture. You will see that every message you expect to have 
delivered, whether request or reply, will make it exactly where you think it 
should, even though Kamailio has lost all transaction state[2]. 

A firm grasp of this is very important to any redundancy ruminations.

(3) The exception to this is a CANCEL, and that is because a CANCEL is a 
so-called "hop-by-hop" request. 

Whereas most requests and replies pass through the proxy, the proxy is actually 
an independent party to CANCEL requests. That is to say, when a party CANCELs 
an INVITE, it actually asks the proxy to CANCEL it, and the proxy separately 
asks any downstream branches to CANCEL. This is what makes the forking 
behaviour of proxies possible. 

The consequence of this is that when a proxy receives a CANCEL request, it 
needs transaction state in order to know which downstream branches to match it 
up to. If that state is lost, it won't know what to do with the CANCEL.

This is the primary obstacle to anycast setups, from my point of view. You can 
count on any proxy to relay requests statelessly in a correct fashion, but you 
can't count on any proxy to process a CANCEL correctly. So, if a CANCEL goes to 
a different place than the INVITE to which it corresponds, it'll be dropped on 
the floor.
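
A toy model of the problem, in Python. Everything here is hypothetical: the 
Proxy class is invented for illustration, matching is reduced to the Via 
branch parameter alone, and an unmatched CANCEL is simply dropped, which is 
one possible outcome depending on how the proxy is configured.

```python
class Proxy:
    """Toy transaction-stateful proxy; not Kamailio's data model."""

    def __init__(self, name):
        self.name = name
        # INVITE branch -> list of branches the INVITE was forwarded to
        self.transactions = {}

    def on_invite(self, branch, targets):
        # Remember where the INVITE was sent so a later CANCEL can follow it.
        self.transactions[branch] = list(targets)
        return list(targets)

    def on_cancel(self, branch):
        # The CANCEL is addressed to this proxy; it must look up the
        # matching INVITE transaction to know which branches to cancel.
        # With no matching transaction there is nothing to cancel.
        return list(self.transactions.get(branch, []))


node_a = Proxy("a")
node_b = Proxy("b")
node_a.on_invite("z9hG4bK-1", ["gw1", "gw2"])

cancelled_by_a = node_a.on_cancel("z9hG4bK-1")  # the node that saw the INVITE
cancelled_by_b = node_b.on_cancel("z9hG4bK-1")  # an anycast peer that did not
```

Here cancelled_by_a lists both gateways, while cancelled_by_b is empty: the 
peer has no way to know where the INVITE went.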

...

Otherwise, and notwithstanding transparent approaches like anycast, the methods 
you're contemplating are all variations of a common idea: redundancy and 
failover are provided on the client side, in principle. The actual choice of 
method here is usually dictated by what the concrete clients in question 
support. For example, not all clients support DNS-based failover, or they may 
not implement it in the way you want. If you're offering a service to many 
different kinds of clients or devices, you'll have to take that into account.
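
For a sense of what "DNS-based failover" asks of a client, here is a minimal 
sketch in Python. It assumes the SRV records have already been resolved into 
(priority, weight, host, port) tuples; the hostnames are placeholders, 
try_send is a hypothetical callback that returns True if the node answered, 
and the weighted random selection that SRV specifies within one priority is 
replaced by a simple sort on weight.

```python
def order_srv_targets(records):
    """Order (priority, weight, host, port) tuples for trying.

    Lower priority values are tried first. Within a priority this
    sketch prefers higher weight instead of doing weighted random
    selection.
    """
    return sorted(records, key=lambda r: (r[0], -r[1]))


def send_with_failover(records, try_send):
    """Try each target in order; return the first (host, port) that works."""
    for _priority, _weight, host, port in order_srv_targets(records):
        if try_send(host, port):
            return host, port
    return None


records = [
    (20, 0, "k2.example.net", 5060),
    (10, 50, "k1.example.net", 5060),
]
alive = {"k2.example.net"}  # pretend k1 is down
chosen = send_with_failover(records, lambda host, port: host in alive)
```

With k1 down, chosen ends up as k2. A client that does not implement this 
loop, or implements it with different timers, will behave differently, which 
is why the client population matters.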

-- Alex

[1] At a memory cost, but this isn't really a factor in modern computing.

[2] Where it exists. An established call (200 OK + e2e ACK, no BYE yet) will 
actually not have any transaction state, since all transactions involved in 
establishing it have been terminated and no further transactions, e.g. to hang 
it up, have been created. 

> On Dec 14, 2022, at 6:56 PM, Jawaid Bazyar <baz...@gmail.com> wrote:
> 
> Hi,
>  I am exploring different redundancy / load-balancing models for a Kamailio 
> cluster.  When I say cluster, I mean, a number (N) of Kamailio nodes acting 
> as stateful proxies.
>  Each node is configured the same as the others, and all have access to the 
> same lookup data to make routing decisions.
>  I would appreciate any advice or experience any of you can share on these 
> different models.
>  Overall model:
>  
>     • Direct to proxies
>     • Redirect servers first, which redirect to proxies
>  Selecting the first node to talk to. Each model could use either type of 
> selection.
>  
>     • DNS-based (SRV or NAPTR, client makes call to dns name)
>     • Anycast with ECMP (equal-cost multi-path routing)
>     • Cluster with a mobile IP and service-down detection (this would just 
> provide 1:1 protection)
>  Have clients make calls through the proxy using a DNS record containing an 
> SRV record for each node (or, alternatively, done with NAPTR). Would rely on 
> the client to switch nodes in the event of a node failure mid-call. (Is that 
> even possible?)
>  Anycast would only work with UDP signaling. Use Anycast to find the first 
> proxy, then use stateful responses to direct client back to same node for 
> subsequent messages in a dialog.
>  So for anyone who has tried any of these methods, I would love to hear the 
> pros and cons..
>  Thanks in advance!
>  Jawaid
>   __________________________________________________________
> Kamailio - Users Mailing List - Non Commercial Discussions
> To unsubscribe send an email to sr-users-le...@lists.kamailio.org
> Important: keep the mailing list in the recipients, do not reply only to the 
> sender!
> Edit mailing list options or unsubscribe:


-- 
Alex Balashov | Principal | Evariste Systems LLC

Tel: +1-706-510-6800 / +1-800-250-5920 (toll-free)
Web: http://www.evaristesys.com/, http://www.csrpswitch.com/
