Thank you all for the advice!  I have consolidated this into a design
document and submitted it to my peers for review.  The rest of my
response is inline.

On 12/09/2011 12:47 AM, Kevin P. Fleming wrote:
> On 12/08/2011 10:22 AM, Paul Kyzivat wrote:
>> On 12/8/11 8:16 AM, Joegen Baclor wrote:
>>> On 12/08/2011 06:47 AM, Worley, Dale R (Dale) wrote:
>>>>> From: Joegen Baclor [[email protected]]
>>>>>
>>>>> I am implementing an RLS service that shares load via DNS/SRV records.
>>>>> Let us say I have 2 RLS servers sharing the load equally for 200
>>>>> subscribers.  In an ideal setting, each would service 100 subscribers.
>>>>> The moment one goes down, its subscribers spill over to the other
>>>>> server, so at some point all 200 subscribers are subscribed to a
>>>>> single server.  When the node that went down comes back up, I would
>>>>> like to be able to restore load sharing between the two nodes.
>>>>> There seems to be no obvious mechanism to do this in
>>>>> SIP.  I would appreciate some insights and suggestions.
>>>> There are at least two approaches.  One approach is to have the RLS
>>>> servers split the load on a request-by-request basis, so that they are
>>>> jointly the notifier UA for the subscriptions.  This requires that
>>>> they share all of the subscription state.  When a server sends the 200
>>>> to a subscription-initiating SUBSCRIBE, it inserts a Record-Route
>>>> containing a DNS name that resolves to both servers.  Thus, each
>>>> SUBSCRIBE within the dialogs goes randomly to either of the servers,
>>>> and the server that receives the SUBSCRIBE replicates the updated
>>>> subscription state to the other server.  For any NOTIFY to be
>>>> generated, the servers have to decide which of them is to send it, and
>>>> update their mutually-held subscription state to record that fact.
>>>> This would be rather high-overhead to implement.

As I documented this scenario, it became evident to me right away that
joint notification is not trivial.  It makes sense to make the RLS that
first accepted the dialog the notifier, since it has already benefited
from DNS/SRV load sharing.  The nasty part would be electing a new
notifier when the original notifier is no longer available.  It will
require extra work simply to monitor whether the elected notifier is
dead or alive.  We do have a fast message queue at our disposal to
broadcast such events, though, so I remain optimistic that this is the
most viable solution.  Subscription state replication can be done
in-band using internal database replication.
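
To make the election concrete, here is a rough sketch of what I have in
mind.  It is only an illustration under my own assumptions: the class
and method names (NotifierRegistry, heartbeat, accept, notifier_for) and
the 5-second timeout are hypothetical, heartbeats are assumed to arrive
over the message queue, and the ownership table is assumed to be
replicated by the database.  Each node picks the same survivor because
the choice is a deterministic hash of the dialog id over the sorted list
of live nodes.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class NotifierRegistry:
    """Tracks which RLS node is the notifier for each subscription dialog."""
    heartbeat_timeout: float = 5.0                    # assumed value, in seconds
    last_seen: dict = field(default_factory=dict)     # node name -> last heartbeat time
    owner: dict = field(default_factory=dict)         # dialog id -> notifier node

    def heartbeat(self, node, now):
        # Called whenever a heartbeat for `node` arrives (e.g. via the queue).
        self.last_seen[node] = now

    def accept(self, dialog_id, node):
        # The node that first accepted the SUBSCRIBE becomes the notifier.
        self.owner.setdefault(dialog_id, node)

    def live_nodes(self, now):
        # Nodes whose last heartbeat is recent enough, in a stable order.
        return sorted(n for n, seen in self.last_seen.items()
                      if now - seen <= self.heartbeat_timeout)

    def notifier_for(self, dialog_id, now):
        live = self.live_nodes(now)
        current = self.owner.get(dialog_id)
        if current in live:
            return current                  # original notifier is still alive
        if not live:
            return None                     # nobody is left to send NOTIFYs
        # Deterministic re-election: every node computes the same answer.
        elected = live[zlib.crc32(dialog_id.encode()) % len(live)]
        self.owner[dialog_id] = elected
        return elected
```

With two nodes, a dialog accepted by the first node stays there while
its heartbeats keep arriving and moves to the second node once they
stop.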


>>>>
>>>> Another approach is to have both servers serve the same data, but any
>>>> single subscription is terminated on only one server as notifying UA.
>>>> If one server goes down, when a UA whose subscription is notified by
>>>> that server decides to re-SUBSCRIBE, it discovers the server is down.
>>>> The UA reestablishes a new subscription, which goes to the other
>>>> server.  When the down server comes up again, the load is unbalanced.
>>>> The overloaded server could force some subscriptions to be
>>>> reestablished by sending "NOTIFY/Subscription-State: terminated".  But
>>>> I don't think there's any point in doing that -- of necessity, either
>>>> server must be able to handle the full load without degrading the
>>>> system.
>>>>
>>>> Dale
>>>>

Elegant and simple, but the drawback is the time it takes to reach
equilibrium in load sharing.


>>> Thanks Dale.   I think you do have a good handle on this, being the
>>> author of RLS in sipX.  I am actually contemplating option 1 and
>>> simply having a static algorithm in the proxy to somehow control the
>>> balancing based on the realtime load of each RLS server.  I am indeed
>>> tempted to simply use RFC 3265's NOTIFY with state "terminated" and
>>> hope that the UAC is compliant and re-establishes the dialog, but
>>> relying on the UAC's compliance almost always guarantees an
>>> interoperability issue.
>> If you are using DNS/SRV load balancing, then simply terminating the
>> subscription and having the subscriber reissue it is sub-optimal. Since
>> the load balancing is static, presumably half of the terminated
>> subscriptions will come back to the server that terminated them. So you
>> will end up only shifting 25% of the load to the recovered server.


I agree.  This approach is not optimal.
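
For the record, the arithmetic behind the 25% figure can be checked with
a small calculation.  This is just a sketch of the expected values; the
function name and parameters are my own, and it assumes the DNS/SRV
weights send exactly half of the re-established subscriptions to each
server.

```python
def rebalance_by_termination(total, terminate_fraction, share_to_recovered=0.5):
    """Expected split after the loaded server terminates some subscriptions.

    total               -- subscriptions currently on the loaded server
    terminate_fraction  -- fraction of them it terminates via NOTIFY
    share_to_recovered  -- fraction of re-subscriptions that DNS/SRV sends
                           to the recovered server (assumed 0.5 here)
    """
    terminated = total * terminate_fraction
    moved = terminated * share_to_recovered   # the rest return to the same server
    return total - moved, moved


# 200 subscriptions on one server, half of them terminated:
loaded, recovered = rebalance_by_termination(200, 0.5)
print(loaded, recovered)                      # 150.0 50.0
print(recovered / 200)                        # 0.25
```

So terminating half of the subscriptions leaves an expected 150/50
split, which matches the 25% quoted above.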


>>
>> You could do something extreme, like send a REFER/Replaces with
>> method=SUBSCRIBE, using an address that prefers the server you want to
>> transfer to. But the likelihood of that being supported by the
>> subscriber is slim to none.


I don't think that would be an option.  Interoperability is always a
given.


>>
>> I'm inclined to agree with Dale that you just do nothing, and let it
>> level out based on attrition.
> Keep in mind there is an additional caveat of Dale's proposed solution
> #2 (and all such solutions): during the time after an RLS server fails,
> and before the UA has attempted to re-SUBSCRIBE, the UA will not receive
> any NOTIFY messages for its subscriptions on that server. If the
> subscription time is fairly short, the UA will re-subscribe and get
> NOTIFYs to start flowing again.
>
>
Yes, I agree.




_______________________________________________
Sip-implementors mailing list
[email protected]
https://lists.cs.columbia.edu/cucslists/listinfo/sip-implementors
