Hi,

Sorry for the delay; I was (and still am) quite busy integrating our
product with sipx 4.2. First of all, thanks for your reply. I will try
to answer as best I can.

>#1 - If you have two sites, Site 1 and Site 2 then typically, the IP
>subnets for both sites would be different...  It seems to me that the
>scheme you present would impose that all nodes be part of the same
>subnet.
Yes, this is a requirement for this design. All three nodes have to be
in the same subnet.

>#2 - Even if we overcome problem #1, sipXecs components cannot
>reconfigure the IP address they bind to on the fly.  This means that the
>whole sipXecs would have to be brought up upon detecting the fail-over.
Yes, node 3 is a cold standby. Once the IP failover happens, the sipXecs
on it is started up. It should have an exact copy of the config files
of node 1 and the same IP address when it starts. In my experience the
proxy and registrar come up pretty quickly, so the outage should be
short.
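
To make the sequence concrete, here is a rough sketch of what the cold
standby could run once failover is decided. The interface name, the
prefix length, the example address and the "sipxecs" init script name
are my assumptions, not something tested on our nodes; the gratuitous
ARP step uses the iputils arping tool.

```python
import subprocess


def takeover_commands(service_ip, prefix_len=24, iface="eth0"):
    """Return the commands for claiming the service IP and starting sipXecs."""
    return [
        # Bring the service IP up on the local interface.
        ["ip", "addr", "add", f"{service_ip}/{prefix_len}", "dev", iface],
        # Send unsolicited (gratuitous) ARPs so neighbours update their tables.
        ["arping", "-U", "-I", iface, "-c", "3", service_ip],
        # Start the cold-standby sipXecs instance (init script name assumed).
        ["service", "sipxecs", "start"],
    ]


def take_over(service_ip, prefix_len=24, iface="eth0", dry_run=True):
    """Run (or, with dry_run, only print) the takeover steps in order."""
    commands = takeover_commands(service_ip, prefix_len, iface)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    take_over("192.0.2.10")  # documentation address, dry run only
```

With dry_run left at True the script only prints the three commands, so
it is safe to try on any machine.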

>#3 - What is the trigger for initiating the 'fail-over' process?  This
>is kind of tricky actually...
Yes it is. The first implementation I plan will be based on a
requirement we have, where a SIP trunk is connected to each site and
only one is active. There, the SIP trunk failover triggers the IP/SIPX
failover. I would like to keep the failover detection logic separate so
that it can be changed later with minimal impact on the rest of the
system.
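
Roughly, the separation I have in mind could look like the sketch
below. The class and parameter names are made up for illustration: the
detector is any callable that returns True when this node should take
over (for us, initially, "the SIP trunk became active on my site"), and
the takeover action is passed in separately.

```python
import time


class FailoverController:
    """Glue between a pluggable detector and the takeover action."""

    def __init__(self, detector, on_failover):
        self.detector = detector        # callable returning True to trigger
        self.on_failover = on_failover  # callable performing the takeover
        self.active = False

    def poll_once(self):
        """Check the detector once; take over at most one time."""
        if not self.active and self.detector():
            self.on_failover()
            self.active = True
        return self.active

    def run(self, interval_s=5.0):
        """Poll until the detector fires, then return."""
        while not self.poll_once():
            time.sleep(interval_s)
```

Swapping the trunk-based trigger for something else later would then
only mean passing a different detector; the takeover code stays as it is.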

>#4 - Sometimes, configuration changes made through sipXconfig affect
>multiple configuration files.  If a crash happens in the middle of the
>generation of a multi-file configuration change then partial
>configuration will end up at the replicated node.  It's hard to predict
>how the partial configuration will affect the fail-over system but I
>think it's fair to say that the results will not be what the customer
>expects.
Initially, I think it's OK to send the profile to the nodes again
manually if this happens. It's a good point and worth thinking about.
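
One idea for at least detecting a partial configuration (this is not
something sipXconfig does today, as far as I know): write a checksum
manifest after every complete config generation, and let the standby
verify it before starting sipXecs. A minimal sketch, where the manifest
file name is hypothetical and only the top-level files of one directory
are covered:

```python
import hashlib
import json
import os
import tempfile

MANIFEST = "replication.manifest"  # hypothetical file name


def _digest(path):
    """SHA-256 of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def write_manifest(config_dir):
    """Record checksums of all files; call after a complete generation."""
    entries = {}
    for name in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, name)
        if os.path.isfile(path) and not name.startswith(MANIFEST):
            entries[name] = _digest(path)
    tmp = os.path.join(config_dir, MANIFEST + ".tmp")
    with open(tmp, "w") as f:
        json.dump(entries, f)
    os.replace(tmp, os.path.join(config_dir, MANIFEST))  # atomic rename
    return entries


def config_is_complete(config_dir):
    """True only if every file listed in the manifest matches its checksum."""
    manifest_path = os.path.join(config_dir, MANIFEST)
    if not os.path.exists(manifest_path):
        return False
    with open(manifest_path) as f:
        expected = json.load(f)
    for name, digest in expected.items():
        path = os.path.join(config_dir, name)
        if not os.path.isfile(path) or _digest(path) != digest:
            return False
    return True


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    with open(os.path.join(demo_dir, "a.xml"), "w") as f:
        f.write("<config/>")
    write_manifest(demo_dir)
    print(config_is_complete(demo_dir))
```

If the check fails on the standby, the profile would still have to be
resent manually, but at least the node would not start with a
half-written configuration.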

>I am not a clustering expert so maybe there are trivial answers to the
>points I'm raising.  I have not studied the techniques employed by
>clustering technologies out there but I'd be interested in getting your
>perspective on this.
Me neither. If there were trivial answers, I wouldn't have brought this
discussion to the list. ;)

BR,
Chris

-----Original Message-----
From: JOLY, ROBERT (ROBERT) [mailto:[email protected]] 
Sent: Thursday, May 20, 2010 9:43 PM
To: Krisztian Ganyai; [email protected]
Subject: RE: Fully redundant sipxecs

 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Krisztian Ganyai
> Sent: Thursday, May 06, 2010 7:56 AM
> To: [email protected]
> Subject: [sipX-dev] Fully redundant sipxecs
> 
> Hi,
> 
> There are some services in sipXecs that are a SPOF even in
> redundant sipXecs deployments. These are the media services
> (ACD, conferencing, VM) and the config (admin and user portal).
> 
> Our goal is to build a fully HA system. We have come up with 
> multiple ideas and some of them we discussed briefly with the 
> sipx developers already. The most promising idea we have to 
> date looks like this:
> 
> .------------------------.   .---------------------------------------------.
> | Site 1                 |   | Site 2                                      |
> |                        |   |                                             |
> | .--------------------. |   | .-------------------. .-------------------. |
> | | Node 1             | |   | | Node 2            | | Node 3(cs)        | |
> | | (service IP)       | |   | |                   | |                   | |
> | |                    | |   | |                   | |                   | |
> | | .---------------.  | |   | | .---------------. | | .---------------. | |
> | | | Proxy(pri.)   |----------->| Proxy(sec.)   | | | | Proxy(pri.)   | | |
> | | '---------------'  | |   | | '---------------' | | '---------------' | |
> | | .---------------.  | |   | |                   | | .---------------. | |
> | | | Config        |  | |   | |                   | | | Config        | | |
> | | '---------------'  | |   | |                   | | '---------------' | |
> | | .---------------.  | |   | |                   | | .---------------. | |
> | | | Mediaservices |  | |   | |                   | | | Mediaservices | | |
> | | '---------------'  | |   | |                   | | '---------------' | |
> | |                    | |   | |                   | |                   | |
> | '--------------------' |   | '-------------------' '-------------------' |
> |           |            |   |                                 |           |
> '-----------|------------'   '---------------------------------|-----------'
>             |            .------------------------.            |
>             |            | Replicated config      |            |
>             '----------->| (using DRBD)           |<-----------'
>                          '------------------------'
> Figure 1: Before failover
> 
> The idea is to extend the basic redundant configuration
> sipfoundry suggests (Node 1 on Site 1 and Node 2 on Site 2)
> with a 3rd node (Node 3) on Site 2. Node 3 acts as a cold
> standby primary node and has all configuration files synced
> to/from the primary server.
> 
> The configuration directories (/etc/sipxpbx and
> /var/sipxdata) between the primary Node 1 and the cold
> standby Node 3 would be replicated using DRBD, which is a
> block-device replication facility available in CentOS.
> 
> Since the configuration is shared, Node 1 and Node 3 have to
> have the same IP address, as we don't want to mess around
> with the config files.
> To avoid IP conflicts, there would be 3 IP addresses for
> Node 1 and Node 3. Of the three, one would be the "service
> IP", for which sipXecs is configured. When both primary nodes
> start, they come up with one of the non-service IP addresses
> and, with some mechanism, they decide who should be active.
> The active one takes over the service IP (using ifconfig
> eth0, for example) and starts the sipXecs services. When
> failover happens (the previous active node goes down), the
> other node takes over the service IP and starts the sipXecs
> services.
> 
> .------------------------.   .---------------------------------------------.
> | Site 1                 |   | Site 2                                      |
> | (DOWN)                 |   |                                             |
> | .--------------------. |   | .-------------------. .-------------------. |
> | | Node 1             | |   | | Node 2            | | Node 3            | |
> | | (DOWN)             | |   | |                   | | (service IP)      | |
> | |                    | |   | |                   | |                   | |
> | | .---------------.  | |   | | .---------------. | | .---------------. | |
> | | | Proxy(pri.)   |  | |   | | | Proxy(sec.)   |<----| Proxy(pri.)   | | |
> | | '---------------'  | |   | | '---------------' | | '---------------' | |
> | | .---------------.  | |   | |                   | | .---------------. | |
> | | | Config        |  | |   | |                   | | | Config        | | |
> | | '---------------'  | |   | |                   | | '---------------' | |
> | | .---------------.  | |   | |                   | | .---------------. | |
> | | | Mediaservices |  | |   | |                   | | | Mediaservices | | |
> | | '---------------'  | |   | |                   | | '---------------' | |
> | |                    | |   | |                   | |                   | |
> | '--------------------' |   | '-------------------' '-------------------' |
> |           |            |   |                                 |           |
> '-----------|------------'   '---------------------------------|-----------'
>             |            .------------------------.            |
>             |            | Replicated config      |            |
>             '----------->| (using DRBD)           |<-----------'
>                          '------------------------'
> Figure 2: After failover
> 
> Since it's probably not only us who'd like to have a fully
> redundant sipXecs, we decided not to do it undercover, but to
> share our ideas/progress with the community. I hope we can
> come up with something usable.
> Chris

Chris, thank you very much for putting this together.  Sorry for the
late reply.  Although your ASCII art didn't come out nice and clean on
my e-mail reader, I got an idea of what you are trying to do.  There are
a couple of fundamental elements I cannot quite piece together.  Here
they are.

#1 - If you have two sites, Site 1 and Site 2 then typically, the IP
subnets for both sites would be different.  With the description you
present, it appears that the Service IP address needs to be routable in
both sites.  How would you manage that?  It seems to me that the scheme
you present would impose that all nodes be part of the same subnet.
That way the service IP address remains routable when 'transitioning'
from one node to the other.  Such an IP address 'transition' would
require that you send out gratuitous ARPs to update everyone's ARP table.

#2 - Even if we overcome problem #1, sipXecs components cannot
reconfigure the IP address they bind to on the fly.  This means that the
whole sipXecs would have to be brought up upon detecting the fail-over.

#3 - What is the trigger for initiating the 'fail-over' process?  This
is kind of tricky actually.  If you have some kind of ping going between
nodes 1 and 3, how do we deal with cases where the network connectivity
between 1 and 3 is broken but both nodes remain operational and
reachable to a subset of devices?  Would that not lead to a situation
where both 1 and 3 think they should be active, leaving two active
primary servers out there?  And during the time when both are active,
if some configuration changes are independently made on both nodes, how
do we reconcile/merge configuration changes when the connectivity
problems between nodes 1 and 3 heal?
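
To illustrate the kind of guard this would need: a plain ping between
nodes 1 and 3 cannot tell "peer is down" from "link is down", so one
commonly used option is a third-party arbiter that hands out a
time-limited lease to exactly one node. The sketch below is purely
hypothetical (the class name, the TTL and the idea of running such a
witness are assumptions, not part of the proposal or of sipXecs); it
only shows the arbitration rule.

```python
class LeaseWitness:
    """Hypothetical third-party arbiter: at most one node holds the lease."""

    def __init__(self, ttl_s=10.0):
        self.ttl_s = ttl_s
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, node, now):
        """Grant or renew the lease; return True if `node` may be active."""
        lease_free = self.holder is None or now >= self.expires_at
        if lease_free or self.holder == node:
            self.holder = node
            self.expires_at = now + self.ttl_s
            return True
        return False  # someone else holds an unexpired lease
```

A node would only take the service IP after acquire() returns True, and
an active node that fails to renew before the lease expires would have
to stop its services itself. That keeps both nodes from being active
while they can each reach the witness but not each other.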

#4 - Sometimes, configuration changes made through sipXconfig affect
multiple configuration files.  If a crash happens in the middle of the
generation of a multi-file configuration change then partial
configuration will end up at the replicated node.  It's hard to predict
how the partial configuration will affect the fail-over system but I
think it's fair to say that the results will not be what the customer
expects.

I am not a clustering expert so maybe there are trivial answers to the
points I'm raising.  I have not studied the techniques employed by
clustering technologies out there but I'd be interested in getting your
perspective on this.

Thanks again,
bob
_______________________________________________
sipx-dev mailing list [email protected]
List Archive: http://list.sipfoundry.org/archive/sipx-dev
Unsubscribe: http://list.sipfoundry.org/mailman/listinfo/sipx-dev
sipXecs IP PBX -- http://www.sipfoundry.org/
