Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-10 Thread Chris Albertson

--- Steve Underwood [EMAIL PROTECTED] wrote:
 WipeOut wrote:
 
  Granted five 9's is never easy but in a cluster of 10+ servers the 
  system should survive just about anything short of an act of God..
 
 You do realise that is a real dumb statement, don't you? :-)
 
 A cluster of 10 machines, each on a different site. Guarantees from the
 power company - checked personally to see that they aren't cheating - that
 you have genuinely independent feeds to these sites. Large UPSs, with
 diesel generator backups. Multiple diverse telecoms links between the

If he says cluster he likely means 10 servers in one rack.  But still
you are right.  It is all the other stuff that could break.  You
will need parallel Ethernet switches (yes, they make these; no, they
are NOT cheap) and you will need some kind of failover.  The switches
can do that for you (do a Google search on "layer 3 switch").

It's the layer 3 switches that make the 9's possible, but half or
more of your hardware will be just hot spares, so it really will
take a rack full of boxes.

Each box should have mirrored drives and dual power supplies, and each
AC power cord needs to go to its own UPS.

Has anyone tried to build Asterisk on SPARC/Solaris?  One SPARC
server is almost five nines all by itself, as it can do things
like boot around a failed CPU, RAM or disk.  I've actually
pulled a disk drive out of a running Sun SPARC and applications
continued to run. 



=
Chris Albertson
  Home:   310-376-1029  [EMAIL PROTECTED]
  Cell:   310-990-7550
  Office: 310-336-5189  [EMAIL PROTECTED]
  KG6OMK



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-10 Thread Steve Underwood
Hi,

I don't want to drag this into a long thread, but note the original says 
"the system should survive just about anything short of an act of God", 
and suddenly you are talking about a reliable server and a few switches. 
These are quite different things. I have yet to see a 5 x 9's server 
room. Fire, mechanical damage and other factors will normally keep the 
location itself well below 5 x 9's. Think system instead of server 
equipment, and the picture looks very different. Even for a single PC 
type server, downtime due to telecoms lines, power problems, fire, 
flood, typhoon damage, theft and a mass of other stuff might well exceed 
the server unavailability itself. I've seen many servers not fail in 5 
years. I have yet to see the best location go that long without causing 
at least one substantial period of downtime. 5 x 9's allows about 5 
minutes of downtime a year. That means 100% of all failures must have 
automated failover, as manual repair could never be achieved that fast. 
Physical diversity is essential for that.

Regards,
Steve
Chris Albertson wrote:

snip


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-09 Thread Steve Underwood
WipeOut wrote:

Granted five 9's is never easy but in a cluster of 10+ servers the 
system should survive just about anything short of an act of God..
You do realise that is a real dumb statement, don't you? :-)

A cluster of 10 machines, each on a different site. Guarantees from the 
power company - checked personally to see that they aren't cheating - that 
you have genuinely independent feeds to these sites. Large UPSs, with 
diesel generator backups. Multiple diverse telecoms links between the 
sites, personally checked multiple times to see there is genuine 
diversity (it's a waste of time asking a telco for guarantees of this 
kind, as they lie by habit). This *might* start to approach 5 9's. Just 
having 10 servers means *very* little.

Regards,
Steve


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-09 Thread Steven Critchfield
On Fri, 2004-01-09 at 21:36, Steve Underwood wrote:
 snip

Maybe it's the fact that the main clusters I know of are in university
settings, meant to increase compute power, but cluster tends to have the
connotation of being in one location. In the case of a single location,
the extra machines do mean higher odds of losing parts due to mean
time between failures. A friend of mine commented that maintenance on one
of the top 500 supercomputer clusters meant keeping a box of spare
memory and drives on hand. He mentioned that they lost a certain number of
memory modules a day. That freaked me out, as the only times I had
experienced memory failure were due to mishandling, not the normal course of
computer operation. 

The setup you mention above isn't what I would normally associate with
clustering. It also is unlikely to make a difference for a single office
location keeping their system available.
-- 
Steven Critchfield [EMAIL PROTECTED]



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Nicolas Bougues
On Sun, Jan 04, 2004 at 07:38:16PM +, WipeOut wrote:

 Also a failover system would typically only be 2 servers, if there were 
 a cluster system there could be 10 servers in which case five 9's should 
 be easy..
 

Err, no. Five 9s is *never* easy.

Does your telco provide you with SLAs that make five 9s reasonable at
all?

Do you really need five 9s? There is no such thing I'm aware of in
enterprise grade telephony. You have to go to carrier grade
equipment, which Asterisk, and PCs in general, are definitely not aimed
at.

-- 
Nicolas Bougues
Axialys Interactive


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread WipeOut
Nicolas Bougues wrote:

snip

Granted five 9's is never easy but in a cluster of 10+ servers the 
system should survive just about anything short of an act of God..

Maybe, as mentioned earlier, a more realistic goal for Asterisk is three 
or four 9's.. Three 9's could probably be achieved already on a single 
server with RAID and hot swap power, so four 9's is probably a good 
target to go for..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Doug Shubert

 Does your telco provide you with SLAs that make five 9s reasonable at
 all ?


LOL... Our telco services could be down for several hours at a time.

We found that most US broadband carriers (DSL and cable) offer a
best-effort, zero-SLA service. If you are using broadband as a primary
transport, expect the failure points to be upstream more than in house.

 Do you really need five 9s ? There is no such thing I'm aware of in
 enterprise grade telephony.

Cisco has a white paper IP Telephony: The Five Nines Story
http://www.cisco.com/warp/public/cc/so/neso/vvda/iptl/5nine_wp.htm

My take on the nines is that Telcordia SR-323 / Bellcore MIL-HDBK-217
attempted to predict the reliability of individual electronic components,
and marketing departments have used the predictions as sales tools to best
an opponent's product.


 You have to go to carrier grade
 equipment, which asterisk, and PCs in general, are definetly not aimed
 at.


Most carrier and even enterprise phone equipment uses a blade design.
PCs can be configured in a hot-swap blade design.

Doug





Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Rich Adamson
  Using another load-balancing box (F5 or whatever) only moves the problem
  to that box. Duplicating it, moves the problem to another box, until
  the costs exponentially grow beyond the initial intended value of the
  solution. The weak points become lots of other boxes and infrastructure, 
  suggesting that asterisk really isn't the weakest point (regardless of 
  what its built on).
 
 Rich is hitting the main point in designing anything for high
 reliability. So let's enumerate failures and then see what, if anything, can
 be done to eliminate them.
 
 1. Line failures.
snip
 2. Hardware failure. 
snip
 3. Software failure.
 This could be any number of bugs not yet found or that will be
 introduced later.
snip
 4. Phones.

The primary points the questions were attempting to uncover are more
related to basic layer-2 and layer-3 issues (of all necessary components
in an end-to-end telephony implementation), and not just basic hardware
configurations.

Having spent a fair number of years working with corporations that have
attempted to build high-availability solutions, the typical engineering
approach is almost always oriented towards throwing more hardware at the
problem and not thinking about the basic layer-2/3/4 issues. (I don't have
an answer that I'm sponsoring either, just looking for comments from
those that intimately know the end-to-end impact of doing things like
hot-sparing or clustering.) I'm sure it's fairly clear to most that
adding redundant supplies, UPS, RAID, etc., will improve the uptime of the
* box. However, once past throwing hardware at the server, where are
the pitfalls associated with hot-sparing or clustering * servers?

Several well-known companies have attempted products that swap MAC
addresses between machines (layer-2), hide servers behind a virtual
IP (layer-3), hide a cluster behind some form of load balancing hardware
(generally layer-2 & 3), etc. Most of those solutions end up creating yet 
another problem that was not considered in the original thought process. 
I.e., not well thought out. (Even Cisco, with a building full of engineers,
didn't initially consider the impact of flip-flopping between boxes
when HSRP was first implemented. And there still are issues with that
approach that many companies have witnessed first hand.)

Load balancers have some added value, but those that have had to deal
with a problem where a single system within the cluster is up but not
processing data would probably argue their actual value.

So, if one were to attempt either hot-sparing or clustering, are there
issues associated with sip, rtp, iax, nat and/or other asterisk protocols 
that would impact the high-availability design?

One issue that would _seem_ to be a problem is those installations that 
have to use canreinvite=no (meaning, even in a clustered environment, 
those rtp sessions are going to be dropped with a server failure. Maybe
it's okay to simply note the exceptions in a proposed high-availability
design.)

If any proposed design actually involved a different MAC address,
obviously all local sip phones would die, since the arp cache timeout 
within the phones would preclude a failover. (Not cool.)

IBM (with their stack of AIX machines) and Tandem (with their NonStop
architecture) didn't throw clustered database servers at the problem.
Both had them, but not as a means of increasing the availability of the 
base systems.

Technology now supports 100 meg layer-2 pipes throughout a city at a
reasonable cost. If a cluster were split across multiple buildings within 
a city, it certainly would be of interest to those that are responsible 
for business continuity planning. Are there limitations?

Someone mentioned the only data needed to be shared between clustered
systems was phone Registration info (and then quickly jumped to engineering
a solution for that). Is that the only data needed or might someone
need a ton of other stuff? (Is cdr, iax, dialplans, agi, vm, and/or
other dynamic data an issue that needs to be considered in a reasonable
high-availability design?)

Whether the objective is 2, 3, 4, or 5 nines is somewhat irrelevant. If
one had to stand in front of the President or Board and represent/sell
availability, they are going to assume end-to-end and not just the
server. Later, they are not going to talk kindly about the phone
system when your single F5 box died; or, (not all that unusual) you
say "asterisk was up the entire time, it's your stupid phones that couldn't 
find it!!" (Or, you lost five hours of cdr data because of why???)

I'd have to guess there are probably hundreds on this list that can 
engineer raid drives, UPSs for ethernet closet switches, protected
cat 5 cabling, and switch boxes that can move physical interfaces between
servers. But I'd also guess there are far fewer that can identify many 
of the sip, rtp, iax, nat, cdr, etc., issues. What are some of those
issues? (Maybe there aren't any?)

Rich



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-05 Thread Martin Bene
Hi Richard,

Load balancers have some added value, but those that have had to deal
with a problem where a single system within the cluster is up but not
processing data would probably argue their actual value.

I've done quite a lot of work with clustered/HA linux configurations. I
usually try to keep additional boxes/hardware to an absolute minimum;
otherwise the newly introduced points of (hardware) failure tend to make the
whole exercise pointless. A solution I found to work quite well:

Run a software load balancer (using LVS) as an HA service (ldirectord) on two
of the servers. This allows use of quite specific probes for the real servers
being balanced, so a server not correctly processing requests can be removed
from the list of active servers quite reliably. Since the director script is
Perl, adding probes for protocols not supported in the default install is
fairly straightforward.

If any proposed design actually involved a different MAC address,
obviously all local sip phones would die since the arp cache timeout 
within the phones would preclude a failover. (Not cool.)

ARP cache timeouts usually don't come into this: when moving a cluster IP
address to a different NIC (probably on a different machine) you can broadcast
gratuitous ARP packets on the affected ethernet segment; this updates the ARP
caches of all connected devices and allows failover far faster than the ARP
cache timeout. Notable exception: some firewalls can be quite paranoid wrt.
ARP updates and will NOT accept gratuitous ARP packets. I've run into this
with a cluster installation at one of my customers.
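For what it's worth, a minimal sketch of that gratuitous ARP announcement,
assuming the scapy package; the interface, IP and MAC below are placeholders,
and shell utilities such as arping can do the same job:

# Broadcast a gratuitous ARP for the cluster IP after a takeover (sketch;
# "eth0", the IP and the MAC are placeholders).
from scapy.all import ARP, Ether, sendp

def announce_takeover(vip, new_mac, iface="eth0", repeat=3):
    # op=2 ("is-at") with psrc == pdst is the classic gratuitous ARP reply;
    # neighbours on the segment update their caches to map vip -> new_mac.
    pkt = Ether(dst="ff:ff:ff:ff:ff:ff", src=new_mac) / ARP(
        op=2, hwsrc=new_mac, psrc=vip,
        hwdst="ff:ff:ff:ff:ff:ff", pdst=vip)
    sendp(pkt, iface=iface, count=repeat, verbose=False)

announce_takeover("192.168.1.50", "00:11:22:33:44:55")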

Technology now supports 100 meg layer-2 pipes throughout a city at a
reasonable cost. If a cluster were split across mutiple 
buildings within a city, it certainly would be of interest to those 
that are responsible for business continuity planning. Are there
limitations?

I'm wary of split cluster configurations because often the need for multiple,
independent communication paths between cluster nodes gets overlooked or
ignored in these configurations, greatly increasing the risk of split-brain
situations, i.e. several nodes in the cluster each thinking they're the only
online server and trying to take over services. This easily/usually leads to
a real mess (data corruption) that can be costly to clean up. When keeping
your nodes in physical proximity it's much easier to have, say, 2 network
links + one serial link between cluster nodes, thus providing a very resilient
fabric for inter-cluster communications.
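To make the multiple-paths point concrete, a toy sketch of the rule - only
treat the peer as dead when it is silent on every independent link. The
addresses are placeholders, it assumes the peer echoes heartbeat datagrams
back, and a real deployment would use proper cluster heartbeat tooling plus
the serial link mentioned above:

# Toy split-brain guard: take over only when the peer is unreachable on ALL
# of the independent links.
import socket
import time

PEER_LINKS = [("10.0.0.2", 9999), ("192.168.100.2", 9999)]  # one per NIC
TIMEOUT = 5.0   # seconds of silence before a single link counts as down

def link_alive(addr):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(TIMEOUT)
    try:
        s.sendto(b"ping", addr)
        s.recvfrom(16)                  # peer echoes the heartbeat back
        return True
    except (socket.timeout, OSError):
        return False
    finally:
        s.close()

def peer_is_dead():
    return not any(link_alive(a) for a in PEER_LINKS)

while True:
    if peer_is_dead():
        print("peer unreachable on every link - safe(r) to take over")
        break
    time.sleep(1)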

Someone mentioned the only data needed to be shared between clustered
systems was phone Registration info (and then quickly jumped 
to engineering a solution for that). Is that the only data needed or 
might someone need a ton of other stuff? (Is cdr, iax, dialplans, agi, 
vm, and/or other dynamic data an issue that needs to be considered in 
a reasonable high-availability design?)

Depends on what you want/need to fail over when your asterisk box goes
down. In stages that'd be:
1 (cluster) IP address for sip/h323 etc. services
2 voice mail, recordings, activity logs
3 registrations for connected VoIP clients
4 active calls (VoIP + PSTN)

For the moment, item 4 definitely isn't feasible; even if we get some
hardware to switch over E1/T1/PRI whatever interfaces, card or interface
initialisation will kill active calls. 

Item 2 would be plain on-disk file data; for an active/standby cluster
replicating these should be pretty straightforward using either shared
storage or an appropriate filesystem/blockdevice replication system. I've
personally had good experience with drbd (block device replication over the
network; it only supports 2 nodes in active/standby configuration but works
quite well for that.)

Item 3 should also be feasible; this information is already persistent over
asterisk restarts and seems to be just a Berkeley DB file for a default
install. Same method as for item 2 should work.
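A very rough sketch of that file-level replication for items 2 and 3; the
host, paths and interval here are assumptions, and shared storage or drbd,
as mentioned above, would be the sturdier route:

# Push voicemail spools and the Asterisk Berkeley DB to a standby node every
# 30 seconds (sketch only; host, paths and interval are assumptions).
import subprocess
import time

STANDBY = "standby.example.com"
PATHS = [
    "/var/spool/asterisk/voicemail/",   # item 2: voice mail, recordings
    "/var/lib/asterisk/astdb",          # item 3: registration database
]

def push_once():
    for path in PATHS:
        # -a preserves ownership/permissions/times; --delete mirrors removals.
        subprocess.run(["rsync", "-a", "--delete", path,
                        "%s:%s" % (STANDBY, path)], check=False)

while True:
    push_once()
    time.sleep(30)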

I'd have to guess there are probably hundreds on this list that can 
engineer raid drives, ups's for ethernet closet switches, protected
cat 5 cabling, and switch boxes that can move physical 
interfaces between servers. But, I'd also guess there are far fewer 
that can identify many of the sip, rtp, iax, nat, cdr, etc, etc, 
issues. What are some of those issues? (Maybe there aren't any?)

Since I'm still very much an asterisk beginner I'll have to pass on this
one; however, I'm definitely going to do some experiments on my test cluster
systems with asterisk to just see what breaks when failing over asterisk
services.

Also, things get MUCH more interesting when you start to move from plain
active/standby to active/active configurations: here, on failover, you'll
end up with the registration and file data from the failed server and need to
integrate that into an already running server, merging the separate sets of
information - preferably without trashing the running server :-)

Bye, Martin

Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Doug Shubert
I would set the Enterprise Class bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.
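For reference, the 5.25-minute figure falls straight out of the availability
arithmetic:

# Downtime budget implied by 99.999% availability.
minutes_per_year = 365.25 * 24 * 60               # ~525,960 minutes
print(round((1 - 0.99999) * minutes_per_year, 2)) # ~5.26 minutes per year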

In our network, Linux is approaching
Enterprise Class and I don't see why *
could not achieve this in the near future.


Steven Critchfield wrote:

 On Sun, 2004-01-04 at 04:35, EDWARD WILSON wrote:
  Does anyone know what the hardware requirements would be to build an
  Enterprise Asterisk Universal Gateway ?  I am thinking of something
  comparable to the Cisco AS5xxx Series of gateways.

 Just to prepare you, if you ask the above question, you are not ready to
 ask the above question.

 Basically it falls down to the problem of what is needed to be done, and
 more so what is considered enterprise level hardware to be run upon.
 --
 Steven Critchfield [EMAIL PROTECTED]



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Andrew Kohlsmith
 I would set the Enterprise Class bar at five 9's reliability
 (about 5.25 minutes per year of down time) the same
 as a Class 4/5 phone switch. This would require redundant
 design considerations in both hardware and software.

My Norstar Meridian system has nowhere near this.  We get about 5 minutes 
of downtime every month (usually trunk card issues).

Not arguing against anything you've said, just adding a data point.

Regards,
Andrew


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Doug Shubert wrote:

I would set the Enterprise Class bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.
In our network, Linux is approaching
Enterprise Class and I don't see why *
could not achieve this in the near future.
 

Asterisk would need some kind of clustering/load balancing ability 
(a single IP system image for the IP phones across multiple servers) to be 
truly Enterprise Class in terms of both reliability and 
scalability..  Obviously that would not be as relevant for the analog 
hard wired phones unless the channel banks and T1/E1 lines could be 
automatically switched to another server..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Olle E. Johansson
Andrew Kohlsmith wrote:
I would set the Enterprise Class bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.

To turn around, let's discuss what we need to focus on to get
Asterisk there:
Here's a few bullet points, there's certainly a lot more
* Linux platform stability - how?
** Special demands when using Zaptel cards
* Redundancy architecture
* Development/stable release scheme
Then we have some channel demands, like
* Better support for SRV records in the SIP channel
More?

/O
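On the SRV-records bullet above, a small sketch of what SRV-based proxy
selection looks like from the client side, assuming the dnspython package and
a made-up domain; lower priority is tried first, and weight splits load among
records of equal priority:

# Order SIP SRV targets for failover (sketch; domain is made up).
import dns.resolver

def sip_targets(domain):
    answers = dns.resolver.resolve("_sip._udp." + domain, "SRV")
    ordered = sorted(answers, key=lambda r: (r.priority, -r.weight))
    return [(str(r.target).rstrip("."), r.port) for r in ordered]

print(sip_targets("example.com"))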



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
 I would set the Enterprise Class bar at five 9's reliability
 (about 5.25 minutes per year of down time) the same
 as a Class 4/5 phone switch. This would require redundant
 design considerations in both hardware and software.
 
 In our network, Linux is approaching
 Enterprise Class and I don't see why *
 could not achieve this in the near future.

Linux might approach that, but * as an application won't in its present
design, for lots of reasons that have already been discussed. I'd be
reasonably certain (you're right) it will head that direction; it just 
happens to not be there today. On the surface, I've not heard of
anyone that is actually addressing it either.




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Steven Critchfield
On Sun, 2004-01-04 at 10:14, Doug Shubert wrote:
 I would set the Enterprise Class bar at five 9's reliability
 (about 5.25 minutes per year of down time) the same
 as a Class 4/5 phone switch. This would require redundant
 design considerations in both hardware and software.
 
 In our network, Linux is approaching
 Enterprise Class and I don't see why *
 could not achieve this in the near future.

I may be wrong, but I think the 5 9's relates to the full system, not to
individual pieces, especially when talking about a class 4/5 switch. On a
small scale deployment, that will be a problem as you won't implement
full redundancy. Redundancy adds quite a bit to the cost of your
deployment. 

As far as linux goes, it is at that level if you put forth the effort to
make its environment decent. I have multiple machines approaching 2
years of uptime, and many over a year of uptime. I have not had a
machine in my colo space go down since we removed the one machine with a
buggy NIC.

So the next step is asterisk. Outside of a couple of deadlocks from kernel
problems when I was compiling new modules, I haven't had asterisk fall
over while doing normal calls.

The downtime could have been dealt with by having some redundancy in the
physical lines. I would have lost the calls on the line, but the calls
could be reconnected immediately. 

I can say up front that I have asterisk installs running multiple months
without problems. 
-- 
Steven Critchfield [EMAIL PROTECTED]



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
 Andrew Kohlsmith wrote:
 I would set the Enterprise Class bar at five 9's reliability
 (about 5.25 minutes per year of down time) the same
 as a Class 4/5 phone switch. This would require redundant
 design considerations in both hardware and software.
  
 
 To turn around, let's discuss what we need to focus on to get
 Asterisk there:
 
 Here's a few bullet points, there's certainly a lot more
 * Linux platform stability - how?
 ** Special demands when using Zaptel cards
 * Redundancy architecture
 * Development/stable release scheme
 
 Then we have some channel demands, like
 * Better support for SRV records in the SIP channel
 
 More?

Better sip phone support for primary/secondary proxy (and failover)
 (note: some phones don't support a second proxy at all; some say they
  do, but fail at it.)

Maybe some sort of HSRP (Hot Standby Router Protocol, or whatever)

Some form of dynamic config sharing between pri/sec systems

Won't mention external pstn line failover as that's sort of a separate
  topic, or loss of calls in flight, etc.

I'd guess part of the five-9's discussion centers around how automated
must one be to be able to actually get close?  If one assumes the loss
of a SIMM, the answer/effort certainly is different than assuming the 
loss of a single interface card (when multiples exist), etc.

I would doubt that anyone reading this list actually has a justifiable
business requirement for five-9's given the exponential cost/effort
involved to get there. But setting some sort of reasonable goal
that would focus on failover within xx number of seconds (and
maybe some other conditions) seems very practical. 





Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Steven Critchfield wrote:

snip

Steven,

You often mention your servers uptime, I am assuming you don't count 
reboots since you must have had to patch your kernel at least a few 
times in the last year and the reboot would have reset your uptime..

If that is the case then I have a server that is also around the 2 year 
uptime mark.. The longest single runtime between reboots for updated 
kernels is only 127 days.. :)

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread WipeOut
Rich Adamson wrote:

snip

A failover system does not solve the scalability issue.. which means 
that you have a full server sitting there doing nothing most of the time, 
when if the load were being balanced across the servers in a cluster 
scenario you would also have the scalability..

Also a failover system would typically only be 2 servers, if there were 
a cluster system there could be 10 servers in which case five 9's should 
be easy..

Later..



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread James Sharp
 Andrew Kohlsmith wrote:
I would set the Enterprise Class bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.


 To turn around, let's discuss what we need to focus on to get
 Asterisk there:

 Here's a few bullet points, there's certainly a lot more
 * Linux platform stability - how?

 Even more than Linux itself is the x86 platform... I've thought about this
 a bit when considering * boxes for big customers.  When one actually comes
 along, I'll have to actually make a decision :-).
From where I stand, the best thing to do for smaller customers is give
 them a box with RAID and redundant power supplies, if they can afford it.

You can overcome most of those problems by buying good quality hardware. 
If you buy your * server from your local Taiwanese clone shop, you're
asking for trouble.  A big, beefy machine from Dell would be better.

 But if I were to have a big customer with deep pockets, I'd really like *
 on a big Sun beast with redundant-everything (i.e. you can hot swap any
 component and there's usually n+1 of everything).  The problem is that I
 don't think there's any Solaris support for Digium cards, since it's kind
 of  a chicken-and-egg problem.

Nope.  No Solaris support, but you might be able to get away with
Linux/Solaris...but then you lose a lot of the hot-swapability.  In my
experience, though, the only things I've ever been able to hotswap were
power supplies and hard drives...and that's not software/OS dependent.

 One of these days, I may convince myself to buy a modern Sun box (maybe
 the ~$1000 Blade 100s) and see what can be done.  The only problem I could
 conceive would be endian-ness, but I read about Digium cards in a PowerPC
 box, so that won't be a problem, right?
 Nick

Endian-ness is really only a driver issue.  It's when programmers
believe that the world revolves around Linux/i386 that you have problems.

Personally, I'd stick my Digium cards into an Alpha of some sort.  A
DS-10L for 1U mounting with 1 card, or a DS-20 for multiple cards where you
need lots of processor zoobs.


Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Steven Critchfield
On Sun, 2004-01-04 at 13:28, WipeOut wrote:
 snip
 Steven,
 
 You often mention your servers uptime, I am assuming you don't count 
 reboots since you must have had to patch your kernel at least a few 
 times in the last year and the reboot would have reset your uptime..

Why do you assume I would have to patch a kernel? Not all machines must
run the most current kernels, and some kernels can be sufficiently
minimal to present low risk. Plus all the recent
problems require a local user to exploit. I subscribe to the theory of
only giving access to critical machines to people I can quickly level a
shotgun at. With that knowledge, and my users' acknowledgment of
or witness to my accuracy, they don't wish to screw with the systems. 

BTW, my accuracy goes up with the number of concurrent targets by about
4 percent. 

 If that is the case then I have a server that is also around the 2 year 
 uptime mark.. The longest single runtime between reboots for updated 
 kernels is only 127 days.. :)

I have 2 machines at this moment that are halfway to looping the uptime
counter again at 497 days.

Webserver is at 497 + 197 days
Old almost decommissioned file server is at 497 + 194 days
A VPN machine is at 414 days
DB server is at 245 days
An almost decommissioned distro server is at 497 + 165 days
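(For what it's worth, the 497-day wrap is presumably just the 32-bit jiffies
counter at HZ=100 rolling over:

# Where the ~497-day uptime wrap comes from on 2.4-era kernels.
print(2 ** 32 / 100 / 86400)   # ~497.1 days
)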


Due to some upgrades, I now have fewer machines holding high uptimes. My
mail server was updated just over 2 months ago and it was swapped to the
distro server. So the distro server that is about to be decommissioned
is really just waiting for me to go take it out of the rack. 

Those are real uptimes with no reboots. What makes those 4 machines with
more than a year of uptime interesting is that one is a Dell, one is a
Supermicro, and the other 2 are homebuilt systems. So I can attest to x86
being able to be stable. Maybe not always, and I would like some more
swappable parts.
-- 
Steven Critchfield [EMAIL PROTECTED]



Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Nick Bachmann
 Andrew Kohlsmith wrote:
I would set the Enterprise Class bar at five 9's reliability
(about 5.25 minutes per year of down time) the same
as a Class 4/5 phone switch. This would require redundant
design considerations in both hardware and software.


 To turn around, let's discuss what we need to focus on to get
 Asterisk there:

 Here's a few bullet points, there's certainly a lot more
 * Linux platform stability - how?

 Even more than Linux itself is the x86 platform... I've thought about
 this a bit when considering * boxes for big customers.  When one
 actually comes along, I'll have to actually make a decision :-).
From where I stand, the best thing to do for smaller customers is give
 them a box with RAID and redundant power supplies, if they can afford
 it.

 You can overcome most of those problems by buying good quality
 hardware.  If you buy your * server from your local Taiwanese clone
 shop, you're asking for trouble.  A big, beefy machine from Dell would
 be better.

Yeah, but nothing like a nice, big Sun machine.  A cluster of Dell
machines is reliable, but a midrange Sun box puts them to shame.
 But if I were to have a big customer with deep pockets, I'd really
 like * on a big Sun beast with redundant-everything (i.e. you can hot
 swap any component and there's usually n+1 of everything).  The
 problem is that I don't think there's any Solaris support for Digium
 cards, since it's kind of  a chicken-and-egg problem.

 Nope.  No Solaris support, but you might be able to get away with
 Linux/Solaris...but then you lose a lot of the hot-swapability.  In my
 experience, though, the only things I've ever been able to hotswap were
 power supplies and hard drives...and thats not software/os dependant.

With the big boxes like the 4800, you can hot swap CPUs and memory and
such as well.  You're right that all that stuff is pretty
Solaris-dependent, which is why I wanted to see if I couldn't get Asterisk
to run on a little Solaris machine (and then sell it to people who own the
big ones).
 One of these days, I may convince myself to buy a modern Sun box
 (maybe the ~$1000 Blade 100s) and see what can be done.  The only
 problem I could conceive would be endian-ness, but I read about Digium
 cards in a PowerPC box, so that won't be a problem, right?
 Nick

 Endian-ness is really only a driver issue.  Its when programmers who
 believe that the world revolves around Linux/i386 that you have
 problems.

But it can also be a problem if you have on-card firmware, I've heard.

 Personally, I'd stick my Digium cards into an Alpha of some sort.  A
 DS-10L for 1U mounting with 1 card or a DS-20 for multiple cards where
 you need lots of processor zoobs.

I like the Alphas too, but they're being discontinued last I heard, and
being replaced with the Itanium.  Even VMS is being ported (now _there's_
an OS for * :-)
Nick




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson

snip

Everyone's responses to Olle's proposition are of value, including yours.

For those that have been involved with analyzing the requirements to
achieve five-9's (for anything), there are tons of approaches, and each 
approach comes with some sort of cost/benefit trade-off. Once the approaches
have been documented and costs associated with them, it's common for
the original requirements to be redefined in terms of something that is
more realistic in business terms. Whether that is clustering, hot standby,
or another approach is largely irrelevant at the beginning of the process.

If you're a sponsor of clustering and you're forced to use canreinvite=no, 
lots of people would be unhappy when their RTP system died. I'm not
suggesting clustering is a bad choice, only suggesting there are lots
of cost/benefit trade-offs that are made on an individual basis and there
might be more than one answer to the reliability/uptime question.

In an earlier post, you mentioned a single IP address issue. That's really
not an issue in some cases, as a virtual IP (within a cluster) may be
perfectly fine (canreinvite=yes), etc. A pure guess is that use of a virtual
IP forces some other design choices, like the need for a layer-3 box
(since virtual IPs won't fix layer-2 problems), and probably revisiting
RTP standards. (And, if we only have one layer-3 box, guess we need to get
another for uptime, etc, etc.)

Since hardware has become increasingly more reliable, infrastructure items
less expensive, uptimes moving towards larger numbers, and software more
reliable (in very general terms over years), using a hot spare approach
could be just as effective as a two-box cluster. In both cases, part of
the problem boils down to assumptions about external interfaces and how
to move those interfaces between two or more boxes, and what design
requirements one states regarding calls in progress.

(Olle, are you watching?)

1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is 
mostly trivial; however, what signal is needed to detect a system failure 
and move the physical connection to a second machine/interface? (If there 
are three systems in a cluster, what signal is needed? If a three-way 
switch is required, does someone want to design, build, and sell it to 
users? Any need to discuss a four-way switch? Should there be a single
switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)

Since protecting calls in progress (under all circumstances and 
configurations) is likely the most expensive and most difficult to achieve,
we can probably all agree that handling this should be left to some
future long-range plan. Is that acceptable to everyone?

2. In a hot-spare arrangement (single primary, single running secondary),
what static and/or dynamic information needs to be shared across the
two systems to maintain the best chance of switching to the secondary
system in the shortest period of time, and while minimizing the loss of
business data? (Should this same data be shared across all systems in
a cluster if the cluster consists of two or more machines?)

3. If a clustered environment, is clustering based on IP address or MAC
address?
   a. If based on an IP address, is a layer-3 box required between * and
  sip phones? (If so, how many?)
   b. If based on MAC address, what process moves an active * MAC address
  to another * machine (to maintain connectivity to sip phones)?
   c. Should sessions that rely on a failed machine in a cluster simply
  be dropped?
   d. Are there any realistic ways to recover RTP sessions in a clustered
  environment when a single machine within the cluster fails, and RTP
  sessions were flowing through it (canreinvite=no)?
   e. Should a sip phone's arp cache timeout be configurable?
   f. Which system(s) control the physical switch in #1 above?
   g. Is sharing static/dynamic operational data across some sort of
  high-availability hsrp channel acceptable, or should two or more
  database servers be deployed?

Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread James Sharp
 1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
 mostly trivial; however, what signal is needed to detect a system failure
 and move the physical connection to a second machine/interface? (If there
 are three systems in a cluster, what signal is needed? If a three-way
 switch is required, does someone want to design, build, and sell it to
 users? Any need to discuss a four-way switch? Should there be a single
 switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)

Simple idea:  Have a process on each machine pulse a lead state (something
as simple as DTR out a serial port or a single data line on a parallel
port) out to an external box.  This box is strictly discrete hardware and
built with a timeout that is retriggered by the pulse.  When the pulse fails
to arrive, the box switches the T1 over to the backup system.
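The pulsing side of that could be as small as this sketch, assuming the
pyserial package and that /dev/ttyS0 feeds the external timeout box; the box
itself (the retriggerable timer that flips the T1) is discrete hardware and
not shown:

# Heartbeat pulser for the external watchdog box (sketch; port is assumed).
import time
import serial

port = serial.Serial("/dev/ttyS0")
try:
    while True:
        port.dtr = True     # raise the lead...
        time.sleep(0.5)
        port.dtr = False    # ...and drop it again; each pulse retriggers
        time.sleep(0.5)     # the external timer
finally:
    port.close()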


 Since protecting calls in progress (under all circumstances and
 configurations) is likely the most expensive and most difficult to achive,
 we can probably all agree that handling this should be left to some
 future long-range plan. Is that acceptable to everyone?

It's going to be almost impossible to preserve calls in progress.  If you
switch a T1 from one machine to the other, there's going to be a lack of
sync (ISDN D-channels need to come up, RBS channels need to wink) that's
going to result in the loss of the call.

 2. In a hot-spare arrangement (single primary, single running secondary),
 what static and/or dynamic information needs to be shared across the
 two systems to maintain the best chance of switching to the secondary
 system in the shortest period of time, and while minimizing the loss of
 business data? (Should this same data be shared across all systems in
 a cluster if the cluster consists of two or more machines?)

 3. If a clustered environment, is clustering based on IP address or MAC
 address?
a. If based on an IP address, is a layer-3 box required between * and
   sip phones? (If so, how many?)

Yes.  You'll need something like Linux Virtual Server or an F5 load
balancing box to make this happen.  You can play silly games with round
robin DNS, but it doesn't handle failure well.

b. If based on MAC address, what process moves an active * MAC address
   to a another * machine (to maintain connectivity to sip phones)?

Something like Ultra Monkey (http://www.ultramonkey.org)

c. Should sessions that rely on a failed machine in a cluster simply
   be dropped?
d. Are there any realistic ways to recover RTP sessions in a clustered
   environment when a single machine within the cluster fails, and RTP
   sessions were flowing through it (canreinvite=no)?
e. Should a sip phone's arp cache timeout be configurable?

Shouldn't need to worry about that unless the phone is on the same
physical network segment.

f. Which system(s) control the physical switch in #1 above?

A voting system...all systems control it.  It is up to the switch to
decide who isn't working right.

g. Is sharing static/dynamic operational data across some sort of
   high-availability hsrp channel acceptable, or, should two or more
   database servers be deployed?

DB Server clustering is a fairly solid technology these days.  Deploy a DB
cluster if you want.

 4. If a firewall/nat box is involved, what are the requirements to detect
and handle a failed * machine?
a. Are the requirements different for hot-spare vs clustering?
b. What if the firewall is an inexpensive device (eg, Linksys) with
   minimal configuration options?
c. Are the nat requirements within * different for clustering?

 5. Should sip phones be configurable with a primary and secondary proxy?
a. If the primary proxy fails, what determines when a sip phone fails
   over to the secondary proxy?

Usually a simple timeout works for this..but if your clustering/hot-spare
switch works right...the client should never need to change.


b. After fail over to the secondary, what determines when the sip phone
   should switch back to the primary proxy? (Is the primary ready to
   handle production calls, or is it back ready for a system admin to
   diagnose the original problem in a non-production manner?)

Auto switch-back is never a good thing.  Once a system is taken out of
service by an automated monitoring system, it should be up to human
intervention to say that it is ready to go back into service.




Re: [Asterisk-Users] Hardware to build an Enterprise Asterisk Universal Gateway

2004-01-04 Thread Rich Adamson
The comments below are certainly not intended as any form of negativism,
but rather to pursue thought processes for redundant systems.

  1. Moving a physical interface (whether a T1, ethernet or 2-wire pstn) is
  mostly trivial, however what signal is needed to detect a system failure
  and move the physical connection to a second machine/interface? (If there
  are three systems in a cluster, what signal is needed? If a three-way
  switch is required, does someone want to design, build, and sell it to
  users? Any need to discuss a four-way switch? Should there be a single
  switch that flip-flops all three at the same time (T1, Ethernet, pstn)?)
 
 Simple idea:  Have a process on each machine pulse a lead state (something
 as simple as DTR out a serial port or a single data line on a parallel
 port) out to an external box.  This box is strictly discrete hardware and
 built with a timeout that is retriggered by the pulse.  When the pulse fails
 to arrive, the box switches the T1 over to the backup system.

And upon partial restoration of the failed system, should it automatically
fall back to the primary? Or, might there be some element of human 
control that would suggest not falling back until told to do so?

  Since protecting calls in progress (under all circumstances and
  configurations) is likely the most expensive and most difficult to achieve,
  we can probably all agree that handling this should be left to some
  future long-range plan. Is that acceptable to everyone?
 
 It's going to be almost impossible to preserve calls in progress.  If you
 switch a T1 from one machine to the other, there's going to be a lack of
 sync (ISDN D-channels need to come up, RBS channels need to wink) that's
 going to result in the loss of the call.

What about calls in progress between two sip phones (and cdr records)?
 
  2. In a hot-spare arrangement (single primary, single running secondary),
  what static and/or dynamic information needs to be shared across the
  two systems to maintain the best chance of switching to the secondary
  system in the shortest period of time, and while minimizing the loss of
  business data? (Should this same data be shared across all systems in
  a cluster if the cluster consists of two or more machines?)
 
  3. If a clustered environment, is clustering based on IP address or MAC
  address?
 a. If based on an IP address, is a layer-3 box required between * and
sip phones? (If so, how many?)
 
 Yes.  You'll need something like Linux Virtual Server or an F5 load
 balancing box to make this happen.  You can play silly games with round
 robin DNS, but it doesn't handle failure well.

Agreed, but then one would need two F5 boxes as it would become the new
single point of failure.
 
 b. If based on MAC address, what process moves an active * MAC address
to a another * machine (to maintain connectivity to sip phones)?
 
 Something like Ultra Monkey (http://www.ultramonkey.org)
 
 c. Should sessions that rely on a failed machine in a cluster simply
be dropped?
 d. Are there any realistic ways to recover RTP sessions in a clustered
environment when a single machine within the cluster fails, and RTP
sessions were flowing through it (canreinvite=no)?
 e. Should a sip phone's arp cache timeout be configurable?
 
 Shouldn't need to worry about that unless the phone is on the same
 physical network segment.

Which in most cases where asterisk is deployed (obviously not all) is 
probably the case.
 
 f. Which system(s) control the physical switch in #1 above?
 
 A voting system...all systems control it.  It is up to the switch to
 decide who isn't working right.

With probably some manual over-ride, since we know that systems can 
appear to be ready for production but the sys admin says it's not ready
due to any number of valid technical reasons.
 
 g. Is sharing static/dynamic operational data across some sort of
high-availability hsrp channel acceptable, or, should two or more
database servers be deployed?
 
 DB Server clustering is a fairly solid technology these days.  Deploy a DB
 cluster if you want.

Which gets to be rather expensive, adds complexity, and additional
points of failure (decreasing the ability to approach five/four-9's).
 
  4. If a firewall/nat box is involved, what are the requirements to detect
 and handle a failed * machine?
 a. Are the requirements different for hot-spare vs clustering?
 b. What if the firewall is an inexpensive device (eg, Linksys) with
minimal configuration options?
 c. Are the nat requirements within * different for clustering?
 
  5. Should sip phones be configurable with a primary and secondary proxy?
 a. If the primary proxy fails, what determines when a sip phone fails
over to the secondary proxy?
 
 Usually a simple timeout works for this..but if your clustering/hot-spare
 switch works right...the client should