Using DNS as the primary method for facilitating communication between
endpoints, sipXecs servers, and gateways has proven to be an excellent way
of enabling load balancing and, to a lesser extent, redundancy of sipXecs
services.

As it is currently implemented in sipXecs, however, DNS is the root cause of
many outages and issues. Proper DNS configuration for sipXecs is complex for
most network engineers and administrators, and very difficult for most
telecom engineers to understand. It will only become more complex as future
versions of sipXecs allow many more servers to be added to each cluster,
each possibly deployed at a remote site for survivability or load balancing.
It has also been observed that individual sipXecs processes make an
unnecessarily large number of DNS queries, which can result in network
congestion and extra load on the sipXecs cluster.

All sipXecs services require proper DNS records to facilitate interservice
communication. The current method sipXecs services utilize for learning
these records is to query DNS at seemingly random times:
"2011-06-15T21:19:15.062826Z":9:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:15.062883Z":10:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:15.062937Z":11:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:15.067976Z":13:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:15.068022Z":14:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:15.068162Z":15:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:40.386864Z":22:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:40.386904Z":23:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:40.386919Z":24:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.355097Z":27:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.355154Z":28:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.355168Z":29:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.360783Z":31:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.360851Z":32:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:45.360942Z":33:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:53.439959Z":44:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:53.440031Z":45:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:53.440075Z":46:SIP:WARNING:uc.example.com:SipSrvLookupThread-24:4284E940:SipXProxy:"DNS
query for name '_sip._tls.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:53.442692Z":47:SIP:WARNING:uc.example.com:SipSrvLookupThread-23:4274D940:SipXProxy:"DNS
query for name '_sip._tcp.example.com', type = 33 (SRV): returned error"
"2011-06-15T21:19:53.442697Z":48:SIP:WARNING:uc.example.com:SipSrvLookupThread-22:4264C940:SipXProxy:"DNS
query for name '_sip._udp.example.com', type = 33 (SRV): returned error"

While the traffic and load generated by all of these requests are generally
tolerable, it should be noted that:

   - If any of the expected records are missing (for example an A record, or
   the SRV records for _sip._udp, _sip._tcp, or _sip._tls), even if they are
   not necessary, then every service writes a log entry each time it looks
   one of them up. Because these lookups happen constantly, log files can
   become large rather quickly.
   - Because each service is constantly performing DNS lookups, an
   unexpected outage or delay on the primary or caching DNS server directly
   affects signaling between services and, as a result, can cause service
   issues.
   - This behavior negates the benefit of TTL values for DNS records.

It is apparent that the unnecessary DNS queries should be eliminated in
favor of a much more conservative approach that honors the TTL values
received from the DNS server. This would result in much less query traffic
and would possibly reduce the load on sipXecs services. One consideration
with this approach: if a DNS value is changed shortly before an individual
service is restarted, the restarted service will pick up the new value while
the services that are still running keep the old one until its TTL expires.
That could lead to a potentially fatal conflict of DNS records between
services. A possible solution is for sipXsupervisor to send a signal to all
running DNS-dependent processes to force a DNS refresh whenever an
individual sipXecs service is restarted or reloaded.
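
To make the idea concrete, here is a minimal sketch of a TTL-honoring cache
with a forced refresh. It is not how sipXecs does or would do this; the class
name TtlDnsCache, the lookup callback, and the flush() method are all
hypothetical, and the actual resolver call is left to the caller so that the
sketch does not assume any particular DNS library.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical lookup callback: takes (name, record_type) and returns
# (records, ttl_seconds). A real service would wrap its resolver here.
Lookup = Callable[[str, str], Tuple[List[str], int]]


class TtlDnsCache:
    """Answers from cache until the TTL of the stored answer expires."""

    def __init__(self, lookup: Lookup,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._clock = clock
        # (name, type) -> (expiry time, records)
        self._entries: Dict[Tuple[str, str], Tuple[float, List[str]]] = {}
        self.queries_sent = 0  # number of lookups that reached the resolver

    def resolve(self, name: str, rtype: str) -> List[str]:
        key = (name, rtype)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now < cached[0]:
            return cached[1]  # still fresh, no query sent
        records, ttl = self._lookup(name, rtype)
        self.queries_sent += 1
        self._entries[key] = (now + ttl, records)
        return records

    def flush(self) -> None:
        """Forget everything, e.g. when a supervisor signals a refresh."""
        self._entries.clear()
```

A process using something like this would query the DNS server once per TTL
period for each name, and a handler for the refresh signal would simply call
flush(). Caching of failed lookups is deliberately left out of this sketch.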

*Linux DNS subsystem failover*
The default behavior of the DNS client subsystem in Linux is to wait a full
5 seconds for a response from the primary DNS server before attempting to
contact the secondary server. In many cases, by the time this 5-second delay
has elapsed, the request has already timed out and the signaling has been
cancelled with an error.

At first glance, the resolution to this problem seems simple. According to
http://forums.whirlpool.net.au/archive/592813 a few small configuration
changes to /etc/resolv.conf will allow for a much shorter timeout before
failing over to the secondary DNS server. This will require further testing
to confirm that it resolves the DNS failover issue.
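
As an illustration only, the kind of change in question would look something
like the fragment below. The "timeout" and "attempts" options are standard
resolv.conf options, but the values and the nameserver addresses here are
placeholder assumptions, not tested recommendations and not necessarily what
the forum thread suggests.

```
# /etc/resolv.conf (illustrative values only)
nameserver 192.0.2.10
nameserver 192.0.2.11
# Wait 1 second per server instead of the default 5,
# and go through the server list at most twice.
options timeout:1 attempts:2
```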

*Simplified DNS Management from Administration Web Interface*
Advanced DNS configurations are a requirement for proper branch site setups.
These configurations are usually beyond the technical capabilities of many
telecom engineers and even many network administrators and engineers. To
accommodate these groups it will be necessary to simplify DNS management and
centralize DNS configuration within sipXecs. Simplification of DNS
configuration is the eventual goal, as there are many documented cases where
DNS was the root of a system outage or issue.

In anticipation of the 4.6 or 4.8/5.0 release of sipXecs with the mongoDB
database backend,
http://wiki.sipfoundry.org/display/sipXecs/Location+based+DNS+views+for+sipXecs+using+BIND
gives an outline of proper branch site DNS configuration. However, the only
comprehensive management tool for a configuration of this type is
http://www.webmin.com, which is not entirely intuitive in its approach to
DNS but does provide a complete package for managing existing DNS zones and
views. Some thoughts for adding a comprehensive DNS management suite to
sipXecs are:

   - Create a default view for subnets that are not assigned to a particular
   branch.
   - Create a view for each branch that a server is assigned to and copy the
   zone from the default view to the new view assigned to the branch, then
   allow the administrator to define server priorities per branch.
   - Assign subnets to branches, which will in turn be applied to the view
   assigned to each branch.
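
The logic behind these three points can be sketched roughly as follows. This
is only an illustration of the subnet-to-view mapping, not a proposal for the
actual sipXecs data model; the branch names, subnets, host names, and
function names are all made up for the example.

```python
import ipaddress

# Hypothetical data an administrator would enter in the web interface.
BRANCH_SUBNETS = {
    "branch-a": ["10.1.0.0/16"],
    "branch-b": ["10.2.0.0/16", "192.168.20.0/24"],
}

# SRV priority per server and view (lower value is preferred). A branch
# without its own entry falls back to the default view, which mirrors
# "copy the zone from the default view".
SERVER_PRIORITIES = {
    "default": {"uc1.example.com": 10, "uc2.example.com": 20},
    "branch-a": {"uc1.example.com": 20, "uc2.example.com": 10},
}


def view_for_client(client_ip, branch_subnets=BRANCH_SUBNETS):
    """Return the view name for a client, or 'default' if unassigned."""
    addr = ipaddress.ip_address(client_ip)
    for branch, subnets in branch_subnets.items():
        for subnet in subnets:
            if addr in ipaddress.ip_network(subnet):
                return branch
    return "default"


def srv_records_for_view(view, domain="example.com", port=5060):
    """Build the _sip._udp SRV lines for a view, best priority first."""
    priorities = SERVER_PRIORITIES.get(view, SERVER_PRIORITIES["default"])
    ordered = sorted(priorities.items(), key=lambda item: item[1])
    return [
        f"_sip._udp.{domain}. IN SRV {prio} 0 {port} {host}."
        for host, prio in ordered
    ]
```

With this data, a phone at 10.1.5.9 would land in the branch-a view and be
handed uc2.example.com as its preferred server, while a client on an
unassigned subnet would get the default ordering.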

Views only need to be configured on the primary DNS server. Secondary DNS
servers located on secondary sipXecs servers, when set up as slave zones to
the primary server, will retrieve the zone that has been assigned to the
branch/view that the secondary sipXecs server is assigned to.

It should also be noted that Microsoft Windows DNS server does not currently
have a function similar to BIND views, so it is not yet known how sipXecs
will address this issue in large multi-site deployments. One thought for
addressing it: if a Windows DC runs DNS for a branch site, that DC can be
set up to forward its lookups to the sipXecs DNS system, which would return
the zone connected to the view that is assigned to the subnet the DC is on.

Please post comments and ideas on this subject. I'd like to see what the
community and the engineers think is the best way to approach these issues.

Thanks!