[Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Damian Funnell




Hi team,

Not long ago a bunch of us were posting reports of a strange phenomenon
where voice quality would pack up completely from time to time,
typically resulting in loud crackling on the line and/or the voice
channel breaking up completely. With our installation it would occur
from time to time, typically when the * server was at its busiest.

Most of the time this problem would result in all users having to
terminate their calls and re-establish them.

After a lot of (very frustrating) troubleshooting we have now gone
two weeks without a re-occurrence of the problem and we are hoping that
we may have finally resolved it altogether. I wanted to post a quick
summary of the steps that we have taken to resolve this issue and what
we think the problem turned out to be, as (from the number of responses
to my last posts about this issue), it sounds like a few people have
been experiencing it, so hopefully our experiences will help.

The * server in question is based on a single-processor IBM xSeries 205
with a gig of RAM, SCSI 320 HDDs (RAID 1) and Red Hat ES 3. It uses
ISDN (via CAPI and a four port Eicon Diva Pro Server card) and a
mixture of SIP and analogue extensions.

A TDM400P with four FXS ports supports the four analogue extensions
(all Uniden cordless phones) and the SIP handsets consist of a mixture
of BT102s and SNOM190s.

Our turning point with this issue came when we bit the bullet and
purchased a support incident from Digium. By this stage we had spent
dozens and dozens of hours trying unsuccessfully to research and
diagnose the problem and still had no accurate idea of what was causing
it. Several people replied to our posts to this list saying that they
were having a very similar issue as well, but no one had a clue what
was causing it.

Digium support zeroed in on the issue fairly quickly and we got the
*distinct* impression that they have seen this problem many times
before. They instantly got us to look at the output of zttest and we
found that this was (in their words) 'extremely low', with 'best' and
'worst' readings of 99.975586% and 99.963379% respectively. They told
us that we needed to be getting at least 99.98% and recommended that we:


  1. Check that the TDMP is on its own IRQ (much to our embarrassment
     our card wasn't at the time, so we had to play with it a bit to
     get it to occupy a unique IRQ).
  2. Disable hyper threading on the Xeon CPU.
  3. Uninstall our SCSI hardware and replace it with IDE hardware.
  4. Upgrade to the latest stable releases of Asterisk, Zaptel and
     Libpri.

We made changes 1 and 2 in the above list and are prepared to make
changes 3 and 4 if we find the problem hasn't gone away. It hasn't
happened in over two weeks now (after occurring many times per day for
a while), so we hopefully won't have to throw out our SCSI hardware.
After we made each change (1 and 2 were made about two weeks apart) we
found that the quality improved, with the incidence of the issue
halving after '1' and disappearing (hopefully for good) after '2'.
Incidentally the results of zttest *did not* noticeably improve after
making these changes (it is still below 99.98%).
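
The percentages quoted in this thread line up with simple ratios over a
window of 8192 samples. The sketch below only reproduces that
arithmetic. The 8192-sample window is an assumption inferred from the
figures themselves (nobody in the thread states it), and the function
name is made up for illustration.

```python
# Assumption: each zttest pass compares timing over a window of 8192
# samples. The window size is inferred from the numbers quoted in this
# thread; it is not something Digium support stated.
WINDOW = 8192


def accuracy_percent(off_by: int, window: int = WINDOW) -> float:
    """Accuracy when a pass is off by `off_by` sample intervals."""
    return 100.0 * (window - off_by) / window


if __name__ == "__main__":
    # Print the accuracy for a drift of 0 to 3 intervals per window.
    for off_by in range(4):
        print(f"off by {off_by}: {accuracy_percent(off_by):.6f}%")
```

Under that assumption, a pass that is off by one interval reads
99.987793%, two reads 99.975586% and three reads 99.963379%, so a
99.98% target leaves room for at most one interval of drift per pass.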

Apparently our problem is related to the fact that the TDMP generates
a massive number of interrupt requests and that it becomes extremely
upset if enough of those requests are not honoured. Despite the fact
that a PCI device has to be able to share an IRQ in order to meet the
PCI specification, it appears that having a TDMP sharing an IRQ with
*anything* is a really, really bad idea.
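
One way to check for a shared IRQ is to look at /proc/interrupts and
see whether any other handler is listed on the same line as the card's
driver. The sketch below parses that text. It assumes the TDM400P
driver shows up as "wctdm", the sample data is invented for
illustration, and the column layout of /proc/interrupts varies between
kernels, so treat the parsing as a rough guide only.

```python
from typing import Dict, List

# Invented sample in the style of /proc/interrupts (not real output).
SAMPLE = """\
           CPU0
  0:   52345678    IO-APIC-edge  timer
 11:    9876543   IO-APIC-level  wctdm, aic79xx
 15:      12345   IO-APIC-level  eth0
"""


def shared_irqs(interrupts_text: str, driver: str) -> Dict[str, List[str]]:
    """Map each IRQ that `driver` shares to the other handlers on it."""
    shared: Dict[str, List[str]] = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # header row, or non-numeric rows such as NMI/ERR
        parts = [p.strip() for p in rest.split(",")]
        # The first chunk still holds the counters and controller type;
        # the handler name is its last whitespace-separated token.
        first = parts[0].split()
        if not first:
            continue
        handlers = [first[-1]] + parts[1:]
        if driver in handlers and len(handlers) > 1:
            shared[irq.strip()] = [h for h in handlers if h != driver]
    return shared


if __name__ == "__main__":
    # "wctdm" is the assumed driver name for the TDM400P.
    print(shared_irqs(SAMPLE, "wctdm"))
```

On a live system the text would come from reading /proc/interrupts; an
empty result would suggest the card has the IRQ line to itself.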

I haven't been able to get an explanation about why hyper threading is
a bad thing, but apparently high-performance devices such as SCSI
adapters can cause resource contention issues with the TDMP, resource
issues that the TDMP becomes very upset about.

So hopefully we have seen the back of this problem and I have to say
that I have been pretty disappointed to find out that this issue
appears to be relatively well known by Digium, but seemingly not
publicised in the slightest. We searched for days to find anything
relating to our issue but to no avail. Hopefully the next time someone
has this issue they might find this mail and save themselves some of
the frustration that we had.

When we challenged Digium's advice about retarding the CPU (i.e.
disabling hyper threading) and slowing I/O (by throwing out our SCSI
RAID controller and replacing with IDE) they fell strangely silent -
after getting prompt and meaningful responses to our requests they
suddenly stopped responding at all.

I think that this issue constitutes a pretty major flaw in the design
of the TDMP and we will strongly avoid putting these cards into any *
servers from now on. This is a real shame, as we as a company really
want to reward Digium for all of their good work by actually buying
their products, but we no longer have any faith in the design and
suitability for production use of this product.

Maybe it's time for Digium to think about 

RE: [Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Colin Anderson



> They instantly got us to look at the output of zttest and we found that
> this was (in their words) 'extremely low', with 'best' and 'worst'
> readings of 99.975586% and 99.963379% respectively.

Might want to give the PCI latency setting a try, it helped for me. My
zttest would drop occasionally to 99.95% until I set:

setpci -v -s 01:01.0 latency_timer=ff   # Digium PRI card
setpci -v -s 01:04.0 latency_timer=ff   # Digium 401 4 X FXS
setpci -v -s XX:XX.X latency_timer=0    # one entry for every other PCI
                                        # card in the system; take
                                        # XX:XX.X from the lspci output

Before setpci I would get a best in zttest of 99.987793% and a worst of
~99.95%.

After setpci the best is 100% and the worst is consistently 99.987793%.

I use SpanDSP to receive faxes; before, faxes were garbled and now they
are OK (BTW, now receiving ~150 faxes a day, 99.95% OK, so SpanDSP
*does* work fine, you just have to set it up right. Ask me how.)

I put the setpci statements in /etc/rc.d/rc.local before my modprobes
for the Digium hardware and the Asterisk startup.
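
The setpci lines above follow a simple rule: the telephony cards get
latency_timer=ff and every other PCI device gets latency_timer=0. A
small sketch that builds those command lines from a list of PCI
addresses is shown here. The function name and the third address are
hypothetical, and the strings are only printed, not executed.

```python
from typing import Iterable, List


def latency_commands(all_devices: Iterable[str],
                     telephony: Iterable[str]) -> List[str]:
    """Build one setpci command per device, favouring telephony cards."""
    favoured = set(telephony)
    commands = []
    for addr in all_devices:
        # ff for the telephony cards, 0 for everything else, following
        # the scheme described in this message.
        value = "ff" if addr in favoured else "0"
        commands.append(f"setpci -v -s {addr} latency_timer={value}")
    return commands


if __name__ == "__main__":
    # 01:01.0 and 01:04.0 come from this message; 00:1f.1 is a
    # hypothetical extra device standing in for lspci output.
    devices = ["01:01.0", "01:04.0", "00:1f.1"]
    for cmd in latency_commands(devices, ["01:01.0", "01:04.0"]):
        print(cmd)
```

The printed lines could then be reviewed and pasted into rc.local ahead
of the modprobe calls, as described above.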

I'm using a 4-way Netfinity, FC2, * 1.0 stable.

I dunno, maybe the community is being too hard on Digium about the
design of the card. I can understand their perspective: it's brutal to
make a card that has to have such tight tolerances and make it work
acceptably on the huge variation in white box hardware (or black box,
in your case). There's a page on the Wiki about motherboards that work
well, with installation notes, but that's pointless since motherboards
are such a moving target. Even the motherboard vendor screwing around
with BIOS updates can invalidate that information.


What I think is best for Asterisk implementation is for Digium to sell
a motherboard. No, seriously. Find an ECS or Abit or ASUS mobo that
consistently yields 100% or 99.% and white-box it as a barebones kit
with a TXXX card. Sell it as a case, good PSU, mobo, and TXXX card -
you add your own RAM, NIC, CPU and HDD. Would you buy one for $699? I
probably would. It took me a couple of months of fooling around with my
Netfinity before I was pleased with the performance and satisfied that
it would handle the things I wanted it to do without choking. If I had
the option of saving the couple of months' time obsessing over things
like timing for $699, it would have been a no-brainer. Digium wins too,
because they get an incremental sale that they can make money on
(margin on the mobo) and lower support costs because they don't have to
chase down IRQ latency phantoms.


hth, my 2c


___
Asterisk-Users mailing list
Asterisk-Users@lists.digium.com
http://lists.digium.com/mailman/listinfo/asterisk-users
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users

Re: [Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Andrew Kohlsmith
On May 12, 2005 01:17 pm, Colin Anderson wrote:
> I use SpanDSP to receive faxes and before faxes were garbled and now they
> are OK (BTW, now receiving ~150 faxes a day 99.95% OK, so SpanDSP *does*
> work fine, you just have to set it up right. Ask me how.)

No, don't ask you how.  Show us how.  Don't tease out information like this, 
like some cheap stripper.

-A.


Re: [Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Mark Johnson
Damian Funnell wrote:
>   1. Check that the TDMP is on its own IRQ (much to our
>      embarrassment our card wasn't at the time, so we had to play
>      with it a bit to get it to occupy a unique IRQ).
>   2. Disable hyper threading on the Xeon CPU.
>   3. Uninstall our SCSI hardware and replace it with IDE hardware.
>   4. Upgrade to the latest stable releases of Asterisk, Zaptel and
>      Libpri.
>
> We made changes 1 and 2 in the above list and are prepared to make
> changes 3 and 4 if we find the problem hasn't gone away.  It hasn't
> happened in over two weeks now (after occurring many times per day for
> a while), so we hopefully won't have to throw out our SCSI hardware.
> After we made each change (1 and 2 were made about two weeks apart
> from each other) we found that the quality improved, with the
> incidence of the issue halving after '1' and disappearing (hopefully
> for good) after '2'.  Incidentally the results of zttest *did not*
> noticeably improve after making these changes (it is still below 99.98%).

This is great info.  I am running on an Intel box and attempting to go
to a dual AMD Opteron setup on a Tyan board.  I am not having any luck
getting my numbers above 99.6%.  I've disabled every hardware gadget and
service not needed and still haven't had much luck.  I'm going to try a
custom kernel as opposed to the stock ones I've tried, but that's been
about 4 different OSes with the same results.  Is there something to
disable on Opterons that would be the equivalent of disabling
hyperthreading?  Oh, and I even tried setting the PCI latencies and it
made no noticeable difference.

Mark


Re: [Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Erick Perez
I have never had to play with setpci before. Can you elaborate on the
use and purpose of this command?


On 5/12/05, Colin Anderson wrote:
>> They instantly got us to look at the output of zttest and we found that
>> this was (in their words) 'extremely low', with 'best' and 'worst'
>> readings of 99.975586% and 99.963379% respectively.
>
> Might want to give the PCI latency setting a try, it helped for me. My
> zttest would drop occasionally to 99.95% until I set:
>
> setpci -v -s 01:01.0 latency_timer=ff   # Digium PRI card
> setpci -v -s 01:04.0 latency_timer=ff   # Digium 401 4 X FXS
> setpci -v -s XX:XX.X latency_timer=0    # one entry for every other PCI
>                                         # card in the system; take
>                                         # XX:XX.X from the lspci output
  


-- 

---
Erick Perez
Linux User 376588
http://counter.li.org/  (Get counted!!!)
Panama, Republic of Panama


RE: [Asterisk-Users] Something every TDMP user should know

2005-05-12 Thread Kris Boutilier
> -----Original Message-----
> From: Erick Perez
> Sent: Thursday, May 12, 2005 2:19 PM
> To: Asterisk Users Mailing List - Non-Commercial Discussion
> Subject: Re: [Asterisk-Users] Something every TDMP user should know
>
> I have never had to play with setpci before. Can you elaborate on the
> use and purpose of this command?

See:
http://www-106.ibm.com/developerworks/library/l-hw2.html

Also, for more PCI latency timer specifics:
http://www.reric.net/linux/pci_latency.html

Kris Boutilier
Information Services Coordinator
Sunshine Coast Regional District