RE: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
-Original Message-
Thanks for reading,
Wes
___

Please reply with the output of the following:

lspci -vv
lspci -vv | grep IRQ
lspci
cat /proc/interrupts

Thank you.
Andrew
___
--Bandwidth and Colocation provided by Easynews.com --
asterisk-users mailing list
To UNSUBSCRIBE or update options visit: http://lists.digium.com/mailman/listinfo/asterisk-users
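For anyone following along, the point of asking for /proc/interrupts is usually to see whether the T1 card shares an IRQ with another device. Below is a rough Python sketch that picks out shared IRQs from that output. It assumes the 2.6-era layout (IRQ number, per-CPU counts, one controller-type column, then a comma-separated device list); the sample text and the device names in it (including wanpipe1) are made up for illustration and are not taken from Wes's box.

```python
def shared_irqs(proc_interrupts_text):
    """Return {irq: [device, ...]} for IRQ lines that list more than one device.

    Assumes the layout: IRQ number, per-CPU counts, one controller-type
    column, then a comma-separated device list.
    """
    shared = {}
    for line in proc_interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        irq = head.strip()
        if not sep or not irq.isdigit():
            continue  # skip the CPU header row and named rows such as NMI/ERR
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        # fields[i] is the controller type; everything after it is the device list
        devices = [d.strip() for d in " ".join(fields[i + 1:]).split(",")]
        devices = [d for d in devices if d]
        if len(devices) > 1:
            shared[irq] = devices
    return shared


# Hypothetical sample, NOT output from the server discussed in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:   12345678          0    IO-APIC-edge  timer
 16:     500000          0   IO-APIC-level  uhci_hcd:usb1, wanpipe1
 18:      90210          0   IO-APIC-level  eth0
NMI:          0          0
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print("IRQ %s is shared by: %s" % (irq, ", ".join(devices)))
```

On a real box you would feed it the contents of /proc/interrupts instead of SAMPLE. Newer kernels add extra columns, so the column-skipping logic would need adjusting there.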
RE: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
snip
Server load is averaging around 20%, plenty of memory, disk space, and bandwidth available. No QOS running on network. ulaw is the primary codec. Server is stable, and there are no extraneous services running, save mysql and httpd. Even running a processor intensive query doesn't trigger the dropouts, they happen randomly.
snip

You mention that the remote office is fiber connected, but don't identify what equipment is used at the ends of the fiber. How many people are in the remote office, what are their work processes and habits, and do they also use this circuit for internet access?

The key to my questions is that I suspect you do need at least a minimal QoS implementation. A quick check of the circuit utilization when the issue occurs can confirm or eliminate this.

Dan
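A quick way to put a number on "circuit utilization" is to sample an interface byte counter twice and compare the delta against the link speed. The Python sketch below only does the arithmetic; where the counters come from (SNMP ifInOctets on the switch, /proc/net/dev on a Linux box, etc.) is left open, the function name is made up, and counter wrap-around is not handled.

```python
def utilization_percent(bytes_start, bytes_end, interval_s, link_bps):
    """Average link utilization (%) between two byte-counter samples.

    bytes_start / bytes_end: interface byte counter at the two sample times
    interval_s: seconds between the samples
    link_bps: nominal link speed in bits per second
    Counter wrap-around is not handled in this sketch.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    bits_sent = (bytes_end - bytes_start) * 8
    return 100.0 * bits_sent / (interval_s * link_bps)


if __name__ == "__main__":
    # Illustrative numbers only: 22.5 MB moved in 60 s over a 10 Mbit/s link.
    print(utilization_percent(0, 22500000, 60, 10000000))
```

Note this is an average over the interval. Short bursts that fill the pipe for a second or two can still clip audio while the one-minute average looks harmless, so sample as often as the equipment allows while the dropouts are happening.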
Re: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Here ya go:

lspci -vv
---
00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 09)
    Subsystem: Dell: Unknown device 016d
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR-
    Latency: 0
    Capabilities: [40] Vendor Specific Information

00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 09) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR-
    Latency: 0, Cache Line Size 10
    Bus: primary=00, secondary=01, subordinate=03, sec-latency=0
    Memory behind bridge: dfd0-dfff
    Prefetchable memory behind bridge: d800-d800
    Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort+ SERR+ PERR-
    BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- Reset- FastB2B-
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
        Address: fee0 Data:
    Capabilities: [64] Express Root Port (Slot-) IRQ 0
        Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s 64ns, L1 1us
        Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 256 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 2
        Link: Latency L0s 4us, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
        Root: Correctable- Non-Fatal- Fatal- PME-

00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 09) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR-
    Latency: 0, Cache Line Size 10
    Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
    Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort+ SERR- PERR-
    BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- Reset- FastB2B-
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
        Address: fee0 Data:
    Capabilities: [64] Express Root Port (Slot-) IRQ 0
        Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
        Device: Latency L0s 64ns, L1 1us
        Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
        Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
        Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
        Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s, Port 4
        Link: Latency L0s 4us, L1 unlimited
        Link: ASPM Disabled RCB 64 bytes Disabled CommClk- ExtSynch-
        Link: Speed 2.5Gb/s, Width x8
        Root: Correctable- Non-Fatal- Fatal- PME-

00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 09) (prog-if 00 [Normal decode])
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
    Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR-
    Latency: 0, Cache Line Size 10
    Bus: primary=00, secondary=05, subordinate=07, sec-latency=0
    I/O behind bridge: e000-efff
    Memory behind bridge: dfa0-dfcf
    Secondary status: 66Mhz- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort+ SERR+ PERR-
    BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- Reset- FastB2B-
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/1 Enable-
        Address: fee0 Data:
    Capabilities: [64] Express Root Port (Slot-) IRQ 0
        Device: Supported:
RE: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Your problem is intermittent. It is probably network related, since after a reboot the problem may or may not come back.

In addition to the lspci output requested: have you checked your fiber link? Is it possible that something or someone is saturating the link with virus, spyware, or P2P traffic?

Asterisk's SIP channel doesn't have a jitter buffer, so it is sensitive to the traffic. You may want to try Olle's branch with the jitter buffer and see if that helps.

I think you are fine on the hardware side of things. It is also possible you got root-kitted.

SNIP.
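If you want a number for how much jitter the calls are actually seeing before bringing in a jitter buffer, the running estimate defined for RTP in RFC 3550 (J += (|D| - J) / 16) is easy to compute from a packet capture. A minimal Python sketch follows, assuming you already have matching send and receive timestamps per packet in the same units; the function name is made up, and a real RTP stack works from RTP timestamps rather than wall-clock send times.

```python
def interarrival_jitter(send_times, recv_times):
    """Running interarrival jitter estimate in the style of RFC 3550.

    send_times / recv_times: per-packet timestamps in the same units
    (e.g. milliseconds), in arrival order. For each packet after the first:
        D = (transit of this packet) - (transit of the previous packet)
        J = J + (|D| - J) / 16
    """
    jitter = 0.0
    prev_transit = None
    for sent, received in zip(send_times, recv_times):
        transit = received - sent
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            jitter += (d - jitter) / 16.0
        prev_transit = transit
    return jitter


if __name__ == "__main__":
    # Packets sent every 20 ms with a constant 5 ms delay: no jitter at all.
    print(interarrival_jitter([0, 20, 40, 60], [5, 25, 45, 65]))
    # Second packet delayed an extra 16 ms: the estimate moves by 16/16.
    print(interarrival_jitter([0, 20], [5, 41]))
```

The 1/16 gain smooths the estimate, so a single late packet barely moves it; a value that stays high points to sustained congestion rather than a one-off hiccup.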
RE: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
So you need a divide and conquer strategy here:

1. Is it Asterisk or the WAN? This should be easy enough to test for. Do call dropouts happen in your datacentre? If not, your Asterisk install is good. My money's on the 10mbit WAN pipe, and that's what I would be focussing on.

2. If it's the WAN, is it a connectivity issue or a bandwidth issue? Do a continuous ping from the remote location to your Asterisk server for a day. You should get NO packets dropped. If you are getting drops, it's a connectivity issue and you have to look at your SLA to see what your provider considers good. Otherwise, it's a bandwidth issue.

3. If it's a bandwidth issue, is it your users doing things or is it a service that is eating bandwidth? If it's a service that is aggregated to a remote server, like email, then you can use bandwidth management tools like AstShape or good old tc to severely throttle the bandwidth available to the troublesome service. If it's your users, you have to determine what they are doing. Look at patterns: does it happen every Tuesday afternoon when you know Bob from Accounting is running his reports?

4. Sounds like you are running Asterisk -- SIP -- 10mbit WAN -- SIP -- Phones, which is probably half the issue right there because there is no jitterbuffer. Dig up an old P-3, stick Trixbox in it, run it out to your remote location, and have your Eyebeam clients use *it* instead of your big Asterisk server for local connectivity. Then tie your P-3 to your big Asterisk server with IAX. Jitterbuffer + trunking = goodness, and your P-3 won't choke under load if you avoid transcoding by using the same codec end-to-end. Yes, it will blow having to maintain two dialplans. But IAX works frigging great. I use it to aggregate 30 remote locations over the *public* Internet to my big Asterisk server, and I never get complaints of dropouts. In fact I use it extensively myself, and IMO it sounds better* than the local CableCo's VoIP offering, which is a big POS.

5. Regardless of what it actually is, I would have some sort of traffic shaper at both ends of the WAN pipe. Again, dig up a couple of old P-2's or P-3's and stick in a bootable Monowall CD, change the default rules to allow all traffic through, but create a traffic shaping ruleset to give priority and bandwidth to 5060, 4569, 1-2 and dump everything else to a low priority queue.

6. I'd run GSM anyway (even though you tried it) because it would eliminate half your bandwidth consumption. Another variable eliminated.

hth

*By 'sounds better' I mean it sounds like a perfectly normal PSTN call, ALL THE TIME [remainder of footnote garbled in the archive]

-Original Message-
From: whois wes [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 06, 2006 10:51 AM
To: Asterisk Users Mailing List - Non-Commercial Discussion
Subject: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!

Hate to drag this one back up, but it's happening again.

Overview of architecture:

Dell PowerEdge 2850, running Fedora Core 4, Asterisk 1.2.7.1, Zaptel 1.2.5, and Sangoma wanpipe 2.3.4 drivers. T1 interface card is the Sangoma A104D with onboard echo can. Server is located in our data center and connected directly to our Cisco 6513 core switch, so we have almost zero latency. The office having the issues is located several miles away and is connected via a 10Mbit fiber pipe, also low latency. Ping times between the remote office and here are well under 10ms. T1's are robbed-bit, E&M wink signalling --- (this may be the cause, but want your input).

Server load is averaging around 20%, plenty of memory, disk space, and bandwidth available. No QOS running on network. ulaw is the primary codec. Server is stable, and there are no extraneous services running, save mysql and httpd. Even running a processor intensive query doesn't trigger the dropouts, they happen randomly.

Phones are a mix of Eyebeam 1.5.5 and Eyebeam 1.10 3010n. Both types of phones are experiencing cutting out of the signal, mainly in the Rx stream, but occasionally in the Tx stream as well.

The cutting out was NOT occurring last night, and the phone server is being rebooted nightly. Nothing has changed AT ALL, and the problem has started occurring again. If I don't do ANYTHING at all today, there is a 50% chance that this will NOT occur tomorrow. In other words, SOMETHING is causing our phones to drop out, but whatever changes I make seem to have no effect. The problem will start and stop seemingly at its own whim.

---
Things I have tried:
1. changed from ulaw to gsm as primary codec - no change
2. disabled hardware echo can on A104D - no change
3. moved from asterisk 1.2.4 to 1.2.7.1, recompiled both several times - no change
4. have played with gain settings a bit, doesn't seem to make much difference
---

At this point, I am nearing the end of my rope - I have rebuilt this machine three times now, and have recompiled the system at least a dozen times. We have gone from Digium hardware to Sangoma hardware and
Re: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Thanks for the quick responses everyone. To answer some of the questions posed:

The main traffic going over this pipe is voice, with a small amount of web traffic as well. There are 60 total users, 5 of whom access anything other than what is on their LAN up there. In any case, we are not saturating the pipe, and our telco put some sort of filters on the Optiman switches on each side to eliminate any jitter (or so they say). Prior to the filter being installed, we had our main application server for that location located down here. When the issue started (out of the blue, nothing really triggered it, and our bandwidth didn't change or spike) we moved that server to the remote location. So, before we even had the issue, we were using WAY more bandwidth, almost 8Mbit at times...we're averaging around 2-3 now, and it rarely spikes above that.

Also, when I connect to the server locally (the server is in the room next to me, in other words, and I have 1 Gbit of bandwidth all the way to the back of the server), I still get call dropouts. In other words, completely bypassing the fiber pipe results in the same problem. For that reason alone, I don't think it's the WAN (although I agree with what all of you said in regards to QOS, etc, it's just not up to me to implement that, even though it's been suggested numerous times).

However, this IS the only server (of 8 total, all in the same rack and connected to the telco via the same DS3) that is having the issue, which DOES point to it being the WAN, as that is our ONLY remote location. See why I'm frustrated?

I do like the idea of putting a local box up there and using an IAX trunk over the pipe, and will see about getting that implemented. GSM was already shot down as 'too low-quality' - we'd rather up the pipe to 20Mbit than go with a lower quality codec.

Sorry that I forgot to mention some of this in my initial post, and hopefully the above info will shed a bit more light on my confusion. Thank you all again for replying so quickly, and if you have any other suggestions, please let me know.

Wes
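For reference on the ulaw vs GSM question, here is some back-of-envelope arithmetic in Python for the per-call bandwidth of each codec over RTP. It is only a sketch under stated assumptions: 20 ms packetization, 40 bytes of IP + UDP + RTP headers per packet, GSM full-rate at 13.2 kbit/s (33-byte frames), no layer-2 overhead counted, and figures are per direction. The 60-calls case is a worst case (every user on the phone at once), not a measurement from this network.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no layer-2 framing


def call_kbps(codec_bps, packet_ms=20):
    """One-way bandwidth of a single RTP stream in kbit/s (layer 3 and up)."""
    payload_bytes = codec_bps * packet_ms / 1000 / 8
    packets_per_second = 1000 / packet_ms
    bytes_per_packet = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return bytes_per_packet * 8 * packets_per_second / 1000


ULAW_BPS = 64000  # G.711 ulaw
GSM_BPS = 13200   # GSM full-rate as carried in RTP (33 bytes per 20 ms)

if __name__ == "__main__":
    for name, bps in (("ulaw", ULAW_BPS), ("gsm", GSM_BPS)):
        per_call = call_kbps(bps)
        print("%s: %.1f kbit/s per call, %.2f Mbit/s for 60 calls"
              % (name, per_call, per_call * 60 / 1000))
```

With these assumptions ulaw comes to 80 kbit/s per call and GSM to about 29 kbit/s, so 60 simultaneous ulaw calls need roughly 4.8 Mbit/s each way on a 10 Mbit pipe, against about 1.75 Mbit/s for GSM.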
RE: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Also, when I connect to the server locally (the server is in the room next to me, in other words, and I have 1 Gbit of bandwidth all the way to the back of the server), I still get call dropouts.

However, this IS the only server (of 8 total, all in the same rack and connected to the telco via the same DS3) that is having the issue, which DOES point to it being the WAN, as that is our ONLY remote location.

So perhaps what you are seeing is two or more subtle issues with the same symptom, so subjectively it looks like the *same* issue.

1. Definitely try the remote IAX box to rule out bandwidth starvation.

2. Definitely try the ping test to rule out connectivity.

3. You have to figure out what the problem is with your big Asterisk box. There should be no reason why you are getting dropouts on the local LAN. What is the output of zttest? Is it good? Does zttool indicate IRQ misses? If it's OK, then your hardware - T1 setup is good, so you have ruled out your Asterisk box. It is also a worthwhile exercise to rule out the onboard ethernet card in the Dell. In fact, whenever I do a new box, I automatically disable the onboard LAN and replace it with an add-in 3com or Intel. It is also a worthwhile exercise to use setpci to change the latency of the cards in the Dell so that your Zap boards can grab the bus as much as possible.

4. The thing that is common in all scenarios is the EyeBeam client itself. Any soft phone is subject to the strengths and weaknesses of the audio chipset in the PC, with issues to consider like latency, audio threshold before it starts the TX, and duplex settings. Because troubleshooting these variables is often as hard as troubleshooting an entire Asterisk install, I would never run a soft-phone and expect people to use it productively. What happens when you put in a real phone? If you don't have a hardphone, maybe try something else like the Snom soft-phone.

In the end, this is all about eliminating variables as much as possible, and this will determine your decision matrix of things to try. The first matrix will be the most difficult to implement because you have a whole whack of stuff to eliminate, but they will get smaller and smaller as you eliminate variables, and eventually you will only have 2 or 3 variables to test for, and then you are golden.

OT: I find it useful to make painstaking notes or keep a spreadsheet of test results when going through a troubleshooting process like this. Often, referring back to the spreadsheet gives me valuable insight into a problem. I read this book, and I got shivers down my spine because it's like these guys got into my brain and stole what I thought was an original problem-solving idea of mine:

http://www.transcendstrategy.com/html/index.php?module=htmlpages&func=display&pid=7

Every person that troubleshoots a complex system should read this book (disclaimer: I just read it, I have nothing to do with these guys).

good luck
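For the day-long ping test, the only line that really matters is the summary ping prints at the end. Here is a small Python sketch that pulls the loss figures out of that line so the result can go straight into the spreadsheet. It assumes the usual "N packets transmitted, M received" summary (Linux iputils style, with the BSD "packets received" variant also accepted); the function name and the sample line are made up for illustration.

```python
import re

# Matches both "5 packets transmitted, 5 received" (Linux iputils)
# and "5 packets transmitted, 5 packets received" (BSD-style).
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def packet_loss(summary_line):
    """Return (lost_packets, loss_percent) parsed from a ping summary line."""
    match = SUMMARY_RE.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % summary_line)
    sent = int(match.group(1))
    received = int(match.group(2))
    lost = sent - received
    percent = 100.0 * lost / sent if sent else 0.0
    return lost, percent


if __name__ == "__main__":
    # Hypothetical summary line for one ping per second over 24 hours.
    line = "86400 packets transmitted, 86391 received, 0% packet loss, time 86399000ms"
    lost, percent = packet_loss(line)
    print("lost %d packets (%.4f%%)" % (lost, percent))
```

Computing the percentage from the raw counts is deliberate: ping rounds its own figure, so a handful of drops in a day of pings shows up as "0% packet loss" even though the test is supposed to show NO drops.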
Re: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Colin,

Very good points, and you are right, I need to start tracking what has been done.

A bit of history - this server was very unstable when running Digium hardware - every day or two, it would kernel panic and lock up, requiring a manual reboot. The other servers had issues as well, and ALL of the stability problems were solved when we moved to Sangoma cards about 6 weeks ago...this problem started a few weeks after that migration.

On point #3, you mention a few things - we have NEVER gotten zttest to show 100% on ANY of our boxes, which is one reason we migrated from Digium to Sangoma. For a while, we did try running a third party NIC in the box to help with the stability issues. Once we moved to Sangoma, we went back to the onboard NIC, and when we started having audio issues, we did try putting the third party NIC back in, to no avail.

In regards to Eyebeam, the rest of our company is using it as well. We're a call center, and our reps are actually more productive with the softphone than with the hardphones (we've tried both). The sister division of our remote location is set up almost identically in terms of dialplan, T1 config, desktop software package, softphone, etc, and they have ZERO issues. The two divisions are literally mirrors of one another, and the only difference between the two is that one office is remote. We also have hardphones in use by the managers at the remote location, and they also experience the issue.

I do see what you're saying, though, about it possibly being two smaller issues...I hope it's not, though - that would be much harder to pin down.

Thanks
Re: [asterisk-users] Phones cutting out.....again - PLEASE HELP!!!
Do you have the tethereal network analyser installed on the server? That can be a good start; look at the analysis graphs. Also try pinging the CPEs and see if there is any latency. Do you also have the ability to check the upstream signals?

-- Original message --
From: "whois wes" [EMAIL PROTECTED]

Hate to drag this one back up, but it's happening again.

Overview of architecture:

Dell PowerEdge 2850, running Fedora Core 4, Asterisk 1.2.7.1, Zaptel 1.2.5, and Sangoma wanpipe 2.3.4 drivers. T1 interface card is the Sangoma A104D with onboard echo can. Server is located in our data center and connected directly to our Cisco 6513 core switch, so we have almost zero latency. The office having the issues is located several miles away and is connected via a 10Mbit fiber pipe, also low latency. Ping times between the remote office and here are well under 10ms. T1's are robbed-bit, E&M wink signalling --- (this may be the cause, but want your input).

Server load is averaging around 20%, plenty of memory, disk space, and bandwidth available. No QOS running on network. ulaw is the primary codec. Server is stable, and there are no extraneous services running, save mysql and httpd. Even running a processor intensive query doesn't trigger the dropouts, they happen randomly.

Phones are a mix of Eyebeam 1.5.5 and Eyebeam 1.10 3010n. Both types of phones are experiencing cutting out of the signal, mainly in the Rx stream, but occasionally in the Tx stream as well.

The cutting out was NOT occurring last night, and the phone server is being rebooted nightly. Nothing has changed AT ALL, and the problem has started occurring again. If I don't do ANYTHING at all today, there is a 50% chance that this will NOT occur tomorrow. In other words, SOMETHING is causing our phones to drop out, but whatever changes I make seem to have no effect. The problem will start and stop seemingly at its own whim.

---
Things I have tried:
1. changed from ulaw to gsm as primary codec - no change
2. disabled hardware echo can on A104D - no change
3. moved from asterisk 1.2.4 to 1.2.7.1, recompiled both several times - no change
4. have played with gain settings a bit, doesn't seem to make much difference
---

At this point, I am nearing the end of my rope - I have rebuilt this machine three times now, and have recompiled the system at least a dozen times. We have gone from Digium hardware to Sangoma hardware and back again. I have changed every conceivable setting on the phones to no avail. The problem will randomly disappear, only to come back a few days later. I can make a change, it seems to have an effect, then we're back to the same old thing again.

I am in dire need of ANY help anyone can offer, this has been going on in some form for almost three months.

Thanks for reading,
Wes