[Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread Greg Boehnlein
Hello,
I was wondering if anyone out there is successfully running 
Asterisk 1.2 svn w/ CentOS 4.3. I had an experience over the last two 
weeks that has me scratching my head and muttering strange things in the 
wee hours of the morning. I am going to try to be as descriptive as my 
brain will allow right now, but if there is something that I do not cover, 
please do not hesitate to ask and I'll be happy to answer.

For the last 2 years, I have been running a mixture of Tao Linux 
and CentOS (both RHEL derivatives) on our production boxes. Asterisk has 
run flawlessly on all installations. Last week, I updated one of our 
gateway boxes from CentOS 4.2 (under which it ran for 6 months without 
issue) to the new 4.3 code. Almost immediately, we began to experience 
problems. Asterisk would dump core w/ the following backtrace:

#0  0x004878ab in test_err () from /usr/lib/asterisk/modules/codec_g729a.so

The segfaults would happen under very light loads, in some cases 
with just a single call. Kevin was able to log in to the box and install a 
debugging version of codec_g729. He determined that the values being 
returned in that routine were incorrect, i.e. something in the system was 
returning a non-zero value when multiplying a number by 0. Barring any 
other explanation, we assumed that there was a hardware issue somewhere, 
either in the memory or in the FPU on the CPU.

So, we replaced the box w/ a brand new system running a dual-core 
Pentium D 920. We loaded the 32-bit version of CentOS 4.3 onto the box 
and proceeded to start testing. BAM.. same problem.. the backtrace 
showed the failure in the same routine.

We scratched our heads, and after many hours of trying various 
things (backing off the kernel to 2.6.9-22, and even moving to the new 
development kernel 2.6.9-34.19 from the testing tree) we could do nothing 
to solve the issue.

Mind you, this is the exact same behavior on two different 
hardware platforms running the exact same distribution. We even loaded up 
a third box and could reproduce the behavior on it as well. Three 
different boxes, one common distribution.
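
The multiply-by-zero symptom can be probed outside of Asterisk with a small 
sanity loop. What follows is only an illustrative sketch in Python, not the 
codec's code: the function name count_bad_zero_products and the trial count 
are assumptions made for this example. On a healthy box it should report 
zero bad products.

```python
import random


def count_bad_zero_products(trials=100000, seed=0):
    """Count how often x * 0.0 fails to compare equal to zero."""
    rng = random.Random(seed)  # fixed seed so every box tests the same values
    bad = 0
    for _ in range(trials):
        # Finite operand, so the product must be +0.0 or -0.0.
        x = rng.uniform(-1.0e6, 1.0e6)
        if x * 0.0 != 0.0:
            bad += 1
    return bad


if __name__ == "__main__":
    print("bad products:", count_bad_zero_products())
```

A clean run here would not clear the system, since the codec may take a 
different code path, but a non-zero count would point away from Asterisk.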

As a test, we installed Fedora Core 5 x86_64 on the new Dual Core 
box and ran extensive tests overnight, simulating 96 channels doing G729 
to Ulaw transcoding. The box ran completely stable. No hiccups.

So, this morning, we put it back into the cluster, and it's now 
taking about 200 concurrent calls, doing an insane amount of transcoding 
and it is working just fine. Before, it would have cored in the first 
couple of minutes.

I'm scratching my head here, because I have generally had excellent 
experiences with CentOS. However, I have NO idea what might be the issue 
here. Could it be the kernel? (We tried three different ones!) Could it 
be the libc? Maybe it is the compiler?
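
One way to narrow down kernel vs. libc is to record the same facts on the 
working and the failing box and compare them side by side. A minimal 
sketch, assuming Python is available on both hosts; host_fingerprint is a 
hypothetical helper name, and it does not cover the compiler, which would 
have to be checked separately.

```python
import platform


def host_fingerprint():
    """Collect kernel release, CPU architecture and libc version as strings."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),   # e.g. the running kernel version
        "machine": platform.machine(),  # e.g. i686 or x86_64
        "libc": ("%s %s" % (libc_name, libc_version)).strip(),
    }


if __name__ == "__main__":
    for key, value in sorted(host_fingerprint().items()):
        print("%-8s %s" % (key, value))
```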

In any case, if anyone is having success with CentOS 4.3 (32-bit), please 
speak up. I'd like to get to the bottom of it. I generally do not like to 
run Fedora on production equipment, as it tends to be bleeding edge. In 
this case, FC5 is running 2.6.16 something..

-- 
Vice President of N2Net, a New Age Consulting Service, Inc. Company
 http://www.n2net.net Where everything clicks into place!
 KP-216-121-ST


___
--Bandwidth and Colocation provided by Easynews.com --

Asterisk-Users mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users


Re: [Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread Greg Oliver
On Mon, 2006-05-22 at 12:16 -0400, Greg Boehnlein wrote:
> I'm scratching my head here, because I generally have had excellent 
> experiences with Centos. However, I have NO idea what might be the issue 
> here. Could it be the kernel? (We tried three different ones!). Could it 
> be the libc? Maybe it is the compiler?
> 
> In any case, if anyone is having success with Centos 4.3 (32 bit), please 
> speak up. I'd like to get to the bottom of it. I generally do not like to 
> run Fedora on production equipment as it is generally bleeding edge. In 
> this case, FC5 is running 2.6.16 something..
 

Have you tried compiling statically on CentOS 4.2 and running on 4.3?

I am assuming you have made sure the dist is up to date with patches.
We do not use 729, so I cannot try it out for you, but we do use CentOS.
Is it only w/ SVN, or all releases of *?

-Greg



Re: [Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread Kevin P. Fleming
Greg Oliver wrote:

> I am assuming you have made sure the dist is up to date with patches.
> We do not use 729, so I cannot try it out for you, but we do use CentOS.
> Is it only w/ SVN, or all releases of *?

The problem does not appear to be happening in Asterisk itself, but in
the G.729 codec module. The symptom is a failure of a floating-point
expression to produce the proper result, leading to a segfault when the
code tries to access a non-existent array member.
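
To illustrate the failure mode in general terms: when an array index is 
derived from a floating-point result, one wrong value is enough to land 
outside the array. The sketch below is hypothetical Python, not the G.729 
module's code; table_lookup, its parameters and the sample table are 
made up for the example. It shows the kind of range check that turns a 
crash into a reportable error.

```python
def table_lookup(table, value, scale):
    """Return table[int(value * scale)], refusing out-of-range indexes."""
    index = int(value * scale)  # index computed from a floating-point product
    if not 0 <= index < len(table):
        raise IndexError(
            "computed index %d outside table of %d entries" % (index, len(table))
        )
    return table[index]


if __name__ == "__main__":
    gains = [10, 20, 30]
    print(table_lookup(gains, 0.5, 4.0))  # 0.5 * 4.0 = 2.0, so entry 2
```

In C there is no such check by default, so a bad product goes straight to 
an invalid memory access.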

We tried building the codec without any optimization at all, and still
experienced the same issue. This points to some sort of 'core' problem
on the system, but as Greg said, hardware has been ruled out. That
leaves basically the kernel and the C library's floating point stuff, I
believe... but switching kernels did not help (although they were all
very similar kernels).


Re: [Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread alist

Greg Boehnlein wrote:

> Hello,
> I was wondering if anyone out there is successfully running 
> Asterisk 1.2 svn w/ Centos 4.3. I had an experience over the last two 
> weeks that has me scratching my head and muttering strange things in the 
> wee hours of the morning. I am going to try and be as descriptive as my 
> brain will allow right now, but if there is something that I do not cover, 
> please do not hesitate to ask and I'll be happy to answer.

Greg,

When I upgraded to 4.3 I experienced problems with some non-Asterisk 
RPMs that were compiled on earlier versions of CentOS 4. Once they were 
recompiled on a fully updated 4.3 system they worked fine. Have you 
tried recompiling everything?


Andrew


Re: [Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread Greg Boehnlein
On Mon, 22 May 2006, Greg Oliver wrote:

> Have you tried compiling statically on CentOS 4.2 and running on 4.3?

No. Not really in the plans either. Standard policy w/ Asterisk around 
here is to compile on the box it is going to be running on, under the 
distro it's running on.
 
> I am assuming you have made sure the dist is up to date with patches.
> We do not use 729, so I cannot try it out for you, but we do use CentOS.
> Is it only w/ SVN, or all releases of *?

This happens to be with the 1.2 SVN branch.

-- 
Vice President of N2Net, a New Age Consulting Service, Inc. Company
 http://www.n2net.net Where everything clicks into place!
 KP-216-121-ST





Re: [Asterisk-Users] Centos 4.3 Issues

2006-05-22 Thread Greg Boehnlein
On Mon, 22 May 2006, alist wrote:

> Greg,
> 
> When I upgraded to 4.3 I experienced problems with some non-asterisk 
> RPM's that were compiled on earlier versions of CentOS 4. Once they were 
> recompiled on a fully updated 4.3 system they worked fine. Have you 
> tried recompiling everything?

We recompiled Asterisk, libpri and zaptel. One system was an upgrade 
from CentOS 4.2 to CentOS 4.3, but the other two were installed w/ the 
latest CentOS 4.3 ISO downloaded last week.

-- 
Vice President of N2Net, a New Age Consulting Service, Inc. Company
 http://www.n2net.net Where everything clicks into place!
 KP-216-121-ST


