Re: [asterisk-users] How can I check backtrace files ? [SOLVED]

2017-12-07 Thread Olivier
Thank you very much, guys, for your replies!
Now I can't wait for our next segfault to happen ;-)
-- 

Re: [asterisk-users] How can I check backtrace files ?

2017-12-07 Thread George Joseph
On Wed, Dec 6, 2017 at 11:13 AM, Olivier  wrote:

> So, with an artificial kill -s SIGSEGV, does the output below prove that
> I have workable .txt files (having .txt files that let people find the
> root cause of the issue is another story, as we can probably only hope
> for the best here)?
>
>
> # head core-brief.txt
> !@!@!@! brief.txt !@!@!@!
>
>
> Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x55cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910
> "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978>
> "worker_idle", cond_name=0x55cdcb7d4b7f ">cond",
> mutex_name=0x55cdcb7d4b71 ">lock", cond=0x7f2abc000978,
> t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
> #2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at
> threadpool.c:1131
> #3  0x55cdcb75ce61 in worker_start (arg=0x7f2abc000970) at
> threadpool.c:1022
> #4  0x55cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
> #5  0x7f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at
> pthread_create.c:333
>


That's it!  The key pieces of information are the function names
(worker_idle, worker_start, etc.), the file names (threadpool.c, etc.) and
the line numbers (1131, 1022, etc.).
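
To make that concrete, here is a minimal sketch (not an official Asterisk
tool) that pulls those three pieces of information out of gdb-style frames
like the ones in the sample.  The names FRAME_RE and parse_frames are made
up for illustration, and the pattern only assumes the frame layout shown in
the sample above.

```python
import re

# Matches a gdb frame such as:
#   #2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
# The address is optional because frame #0 is usually printed without one.
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+ in )?"
    r"(?P<func>\S+) \(.*\) at (?P<file>\S+):(?P<line>\d+)$"
)


def parse_frames(text):
    """Return (frame number, function, file, line) for frames that have symbols."""
    # Long frames are often wrapped by mail clients, so glue continuation
    # lines back onto the frame they belong to before matching.
    joined = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(("#", "Thread ")) or not line or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line

    frames = []
    for line in joined:
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m["num"]), m["func"], m["file"], int(m["line"])))
    return frames


if __name__ == "__main__":
    sample = (
        "#2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at\n"
        "threadpool.c:1131\n"
        "#3  0x55cdcb75ce61 in worker_start (arg=0x7f2abc000970) at\n"
        "threadpool.c:1022\n"
    )
    for frame in parse_frames(sample):
        print(frame)
```

Frames without symbols simply do not match, so an empty result on a real
backtrace is itself a hint that something is missing.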




-- 
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

Check out the new Asterisk community forum at: https://community.asterisk.org/

New to Asterisk? Start here:
  https://wiki.asterisk.org/wiki/display/AST/Getting+Started

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users

Re: [asterisk-users] How can I check backtrace files ?

2017-12-06 Thread Richard Mudgett
On Wed, Dec 6, 2017 at 12:13 PM, Olivier  wrote:

> So, with an artificial kill -s SIGSEGV, does the output below prove that
> I have workable .txt files (having .txt files that let people find the
> root cause of the issue is another story, as we can probably only hope
> for the best here)?
>
>
> # head core-brief.txt
> !@!@!@! brief.txt !@!@!@!
>
>
> Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x55cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910
> "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978>
> "worker_idle", cond_name=0x55cdcb7d4b7f ">cond",
> mutex_name=0x55cdcb7d4b71 ">lock", cond=0x7f2abc000978,
> t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
> #2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at
> threadpool.c:1131
> #3  0x55cdcb75ce61 in worker_start (arg=0x7f2abc000970) at
> threadpool.c:1022
> #4  0x55cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
> #5  0x7f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at
> pthread_create.c:333
>

The number one question when you supply a backtrace: Does it have symbols?

So yes, the sample above is at least workable.  It has symbols, as it shows
the function name, source file name, and line number in the backtrace.
Without symbols nobody can look at the backtrace and see what is going on.
It is just a bunch of numbers and question marks (??) with maybe a public
function name.
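
As a rough self-check, the symbol question can be approximated by counting
how many frames gdb could not resolve.  This is only a sketch under the
assumption that unresolved frames show "??" where the function name would
be; symbol_report and FRAME_START are hypothetical names, not part of any
Asterisk script.

```python
import re

# Start of a gdb frame: "#N", an optional "0x... in " address, then the
# function name (or "??" when gdb has no symbol for that address).
FRAME_START = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def symbol_report(text):
    """Return (total frames, frames whose function name is '??')."""
    total = 0
    unresolved = 0
    for line in text.splitlines():
        m = FRAME_START.match(line.strip())
        if not m:
            continue  # thread headers, blank lines, wrapped continuation lines
        total += 1
        if m.group(1) == "??":
            unresolved += 1
    return total, unresolved


if __name__ == "__main__":
    good = "#2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at\nthreadpool.c:1131\n"
    total, unresolved = symbol_report(good)
    print(f"{unresolved} of {total} frames have no symbols")
```

If most frames come back unresolved, the backtrace is unlikely to be of
much use to anyone reading it.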

The second question:  Is the backtrace from an unoptimized build?

Optimized builds provide some performance improvement for normal operation.
However, what the compiler does to the code can be difficult to figure out
in a backtrace.  The compiler can optimize out variables, which can make
understanding what is going on harder.

So whether an optimized backtrace can help find the root cause depends upon
what happened.  It is up to you whether you want to run in production with
an optimized build or not.
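
One quick way to gauge how much an optimized build has hidden is to count
the values gdb reports as "<optimized out>" in a backtrace file.  The
helper name below is made up, and the count is only an indicator, not a
verdict on whether the backtrace is usable.

```python
def count_optimized_out(text):
    """Count values that gdb printed as '<optimized out>' in a backtrace."""
    # Collapse all whitespace first so a marker split across a wrapped
    # line ("<optimized" / "out>") is still counted.
    normalized = " ".join(text.split())
    return normalized.count("<optimized out>")


if __name__ == "__main__":
    # Hypothetical frame, for illustration only.
    frame = "#1  0x55cd in some_function (chan=<optimized out>, data=0x7f2a) at some_file.c:10"
    print(count_optimized_out(frame))
```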

I also recommend always compiling with BETTER_BACKTRACES enabled in
menuselect.  With that enabled, any backtraces put into log files by FRACKS
and the lock output from the CLI command "core show locks" are
understandable when symbols are available.  You get backtraces similar to
the backtrace sample above.

Richard

Re: [asterisk-users] How can I check backtrace files ?

2017-12-06 Thread Olivier
2017-12-06 15:52 GMT+01:00 George Joseph :

>
>
> On Tue, Dec 5, 2017 at 9:20 AM, Olivier  wrote:
>
>> Hello,
>>
>> I carefully read [1] which details how backtrace files can be produced.
>>
>> Maybe this seems natural to some, but how can I go one step further, and
>> check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?
>>
>> In other words, where can I find an example on how to use one of those
>> files and check by myself, that if a system ever fails, I won't have to
>> wait for another failure to provide required data to support teams ?
>>
>
> It's a great question but I could spend a week answering it and not
> scratch the surface. :)
>

Thanks very much for trying, anyway ;-)


>  It's not a straightforward thing unless you know the code in question.
> The most common is a segmentation fault (segfault or SEGV).
>

True! I experienced segfaults lately and could not configure the platform
I used then (Debian Jessie) to produce core files in a directory Asterisk
can write into.
Now, with Debian Stretch, I can produce a core file at will (with a kill -s
SIGSEGV).
I checked that ast_coredumper worked OK, as it produced thread.txt files
and so on.

Ideally, I would like to go one step further: check now that a future .txt
file would be "workable" (and not "you should have compiled with option XXX
or configured with option YYY").



>   In that case, the thread1.txt file is the place to start.  Since most of
> the objects passed around are really pointers to objects, the most obvious
> cause would be a 0x0 for a value.  So for instance "chan=0x0".  That would
> be a pointer to a channel object that was not set when it probably should
> have been.  Unfortunately, it's not only 0x0 that could cause a segv.
> Anytime a program tries to access memory it doesn't own, that signal is
> raised.  So let's say there's a 256-byte buffer which the process owns.  If
> there's a bug somewhere that causes the program to try and access bytes
> beyond the end of the buffer, you MAY get a segv if that process doesn't
> also own that memory.  In this case, the backtrace won't show anything
> obvious because the pointers all look valid.  There probably would be an
> index variable (i or ix, etc.) that may be set to 257 but you'd have to know
> that the buffer was only 256 bytes to realize that that was the issue.
>

So, with an artificial kill -s SIGSEGV, does the output below prove that
I have workable .txt files (having .txt files that let people find the
root cause of the issue is another story, as we can probably only hope
for the best here)?


# head core-brief.txt
!@!@!@! brief.txt !@!@!@!


Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x55cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910 "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978> "worker_idle", cond_name=0x55cdcb7d4b7f ">cond", mutex_name=0x55cdcb7d4b71 ">lock", cond=0x7f2abc000978, t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
#2  0x55cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
#3  0x55cdcb75ce61 in worker_start (arg=0x7f2abc000970) at threadpool.c:1022
#4  0x55cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
#5  0x7f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at pthread_create.c:333


> Deadlocks are even harder to troubleshoot.  For that, you need to look at
> full.txt to see where the threads are stuck and find the 1 thread that's
> holding the lock that the others are stuck on.
>
> Sorry.  I wish I had a better answer because it'd help a lot if folks
> could do more investigation themselves.
>
>
>
>
>

Re: [asterisk-users] How can I check backtrace files ?

2017-12-06 Thread George Joseph
On Tue, Dec 5, 2017 at 9:20 AM, Olivier  wrote:

> Hello,
>
> I carefully read [1] which details how backtrace files can be produced.
>
> Maybe this seems natural to some, but how can I go one step further, and
> check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?
>
> In other words, where can I find an example on how to use one of those
> files and check by myself, that if a system ever fails, I won't have to
> wait for another failure to provide required data to support teams ?
>

It's a great question but I could spend a week answering it and not scratch
the surface. :)   It's not a straightforward thing unless you know the code
in question.  The most common failure is a segmentation fault (segfault or
SEGV).  In that case, the thread1.txt file is the place to start.  Since
most of the objects passed around are really pointers to objects, the most
obvious cause would be a 0x0 for a value.  So for instance "chan=0x0".  That
would be a pointer to a channel object that was not set when it probably
should have been.  Unfortunately, it's not only 0x0 that could cause a segv.
Anytime a program tries to access memory it doesn't own, that signal is
raised.  So let's say there's a 256-byte buffer which the process owns.  If
there's a bug somewhere that causes the program to try and access bytes
beyond the end of the buffer, you MAY get a segv if that process doesn't
also own that memory.  In this case, the backtrace won't show anything
obvious because the pointers all look valid.  There probably would be an
index variable (i or ix, etc.) that may be set to 257 but you'd have to know
that the buffer was only 256 bytes to realize that that was the issue.
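
For the easy case (a 0x0 where a pointer should be), a first pass over
thread1.txt can be automated.  The following is a minimal sketch: the
function name is made up, the frames in the demo are invented for
illustration, and it cannot catch the buffer overrun case, where every
pointer looks valid.

```python
import re

# An argument or variable printed as exactly "name=0x0", e.g. "chan=0x0".
# The trailing \b keeps values such as 0x0978 from matching.
NULL_ARG = re.compile(r"\b(\w+)=0x0\b")


def null_pointer_args(text):
    """Return the names of all values shown as 0x0 in a backtrace."""
    names = []
    for line in text.splitlines():
        names.extend(m.group(1) for m in NULL_ARG.finditer(line))
    return names


if __name__ == "__main__":
    # Hypothetical frames, not taken from a real crash.
    backtrace = (
        "#1  0x55cd0001 in some_read (chan=0x0) at some_file.c:100\n"
        "#2  0x55cd0002 in some_loop (p=0x7f2abc000970, q=0x0, n=0x0978) at other.c:1\n"
    )
    print(null_pointer_args(backtrace))
```

A hit is only a starting point: some pointers are legitimately NULL, so you
still need to know the code to tell whether a given 0x0 is the culprit.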

Deadlocks are even harder to troubleshoot.  For that, you need to look at
full.txt to see where the threads are stuck and find the one thread that's
holding the lock that the others are stuck on.
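
A script can at least narrow down which threads are waiting on a lock.
This sketch assumes a gdb-style full.txt with "Thread N (...)" headers and
assumes that a frame mentioning __lll_lock_wait or pthread_mutex_lock means
the thread is blocked on a mutex.  Both are assumptions, and the names used
are hypothetical.  It does not find the thread holding the lock; that part
still needs a human reading the code.

```python
import re

THREAD_HEADER = re.compile(r"^Thread (\d+) \(")

# Assumption: frames containing one of these names belong to a thread that
# is waiting to acquire a mutex.
LOCK_WAIT_HINTS = ("__lll_lock_wait", "pthread_mutex_lock")


def threads_waiting_on_locks(text):
    """Return thread numbers whose backtrace contains a lock-wait frame."""
    waiting = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_HEADER.match(line)
        if m:
            current = int(m.group(1))
        elif current is not None and any(hint in line for hint in LOCK_WAIT_HINTS):
            if current not in waiting:
                waiting.append(current)
    return waiting


if __name__ == "__main__":
    # Hypothetical, heavily shortened full.txt content.
    full_txt = (
        "Thread 3 (Thread 0x7f01 (LWP 103)):\n"
        "#0  __lll_lock_wait () at lowlevellock.S:135\n"
        "\n"
        "Thread 2 (Thread 0x7f02 (LWP 102)):\n"
        "#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at pthread_cond_timedwait.S:225\n"
    )
    print(threads_waiting_on_locks(full_txt))
```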

Sorry.  I wish I had a better answer because it'd help a lot if folks could
do more investigation themselves.








-- 
George Joseph
Digium, Inc. | Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org

[asterisk-users] How can I check backtrace files ?

2017-12-05 Thread Olivier
Hello,

I carefully read [1], which details how backtrace files can be produced.

Maybe this seems natural to some, but how can I go one step further and
check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?

In other words, where can I find an example of how to use one of those
files and check by myself that, if a system ever fails, I won't have to
wait for another failure to provide the required data to support teams?

Best regards

[1] https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace