On 3/23/23 07:14, Ales Musil wrote:
> 
> 
> On Wed, Mar 22, 2023 at 3:00 PM Aaron Conole <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     Hi Ales,
> 
>  
> Hi Aaron,
> 
> 
>     Ales Musil <[email protected] <mailto:[email protected]>> writes:
> 
>     > Use the backtrace functions that is provided by libc,
>     > this allows us to get backtrace that is independent of
>     > the current memory map of the process. Which in turn can
>     > be used for debugging/tracing purpose. The backtrace
>     > is not 100% accurate due to various optimizations, most
>     > notably "-fomit-frame-pointer" and LTO. This might result
>     > that the line in source file doesn't correspond to the
>     > real line. However, it should be able to pinpoint at least
>     > the function where the backtrace was called.
>     >
>     > The usage for SIGSEGV is determined during compilation
>     > based on available libraries. Libunwind has higher priority
>     > if both methods are available to keep the compatibility with
>     > current behavior.
>     >
>     > The backtrace is not marked as signal safe however the backtrace
>     > manual page gives more detailed explanation why it might be the
>     > case [0]. Load the "libgcc" or equivalent in advance within the
>     > "fatal_signal_init" which should ensure that subsequent calls
>     > to backtrace* do not call malloc and are signal safe.
>     >
>     > The typical backtrace will look similar to the one below:
>     > /lib64/libopenvswitch-3.1.so 
> <http://libopenvswitch-3.1.so>.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
>     > /lib64/libopenvswitch-3.1.so 
> <http://libopenvswitch-3.1.so>.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
>     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
>     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) 
> [0x7fc5db563a8d]
>     > ovsdb-server(+0xa661) [0x562cfce2e661]
>     > ovsdb-server(+0x7e39) [0x562cfce2be39]
>     > /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
>     > /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
>     > ovsdb-server(+0x8c35) [0x562cfce2cc35]
>     >
>     > backtrace.h elaborates on how to effectively get the line
>     > information associated with the addressed presented in the
>     > backtrace.
>     >
>     > [0]
>     > backtrace() and backtrace_symbols_fd() don't call malloc()
>     > explicitly, but they are part of libgcc, which gets loaded
>     > dynamically when first used.  Dynamic loading usually triggers
>     > a call to malloc(3).  If you need certain calls to these two
>     > functions to not allocate memory (in signal handlers, for
>     > example), you need to make sure libgcc is loaded beforehand
>     >
>     > Reported-at: https://bugzilla.redhat.com/2177760 
> <https://bugzilla.redhat.com/2177760>
>     > Signed-off-by: Ales Musil <[email protected] <mailto:[email protected]>>
>     > ---
>     > v2: Extend the current use of libunwind rather than replacing it.
> 
>     Thanks for this.  Sorry for the quick review here - I've not gotten a
>     chance to get too in-depth.  Comments below.
> 
> 
> Thanks for the feedback, please see the reply below.
>  
> 
> 
>     > ---
>     >  include/openvswitch/vlog.h |   4 +-
>     >  lib/backtrace.c            |  77 +++++++++-------------------
>     >  lib/backtrace.h            |  58 ++++++++++++---------
>     >  lib/daemon-unix.c          |   1 -
>     >  lib/fatal-signal.c         | 102 ++++++++++++++++++++++---------------
>     >  lib/ovsdb-error.c          |  15 ++----
>     >  lib/vlog.c                 |  14 ++---
>     >  m4/openvswitch.m4          |   9 ++--
>     >  tests/atlocal.in <http://atlocal.in>           |   2 +
>     >  tests/daemon.at <http://daemon.at>            |  71 
> ++++++++++++++++++++++++++
>     >  10 files changed, 208 insertions(+), 145 deletions(-)

<snip>

>     >          return;
>     >      }
>     > 
>     > @@ -204,44 +212,58 @@ send_backtrace_to_monitor(void) {
>     >          dep++;
>     >      }
>     > 
>     > -    if (monitor) {
> 
>     Hrrm.. did we lose the ability to send the backtrace back to the monitor
>     with this patchset?  It looks like now the daemon that is failing to
>     segv direct writes the data to the logfile, did I get it right?  If so,
>     why change this behavior?
> 
> 
> 
> Unless I'm missing something we don't need it. AFAIU the log file is the same 
> for both
> daemon and the monitor. So we can actually do the operation directly without 
> the need
> to send it to monitor with the same result.

I think, we need to send to the monitor.  We're in the signal context here
and, even worse, the SIGSEGV context.  Chances are, this process is messed up.
Even under normal conditions disk writes are tricky and there is no guarantee
that successful write() call actually results in data being written.
Here, in the signal handler, we do not re-try or even check the result.  And
we're definitely not syncing the file system before re-rising the signal.
This write() call is just a last resort when we can't really do anything else,
and we just hope that some of that data will appear in the log file.  It may
also end up in the middle of some other log messages that we're asynchronously
requested form other threads before the crash happened.

With writing from the monitor we have a proper logging with all the error
handling that will make sure that data wasn't lost.  And we're writing a much
smaller chunk of data to a much more reliable pipe from the signal handler,
increasing our chances for success.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to