Re: [ovs-dev] [PATCH v2] backtrace: Extent the backtrace functionality

Ales Musil Thu, 23 Mar 2023 06:20:09 -0700

On Thu, Mar 23, 2023 at 2:09 PM Ilya Maximets <[email protected]> wrote:


> On 3/23/23 07:14, Ales Musil wrote:
> >
> >
> > On Wed, Mar 22, 2023 at 3:00 PM Aaron Conole <[email protected]
> <mailto:[email protected]>> wrote:
> >
> >     Hi Ales,
> >
> >
> > Hi Aaron,
> >
> >
> >     Ales Musil <[email protected] <mailto:[email protected]>> writes:
> >
> >     > Use the backtrace functions that is provided by libc,
> >     > this allows us to get backtrace that is independent of
> >     > the current memory map of the process. Which in turn can
> >     > be used for debugging/tracing purpose. The backtrace
> >     > is not 100% accurate due to various optimizations, most
> >     > notably "-fomit-frame-pointer" and LTO. This might result
> >     > that the line in source file doesn't correspond to the
> >     > real line. However, it should be able to pinpoint at least
> >     > the function where the backtrace was called.
> >     >
> >     > The usage for SIGSEGV is determined during compilation
> >     > based on available libraries. Libunwind has higher priority
> >     > if both methods are available to keep the compatibility with
> >     > current behavior.
> >     >
> >     > The backtrace is not marked as signal safe however the backtrace
> >     > manual page gives more detailed explanation why it might be the
> >     > case [0]. Load the "libgcc" or equivalent in advance within the
> >     > "fatal_signal_init" which should ensure that subsequent calls
> >     > to backtrace* do not call malloc and are signal safe.
> >     >
> >     > The typical backtrace will look similar to the one below:
> >     > /lib64/libopenvswitch-3.1.so 
> > <http://libopenvswitch-3.1.so>.0(backtrace_capture+0x1e)
> [0x7fc5db298dfe]
> >     > /lib64/libopenvswitch-3.1.so 
> > <http://libopenvswitch-3.1.so>.0(log_backtrace_at+0x57)
> [0x7fc5db2999e7]
> >     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
> >     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d)
> [0x7fc5db563a8d]
> >     > ovsdb-server(+0xa661) [0x562cfce2e661]
> >     > ovsdb-server(+0x7e39) [0x562cfce2be39]
> >     > /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
> >     > /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
> >     > ovsdb-server(+0x8c35) [0x562cfce2cc35]
> >     >
> >     > backtrace.h elaborates on how to effectively get the line
> >     > information associated with the addressed presented in the
> >     > backtrace.
> >     >
> >     > [0]
> >     > backtrace() and backtrace_symbols_fd() don't call malloc()
> >     > explicitly, but they are part of libgcc, which gets loaded
> >     > dynamically when first used.  Dynamic loading usually triggers
> >     > a call to malloc(3).  If you need certain calls to these two
> >     > functions to not allocate memory (in signal handlers, for
> >     > example), you need to make sure libgcc is loaded beforehand
> >     >
> >     > Reported-at: https://bugzilla.redhat.com/2177760 <
> https://bugzilla.redhat.com/2177760>
> >     > Signed-off-by: Ales Musil <[email protected] <mailto:
> [email protected]>>
> >     > ---
> >     > v2: Extend the current use of libunwind rather than replacing it.
> >
> >     Thanks for this.  Sorry for the quick review here - I've not gotten a
> >     chance to get too in-depth.  Comments below.
> >
> >
> > Thanks for the feedback, please see the reply below.
> >
> >
> >
> >     > ---
> >     >  include/openvswitch/vlog.h |   4 +-
> >     >  lib/backtrace.c            |  77 +++++++++-------------------
> >     >  lib/backtrace.h            |  58 ++++++++++++---------
> >     >  lib/daemon-unix.c          |   1 -
> >     >  lib/fatal-signal.c         | 102
> ++++++++++++++++++++++---------------
> >     >  lib/ovsdb-error.c          |  15 ++----
> >     >  lib/vlog.c                 |  14 ++---
> >     >  m4/openvswitch.m4          |   9 ++--
> >     >  tests/atlocal.in <http://atlocal.in>           |   2 +
> >     >  tests/daemon.at <http://daemon.at>            |  71
> ++++++++++++++++++++++++++
> >     >  10 files changed, 208 insertions(+), 145 deletions(-)
>
> <snip>
>
> >     >          return;
> >     >      }
> >     >
> >     > @@ -204,44 +212,58 @@ send_backtrace_to_monitor(void) {
> >     >          dep++;
> >     >      }
> >     >
> >     > -    if (monitor) {
> >
> >     Hrrm.. did we lose the ability to send the backtrace back to the
> monitor
> >     with this patchset?  It looks like now the daemon that is failing to
> >     segv direct writes the data to the logfile, did I get it right?  If
> so,
> >     why change this behavior?
> >
> >
> >
> > Unless I'm missing something we don't need it. AFAIU the log file is the
> same for both
> > daemon and the monitor. So we can actually do the operation directly
> without the need
> > to send it to monitor with the same result.
>
> I think, we need to send to the monitor.  We're in the signal context here
> and, even worse, the SIGSEGV context.  Chances are, this process is messed
> up.
> Even under normal conditions disk writes are tricky and there is no
> guarantee
> that successful write() call actually results in data being written.
> Here, in the signal handler, we do not re-try or even check the result.
> And
> we're definitely not syncing the file system before re-rising the signal.
> This write() call is just a last resort when we can't really do anything
> else,
> and we just hope that some of that data will appear in the log file.  It
> may
> also end up in the middle of some other log messages that we're
> asynchronously
> requested form other threads before the crash happened.
>
> With writing from the monitor we have a proper logging with all the error
> handling that will make sure that data wasn't lost.  And we're writing a
> much
> smaller chunk of data to a much more reliable pipe from the signal handler,
> increasing our chances for success.
>
> Best regards, Ilya Maximets.
>
>
That makes sense, there is a catch, we need to have a common format
(string?) for both
libunwind and backtrace or something that would differentiate them for the
logging
inside monitor. Because the libunwind currently relies on the fact that it
gets decoded
in the monitor, but backtrace_monitor_fd will send it as a sequence of
strings.

Thanks.
Ales

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA <https://www.redhat.com>

[email protected]    IM: amusil
<https://red.ht/sig>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] backtrace: Extent the backtrace functionality

Reply via email to