On 3/23/23 14:19, Ales Musil wrote:
> 
> 
> On Thu, Mar 23, 2023 at 2:09 PM Ilya Maximets <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On 3/23/23 07:14, Ales Musil wrote:
>     >
>     >
>     > On Wed, Mar 22, 2023 at 3:00 PM Aaron Conole <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>> wrote:
>     >
>     >     Hi Ales,
>     >
>     >  
>     > Hi Aaron,
>     >
>     >
>     >     Ales Musil <[email protected] <mailto:[email protected]> 
> <mailto:[email protected] <mailto:[email protected]>>> writes:
>     >
>     >     > Use the backtrace functions that is provided by libc,
>     >     > this allows us to get backtrace that is independent of
>     >     > the current memory map of the process. Which in turn can
>     >     > be used for debugging/tracing purpose. The backtrace
>     >     > is not 100% accurate due to various optimizations, most
>     >     > notably "-fomit-frame-pointer" and LTO. This might result
>     >     > that the line in source file doesn't correspond to the
>     >     > real line. However, it should be able to pinpoint at least
>     >     > the function where the backtrace was called.
>     >     >
>     >     > The usage for SIGSEGV is determined during compilation
>     >     > based on available libraries. Libunwind has higher priority
>     >     > if both methods are available to keep the compatibility with
>     >     > current behavior.
>     >     >
>     >     > The backtrace is not marked as signal safe however the backtrace
>     >     > manual page gives more detailed explanation why it might be the
>     >     > case [0]. Load the "libgcc" or equivalent in advance within the
>     >     > "fatal_signal_init" which should ensure that subsequent calls
>     >     > to backtrace* do not call malloc and are signal safe.
>     >     >
>     >     > The typical backtrace will look similar to the one below:
>     >     > /lib64/libopenvswitch-3.1.so <http://libopenvswitch-3.1.so> 
> <http://libopenvswitch-3.1.so 
> <http://libopenvswitch-3.1.so>>.0(backtrace_capture+0x1e) [0x7fc5db298dfe]
>     >     > /lib64/libopenvswitch-3.1.so <http://libopenvswitch-3.1.so> 
> <http://libopenvswitch-3.1.so 
> <http://libopenvswitch-3.1.so>>.0(log_backtrace_at+0x57) [0x7fc5db2999e7]
>     >     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b]
>     >     > /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) 
> [0x7fc5db563a8d]
>     >     > ovsdb-server(+0xa661) [0x562cfce2e661]
>     >     > ovsdb-server(+0x7e39) [0x562cfce2be39]
>     >     > /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a]
>     >     > /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b]
>     >     > ovsdb-server(+0x8c35) [0x562cfce2cc35]
>     >     >
>     >     > backtrace.h elaborates on how to effectively get the line
>     >     > information associated with the addressed presented in the
>     >     > backtrace.
>     >     >
>     >     > [0]
>     >     > backtrace() and backtrace_symbols_fd() don't call malloc()
>     >     > explicitly, but they are part of libgcc, which gets loaded
>     >     > dynamically when first used.  Dynamic loading usually triggers
>     >     > a call to malloc(3).  If you need certain calls to these two
>     >     > functions to not allocate memory (in signal handlers, for
>     >     > example), you need to make sure libgcc is loaded beforehand
>     >     >
>     >     > Reported-at: https://bugzilla.redhat.com/2177760 
> <https://bugzilla.redhat.com/2177760> <https://bugzilla.redhat.com/2177760 
> <https://bugzilla.redhat.com/2177760>>
>     >     > Signed-off-by: Ales Musil <[email protected] 
> <mailto:[email protected]> <mailto:[email protected] 
> <mailto:[email protected]>>>
>     >     > ---
>     >     > v2: Extend the current use of libunwind rather than replacing it.
>     >
>     >     Thanks for this.  Sorry for the quick review here - I've not gotten 
> a
>     >     chance to get too in-depth.  Comments below.
>     >
>     >
>     > Thanks for the feedback, please see the reply below.
>     >  
>     >
>     >
>     >     > ---
>     >     >  include/openvswitch/vlog.h |   4 +-
>     >     >  lib/backtrace.c            |  77 +++++++++-------------------
>     >     >  lib/backtrace.h            |  58 ++++++++++++---------
>     >     >  lib/daemon-unix.c          |   1 -
>     >     >  lib/fatal-signal.c         | 102 
> ++++++++++++++++++++++---------------
>     >     >  lib/ovsdb-error.c          |  15 ++----
>     >     >  lib/vlog.c                 |  14 ++---
>     >     >  m4/openvswitch.m4          |   9 ++--
>     >     >  tests/atlocal.in <http://atlocal.in> <http://atlocal.in 
> <http://atlocal.in>>           |   2 +
>     >     >  tests/daemon.at <http://daemon.at> <http://daemon.at 
> <http://daemon.at>>            |  71 ++++++++++++++++++++++++++
>     >     >  10 files changed, 208 insertions(+), 145 deletions(-)
> 
>     <snip>
> 
>     >     >          return;
>     >     >      }
>     >     > 
>     >     > @@ -204,44 +212,58 @@ send_backtrace_to_monitor(void) {
>     >     >          dep++;
>     >     >      }
>     >     > 
>     >     > -    if (monitor) {
>     >
>     >     Hrrm.. did we lose the ability to send the backtrace back to the 
> monitor
>     >     with this patchset?  It looks like now the daemon that is failing to
>     >     segv direct writes the data to the logfile, did I get it right?  If 
> so,
>     >     why change this behavior?
>     >
>     >
>     >
>     > Unless I'm missing something we don't need it. AFAIU the log file is 
> the same for both
>     > daemon and the monitor. So we can actually do the operation directly 
> without the need
>     > to send it to monitor with the same result.
> 
>     I think, we need to send to the monitor.  We're in the signal context here
>     and, even worse, the SIGSEGV context.  Chances are, this process is 
> messed up.
>     Even under normal conditions disk writes are tricky and there is no 
> guarantee
>     that successful write() call actually results in data being written.
>     Here, in the signal handler, we do not re-try or even check the result.  
> And
>     we're definitely not syncing the file system before re-rising the signal.
>     This write() call is just a last resort when we can't really do anything 
> else,
>     and we just hope that some of that data will appear in the log file.  It 
> may
>     also end up in the middle of some other log messages that we're 
> asynchronously
>     requested form other threads before the crash happened.
> 
>     With writing from the monitor we have a proper logging with all the error
>     handling that will make sure that data wasn't lost.  And we're writing a 
> much
>     smaller chunk of data to a much more reliable pipe from the signal 
> handler,
>     increasing our chances for success.
> 
>     Best regards, Ilya Maximets.
> 
> 
> That makes sense, there is a catch, we need to have a common format (string?) 
> for both
> libunwind and backtrace or something that would differentiate them for the 
> logging
> inside monitor. Because the libunwind currently relies on the fact that it 
> gets decoded
> in the monitor, but backtrace_monitor_fd will send it as a sequence of 
> strings.

Monitor and the main deamon are the same binary.
Compile-time checking, i.e. conditional compiling, is fine.

> 
> Thanks.
> Ales
> 
> -- 
> 
> Ales Musil
> 
> Senior Software Engineer - OVN Core
> 
> Red Hat EMEA <https://www.redhat.com>
> 
> [email protected] <mailto:[email protected]>    IM: amusil
> 
> <https://red.ht/sig>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to