On 3/23/23 14:19, Ales Musil wrote: > > > On Thu, Mar 23, 2023 at 2:09 PM Ilya Maximets <[email protected] > <mailto:[email protected]>> wrote: > > On 3/23/23 07:14, Ales Musil wrote: > > > > > > On Wed, Mar 22, 2023 at 3:00 PM Aaron Conole <[email protected] > <mailto:[email protected]> <mailto:[email protected] > <mailto:[email protected]>>> wrote: > > > > Hi Ales, > > > > > > Hi Aaron, > > > > > > Ales Musil <[email protected] <mailto:[email protected]> > <mailto:[email protected] <mailto:[email protected]>>> writes: > > > > > Use the backtrace functions that is provided by libc, > > > this allows us to get backtrace that is independent of > > > the current memory map of the process. Which in turn can > > > be used for debugging/tracing purpose. The backtrace > > > is not 100% accurate due to various optimizations, most > > > notably "-fomit-frame-pointer" and LTO. This might result > > > that the line in source file doesn't correspond to the > > > real line. However, it should be able to pinpoint at least > > > the function where the backtrace was called. > > > > > > The usage for SIGSEGV is determined during compilation > > > based on available libraries. Libunwind has higher priority > > > if both methods are available to keep the compatibility with > > > current behavior. > > > > > > The backtrace is not marked as signal safe however the backtrace > > > manual page gives more detailed explanation why it might be the > > > case [0]. Load the "libgcc" or equivalent in advance within the > > > "fatal_signal_init" which should ensure that subsequent calls > > > to backtrace* do not call malloc and are signal safe. > > > > > > The typical backtrace will look similar to the one below: > > > /lib64/libopenvswitch-3.1.so <http://libopenvswitch-3.1.so> > <http://libopenvswitch-3.1.so > <http://libopenvswitch-3.1.so>>.0(backtrace_capture+0x1e) [0x7fc5db298dfe] > > > /lib64/libopenvswitch-3.1.so <http://libopenvswitch-3.1.so> > <http://libopenvswitch-3.1.so > <http://libopenvswitch-3.1.so>>.0(log_backtrace_at+0x57) [0x7fc5db2999e7] > > > /lib64/libovsdb-3.1.so.0(ovsdb_txn_complete+0x7b) [0x7fc5db56247b] > > > /lib64/libovsdb-3.1.so.0(ovsdb_txn_propose_commit_block+0x8d) > [0x7fc5db563a8d] > > > ovsdb-server(+0xa661) [0x562cfce2e661] > > > ovsdb-server(+0x7e39) [0x562cfce2be39] > > > /lib64/libc.so.6(+0x27b4a) [0x7fc5db048b4a] > > > /lib64/libc.so.6(__libc_start_main+0x8b) [0x7fc5db048c0b] > > > ovsdb-server(+0x8c35) [0x562cfce2cc35] > > > > > > backtrace.h elaborates on how to effectively get the line > > > information associated with the addressed presented in the > > > backtrace. > > > > > > [0] > > > backtrace() and backtrace_symbols_fd() don't call malloc() > > > explicitly, but they are part of libgcc, which gets loaded > > > dynamically when first used. Dynamic loading usually triggers > > > a call to malloc(3). If you need certain calls to these two > > > functions to not allocate memory (in signal handlers, for > > > example), you need to make sure libgcc is loaded beforehand > > > > > > Reported-at: https://bugzilla.redhat.com/2177760 > <https://bugzilla.redhat.com/2177760> <https://bugzilla.redhat.com/2177760 > <https://bugzilla.redhat.com/2177760>> > > > Signed-off-by: Ales Musil <[email protected] > <mailto:[email protected]> <mailto:[email protected] > <mailto:[email protected]>>> > > > --- > > > v2: Extend the current use of libunwind rather than replacing it. > > > > Thanks for this. Sorry for the quick review here - I've not gotten > a > > chance to get too in-depth. Comments below. > > > > > > Thanks for the feedback, please see the reply below. > > > > > > > > > --- > > > include/openvswitch/vlog.h | 4 +- > > > lib/backtrace.c | 77 +++++++++------------------- > > > lib/backtrace.h | 58 ++++++++++++--------- > > > lib/daemon-unix.c | 1 - > > > lib/fatal-signal.c | 102 > ++++++++++++++++++++++--------------- > > > lib/ovsdb-error.c | 15 ++---- > > > lib/vlog.c | 14 ++--- > > > m4/openvswitch.m4 | 9 ++-- > > > tests/atlocal.in <http://atlocal.in> <http://atlocal.in > <http://atlocal.in>> | 2 + > > > tests/daemon.at <http://daemon.at> <http://daemon.at > <http://daemon.at>> | 71 ++++++++++++++++++++++++++ > > > 10 files changed, 208 insertions(+), 145 deletions(-) > > <snip> > > > > return; > > > } > > > > > > @@ -204,44 +212,58 @@ send_backtrace_to_monitor(void) { > > > dep++; > > > } > > > > > > - if (monitor) { > > > > Hrrm.. did we lose the ability to send the backtrace back to the > monitor > > with this patchset? It looks like now the daemon that is failing to > > segv direct writes the data to the logfile, did I get it right? If > so, > > why change this behavior? > > > > > > > > Unless I'm missing something we don't need it. AFAIU the log file is > the same for both > > daemon and the monitor. So we can actually do the operation directly > without the need > > to send it to monitor with the same result. > > I think, we need to send to the monitor. We're in the signal context here > and, even worse, the SIGSEGV context. Chances are, this process is > messed up. > Even under normal conditions disk writes are tricky and there is no > guarantee > that successful write() call actually results in data being written. > Here, in the signal handler, we do not re-try or even check the result. > And > we're definitely not syncing the file system before re-rising the signal. > This write() call is just a last resort when we can't really do anything > else, > and we just hope that some of that data will appear in the log file. It > may > also end up in the middle of some other log messages that we're > asynchronously > requested form other threads before the crash happened. > > With writing from the monitor we have a proper logging with all the error > handling that will make sure that data wasn't lost. And we're writing a > much > smaller chunk of data to a much more reliable pipe from the signal > handler, > increasing our chances for success. > > Best regards, Ilya Maximets. > > > That makes sense, there is a catch, we need to have a common format (string?) > for both > libunwind and backtrace or something that would differentiate them for the > logging > inside monitor. Because the libunwind currently relies on the fact that it > gets decoded > in the monitor, but backtrace_monitor_fd will send it as a sequence of > strings.
Monitor and the main deamon are the same binary. Compile-time checking, i.e. conditional compiling, is fine. > > Thanks. > Ales > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > [email protected] <mailto:[email protected]> IM: amusil > > <https://red.ht/sig> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
