[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-03-29 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #8 from Mike J  ---
Hi.

This bug can be closed. It is not caused by valgrind.

In case it is useful to anyone in future: further checks have shown that the
problem was caused by TaniumClient version 7.4.9.1046 running on the system.
valgrind worked on this system with an earlier TaniumClient release, but
stopped working after the TaniumClient package was upgraded late last year.

As indirectly noted in the earlier comments, the C library getaddrinfo()
function is resolved lazily by the dynamic linker the first time an application
calls it. I took valgrind out of the picture and ran "/usr/bin/hostname -d"
under the gdb debugger.
- With TaniumClient running, the first instruction on entry to getaddrinfo() is
an int3 instead of the expected push %rbp. This corrupts the call stack and
raises a SIGTRAP. Under valgrind, valgrind catches the SIGTRAP and exits with a
core dump that shows a corrupt call stack.
- With TaniumClient stopped, the expected push %rbp instruction is seen. Under
valgrind, the program runs correctly and completes normally.
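
The same check can be made without a debugger. The following is a minimal
sketch in Python, assuming a Linux x86-64 system. The helper names first_byte
and describe are my own and are not part of any tool mentioned in this report.
It reads the first code byte at the resolved address of getaddrinfo() in the
current process and reports whether it is the int3 opcode (0xCC). Whether a
given agent patches every process in the same way is an assumption.

```python
import ctypes

INT3 = 0xCC      # x86 one-byte breakpoint instruction
PUSH_RBP = 0x55  # x86-64 encoding of "push %rbp"
ENDBR64 = 0xF3   # first byte of "endbr64", seen on CET-enabled builds


def first_byte(symbol):
    """Return the first code byte of `symbol` as mapped in this process."""
    # CDLL(None) is dlopen(NULL): it searches the libraries already loaded,
    # which on Linux includes the C library.
    libc = ctypes.CDLL(None)
    func = getattr(libc, symbol)  # resolved through dlsym()
    addr = ctypes.cast(func, ctypes.c_void_p).value
    return ctypes.string_at(addr, 1)[0]


def describe(byte):
    """Classify the first byte of a function prologue."""
    if byte == INT3:
        return "int3 (breakpoint): entry point has been patched"
    if byte == PUSH_RBP:
        return "push %rbp: normal frame-pointer prologue"
    if byte == ENDBR64:
        return "0xf3: probably endbr64, a normal CET prologue"
    return "0x%02x: some other instruction" % byte


if __name__ == "__main__":
    b = first_byte("getaddrinfo")
    print("getaddrinfo:", describe(b))
```

Running it once with the suspect agent active and once with it stopped gives
a quick comparison of the function entry point in the two states.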

Thanks to Paul Floyd and Mark Wielaard for looking into the problem and for the
debugging advice, which pointed me in the right direction for the diagnosis.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-28 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #6 from Mike J  ---
Thanks Paul. I was unaware of TUI mode, it's really useful.

The following extract is from the TUI asm and command windows.
It shows an int3 rather than a push %rbp on initial entry, at which point the
call stack still looks normal.
After a stepi (rather than the step I tried previously), the call stack has
become corrupted.

Is the int3 likely to be something that valgrind might introduce in place of the
push %rbp?

B+> 0x534b5e0 <__GI_getaddrinfo>    int3
    0x534b5e1 <__GI_getaddrinfo+1>  mov    %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4>  push   %r15
    0x534b5e6 <__GI_getaddrinfo+6>  push   %r14
    0x534b5e8 <__GI_getaddrinfo+8>  mov    %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11> push   %r13
    0x534b5ed <__GI_getaddrinfo+13> mov    %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16> push   %r12
    0x534b5f2 <__GI_getaddrinfo+18> mov    %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21> push   %rbx
    0x534b5f6 <__GI_getaddrinfo+22> sub    $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29> test   %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32> mov    %rcx,-0x530(%rbp)

(gdb) where
#0  __GI_getaddrinfo (name=0x5632040 "hostname.localdomain",
service=service@entry=0x0, hints=hints@entry=0x1ffefff930,
pai=pai@entry=0x1ffefff928) at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x00401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x004013e4 in main (argc=2, argv=0x1ffefffb98) at hostname.c:550

(gdb) stepi

  > 0x534b5e1 <__GI_getaddrinfo+1>  mov    %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4>  push   %r15
    0x534b5e6 <__GI_getaddrinfo+6>  push   %r14
    0x534b5e8 <__GI_getaddrinfo+8>  mov    %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11> push   %r13
    0x534b5ed <__GI_getaddrinfo+13> mov    %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16> push   %r12
    0x534b5f2 <__GI_getaddrinfo+18> mov    %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21> push   %rbx
    0x534b5f6 <__GI_getaddrinfo+22> sub    $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29> test   %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32> mov    %rcx,-0x530(%rbp)
    0x534b607 <__GI_getaddrinfo+39> movq   $0x0,-0x4c0(%rbp)

(gdb) where
#0  0x0534b5e1 in __GI_getaddrinfo (name=0x5632040
"hostname.localdomain", service=0x0, hints=0x1ffefff930, pai=0x1ffefff928)
at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x001ffefffa40 in ?? ()
#2  0x0529d226 in __GI_getenv (name=0x1ffefff930 "\002") at getenv.c:35
#3  0x0000000000000000 in ?? ()

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-27 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #4 from Mike J  ---
It was noted that we have Dynatrace OneAgent installed, which preloads one of
its libraries by adding it to /etc/ld.so.preload. Although we originally thought
it might be involved in the problem, we concluded today that it is not, after
doing two valgrind runs of hostname -d with gdb attached, one with and one
without the preloaded library.
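
A minimal sketch, in Python, of how the preload list can be inspected before
each run. It assumes the format described in ld.so(8), a whitespace-separated
list of shared objects. Treating text after '#' as a comment is my assumption,
and the function names preload_entries and read_preload are hypothetical.

```python
import os


def preload_entries(text):
    """Return the library entries listed in ld.so.preload-style text."""
    entries = []
    for line in text.splitlines():
        # Assumption: anything after '#' is a comment.
        line = line.split("#", 1)[0]
        entries.extend(line.split())
    return entries


def read_preload(path="/etc/ld.so.preload"):
    """Read the preload file; a missing file means nothing is preloaded."""
    if not os.path.exists(path):
        return []
    with open(path) as handle:
        return preload_entries(handle.read())


if __name__ == "__main__":
    for entry in read_preload():
        print("preloaded:", entry)
```

An empty result after the file has been renamed confirms that a run is being
made without any preloaded library.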

Details of the two runs are shown below with gdb output, in the hope that
somebody can spot something untoward that valgrind may be doing.
In the first run, /etc/ld.so.preload is set up to dynamically link in a
Dynatrace OneAgent library for each program started.
In the second run, /etc/ld.so.preload is renamed and ldconfig is run to rebuild
the runtime shared library cache.
In both runs, gdb is attached once valgrind has started.
Breakpoints are set on show_name and getaddrinfo, and I step from the show_name
breakpoint onwards in order to also watch the dynamic linker resolve the
getaddrinfo call.
The initial step from show_name() into the getaddrinfo() call shows the dynamic
linker resolving the call from the glibc library.
On first entry into the getaddrinfo function, the call stack is OK.
On the next step instruction, the call stack becomes corrupted.
Continuing from there leads to a SIGSEGV, rather than a SIGTRAP, which crashes
the program.
Both runs are identical in outcome.
If the debugger is not attached, a SIGTRAP is raised instead, which crashes the
program.

Lines with @ below are eye catchers for relevant notes.

@
First run
@

[auser@hostname ~]$ cat /etc/ld.so.preload
/$LIB/liboneagentproc.so

[auser@hostname ~]$ ldd /usr/bin/hostname
linux-vdso.so.1 =>  (0x7ffd02d96000)
/$LIB/liboneagentproc.so => /lib64/liboneagentproc.so
(0x2af8ce652000)
libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x2af8ce860000)
libc.so.6 => /usr/lib64/libc.so.6 (0x2af8cea7a000)
/lib64/ld-linux-x86-64.so.2 (0x2af8ce42e000)

Terminal 1
valgrind --trace-signals=yes -v --log-file=valgrind.out.2  --vgdb=full
--vgdb-stop-at=startup hostname -d   
Terminal 2
 cat valgrind.out.2
==77535== Memcheck, a memory error detector
==77535== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==77535== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h
for copyright info
==77535== Command: hostname -d
==77535== Parent PID: 111647
==77535==
--77535--
--77535-- Valgrind options:
--77535--    --trace-signals=yes
--77535--    -v
--77535--    --log-file=valgrind.out.2
--77535--    --vgdb=full
--77535--    --vgdb-stop-at=startup
--77535-- Contents of /proc/version:
--77535--   Linux version 3.10.0-1160.81.1.el7.x86_64
(mockbu...@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red
Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--77535--
--77535-- Arch and hwcaps: AMD64, LittleEndian,
amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--77535-- Page sizes: currently 4096, max supported 4096
--77535-- Valgrind library directory: /home/auser/local/libexec/valgrind
--77535-- Reading syms from /usr/bin/hostname
--77535--   Considering
/usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from /usr/lib64/ld-2.17.so
--77535--   Considering
/usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from
/home/auser/local/libexec/valgrind/memcheck-amd64-linux
--77535--    object doesn't have a dynamic symbol table
--77535-- Scheduler: using generic scheduler lock implementation.
--77535-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--77535-- Reading suppressions file:
/home/auser/local/libexec/valgrind/default.supp
==77535== (action at startup) vgdb me ...
==77535== embedded gdbserver: reading from
/tmp/vgdb-pipe-from-vgdb-to-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: writing to  
/tmp/vgdb-pipe-to-vgdb-from-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: shared mem  
/tmp/vgdb-pipe-shared-mem-vgdb-77535-by-auser-on-hostname.localdomain
==77535==
==77535== TO CONTROL THIS PROCESS USING vgdb (which you probably
==77535== don't want to do, unless you know exactly what you're doing,
==77535== or are doing some strange experiment):
==77535==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
...command...
==77535==
==77535== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==77535==   /path/to/gdb hostname
==77535== and then give GDB the following command
==77535==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb
--pid=77535
==77535== --pid is optional if only one valgrind process is running
==77535==
[auser@hostname ~]$ gdb /usr/bin/hostname
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is 

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-21 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #3 from Mike J  ---
Thanks. Although the sysadmins installed the correct debuginfo for glibc and
hostname today, I won't have results I can collate from this until 23rd Feb.
I'll provide an update then.

[valgrind] [Bug 466172] New: SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-20 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

          Bug ID: 466172
         Summary: SIGTRAP crash whenever getaddrinfo call is issued by
                  valgrind
  Classification: Developer tools
         Product: valgrind
         Version: 3.20.0
        Platform: RedHat Enterprise Linux
              OS: Linux
          Status: REPORTED
        Severity: crash
        Priority: NOR
       Component: memcheck
        Assignee: jsew...@acm.org
        Reporter: do.not.spam.me.kde.bugzi...@gmail.com
Target Milestone: ---

SUMMARY
On my work RHEL 7.9 development system, valgrind dumps core and exits following
a SIGTRAP raised on a call to the C library getaddrinfo function. This
occurs when running valgrind on our internal applications that call
getaddrinfo(), and also when running valgrind on the 'hostname -d' command.
A core file is produced for the program. Output from valgrind is
available below, as is gdb output from the core file. The problem occurs with
both the memcheck tool and the callgrind tool.
The further details given below are from running valgrind with hostname -d.

No failure occurs when running valgrind on hostname with no arguments, as that
does not generate a getaddrinfo call.

The native valgrind v3.15 install for RHEL 7.9 and a locally compiled valgrind
v3.20 release give similar results. The stack trace in the core file does not
make a lot of sense, as the system doesn't currently have debuginfo packages
installed. I'm trying to arrange for a sysadmin to install these for the glibc
and hostname packages, but they are not available at the moment.

On my home host with VirtualBox, using RHEL 7.9, RHEL 8.7 and Fedora 35,
running valgrind with signal tracing turned on against 'hostname -d' shows that
two SIGSEGV signals are raised but handled by valgrind, and the command
continues to successful completion. I can't comment on whether this should or
shouldn't happen; I've just used it to establish an expected baseline. When
run with strace instead of valgrind, no signals are raised. When running strace
with valgrind and hostname -d, both strace and valgrind report the two SIGSEGV
signals.

STEPS TO REPRODUCE
1. valgrind --trace-signals=yes -v hostname -d

OBSERVED RESULT
valgrind handles a single SIGSEGV and continues, then handles a SIGTRAP and
core dumps. valgrind output indicates the crash happened during a getaddrinfo
call.

EXPECTED RESULT
A program run under valgrind should survive a call to getaddrinfo and
continue running, so that memory leak checking can be carried out.
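
A minimal sketch, in Python, for telling apart a run that dies from a signal
and one that exits normally. It assumes a POSIX system, where subprocess
reports death by signal N as return code -N. The function name run_and_report
is hypothetical. When the command is wrapped in valgrind, the status reported
is that of the valgrind process itself.

```python
import signal
import subprocess


def run_and_report(argv):
    """Run argv and describe how it ended."""
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with status %d" % proc.returncode


if __name__ == "__main__":
    print(run_and_report(["hostname", "-d"]))
```

Replacing the argument list with the valgrind command from the steps above
gives the same report for the failing case.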

SOFTWARE/OS VERSIONS
Linux: RHEL 7.9 server without GUI installed

ADDITIONAL INFORMATION
Valgrind generates the following output:
==80114== Memcheck, a memory error detector
==80114== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==80114== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h
for copyright info
==80114== Command: hostname -d
==80114== Parent PID: 20454
==80114== 
--80114-- 
--80114-- Valgrind options:
--80114--    --trace-signals=yes
--80114--    -v
--80114--    --log-file=valgrind.out
--80114-- Contents of /proc/version:
--80114--   Linux version 3.10.0-1160.81.1.el7.x86_64
(mockbu...@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red
Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--80114-- 
--80114-- Arch and hwcaps: AMD64, LittleEndian,
amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--80114-- Page sizes: currently 4096, max supported 4096
--80114-- Valgrind library directory: /home/auser/local/libexec/valgrind
--80114-- Reading syms from /usr/bin/hostname
--80114--    object doesn't have a symbol table
--80114-- Reading syms from /usr/lib64/ld-2.17.so
--80114-- Reading syms from
/home/auser/local/libexec/valgrind/memcheck-amd64-linux
--80114--    object doesn't have a dynamic symbol table
--80114-- Scheduler: using generic scheduler lock implementation.
--80114-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--80114-- Reading suppressions file:
/home/auser/local/libexec/valgrind/default.supp
==80114== embedded gdbserver: reading from
/tmp/vgdb-pipe-from-vgdb-to-80114-by-auser-on-hostname.localdomain
==80114== embedded gdbserver: writing to  
/tmp/vgdb-pipe-to-vgdb-from-80114-by-auser-on-hostname.localdomain
==80114== embedded gdbserver: shared mem  
/tmp/vgdb-pipe-shared-mem-vgdb-80114-by-auser-on-hostname.localdomain
==80114== 
==80114== TO CONTROL THIS PROCESS USING vgdb (which you probably
==80114== don't want to do, unless you know exactly what you're doing,
==80114== or are doing some strange experiment):
==80114==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=80114
...command...
==80114== 
==80114== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==80114==   /path/to/gdb hostname
==80114== and then give GDB the following command
==80114==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb