[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-08-24 Thread Thomas Akin
https://bugs.kde.org/show_bug.cgi?id=466172

Thomas Akin changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thomas.a...@tanium.com

--- Comment #13 from Thomas Akin  ---
The issue isn't caused by Tanium - it presents itself after the Tanium
recorder is configured to use eBPF to capture DNS activity on certain kernel
versions. However, the actual bug appears to be in either eBPF, the kernel, or
the debugger.

You can reproduce the issue without Tanium even being installed:

# dnf install bpftrace
# bpftrace -e 'uprobe:libc:getaddrinfo {}' &
# valgrind hostname -d

We updated our configuration options to allow you to work around the issue by
disabling DNS events on systems with this issue so that you can still run the
recorder for all other events using eBPF. As it's an underlying issue on the
systems themselves, all we can do is allow you to avoid a configuration that
will trigger the underlying problem.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-05-26 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #12 from Paul Floyd  ---
I asked Tanium about this. This is their answer:

> (Tanium UK)
>
> Hi (Customer),
>
> Unfortunately this is a recent issue we've discovered. It's related to
> Recorder (possibly recording DNS events). Our development team
> are looking into this, but I can't give any timeframes for resolution.
>
> Some people have had success by upgrading their kernel. The only other 
> workaround is to switch the THR profile to Tools only to disable the
> THR Recorder extension on affected endpoints.
>
> I would encourage you to log a case with our support centre, just to get your 
> account formally registered as being affected by this issue.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-05-03 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #11 from Paul Floyd  ---
(In reply to b_betts from comment #10)
> I found that osqueryd can also cause the same problem (running under Ubuntu
> 16.04).

Interesting. I had a quick look and can't see much that would cause it.
osquery uses a lot of third-party libraries, so the cause is probably in one of those.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-05-03 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #10 from b_be...@yahoo.com ---
I found that osqueryd can also cause the same problem (running under Ubuntu
16.04).

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-03-30 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

Paul Floyd changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |NOT A BUG
             Status|REPORTED                    |RESOLVED

--- Comment #9 from Paul Floyd  ---
Thanks for letting us know.

I also find that slightly disconcerting - the big corporate I work for also
uses Tanium, though presumably without the setting or option that caused this
problem.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-03-29 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #8 from Mike J  ---
Hi.

This bug can be closed. It is not caused by valgrind.

In case it is of use in future to anyone, further checks have shown that
TaniumClient version 7.4.9.1046 was running on the system and caused the
problem. valgrind was working on the system with an earlier TaniumClient
release, but stopped working when the TaniumClient package was upgraded late
last year, affecting valgrind runs.

As indirectly noted in the earlier comments, the C library getaddrinfo()
function is resolved by the dynamic linker when first called by an application.
I took valgrind out of the picture and ran "/usr/bin/hostname -d" under the
control of the gdb debugger.
- With TaniumClient running, on entry to getaddrinfo(), an int3 instruction is
seen instead of the expected push %rbp instruction, corrupting the call stack
and raising a SIGTRAP. If run with valgrind, valgrind catches the raised
SIGTRAP signal in this case and exits with a core dump, showing a corrupt call
stack.
- When TaniumClient is stopped, the expected push %rbp instruction is seen
instead. If run with valgrind, the program runs correctly and completes normally.
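
The same check can be made without a debugger by reading the first code byte of
getaddrinfo() in a live process. The following is only a minimal sketch,
assuming an x86-64 Linux system with glibc; the helper names
classify_entry_byte and first_byte_of are hypothetical, and the result only
describes the process the script itself runs in.

```python
import ctypes

INT3 = 0xCC       # x86 breakpoint opcode (the int3 seen in gdb)
PUSH_RBP = 0x55   # opcode of the expected "push %rbp"


def classify_entry_byte(byte):
    """Describe the first opcode byte of a function (x86-64 only)."""
    if byte == INT3:
        return "int3"
    if byte == PUSH_RBP:
        return "push %rbp"
    return "other (0x%02x)" % byte


def first_byte_of(symbol):
    """Read the first code byte of a symbol already loaded in this process."""
    # CDLL(None) gives a handle that searches the libraries already loaded.
    libc = ctypes.CDLL(None)
    addr = ctypes.cast(getattr(libc, symbol), ctypes.c_void_p).value
    return ctypes.string_at(addr, 1)[0]
```

Calling classify_entry_byte(first_byte_of("getaddrinfo")) reports which case
applies. On newer distributions the first byte may be neither value (for
example when the function starts with endbr64), so "other" does not by itself
indicate a problem.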

Thanks to Paul Floyd and Mark Wielaard for checking the problem and debugging
advice which pointed me in the right direction for problem diagnosis.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-28 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #7 from Paul Floyd  ---
(In reply to Mike J from comment #6)
> Thanks Paul. I was unaware of TUI mode, it's really useful.
> 
> The following extract is from the TUI asm and command windows.
> It shows an int3 rather than a push %rbp on the initial entry, where the call
> stack still shows as being normal.
> On the stepi (rather than a step tried previously), the call stack has then
> become corrupted.
> 
> Is the int3 likely to be something that valgrind might introduce instead of
> the push %rbp ?

int3 causes the application to stop if it is being debugged or (as the title of
this item says) terminate with SIGTRAP. Valgrind doesn't use PTRACE like
debuggers do, and it shouldn't be inserting an int3 since it will still cause a
SIGTRAP.
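
The "terminate with SIGTRAP" case can be illustrated without touching any
machine code. This is a minimal sketch, assuming a POSIX system; it uses
os.kill(..., SIGTRAP) in a child process as a stand-in for executing a stray
int3, since the signal the process receives is the same. The names
run_child_that_traps and describe are hypothetical.

```python
import signal
import subprocess
import sys

# The child sends SIGTRAP to itself; with no handler and no debugger
# attached, the default action terminates the process.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGTRAP)"


def run_child_that_traps():
    """Start a child that raises SIGTRAP and return its exit status."""
    return subprocess.run([sys.executable, "-c", CHILD_CODE]).returncode


def describe(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # subprocess reports death by signal N as -N
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode
```

Here describe(run_child_that_traps()) reports that the child was killed by
SIGTRAP, which is what a process sees when it hits a breakpoint instruction
that no debugger is waiting for.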

I don't yet have any idea what could cause the memory to be corrupted.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-28 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #6 from Mike J  ---
Thanks Paul. I was unaware of TUI mode, it's really useful.

The following extract is from the TUI asm and command windows.
It shows an int3 rather than a push %rbp on the initial entry, where the call
stack still shows as being normal.
On the stepi (rather than a step tried previously), the call stack has then
become corrupted.

Is the int3 likely to be something that valgrind might introduce instead of the
push %rbp ?

B+> 0x534b5e0 <__GI_getaddrinfo>      int3
    0x534b5e1 <__GI_getaddrinfo+1>    mov    %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4>    push   %r15
    0x534b5e6 <__GI_getaddrinfo+6>    push   %r14
    0x534b5e8 <__GI_getaddrinfo+8>    mov    %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11>   push   %r13
    0x534b5ed <__GI_getaddrinfo+13>   mov    %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16>   push   %r12
    0x534b5f2 <__GI_getaddrinfo+18>   mov    %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21>   push   %rbx
    0x534b5f6 <__GI_getaddrinfo+22>   sub    $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29>   test   %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32>   mov    %rcx,-0x530(%rbp)

(gdb) where
#0  __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=service@entry=0x0, hints=hints@entry=0x1ffefff930, pai=pai@entry=0x1ffefff928)
    at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x00401b19 in show_name (type=type@entry=DNS) at hostname.c:339
#2  0x004013e4 in main (argc=2, argv=0x1ffefffb98) at hostname.c:550

(gdb) stepi

  > 0x534b5e1 <__GI_getaddrinfo+1>    mov    %rsp,%rbp
    0x534b5e4 <__GI_getaddrinfo+4>    push   %r15
    0x534b5e6 <__GI_getaddrinfo+6>    push   %r14
    0x534b5e8 <__GI_getaddrinfo+8>    mov    %rdi,%r14
    0x534b5eb <__GI_getaddrinfo+11>   push   %r13
    0x534b5ed <__GI_getaddrinfo+13>   mov    %rsi,%r13
    0x534b5f0 <__GI_getaddrinfo+16>   push   %r12
    0x534b5f2 <__GI_getaddrinfo+18>   mov    %rdx,%r12
    0x534b5f5 <__GI_getaddrinfo+21>   push   %rbx
    0x534b5f6 <__GI_getaddrinfo+22>   sub    $0x518,%rsp
    0x534b5fd <__GI_getaddrinfo+29>   test   %rdi,%rdi
    0x534b600 <__GI_getaddrinfo+32>   mov    %rcx,-0x530(%rbp)
    0x534b607 <__GI_getaddrinfo+39>   movq   $0x0,-0x4c0(%rbp)

(gdb) where
#0  0x0534b5e1 in __GI_getaddrinfo (name=0x5632040 "hostname.localdomain", service=0x0, hints=0x1ffefff930, pai=0x1ffefff928)
    at ../sysdeps/posix/getaddrinfo.c:2208
#1  0x001ffefffa40 in ?? ()
#2  0x0529d226 in __GI_getenv (name=0x1ffefff930 "\002") at getenv.c:35
#3  0x in ?? ()
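
For reference, the difference from the on-disk code comes down to one byte.
As a rough sketch, the first four prologue bytes quoted in comment #5
(55 48 89 e5) can be compared with what the listing above implies is in
memory. The in-memory bytes are an inference: 0xcc is the standard x86
encoding of int3, and the remaining bytes are assumed unchanged because the
following mov %rsp,%rbp still disassembles correctly. The function name
diff_bytes is hypothetical.

```python
def diff_bytes(expected, actual):
    """Return (offset, expected_byte, actual_byte) for each differing position."""
    return [
        (offset, want, got)
        for offset, (want, got) in enumerate(zip(expected, actual))
        if want != got
    ]


on_disk = bytes.fromhex("554889e5")    # push %rbp; mov %rsp,%rbp (from objdump)
in_memory = bytes.fromhex("cc4889e5")  # int3; mov %rsp,%rbp (inferred from gdb)

for offset, want, got in diff_bytes(on_disk, in_memory):
    print("offset +%d: expected 0x%02x, found 0x%02x" % (offset, want, got))
# prints "offset +0: expected 0x55, found 0xcc"
```

A single overwritten first byte, with the rest of the function intact, is the
pattern a software breakpoint leaves behind, rather than what random memory
corruption would typically look like.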

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-27 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #5 from Paul Floyd  ---
Thanks for the detailed analysis.

You're stepping through the code in ld.so that resolves the PIC call (via the
PLT). It seems to resolve to some function address, but then the function
call fails. Unless the function lookup is going wrong, I can't think what
could be the problem.

"getaddrinfo" only takes 4 arguments so they will be passed in registers.

The asm for getaddrinfo is:

000efb00 <getaddrinfo>:
   efb00:   55                      push   %rbp
   efb01:   48 89 e5                mov    %rsp,%rbp
   efb04:   41 57                   push   %r15
   efb06:   41 56                   push   %r14
   efb08:   41 55                   push   %r13
   efb0a:   41 54                   push   %r12

If you do a 'step in' with gdb do you see the next instruction being this push
%rbp?

(I usually do this in TUI mode: Ctrl-x a, then split the screen with Ctrl-x 2
until I see the source / asm / command panels.)

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-27 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #4 from Mike J  ---
It was noted that we have Dynatrace OneAgent installed, which preloads one of
its libraries by adding it to /etc/ld.so.preload. Although we originally thought
it might be involved in the problem, we concluded today that it is not, by
doing two valgrind runs of hostname -d with gdb attached, with and without the
preloaded library.

The details of the two runs are shown below with gdb output, in the hope that
somebody can spot something untoward that valgrind may be doing.
- In the first run, /etc/ld.so.preload is set up to dynamically link in a
Dynatrace OneAgent library for each program started.
- In the second run, /etc/ld.so.preload is renamed and ldconfig is run to
rebuild the runtime shared library cache.
- GDB is attached once valgrind is started.
- Breakpoints are set on show_name and getaddrinfo, but I stepped through from
the show_name breakpoint to also watch the dynamic linker's behaviour in
resolving the getaddrinfo call.
- The initial step from show_name() to the getaddrinfo() call shows the dynamic
linker involved in resolving the call from the glibc library.
- On first entry into the getaddrinfo function, the call stack is OK.
- On the next step instruction, the call stack becomes corrupted.
- Continuing on leads to a SIGSEGV, rather than a SIGTRAP, which crashes the
program.
- Both runs are identical in outcome.
- If the debugger is not attached, a SIGTRAP is raised instead, which crashes
the program.

Lines with @ below are eye catchers for relevant notes

@
First run
@

[auser@hostname ~]$ cat /etc/ld.so.preload
/$LIB/liboneagentproc.so

[auser@hostname ~]$ ldd /usr/bin/hostname
        linux-vdso.so.1 =>  (0x7ffd02d96000)
        /$LIB/liboneagentproc.so => /lib64/liboneagentproc.so (0x2af8ce652000)
        libnsl.so.1 => /usr/lib64/libnsl.so.1 (0x2af8ce86)
        libc.so.6 => /usr/lib64/libc.so.6 (0x2af8cea7a000)
        /lib64/ld-linux-x86-64.so.2 (0x2af8ce42e000)

Terminal 1
valgrind --trace-signals=yes -v --log-file=valgrind.out.2 --vgdb=full --vgdb-stop-at=startup hostname -d
Terminal 2
cat valgrind.out.2
==77535== Memcheck, a memory error detector
==77535== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==77535== Using Valgrind-3.20.0-5147d671e4-20221024 and LibVEX; rerun with -h for copyright info
==77535== Command: hostname -d
==77535== Parent PID: 111647
==77535==
--77535--
--77535-- Valgrind options:
--77535--    --trace-signals=yes
--77535--    -v
--77535--    --log-file=valgrind.out.2
--77535--    --vgdb=full
--77535--    --vgdb-stop-at=startup
--77535-- Contents of /proc/version:
--77535--   Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbu...@x86-vm-38.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Thu Nov 24 12:21:22 UTC 2022
--77535--
--77535-- Arch and hwcaps: AMD64, LittleEndian, amd64-cx16-lzcnt-rdtscp-sse3-ssse3-avx-avx2-bmi-f16c-rdrand
--77535-- Page sizes: currently 4096, max supported 4096
--77535-- Valgrind library directory: /home/auser/local/libexec/valgrind
--77535-- Reading syms from /usr/bin/hostname
--77535--   Considering /usr/lib/debug/.build-id/93/633698bd11eeb4bee21a388c191a5656990d8e.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from /usr/lib64/ld-2.17.so
--77535--   Considering /usr/lib/debug/.build-id/62/c449974331341bb08dcce3859560a22af1e172.debug ..
--77535--   .. build-id is valid
--77535-- Reading syms from /home/auser/local/libexec/valgrind/memcheck-amd64-linux
--77535--    object doesn't have a dynamic symbol table
--77535-- Scheduler: using generic scheduler lock implementation.
--77535-- Max kernel-supported signal is 64, VG_SIGVGKILL is 64
--77535-- Reading suppressions file: /home/auser/local/libexec/valgrind/default.supp
==77535== (action at startup) vgdb me ...
==77535== embedded gdbserver: reading from /tmp/vgdb-pipe-from-vgdb-to-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: writing to   /tmp/vgdb-pipe-to-vgdb-from-77535-by-auser-on-hostname.localdomain
==77535== embedded gdbserver: shared mem   /tmp/vgdb-pipe-shared-mem-vgdb-77535-by-auser-on-hostname.localdomain
==77535==
==77535== TO CONTROL THIS PROCESS USING vgdb (which you probably
==77535== don't want to do, unless you know exactly what you're doing,
==77535== or are doing some strange experiment):
==77535==   /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535 ...command...
==77535==
==77535== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==77535==   /path/to/gdb hostname
==77535== and then give GDB the following command
==77535==   target remote | /home/auser/local/libexec/valgrind/../../bin/vgdb --pid=77535
==77535== --pid is optional if only one valgrind process is running
==77535==
[auser@hostname ~]$ gdb /usr/bin/hostname
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you 

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-25 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=466172

b_be...@yahoo.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |b_be...@yahoo.com

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-21 Thread Mike J
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #3 from Mike J  ---
Thanks. Although the sysadmins installed the correct debuginfo for glibc and
hostname today, I won't have collatable results from this until 23rd Feb. I'll
provide an update then.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-21 Thread Mark Wielaard
https://bugs.kde.org/show_bug.cgi?id=466172

Mark Wielaard changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m...@klomp.org

--- Comment #2 from Mark Wielaard  ---
(In reply to Paul Floyd from comment #1)
> This might be fairly tricky to reproduce. I can't reproduce this on a RHEL
> 7.9 machine using vas4 ldap.

I also am unable to reproduce. It might be helpful to install the glibc
debuginfo to get a better idea where the issue comes from.

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-21 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

--- Comment #1 from Paul Floyd  ---
This might be fairly tricky to reproduce. I can't reproduce this on a RHEL 7.9
machine using vas4 ldap.

What network config are you using?

I do get

SYSCALL[21118,1](12) sys_brk ( 0x0 ) --> [pre-success] Success(0x4224000)
--21118-- REDIR: 0x4019e40 (ld-linux-x86-64.so.2:strlen) redirected to 0x580c7ed5 (???)
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001f49, eip=0x1002bb608b, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffdf80 tid=1 ESP=0x1ffeffdf30 seg=0x1ffe801000-0x1ffeffdfff
--21118:1: signals extending a stack base 0x1ffeffe000 down by 4096 new base 0x1ffeffd000 to cover 0x1ffeffd000
--21118---> extended stack base to 0x1ffeffd000
SYSCALL[21118,1](63) sys_newuname ( 0x1ffeffdd6a )[sync] --> Success(0x0)
--21118-- REDIR: 0x4019c10 (ld-linux-x86-64.so.2:index) redirected to 0x580c7eef (???)
SYSCALL[21118,1](9) sys_mmap ( 0x0, 4096, 3, 34, -1, 0 ) --> [pre-success] Success(0x4022000)
--21118-- sync signal handler: signal=11, si_code=1, EIP=0x4001cf1, eip=0x1002bd28ae, from kernel
--21118-- SIGSEGV: si_code=1 faultaddr=0x1ffeffcef8 tid=1 ESP=0x1ffeffcef8 seg=0x1ffe801000-0x1ffeffcfff
--21118:1: signals extending a stack base 0x1ffeffd000 down by 4096 new base 0x1ffeffc000 to cover 0x1ffeffc000
--21118---> extended stack base to 0x1ffeffc000

[valgrind] [Bug 466172] SIGTRAP crash whenever getaddrinfo call is issued by valgrind

2023-02-20 Thread Paul Floyd
https://bugs.kde.org/show_bug.cgi?id=466172

Paul Floyd changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pjfl...@wanadoo.fr
