I did not make up those strace logs in my head; all I am trying to do
is triage Debian bugs. It turns out I did a pretty bad job of it:
1. The original Debian bug report seems to be PEBCAK, and I'll close
the bug as wontfix ASAP,
2. I was not paying attention to the gcc version I was using.
The original bug report
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928224
specified the reproducing case "valgrind /bin/true". That now works for me:
-----
$ valgrind /bin/true
==399== Memcheck, a memory error detector
==399== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==399== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==399== Command: /bin/true
==399==
==399==
==399== HEAP SUMMARY:
==399== in use at exit: 0 bytes in 0 blocks
==399== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==399==
==399== All heap blocks were freed -- no leaks are possible
==399==
==399== For lists of detected and suppressed errors, rerun with: -s
==399== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
-----
in the environment:
-----
$ valgrind --version
valgrind-3.20.0.GIT
$ gcc --version
gcc (Debian 10.2.1-6) 10.2.1 20210110
$ uname -a
Linux rpi2-20220121 5.10.0-15-armmp #1 SMP Debian 5.10.120-1 (2022-06-09) armv7l GNU/Linux
-----
so the original bug report can be closed with "fixed in newer version"
or something like that.
So, if my understanding is correct, valgrind built with either gcc-11
or gcc-12 (Debian packages from sid) produces this "Illegal
instruction", BUT valgrind built with gcc-10 (again the Debian package
from sid) runs fine. This also seems to be hardware-specific, since the
armhf binary built with gcc-12 runs properly on arm64 (in an armhf chroot).
Is it easy to install several versions (gcc-10, gcc-11, gcc-12, clang-13)
at the same time, and switch among them by using something like
CC=/path/to/gcc-12 ./configure
Where can I find hints about this?
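On Debian the versioned compiler packages (gcc-10, gcc-11, gcc-12,
clang-13) can be installed side by side, and each one provides its own
versioned binary such as /usr/bin/gcc-12, so passing CC= to configure is
enough to choose one. A minimal sketch, assuming those packages are
installed; the helper name list_compilers is made up for illustration:

```shell
#!/bin/sh
# Report, for each candidate compiler name, where it lives on PATH and the
# first line of its --version output, or "not installed" if it is missing.
list_compilers() {
    for cc in "$@"; do
        if cc_path=$(command -v "$cc" 2>/dev/null); then
            printf '%s\t%s\n' "$cc_path" "$("$cc" --version 2>/dev/null | head -n 1)"
        else
            printf '%s\tnot installed\n' "$cc"
        fi
    done
}

# Typical use (commented out so that sourcing this file has no side effects):
#   list_compilers gcc-10 gcc-11 gcc-12 clang-13
#   CC=gcc-12 ./configure && make
# Reconfigure from a clean tree when changing compilers, so that objects
# built by the previous compiler are not reused.
```

Checking the first line of "$CC --version" before each build is a cheap
way to avoid mixing up which compiler produced which binary.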
Would you kindly indicate whether you believe the bug should be reported
to the valgrind bug tracker or the gcc bug tracker? If that matters,
clang 13.0 also seems to mess up the valgrind code: the binaries it
produces fail with the same "Illegal instruction".
SIGILL should be diagnosed using gdb to print the instruction stream
and register contents:
-----
(gdb) run args...
Program received signal SIGILL, Illegal instruction.
(gdb) x/i $pc ## the faulting instruction
(gdb) x/12i $pc-6*4 ## disassemble the surrounding instructions
(gdb) x/12xw $pc-6*4 ## and in 32-bit raw hexadecimal
(gdb) info reg ## content of all registers
(gdb) x/16xw $sp ## dump the active end of the stack
(gdb) bt ## source-level backtrace
-----
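For an ordinary program the same commands can be collected into one
non-interactive run, which is handy for attaching the output to a bug
report. This is only a sketch: the function name sigill_report and the
GDB override variable are my own additions, while -batch, -ex and --args
are standard gdb options. It reports the state at the first stop only.

```shell
#!/bin/sh
# Run a program under gdb in batch mode and dump the machine state at the
# first stop (for example the SIGILL). Set GDB to use a different gdb binary.
sigill_report() {
    "${GDB:-gdb}" -batch \
        -ex 'run' \
        -ex 'x/i $pc' \
        -ex 'x/12i $pc-6*4' \
        -ex 'x/12xw $pc-6*4' \
        -ex 'info reg' \
        -ex 'x/16xw $sp' \
        -ex 'bt' \
        --args "$@"
}

# Example: sigill_report ./myprogram arg1 arg2 > sigill-report.txt 2>&1
```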
But with valgrind you must simply "continue" past the deliberate SIGILL
and SIGSEGV that valgrind itself uses. Here is an actual run:
-----
$ gdb valgrind
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Reading symbols from valgrind...
(gdb) run /bin/true
Starting program: /usr/local/bin/valgrind /bin/true
process 426 is executing new program:
/usr/local/libexec/valgrind/memcheck-arm-linux
Program received signal SIGILL, Illegal instruction.
vgPlain_machine_get_hwcaps () at m_machine.c:1719
1719 __asm__ __volatile__(".word 0xF3044F54"); /* VMAXNM.F32 q2,q2,q2 */
## Notice that this SIGILL is from valgrind trying to determine
## the actual hardware capabilities. Valgrind knows what it is doing,
## so just 'continue' to let valgrind handle the SIGILL.
(gdb) c
Continuing.
==426== Memcheck, a memory error detector
==426== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==426== Using Valgrind-3.20.0.GIT and LibVEX; rerun with -h for copyright info
==426== Command: /bin/true
==426==
Program received signal SIGSEGV, Segmentation fault. ## valgrind deliberate
0x62c68cc0 in ?? ()
(gdb) x/i $pc
=> 0x62c68cc0: str r3, [r9]
(gdb) p $r9
$1 = 3187663772
(gdb) p/x $r9
$2 = 0xbdffe39c
(gdb) x/12i $pc-6*4
0x62c68ca8: ldr r3, [r8, #424] ; 0x1a8
0x62c68cac: mov r1, r3
0x62c68cb0: movw r2, #62156 ; 0xf2cc
0x62c68cb4: movt r2, #22528 ; 0x5800
0x62c68cb8: blx r2
0x62c68cbc: ldr r3, [r8, #24]
=> 0x62c68cc0: str r3, [r9]
0x62c68cc4: add r7, r9, #4
0x62c68cc8: mov r0, r7
0x62c68ccc: ldr r3, [r8, #428] ; 0x1ac
0x62c68cd0: mov r1, r3
0x62c68cd4: movw r2, #62156 ; 0xf2cc
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault. ## valgrind deliberate
0x62c6cc0c in ?? ()
(gdb) x/i $pc
=> 0x62c6cc0c: str r9, [r11]
(gdb) x/12i $pc-6*4
0x62c6cbf4: ldr r9, [r8, #416] ; 0x1a0
0x62c6cbf8: mov r1, r9
0x62c6cbfc: movw r2, #62156 ; 0xf2cc
0x62c6cc00: movt r2, #22528 ; 0x5800
0x62c6cc04: blx r2
0x62c6cc08: ldr r9, [r8, #16]
=> 0x62c6cc0c: str r9, [r11]
0x62c6cc10: add r9, r11, #4
0x62c6cc14: mov r0, r9
0x62c6cc18: ldr r3, [r8, #420] ; 0x1a4
0x62c6cc1c: mov r1, r3
0x62c6cc20: movw r2, #62156 ; 0xf2cc
(gdb) c
Continuing.
==426==
==426== HEAP SUMMARY:
==426== in use at exit: 0 bytes in 0 blocks
==426== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==426==
==426== All heap blocks were freed -- no leaks are possible
==426==
==426== For lists of detected and suppressed errors, rerun with: -s
==426== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
[Inferior 1 (process 426) exited normally]
-----
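If the repeated SIGSEGV stops get in the way, gdb can be told to pass
them through without stopping, so that only a SIGILL (the hwcaps probe
or a real one) interrupts the session. A sketch under the assumption
that every SIGSEGV in the run is one of valgrind's deliberate ones; a
genuine crash would go unnoticed by gdb with this setting. The function
name gdb_valgrind and the GDB override variable are illustrative only:

```shell
#!/bin/sh
# Start gdb on valgrind with SIGSEGV delivered to the program silently.
# "handle SIGNAL nostop noprint pass" is a standard gdb command.
gdb_valgrind() {
    "${GDB:-gdb}" \
        -ex 'handle SIGSEGV nostop noprint pass' \
        --args valgrind "$@"
}

# Example: gdb_valgrind /bin/true
# then type "run" at the (gdb) prompt, and "c" at the hwcaps SIGILL.
```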
You can also use something like
objdump --disassemble=subroutine_name /path/to/binary
to be sure that the executing process matches the built software file.
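One way to make that comparison concrete is to disassemble the same
function from the installed tool and from the freshly built one, then
diff the two. A minimal sketch: the helper name disas_symbol and the
OBJDUMP override are assumptions for illustration, the build-tree path
in the example is a guess at where the tool ends up, and the
--disassemble=SYMBOL form needs a reasonably recent binutils.

```shell
#!/bin/sh
# Disassemble a single function from a binary, dropping the header line
# that names the file so that two outputs can be compared directly.
disas_symbol() {
    "${OBJDUMP:-objdump}" --disassemble="$1" "$2" | grep -v 'file format'
}

# Example, using the function and installed path from the gdb run above
# (the build-tree path is an assumption, adjust it to your tree):
#   disas_symbol vgPlain_machine_get_hwcaps \
#       /usr/local/libexec/valgrind/memcheck-arm-linux > installed.txt
#   disas_symbol vgPlain_machine_get_hwcaps \
#       ./memcheck/memcheck-arm-linux > built.txt
#   diff installed.txt built.txt
```

An empty diff means the installed binary contains the same code for that
function as the one just built.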
Right now I cannot reproduce SIGILL, so I cannot dig in further.
Based on the software that I built and ran: valgrind is not to blame;
the problem lies with the compiler, operating system, or hardware.
(In the last two months I have had four hardware failures:
a 5-port ethernet switch, the sound output on a 12-year old
consumer desktop PC, the sound output on a 5-year old
self-built x86_64 desktop, and the power brick for a RaspberryPi.)