Uploaded to bionic [0]. It is now waiting for SRU acceptance so that
valgrind can start building in bionic-proposed and the testing phase of
the SRU can begin.
[0] [ubuntu/bionic-proposed] valgrind 1:3.13.0-2ubuntu2.2 (Waiting for approval)
- Eric
** Description changed:
[Impact]
valgrind on bionic dumps core and errors out as follows:
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==11950== valgrind: Unrecognised instruction at address 0x4014c90.
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
==11950== Your program just tried to execute an instruction that Valgrind
==11950== did not recognise. There are two possible reasons for this.
==11950== 1. Your program has a bug and erroneously jumped to a non-code
==11950== location. If you are running Memcheck and you just saw a
==11950== warning about a bad jump, it's probably your program's fault.
==11950== 2. The instruction is legitimate but Valgrind doesn't handle it,
==11950== i.e. it's Valgrind's fault. If you think this is the case or
==11950== you are not sure, please let us know and we'll try to fix it.
==11950== Either way, Valgrind will now raise a SIGILL signal which will
==11950== probably kill your program.
==11950==
==11950== Process terminating with default action of signal 4 (SIGILL)
==11950== Illegal opcode at address 0x4014C90
==11950== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==11950== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==11950== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==11950== by 0x40018C3: _dl_start_final (rtld.c:414)
==11950== by 0x4001B47: _dl_start (rtld.c:523)
==11950== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
The crash occurs because Valgrind simulates the CPU when debugging a
process: it disassembles every instruction the process executes and
inserts its instrumentation at run time. In this case, Valgrind cannot
decode the "mrs %0, midr_el1" instruction, which reads the CPU
identification register MIDR_EL1 into the %0 (id) variable:
asm volatile ("mrs %0, midr_el1" : "=r"(id));
Valgrind does not recognise the "midr_el1" system register and
therefore crashes.
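To make the link between the reported opcode and the "mrs" instruction concrete, here is a minimal sketch that splits 0xD5380000 into the A64 system-register fields. The names mrs_fields, decode_mrs and is_midr_el1_read are hypothetical helpers for illustration only; this is not how VEX decodes instructions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an A64 MRS (system register read) instruction. */
struct mrs_fields {
    unsigned op0, op1, crn, crm, op2, rt;
};

/* Returns true and fills *out if insn is an MRS encoding. */
bool decode_mrs(uint32_t insn, struct mrs_fields *out)
{
    /* MRS has bits [31:20] fixed to 1101 0101 0011 (0xD53). */
    if ((insn >> 20) != 0xD53u)
        return false;
    out->op0 = 2u + ((insn >> 19) & 0x1u); /* op0 is 2 + bit 19 */
    out->op1 = (insn >> 16) & 0x7u;
    out->crn = (insn >> 12) & 0xFu;
    out->crm = (insn >> 8) & 0xFu;
    out->op2 = (insn >> 5) & 0x7u;
    out->rt  = insn & 0x1Fu;               /* destination register Xt */
    return true;
}

/* MIDR_EL1 is the system register S3_0_C0_C0_0. */
bool is_midr_el1_read(uint32_t insn)
{
    struct mrs_fields f;
    return decode_mrs(insn, &f) &&
           f.op0 == 3 && f.op1 == 0 && f.crn == 0 &&
           f.crm == 0 && f.op2 == 0;
}
```

Calling is_midr_el1_read(0xD5380000u) returns true with rt == 0, i.e. "mrs x0, midr_el1", which matches the instruction at cpu-features.c:72 in the backtrace.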
-
https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
....
d) CPU Identification :
- MIDR_EL1 is exposed to help identify the processor. On a
- heterogeneous system, this could be racy (just like getcpu()). The
- process could be migrated to another CPU by the time it uses the
- register value, unless the CPU affinity is set. Hence, there is no
- guarantee that the value reflects the processor that it is
- currently executing on. The REVIDR is not exposed due to this
- constraint, as REVIDR makes sense only in conjunction with the
- MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
- at:
+ MIDR_EL1 is exposed to help identify the processor. On a
+ heterogeneous system, this could be racy (just like getcpu()). The
+ process could be migrated to another CPU by the time it uses the
+ register value, unless the CPU affinity is set. Hence, there is no
+ guarantee that the value reflects the processor that it is
+ currently executing on. The REVIDR is not exposed due to this
+ constraint, as REVIDR makes sense only in conjunction with the
+ MIDR. Alternately, MIDR_EL1 and REVIDR_EL1 are exposed via sysfs
+ at:
- /sys/devices/system/cpu/cpu$ID/regs/identification/
- \- midr
- \- revidr
+ /sys/devices/system/cpu/cpu$ID/regs/identification/
+ \- midr
+ \- revidr
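For reference, a small sketch of how a MIDR_EL1 value (read via "mrs" or from the sysfs files above) breaks down into its architectural fields. The struct and function names are hypothetical, and the sample value in the note below is only an illustration.

```c
#include <stdint.h>

/* Architectural fields of MIDR_EL1. */
struct midr_fields {
    unsigned implementer;  /* bits [31:24] */
    unsigned variant;      /* bits [23:20] */
    unsigned architecture; /* bits [19:16] */
    unsigned partnum;      /* bits [15:4]  */
    unsigned revision;     /* bits [3:0]   */
};

struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f;
    f.implementer  = (midr >> 24) & 0xFFu;
    f.variant      = (midr >> 20) & 0xFu;
    f.architecture = (midr >> 16) & 0xFu;
    f.partnum      = (midr >> 4)  & 0xFFFu;
    f.revision     = midr & 0xFu;
    return f;
}
```

For example, decode_midr(0x410FD034) yields implementer 0x41, part number 0xD03 and revision 4.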
[Test Case]
1) Write a 'Hello World' program:
----
#include <stdio.h>
int main(void) {
    printf("Hello World!\n");
    return 0;
}
----
2) Build it:
$ cc -o hello hello.c
3) Then run valgrind on it:
$ valgrind ./hello
[Regression Potential]
The regression potential is low.
The symptom happens when Valgrind tries to disassemble code inside
glibc (sysdeps/unix/sysv/linux/aarch64/cpu-features.c):
Even if HWCAP_CPUID is not supported, glibc falls back to assigning 0
to the midr variable. So, I think it's not an important feature to
support.
+
+ As stated in the fix itself as a comment:
+
+ ++ /* Limit the AT_HWCAP to just those features we explicitly
+ ++ support in VEX. */
+
Additionally, the fix is already in Ubuntu (disco and later).
If a regression does happen for some reason, it will be limited to the
ARM architecture and should not affect other CPU architectures.
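The reasoning above can be sketched as follows. This is an illustrative approximation, not the glibc or Valgrind source: read_midr_or_zero, read_midr_for_this_process and limit_hwcap are hypothetical names, and the fallback HWCAP_CPUID value (bit 11) is assumed from the arm64 kernel UAPI header. The point is that once the HWCAP_CPUID bit is masked out of AT_HWCAP, the guarded "mrs" is never executed and midr is simply 0.

```c
#include <stdint.h>
#include <sys/auxv.h>

#ifndef HWCAP_CPUID
#define HWCAP_CPUID (1UL << 11) /* assumed arm64 value, see lead-in */
#endif

/* Keep only the hwcap bits listed in 'supported' (hypothetical helper,
   mirroring the idea of limiting AT_HWCAP to supported features). */
unsigned long limit_hwcap(unsigned long hwcap, unsigned long supported)
{
    return hwcap & supported;
}

/* Read MIDR_EL1 only when the kernel advertises HWCAP_CPUID;
   otherwise fall back to 0. */
uint64_t read_midr_or_zero(unsigned long hwcap)
{
#if defined(__aarch64__)
    if (hwcap & HWCAP_CPUID) {
        uint64_t midr;
        __asm__ volatile("mrs %0, midr_el1" : "=r"(midr));
        return midr;
    }
#else
    (void)hwcap; /* the register does not exist on other architectures */
#endif
    return 0;
}

/* Convenience wrapper using the hwcap the process actually sees. */
uint64_t read_midr_for_this_process(void)
{
    return read_midr_or_zero(getauxval(AT_HWCAP));
}
```

With the bit cleared, for instance read_midr_or_zero(limit_hwcap(hwcap, ~HWCAP_CPUID)), the function returns 0 without touching the system register.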
[Other information]
Upstream fix:
https://sourceware.org/git/?p=valgrind.git;a=commit;h=fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
* Only affecting Bionic:
# git describe --contains fbbb696c5d1e93d4ac6cb548c68bb3f443ceef42
VALGRIND_3_14_0~96
# rmadison valgrind
=> valgrind | 1:3.13.0-2ubuntu2.1 | bionic-updates
valgrind | 1:3.14.0-2ubuntu6 | disco
valgrind | 1:3.15.0-1ubuntu3.1 | eoan-updates
valgrind | 1:3.15.0-1ubuntu5 | focal
[Original Description]
I'm performing Valgrind testing on an ElPotato running Ubuntu Bionic
Aarch64 image. My program is dying like in
https://bugs.kde.org/show_bug.cgi?id=381556 :
```
$ valgrind --track-origins=yes --suppressions=cryptopp.supp ./cryptest.exe v
==12969== Memcheck, a memory error detector
==12969== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==12969== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==12969== Command: ./cryptest.exe v
==12969==
ARM64 front end: branch_etc
disInstr(arm64): unhandled instruction 0xD5380000
disInstr(arm64): 1101'0101 0011'1000 0000'0000 0000'0000
==12969== valgrind: Unrecognised instruction at address 0x4014c90.
==12969== at 0x4014C90: init_cpu_features (cpu-features.c:72)
==12969== by 0x4014C90: dl_platform_init (dl-machine.h:208)
==12969== by 0x4014C90: _dl_sysdep_start (dl-sysdep.c:231)
==12969== by 0x40018C3: _dl_start_final (rtld.c:414)
==12969== by 0x4001B47: _dl_start (rtld.c:523)
==12969== by 0x40011C7: ??? (in /lib/aarch64-linux-gnu/ld-2.27.so)
...
```
Here's a similar Red Hat issue report:
https://bugzilla.redhat.com/show_bug.cgi?id=1467952 .
Please pick up the patch in the 381556 bug report.
-----
$ lsb_release -rd
Description: Ubuntu 18.04.2 LTS
Release: 18.04
$ apt-cache policy valgrind
valgrind:
Installed: 1:3.13.0-2ubuntu2.1
Candidate: 1:3.13.0-2ubuntu2.1
Version table:
*** 1:3.13.0-2ubuntu2.1 500
500 http://ports.ubuntu.com bionic-updates/main arm64 Packages
100 /var/lib/dpkg/status
1:3.13.0-2ubuntu2 500
500 http://ports.ubuntu.com bionic/main arm64 Packages
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1826811
Title:
Valgrind unhandled instruction 0xD5380000 on Aarch64