[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
An update for anyone interested: I didn't remember seeing the leading 0x10 because the values are correct when retrieved from memory. They get packed into a structure that is returned in a single register, so the second element, 0x10, ends up in the upper 4 bytes of x0, which is then passed as the first argument to strcmp. strcmp doesn't appear to clear the upper bytes of x0 in ilp32 mode before using it to access memory. This issue is actually either a GCC codegen problem or a multilib selection problem in the build environment.

Also of note, GDB prints the full 64-bit address when printing $w0 instead of the lower 4 bytes, but I don't think that's a Qemu bug either.

-- 
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1887854

Title:
  Spurious Data Abort on qemu-system-aarch64

Status in QEMU:
  Invalid

Bug description:
  When running RTEMS test psxndbm01.exe built for AArch64-ilp32 (this code is not yet publicly available), the test generates a spurious data abort (the MMU and alignment checks should be disabled according to bits 1, 0 of SCTLR_EL1). The abort information is as follows:

  Taking exception 4 [Data Abort]
  ...from EL1 to EL1
  ...with ESR 0x25/0x96000010
  ...with FAR 0x104010ca28
  ...with ELR 0x400195d8
  ...to EL1 PC 0x40018200 PSTATE 0x3c5

  The ESR indicates that a synchronous external abort has occurred.

  ESR EC field: 0b100101
  From the ARMv8 technical manual: Data Abort taken without a change in Exception level. Used for MMU faults generated by data accesses, alignment faults other than those caused by Stack Pointer misalignment, and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug related exceptions.

  ESR ISS field: 0b10000
  From the ARMv8 technical manual: Synchronous External abort, not on translation table walk or hardware update of translation table.

  The following command line is used to invoke qemu:
  qemu-system-aarch64 -machine virt -cpu cortex-a53 -m 256M -no-reboot -nographic -serial mon:stdio -kernel build/aarch64/a53_ilp32_qemu/testsuites/psxtests/psxndbm01.exe -D qemu.log -d in_asm,int,cpu_reset,unimp,guest_errors

  This occurs on Qemu 3.1.0 as distributed via Debian and on Qemu 4.1 as built by the RTEMS source builder (4.1+minor patches).

  Edit: This bug can be worked around by getting and setting SCTLR without changing its value before each data abort would occur. This test needs 6 of these workarounds to operate successfully.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1887854/+subscriptions
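The register packing described in this comment can be modelled with plain integer arithmetic. The following is only an illustrative Python sketch, not the RTEMS, newlib, or GCC code: the function names are hypothetical, the values 0x4010ca28 and 0x10 are taken from the thread, and the layout (first structure member in the low 32 bits of x0) is an assumption that happens to match the faulting address.

```python
MASK32 = 0xFFFFFFFF

def pack_pair_in_x0(first, second):
    """Model two 32-bit structure members returned in one 64-bit register.

    Assumption: the first member occupies the low 32 bits and the
    second member the high 32 bits.
    """
    return ((second & MASK32) << 32) | (first & MASK32)

def address_without_masking(x0):
    """A 64-bit load such as 'ldr x2, [x0], #8' uses the whole register."""
    return x0

def address_with_masking(x0):
    """Zero-extending the low 32 bits gives the intended ILP32 pointer."""
    return x0 & MASK32

x0 = pack_pair_in_x0(0x4010CA28, 0x10)
print(hex(address_without_masking(x0)))  # 0x104010ca28, the FAR in the log
print(hex(address_with_masking(x0)))     # 0x4010ca28, the expected pointer
```

Under these assumptions the unmasked value is exactly the FAR reported by QEMU, which fits the explanation that the upper half of x0 was never cleared before strcmp used it.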
[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
Ok, thanks for rooting this out. I could swear that I checked that address several times and I clearly remember 0x4010ca28, but I don't remember ever seeing 0x10 ahead of it. I'll dig into it a bit and hopefully find the root cause in my code.

** Changed in: qemu
       Status: New => Invalid
[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
It does still crash on current QEMU. The proximate cause of the crash is that you are trying to read from an address which is way outside RAM:

Trace 0: 0x7f8d50054340 [/400195d8/0x82104000] strcmp
 PC=400195d8
X00=00104010ca28 X01=4001ec28 X02=0fe8 X03=401098c8
X04=4010ba40 X05=5641526f44654b00 X06=1f276f6c62717372 X07=
X08=ffda X09=401097d0 X10=0101010101010101 X11=
X12= X13= X14= X15=
X16=40014610 X17=0008 X18= X19=4010b9f0
X20=4001ec28 X21=00084001ec20 X22=4001ec60 X23=4001ec40
X24=4001f548 X25=00104001ec28 X26=00104001ec40 X27=00034001ec38
X28= X29=401098d0 X30=40008a38 SP=401098d0
PSTATE=4005 -Z-- EL1h
Taking exception 4 [Data Abort]
...from EL1 to EL1
...with ESR 0x25/0x96000010
...with FAR 0x104010ca28
...with ELR 0x400195d8
...to EL1 PC 0x40018200 PSTATE 0x3c5

where the insn at 0x400195d8 is (inside strcmp)
0x400195d8: f8408402 ldr x2, [x0], #8

You can see that x0 is 00104010ca28, so QEMU is correct to give the data abort here. Further diagnosis would require working back through the log to find out where that address came from, which will be easier for you to do since you have the source.

NB: I recommend these options for producing the logfile:
 -D /tmp/q.log -d in_asm,int,cpu_reset,exec,cpu,guest_errors,nochain -singlestep

Execution will be slower, but the crash here is pretty quick so that's not a problem, and these options mean that every insn executed will produce a "Trace" line and a CPU register dump. That's easier to understand and read (especially reading backwards) than logs produced when QEMU is doing its normal optimisations of chaining TBs together and putting multiple guest insns in each TB.
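To make the "way outside RAM" observation concrete, here is a small Python sketch that checks an address against the guest RAM window. It assumes that the QEMU virt machine places RAM at 0x40000000 and takes the 256M size from the command line in the bug description; the helper name in_guest_ram is made up for illustration.

```python
RAM_BASE = 0x4000_0000        # assumed RAM base of the 'virt' machine
RAM_SIZE = 256 * 1024 * 1024  # from '-m 256M' on the command line

def in_guest_ram(addr, base=RAM_BASE, size=RAM_SIZE):
    """Return True if addr falls inside the half-open RAM window."""
    return base <= addr < base + size

far = 0x104010CA28
print(in_guest_ram(far))               # False: the faulting address
print(in_guest_ram(far & 0xFFFFFFFF))  # True: its low 32 bits
```

With these assumptions only the low 32 bits of the faulting address land in RAM, which suggests the pointer itself was valid and that the extra upper bits are what pushed the access out of range.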
[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
I would have thought that TLB considerations would not apply when the MMU is disabled (RTEMS runs in a completely flat memory space). I'll try to reproduce on more modern QEMU today. Thanks for taking a look at this.
[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
Writing to SCTLR can cause QEMU to flush its TLB (as an internal implementation detail), so if adding SCTLR writes is sufficient to cause the problem to go away, I would be suspicious that your guest code is missing necessary TLB maintenance instructions.

QEMU 3.1 and 4.1 are quite old -- can you reproduce with 5.0 or (ideally) head-of-git?
[Bug 1887854] Re: Spurious Data Abort on qemu-system-aarch64
** Description changed:

  When running RTEMS test psxndbm01.exe built for AArch64-ilp32 (this code is not yet publicly available), the test generates a spurious data abort (the MMU and alignment checks should be disabled according to bits 1, 0 of SCTLR_EL1). The abort information is as follows:

  Taking exception 4 [Data Abort]
  ...from EL1 to EL1
  ...with ESR 0x25/0x96000010
  ...with FAR 0x104010ca28
  ...with ELR 0x400195d8
  ...to EL1 PC 0x40018200 PSTATE 0x3c5

  The ESR indicates that a synchronous external abort has occurred.

  ESR EC field: 0b100101
  From the ARMv8 technical manual: Data Abort taken without a change in Exception level. Used for MMU faults generated by data accesses, alignment faults other than those caused by Stack Pointer misalignment, and synchronous External aborts, including synchronous parity or ECC errors. Not used for debug related exceptions.

  ESR ISS field: 0b10000
  From the ARMv8 technical manual: Synchronous External abort, not on translation table walk or hardware update of translation table.

  The following command line is used to invoke qemu:
  qemu-system-aarch64 -machine virt -cpu cortex-a53 -m 256M -no-reboot -nographic -serial mon:stdio -kernel build/aarch64/a53_ilp32_qemu/testsuites/psxtests/psxndbm01.exe -D qemu.log -d in_asm,int,cpu_reset,unimp,guest_errors

  This occurs on Qemu 3.1.0 as distributed via Debian and on Qemu 4.1 as built by the RTEMS source builder (4.1+minor patches).

  Edit: This bug can be worked around by getting and setting SCTLR without
- changing its value.
+ changing its value before each data abort would occur. This test needs 6
+ of these workarounds to operate successfully.
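The EC and ISS values quoted in the description can be cross-checked by splitting a syndrome word into its fields. This is a minimal Python sketch, assuming the full syndrome is 0x96000010 (EC 0x25 with the IL bit set and a fault status code of 0x10, which is what the quoted fields imply); the function name decode_esr is hypothetical, and the field positions follow the ESR_ELx layout of EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and DFSC in ISS bits [5:0].

```python
def decode_esr(esr):
    """Split an ESR_ELx value into the fields used for a Data Abort."""
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,   # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,    # instruction specific syndrome, bits [24:0]
        "dfsc": esr & 0x3F,        # data fault status code, ISS bits [5:0]
    }

fields = decode_esr(0x96000010)
print(bin(fields["ec"]))    # 0b100101: Data Abort without a change in EL
print(bin(fields["dfsc"]))  # 0b10000: synchronous External abort
```

On that assumed value the decoded fields agree with the two manual excerpts in the description, so the report is internally consistent even though the abort turned out to be caused by the guest address rather than by QEMU.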