Hello,

I'm trying to run a .NET app with qemu-x86_64 on an s390x host. With a few fixes posted earlier it works for a while and then crashes like this:

(gdb) x/8i $pc-6
   0x3f8ab23ef8:        jmpq   0x3f87d2c5e0
   0x3f8ab23efd:        pop    %rdi
=> 0x3f8ab23efe:        sub    (%rdx),%al
   0x3f8ab23f00:        callq  0x400143a300 <PrecodeFixupThunk>
   0x3f8ab23f05:        pop    %rsi
   0x3f8ab23f06:        sub    $0x1,%al
   0x3f8ab23f08:        callq  0x400143a300 <PrecodeFixupThunk>
   0x3f8ab23f0d:        pop    %rsi

Here is what I think happens based on the .NET code:

https://github.com/dotnet/coreclr/blob/v3.1.5/src/vm/i386/stublinkerx86.h#L604
https://github.com/dotnet/coreclr/blob/v3.1.5/src/vm/i386/stublinkerx86.cpp#L6622

their design docs:

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode

and my core file of course :-)

At 0x3f8ab23ef8 we used to have `callq PrecodeFixupThunk`, which was patched into a jmp using `f0 48 0f b1 13 lock cmpxchg %rdx,(%rbx)` in another thread. Shortly before that current thread's disas_insn() fetched the call opcode and shortly after that it fetched the jmp offset, resulting in an inconsistent translation.

The following supports this theory:

(gdb) p/x $rdi
0x3f8ac5a25b
(gdb) x/2a $rsp - 16
0x405e52c6c0:   0x3f8ab23efd    0x3f8ac5a25b
                ^^^ return addr ^^^ $rdi

We have a return address 0x3f8ab23efd on stack, which means that a call was executed. However, PrecodeFixupThunk does not return, so it must have used the jmp's offset.

I wonder if my analysis is correct, and if yes, then what can be done about it? I considered doing atomic loads in disas_insn() or adding a memcmp() to the beginning of each basic block, but all that looked slow, complex and fragile.

P.S. I asked about this on IRC first, so putting everyone who answered there on CC.

Best regards,
Ilya
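
P.P.S. In case it helps the discussion, here is a minimal single-threaded C sketch of the interleaving I suspect, next to the "atomic load" idea mentioned above. It is not QEMU code. The slot layout, all function names and the offsets used with it are hypothetical, rel32 is kept in host byte order for simplicity, and the other thread's lock cmpxchg is modelled as a plain store placed between the two fetches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 8-byte precode slot: byte 0 is the opcode,
 * bytes 1..4 are rel32 (host byte order here), bytes 5..7 are padding. */
enum { OP_CALL = 0xe8, OP_JMP = 0xe9 };

typedef struct {
    uint8_t opcode;
    int32_t rel32;
} decoded_insn;

/* Build the 8-byte image that the patching thread would swap in. */
static uint64_t make_slot(uint8_t opcode, int32_t rel32)
{
    uint8_t bytes[8] = {0};
    uint64_t slot;

    bytes[0] = opcode;
    memcpy(&bytes[1], &rel32, sizeof(rel32));
    memcpy(&slot, bytes, sizeof(slot));
    return slot;
}

/* Decode from a private copy of the bytes; the result is always
 * internally consistent because nobody else can modify the copy. */
static decoded_insn decode_bytes(const uint8_t *bytes)
{
    decoded_insn insn;

    assert(bytes != NULL);
    insn.opcode = bytes[0];
    memcpy(&insn.rel32, &bytes[1], sizeof(insn.rel32));
    return insn;
}

/* Two separate fetches from guest memory, with the patch landing in
 * between: the interleaving suspected in disas_insn(). */
static decoded_insn decode_torn(uint64_t *slot, uint64_t patched)
{
    decoded_insn insn;

    /* Fetch 1: the opcode, still the old one. */
    memcpy(&insn.opcode, slot, 1);
    /* Stand-in for the other thread's lock cmpxchg. */
    *slot = patched;
    /* Fetch 2: the offset, already the new one. */
    memcpy(&insn.rel32, (const uint8_t *)slot + 1, sizeof(insn.rel32));
    return insn;
}

/* One 8-byte snapshot, then decode from the copy. In real multi-threaded
 * code the read of *slot would have to be a single atomic load. */
static decoded_insn decode_snapshot(const uint64_t *slot)
{
    uint8_t bytes[8];
    uint64_t snap = *slot;

    memcpy(bytes, &snap, sizeof(bytes));
    return decode_bytes(bytes);
}
```

Starting from a slot that holds call/+0x1000 and patching it to jmp/-0x2000 (both offsets made up), decode_torn() returns opcode 0xe8 with rel32 -0x2000, i.e. a call to the jmp target, which is what the core file suggests. decode_snapshot() can only return one of the two consistent pairs, but it says nothing about the cost of doing this for every instruction fetch.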