Hello,

I'm trying to run a .NET app with qemu-x86_64 on an s390x host. With a
few fixes posted earlier, it works for a while and then crashes like
this:

(gdb) x/8i $pc-6  
   0x3f8ab23ef8:        jmpq   0x3f87d2c5e0
   0x3f8ab23efd:        pop    %rdi
=> 0x3f8ab23efe:        sub    (%rdx),%al
   0x3f8ab23f00:        callq  0x400143a300 <PrecodeFixupThunk>
   0x3f8ab23f05:        pop    %rsi
   0x3f8ab23f06:        sub    $0x1,%al
   0x3f8ab23f08:        callq  0x400143a300 <PrecodeFixupThunk>
   0x3f8ab23f0d:        pop    %rsi

Here is what I think happens based on the .NET code:

https://github.com/dotnet/coreclr/blob/v3.1.5/src/vm/i386/stublinkerx86.h#L604
https://github.com/dotnet/coreclr/blob/v3.1.5/src/vm/i386/stublinkerx86.cpp#L6622

their design docs:

https://github.com/dotnet/coreclr/blob/master/Documentation/botr/method-descriptor.md#precode

and my core file of course :-)

At 0x3f8ab23ef8 we used to have `callq PrecodeFixupThunk`, which another
thread patched into a jmp using `f0 48 0f b1 13   lock cmpxchg %rdx,(%rbx)`.
Shortly before that, the current thread's disas_insn() fetched the call
opcode, and shortly after that it fetched the jmp offset, resulting in
an inconsistent translation: a call with the jmp's offset.
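
To illustrate the interleaving I have in mind, here is a small Python
model. It is not QEMU code: the fetch order (opcode first, offset
later) and the helper names encode() and torn_decode() are my
assumptions; the addresses are taken from the gdb output above.

```python
import struct

INSN_ADDR = 0x3f8ab23ef8      # address of the patched instruction
NEXT_ADDR = INSN_ADDR + 5     # 1 opcode byte + 4 bytes of rel32
THUNK = 0x400143a300          # PrecodeFixupThunk
JMP_DEST = 0x3f87d2c5e0       # jmp destination after patching
CALL, JMP = 0xE8, 0xE9        # x86 opcodes for call rel32 / jmp rel32


def encode(opcode, dest):
    """Encode `opcode rel32` so that it lands on dest."""
    return bytes([opcode]) + struct.pack("<i", dest - NEXT_ADDR)


def torn_decode():
    """Decode the instruction while another thread patches it."""
    mem = bytearray(encode(CALL, THUNK))   # before: callq PrecodeFixupThunk
    opcode = mem[0]                        # translator fetches the opcode
    mem[:] = encode(JMP, JMP_DEST)         # other thread patches atomically
    (rel32,) = struct.unpack("<i", mem[1:5])   # translator fetches the offset
    name = {CALL: "call", JMP: "jmp"}[opcode]
    return name, NEXT_ADDR + rel32


name, dest = torn_decode()
print(f"{name} {dest:#x}")    # prints "call 0x3f87d2c5e0"
```

The patch itself is atomic; the inconsistency comes only from the
reader fetching the instruction in two separate pieces.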

The following supports this theory:

(gdb) p/x $rdi
0x3f8ac5a25b
(gdb) x/2a $rsp - 16
0x405e52c6c0:   0x3f8ab23efd    0x3f8ac5a25b
                ^^^ return addr ^^^ $rdi

We have a return address 0x3f8ab23efd on the stack. This is the address
right after the 5-byte instruction at 0x3f8ab23ef8, which means that a
call was executed there. However, PrecodeFixupThunk does not return, so
the call must have used the jmp's offset.

I wonder if my analysis is correct, and if so, what can be done about
it? I considered doing atomic loads in disas_insn() or adding a
memcmp() to the beginning of each basic block, but all of that looked
slow, complex and fragile.
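
For reference, the memcmp() idea would look roughly like the sketch
below. It is written in Python for brevity and is not QEMU code: Block
and is_stale() are made-up names, and the offset bytes are invented for
illustration. It assumes the translator records the guest bytes it
actually consumed.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a translated block."""
    start: int        # offset of the guest code inside `mem`
    consumed: bytes   # guest bytes the translator actually fetched


def is_stale(block, mem):
    """Compare the bytes used for translation with current guest memory."""
    end = block.start + len(block.consumed)
    return bytes(mem[block.start:end]) != block.consumed


# Guest memory after patching: jmp opcode followed by a made-up offset.
mem = bytearray.fromhex("e9 55 66 77 88")

# Torn translation: old call opcode combined with the new offset bytes.
torn = Block(start=0, consumed=bytes.fromhex("e8 55 66 77 88"))
# Consistent translation: everything fetched after the patch.
clean = Block(start=0, consumed=bytes.fromhex("e9 55 66 77 88"))

print(is_stale(torn, mem), is_stale(clean, mem))
```

The check would flag the torn block for retranslation, but it adds a
compare to every block entry, which is why it looked slow to me.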

P.S. I asked about this on IRC first, so putting everyone who answered
there on CC.

Best regards,
Ilya

