Hi,

On 23/01/18 20:16, Nadav Har'El wrote:
I don't have any bright ideas, but just a few small comments below, hopefully (?) they will help something...

Appreciated...

This writes in "addr", which seems a reasonable address (doesn't seem like junk). In object::resolve_pltgot() you can see that addr is _base + slot.r_offset. Maybe you can print them and check with "nm"/"readelf" on the object being loaded whether this offset
address makes sense (in the PLT section)?

So that made sense as far as I can see:

(gdb)
#9  0x0000000000492c7b in elf::object::arch_relocate_jump_slot (
    this=0xffffa0010327b400, sym=1, addr=0x10000aa0fe28, addend=0)
    at arch/x64/arch-elf.cc:109
109         *static_cast<void**>(addr) = symsym.relocated_addr();
(gdb) p symsym.obj._base
$1 = (void *) 0x0
(gdb) up
#10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
    this=0xffffa0010327b400, index=0) at core/elf.cc:692
692         if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
(gdb) p slot.r_offset
$2 = 2162216
(gdb) p/x slot.r_offset
$3 = 0x20fe28
(gdb)

$ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec | grep 20fe28
00000020fe28  000100000007 R_X86_64_JUMP_SLO 0000000000000000 getenv@GLIBC_2.2.5 + 0
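
As a cross-check of the numbers in the gdb session above, here is a minimal sketch of the arithmetic, assuming the slot address really is computed as _base + slot.r_offset as described. The helper names (implied_base, slot_matches) are made up for illustration and are not part of OSv.

```cpp
#include <cstdint>

// Assumption: the slot address is computed as base + r_offset,
// so the load base can be recovered as addr - r_offset.
constexpr std::uint64_t implied_base(std::uint64_t addr, std::uint64_t r_offset) {
    return addr - r_offset;
}

// Values copied from the gdb session above.
constexpr std::uint64_t kAddr    = 0x10000aa0fe28ULL; // addr in frame #9
constexpr std::uint64_t kROffset = 0x20fe28ULL;       // slot.r_offset in frame #10

// gdb printed r_offset as 2162216 (decimal) and 0x20fe28 (hex): same value.
static_assert(kROffset == 2162216, "decimal and hex r_offset agree");

// The implied base comes out page-aligned (4 KiB pages assumed).
static_assert(implied_base(kAddr, kROffset) % 4096 == 0, "base is page-aligned");

// Runtime version of the same check, for use with other addr/offset pairs.
inline bool slot_matches(std::uint64_t base, std::uint64_t r_offset, std::uint64_t addr) {
    return base + r_offset == addr;
}
```

With these values the implied base is 0x10000a800000. As far as I can tell, that is the base of the object being relocated (`this`), which need not be the same as symsym.obj._base.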

If the address is correct, maybe we have some sort of TLB flush problem or
something - we mapped the new area but some CPUs don't see it yet, e.g.,
from something like https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349

Can you reproduce this bug? If you can, you can confirm (or rule out) this wild guess by changing in
arch/x64/mmu.cc, flush_tlb_all(), the line

if (sched::thread::current()->is_app())

to if(false).

If the bug goes away, it may be related. If it doesn't go away, then it's not related.

I tried that, same crash.

But this is just a wild guess - probably wrong... I can't think of a better explanation now.


    #10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
        this=0xffffa001042d9e00, index=0) at core/elf.cc:692
    #11 0x00000000004021ca in elf_resolve_pltgot (index=0, obj=0xffffa001042d9e00)
        at core/elf.cc:1538
    #12 0x000000000048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
    #13 0xffffa001042d9e00 in ?? ()


This is strange, it's running dynamically-generated code, which calls getenv()?

I don't believe so. I think this is right where erlexec is being started. I'll work on verifying that now.

I have a start-otp.so which loads the erlexec and sets off a pthread to run it, so my hypothesis is that this is at the point that start-otp is loading up the erlexec library.

Cheers,
Rick
