Re: Page fault outside of application

Rick Payne Wed, 24 Jan 2018 01:08:42 -0800

Hi,

On 23/01/18 20:16, Nadav Har'El wrote:

I don't have any bright ideas, but just a few small comments below, hopefully (?) they will help something...


Appreciated...

This writes in "addr", which seems a reasonable address (doesn't seem like junk). In object::resolve_pltgot() you can see the addr is _base + slot.r_offset; maybe you can print them and see with "nm"/"readelf" of the object being loaded if this offset address makes sense (in the PLT section)?


So that made sense as far as I can see:

(gdb)
#9  0x0000000000492c7b in elf::object::arch_relocate_jump_slot (
    this=0xffffa0010327b400, sym=1, addr=0x10000aa0fe28, addend=0)
    at arch/x64/arch-elf.cc:109
109         *static_cast<void**>(addr) = symsym.relocated_addr();
(gdb) p symsym.obj._base
$1 = (void *) 0x0
(gdb) up
#10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
    this=0xffffa0010327b400, index=0) at core/elf.cc:692
692         if (!arch_relocate_jump_slot(sym, addr, slot.r_addend)) {
(gdb) p slot.r_offset
$2 = 2162216
(gdb) p/x slot.r_offset
$3 = 0x20fe28
(gdb)

$ readelf -a _build/default/rel/dbgp_webapi/erts-9.0.5/bin/erlexec |grep 20fe28
00000020fe28  000100000007 R_X86_64_JUMP_SLO 0000000000000000 getenv@GLIBC_2.2.5 + 0
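
As a cross-check of the numbers above, the load base implied by addr and slot.r_offset can be computed directly. This is only a minimal sketch, assuming addr = _base + slot.r_offset as in object::resolve_pltgot(); the helper names implied_base and is_page_aligned are made up for illustration and are not OSv functions, and the 4 KiB page size is an assumption.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of OSv.
// resolve_pltgot() computes addr = _base + slot.r_offset, so the load
// base implied by a given jump-slot address is addr - r_offset.
constexpr std::uint64_t implied_base(std::uint64_t addr, std::uint64_t r_offset)
{
    return addr - r_offset;
}

// True if the value is aligned to a 4 KiB page (assumed page size).
constexpr bool is_page_aligned(std::uint64_t value)
{
    return (value & 0xfffULL) == 0;
}

// Values taken from the gdb session: addr=0x10000aa0fe28, r_offset=0x20fe28.
static_assert(implied_base(0x10000aa0fe28ULL, 0x20fe28ULL) == 0x10000a800000ULL,
              "implied load base");
static_assert(is_page_aligned(implied_base(0x10000aa0fe28ULL, 0x20fe28ULL)),
              "implied load base is page aligned");
```

The implied base comes out page aligned, which is consistent with the slot address being a plausible _base + r_offset rather than junk. It does not by itself show that the page holding the slot is mapped.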

If the address is correct, maybe we have some sort of TLB flush problem or
something - we mapped the new area but some CPUs don't see it yet, e.g.,
from something like
https://github.com/cloudius-systems/osv/commit/7e38453390d6c0164a72e30b2616b0f3c3025349
Can you reproduce this bug? If you can, you can confirm (or rule out) this wild guess by changing in
arch/x64/mmu.cc, flush_tlb_all(), the line

if (sched::thread::current()->is_app())

to if (false).
If the bug goes away, it can be related. If it doesn't go away, then it's not related.


I tried that, same crash.

But this is just a wild guess - probably wrong... I can't think of a better explanation now.

    #10 0x00000000003fdfd7 in elf::object::resolve_pltgot (
        this=0xffffa001042d9e00, index=0) at core/elf.cc:692
    #11 0x00000000004021ca in elf_resolve_pltgot (index=0, obj=0xffffa001042d9e00)
        at core/elf.cc:1538
    #12 0x000000000048727d in __elf_resolve_pltgot () at arch/x64/elf-dl.S:47
    #13 0xffffa001042d9e00 in ?? ()

This is strange, it's running dynamically-generated code, which calls getenv()?

I don't believe so. I think this is right where erlexec is being started. I'll work on verifying that now.

I have a start-otp.so which loads the erlexec and sets off a pthread to run it, so my hypothesis is that this is at the point that start-otp is loading up the erlexec library.


Cheers,
Rick

