Thanks for the review. You can hopefully find some more related background 
info in this group post from a couple of days ago - 
https://groups.google.com/forum/#!msg/osv-dev/kmkqlZt2DRY/ijpde19DDQAJ

On Tuesday, August 20, 2019 at 9:19:12 AM UTC-4, Nadav Har'El wrote:
>
>
> On Tue, Aug 20, 2019 at 7:20 AM Waldemar Kozaczuk <[email protected] 
> <javascript:>> wrote:
>
>> The commit 
>> https://github.com/cloudius-systems/osv/commit/f5cc12d56dd986d2c7982c5738b1f859702b07fb
>> addressed the issue #993 to relax handling of missing symbols in 
>> relocate_pltgot()
>> when loading ELF objects with BIND_NOW. This patch on other hand attempts 
>> to close
>> the gap of handling missing symbols this time in relocate_rela() during 
>> loading phase
>> of ELF objects by dynamic linker.
>>
>
> Do you have an understanding *why* issue #1023 is different from anything 
> we've seen
> before, where the function is relocated by relocate_rela() and not 
> relocate_pltgot()?
> My guess was that this function is somehow accessed as a *variable* (e.g., 
> its pointer is
> copied or something) instead of as a real function call, but this was just 
> a guess...
>

Fixing #1023 was not a priority to me until I discovered it was preventing 
me from running ffmpeg (
https://github.com/cloudius-systems/osv-apps/tree/master/ffmpeg) that I am 
helping someone else to run. This app requires compiling from source and it 
used to work at some point. At first, it would fail with missing 
*versionsort* symbol. After I added this, it would fail with a missing 
symbol from the family of posix_spawnattr_* functions. This rang a bell and 
this is how it led me to the issue 1023. After making simple modification 
like in an original RFC patch (see same mailing link) the app would happily 
work. As a matter of fact, I was able to run a version of ffmpeg from host. 

Honestly, I do not understand why this app, in this case, needs symbols 
resolution processed through relocate_rela() vs relocate_pltgot(). 
Actually, that is why I was asking the question about R_X86_64_GLOB_DAT 
which I would think should apply to variables, not functions. But I think 
you guess about being function pointers is a good one. Could it be 
different compiler flags?

>
>
>> The solution is similar to what we do in relocate_pltgot()
>> - let missing symbol get ignored and hope they will not be used.
>> However in this case we cannot rely on lazy symbol resolution
>> logic that would catch missing symbol later when used and abort.
>> Instead we place an indicator of missing symbol (special invalid address)
>> which would trigger page fault when symbol accessed. This indicator allows
>> us to distinguish missing symbol used scenario from other page fault 
>> related ones.
>> This unfortunately does not tell us which missing symbol was used.
>> In future we could place indicator + offset where offset would point
>> to a name of the missing symbol in some missing symbols table.
>>
>> Fixes #1023
>>
>
> Did you test this on an application (like the rust example #1023) where 
> before the
> application couldn't run, and now it can? 
>
I did test with ffmpeg which is the original reason for this patch. I did 
not try to recreate it with the original rust example which I should. 
 

> Or does the application really try to use
> the invalid pointer, and crash at that point?
>
In the ffmpeg case it never used the symbols it needed to be resolved 
through relocate_rela():

/usr/lib/libmount.so.1: ignoring missing symbol versionsort of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_destroy of 
type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setsigmask 
of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol 
posix_spawnattr_setsigdefault of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol 
posix_spawn_file_actions_adddup2 of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol 
posix_spawn_file_actions_init of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol 
posix_spawn_file_actions_destroy of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setflags 
of type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_init of 
type: 6
/usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnp of type: 6
/usr/lib/libgmp.so.10: ignoring missing symbol obstack_vprintf of type: 1
/usr/lib/libasound.so.2: ignoring missing symbol versionsort of type: 6
/usr/lib/libnsl.so.1: ignoring missing symbol _libc_intl_domainname of 
type: 6
/usr/lib/libasyncns.so.0: ignoring missing symbol __res_query of type: 6
/usr/lib/libasyncns.so.0: ignoring missing symbol __res_search of type: 6

So no it would not crash. Fffmpeg is a pretty complex app (toolbox for all 
kinds of media processing use cases) and depends on many shared libraries 
(or at least the version I compiled and the one provided as a package by 
Fedora or Ubuntu). I guess in all the use cases I have tested the 
particular symbols though missing were not used. I think that applies to 
many libraries - many symbols provided but not used by application actually.


>
>  
>
>>
>> Signed-off-by: Waldemar Kozaczuk <[email protected] <javascript:>>
>> ---
>>  arch/x64/arch-elf.cc | 20 ++++++++++++++++----
>>  arch/x64/arch-elf.hh |  2 ++
>>  arch/x64/mmu.cc      |  4 ++++
>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x64/arch-elf.cc b/arch/x64/arch-elf.cc
>> index ebe40996..4c513150 100644
>> --- a/arch/x64/arch-elf.cc
>> +++ b/arch/x64/arch-elf.cc
>> @@ -70,16 +70,28 @@ bool object::arch_relocate_rela(u32 type, u32 sym, 
>> void *addr,
>>          memcpy(addr, sm.relocated_addr(), sm.size());
>>          break;
>>      }
>> -    case R_X86_64_64:
>> -        *static_cast<void**>(addr) = symbol(sym).relocated_addr() + 
>> addend;
>> +    case R_X86_64_64: {
>> +        auto _sym = symbol(sym, true);
>> +        if (_sym.symbol) {
>> +            *static_cast<void**>(addr) = _sym.relocated_addr() + addend;
>> +        } else {
>> +            *static_cast<void**>(addr) = MISSING_SYMBOL_INDICATOR;
>> +        }
>>          break;
>> +    }
>>
> The reason I set the address to MISSING_SYMBOL_INDICATOR  is NOT to 
prevent the app from crashing but to better inform why it crashes if 
missing symbol is used later. Otherwise it would show up as a regular null 
pointer fault.

>      case R_X86_64_RELATIVE:
>>          *static_cast<void**>(addr) = _base + addend;
>>          break;
>>      case R_X86_64_JUMP_SLOT:
>> -    case R_X86_64_GLOB_DAT:
>> -        *static_cast<void**>(addr) = symbol(sym).relocated_addr();
>> +    case R_X86_64_GLOB_DAT: {
>> +        auto _sym = symbol(sym, true);
>> +        if (_sym.symbol) {
>> +            *static_cast<void**>(addr) = _sym.relocated_addr();
>> +        } else {
>> +            *static_cast<void**>(addr) = MISSING_SYMBOL_INDICATOR;
>> +        }
>>          break;
>> +    }
>>      // The next 3 types are intended to relocate symbols of thread local 
>> variables
>>      // defined with __thread modifier
>>      //
>> diff --git a/arch/x64/arch-elf.hh b/arch/x64/arch-elf.hh
>> index 1811ceb5..b4c63967 100644
>> --- a/arch/x64/arch-elf.hh
>> +++ b/arch/x64/arch-elf.hh
>> @@ -34,6 +34,8 @@ enum {
>>      R_X86_64_IRELATIVE = 37, //  word64 indirect(B + A)
>>  };
>>
>> +void * const MISSING_SYMBOL_INDICATOR = (void*)0xffffeeeeddddcccc;
>>
>
> Can you please remind me why this is an invalid pointer? Does it not have 
> enough f's in the beginning to be a valid pointer?
>
I tried to pick something that would never be a valid pointer. Is there a 
better address I should use? 

>
> +
>>  /* for pltgot relocation */
>>  #define ARCH_JUMP_SLOT R_X86_64_JUMP_SLOT
>>
>> diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
>> index 2f1ba5e2..441b6c45 100644
>> --- a/arch/x64/mmu.cc
>> +++ b/arch/x64/mmu.cc
>> @@ -6,6 +6,7 @@
>>   */
>>
>>  #include "arch-cpu.hh"
>> +#include "arch-elf.hh"
>>  #include <osv/debug.hh>
>>  #include <osv/sched.hh>
>>  #include <osv/mmu.hh>
>> @@ -28,6 +29,9 @@ void page_fault(exception_frame *ef)
>>      if (!pc) {
>>          abort("trying to execute null pointer");
>>      }
>> +    if (pc == MISSING_SYMBOL_INDICATOR) {
>> +        abort("trying to execute missing symbol");
>>
>
> Do you have a test case where you actually see this message?
>
Yes. I was able to manually create tests that would trigger this with 
missing symbol scenario with R_X86_64_GLOB_DAT and R_X86_64_64 relocation 
type. In the latter case, I used the graalvm app that uses mprotect and I 
manually misspelled to make it not found and the message was properly 
handled. I also conducted a similar test for R_X86_64_GLOB_DAT.
 

> Because I wonder if the invalid pointer actually gets *executed* (so pc = 
> ...) - it is also possible the pointer get followed, not executed. I think 
> "pc" isn't the general indication of where the page fault happened.
>
Not sure I understand what you are saying here. I did happen in the 2 test 
scenarios I ran. I see that this particular page fault logic looks at RIP 
value which only apply to function execution, right? Do we have another 
page fault handler when data is actually read/written to where we should 
supply similar logic?

>
>
>      // The following code may sleep. So let's verify the fault did not 
>> happen
>>      // when preemption was disabled, or interrupts were disabled.
>>      assert(sched::preemptable());
>> -- 
>> 2.20.1
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "OSv Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/osv-dev/20190820042043.25133-1-jwkozaczuk%40gmail.com
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/3a4fe9b0-e2a1-46fc-8982-b8a06fee2be3%40googlegroups.com.

Reply via email to