To add to this: the bottom line is that this smaller version of the patch:

diff --git a/arch/x64/arch-elf.cc b/arch/x64/arch-elf.cc
index ebe40996..4c513150 100644
--- a/arch/x64/arch-elf.cc
+++ b/arch/x64/arch-elf.cc
@@ -70,16 +70,28 @@ bool object::arch_relocate_rela(u32 type, u32 sym, void 
*addr,
         memcpy(addr, sm.relocated_addr(), sm.size());
         break;
     }
-    case R_X86_64_64:
-        *static_cast<void**>(addr) = symbol(sym).relocated_addr() + addend;
+    case R_X86_64_64: {
+        auto _sym = symbol(sym, true);
+        if (_sym.symbol) {
+            *static_cast<void**>(addr) = _sym.relocated_addr() + addend;
+        }
         break;
+    }
     case R_X86_64_RELATIVE:
         *static_cast<void**>(addr) = _base + addend;
         break;
     case R_X86_64_JUMP_SLOT:
-    case R_X86_64_GLOB_DAT:
-        *static_cast<void**>(addr) = symbol(sym).relocated_addr();
+    case R_X86_64_GLOB_DAT: {
+        auto _sym = symbol(sym, true);
+        if (_sym.symbol) {
+            *static_cast<void**>(addr) = _sym.relocated_addr();
+        }
         break;
+    }
     // The next 3 types are intended to relocate symbols of thread local 
variables
     // defined with __thread modifier
     //

would suffice to make ffmpeg run. 

The remaining part is to better inform the user if such a missing symbol is 
ACTUALLY used.

On Tuesday, August 20, 2019 at 11:25:13 AM UTC-4, Waldek Kozaczuk wrote:
>
> Thanks for the review. You can hopefully find some more related background 
> info in this group post from a couple of days ago - 
> https://groups.google.com/forum/#!msg/osv-dev/kmkqlZt2DRY/ijpde19DDQAJ
>
> On Tuesday, August 20, 2019 at 9:19:12 AM UTC-4, Nadav Har'El wrote:
>>
>>
>> On Tue, Aug 20, 2019 at 7:20 AM Waldemar Kozaczuk <[email protected]> 
>> wrote:
>>
>>> The commit 
>>> https://github.com/cloudius-systems/osv/commit/f5cc12d56dd986d2c7982c5738b1f859702b07fb
>>> addressed the issue #993 to relax handling of missing symbols in 
>>> relocate_pltgot()
>>> when loading ELF objects with BIND_NOW. This patch on other hand 
>>> attempts to close
>>> the gap of handling missing symbols this time in relocate_rela() during 
>>> loading phase
>>> of ELF objects by dynamic linker.
>>>
>>
>> Do you have an understanding *why* issue #1023 is different from anything 
>> we've seen
>> before, where the function is relocated by relocate_rela() and not 
>> relocate_pltgot()?
>> My guess was that this function is somehow accessed as a *variable* 
>> (e.g., its pointer is
>> copied or something) instead of as a real function call, but this was 
>> just a guess...
>>
>
> Fixing #1023 was not a priority to me until I discovered it was preventing 
> me from running ffmpeg (
> https://github.com/cloudius-systems/osv-apps/tree/master/ffmpeg) that I 
> am helping someone else to run. This app requires compiling from source and 
> it used to work at some point. At first, it would fail with missing 
> *versionsort* symbol. After I added this, it would fail with a missing 
> symbol from the family of posix_spawnattr_* functions. This rang a bell 
> and this is how it led me to the issue 1023. After making simple 
> modification like in an original RFC patch (see same mailing link) the app 
> would happily work. As a matter of fact, I was able to run a version of 
> ffmpeg from host. 
>
> Honestly, I do not understand why this app, in this case, needs symbols 
> resolution processed through relocate_rela() vs relocate_pltgot(). 
> Actually, that is why I was asking the question about R_X86_64_GLOB_DAT 
> which I would think should apply to variables, not functions. But I think 
> you guess about being function pointers is a good one. Could it be 
> different compiler flags?
>
>>
>>
>>> The solution is similar to what we do in relocate_pltgot()
>>> - let missing symbol get ignored and hope they will not be used.
>>> However in this case we cannot rely on lazy symbol resolution
>>> logic that would catch missing symbol later when used and abort.
>>> Instead we place an indicator of missing symbol (special invalid address)
>>> which would trigger page fault when symbol accessed. This indicator 
>>> allows
>>> us to distinguish missing symbol used scenario from other page fault 
>>> related ones.
>>> This unfortunately does not tell us which missing symbol was used.
>>> In future we could place indicator + offset where offset would point
>>> to a name of the missing symbol in some missing symbols table.
>>>
>>> Fixes #1023
>>>
>>
>> Did you test this on an application (like the rust example #1023) where 
>> before the
>> application couldn't run, and now it can? 
>>
> I did test with ffmpeg which is the original reason for this patch. I did 
> not try to recreate it with the original rust example which I should. 
>  
>
>> Or does the application really try to use
>> the invalid pointer, and crash at that point?
>>
> In the ffmpeg case it never used the symbols it needed to be resolved 
> through relocate_rela():
>
> /usr/lib/libmount.so.1: ignoring missing symbol versionsort of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_destroy 
> of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol 
> posix_spawnattr_setsigmask of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol 
> posix_spawnattr_setsigdefault of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol 
> posix_spawn_file_actions_adddup2 of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol 
> posix_spawn_file_actions_init of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol 
> posix_spawn_file_actions_destroy of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_setflags 
> of type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnattr_init of 
> type: 6
> /usr/lib/librsvg-2.so.2: ignoring missing symbol posix_spawnp of type: 6
> /usr/lib/libgmp.so.10: ignoring missing symbol obstack_vprintf of type: 1
> /usr/lib/libasound.so.2: ignoring missing symbol versionsort of type: 6
> /usr/lib/libnsl.so.1: ignoring missing symbol _libc_intl_domainname of 
> type: 6
> /usr/lib/libasyncns.so.0: ignoring missing symbol __res_query of type: 6
> /usr/lib/libasyncns.so.0: ignoring missing symbol __res_search of type: 6
>
> So no it would not crash. Fffmpeg is a pretty complex app (toolbox for all 
> kinds of media processing use cases) and depends on many shared libraries 
> (or at least the version I compiled and the one provided as a package by 
> Fedora or Ubuntu). I guess in all the use cases I have tested the 
> particular symbols though missing were not used. I think that applies to 
> many libraries - many symbols provided but not used by application actually.
>
>
>>
>>  
>>
>>>
>>> Signed-off-by: Waldemar Kozaczuk <[email protected]>
>>> ---
>>>  arch/x64/arch-elf.cc | 20 ++++++++++++++++----
>>>  arch/x64/arch-elf.hh |  2 ++
>>>  arch/x64/mmu.cc      |  4 ++++
>>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x64/arch-elf.cc b/arch/x64/arch-elf.cc
>>> index ebe40996..4c513150 100644
>>> --- a/arch/x64/arch-elf.cc
>>> +++ b/arch/x64/arch-elf.cc
>>> @@ -70,16 +70,28 @@ bool object::arch_relocate_rela(u32 type, u32 sym, 
>>> void *addr,
>>>          memcpy(addr, sm.relocated_addr(), sm.size());
>>>          break;
>>>      }
>>> -    case R_X86_64_64:
>>> -        *static_cast<void**>(addr) = symbol(sym).relocated_addr() + 
>>> addend;
>>> +    case R_X86_64_64: {
>>> +        auto _sym = symbol(sym, true);
>>> +        if (_sym.symbol) {
>>> +            *static_cast<void**>(addr) = _sym.relocated_addr() + addend;
>>> +        } else {
>>> +            *static_cast<void**>(addr) = MISSING_SYMBOL_INDICATOR;
>>> +        }
>>>          break;
>>> +    }
>>>
>> The reason I set the address to MISSING_SYMBOL_INDICATOR  is NOT to 
> prevent the app from crashing but to better inform why it crashes if 
> missing symbol is used later. Otherwise it would show up as a regular null 
> pointer fault.
>
>>      case R_X86_64_RELATIVE:
>>>          *static_cast<void**>(addr) = _base + addend;
>>>          break;
>>>      case R_X86_64_JUMP_SLOT:
>>> -    case R_X86_64_GLOB_DAT:
>>> -        *static_cast<void**>(addr) = symbol(sym).relocated_addr();
>>> +    case R_X86_64_GLOB_DAT: {
>>> +        auto _sym = symbol(sym, true);
>>> +        if (_sym.symbol) {
>>> +            *static_cast<void**>(addr) = _sym.relocated_addr();
>>> +        } else {
>>> +            *static_cast<void**>(addr) = MISSING_SYMBOL_INDICATOR;
>>> +        }
>>>          break;
>>> +    }
>>>      // The next 3 types are intended to relocate symbols of thread 
>>> local variables
>>>      // defined with __thread modifier
>>>      //
>>> diff --git a/arch/x64/arch-elf.hh b/arch/x64/arch-elf.hh
>>> index 1811ceb5..b4c63967 100644
>>> --- a/arch/x64/arch-elf.hh
>>> +++ b/arch/x64/arch-elf.hh
>>> @@ -34,6 +34,8 @@ enum {
>>>      R_X86_64_IRELATIVE = 37, //  word64 indirect(B + A)
>>>  };
>>>
>>> +void * const MISSING_SYMBOL_INDICATOR = (void*)0xffffeeeeddddcccc;
>>>
>>
>> Can you please remind me why this is an invalid pointer? Does it not have 
>> enough f's in the beginning to be a valid pointer?
>>
> I tried to pick something that would never be a valid pointer. Is there a 
> better address I should use? 
>
>>
>> +
>>>  /* for pltgot relocation */
>>>  #define ARCH_JUMP_SLOT R_X86_64_JUMP_SLOT
>>>
>>> diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
>>> index 2f1ba5e2..441b6c45 100644
>>> --- a/arch/x64/mmu.cc
>>> +++ b/arch/x64/mmu.cc
>>> @@ -6,6 +6,7 @@
>>>   */
>>>
>>>  #include "arch-cpu.hh"
>>> +#include "arch-elf.hh"
>>>  #include <osv/debug.hh>
>>>  #include <osv/sched.hh>
>>>  #include <osv/mmu.hh>
>>> @@ -28,6 +29,9 @@ void page_fault(exception_frame *ef)
>>>      if (!pc) {
>>>          abort("trying to execute null pointer");
>>>      }
>>> +    if (pc == MISSING_SYMBOL_INDICATOR) {
>>> +        abort("trying to execute missing symbol");
>>>
>>
>> Do you have a test case where you actually see this message?
>>
> Yes. I was able to manually create tests that would trigger this with 
> missing symbol scenario with R_X86_64_GLOB_DAT and R_X86_64_64 relocation 
> type. In the latter case, I used the graalvm app that uses mprotect and I 
> manually misspelled to make it not found and the message was properly 
> handled. I also conducted a similar test for R_X86_64_GLOB_DAT.
>  
>
>> Because I wonder if the invalid pointer actually gets *executed* (so pc = 
>> ...) - it is also possible the pointer get followed, not executed. I think 
>> "pc" isn't the general indication of where the page fault happened.
>>
> Not sure I understand what you are saying here. I did happen in the 2 test 
> scenarios I ran. I see that this particular page fault logic looks at RIP 
> value which only apply to function execution, right? Do we have another 
> page fault handler when data is actually read/written to where we should 
> supply similar logic?
>
>>
>>
>>      // The following code may sleep. So let's verify the fault did not 
>>> happen
>>>      // when preemption was disabled, or interrupts were disabled.
>>>      assert(sched::preemptable());
>>> -- 
>>> 2.20.1
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "OSv Development" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/osv-dev/20190820042043.25133-1-jwkozaczuk%40gmail.com
>>> .
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/70594378-49ae-4dc1-8786-a9eb73a97267%40googlegroups.com.

Reply via email to