This is a quite small and simple patch, but it has taken me almost two
months of researching and understanding the problem and finding the right
solution. It has involved reading the ARMv8 programmer's guide, posting
questions to ARM forums, and trying to debug the problem, mostly in
trial-and-error fashion, as somewhat documented in issue #1100. Special
credit goes to Claudio Fontana, who helped me tremendously by explaining
and suggesting many valuable ideas.

As issue #1100 explains, OSv would crash, occasionally or quite
repeatedly depending on the application, due to an unexpected "Unknown
Reason" class synchronous exception (EC=0). This would never happen in
emulated mode (QEMU with TCG), but quite frequently on real ARM hardware
like the RPI 4, or with QEMU with KVM or Firecracker. Per the ARM documentation -
https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/esr_el1#ISS_exceptionswithanunknownreason
- there are many potential causes of an EC=0 exception, including "attempted
execution of an instruction bit pattern that has no allocated instruction",
which means trying to execute garbage.

None of those potential causes, which I quite meticulously researched,
examined, and in some cases discussed with Claudio, seemed to apply or
make much sense in the OSv context. Until one of them did, when I stumbled
across this article -
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code
- about "self-modifying code". Initially this article seemed to apply only
to JIT-type scenarios, but then I eventually noticed this small-font
annotation: "A more common (though less obvious) example is that of an
operating system kernel: from the point of view of the processor, some
code in the system is modifying some other code in the system every time
a process is swapped in or out." It started making me think that the OSv
dynamic linker is a fairly close analogue.

Then I eventually found this paragraph in the ARMv8 programmer's guide,
in chapter 11.5 "Cache maintenance":
"It is sometimes necessary for software to clean or invalidate a cache.
This might be required when the contents of external memory have been
changed and it is necessary to remove stale data from the cache. It can
also be required after MMU-related activity such as changing access
permissions, cache policies, or virtual to Physical Address mappings, or
when I and D-caches must be synchronized for dynamically generated code
such as JIT-compilers and dynamic library loaders."

In essence, the aarch64 architecture (a Modified Harvard design) defines
separate instruction and data caches - the I-cache and the D-cache - so it
is sometimes necessary to invalidate the instruction cache after loading
code into memory, which is exactly what the article about self-modifying
code explains. How does it apply to OSv? Well, the OSv dynamic linker,
being part of the kernel (code A), loads application code (B) into memory.
That by itself does not mean OSv modifies its own kernel code, but it does
dynamically load other code and execute it in the same memory space.

Making a long story short, this patch modifies a critical part of the OSv
memory management code - populate_vma() - which gets called any time a
portion (page) of a vma is filled, whether due to a page fault or eagerly.
It changes populate_vma() to invalidate the instruction cache whenever the
vma is executable per its permissions - in essence, any time any code is
loaded into memory. To achieve this, it delegates to a somewhat obscure
built-in - __builtin___clear_cache(). This logic is a no-op in the x86-64
port, as that architecture provides very strong automatic instruction/data
cache coherency, so nothing special needs to be done there, unlike on
aarch64.

Fixes #1100

Signed-off-by: Waldemar Kozaczuk <[email protected]>
---
 arch/aarch64/mmu.cc | 19 +++++++++++++++++++
 arch/x64/mmu.cc     |  3 +++
 core/mmu.cc         |  4 ++++
 include/osv/mmu.hh  |  2 ++
 4 files changed, 28 insertions(+)

diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
index dd8ef850..bc89701d 100644
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -97,4 +97,23 @@ bool is_page_fault_write_exclusive(unsigned int esr) {
 bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef) {
     return false;
 }
+
+void vma_invalidate_cache(vma *vma, void *v, size_t size) {
+    // As aarch64 architecture defines separate instruction and data caches -
+    // I-cache and D-cache, it is sometimes necessary to invalidate instruction
+    // cache after loading code into memory. For more details of why and when
+    // it is necessary please read this excellent article -
+    // https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code.
+    // 
+    // So when OSv dynamic linker, being part of kernel code, loads pages
+    // of executable sections of ELF segments into memory, we need to invalidate
+    // the I-cache area of that memory right before it gets executed.
+    // In essence any time part of vma with executable permission
+    // gets populated this function gets called from mmu.cc:populate_vma().
+    if (vma->perm() & perm_exec) {
+       // For more details about what this built-in does, please read this gcc documentation -
+       // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+       __builtin___clear_cache((char*)v, (char*)(v + size));
+    }
+}
 }
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 24da5caa..c923e4c0 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -191,4 +191,7 @@ bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef)
 
     return false;
 }
+
+void vma_invalidate_cache(vma *vma, void *v, size_t size) {
+}
 }
diff --git a/core/mmu.cc b/core/mmu.cc
index ff3fab47..10dd35e4 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -1206,6 +1206,10 @@ ulong populate_vma(vma *vma, void *v, size_t size, bool write = false)
         vma->operate_range(populate_small<Account>(map, vma->perm(), write, vma->map_dirty()), v, size) :
         vma->operate_range(populate<Account>(map, vma->perm(), write, vma->map_dirty()), v, size);
 
+    // On some architectures it might be necessary to invalidate CPU caches
+    // after the vma memory is populated with code
+    vma_invalidate_cache(vma, v, size);
+
     return total;
 }
 
diff --git a/include/osv/mmu.hh b/include/osv/mmu.hh
index 1830048c..87b83526 100644
--- a/include/osv/mmu.hh
+++ b/include/osv/mmu.hh
@@ -319,6 +319,8 @@ std::string procfs_maps();
 
 unsigned long all_vmas_size();
 
+void vma_invalidate_cache(vma *vma, void *v, size_t size);
+
 }
 
 #endif /* MMU_HH */
-- 
2.28.0
