This is a quite small and simple patch, but it has taken me almost 2 months of researching and understanding the problem and finding the right solution. It has involved reading the ARMv8 programmer's guide, posting questions to ARM forums, and debugging the problem in a mostly trial-and-error fashion, as somewhat documented in issue #1100. Special credit goes to Claudio Fontana, who helped me tremendously by explaining and suggesting many valuable ideas.
As issue #1100 explains, OSv would crash, occasionally or quite repeatedly depending on the application, due to an unexpected Unknown Reason class synchronous exception (EC=0). This would never happen in emulated mode (QEMU with TCG), but happened quite frequently on real ARM hardware like the RPI 4, whether under QEMU with KVM or under Firecracker. Per the ARM documentation - https://developer.arm.com/docs/ddi0595/h/aarch64-system-registers/esr_el1#ISS_exceptionswithanunknownreason - there are many potential causes of the EC=0 exception, including "attempted execution of an instruction bit pattern that has no allocated instruction", which means trying to execute garbage. None of those potential causes, which I quite meticulously researched, examined, and in some cases discussed with Claudio, seemed to apply or make much sense in the OSv context. Until one of them did, when I stumbled across this article - https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code - about "self-modifying code". Initially the article seemed to apply only to JIT-type scenarios, but then I eventually noticed this small-font annotation: "A more common (though less obvious) example is that of an operating system kernel: from the point of view of the processor, some code in the system is modifying some other code in the system every time a process is swapped in or out." It started making me think that the OSv dynamic linker is a pretty close analogue. Then I eventually found this paragraph in the ARMv8 programmer's guide, in chapter 11.5 "Cache maintenance": "It is sometimes necessary for software to clean or invalidate a cache. This might be required when the contents of external memory have been changed and it is necessary to remove stale data from the cache. 
It can also be required after MMU-related activity such as changing access permissions, cache policies, or virtual to Physical Address mappings, or when I and D-caches must be synchronized for dynamically generated code such as JIT-compilers and dynamic library loaders." In essence, the aarch64 architecture (a Modified Harvard architecture) defines separate instruction and data caches - the I-cache and the D-cache - so it is sometimes necessary to invalidate the instruction cache after loading code into memory, which is exactly what the article about self-modifying code explains. How does this apply to OSv? Well, the OSv dynamic linker, being part of the kernel (code A), loads application code (B) into memory. This by itself does not mean OSv modifies its own kernel code, but it does dynamically load other code and execute it in the same memory space. To make this long story short, this patch modifies a critical part of the OSv memory management code - populate_vma() - which gets called any time a portion (page) of a vma is filled, whether due to a page fault or eagerly. It changes populate_vma() to invalidate the instruction cache if the vma is executable per its permissions - in essence, any time any code is loaded into memory. To achieve this, it delegates to the somewhat obscure built-in __builtin___clear_cache(). This logic is a no-op in the x86-64 port, as that architecture provides very strong automatic instruction/data cache consistency and there is no need to do anything special like on aarch64.
Fixes #1100

Signed-off-by: Waldemar Kozaczuk <[email protected]>
---
 arch/aarch64/mmu.cc | 19 +++++++++++++++++++
 arch/x64/mmu.cc     |  3 +++
 core/mmu.cc         |  4 ++++
 include/osv/mmu.hh  |  2 ++
 4 files changed, 28 insertions(+)

diff --git a/arch/aarch64/mmu.cc b/arch/aarch64/mmu.cc
index dd8ef850..bc89701d 100644
--- a/arch/aarch64/mmu.cc
+++ b/arch/aarch64/mmu.cc
@@ -97,4 +97,23 @@ bool is_page_fault_write_exclusive(unsigned int esr) {
 bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef) {
     return false;
 }
+
+void vma_invalidate_cache(vma *vma, void *v, size_t size) {
+    // As aarch64 architecture defines separate instruction and data caches -
+    // I-cache and D-cache, it is sometimes necessary to invalidate instruction
+    // cache after loading code into memory. For more details of why and when
+    // it is necessary please read this excellent article -
+    // https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/caches-and-self-modifying-code.
+    //
+    // So when OSv dynamic linker, being part of kernel code, loads pages
+    // of executable sections of ELF segments into memory, we need to invalidate
+    // the I-cache area of that memory right before it gets executed.
+    // In essence any time part of vma with executable permission
+    // gets populated this function gets called from mmu.cc:populate_vma().
+    if (vma->perm() & perm_exec) {
+        // For more details about what this built-in does, please read this gcc documentation -
+        // https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
+        __builtin___clear_cache((char*)v, (char*)(v + size));
+    }
+}
 }
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 24da5caa..c923e4c0 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -191,4 +191,7 @@ bool fast_sigsegv_check(uintptr_t addr, exception_frame* ef)
     return false;
 }
+
+void vma_invalidate_cache(vma *vma, void *v, size_t size) {
+}
 }
diff --git a/core/mmu.cc b/core/mmu.cc
index ff3fab47..10dd35e4 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -1206,6 +1206,10 @@ ulong populate_vma(vma *vma, void *v, size_t size, bool write = false)
         vma->operate_range(populate_small<Account>(map, vma->perm(), write, vma->map_dirty()), v, size) :
         vma->operate_range(populate<Account>(map, vma->perm(), write, vma->map_dirty()), v, size);
+    // On some architectures it might be necessary to invalidate CPU caches
+    // after the vma memory is populated with code
+    vma_invalidate_cache(vma, v, size);
+
     return total;
 }
diff --git a/include/osv/mmu.hh b/include/osv/mmu.hh
index 1830048c..87b83526 100644
--- a/include/osv/mmu.hh
+++ b/include/osv/mmu.hh
@@ -319,6 +319,8 @@ std::string procfs_maps();

 unsigned long all_vmas_size();

+void vma_invalidate_cache(vma *vma, void *v, size_t size);
+
 }

 #endif /* MMU_HH */
--
2.28.0

To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/20201220044251.3577-1-jwkozaczuk%40gmail.com.
