From: Waldemar Kozaczuk <[email protected]> Committer: Nadav Har'El <[email protected]> Branch: master
mempool: resize l1_pool_stats once On OSv emailing list David Smith reported sporadic crashes when running OSv on firecracker in nested virtualization setup looking like this: "OSv v0.55.0-168-g31a04239 1 CPUs detected Firmware vendor: Unknown bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: page_fault: 37) [backtrace]" or "OSv v0.55.0-237-g7cd33926 1 CPUs detected Firmware vendor: Unknown trying to execute or access missing symbol [backtrace]" After long back-and-forth, I was able to re-create this issue on firecracker and qemu microvm where I was able to connect with gdb. It turned out that OSv would trigger page fault in at core/mempool.cc:1198: l1_pool_stats[cpu_id]._nr = nr; The l1_pool_stats is a vector that gets re-sized to the number of cpus every time the cpu notifier is called. Unfortunately, with this approach we may occasionally experience a race condition when this vector gets resized in parallel at the same time on multiple CPUs. In another race scenario, the smp_allocator_cnt counter might be incremented in l2::fill_thread() and smp_allocator set to true before l1_pool_stats get resized in cpu::notifier in which case the accesses to l1_pool_stats might result in page faults. To fix this we need to guarantee that the l1_pool_stats vector is resized once and before smp_allocator flag is set. To achieve this, this patch adds another atomic counter, l1_initialized_cnt, that tracks number of initialized l1 pools. Only when that counter is equal to the number of cpus in the system we resize the l1_pool_stats vector. We also do it before incrementing the smp_allocator_cnt. Signed-off-by: Waldemar Kozaczuk <[email protected]> Message-Id: <[email protected]> --- diff --git a/core/mempool.cc b/core/mempool.cc --- a/core/mempool.cc +++ b/core/mempool.cc @@ -1323,15 +1323,18 @@ class l2 { std::unique_ptr<sched::thread> _fill_thread; }; +std::atomic<unsigned int> l1_initialized_cnt{}; PERCPU(l1*, percpu_l1); static sched::cpu::notifier _notifier([] () { *percpu_l1 = new l1(sched::cpu::current()); + if (++l1_initialized_cnt == sched::cpus.size()) { + l1_pool_stats.resize(sched::cpus.size()); + } // N per-cpu threads for L1 page pool, 1 thread for L2 page pool // Switch to smp_allocator only when all the N + 1 threads are ready if (smp_allocator_cnt++ == sched::cpus.size()) { smp_allocator = true; } - l1_pool_stats.resize(sched::cpus.size()); }); static inline l1& get_l1() { -- You received this message because you are subscribed to the Google Groups "OSv Development" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/000000000000ff5d9205c190edfe%40google.com.
