From: Waldemar Kozaczuk <[email protected]>
Committer: Nadav Har'El <[email protected]>
Branch: master

mempool: resize l1_pool_stats once

On OSv emailing list David Smith reported sporadic crashes
when running OSv on firecracker in nested virtualization setup
looking like this:

"OSv v0.55.0-168-g31a04239
1 CPUs detected
Firmware vendor: Unknown
bsd: initializingAssertion failed: sched::preemptable() (arch/x64/mmu.cc: 
page_fault: 37)
[backtrace]"

or

"OSv v0.55.0-237-g7cd33926
1 CPUs detected
Firmware vendor: Unknown
trying to execute or access missing symbol
[backtrace]"

After long back-and-forth, I was able to re-create this issue
on firecracker and qemu microvm where I was able to connect
with gdb. It turned out that OSv would trigger page fault in
at core/mempool.cc:1198:

 l1_pool_stats[cpu_id]._nr = nr;

The l1_pool_stats is a vector that gets re-sized to the number of cpus
every time the cpu notifier is called. Unfortunately, with this
approach we may occasionally experience a race condition when this
vector gets resized in parallel at the same time on multiple CPUs.
In another race scenario, the smp_allocator_cnt counter might be incremented
in l2::fill_thread() and smp_allocator set to true before l1_pool_stats
get resized in cpu::notifier in which case the accesses to l1_pool_stats
might result in page faults.

To fix this we need to guarantee that the l1_pool_stats vector is
resized once and before smp_allocator flag is set.

To achieve this, this patch adds another atomic counter,
l1_initialized_cnt, that tracks number of initialized l1 pools.
Only when that counter is equal to the number of cpus in the system
we resize the l1_pool_stats vector. We also do it before incrementing
the smp_allocator_cnt.

Signed-off-by: Waldemar Kozaczuk <[email protected]>
Message-Id: <[email protected]>

---
diff --git a/core/mempool.cc b/core/mempool.cc
--- a/core/mempool.cc
+++ b/core/mempool.cc
@@ -1323,15 +1323,18 @@ class l2 {
     std::unique_ptr<sched::thread> _fill_thread;
 };
 
+std::atomic<unsigned int> l1_initialized_cnt{};
 PERCPU(l1*, percpu_l1);
 static sched::cpu::notifier _notifier([] () {
     *percpu_l1 = new l1(sched::cpu::current());
+    if (++l1_initialized_cnt == sched::cpus.size()) {
+        l1_pool_stats.resize(sched::cpus.size());
+    }
     // N per-cpu threads for L1 page pool, 1 thread for L2 page pool
     // Switch to smp_allocator only when all the N + 1 threads are ready
     if (smp_allocator_cnt++ == sched::cpus.size()) {
         smp_allocator = true;
     }
-    l1_pool_stats.resize(sched::cpus.size());
 });
 static inline l1& get_l1()
 {

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/000000000000ff5d9205c190edfe%40google.com.

Reply via email to