On Sat, May 23, 2020 at 9:38 PM Dave Voutila <[email protected]> wrote:
>
> Hello tech@,
>
> Attached is a diff that patches vmd(8) to utilize libevent 2.1 (from
> ports) in an attempt to test the hypothesis that thread safety will
> help stabilize Linux guest support. There's some longer detail below
> about this hypothesis, but let me cut to the chase:

In my original mail I failed to state the hypothesis concretely, so
let me try again here.

If I understand vmd(8) correctly, the 'vmm' process (itself created
after privsep) forks to create a 'vm' process representing the guest.
After this fork, a call to vmm.c::vmm_pipe() adds an event to the
event_base to deal with imsgs. Once in vm.c::run_vm(), a pthread (the
'event_thread') is created to pump the event loop. My speculation is
that this design leads to spurious corruption of the libevent state in
the event_base.

My reasons for believing this come from inspecting the "time heap" in
libevent using gdb(1) during the aforementioned "lock ups" and from
setting breakpoints at the point of the exit(3) call. In all
instances, I either directly observed the looping or saw states where
events in the "time heap" triggered the conditions leading to exit(3).

>
> ** This is *not* a petition or request to switch vmd(8) to libevent2
> or to import libevent2. It's a proof of concept to help identify if
> the cause of many ghosts in the (virtual) machines is due to thread
> safety issues in vmd(8)'s vm processes. **
>
> If anyone is willing, you can apply the diff to your own copy of the
> src tree OR simply clone my github branch
> (https://github.com/voutilad/openbsd-src/tree/vmd-libevent2) and build
> just vmd(8). You'll need to pkg_add libevent first, however.
>
> To explain a bit further, I got here after back and forth with pd@ and
> debugging event queues on my Linux guests. I've been hacking a Linux
> kernel vmm-clock implementation[0] in an effort to solve the
> often-reported clock drift caused by reliance on refined-jiffies as a
> clocksource. A few others, including pd@, have done something similar,
> and the common theme was that when we'd get our kvm-clock derivatives
> attached in the Linux guest, they'd typically die a quick death.
> They'd keep proper time for once, with no clock drift, but they'd live
> short miserable lives.
>
> I spent over a week debugging the crashes and lockups, and I came
> across 3 recurring scenarios that shared a common theme of race
> conditions corrupting the libevent event queue:
>
> 1. the emulated serial console locks up, host CPU goes to 100% for the
> vm process
> 2. the whole guest locks up, host CPU goes to 100% for the vm process
> 3. the host dies a sudden, instantaneous death

This should read *guest*, not host (thanks, pd@). Mea culpa.

>
> In scenario 1, the vm process would spin rapidly through the event
> loop, with some event constantly triggering, hence the rising CPU
> utilization. pd@ speculated about the cause and supplied a patch that
> effectively buffers how fast the emulated ns8250 reads data; at least
> on my system it resolved the CPU issue, but it didn't fix the read
> lockups.
>
> In scenario 2, the cause is a very tight loop inside libevent's
> timeout_process() function, seen in the wild with libevent 1.x [1]
> and even in some cases with libevent 2.x [2], especially when pthread
> support is not enabled [3].
>
> In scenario 3, after some debugging [4], it looks to be due to
> libevent calling exit(3) after the event_queue_insert() function
> detects a double queueing [5].
>
> The below diff does 2 things:
>
> 1. ports the vmd(8) codebase to use the libevent 2.1 API
> 2. initializes pthread support and then creates a fresh event_base
> in the vm process forked in vmm.c::vmm_start_vm() (initializing
> pthread support first apparently enables locking mechanisms in the
> new event_base)
>
> So far in my personal testing (Lenovo x270, i7-7500U), simply
> swapping in libevent 2.1 and booting my custom Linux kernel [6] with
> my paravirtualized clock results in a stable guest [7]. I was able to
> run under 100% cpu load (recompiling Linux) and maintain stability
> and correct timekeeping for multiple hours, with drift from the host
> clock close to 0.

This same guest is alive and well even after a suspend/resume cycle
last night. No clock issues, no instability. Just a penguin in a cage
that finally behaves.

>
> I've also experienced fewer serial console lockups, even without
> pd@'s patch, though the remaining ones seem to occur only when the
> emulated ns8250 is reading my input. If I use `vmctl stop`, my
> vmmci(4) Linux driver [7] catches the request from vmd(8), shuts down
> cleanly, and you see console output.
>
> Going forward, while importing libevent 2.x will probably be too much
> headache + future care/feeding for little reward, I'm curious where
> best to focus next and would love some guidance on what is probably
> most fruitful:
>
> - Rework the vm.c event handling to not be multi-threaded?
> - Backport some thread safety features from libevent 2.x to the 1.x
> version in base?
> - Roll a ports-distributed version of vmd(8) that uses libevent from
> ports to allow increased testing?
> - Import libevent 2.1 into base? (just kidding :-P)
>
> I believe the "Linux clock support" conversation is a separate
> matter, as there are multiple paths forward if we can improve guest
> stability (e.g. masquerading as KVM vs. trying to upstream VMM
> paravirt support into the Linux kernel), and some of them really have
> little to do with OpenBSD specifically.
>
> [1] https://libevent-users.monkey.narkive.com/h4racbzJ/infinity-loop
> [2] https://www.mail-archive.com/[email protected]/msg00983.html
> [3] https://github.com/libevent/libevent/issues/431
> [4] https://gist.github.com/voutilad/9b55e8ed7abfbfec0860a1cf966aa93a
> [5] https://github.com/openbsd/src/blob/915a5ad8a32fee41cc229a1395f5cd0641ab2a78/lib/libevent/event.c#L874-L881
> [6] https://github.com/voutilad/linux (grab branch linux-5.4-obsd, use
> the "config-obsd" file as the kernel config or enable "OpenBSD VMM
> Guest" support manually before building)
> [7] https://github.com/voutilad/virtio_vmmci

-- 
Dave Voutila
