Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On Mon, May 18, 2020 at 1:50 AM Anastassios Nanos wrote:
> On Mon, May 18, 2020 at 10:50 AM Marc Zyngier wrote:
> > On 2020-05-18 07:58, Anastassios Nanos wrote:
> > > To spawn KVM-enabled Virtual Machines on Linux systems, one has to use QEMU, or some other kind of VM monitor in user-space to host the vCPU threads, I/O threads and various other book-keeping/management mechanisms. This is perfectly fine for a large number of reasons and use cases: for instance, running generic VMs, or running general-purpose operating systems that need some kind of emulation for legacy boot/hardware etc.
> > >
> > > What if we wanted to execute a small piece of code as a guest instance, without the involvement of user-space? The KVM functions are already doing what they should: VM and vCPU setup is already part of the kernel, the only missing piece is memory handling.
> > >
> > > With this series, (a) we expose to the Linux kernel the bare minimum of KVM API functions needed to spawn a guest instance without the intervention of user-space; and (b) we tweak the memory handling code of KVM-related functions to account for another kind of guest, spawned in kernel-space.
> > >
> > > PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the changes in the KVM memory handling code for x86_64 and aarch64.
> > >
> > > An example of use is provided based on kvmtest.c [https://lwn.net/Articles/658512/] at https://github.com/cloudkernels/kvmmtest
>
> Hi Marc,
>
> thanks for taking the time to check this!
>
> > You don't explain *why* we would want this. What is the overhead of having a userspace if your guest doesn't need any userspace handling? The kvmtest example indeed shows that the KVM userspace API is usable without any form of emulation, hence has almost no cost.
>
> The rationale behind such an approach is two-fold:
>
> (a) we are able to ditch any user-space involvement in the creation and spawning of a KVM guest. This is particularly interesting in use-cases where short-lived tasks are spawned on demand. Think of a scenario where an ABI-compatible binary is loaded in memory. Spawning it as a guest from userspace would incur a number of IOCTLs; doing the same from the kernel involves the same number of operations, but now these are function calls, and memory handling is somewhat simplified.
>
> (b) I agree that the userspace KVM API is usable without emulation for a simple task, written in bytecode, adding two registers. But what about something more complicated? Something that needs I/O? For most use-cases, I/O happens between the guest and some hardware device (network/storage etc.). Being in the kernel saves us from doing unnecessary mode switches.
>
> Of course there are optimizations for handling I/O on QEMU/KVM VMs (virtio/vhost), but essentially what happens there is removing mode switches (and exits) for I/O operations -- is there a good reason not to address that directly? A guest running in the kernel exits because of an I/O request, which gets processed and forwarded directly to the relevant subsystem *in* the kernel (net/block etc.).
>
> We work on both directions with a particular focus on (a) -- device I/O could be handled with other mechanisms as well (VFs, for instance).
>
> > Without a clear description of the advantages of your solution, as well as a full featured in-tree use case, I find it pretty hard to support this.
>
> Totally understand that -- please keep in mind that this is a first (baby) step for what we call KVMM (kernel virtual machine monitor). We presented the architecture at FOSDEM, along with some preliminary results regarding I/O. Of course, this is WiP, and far from being upstreamable. Hence the kvmmtest example showcasing the potential use-case.
>
> To be honest, my main question is whether we are interested in such an approach in the first place, and then try to work on any rough edges. As far as I understand, you're not in favor of this approach.

The usual answer here is that the kernel is not in favor of adding in-kernel functionality that is not used in the upstream kernel. If you come up with a real use case, and that use case is GPL and has plans for upstreaming, and that use case has a real benefit (dramatically faster than user code could likely be, does something new and useful, etc.), then it may well be mergeable.
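To make concrete what "the same number of IOCTLs, but now these are function calls" could look like, here is a purely hypothetical sketch of a kernel module driving such a KVMM-style guest. The function names (kvmm_create_vm(), kvmm_set_memory(), kvmm_create_vcpu(), kvmm_vcpu_run()) are invented for illustration only; the actual symbols exported by PATCH #1 are not shown in this thread.

/* Purely hypothetical sketch of an in-kernel caller of a KVMM-style API.
 * kvmm_create_vm(), kvmm_set_memory(), kvmm_create_vcpu() and
 * kvmm_vcpu_run() are invented names standing in for whatever symbols
 * PATCH #1 actually exports; they do not exist upstream. */
#include <linux/module.h>
#include <linux/kvm_host.h>

static void *guest_code_page; /* hypothetical: page holding the guest code */

static int __init kvmm_demo_init(void)
{
	struct kvm *vm;
	struct kvm_vcpu *vcpu;
	int ret;

	vm = kvmm_create_vm();          /* was: ioctl(kvm, KVM_CREATE_VM, 0) */
	if (IS_ERR(vm))
		return PTR_ERR(vm);

	/* was: KVM_SET_USER_MEMORY_REGION; the backing memory now comes from the kernel */
	ret = kvmm_set_memory(vm, 0 /* slot */, 0x1000, PAGE_SIZE, guest_code_page);
	if (ret)
		return ret;

	vcpu = kvmm_create_vcpu(vm, 0); /* was: ioctl(vmfd, KVM_CREATE_VCPU, 0) */
	if (IS_ERR(vcpu))
		return PTR_ERR(vcpu);

	return kvmm_vcpu_run(vcpu);     /* was: ioctl(vcpufd, KVM_RUN, NULL) */
}
module_init(kvmm_demo_init);
MODULE_LICENSE("GPL");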
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On Mon, 2020-05-18 at 13:51 +0200, Paolo Bonzini wrote:
> On 18/05/20 13:34, Maxim Levitsky wrote:
> > > In high-performance configurations, most of the time virtio devices are processed in another thread that polls on the virtio rings. In this setup, the rings are configured to not cause a vmexit at all; this has much smaller latency than even a lightweight (kernel-only) vmexit, basically corresponding to writing an L1 cache line back to L2.
> >
> > This can be used to run kernel drivers inside a very thin VM, IMHO, to break up the stigma that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver -- something I see a lot lately, and which was the ground for rejection of my nvme-mdev proposal.
>
> It's a tough design decision between speeding up a kernel driver with something like eBPF or wanting to move everything to userspace.
>
> Networking has moved more towards the former because there are many more opportunities for NIC-based acceleration, while storage has moved towards the latter with things such as io_uring. That said, I don't see why in-kernel NVMeoF drivers would be acceptable for anything but Fibre Channel (and that's only because FC HBAs try hard to hide most of the SAN layers).
>
> Paolo

Note that these days storage is as fast as, or even faster than, many types of networking, and that there are also opportunities for acceleration (like p2p PCI DMA) that are more natural to do in the kernel.

io_uring is actually not about moving everything to userspace IMHO, but rather the opposite: it allows userspace to access the kernel block subsystem in a very efficient way, which is the right thing to do. Sadly it doesn't help much with fast NVMe virtualization, because the bottleneck moves to the communication with the guest. I guess this is getting off-topic, so I won't continue this discussion here; I just wanted to voice my opinion on this matter.

Another thing that comes to my mind (not that it has to be done in the kernel) is that AMD's AVIC allows peer-to-peer interrupts between guests, which can in theory allow running a 'driver' in a special guest and letting it communicate with a normal guest using interrupts bi-directionally. That could finally remove the need to waste a core in a busy-wait loop. The only catch is that the 'special guest' has to run 100% of the time, so it still can't share a core with other kernel/userspace tasks, but at least it can be in a sleeping state most of the time, and it can itself run various tasks that serve various needs.

In other words, I don't have any objection to allowing part of the host kernel to run in VMX/SVM guest mode. This can be a very interesting thing.

Best regards,
	Maxim Levitsky
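Maxim's point about io_uring giving userspace efficient access to the kernel block subsystem can be illustrated with a minimal liburing read: one submission queue entry, one completion, a single submit syscall for the batch. This is only an illustrative sketch (the file name, queue depth and buffer size are arbitrary), not anything from the patch series.

/* Minimal liburing read: one SQE submitted, one CQE reaped.
 * Illustrative sketch only -- error handling trimmed, "data.bin" and the
 * queue depth of 8 are arbitrary. Build with -luring. */
#include <fcntl.h>
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	char buf[4096];

	int fd = open("data.bin", O_RDONLY);
	io_uring_queue_init(8, &ring, 0);               /* 8-entry SQ/CQ */

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0); /* read 4 KiB at offset 0 */

	io_uring_submit(&ring);                          /* one syscall for the whole batch */
	io_uring_wait_cqe(&ring, &cqe);                  /* reap the completion */
	printf("read returned %d\n", cqe->res);
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}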
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On 18/05/20 13:34, Maxim Levitsky wrote:
>> In high-performance configurations, most of the time virtio devices are processed in another thread that polls on the virtio rings. In this setup, the rings are configured to not cause a vmexit at all; this has much smaller latency than even a lightweight (kernel-only) vmexit, basically corresponding to writing an L1 cache line back to L2.
>
> This can be used to run kernel drivers inside a very thin VM, IMHO, to break up the stigma that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver -- something I see a lot lately, and which was the ground for rejection of my nvme-mdev proposal.

It's a tough design decision between speeding up a kernel driver with something like eBPF or wanting to move everything to userspace.

Networking has moved more towards the former because there are many more opportunities for NIC-based acceleration, while storage has moved towards the latter with things such as io_uring. That said, I don't see why in-kernel NVMeoF drivers would be acceptable for anything but Fibre Channel (and that's only because FC HBAs try hard to hide most of the SAN layers).

Paolo
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On Mon, 2020-05-18 at 13:18 +0200, Paolo Bonzini wrote:
> On 18/05/20 10:45, Anastassios Nanos wrote:
> > Being in the kernel saves us from doing unnecessary mode switches. Of course there are optimizations for handling I/O on QEMU/KVM VMs (virtio/vhost), but essentially what happens there is removing mode switches (and exits) for I/O operations -- is there a good reason not to address that directly? A guest running in the kernel exits because of an I/O request, which gets processed and forwarded directly to the relevant subsystem *in* the kernel (net/block etc.).
>
> In high-performance configurations, most of the time virtio devices are processed in another thread that polls on the virtio rings. In this setup, the rings are configured to not cause a vmexit at all; this has much smaller latency than even a lightweight (kernel-only) vmexit, basically corresponding to writing an L1 cache line back to L2.
>
> Paolo

This can be used to run kernel drivers inside a very thin VM, IMHO, to break up the stigma that a kernel driver is always a bad thing and should by all means be replaced by a userspace driver -- something I see a lot lately, and which was the ground for rejection of my nvme-mdev proposal.

Best regards,
	Maxim Levitsky
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On 18/05/20 10:45, Anastassios Nanos wrote:
> Being in the kernel saves us from doing unnecessary mode switches. Of course there are optimizations for handling I/O on QEMU/KVM VMs (virtio/vhost), but essentially what happens there is removing mode switches (and exits) for I/O operations -- is there a good reason not to address that directly? A guest running in the kernel exits because of an I/O request, which gets processed and forwarded directly to the relevant subsystem *in* the kernel (net/block etc.).

In high-performance configurations, most of the time virtio devices are processed in another thread that polls on the virtio rings. In this setup, the rings are configured to not cause a vmexit at all; this has much smaller latency than even a lightweight (kernel-only) vmexit, basically corresponding to writing an L1 cache line back to L2.

Paolo
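For readers unfamiliar with the mechanism Paolo describes: when guest notifications are suppressed, the host-side thread simply spins on the available-ring index instead of waiting for a guest "kick" (which would be a vmexit). The following is a simplified, illustrative sketch of such a poll loop -- it is not vhost code; the struct is a stripped-down version of the split-virtqueue available ring and process_descriptor_chain() is a hypothetical handler.

#include <stdint.h>

/* Stripped-down split-virtqueue available ring (illustrative only). */
struct vring_avail {
	uint16_t flags;
	uint16_t idx;     /* advanced by the guest after publishing descriptors */
	uint16_t ring[];  /* descriptor chain heads */
};

extern void process_descriptor_chain(uint16_t head); /* hypothetical handler */

/* Host-side polling loop: no guest notification (hence no vmexit) is
 * needed; the thread just watches idx advance. */
static void poll_queue(volatile struct vring_avail *avail, uint16_t qsize)
{
	uint16_t last_seen = 0;

	for (;;) {
		uint16_t cur = avail->idx;   /* guest publishes new work here */
		while (last_seen != cur) {
			uint16_t head = avail->ring[last_seen % qsize];
			process_descriptor_chain(head);
			last_seen++;
		}
		/* real implementations add memory barriers, fairness/backoff,
		 * and fall back to notifications when the queue goes idle */
	}
}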
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
Anastassios Nanos writes:
> Moreover, it doesn't involve *any* mode switch at all while printing out the result of the addition of these two registers -- which I guess for a simple use-case like this isn't much. But if we were to scale this to a large number of exits (and their respective handling in user-space) that would incur significant overhead.

Eliminating frequent exits to userspace when the guest is already running is absolutely fine, but eliminating userspace completely, even for creation of the guest, is something dubious. To create a simple guest you need just a dozen IOCTLs; you'll have to find a really, really good showcase where that makes a difference.

E.g. I can imagine the following use-case: you need to create a lot of guests with the same (or almost the same) memory contents, and allocating and populating this memory in userspace takes time. But even in this use-case, why do you need to terminate your userspace? Or would it be possible to create guests from shared memory? (We may not have copy-on-write capabilities in KVM currently, but this doesn't mean they can't be added.)

Alternatively, you may want to mangle vmexit handling somehow, and exiting to userspace seems slow. Fine, let's add eBPF attach points to KVM and an API to attach eBPF code there.

I'm, however, just guessing. I understand you may not want to reveal your original idea for some reason, but without us understanding what's really needed I don't see how the change can be reviewed.

--
Vitaly
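For reference, the "dozen IOCTLs" Vitaly mentions is roughly the sequence below, following the kvmtest.c example cited in the cover letter. This is only a sketch: error handling is omitted, and the 0x1000 load address, the 4 KiB memory size and the register values are placeholders.

/* Minimal KVM guest creation from userspace, kvmtest.c-style.
 * Sketch only: error handling omitted; addresses, sizes and register
 * values are placeholders. */
#include <fcntl.h>
#include <string.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

int run_tiny_guest(const uint8_t *code, size_t code_len)
{
	int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
	int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);

	/* Back guest physical address 0x1000 with one anonymous page. */
	void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	memcpy(mem, code, code_len);
	struct kvm_userspace_memory_region region = {
		.slot = 0,
		.guest_phys_addr = 0x1000,
		.memory_size = 0x1000,
		.userspace_addr = (uint64_t)mem,
	};
	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);

	int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
	size_t mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
	struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
				   MAP_SHARED, vcpufd, 0);
	(void)run; /* used by the exit-handling loop, which is omitted here */

	/* Start in real mode at 0x1000 with two values to add in al/bl. */
	struct kvm_sregs sregs;
	ioctl(vcpufd, KVM_GET_SREGS, &sregs);
	sregs.cs.base = 0;
	sregs.cs.selector = 0;
	ioctl(vcpufd, KVM_SET_SREGS, &sregs);
	struct kvm_regs regs = {
		.rip = 0x1000, .rax = 2, .rbx = 2, .rflags = 0x2,
	};
	ioctl(vcpufd, KVM_SET_REGS, &regs);

	return ioctl(vcpufd, KVM_RUN, NULL);
}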
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
Anastassios Nanos writes:
> On Mon, May 18, 2020 at 11:43 AM Thomas Gleixner wrote:
>> And this shows clearly how simple the user space is which is required to do that. So why on earth would we want to have all of that in the kernel?
>
> well, the main idea is that all this functionality is already in the kernel. My view is that kvmmtest is as simple as kvmtest.

That still does not explain the purpose, the advantage and any reason why this should be merged.

> Moreover, it doesn't involve *any* mode switch at all while printing out the result of the addition of these two registers -- which I guess for a simple use-case like this isn't much. But if we were to scale this to a large number of exits (and their respective handling in user-space) that would incur significant overhead. Don't you agree?

No. I still do not see the real world use case you are trying to solve. We are not going to accept changes like this which have no proper justification, real world use cases and proper numbers backing them up.

Thanks,

	tglx
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
Anastassios Nanos writes:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use QEMU, or some other kind of VM monitor in user-space to host the vCPU threads, I/O threads and various other book-keeping/management mechanisms. This is perfectly fine for a large number of reasons and use cases: for instance, running generic VMs, or running general-purpose operating systems that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance, without the involvement of user-space? The KVM functions are already doing what they should: VM and vCPU setup is already part of the kernel, the only missing piece is memory handling.
>
> With this series, (a) we expose to the Linux kernel the bare minimum of KVM API functions needed to spawn a guest instance without the intervention of user-space; and (b) we tweak the memory handling code of KVM-related functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c [https://lwn.net/Articles/658512/] at https://github.com/cloudkernels/kvmmtest

And this shows clearly how simple the user space is which is required to do that. So why on earth would we want to have all of that in the kernel?

Thanks,

	tglx
Re: [PATCH 0/2] Expose KVM API to Linux Kernel
On 2020-05-18 07:58, Anastassios Nanos wrote:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use QEMU, or some other kind of VM monitor in user-space to host the vCPU threads, I/O threads and various other book-keeping/management mechanisms. This is perfectly fine for a large number of reasons and use cases: for instance, running generic VMs, or running general-purpose operating systems that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance, without the involvement of user-space? The KVM functions are already doing what they should: VM and vCPU setup is already part of the kernel, the only missing piece is memory handling.
>
> With this series, (a) we expose to the Linux kernel the bare minimum of KVM API functions needed to spawn a guest instance without the intervention of user-space; and (b) we tweak the memory handling code of KVM-related functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c [https://lwn.net/Articles/658512/] at https://github.com/cloudkernels/kvmmtest

You don't explain *why* we would want this. What is the overhead of having a userspace if your guest doesn't need any userspace handling? The kvmtest example indeed shows that the KVM userspace API is usable without any form of emulation, hence has almost no cost.

Without a clear description of the advantages of your solution, as well as a full featured in-tree use case, I find it pretty hard to support this.

Thanks,

	M.
--
Jazz is not dead. It just smells funny...
[PATCH 0/2] Expose KVM API to Linux Kernel
To spawn KVM-enabled Virtual Machines on Linux systems, one has to use QEMU, or some other kind of VM monitor in user-space to host the vCPU threads, I/O threads and various other book-keeping/management mechanisms. This is perfectly fine for a large number of reasons and use cases: for instance, running generic VMs, or running general-purpose operating systems that need some kind of emulation for legacy boot/hardware etc.

What if we wanted to execute a small piece of code as a guest instance, without the involvement of user-space? The KVM functions are already doing what they should: VM and vCPU setup is already part of the kernel, the only missing piece is memory handling.

With this series, (a) we expose to the Linux kernel the bare minimum of KVM API functions needed to spawn a guest instance without the intervention of user-space; and (b) we tweak the memory handling code of KVM-related functions to account for another kind of guest, spawned in kernel-space.

PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the changes in the KVM memory handling code for x86_64 and aarch64.

An example of use is provided based on kvmtest.c [https://lwn.net/Articles/658512/] at https://github.com/cloudkernels/kvmmtest

Anastassios Nanos (2):
  KVMM: export needed symbols
  KVMM: Memory and interface related changes

 arch/arm64/include/asm/kvm_host.h   |   6 ++
 arch/arm64/kvm/fpsimd.c             |   8 +-
 arch/arm64/kvm/guest.c              |  48 +++
 arch/x86/include/asm/fpu/internal.h |  10 ++-
 arch/x86/kvm/cpuid.c                |  25 ++
 arch/x86/kvm/emulate.c              |   3 +-
 arch/x86/kvm/vmx/vmx.c              |   3 +-
 arch/x86/kvm/x86.c                  |  38 -
 include/linux/kvm_host.h            |  36 +
 virt/kvm/arm/arm.c                  |  18 +
 virt/kvm/arm/mmu.c                  |  34 +---
 virt/kvm/async_pf.c                 |   4 +-
 virt/kvm/coalesced_mmio.c           |   6 ++
 virt/kvm/kvm_main.c                 | 120 ++--
 14 files changed, 316 insertions(+), 43 deletions(-)

--
2.20.1
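To make the kvmtest.c reference concrete: the userspace monitor that this series wants to eliminate is essentially the KVM_RUN loop below, which enters the guest and handles its exits (here, the serial-port output and hlt of the register-addition guest described in the LWN article). This is a sketch along those lines, not the kvmmtest code itself.

/* KVM_RUN loop for a kvmtest-style guest (adds two registers, writes the
 * result to port 0x3f8, then halts). Sketch following the LWN kvmtest.c
 * example; vcpufd and run come from the setup sequence shown earlier. */
#include <stdio.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int run_loop(int vcpufd, struct kvm_run *run)
{
	for (;;) {
		ioctl(vcpufd, KVM_RUN, NULL);   /* enter the guest */

		switch (run->exit_reason) {
		case KVM_EXIT_HLT:              /* guest executed hlt: we're done */
			return 0;
		case KVM_EXIT_IO:               /* guest executed an out instruction */
			if (run->io.direction == KVM_EXIT_IO_OUT &&
			    run->io.port == 0x3f8 && run->io.size == 1)
				putchar(*((char *)run + run->io.data_offset));
			break;
		default:
			fprintf(stderr, "unhandled exit %u\n", run->exit_reason);
			return -1;
		}
	}
}

Moving this loop, together with the creation sequence that precedes it, from a userspace process into the kernel is what the cover letter means by spawning a guest instance without the intervention of user-space.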