Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-19 Thread Andy Lutomirski
On Mon, May 18, 2020 at 1:50 AM Anastassios Nanos wrote:
>
> On Mon, May 18, 2020 at 10:50 AM Marc Zyngier  wrote:
> >
> > On 2020-05-18 07:58, Anastassios Nanos wrote:
> > > To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> > > QEMU, or some other kind of VM monitor in user-space to host the vCPU
> > > threads, I/O threads and various other book-keeping/management
> > > mechanisms.
> > > This is perfectly fine for a large number of reasons and use cases: for
> > > instance, running generic VMs, running general-purpose operating systems
> > > that need some kind of emulation for legacy boot/hardware etc.
> > >
> > > What if we wanted to execute a small piece of code as a guest instance,
> > > without the involvement of user-space? The KVM functions are already
> > > doing what they should: VM and vCPU setup is already part of the kernel,
> > > the only missing piece is memory handling.
> > >
> > > With this series, (a) we expose to the Linux Kernel the bare minimum KVM
> > > API functions in order to spawn a guest instance without the
> > > intervention of user-space; and (b) we tweak the memory handling code of
> > > KVM-related functions to account for another kind of guest, spawned in
> > > kernel-space.
> > >
> > > PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces
> > > the changes in the KVM memory handling code for x86_64 and aarch64.
> > >
> > > An example of use is provided based on kvmtest.c
> > > [https://lwn.net/Articles/658512/] at
> > > https://github.com/cloudkernels/kvmmtest
>
> Hi Marc,
>
> thanks for taking the time to check this!
>
> >
> > You don't explain *why* we would want this. What is the overhead of
> > having a userspace if your guest doesn't need any userspace handling?
> > The kvmtest example indeed shows that the KVM userspace API is usable
> > without any form of emulation, hence has almost no cost.
>
> The rationale behind such an approach is two-fold:
> (a) we are able to ditch any user-space involvement in the creation and
> spawning of a KVM guest. This is particularly interesting in use-cases
> where short-lived tasks are spawned on demand. Think of a scenario where
> an ABI-compatible binary is loaded in memory. Spawning it as a guest from
> userspace would incur a number of IOCTLs; doing the same from the kernel
> involves the same number of operations, but these are now function calls.
> Additionally, memory handling is somewhat simplified.
>
> (b) I agree that the userspace KVM API is usable without emulation for a
> simple task, written in bytecode, adding two registers. But what about
> something more complicated? Something that needs I/O? For most use-cases,
> I/O happens between the guest and some hardware device (network/storage
> etc.). Being in the kernel saves us from doing unnecessary mode switches.
> Of course there are optimizations for handling I/O on QEMU/KVM VMs
> (virtio/vhost), but essentially what they do is remove mode-switches (and
> exits) for I/O operations -- is there a good reason not to address that
> directly? A guest running in the kernel exits because of an I/O request,
> which gets processed and forwarded directly to the relevant subsystem *in*
> the kernel (net/block etc.).
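
For illustration, the userspace dispatch loop in question looks roughly like
the sketch below; vcpufd and run are assumed to come from the usual
KVM_CREATE_VCPU / KVM_GET_VCPU_MMAP_SIZE / mmap setup, and backend_fd stands
in for whatever tap or block descriptor the monitor forwards I/O to.

/*
 * Sketch of the dispatch loop a userspace monitor runs today.  Every I/O
 * access in the guest causes a vmexit, KVM_RUN returns to userspace, the
 * monitor performs the I/O (another user->kernel switch, e.g. a write() to
 * a tap or block fd), and only then re-enters the guest.
 */
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

static void vcpu_loop(int vcpufd, struct kvm_run *run, int backend_fd)
{
    for (;;) {
        ioctl(vcpufd, KVM_RUN, NULL);        /* guest runs until it exits   */

        switch (run->exit_reason) {          /* back in userspace           */
        case KVM_EXIT_IO:
            if (run->io.direction == KVM_EXIT_IO_OUT)
                /* forward the data: one more mode switch into the kernel  */
                write(backend_fd, (char *)run + run->io.data_offset,
                      (size_t)run->io.size * run->io.count);
            break;
        case KVM_EXIT_MMIO:
            /* emulate the device access here, then resume the guest       */
            break;
        case KVM_EXIT_HLT:
            return;                          /* guest is done               */
        default:
            return;                          /* unexpected exit, bail out   */
        }
    }
}

Every iteration is one vmexit plus at least one extra user/kernel round trip
before the data reaches the in-kernel net/block path.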
>
> We are working in both directions, with a particular focus on (a) -- device
> I/O could be handled with other mechanisms as well (VFs, for instance).
>
> > Without a clear description of the advantages of your solution, as well
> > as a full featured in-tree use case, I find it pretty hard to support
> > this.
>
> Totally understand that -- please keep in mind that this is a first (baby)
> step for what we call KVMM (kernel virtual machine monitor). We presented
> the architecture, along with some preliminary I/O results, at FOSDEM. Of
> course, this is work in progress and far from being upstreamable -- hence
> the kvmmtest example showcasing the potential use-case.
>
> To be honest, my main question is whether there is interest in such an
> approach in the first place; if so, we can then work on any rough edges. As
> far as I understand, you're not in favor of this approach.

The usual answer here is that the kernel is not in favor of adding
in-kernel functionality that is not used in the upstream kernel.  If
you come up with a real use case, and that use case is GPL and has
plans for upstreaming, and that use case has a real benefit
(dramatically faster than user code could likely be, does something
new and useful, etc), then it may well be mergeable.

Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Maxim Levitsky
On Mon, 2020-05-18 at 13:51 +0200, Paolo Bonzini wrote:
> On 18/05/20 13:34, Maxim Levitsky wrote:
> > > In high-performance configurations, most of the time virtio devices are
> > > processed in another thread that polls on the virtio rings.  In this
> > > setup, the rings are configured to not cause a vmexit at all; this has
> > > much smaller latency than even a lightweight (kernel-only) vmexit,
> > > basically corresponding to writing an L1 cache line back to L2.
> > 
> > This can be used to run kernel drivers inside a very thin VM, IMHO, to
> > break up the stigma that a kernel driver is always a bad thing and should
> > by all means be replaced by a userspace driver, something I see a lot
> > lately and which was the ground for rejection of my nvme-mdev proposal.
> 
> It's a tough design decision between speeding up a kernel driver with
> something like eBPF and moving everything to userspace.
> 
> Networking has moved more towards the first because there are many more
> opportunities for NIC-based acceleration, while storage has moved
> towards the latter with things such as io_uring.  That said, I don't see
> why in-kernel NVMeoF drivers would be acceptable for anything but Fibre
> Channel (and that's only because FC HBAs try hard to hide most of the
> SAN layers).
> 
> Paolo
> 

Note that these days storage is as fast as, or even faster than, many types
of networking, and that there also are opportunities for acceleration (like
p2p PCI DMA) that are more natural to do in the kernel.

io_uring is actually not about moving everything to userspace IMHO, but
rather the opposite: it allows userspace to access the kernel block
subsystem in a very efficient way, which is the right thing to do.
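
As an illustration of that path, a minimal read via liburing might look like
the sketch below (assuming liburing is installed; error handling trimmed;
build with -luring).

/*
 * Minimal io_uring read of the first 4 KiB of a file via liburing: userspace
 * submits block I/O to the kernel through shared rings, with a single
 * submission syscall.
 */
#include <fcntl.h>
#include <liburing.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    char buf[4096];
    int fd;

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
        return 1;

    io_uring_queue_init(8, &ring, 0);          /* 8-entry SQ/CQ rings       */

    sqe = io_uring_get_sqe(&ring);             /* grab a submission slot    */
    io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
    io_uring_submit(&ring);                    /* one syscall submits it    */

    io_uring_wait_cqe(&ring, &cqe);            /* wait for the completion   */
    printf("read returned %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}

Completions can also be reaped without a syscall by polling the CQ ring, and
the submission side can be offloaded with IORING_SETUP_SQPOLL.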

Sadly it doesn't help much with fast NVMe virtualization, because the
bottleneck moves to the communication with the guest.

I guess this is getting off-topic, so I won't continue this discussion here;
I just wanted to voice my opinion on this matter.

Another thing that comes to my mind (not that it has to be done in the
kernel) is that AMD's AVIC allows peer-to-peer interrupts between guests.
That can in theory allow running a 'driver' in a special guest and letting
it communicate with a normal guest using interrupts, bi-directionally, which
can finally solve the need to waste a core in a busy-wait loop.

The only catch is that the 'special guest' has to run 100% of the time, so
it still can't share a core with other kernel/userspace tasks, but at least
it can be in a sleeping state most of the time, and it can itself run
various tasks that serve various needs.

In other words, I don't have any objection to allowing part of the host
kernel to run in VMX/SVM guest mode. This can be a very interesting thing.

Best regards,
Maxim Levitsky



Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Paolo Bonzini
On 18/05/20 13:34, Maxim Levitsky wrote:
>> In high-performance configurations, most of the time virtio devices are
>> processed in another thread that polls on the virtio rings.  In this
>> setup, the rings are configured to not cause a vmexit at all; this has
>> much smaller latency than even a lightweight (kernel-only) vmexit,
>> basically corresponding to writing an L1 cache line back to L2.
>
> This can be used to run kernel drivers inside a very thin VM, IMHO, to break
> up the stigma that a kernel driver is always a bad thing and should by all
> means be replaced by a userspace driver, something I see a lot lately and
> which was the ground for rejection of my nvme-mdev proposal.

It's a tough design decision between speeding up a kernel driver with
something like eBPF and moving everything to userspace.

Networking has moved more towards the first because there are many more
opportunities for NIC-based acceleration, while storage has moved
towards the latter with things such as io_uring.  That said, I don't see
why in-kernel NVMeoF drivers would be acceptable for anything but Fibre
Channel (and that's only because FC HBAs try hard to hide most of the
SAN layers).

Paolo



Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Maxim Levitsky
On Mon, 2020-05-18 at 13:18 +0200, Paolo Bonzini wrote:
> On 18/05/20 10:45, Anastassios Nanos wrote:
> > Being in the kernel saves us from doing unnecessary mode switches.
> > Of course there are optimizations for handling I/O on QEMU/KVM VMs
> > (virtio/vhost), but essentially what they do is remove mode-switches (and
> > exits) for I/O operations -- is there a good reason not to address that
> > directly? A guest running in the kernel exits because of an I/O request,
> > which gets processed and forwarded directly to the relevant subsystem *in*
> > the kernel (net/block etc.).
> 
> In high-performance configurations, most of the time virtio devices are
> processed in another thread that polls on the virtio rings.  In this
> setup, the rings are configured to not cause a vmexit at all; this has
> much smaller latency than even a lightweight (kernel-only) vmexit,
> basically corresponding to writing an L1 cache line back to L2.
> 
> Paolo
> 
This can be used to run kernel drivers inside a very thin VM, IMHO, to break
up the stigma that a kernel driver is always a bad thing and should by all
means be replaced by a userspace driver, something I see a lot lately and
which was the ground for rejection of my nvme-mdev proposal.


Best regards,
Maxim Levitsky




Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Paolo Bonzini
On 18/05/20 10:45, Anastassios Nanos wrote:
> Being in the kernel saves us from doing unnecessary mode switches.
> Of course there are optimizations for handling I/O on QEMU/KVM VMs
> (virtio/vhost), but essentially what they do is remove mode-switches (and
> exits) for I/O operations -- is there a good reason not to address that
> directly? A guest running in the kernel exits because of an I/O request,
> which gets processed and forwarded directly to the relevant subsystem *in*
> the kernel (net/block etc.).

In high-performance configurations, most of the time virtio devices are
processed in another thread that polls on the virtio rings.  In this
setup, the rings are configured to not cause a vmexit at all; this has
much smaller latency than even a lightweight (kernel-only) vmexit,
basically corresponding to writing an L1 cache line back to L2.
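
For illustration, the polling scheme looks roughly like the sketch below on a
split virtqueue: the host-side backend sets VRING_USED_F_NO_NOTIFY so the
guest skips the kick that would otherwise cause a vmexit. Memory barriers,
wraparound details and error handling are omitted, and the ring pointers are
assumed to already map guest memory shared with the backend.

/*
 * Rough sketch of a host-side polling backend on a virtio split ring
 * (vhost/vhost-user style).  Setting VRING_USED_F_NO_NOTIFY tells the guest
 * driver it may skip the "kick" -- the MMIO/PIO write that would cause a
 * vmexit -- and the backend discovers new buffers by polling avail->idx.
 * Real code needs volatile/atomic accesses and memory barriers around the
 * index reads/writes; they are omitted here for brevity.
 */
#include <linux/virtio_ring.h>
#include <stdint.h>

static void poll_queue(struct vring_desc *desc, struct vring_avail *avail,
                       struct vring_used *used, unsigned int num,
                       void (*handle)(struct vring_desc *chain))
{
    uint16_t last_seen = 0;

    /* Ask the guest not to notify us: we poll instead of taking exits.    */
    used->flags |= VRING_USED_F_NO_NOTIFY;

    for (;;) {
        /* Busy-poll the available index published by the guest driver.    */
        while (last_seen == avail->idx)
            ;

        uint16_t head = avail->ring[last_seen % num];
        handle(&desc[head]);                 /* process the descriptor chain */

        /* Publish the completion in the used ring.                        */
        used->ring[used->idx % num].id  = head;
        used->ring[used->idx % num].len = 0;
        used->idx++;
        last_seen++;
    }
}

A dedicated polling thread running a loop of this shape (with proper barriers
and batching) is what avoids the vmexit entirely.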

Paolo



Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Vitaly Kuznetsov
Anastassios Nanos  writes:

> Moreover, it doesn't involve *any* mode switch at all while printing
> out the result of the addition of these two registers -- which, for a
> simple use-case like this, I guess isn't much.
> But if we were to scale this to a large number of exits (and their
> respective handling in user-space) that would incur significant
> overhead.

Eliminating frequent exits to userspace when the guest is already
running is absolutely fine, but eliminating userspace completely, even
for the creation of the guest, is somewhat dubious. To create a simple
guest you need just a dozen IOCTLs, so you'll have to find a really,
really good showcase where it makes a difference.
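
For reference, a rough sketch of that sequence, modelled on the kvmtest.c
example from the LWN article referenced in the cover letter (x86_64 only;
error handling and the KVM_GET_API_VERSION check omitted so the ioctl chain
stays visible):

/*
 * The guest blob adds two registers and writes the ASCII result to port
 * 0x3f8, then halts.
 */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

int main(void)
{
    const uint8_t code[] = {
        0xba, 0xf8, 0x03,   /* mov $0x3f8, %dx */
        0x00, 0xd8,         /* add %bl, %al    */
        0x04, '0',          /* add $'0', %al   */
        0xee,               /* out %al, (%dx)  */
        0xf4,               /* hlt             */
    };
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0UL);                      /* #1 */

    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof(code));
    struct kvm_userspace_memory_region region = {
        .slot = 0, .guest_phys_addr = 0x1000,
        .memory_size = 0x1000, .userspace_addr = (uint64_t)mem,
    };
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);               /* #2 */

    int vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0UL);                 /* #3 */
    int mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);       /* #4 */
    struct kvm_run *run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpufd, 0);

    struct kvm_sregs sregs;
    ioctl(vcpufd, KVM_GET_SREGS, &sregs);                           /* #5 */
    sregs.cs.base = 0;
    sregs.cs.selector = 0;                   /* real mode, code at 0x1000  */
    ioctl(vcpufd, KVM_SET_SREGS, &sregs);                           /* #6 */

    struct kvm_regs regs = { .rip = 0x1000, .rax = 2, .rbx = 2, .rflags = 0x2 };
    ioctl(vcpufd, KVM_SET_REGS, &regs);                             /* #7 */

    for (;;) {
        ioctl(vcpufd, KVM_RUN, NULL);                               /* #8.. */
        if (run->exit_reason == KVM_EXIT_HLT)
            break;
        if (run->exit_reason == KVM_EXIT_IO && run->io.port == 0x3f8)
            putchar(*((char *)run + run->io.data_offset));
    }
    return 0;
}

That is roughly eight ioctls of setup plus the KVM_RUN loop, which is the
handful of calls being discussed above.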

E.g. I can imagine the following use-case: you need to create a lot of
guests with the same (or almost the same) memory contents, and allocating
and populating this memory in userspace takes time. But even in this
use-case, why do you need to terminate your userspace? Or would it be
possible to create guests from shared memory? (We may not have
copy-on-write capabilities in KVM currently, but this doesn't mean they
can't be added.)

Alternatively, you may want to mangle vmexit handling somehow and
exiting to userspace seems slow. Fine, let's add eBPF attach points to
KVM and an API to attach eBPF code there.

I'm, however, just guessing. I understand you may not want to reveal
your original idea for some reason, but without us understanding what's
really needed, I don't see how the change can be reviewed.

-- 
Vitaly



Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Thomas Gleixner
Anastassios Nanos  writes:
> On Mon, May 18, 2020 at 11:43 AM Thomas Gleixner  wrote:
>>
>> And this clearly shows how simple the required user space is. So why on
>> earth would we want to have all of that in the kernel?
>>
> Well, the main idea is that all this functionality is already in the
> kernel. My view is that kvmmtest is as simple as kvmtest.

That still does not explain the purpose, the advantage, or any reason
why this should be merged.

> Moreover, it doesn't involve *any* mode switch at all while printing
> out the result of the addition of these two registers -- which, for a
> simple use-case like this, I guess isn't much. But if we were to
> scale this to a large number of exits (and their respective handling
> in user-space) that would incur significant overhead. Don't you agree?

No. I still do not see the real-world use case you are trying to
solve. We are not going to accept changes like this which have no proper
justification, no real-world use cases, and no proper numbers backing them up.

Thanks,

tglx




Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Thomas Gleixner
Anastassios Nanos  writes:
> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> QEMU, or some other kind of VM monitor in user-space to host the vCPU
> threads, I/O threads and various other book-keeping/management mechanisms.
> This is perfectly fine for a large number of reasons and use cases: for
> instance, running generic VMs, running general-purpose operating systems
> that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance,
> without the involvement of user-space? The KVM functions are already doing
> what they should: VM and vCPU setup is already part of the kernel, the only
> missing piece is memory handling.
>
> With this series, (a) we expose to the Linux Kernel the bare minimum KVM
> API functions in order to spawn a guest instance without the intervention
> of user-space; and (b) we tweak the memory handling code of KVM-related
> functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the
> changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c
> [https://lwn.net/Articles/658512/] at

And this clearly shows how simple the required user space is. So why on
earth would we want to have all of that in the kernel?

Thanks,

tglx


Re: [PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Marc Zyngier

On 2020-05-18 07:58, Anastassios Nanos wrote:

> To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
> QEMU, or some other kind of VM monitor in user-space to host the vCPU
> threads, I/O threads and various other book-keeping/management mechanisms.
> This is perfectly fine for a large number of reasons and use cases: for
> instance, running generic VMs, running general-purpose operating systems
> that need some kind of emulation for legacy boot/hardware etc.
>
> What if we wanted to execute a small piece of code as a guest instance,
> without the involvement of user-space? The KVM functions are already doing
> what they should: VM and vCPU setup is already part of the kernel, the only
> missing piece is memory handling.
>
> With this series, (a) we expose to the Linux Kernel the bare minimum KVM
> API functions in order to spawn a guest instance without the intervention
> of user-space; and (b) we tweak the memory handling code of KVM-related
> functions to account for another kind of guest, spawned in kernel-space.
>
> PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the
> changes in the KVM memory handling code for x86_64 and aarch64.
>
> An example of use is provided based on kvmtest.c
> [https://lwn.net/Articles/658512/] at
> https://github.com/cloudkernels/kvmmtest


You don't explain *why* we would want this. What is the overhead of having
a userspace if your guest doesn't need any userspace handling? The kvmtest
example indeed shows that the KVM userspace API is usable without any form
of emulation, hence has almost no cost.

Without a clear description of the advantages of your solution, as well
as a full featured in-tree use case, I find it pretty hard to support this.


Thanks,

M.
--
Jazz is not dead. It just smells funny...


[PATCH 0/2] Expose KVM API to Linux Kernel

2020-05-18 Thread Anastassios Nanos
To spawn KVM-enabled Virtual Machines on Linux systems, one has to use
QEMU, or some other kind of VM monitor in user-space to host the vCPU
threads, I/O threads and various other book-keeping/management mechanisms.
This is perfectly fine for a large number of reasons and use cases: for
instance, running generic VMs, running general-purpose operating systems
that need some kind of emulation for legacy boot/hardware etc.

What if we wanted to execute a small piece of code as a guest instance,
without the involvement of user-space? The KVM functions are already doing
what they should: VM and vCPU setup is already part of the kernel, the only
missing piece is memory handling.

With this series, (a) we expose to the Linux Kernel the bare minimum KVM
API functions in order to spawn a guest instance without the intervention
of user-space; and (b) we tweak the memory handling code of KVM-related
functions to account for another kind of guest, spawned in kernel-space.

PATCH #1 exposes the needed stub functions, whereas PATCH #2 introduces the
changes in the KVM memory handling code for x86_64 and aarch64.

An example of use is provided based on kvmtest.c
[https://lwn.net/Articles/658512/] at
https://github.com/cloudkernels/kvmmtest
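
For illustration only, an in-kernel caller of such an API could look roughly
like the sketch below. The kvmm_* names are hypothetical stand-ins (the
symbols actually exported by PATCH #1 are not listed here); the point is
simply that the userspace ioctl sequence becomes a series of direct function
calls.

/*
 * Purely illustrative, NOT the API from these patches: the kvmm_* symbols
 * below are hypothetical.  The sketch only shows the intended flow -- the
 * same steps a userspace monitor performs with ioctls, expressed as direct
 * function calls from kernel code.
 */
struct kvmm_vm;
struct kvmm_vcpu;
enum kvmm_exit { KVMM_EXIT_HLT, KVMM_EXIT_IO };       /* hypothetical */

/* Hypothetical in-kernel equivalents of the usual KVM ioctls. */
struct kvmm_vm   *kvmm_create_vm(void);
int               kvmm_set_memory_region(struct kvmm_vm *vm, int slot,
                                          unsigned long gpa, void *host_va,
                                          unsigned long size);
struct kvmm_vcpu *kvmm_create_vcpu(struct kvmm_vm *vm, int id);
int               kvmm_set_regs(struct kvmm_vcpu *vcpu, unsigned long rip);
enum kvmm_exit    kvmm_run(struct kvmm_vcpu *vcpu);

static int kvmm_demo(void *guest_blob, unsigned long blob_size)
{
    struct kvmm_vm *vm = kvmm_create_vm();             /* cf. KVM_CREATE_VM  */
    struct kvmm_vcpu *vcpu;

    kvmm_set_memory_region(vm, 0, 0x1000,              /* cf. KVM_SET_USER_  */
                           guest_blob, blob_size);     /*   MEMORY_REGION    */
    vcpu = kvmm_create_vcpu(vm, 0);                    /* cf. KVM_CREATE_VCPU */
    kvmm_set_regs(vcpu, 0x1000);                       /* cf. KVM_SET_REGS   */

    while (kvmm_run(vcpu) != KVMM_EXIT_HLT)            /* cf. KVM_RUN        */
        ;                                              /* handle exits here  */

    return 0;
}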

Anastassios Nanos (2):
  KVMM: export needed symbols
  KVMM: Memory and interface related changes

 arch/arm64/include/asm/kvm_host.h   |   6 ++
 arch/arm64/kvm/fpsimd.c |   8 +-
 arch/arm64/kvm/guest.c  |  48 +++
 arch/x86/include/asm/fpu/internal.h |  10 ++-
 arch/x86/kvm/cpuid.c|  25 ++
 arch/x86/kvm/emulate.c  |   3 +-
 arch/x86/kvm/vmx/vmx.c  |   3 +-
 arch/x86/kvm/x86.c  |  38 -
 include/linux/kvm_host.h|  36 +
 virt/kvm/arm/arm.c  |  18 +
 virt/kvm/arm/mmu.c  |  34 +---
 virt/kvm/async_pf.c |   4 +-
 virt/kvm/coalesced_mmio.c   |   6 ++
 virt/kvm/kvm_main.c | 120 ++--
 14 files changed, 316 insertions(+), 43 deletions(-)

-- 
2.20.1
