The KVM introspection subsystem provides a facility for applications running
on the host or in a separate VM to control the execution of other VMs
(pause, resume, shutdown), query the state of the vCPUs (GPRs, MSRs, etc.),
alter the page access bits in the shadow page tables (only for the
hardware-backed ones, e.g. Intel's EPT) and receive notifications when events of
interest have taken place (shadow page table level faults, key MSR writes,
hypercalls, etc.). Some notifications can be responded to with an action
(like preventing an MSR from being written), while others are merely
informative (like breakpoint events, which can be used for execution tracing).
With few exceptions, all events are optional. An application using this
subsystem will explicitly register for them.
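
As a rough illustration of the opt-in model described above (nothing is
delivered unless the tool registered for it, and some events accept a reply),
the following C sketch keeps a per-VM event bitmask and answers an MSR-write
event with "continue" or "deny". All identifiers (demo_event, demo_action,
demo_register(), etc.) are hypothetical and do not reflect the actual KVMI
ABI, its event IDs or its message layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only; NOT the real KVMI structures or event IDs. */
enum demo_event {
	DEMO_EVENT_PF = 0,
	DEMO_EVENT_MSR = 1,
	DEMO_EVENT_BREAKPOINT = 2,
	DEMO_EVENT_COUNT
};

/* Hypothetical reply actions for events that accept a response. */
enum demo_action {
	DEMO_ACTION_CONTINUE, /* let the guest proceed */
	DEMO_ACTION_DENY      /* e.g. drop the MSR write */
};

struct demo_introspector {
	uint32_t registered;    /* bit i set => tool subscribed to event i */
	uint32_t protected_msr; /* MSR the tool wants to lock */
};

/* Explicit opt-in: the tool subscribes to one event at a time. */
static void demo_register(struct demo_introspector *t, enum demo_event ev)
{
	assert(ev < DEMO_EVENT_COUNT);
	t->registered |= 1u << ev;
}

static bool demo_is_registered(const struct demo_introspector *t,
			       enum demo_event ev)
{
	return (t->registered >> ev) & 1u;
}

/* Called on a guest MSR write; unsubscribed events are never reported. */
static enum demo_action demo_on_msr_write(const struct demo_introspector *t,
					  uint32_t msr)
{
	if (!demo_is_registered(t, DEMO_EVENT_MSR))
		return DEMO_ACTION_CONTINUE; /* no event sent, write proceeds */
	return msr == t->protected_msr ? DEMO_ACTION_DENY : DEMO_ACTION_CONTINUE;
}
```

For example, a tool that registered for DEMO_EVENT_MSR with protected_msr set
to 0x176 (MSR_IA32_SYSENTER_EIP) would have guest writes to that MSR denied,
while events it never registered for do not reach it at all.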

The use case that gave rise to this subsystem is monitoring the guest OS,
and as such the ABI/API is heavily influenced by how the guest software
(kernel, applications) sees the world. For example, some events provide
information specific to the host CPU architecture (e.g. MSR_IA32_SYSENTER_EIP)
merely because it is leveraged by guest software to implement a critical
feature (fast system calls).

At the moment, the target audience for KVMI is security software authors
who wish to perform forensics on newly discovered threats (exploits) or to
implement another layer of security, like preventing a large set of kernel
rootkits simply by "locking" the kernel image in the shadow page tables
(i.e. enforce .text r-x, .rodata r--, etc.). It's the latter case that
made KVMI a separate subsystem, even though many of these features are
available in the device manager (e.g. QEMU). The ability to build a security
application that does not interfere (in terms of performance) with the
guest software calls for a specialized interface designed for minimum
overhead.
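
The "locking" idea can be sketched as a per-page access mask that is checked
on every guest access. This C sketch assumes a simple R/W/X bitmask per guest
frame and a made-up section layout; the DEMO_ACCESS_* flags, the frame numbers
and the helpers are hypothetical and only mirror the concept behind
KVMI_SET_PAGE_ACCESS, not its real interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits for one guest page (not the real KVMI ABI). */
#define DEMO_ACCESS_R (1u << 0)
#define DEMO_ACCESS_W (1u << 1)
#define DEMO_ACCESS_X (1u << 2)

/* A guest kernel section and the access the tool allows on it. */
struct demo_section {
	const char *name;
	uint64_t start_gfn; /* first guest frame number */
	uint64_t end_gfn;   /* one past the last frame */
	uint8_t allowed;    /* DEMO_ACCESS_* mask */
};

/* "Locking" a made-up kernel image: .text r-x, .rodata r--. */
static const struct demo_section demo_layout[] = {
	{ ".text",   0x1000, 0x1800, DEMO_ACCESS_R | DEMO_ACCESS_X },
	{ ".rodata", 0x1800, 0x1a00, DEMO_ACCESS_R },
};

/* Returns the allowed mask for a frame; untracked frames allow everything. */
static uint8_t demo_allowed_access(uint64_t gfn)
{
	for (size_t i = 0; i < sizeof(demo_layout) / sizeof(demo_layout[0]); i++)
		if (gfn >= demo_layout[i].start_gfn && gfn < demo_layout[i].end_gfn)
			return demo_layout[i].allowed;
	return DEMO_ACCESS_R | DEMO_ACCESS_W | DEMO_ACCESS_X;
}

/* True if the attempted access includes a bit the frame does not allow. */
static bool demo_access_violates(uint64_t gfn, uint8_t attempted)
{
	assert(attempted != 0);
	return (attempted & ~demo_allowed_access(gfn)) != 0;
}
```

In the series itself, the restriction is applied in the hardware-backed shadow
page tables, so a denied access shows up as an EPT violation that can be
reported to the tool as a KVMI_EVENT_PF event.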

This patch series is based on 5.0-rc7,
commit de3ccd26fafc ("KVM: MMU: record maximum physical address width in
kvm_mmu_extended_role").

The previous RFC (v5) can be read here:
https://www.spinics.net/lists/kvm/msg179441.html

Thanks to Samuel Laurén and Mathieu Tarral, the previous version has
been integrated and tested with libVMI.
KVM-VMI: https://github.com/KVM-VMI/kvm-vmi
Kernel:  https://github.com/KVM-VMI/kvm/tree/kvmi
QEMU:    https://github.com/KVM-VMI/qemu/tree/kvmi
         (not all patches, but enough to work)
libVMI:  https://github.com/KVM-VMI/libvmi/tree/kvmi

Thanks to Weijiang Yang, the previous version has been integrated and
tested with the SPP patch series.
https://github.com/adlazar/kvm/tree/kvmi-v5-spp
Quickstart:
https://github.com/adlazar/kvm/blob/kvmi-v5-spp/tools/kvm/kvmi/README

I hope this version will be merged into the KVM-VMI project too.

Patches 1-20: unroll a big part of the KVM introspection subsystem,
sent in one patch in the previous versions.
Patches 21-24: extend the current page tracking code.
Patches 25-33: make use of page tracking to support the
KVMI_SET_PAGE_ACCESS introspection command and the KVMI_EVENT_PF event
(on EPT violations caused by the tracking settings).
Patches 34-42: include the SPP feature (Enable Sub-page
Write Protection Support), already sent to the KVM list:
https://lore.kernel.org/lkml/20190717133751.12910-1-weijiang.y...@intel.com/
Patches 43-46: add the commands needed to use SPP.
Patches 47-63: unroll almost all the rest of the introspection code.
Patches 64-67: add single-stepping, mostly as a way to overcome the
unimplemented instructions, but also as a feature for the introspection
tool.
Patches 68-70: cover more cases related to EPT violations.
Patches 71-73: add the remote mapping feature, allowing the introspection
tool to map into its address space a page from guest memory.
Patch 74: add a fix to hypercall emulation.
Patches 75-76: disable some features/optimizations when the introspection
code is present.
Patches 77-78: add trace functions for the introspection code and change
some of those related to interrupt/exception injection.
Patches 79-92: add new instructions to the x86 emulator, including cmpxchg
fixes.
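
To make the SPP patches (34-46) more concrete: SPP splits a 4 KiB page into
32 sub-pages of 128 bytes, each with its own write permission. The C sketch
below assumes a simplified 32-bit mask with one bit per sub-page (bit set
means writable); this is not the in-memory format of Intel's sub-page
permission tables, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE    4096u
#define DEMO_SUBPAGE_SIZE 128u /* 4096 / 128 = 32 sub-pages per page */

/* Index (0..31) of the 128-byte sub-page containing a page offset. */
static unsigned int demo_subpage_index(uint32_t page_offset)
{
	assert(page_offset < DEMO_PAGE_SIZE);
	return page_offset / DEMO_SUBPAGE_SIZE;
}

/* Simplified model: bit i of write_mask set => sub-page i is writable. */
static bool demo_spp_write_allowed(uint32_t write_mask, uint32_t page_offset)
{
	return (write_mask >> demo_subpage_index(page_offset)) & 1u;
}

/* Clears the write bit of every sub-page overlapping [offset, offset + len). */
static uint32_t demo_spp_protect(uint32_t write_mask, uint32_t offset,
				 uint32_t len)
{
	assert(len > 0 && offset + len <= DEMO_PAGE_SIZE);
	unsigned int first = demo_subpage_index(offset);
	unsigned int last = demo_subpage_index(offset + len - 1);

	for (unsigned int i = first; i <= last; i++)
		write_mask &= ~(1u << i);
	return write_mask;
}
```

Protecting bytes 200..299 clears only sub-pages 1 and 2, so the rest of the
page stays writable; the intended benefit is that a tool can guard a small
object without trapping every write to the page that contains it.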

To do:
- run stress tests with SPP enabled
- add introspection support for alternate EPT views (almost done)
- add introspection support for virtualization exceptions #VE (almost done)
- add KVM tests

Changes since v5:
- small changes to the protocol, but enough to make it backward-incompatible
with v5
- fix CR3 interception (thanks to Mathieu Tarral for reporting the issue)
- add SPP support (thanks to Weijiang Yang)
- add two more ioctls in order to let userspace/QEMU control
the commands/events allowed for introspection
- extend the breakpoint event with the instruction length
- complete the descriptor table registers interception
- add new instructions to the x86 emulator
- move arch-dependent code to arch/x86/kvm/
- lots of fixes, especially on page tracking, single-stepping, exception