On Tue, Dec 11, 2018 at 10:31:01AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 11, 2018 at 06:20:57PM +, Jean-Philippe Brucker wrote:
> > Implement the virtio-iommu driver, following specification v0.9 [1].
> >
> > Only minor changes since v5 [2]. I fixed issues reported by Michael and
> >
Due to the way the effective SVE vector length is controlled and
trapped at different exception levels, certain mismatches in the
sets of vector lengths supported by different physical CPUs in the
system may prevent straightforward virtualisation of SVE at parity
with the host.
This patch
The roles of sve_init_vq_map(), sve_update_vq_map() and
sve_verify_vq_map() are highly non-obvious to anyone who has not dug
through cpufeatures.c in detail.
Since the way these functions interact with each other is more
important here than a full understanding of the cpufeatures code, this
patch
Since userspace may need to decide on the set of vector lengths for
the guest before setting up a vm, it is onerous to require a vcpu
fd to be available first. KVM_ARM_SVE_CONFIG_QUERY is not
vcpu-dependent anyway, so this patch wires up KVM_ARM_SVE_CONFIG to
be usable on a vm fd where
The current FPSIMD/SVE context handling support for non-task (i.e.,
KVM vcpu) contexts does not take SVE into account. This means that
only task contexts can safely use SVE at present.
In preparation for enabling KVM guests to use SVE, it is necessary
to keep track of SVE state for non-task
SVE will require the KVM_ARM_SVE_CONFIG ioctl to be used early to
configure a vcpu before other arch vcpu ioctls will behave in a
consistent way.
To hide these effects from userspace while minimising mess in the
generic code, this patch splits arch vcpu ioctls into two phases:
early configuration
The reset_unknown() system register helper initialises a guest
register to a distinctive junk value on vcpu reset, to help expose
and debug deficient register initialisation within the guest.
Some registers such as the SVE control register ZCR_EL1 contain a
mixture of UNKNOWN fields and RES0
The Arm SVE architecture defines registers that are up to 2048 bits
in size (with some possibility of further future expansion).
In order to avoid the need for an excessively large number of
ioctls when saving and restoring a vcpu's registers, this patch
adds a #define to make support for
Currently, the detection of invalid bits in the KVM_CREATE_VM type
argument is done in the kvm_arm_setup_stage2() backend.
In order to make it easier to add type flags with independent
meanings, this patch moves the logic for rejecting invalid bits to
kvm_arch_init_vm(). Backend functions are
This patch includes the SVE register IDs in the list returned by
KVM_GET_REG_LIST, as appropriate.
On a non-SVE-enabled vcpu, no extra IDs are added.
On an SVE-enabled vcpu, the appropriate number of slice IDs are
enumerated for each SVE register, depending on the maximum vector
length for the
kvm_host.h uses DECLARE_BITMAP() to declare the features member of
struct vcpu_arch, but the corresponding #include for this is
missing.
This patch adds a suitable #include for . Although
the header builds without it today, this should help to avoid
future surprises.
Signed-off-by: Dave Martin
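For readers unfamiliar with it, the construct the missing #include provides can be sketched in userspace like this — a re-creation of the kernel's DECLARE_BITMAP() machinery with illustrative struct and helper names, not the kernel definitions themselves:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Userspace re-creation of the kernel's DECLARE_BITMAP() macro;
 * the struct and helpers below are illustrative, not kvm_host.h. */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

#define KVM_VCPU_MAX_FEATURES 7	/* illustrative count */

struct vcpu_arch_sketch {
	DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES);
};

/* Set feature bit f in the per-vcpu bitmap. */
static void set_feature(struct vcpu_arch_sketch *a, unsigned int f)
{
	a->features[f / BITS_PER_LONG] |= 1UL << (f % BITS_PER_LONG);
}

/* Query feature bit f. */
static int test_feature(const struct vcpu_arch_sketch *a, unsigned int f)
{
	return !!(a->features[f / BITS_PER_LONG] & (1UL << (f % BITS_PER_LONG)));
}
```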
Since SVE will be enabled or disabled on a per-vcpu basis, a flag
is needed in order to track which vcpus have it enabled.
This patch adds a suitable flag and a helper for checking it.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
---
arch/arm64/include/asm/kvm_host.h | 4
1 file
In order to give each vcpu its own view of the SVE registers, this
patch adds context storage via a new sve_state pointer in struct
vcpu_arch. An additional member sve_max_vl is also added for each
vcpu, to determine the maximum vector length visible to the guest
and thus the value to be
KVM will need to interrogate the set of SVE vector lengths
available on the system.
This patch exposes the relevant bits to the kernel, along with a
sve_vq_available() helper to check whether a particular vector
length is supported.
vq_to_bit() and bit_to_vq() are not intended for use outside
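The availability check described above can be sketched as a bitmap keyed by vector quadwords (vq). The vq-to-bit mapping below (bit = SVE_VQ_MAX - vq, keeping the largest vector length at bit 0) mirrors the convention the kernel uses internally, but treat the whole block as illustrative:

```c
#include <assert.h>
#include <limits.h>

/* Sketch of an sve_vq_available()-style helper over a static bitmap. */
#define SVE_VQ_MIN 1
#define SVE_VQ_MAX 16	/* assumption: 2048-bit max / 128-bit quadword */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

static unsigned long sve_vq_map[(SVE_VQ_MAX + BITS_PER_LONG - 1) / BITS_PER_LONG];

/* Largest vector length sits at bit 0, so finding the maximum is cheap. */
static unsigned int vq_to_bit(unsigned int vq)
{
	return SVE_VQ_MAX - vq;
}

static void sve_set_vq(unsigned int vq)
{
	unsigned int b = vq_to_bit(vq);

	sve_vq_map[b / BITS_PER_LONG] |= 1UL << (b % BITS_PER_LONG);
}

static int sve_vq_available(unsigned int vq)
{
	unsigned int b = vq_to_bit(vq);

	return !!(sve_vq_map[b / BITS_PER_LONG] & (1UL << (b % BITS_PER_LONG)));
}
```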
This patch adds sections to the KVM API documentation describing
the extensions for supporting the Scalable Vector Extension (SVE)
in guests.
Signed-off-by: Dave Martin
---
Changes since RFC v2:
* Fix documentation regarding which SVE Zn register bits must be
accessed in order to get at
Since the sizes of the members of the core arm64 registers vary, the
list of register encodings that make sense is not a simple linear
sequence.
To clarify which encodings to use, this patch adds a brief list
to the documentation.
Signed-off-by: Dave Martin
---
Documentation/virtual/kvm/api.txt |
KVM_GET_REG_LIST should only enumerate registers that are actually
accessible, so it is necessary to filter out any register that is
not exposed to the guest. For features that are configured at
runtime, this will require a dynamic check.
For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be
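The dynamic-check idea can be sketched as a visibility callback per register descriptor, consulted during enumeration; the descriptor layout and names here are assumptions for illustration, not the actual sys_regs.c interfaces:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: KVM_GET_REG_LIST-style enumeration that skips registers
 * not visible on this vcpu (e.g. ZCR_EL1 on a non-SVE vcpu). */
struct reg_desc_sk {
	unsigned long long id;
	int (*visible)(int vcpu_has_sve);
};

static int always_visible(int vcpu_has_sve) { (void)vcpu_has_sve; return 1; }
static int sve_visible(int vcpu_has_sve) { return vcpu_has_sve; }

/* Copy visible IDs to out (if non-NULL); return the visible count. */
static size_t copy_reg_indices_sk(const struct reg_desc_sk *regs, size_t n,
				  int vcpu_has_sve, unsigned long long *out)
{
	size_t i, copied = 0;

	for (i = 0; i < n; i++) {
		if (!regs[i].visible(vcpu_has_sve))
			continue;
		if (out)
			out[copied] = regs[i].id;
		copied++;
	}
	return copied;
}

/* Tiny demo table: one unconditional register, one SVE-only one. */
static size_t demo_count(int vcpu_has_sve)
{
	static const struct reg_desc_sk regs[] = {
		{ 0x1ULL, always_visible },
		{ 0x2ULL, sve_visible },
	};
	return copy_reg_indices_sk(regs, 2, vcpu_has_sve, NULL);
}
```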
This patch adds the necessary API extensions to allow userspace to
detect SVE support for guests and enable it.
A new capability KVM_CAP_ARM_SVE is defined to allow userspace to
detect the availability of the KVM SVE API extensions in the usual
way. In addition, userspace must opt into these
This patch adds the necessary support for context switching ZCR_EL1
for each vcpu.
ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
sense for it to be handled as part of the guest FPSIMD/SVE context
for context switch purposes instead of handling it as a general
system register.
Architecture features that are conditionally visible to the guest
will require run-time checks in the ID register accessor functions.
In particular, read_id_reg() will need to perform checks in order
to generate the correct emulated value for certain ID register
fields such as ID_AA64PFR0_EL1.SVE
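The flavour of run-time check involved can be sketched as follows — when emulating ID_AA64PFR0_EL1 for a vcpu without SVE, the SVE field (bits [35:32]) must read as zero; the helper name and interface are illustrative:

```c
#include <assert.h>

/* Sketch of masking a conditionally-visible ID register field. */
#define ID_AA64PFR0_SVE_SHIFT 32
#define ID_AA64PFR0_SVE_MASK  (0xfULL << ID_AA64PFR0_SVE_SHIFT)

static unsigned long long read_id_aa64pfr0_sketch(unsigned long long hw_val,
						  int vcpu_has_sve)
{
	if (!vcpu_has_sve)
		hw_val &= ~ID_AA64PFR0_SVE_MASK;	/* hide SVE from the guest */
	return hw_val;
}
```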
In order to avoid the pointless complexity of maintaining two ioctl
register access views of the same data, this patch blocks ioctl
access to the FPSIMD V-registers on vcpus that support SVE.
This will make it more straightforward to add SVE register access
support.
Since SVE is an opt-in
kvm_arm_num_regs() adds together various partial register counts in
a freeform sum expression, which makes it harder than necessary to
read diffs that add, modify or remove a single term in the sum
(which is expected to be the common case under maintenance).
This patch refactors the code to add the
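The shape of that refactor can be sketched like this — one accumulation statement per register class, so a diff touching one class touches one line; the helper names and counts are invented for the sketch, not the actual kvm_arm_num_regs() internals:

```c
#include <assert.h>

/* Illustrative per-class counts. */
static unsigned long num_core_regs(void)  { return 66; }
static unsigned long num_sys_regs(void)   { return 100; }
static unsigned long num_timer_regs(void) { return 3; }

/* Each term on its own line keeps future diffs minimal. */
static unsigned long kvm_arm_num_regs_sketch(void)
{
	unsigned long res = 0;

	res += num_core_regs();
	res += num_sys_regs();
	res += num_timer_regs();

	return res;
}
```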
__fpsimd_enabled() no longer exists, but a dangling declaration has
survived in kvm_hyp.h.
This patch gets rid of it.
Signed-off-by: Dave Martin
Reviewed-by: Alex Bennée
---
arch/arm64/include/asm/kvm_hyp.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_hyp.h
This patch updates fpsimd_flush_task_state() to mirror the new
semantics of fpsimd_flush_cpu_state(): both functions now
implicitly set TIF_FOREIGN_FPSTATE to indicate that the task's
FPSIMD state is not loaded into the cpu.
As a side-effect, fpsimd_flush_task_state() now sets
TIF_FOREIGN_FPSTATE
This series implements support for allowing KVM guests to use the Arm
Scalable Vector Extension (SVE).
The patches are also available on a branch for reviewer convenience. [1]
The patches are based on v4.20-rc5, with [3] applied (which includes
some needed refactoring).
This is an interim
On Mon, Dec 10, 2018 at 07:54:25PM +, Kristina Martsenko wrote:
> On 09/12/2018 14:24, Richard Henderson wrote:
> > On 12/7/18 12:39 PM, Kristina Martsenko wrote:
> >> #define SCTLR_ELx_DSSBS (1UL << 44)
> >> +#define SCTLR_ELx_ENIA	(1 << 31)
> >
> > 1U or 1UL lest you produce signed
On 11/12/2018 18:31, Christoph Hellwig wrote:
> On Tue, Dec 11, 2018 at 06:20:57PM +, Jean-Philippe Brucker wrote:
>> Implement the virtio-iommu driver, following specification v0.9 [1].
>>
>> Only minor changes since v5 [2]. I fixed issues reported by Michael and
>> added tags from Eric and
Currently FPEXC32_EL2 is handled specially when context-switching.
However, FPEXC has no architectural effect when running in AArch64.
The only case where an arm64 host may execute in AArch32 is when
running compat user code at EL0: the architecture explicitly
documents FPEXC as having no effect
On Mon, Dec 03, 2018 at 06:05:58PM +, James Morse wrote:
> ACPI has a GHESv2 which is used on hardware reduced platforms to
> explicitly acknowledge that the memory for CPER records has been
> consumed. This lets an external agent know it can re-use this
> memory for something else.
>
>
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter an MSI doorbell region, set it up as an IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the
The virtio IOMMU is a para-virtualized device that allows sending IOMMU
requests such as map/unmap over the virtio transport without emulating page
tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.
The bulk of the code transforms calls coming from the IOMMU API into
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.
Tested-by: Bharat Bhushan
Tested-by: Eric Auger
Reviewed-by: Eric
In PCI root complex nodes, the iommu-map property describes the IOMMU that
translates each endpoint. On some platforms, the IOMMU itself is presented
as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). This isn't supported
by the current OF driver, which expects all endpoints to have an IOMMU.
For PCI devices that have an OF node, set the fwnode as well. This way
drivers that rely on fwnode don't need the special case described by
commit f94277af03ea ("of/platform: Initialise dev->fwnode appropriately").
Acked-by: Bjorn Helgaas
Signed-off-by: Jean-Philippe Brucker
---
The nature of a virtio-mmio node is discovered by the virtio driver at
probe time. However the DMA relation between devices must be described
statically. When a virtio-mmio node is a virtio-iommu device, it needs an
"#iommu-cells" property as specified by bindings/iommu/iommu.txt.
Otherwise, the
Some systems implement virtio-iommu as a PCI endpoint. The operating
system needs to discover the relationship between IOMMU and masters long
before the PCI endpoint gets probed. Add a PCI child node to describe the
virtio-iommu device.
The virtio-pci-iommu is conceptually split between a PCI
Implement the virtio-iommu driver, following specification v0.9 [1].
Only minor changes since v5 [2]. I fixed issues reported by Michael and
added tags from Eric and Bharat. Thanks!
You can find Linux driver and kvmtool device on v0.9 branches [3],
module and x86 support on virtio-iommu/devel.
On Mon, Dec 03, 2018 at 06:05:57PM +, James Morse wrote:
> Refactor the estatus queue's pool notification routine from
> NOTIFY_NMI's handlers. This will allow another notification
> method to use the estatus queue without duplicating this code.
>
> This patch adds
On Mon, Dec 03, 2018 at 06:05:55PM +, James Morse wrote:
> ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going
> on to __process_error(). This is pointless as ghes_read_estatus()
> will always set this flag if it returns success, which was checked
> earlier in the loop. Remove
From: Punit Agrawal
KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to
map in PUD hugepages.
Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with 4K granule) which can be useful on
From: Punit Agrawal
In preparation for creating larger hugepages at Stage 2, add support
to the age handling notifiers for PUD hugepages when encountered.
Provide trivial helpers for arm32 to allow sharing code.
Signed-off-by: Punit Agrawal
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell
From: Punit Agrawal
In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.
Also, provide trivial implementations of required kvm_s2pud_*
This series is an update to the PUD hugepage support previously posted
at [0]. This patchset adds support for PUD hugepages at stage 2, a
feature that is useful on cores that have support for large sized TLB
mappings (e.g., 1GB for 4K granule).
The patches are based on v4.20-rc4
The patches have
From: Punit Agrawal
In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries is used to perform
i-cache invalidation on first execute.
Provide trivial
From: Punit Agrawal
In preparation for creating larger hugepages at Stage 2, extend the
access fault handling at Stage 2 to support PUD hugepages when
encountered.
Provide trivial helpers for arm32 to allow sharing of code.
Signed-off-by: Punit Agrawal
Cc: Christoffer Dall
Cc: Marc Zyngier
From: Punit Agrawal
The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.
Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them
From: Punit Agrawal
Stage 2 fault handler marks a page as executable if it is handling an
execution fault or if it was a permission fault in which case the
executable bit needs to be preserved.
The logic to decide if the page should be marked executable is
duplicated for PMD and PTE entries. To
From: Punit Agrawal
Introduce helpers to abstract architectural handling of the conversion
of pfn to page table entries and marking a PMD page table entry as a
block entry.
The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do
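The abstraction idea can be sketched as one helper per operation, hiding how a pfn becomes a table entry; the descriptor encoding below is invented for the sketch and is not the arm64 format:

```c
#include <assert.h>

/* Sketch of pfn-to-entry and mark-as-block helpers. */
typedef unsigned long long pmd_sk_t;

#define PAGE_SHIFT_SK     12
#define PMD_TYPE_BLOCK_SK (1ULL << 0)	/* illustrative "block entry" bit */

/* Build an entry from a pfn plus protection bits (prot < 4096 assumed). */
static pmd_sk_t kvm_pfn_pmd_sk(unsigned long long pfn, unsigned long long prot)
{
	return (pfn << PAGE_SHIFT_SK) | prot;
}

/* Mark the entry as a block (huge) mapping. */
static pmd_sk_t kvm_pmd_mkhuge_sk(pmd_sk_t pmd)
{
	return pmd | PMD_TYPE_BLOCK_SK;
}
```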
On Mon, Dec 03, 2018 at 06:05:54PM +, James Morse wrote:
> When CPER records are found the address of the records is stashed
> in the struct ghes. Once the records have been processed, this
> address is overwritten with zero so that it won't be processed
> again without being re-populated by
On Mon, Dec 03, 2018 at 06:05:53PM +, James Morse wrote:
> Adding new NMI-like notifications duplicates the calls that grow
> and shrink the estatus pool. This is all pretty pointless, as the
> size is capped to 64K. Allocate this for each ghes and drop
> the code that grows and shrinks the
On Mon, Dec 03, 2018 at 06:05:52PM +, James Morse wrote:
> ghes.c has a memory pool it uses for the estatus cache and the estatus
> queue. The cache is initialised when registering the platform driver.
> For the queue, an NMI-like notification has to grow/shrink the pool
> as it is registered
On 10/12/2018 22:53, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 03:06:47PM +, Jean-Philippe Brucker wrote:
>> On 27/11/2018 18:53, Michael S. Tsirkin wrote:
>>> On Tue, Nov 27, 2018 at 06:10:46PM +, Jean-Philippe Brucker wrote:
On 27/11/2018 18:04, Michael S. Tsirkin wrote:
Currently, enumeration of the core register IDs for
KVM_GET_REG_LIST is open-coded in kvm_arm_copy_reg_indices().
This will become cumbersome as the enumeration logic becomes more
complex. In preparation for future patches, this patch factors the
code out into a separate function
Since commit d26c25a9d19b ("arm64: KVM: Tighten guest core register
access from userspace"), KVM_{GET,SET}_ONE_REG rejects register IDs
that do not correspond to a single underlying architectural register.
KVM_GET_REG_LIST was not changed to match however: instead, it
simply yields a list of
Currently, the only code that needs to deduce the proper size of a
KVM core register on arm64 is validate_core_offset().
In order to make this code easier to reuse, this patch factors out
the size determination into a separate function
core_reg_size_from_offset().
Since validate_core_offset()
Since commit d26c25a9d19b ("arm64: KVM: Tighten guest core register
access from userspace"), KVM_{GET,SET}_ONE_REG rejects register IDs
that do not correspond to a single underlying architectural register.
This series proposed a fix for this regression along with some related
refactoring.
For
On Tue, Dec 11, 2018 at 02:40:07PM +0100, Christoffer Dall wrote:
> On Tue, Dec 11, 2018 at 01:11:33PM +, Andrew Murray wrote:
> > On Tue, Dec 11, 2018 at 01:29:51PM +0100, Christoffer Dall wrote:
> > > On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> > > > The virt/arm core
On Tue, Dec 11, 2018 at 01:11:33PM +, Andrew Murray wrote:
> On Tue, Dec 11, 2018 at 01:29:51PM +0100, Christoffer Dall wrote:
> > On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> > > The virt/arm core allocates a percpu structure as per the
> > > kvm_cpu_context_t
> > > type,
On Tue, Dec 11, 2018 at 12:40:51PM +, Suzuki K Poulose wrote:
>
>
> On 11/12/2018 12:13, Andrew Murray wrote:
> > The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
> > type, at present this is typedef'd to kvm_cpu_context and used to store
> > host cpu context. The
On Tue, Dec 11, 2018 at 01:29:51PM +0100, Christoffer Dall wrote:
> On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> > The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
> > type, at present this is typedef'd to kvm_cpu_context and used to store
> > host
On 11/12/2018 12:13, Andrew Murray wrote:
> The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
> type, at present this is typedef'd to kvm_cpu_context and used to store
> host cpu context. The kvm_cpu_context structure is also used elsewhere to
> hold vcpu context. In order
On Tue, Dec 11, 2018 at 12:13:37PM +, Andrew Murray wrote:
> The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
> type, at present this is typedef'd to kvm_cpu_context and used to store
> host cpu context. The kvm_cpu_context structure is also used elsewhere to
> hold
We recently addressed a VMID generation race by introducing a read/write
lock around accesses and updates to the vmid generation values.
However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but
does so without taking the read lock.
As far as I can tell, this can lead to the same
In order to efficiently enable/disable guest/host only perf counters
at guest entry/exit we add bitfields to kvm_cpu_context for guest and
host events as well as accessors for updating them.
Signed-off-by: Andrew Murray
---
arch/arm64/include/asm/kvm_host.h | 28
1
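The bitfield-plus-accessors idea can be sketched like this; the struct layout and function names are assumptions in the spirit of the change, not the actual kvm_cpu_context additions:

```c
#include <assert.h>

/* Sketch: per-context masks of counters to enable only in host or
 * only in guest, with set/clear accessors. */
struct kvm_pmu_events_sketch {
	unsigned int events_host;
	unsigned int events_guest;
};

static void kvm_set_pmu_events_sk(struct kvm_pmu_events_sketch *c,
				  unsigned int set, int host_only)
{
	if (host_only)
		c->events_host |= set;
	else
		c->events_guest |= set;
}

/* Clearing forgets the counters on both sides. */
static void kvm_clr_pmu_events_sk(struct kvm_pmu_events_sketch *c,
				  unsigned int clr)
{
	c->events_host &= ~clr;
	c->events_guest &= ~clr;
}
```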
Enable/disable event counters as appropriate when entering and exiting
the guest to enable support for guest or host only event counting.
For both VHE and non-VHE we switch the counters between host/guest at
EL2. EL2 is filtered out by the PMU when we are using the :G modifier.
The PMU may be on
The armv8pmu_enable_event_counter function issues an isb instruction
after enabling a pair of counters - this doesn't provide any value
and is inconsistent with the armv8pmu_disable_event_counter.
In any case armv8pmu_enable_event_counter is always called with the
PMU stopped. Starting the PMU
The virt/arm core allocates a percpu structure as per the kvm_cpu_context_t
type, at present this is typedef'd to kvm_cpu_context and used to store
host cpu context. The kvm_cpu_context structure is also used elsewhere to
hold vcpu context. In order to use the percpu to hold additional future
host
This patchset provides support for perf event modifiers :G and :H which
allows for filtering of PMU events between host and guests when used
with KVM.
As the underlying hardware cannot distinguish between guest and host
context, the performance counters must be stopped and started upon
entry/exit
Add support for the :G and :H attributes in perf by handling the
exclude_host/exclude_guest event attributes.
We notify KVM of counters that we wish to be enabled or disabled on
guest entry/exit and thus defer from starting or stopping :G events
as per the events exclude_host attribute.
With
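The mapping from perf's exclude attributes to the :G/:H behaviour can be sketched as a small decision helper; the struct and enum are illustrative, though `exclude_host`/`exclude_guest` are real perf_event_attr fields:

```c
#include <assert.h>

/* Sketch: where should a counter count, given its exclude attributes? */
struct perf_event_attr_sk {
	int exclude_host;
	int exclude_guest;
};

enum count_where {
	COUNT_NOWHERE,
	COUNT_HOST_ONLY,	/* perf's :H modifier */
	COUNT_GUEST_ONLY,	/* perf's :G modifier */
	COUNT_EVERYWHERE,
};

static enum count_where where_to_count(const struct perf_event_attr_sk *attr)
{
	if (attr->exclude_host && attr->exclude_guest)
		return COUNT_NOWHERE;
	if (attr->exclude_guest)
		return COUNT_HOST_ONLY;
	if (attr->exclude_host)
		return COUNT_GUEST_ONLY;
	return COUNT_EVERYWHERE;
}
```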
When restoring the active state from userspace, we don't know which CPU
was the source for the active state, and this is not architecturally
exposed in any of the register state.
Set the active_source to 0 in this case. In the future, we can expand
on this and expose the information as
On Tue, Dec 11, 2018 at 10:00:35AM +0100, Steven Miao (Arm Technology China)
wrote:
> Hi Christopher,
>
> > -----Original Message-----
> > From: Christoffer Dall
> > Sent: Monday, December 10, 2018 9:19 PM
> > To: Steven Miao (Arm Technology China)
> > Cc: kvmarm@lists.cs.columbia.edu
> >
On Mon, Nov 26, 2018 at 06:26:47PM +, Julien Thierry wrote:
> vgic_cpu->ap_list_lock must always be taken with interrupts disabled as
> it is used in interrupt context.
>
> For configurations such as PREEMPT_RT_FULL, this means that it should
> be a raw_spinlock since RT spinlocks are
On Mon, Nov 26, 2018 at 06:26:44PM +, Julien Thierry wrote:
> To change the active state of an MMIO, halt is requested for all vcpus of
> the affected guest before modifying the IRQ state. This is done by calling
> cond_resched_lock() in vgic_mmio_change_active(). However interrupts are
>
Hi Christopher,
> -----Original Message-----
> From: Christoffer Dall
> Sent: Monday, December 10, 2018 9:19 PM
> To: Steven Miao (Arm Technology China)
> Cc: kvmarm@lists.cs.columbia.edu
> Subject: Re: KVM arm realtime performance optimization
>
> On Mon, Dec 10, 2018 at 05:36:09AM +,
On Fri, Nov 09, 2018 at 03:07:10PM +, Mark Rutland wrote:
> When we emulate an MMIO instruction, we advance the CPU state within
> decode_hsr(), before emulating the instruction effects.
>
> Having this logic in decode_hsr() is opaque, and advancing the state
> before emulation is
On Fri, Nov 09, 2018 at 03:07:11PM +, Mark Rutland wrote:
> When we emulate a guest instruction, we don't advance the hardware
> singlestep state machine, and thus the guest will receive a software
> step exception after a next instruction which is not emulated by the
> host.
>
> We bodge