On Wed, Feb 14, 2018 at 06:38:11PM +0100, Christoffer Dall wrote:
> On Wed, Feb 14, 2018 at 02:43:42PM +0000, Dave Martin wrote:
> > [CC Ard, in case he has a view on how much we care about softirq NEON
> > performance regressions ... and whether my suggestions make sense]
> >
> > On Wed, Feb 14,
On 15/01/18 11:47, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:23PM +0000, Marc Zyngier wrote:
>> So far, we're using a complicated sequence of alternatives to
>> patch the kernel/hyp VA mask on non-VHE, and NOP out the
>> masking altogether when on VHE.
>>
>> The newly introduced
On 18/01/18 14:39, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:29PM +0000, Marc Zyngier wrote:
>> We so far mapped our HYP IO (which is essentially the GICv2 control
>> registers) using the same method as for memory. It recently appeared
>> that this is a bit unsafe:
>>
>> We compute the
On 18/01/18 20:28, Christoffer Dall wrote:
> On Thu, Jan 04, 2018 at 06:43:32PM +0000, Marc Zyngier wrote:
>> As we're moving towards a much more dynamic way to compute our
>> HYP VA, let's express the mask in a slightly different way.
>>
>> Instead of comparing the idmap position to the "low" VA
This series redesigns parts of KVM/ARM to optimize the performance on
VHE systems. The general approach is to try to do as little work as
possible when transitioning between the VM and the hypervisor. This has
the benefit of lower latency when waiting for interrupts and delivering
virtual
We have numerous checks around that check whether HCR_EL2 has the RW bit
set to figure out if we're running an AArch64 or AArch32 VM. In some
cases, directly checking the RW bit (given its unintuitive name) is a
bit confusing, and that's not going to improve as we move logic around
for the
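A minimal C sketch of the kind of readable helper this motivates. It is a userspace model, not kernel code: `struct vcpu_model` and `vcpu_el1_is_32bit()` are hypothetical names used for illustration, and the only architectural fact relied on is that HCR_EL2.RW is bit 31, with RW=1 meaning EL1 executes in AArch64.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.RW (bit 31): set means EL1 is AArch64, clear means AArch32. */
#define HCR_RW (UINT64_C(1) << 31)

/* Hypothetical, simplified stand-in for the per-vCPU state. */
struct vcpu_model {
	uint64_t hcr_el2; /* value programmed into HCR_EL2 on guest entry */
};

/*
 * Name the question being asked instead of open-coding the bit test:
 * callers read "is EL1 32-bit?" rather than "is RW clear?".
 */
static inline bool vcpu_el1_is_32bit(const struct vcpu_model *vcpu)
{
	return !(vcpu->hcr_el2 & HCR_RW);
}
```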
We currently have a separate read-modify-write of the HCR_EL2 on entry
to the guest for the sole purpose of setting the VF and VI bits, if set.
Since this is very rarely the case (only when using userspace IRQ chip
and interrupts are in flight), let's get rid of this operation and
instead modify
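A sketch of the alternative described here, under stated assumptions: a per-vCPU shadow copy of HCR_EL2 is updated at the moment a virtual IRQ/FIQ line changes, so guest entry only has to write the shadow value. `struct vcpu_model` and `vcpu_set_virq_line()` are hypothetical names; HCR_EL2.VF (bit 6) and HCR_EL2.VI (bit 7) are the architectural bit positions.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 virtual interrupt bits (architectural positions). */
#define HCR_VF (UINT64_C(1) << 6) /* virtual FIQ pending */
#define HCR_VI (UINT64_C(1) << 7) /* virtual IRQ pending */

/* Hypothetical, simplified per-vCPU state. */
struct vcpu_model {
	uint64_t hcr_el2; /* shadow copy written to HCR_EL2 on every entry */
};

/*
 * Hypothetical handler for a userspace irqchip line change. Updating the
 * shadow copy here means the entry path needs no separate
 * read-modify-write of HCR_EL2 just for VI/VF.
 */
static void vcpu_set_virq_line(struct vcpu_model *vcpu, bool fiq, bool level)
{
	uint64_t bit = fiq ? HCR_VF : HCR_VI;

	if (level)
		vcpu->hcr_el2 |= bit;
	else
		vcpu->hcr_el2 &= ~bit;
}
```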
There is no need to figure out inside the world-switch whether we should
save/restore the debug registers; we might as well do that in the
higher-level debug setup code, making it easier to optimize down the
line.
Reviewed-by: Julien Thierry
Reviewed-by: Marc Zyngier
We are about to handle system registers quite differently between VHE
and non-VHE systems. In preparation for that, we need to split some of
the handling functions between VHE and non-VHE functionality.
For now, we simply copy the non-VHE functions, but we do change the use
of static keys for
As we are about to move calls around in the sysreg save/restore logic,
let's first rewrite the alternative function callers, because it is
going to make the next patches much easier to read.
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
SPSR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register to access the latest value
depending on whether or not guest system registers are loaded on the CPU
or only reside in memory.
The handling of accessing the various banked SPSRs for
Some system registers do not affect the host kernel's execution and can
therefore be loaded when we are about to run a VCPU and we don't have to
restore the host state to the hardware before the time when we are
actually about to return to userspace or schedule out the VCPU thread.
The EL1 system
We currently handle 32-bit accesses to trapped VM system registers using
the 32-bit index into the coproc array on the vcpu structure, which is a
union of the coproc array and the sysreg array.
Since all the 32-bit coproc indices are created to correspond to the
architectural mapping between
ELR_EL1 is not used by a VHE host kernel and can be deferred, but we
need to rework the accesses to this register to access the latest value
depending on whether or not guest system registers are loaded on the CPU
or only reside in memory.
Signed-off-by: Christoffer Dall
We are about to defer saving and restoring some groups of system
registers to vcpu_put and vcpu_load on supported systems. This means
that we need some infrastructure to access system registers which
supports either accessing the memory backing of the register or directly
accessing the system
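A minimal model of such an accessor, assuming a per-vCPU flag that records whether the guest's deferred registers are currently live on the CPU. The `hw_regs` array stands in for the physical registers, and every name here (`struct vcpu_model`, `sysregs_loaded_on_cpu`, `vcpu_read_sys_reg()`, `vcpu_write_sys_reg()`) is illustrative rather than a quote of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register indices for the model. */
enum sysreg_id { SYSREG_ELR_EL1, SYSREG_SPSR_EL1, NR_SYSREGS };

/* Stand-in for the physical CPU registers (no real hardware access). */
static uint64_t hw_regs[NR_SYSREGS];

/* Hypothetical, simplified per-vCPU state. */
struct vcpu_model {
	uint64_t sys_regs[NR_SYSREGS]; /* memory backing of guest registers */
	bool sysregs_loaded_on_cpu;    /* true between vcpu_load and vcpu_put */
};

/* Return the latest value, wherever it currently lives. */
static uint64_t vcpu_read_sys_reg(const struct vcpu_model *vcpu,
				  enum sysreg_id reg)
{
	if (vcpu->sysregs_loaded_on_cpu)
		return hw_regs[reg];    /* guest value is live on the CPU */
	return vcpu->sys_regs[reg];     /* guest value only resides in memory */
}

/* Write to whichever copy is currently authoritative. */
static void vcpu_write_sys_reg(struct vcpu_model *vcpu, enum sysreg_id reg,
			       uint64_t val)
{
	if (vcpu->sysregs_loaded_on_cpu)
		hw_regs[reg] = val;
	else
		vcpu->sys_regs[reg] = val;
}
```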
To make the code more readable and to avoid the overhead of a function
call, let's get rid of a pair of the alternative function selectors and
explicitly call the VHE and non-VHE functions using the has_vhe() static
key based selector instead, telling the compiler to try to inline the
static
There is really no need to store the vgic_elrsr on the VGIC data
structures as the only need we have for the elrsr is to figure out if an
LR is inactive when we save the VGIC state upon returning from the
guest. We might as well store this in a temporary local variable.
This also gets rid of
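A userspace sketch of saving the list registers with ELRSR held only in a local variable. The `hw_*` variables stand in for the GICH_ELRSR0 and GICH_LR<n> registers, the structure and function names are hypothetical, and the state-field position (bits [29:28] of a GICv2 list register) is assumed from the GICv2 list-register layout.

```c
#include <stdint.h>

#define NR_LRS 4
/* Assumed GICv2 list-register state field: bits [29:28]. */
#define GICH_LR_STATE (UINT32_C(3) << 28)

/* Stand-ins for the GICH_ELRSR0 and GICH_LR<n> MMIO registers. */
static uint32_t hw_elrsr;
static uint32_t hw_lr[NR_LRS];

/* Hypothetical saved CPU-interface state: note there is no elrsr field. */
struct vgic_v2_cpu_if_model {
	uint32_t vgic_lr[NR_LRS];
};

static void save_lrs(struct vgic_v2_cpu_if_model *cpu_if,
		     unsigned int used_lrs)
{
	/* Read ELRSR once into a local; nothing outside this function needs it. */
	uint32_t elrsr = hw_elrsr;

	for (unsigned int i = 0; i < used_lrs; i++) {
		if (elrsr & (UINT32_C(1) << i))
			/* Empty LR: the interrupt is done, just drop the state bits. */
			cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
		else
			/* Still pending/active: read the live value back. */
			cpu_if->vgic_lr[i] = hw_lr[i];
	}
}
```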
We can program the GICv2 hypervisor control interface logic directly
from the core vgic code and can instead do the save/restore directly
from the flush/sync functions, which can lead to a number of future
optimizations.
Signed-off-by: Christoffer Dall
---
Notes:
Hi Marc, all,
On 02/06/2018 12:56 PM, Marc Zyngier wrote:
> ARM has recently published a SMC Calling Convention (SMCCC)
> specification update[1] that provides an optimised calling convention
> and optional, discoverable support for mitigating CVE-2017-5715. ARM
> Trusted Firmware (ATF) has
We already have the percpu area for the host cpu state, which points to
the VCPU, so there's no need to store the VCPU pointer on the stack on
every context switch. We can be a little more clever and just use
tpidr_el2 for the percpu offset and load the VCPU pointer from the host
context.
This
Moving the call to vcpu_load() in kvm_arch_vcpu_ioctl_run() to after
we've called kvm_vcpu_first_run_init() simplifies some of the vgic and
there is also no need to do vcpu_load() for things such as handling the
immediate_exit flag.
Reviewed-by: Julien Grall
Signed-off-by:
Calling vcpu_load() registers preempt notifiers for this vcpu and calls
kvm_arch_vcpu_load(). The latter will soon be doing a lot of heavy
lifting on arm/arm64 and will try to do things such as enabling the
virtual timer and setting us up to handle interrupts from the timer
hardware.
Loading
VHE kernels run completely in EL2 and therefore don't have a notion of
kernel and hyp addresses; they are all just kernel addresses. Therefore
don't call kern_hyp_va() in the VHE switch function.
Reviewed-by: Andrew Jones
Reviewed-by: Marc Zyngier
The current world-switch function has functionality to detect a number
of cases where we need to fix up some part of the exit condition and
possibly run the guest again, before having restored the host state.
This includes populating missing fault info, emulating GICv2 CPU
interface accesses when
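The shape of such a fixup loop can be sketched as below. This is a hypothetical model: the guest is a scripted list of exit codes, and `guest_enter()`, `fixup_guest_exit()` and `run_vcpu()` are illustrative names. What it shows is that exits which can be fixed up in the switch code re-enter the guest without a round trip through host state restore.

```c
#include <stdbool.h>

/* Hypothetical exit codes for the model. */
enum exit_code { EXIT_FIXABLE, EXIT_TO_HOST };

/* Model of the guest: returns a scripted sequence of exit codes. */
struct guest_model {
	const enum exit_code *exits;
	int next;
};

static enum exit_code guest_enter(struct guest_model *g)
{
	return g->exits[g->next++];
}

/*
 * Return true if the exit was fully handled in the switch code and the
 * guest can be re-entered straight away.
 */
static bool fixup_guest_exit(enum exit_code code)
{
	return code == EXIT_FIXABLE;
}

/* Returns the number of guest entries made before returning to the host. */
static int run_vcpu(struct guest_model *g)
{
	enum exit_code code;
	int entries = 0;

	/* ... host state saved, guest state restored here ... */
	do {
		code = guest_enter(g);
		entries++;
	} while (fixup_guest_exit(code));
	/* ... guest state saved, host state restored here ... */

	return entries;
}
```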
There's a semantic difference between the EL1 registers that control
operation of a kernel running in EL1 and EL1 registers that only control
userspace execution in EL0. Since we can defer saving/restoring the
latter, move them into their own function.
ACTLR_EL1 is not used by a VHE host, so we
The VHE switch function calls __timer_enable_traps and
__timer_disable_traps which don't do anything on VHE systems.
Therefore, simply remove these calls from the VHE switch function and
make the functions non-conditional as they are now only called from the
non-VHE switch path.
Acked-by: Marc
There is no need to reset the VTTBR to zero when exiting the guest on
VHE systems. VHE systems don't use stage 2 translations for the EL2&0
translation regime used by the host.
Reviewed-by: Andrew Jones
Acked-by: Marc Zyngier
Signed-off-by: Christoffer
When running a 32-bit VM (EL1 in AArch32), the AArch32 system registers
can be deferred to vcpu load/put on VHE systems because neither
the host kernel nor host userspace uses these registers.
Note that we cannot defer saving DBGVCR32_EL2 conditionally based
on the state of the debug dirty flag
We do not have to change the c15 trap setting on each switch to/from the
guest on VHE systems, because this setting only affects EL0.
The PMU and debug trap configuration can also be done on vcpu load/put
instead, because they don't affect how the host kernel can access the
debug registers while
32-bit registers are not used by a 64-bit host kernel and can be
deferred, but we need to rework the accesses to these registers to access
the latest value depending on whether or not guest system registers are
loaded on the CPU or only reside in memory.
Signed-off-by: Christoffer Dall
As we are about to be more lazy with some of the trap configuration
register read/writes for VHE systems, move the logic that is currently
shared between VHE and non-VHE into a separate function which can be
called from either the world-switch path or from vcpu_load/vcpu_put.
Signed-off-by:
There is no longer a need for an alternative to choose the right
function to tell us whether or not FPSIMD was enabled for the VM,
because we can simply call the appropriate functions directly from within
the _vhe and _nvhe run functions.
Signed-off-by: Christoffer Dall
We can finally get completely rid of any calls to the VGICv3
save/restore functions when the AP lists are empty on VHE systems. This
requires carefully factoring out trap configuration from saving and
restoring state, and carefully choosing what to do on the VHE and
non-VHE path.
One of the
As we are about to move a bunch of save/restore logic for VHE kernels to
the load and put functions, we need some infrastructure to do this.
Reviewed-by: Andrew Jones
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
VHE actually doesn't rely on clearing the VTTBR when returning to the
host kernel, and that is the current key mechanism of hyp_panic to
figure out how to attempt to return to a state good enough to print a
panic statement.
Therefore, we split the hyp_panic function into two functions, a VHE and
From: Shih-Wei Li
We always set the IMO and FMO bits in the HCR_EL2 when running the
guest, regardless of whether we use the vgic or not. By moving these flags to
HCR_GUEST_FLAGS we can avoid one of the extra save/restore operations of
HCR_EL2 in the world switch code, and we
So far this is mostly (see below) a copy of the legacy non-VHE switch
function, but we will start reworking these functions in separate
directions to work on VHE and non-VHE in the most optimal way in later
patches.
The only difference after this patch between the VHE and non-VHE run
functions is
Instead of having multiple calls from the world switch path to the debug
logic, each figuring out if the dirty bit is set and if we should
save/restore the debug registers, let's just provide two hooks to the
debug save/restore functionality, one for switching to the guest
context, and one for
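A minimal sketch of the two-hook structure, assuming a single per-vCPU dirty flag. All names are hypothetical; the counters exist only so the model's behaviour can be observed, and clearing the flag on the way back to the host is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical flag: set when the guest's debug registers need switching. */
#define DEBUG_DIRTY (UINT32_C(1) << 0)

/* Hypothetical, simplified per-vCPU state. */
struct vcpu_model {
	uint32_t debug_flags;
	int to_guest_switches; /* observation counters for the model */
	int to_host_switches;
};

/* Single hook on the way into the guest. */
static void debug_switch_to_guest(struct vcpu_model *vcpu)
{
	if (!(vcpu->debug_flags & DEBUG_DIRTY))
		return; /* the dirty check happens once, here */
	/* save host debug registers, restore guest debug registers */
	vcpu->to_guest_switches++;
}

/* Single hook on the way back to the host. */
static void debug_switch_to_host(struct vcpu_model *vcpu)
{
	if (!(vcpu->debug_flags & DEBUG_DIRTY))
		return;
	/* save guest debug registers, restore host debug registers */
	vcpu->to_host_switches++;
	vcpu->debug_flags &= ~DEBUG_DIRTY; /* assumption: cleared once handled */
}
```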
The debug save/restore functions can be improved by using the has_vhe()
static key instead of the instruction alternative. Using the static key
follows the same paradigm we're going to use elsewhere, makes the
code more readable, and generates slightly better code (no
stack setups and
From: Christoffer Dall
Currently we access the system registers array via the vcpu_sys_reg()
macro. However, we are about to change the behavior to sometimes
modify the register file directly, so let's change this to two
primitives:
* Accessor macros
The comment only applied to SPE on non-VHE systems, so we simply remove
it.
Suggested-by: Andrew Jones
Acked-by: Marc Zyngier
Signed-off-by: Christoffer Dall
---
arch/arm64/kvm/hyp/switch.c | 4
1 file changed, 4
There is no need to have multiple identical functions with different
names for saving host and guest state. When saving and restoring state
for the host and guest, the state is the same for both contexts, and
that's why we have the kvm_cpu_context structure. Delete one
version and rename the
The APRs can only have bits set when the guest acknowledges an interrupt
in the LR and can only have a bit cleared when the guest EOIs an
interrupt in the LR. Therefore, if we have no LRs with any
pending/active interrupts, the APR cannot change value and there is no
need to clear it on every
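The argument can be captured in a small predicate. This sketch assumes the LR values as programmed on guest entry are available, treats bits [29:28] of each LR as the pending/active state (GICv2 layout, assumed here), and uses hypothetical names throughout.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_LRS 4
/* Assumed GICv2 list-register state: bit 28 = pending, bit 29 = active. */
#define LR_STATE_MASK (UINT32_C(3) << 28)

/*
 * The guest can only set an APR bit by acknowledging an interrupt held in
 * an LR, and only clear one by EOIing such an interrupt. If no LR held a
 * pending/active interrupt on entry, the APRs cannot have changed.
 */
static bool aprs_may_have_changed(const uint32_t entry_lr[NR_LRS],
				  unsigned int used_lrs)
{
	for (unsigned int i = 0; i < used_lrs; i++)
		if (entry_lr[i] & LR_STATE_MASK)
			return true;
	return false;
}
```

A save path built on this could skip both the APR read and the clear whenever the predicate returns false.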
The vgic-v2-sr.c file now only contains the logic to replay unaligned
accesses to the virtual CPU interface on 16K and 64K page systems, which
is only relevant on 64-bit platforms. Therefore move this file to the
arm64 KVM tree, remove the compile directive from the 32-bit side
makefile, and
Just like we can program the GICv2 hypervisor control interface directly
from the core vgic code, we can do the same for the GICv3 hypervisor
control interface on VHE systems.
We do this by simply calling the save/restore functions when we have VHE
and we can then get rid of the save/restore
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.
But in order to give a chance to
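A sketch of the selection rule, under the assumption that userspace's request reduces to a single "PSCI 0.2-compatible or not" choice. Versions use the PSCI encoding (major in bits [31:16], minor in bits [15:0]); `PSCI_VER()` and `psci_version_exposed()` are hypothetical names. Because 1.0 is a superset of 0.2, a 0.2 request can be served by the newest implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* PSCI version encoding: major in bits [31:16], minor in bits [15:0]. */
#define PSCI_VER(maj, min) ((uint32_t)(((maj) << 16) | (min)))

/*
 * 'wants_psci_0_2' models userspace asking for a 0.2-compatible
 * interface. 1.0 is backward compatible with 0.2, so the newest
 * implemented version is exposed; otherwise the legacy 0.1 interface.
 */
static uint32_t psci_version_exposed(bool wants_psci_0_2)
{
	return wants_psci_0_2 ? PSCI_VER(1, 0) : PSCI_VER(0, 1);
}
```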
Hi gengdongjiu, liu jun
On 05/02/18 11:24, gengdongjiu wrote:
> James Morse wrote:
>> I'd like to pick these patches onto the end of that series, but first I want
>> to
>> know what NOTIFY_SEI means for any OS. The ACPI spec doesn't say, and
>> because it's asynchronous, route-able and mask-able,
Hi gengdongjiu,
On 12/02/18 10:19, gengdongjiu wrote:
> On 2018/2/10 1:44, James Morse wrote:
>> The point? We can't know what a CPU without the RAS extensions puts in there.
>>
>> Why does this matter? When migrating a pending SError we have to know the
>> difference between 'use this 64bit
Now that ghes uses the fixmap addresses and locks via some indirection
we can support multiple NMI-like notifications on arm64.
These should be named after their notification method. x86's
NOTIFY_NMI already is, change the SEA fixmap entry to be called
FIX_APEI_GHES_SEA.
Future patches can add
To split up APEI's in_nmi() path, we need any nmi-like callers to always
be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing
out into a header file.
Currently guest synchronous external aborts are claimed as RAS
notifications by handle_guest_sea(), which is hidden in the arch
Arm64 has multiple NMI-like notifications, but GHES only has one
in_nmi() path. The interactions between these multiple NMI-like
notifications are unclear.
Split this single path up by moving the fixmap idx and lock into
the struct ghes. Each notification's init function can consider
which other
APEI's Generic Hardware Error Source structures do not describe
whether the SDEI event is shared or private, as this information is
discoverable via the API.
GHES needs to know whether an event is normal or critical to avoid
sharing locks or fixmap entries.
Add a helper to ask firmware for this
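Why the priority matters can be sketched as follows: an SDEI critical event can preempt the handler of a normal one, so if both shared a lock or fixmap slot, the critical handler could spin on a lock its own CPU already holds, or clobber a mapping still in use. The enum values, `struct ghes_model` and `ghes_sdei_setup()` are hypothetical, and the integer "locks" are placeholders for real spinlocks.

```c
/* Assumed SDEI priority encoding for the model: 0 = normal, 1 = critical. */
enum sdei_priority { SDEI_PRIO_NORMAL = 0, SDEI_PRIO_CRITICAL = 1 };

/* Hypothetical fixmap slots, one per priority level. */
enum ghes_fixmap_slot { FIXMAP_SDEI_NORMAL, FIXMAP_SDEI_CRITICAL };

/* Placeholders for one lock per priority level. */
static int sdei_normal_lock, sdei_critical_lock;

/* Hypothetical per-source state: which slot and lock this source uses. */
struct ghes_model {
	enum ghes_fixmap_slot fixmap_idx;
	int *lock;
};

/* Give each priority its own slot and lock so they can nest safely. */
static void ghes_sdei_setup(struct ghes_model *ghes, enum sdei_priority prio)
{
	if (prio == SDEI_PRIO_CRITICAL) {
		ghes->fixmap_idx = FIXMAP_SDEI_CRITICAL;
		ghes->lock = &sdei_critical_lock;
	} else {
		ghes->fixmap_idx = FIXMAP_SDEI_NORMAL;
		ghes->lock = &sdei_normal_lock;
	}
}
```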
If the GHES notification type is SDEI, register the provided event
number and point the callback at ghes_sdei_callback().
Signed-off-by: James Morse
---
drivers/acpi/apei/ghes.c | 66 ++--
include/linux/arm_sdei.h | 3 +++
2
arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.
Currently the arch code unconditionally signals user-space
Hello!
The aim of this series is to wire arm64's SDEI into APEI.
What's SDEI? It's ARM's "Software Delegated Exception Interface" [0]. It's
used by firmware to tell the OS about firmware-first RAS events.
These software exceptions can interrupt anything, so I describe them as
NMI-like. They
To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.
First we move the estatus-queue code higher in the file so that any
notify_foo() handler can make use of it.
This patch moves
Now that the estatus queue can be used by more than one notification
method, we can move notifications that have NMI-like behaviour over to
it, and start abstracting GHES's single in_nmi() path.
Switch NOTIFY_SEA over to use the estatus queue. This makes it behave
in the same way as x86's
To split up APEI's in_nmi() path, we need the nmi-like callers to always
be in_nmi(). Add a helper to do the work and claim the notification.
When KVM or the arch code takes an exception that might be a RAS
notification, it asks the APEI firmware-first code whether it wants
to claim the exception.
To support asynchronous NMI-like notifications on arm64 we need to use
the estatus-queue. These patches refactor it to allow multiple APEI
notification types to use it.
Refactor the estatus queue's pool grow/shrink code and notification
routine from NOTIFY_NMI's handlers. This will allow another
On Thu, 15 Feb 2018 20:59:29 +0000,
Jon Masters wrote:
>
> Hi Marc, all,
>
> On 02/06/2018 12:56 PM, Marc Zyngier wrote:
> > ARM has recently published a SMC Calling Convention (SMCCC)
> > specification update[1] that provides an optimised calling convention
> > and optional, discoverable