Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

2023-06-09 Thread Jordan Niethe
On Wed, Jun 7, 2023 at 6:27 PM Nicholas Piggin  wrote:
[snip]
>
> This is a tour de force in one of these things, so I hate to be
> the "me smash with club" guy, but what if you allocated buffers
> with enough room for all the state (or 99% of cases, in which
> case an overflow would make an hcall)?
>
> What's actually a fast-path that we don't get from the interrupt
> return buffer? Getting and setting a few regs for MMIO emulation?

As it is a vcpu uses four buffers:

- One for registering it's input and output buffers
   This is allocated just large enough for GSID_RUN_OUTPUT_MIN_SIZE,
   GSID_RUN_INPUT and GSID_RUN_OUTPUT.
   Freed once the buffers are registered.
   I suppose we could just make a buffer big enough to be used for the
vcpu run input buffer then have it register its own address.

- One for process and partition table entries
   Because kvmhv_set_ptbl_entry() isn't associated with a vcpu.
   kvmhv_papr_set_ptbl_entry() allocates and frees a minimal sized
buffer on demand.

- The run vcpu input buffer
   Persists over the lifetime of the vcpu after creation. Large enough
to hold all VCPU-wide elements. The same buffer is also reused for:

 * GET state hcalls
 * SET guest wide state hcalls (guest wide can not be passed into
the vcpu run buffer)

- The run vcpu output buffer
   Persists over the lifetime of the vcpu after creation. This is
sized to be GSID_RUN_OUTPUT_MIN_SIZE as returned by the L0.
   It's unlikely that it would be larger than the run vcpu buffer
size, so I guess you could make it that size too. Probably you could
even use the run vcpu input buffer as the vcpu output buffer.

The buffers could all be that max size and could combine the
configuration buffer, input and output buffers, but I feel it's more
understandable like this.

[snip]

>
> The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
> you're calling the API that.

Will we go with KVM_NESTED_V2_ ?

>
> > +
> > +#define GSID_HOST_STATE_SIZE 0x0001 /* Size of Hypervisor Internal 
> > Format VCPU state */
> > +#define GSID_RUN_OUTPUT_MIN_SIZE 0x0002 /* Minimum size of the Run 
> > VCPU output buffer */
> > +#define GSID_LOGICAL_PVR 0x0003 /* Logical PVR */
> > +#define GSID_TB_OFFSET   0x0004 /* Timebase Offset */
> > +#define GSID_PARTITION_TABLE 0x0005 /* Partition Scoped Page Table 
> > */
> > +#define GSID_PROCESS_TABLE   0x0006 /* Process Table */
>
> > +
> > +#define GSID_RUN_INPUT   0x0C00 /* Run VCPU Input 
> > Buffer */
> > +#define GSID_RUN_OUTPUT  0x0C01 /* Run VCPU Out Buffer 
> > */
> > +#define GSID_VPA 0x0C02 /* HRA to Guest VCPU VPA */
> > +
> > +#define GSID_GPR(x)  (0x1000 + (x))
> > +#define GSID_HDEC_EXPIRY_TB  0x1020
> > +#define GSID_NIA 0x1021
> > +#define GSID_MSR 0x1022
> > +#define GSID_LR  0x1023
> > +#define GSID_XER 0x1024
> > +#define GSID_CTR 0x1025
> > +#define GSID_CFAR0x1026
> > +#define GSID_SRR00x1027
> > +#define GSID_SRR10x1028
> > +#define GSID_DAR 0x1029
>
> It's a shame you have to rip up all your wrapper functions now to
> shoehorn these in.
>
> If you included names analogous to the reg field names in the kvm
> structures, the wrappers could do macro expansions that get them.
>
> #define __GSID_WRAPPER_dar  GSID_DAR
>
> Or similar.

Before I had something pretty hacky, in the macro accessors I had
along the lines of

 gsid_table[offsetof(vcpu, reg)]

to get the GSID for the register.

We can do the wrapper idea, I just worry if it is getting too magic.

>
> And since of course you have to explicitly enumerate all these, I
> wouldn't mind defining the types and lengths up-front rather than
> down in the type function. You'd like to be able to go through the
> spec and eyeball type, number, size.

Something like
#define KVM_NESTED_V2_GS_NIA (KVM_NESTED_V2_GSID_NIA | VCPU_WIDE |
READ_WRITE | DOUBLE_WORD)
etc
?

>
> [snip]
>
> > +/**
> > + * gsb_paddress() - the physical address of buffer
> > + * @gsb: guest state buffer
> > + *
> > + * Returns the physical address of the buffer.
> > + */
> > +static inline u64 gsb_paddress(struct gs_buff *gsb)
> > +{
> > + return __pa(gsb_header(gsb));
> > +}
>
> > +/**
> > + * __gse_put_reg() - add a register type guest state element to a buffer
> > + * @gsb: guest state buffer to add element to
> > + * @iden: guest state ID
> > + * @val: host endian value
> > + *
> > + * Adds a register type guest state element. Uses the guest state ID for
> > + * determining the length of the guest element. If the guest state ID has
> > + * bits that can not be set they will be cleared.
> > + */
> > +static inline int __gse_put_reg(struct gs_buff *gsb, u16 iden, u64 val)
> > +{
> > + 

Re: [RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

2023-06-07 Thread Nicholas Piggin
On Mon Jun 5, 2023 at 4:48 PM AEST, Jordan Niethe wrote:
> The new PAPR nested guest API introduces the concept of a Guest State
> Buffer for communication about L2 guests between L1 and L0 hosts.
>
> In the new API, the L0 manages the L2 on behalf of the L1. This means
> that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
> table...), it must request the L0 perform the modification. If the
> nested host needs to read L2 state likewise this request must
> go through the L0.
>
> The Guest State Buffer is a Type-Length-Value style data format defined
> in the PAPR which assigns all relevant partition state a unique
> identity. Unlike a typical TLV format the length is redundant as the
> length of each identity is fixed but is included for checking
> correctness.
>
> A guest state buffer consists of an element count followed by a stream
> of elements, where elements are composed of an ID number, data length,
> then the data:
>
>   Header:
>
><---4 bytes--->
>   ++-
>   | Element Count  | Elements...
>   ++-
>
>   Element:
>
><2 bytes---> <-2 bytes-> <-Length bytes->
>   ++---++
>   | Guest State ID |  Length   |  Data  |
>   ++---++
>
> Guest State IDs have other attributes defined in the PAPR such as
> whether they are per thread or per guest, or read-only.
>
> Introduce a library for using guest state buffers. This includes support
> for actions such as creating buffers, adding elements to buffers,
> reading the value of elements and parsing buffers. This will be used
> later by the PAPR nested guest support.

This is a tour de force in one of these things, so I hate to be
the "me smash with club" guy, but what if you allocated buffers
with enough room for all the state (or 99% of cases, in which
case an overflow would make an hcall)?

What's actually a fast-path that we don't get from the interrupt
return buffer? Getting and setting a few regs for MMIO emulation?


> Signed-off-by: Jordan Niethe 
> ---
> v2:
>   - Add missing #ifdef CONFIG_VSXs
>   - Move files from lib/ to kvm/
>   - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   - Use kunit for guest state buffer tests
>   - Add configuration option for the tests
>   - Use macros for contiguous id ranges like GPRs
>   - Add some missing EXPORTs to functions
>   - HEIR element is a double word not a word
> ---
>  arch/powerpc/Kconfig.debug|  12 +
>  arch/powerpc/include/asm/guest-state-buffer.h | 901 ++
>  arch/powerpc/include/asm/kvm_book3s.h |   2 +
>  arch/powerpc/kvm/Makefile |   3 +
>  arch/powerpc/kvm/guest-state-buffer.c | 563 +++
>  arch/powerpc/kvm/test-guest-state-buffer.c| 321 +++
>  6 files changed, 1802 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
>  create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
>  create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 6aaf8dc60610..ed830a714720 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
>   bool "Run self-tests of the MSI bitmap code"
>   depends on DEBUG_KERNEL
>  
> +config GUEST_STATE_BUFFER_TEST
> + def_tristate n
> + prompt "Enable Guest State Buffer unit tests"
> + depends on KUNIT
> + depends on KVM_BOOK3S_HV_POSSIBLE
> + default KUNIT_ALL_TESTS
> + help
> +   The Guest State Buffer is a data format specified in the PAPR.
> +   It is by hcalls to communicate the state of L2 guests between
> +   the L1 and L0 hypervisors. Enable unit tests for the library
> +   used to create and use guest state buffers.
> +
>  config PPC_IRQ_SOFT_MASK_DEBUG
>   bool "Include extra checks for powerpc irq soft masking"
>   depends on PPC64
> diff --git a/arch/powerpc/include/asm/guest-state-buffer.h 
> b/arch/powerpc/include/asm/guest-state-buffer.h
> new file mode 100644
> index ..65a840abf1bb
> --- /dev/null
> +++ b/arch/powerpc/include/asm/guest-state-buffer.h
> @@ -0,0 +1,901 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Interface based on include/net/netlink.h
> + */
> +#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
> +
> +#include 
> +#include 
> +#include 
> +
> +/**
> + * Guest State Buffer Constants
> + **/
> +#define GSID_BLANK   0x

The namespaces are a little abbreviated. KVM_PAPR_ might be nice if
you're calling the API that.

> +
> +#define GSID_HOST_STATE_SIZE 0x0001 /* Size of Hypervisor Internal 
> Format VCPU state */
> +#define GSID_RUN_OUTPUT_MIN_SIZE   

[RFC PATCH v2 4/6] KVM: PPC: Add helper library for Guest State Buffers

2023-06-05 Thread Jordan Niethe
The new PAPR nested guest API introduces the concept of a Guest State
Buffer for communication about L2 guests between L1 and L0 hosts.

In the new API, the L0 manages the L2 on behalf of the L1. This means
that if the L1 needs to change L2 state (e.g. GPRs, SPRs, partition
table...), it must request the L0 perform the modification. If the
nested host needs to read L2 state likewise this request must
go through the L0.

The Guest State Buffer is a Type-Length-Value style data format defined
in the PAPR which assigns all relevant partition state a unique
identity. Unlike a typical TLV format the length is redundant as the
length of each identity is fixed but is included for checking
correctness.

A guest state buffer consists of an element count followed by a stream
of elements, where elements are composed of an ID number, data length,
then the data:

  Header:

   <---4 bytes--->
  ++-
  | Element Count  | Elements...
  ++-

  Element:

   <2 bytes---> <-2 bytes-> <-Length bytes->
  ++---++
  | Guest State ID |  Length   |  Data  |
  ++---++

Guest State IDs have other attributes defined in the PAPR such as
whether they are per thread or per guest, or read-only.

Introduce a library for using guest state buffers. This includes support
for actions such as creating buffers, adding elements to buffers,
reading the value of elements and parsing buffers. This will be used
later by the PAPR nested guest support.

Signed-off-by: Jordan Niethe 
---
v2:
  - Add missing #ifdef CONFIG_VSXs
  - Move files from lib/ to kvm/
  - Guard compilation on CONFIG_KVM_BOOK3S_HV_POSSIBLE
  - Use kunit for guest state buffer tests
  - Add configuration option for the tests
  - Use macros for contiguous id ranges like GPRs
  - Add some missing EXPORTs to functions
  - HEIR element is a double word not a word
---
 arch/powerpc/Kconfig.debug|  12 +
 arch/powerpc/include/asm/guest-state-buffer.h | 901 ++
 arch/powerpc/include/asm/kvm_book3s.h |   2 +
 arch/powerpc/kvm/Makefile |   3 +
 arch/powerpc/kvm/guest-state-buffer.c | 563 +++
 arch/powerpc/kvm/test-guest-state-buffer.c| 321 +++
 6 files changed, 1802 insertions(+)
 create mode 100644 arch/powerpc/include/asm/guest-state-buffer.h
 create mode 100644 arch/powerpc/kvm/guest-state-buffer.c
 create mode 100644 arch/powerpc/kvm/test-guest-state-buffer.c

diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 6aaf8dc60610..ed830a714720 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -82,6 +82,18 @@ config MSI_BITMAP_SELFTEST
bool "Run self-tests of the MSI bitmap code"
depends on DEBUG_KERNEL
 
+config GUEST_STATE_BUFFER_TEST
+   def_tristate n
+   prompt "Enable Guest State Buffer unit tests"
+   depends on KUNIT
+   depends on KVM_BOOK3S_HV_POSSIBLE
+   default KUNIT_ALL_TESTS
+   help
+ The Guest State Buffer is a data format specified in the PAPR.
+ It is by hcalls to communicate the state of L2 guests between
+ the L1 and L0 hypervisors. Enable unit tests for the library
+ used to create and use guest state buffers.
+
 config PPC_IRQ_SOFT_MASK_DEBUG
bool "Include extra checks for powerpc irq soft masking"
depends on PPC64
diff --git a/arch/powerpc/include/asm/guest-state-buffer.h 
b/arch/powerpc/include/asm/guest-state-buffer.h
new file mode 100644
index ..65a840abf1bb
--- /dev/null
+++ b/arch/powerpc/include/asm/guest-state-buffer.h
@@ -0,0 +1,901 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Interface based on include/net/netlink.h
+ */
+#ifndef _ASM_POWERPC_GUEST_STATE_BUFFER_H
+#define _ASM_POWERPC_GUEST_STATE_BUFFER_H
+
+#include 
+#include 
+#include 
+
+/**
+ * Guest State Buffer Constants
+ **/
+#define GSID_BLANK 0x
+
+#define GSID_HOST_STATE_SIZE   0x0001 /* Size of Hypervisor Internal 
Format VCPU state */
+#define GSID_RUN_OUTPUT_MIN_SIZE   0x0002 /* Minimum size of the Run VCPU 
output buffer */
+#define GSID_LOGICAL_PVR   0x0003 /* Logical PVR */
+#define GSID_TB_OFFSET 0x0004 /* Timebase Offset */
+#define GSID_PARTITION_TABLE   0x0005 /* Partition Scoped Page Table */
+#define GSID_PROCESS_TABLE 0x0006 /* Process Table */
+
+#define GSID_RUN_INPUT 0x0C00 /* Run VCPU Input Buffer */
+#define GSID_RUN_OUTPUT0x0C01 /* Run VCPU Out Buffer */
+#define GSID_VPA   0x0C02 /* HRA to Guest VCPU VPA */
+
+#define GSID_GPR(x)(0x1000 + (x))
+#define GSID_HDEC_EXPIRY_TB0x1020
+#define GSID_NIA