Re: [PATCH v2 2/2] KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available

2021-02-18 Thread Alexandru Elisei
Hi Marc,

On 2/9/21 11:48 AM, Marc Zyngier wrote:
> When running under a nesting hypervisor, it isn't guaranteed that
> the virtual HW will include a PMU. In which case, let's not try
> to access the PMU registers in the world switch, as that'd be
> deadly.
>
> Reported-by: Andre Przywara 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kernel/image-vars.h  | 3 +++
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 9 ++---
>  2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
> index f676243abac6..32af3c865700 100644
> --- a/arch/arm64/kernel/image-vars.h
> +++ b/arch/arm64/kernel/image-vars.h
> @@ -102,6 +102,9 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
>  /* Array containing bases of nVHE per-CPU memory regions. */
>  KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
>  
> +/* PMU available static key */
> +KVM_NVHE_ALIAS(kvm_arm_pmu_available);
> +
>  #endif /* CONFIG_KVM */
>  
>  #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
> b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index 84473574c2e7..75c0faa3b791 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -90,15 +90,18 @@ static inline void __activate_traps_common(struct 
> kvm_vcpu *vcpu)
>* counter, which could make a PMXEVCNTR_EL0 access UNDEF at
>* EL1 instead of being trapped to EL2.
>*/
> - write_sysreg(0, pmselr_el0);
> - write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
> + if (kvm_arm_support_pmu_v3()) {
> + write_sysreg(0, pmselr_el0);
> + write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
> + }
>   write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
>  }
>  
>  static inline void __deactivate_traps_common(void)
>  {
>   write_sysreg(0, hstr_el2);
> - write_sysreg(0, pmuserenr_el0);
> + if (kvm_arm_support_pmu_v3())
> + write_sysreg(0, pmuserenr_el0);
>  }
>  
>  static inline void ___activate_traps(struct kvm_vcpu *vcpu)

It looks to me like this indeed fixes the issue with accessing PMU registers 
even
if the PMU is not present. Found another instance in the world switch where the
PMU is accessed, in __kvm_vcpu_run() in the nvhe code, but in that case the
registers were accessed only if there were events that needed to be context
switched. No PMU on the host, no PMU on the guest and no events for either, so
that looks fine to me:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v2 1/2] KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key

2021-02-18 Thread Alexandru Elisei
Hi Marc,

On 2/9/21 11:48 AM, Marc Zyngier wrote:
> We currently find out about the presence of a HW PMU (or the handling
> of that PMU by perf, which amounts to the same thing) in a fairly
> roundabout way, by checking the number of counters available to perf.
> That's good enough for now, but we will soon need to find about about
> that on paths where perf is out of reach (in the world switch).
>
> Instead, let's turn kvm_arm_support_pmu_v3() into a static key.

The patch looks correct to me. I compiled the kernel with CONFIG_ARM_PMU set and
unset, saw no problems. The IS_ENABLED(CONFIG_ARM_PMU) condition does prevent 
the
compile error reported by Andre, as removing it triggered an error. I also 
double
checked that static keys are initialized before KVM, which in hindsight was
obvious, since the GIC is initialized before KVM, and the GIC driver makes use 
of
static keys:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/perf.c | 10 ++
>  arch/arm64/kvm/pmu-emul.c | 10 --
>  include/kvm/arm_pmu.h |  9 +++--
>  3 files changed, 17 insertions(+), 12 deletions(-)
>
> diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
> index d45b8b9a4415..739164324afe 100644
> --- a/arch/arm64/kvm/perf.c
> +++ b/arch/arm64/kvm/perf.c
> @@ -11,6 +11,8 @@
>  
>  #include 
>  
> +DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
> +
>  static int kvm_is_in_guest(void)
>  {
>  return kvm_get_running_vcpu() != NULL;
> @@ -48,6 +50,14 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
>  
>  int kvm_perf_init(void)
>  {
> + /*
> +  * Check if HW_PERF_EVENTS are supported by checking the number of
> +  * hardware performance counters. This could ensure the presence of
> +  * a physical PMU and CONFIG_PERF_EVENT is selected.
> +  */
> + if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0)
> + static_branch_enable(_arm_pmu_available);
> +
>   return perf_register_guest_info_callbacks(_guest_cbs);
>  }
>  
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index 4ad66a532e38..44d500706ab9 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -813,16 +813,6 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool 
> pmceid1)
>   return val & mask;
>  }
>  
> -bool kvm_arm_support_pmu_v3(void)
> -{
> - /*
> -  * Check if HW_PERF_EVENTS are supported by checking the number of
> -  * hardware performance counters. This could ensure the presence of
> -  * a physical PMU and CONFIG_PERF_EVENT is selected.
> -  */
> - return (perf_num_counters() > 0);
> -}
> -
>  int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
>  {
>   if (!kvm_vcpu_has_pmu(vcpu))
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 8dcb3e1477bc..6fd3cda608e4 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -13,6 +13,13 @@
>  #define ARMV8_PMU_CYCLE_IDX  (ARMV8_PMU_MAX_COUNTERS - 1)
>  #define ARMV8_PMU_MAX_COUNTER_PAIRS  ((ARMV8_PMU_MAX_COUNTERS + 1) >> 1)
>  
> +DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
> +
> +static __always_inline bool kvm_arm_support_pmu_v3(void)
> +{
> + return static_branch_likely(_arm_pmu_available);
> +}
> +
>  #ifdef CONFIG_HW_PERF_EVENTS
>  
>  struct kvm_pmc {
> @@ -47,7 +54,6 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
> val);
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
>  void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
>   u64 select_idx);
> -bool kvm_arm_support_pmu_v3(void);
>  int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
>   struct kvm_device_attr *attr);
>  int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
> @@ -87,7 +93,6 @@ static inline void kvm_pmu_software_increment(struct 
> kvm_vcpu *vcpu, u64 val) {}
>  static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {}
>  static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu,
> u64 data, u64 select_idx) {}
> -static inline bool kvm_arm_support_pmu_v3(void) { return false; }
>  static inline int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
> struct kvm_device_attr *attr)
>  {
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 21/21] hw/rtc: ARM/arm64: Use MMIO at higher addresses

2021-02-18 Thread Andre Przywara
On Thu, 18 Feb 2021 13:33:15 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 12/10/20 2:29 PM, Andre Przywara wrote:
> > Using the RTC device at its legacy I/O address as set by IBM in 1981
> > was a kludge we used for simplicity on ARM platforms as well.
> > However this imposes problems due to their missing alignment and overlap
> > with the PCI I/O address space.
> >
> > Now that we can switch a device easily between using ioports and
> > MMIO, let's move the RTC out of the first 4K of memory on ARM platforms.
> >
> > That should be transparent for well behaved guests, since the change is
> > naturally reflected in the device tree.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  hw/rtc.c | 24 
> >  1 file changed, 16 insertions(+), 8 deletions(-)
> >
> > diff --git a/hw/rtc.c b/hw/rtc.c
> > index ee4c9102..bdb88f0f 100644
> > --- a/hw/rtc.c
> > +++ b/hw/rtc.c
> > @@ -5,6 +5,15 @@
> >  
> >  #include 
> >  
> > +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > +#define RTC_BUS_TYPE   DEVICE_BUS_MMIO
> > +#define RTC_BASE_ADDRESS   0x101  
> 
> This looks correct, the base address is the serial base address + 64k, so they
> don't overlap, and it doesn't overlap with the flash memory either. Same 
> comment
> as for the serial, I think the reason for choosing this address should be
> mentioned, and the region should be put in the arch memory layout file. Other 
> than
> that, the patch looks good.

Yep, will do!

Thanks,
Andre

> 
> > +#else
> > +/* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> > +#define RTC_BUS_TYPE   DEVICE_BUS_IOPORT
> > +#define RTC_BASE_ADDRESS   0x70
> > +#endif
> > +
> >  /*
> >   * MC146818 RTC registers
> >   */
> > @@ -49,7 +58,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, 
> > u8 *data,
> > time_t ti;
> >  
> > if (is_write) {
> > -   if (addr == 0x70) { /* index register */
> > +   if (addr == RTC_BASE_ADDRESS) { /* index register */
> > u8 value = ioport__read8(data);
> >  
> > vcpu->kvm->nmi_disabled = value & (1UL << 7);
> > @@ -70,7 +79,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, 
> > u8 *data,
> > return;
> > }
> >  
> > -   if (addr == 0x70)
> > +   if (addr == RTC_BASE_ADDRESS)   /* index register is write-only */
> > return;
> >  
> > time();
> > @@ -127,7 +136,7 @@ static void generate_rtc_fdt_node(void *fdt,
> > u8 irq,
> > enum irq_type))
> >  {
> > -   u64 reg_prop[2] = { cpu_to_fdt64(0x70), cpu_to_fdt64(2) };
> > +   u64 reg_prop[2] = { cpu_to_fdt64(RTC_BASE_ADDRESS), cpu_to_fdt64(2) };
> >  
> > _FDT(fdt_begin_node(fdt, "rtc"));
> > _FDT(fdt_property_string(fdt, "compatible", "motorola,mc146818"));
> > @@ -139,7 +148,7 @@ static void generate_rtc_fdt_node(void *fdt,
> >  #endif
> >  
> >  struct device_header rtc_dev_hdr = {
> > -   .bus_type = DEVICE_BUS_IOPORT,
> > +   .bus_type = RTC_BUS_TYPE,
> > .data = generate_rtc_fdt_node,
> >  };
> >  
> > @@ -151,8 +160,8 @@ int rtc__init(struct kvm *kvm)
> > if (r < 0)
> > return r;
> >  
> > -   /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> > -   r = kvm__register_pio(kvm, 0x0070, 2, cmos_ram_io, NULL);
> > +   r = kvm__register_iotrap(kvm, RTC_BASE_ADDRESS, 2, cmos_ram_io, NULL,
> > +RTC_BUS_TYPE);
> > if (r < 0)
> > goto out_device;
> >  
> > @@ -170,8 +179,7 @@ dev_init(rtc__init);
> >  
> >  int rtc__exit(struct kvm *kvm)
> >  {
> > -   /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> > -   kvm__deregister_pio(kvm, 0x0070);
> > +   kvm__deregister_iotrap(kvm, RTC_BASE_ADDRESS, RTC_BUS_TYPE);
> >  
> > return 0;
> >  }  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 20/21] hw/serial: ARM/arm64: Use MMIO at higher addresses

2021-02-18 Thread Andre Przywara
On Thu, 18 Feb 2021 12:18:38 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 2/17/21 4:48 PM, Alexandru Elisei wrote:
> > Hi Andre,
> >
> > On 12/10/20 2:29 PM, Andre Przywara wrote:  
> >> Using the UART devices at their legacy I/O addresses as set by IBM in
> >> 1981 was a kludge we used for simplicity on ARM platforms as well.
> >> However this imposes problems due to their missing alignment and overlap
> >> with the PCI I/O address space.
> >>
> >> Now that we can switch a device easily between using ioports and MMIO,
> >> let's move the UARTs out of the first 4K of memory on ARM platforms.
> >>
> >> That should be transparent for well behaved guests, since the change is
> >> naturally reflected in the device tree. Even "earlycon" keeps working,
> >> as the stdout-path property is adjusted automatically.
> >>
> >> People providing direct earlycon parameters via the command line need to
> >> adjust it to: "earlycon=uart,mmio,0x100".
> >>
> >> Signed-off-by: Andre Przywara 
> >> ---
> >>  hw/serial.c | 52 
> >>  1 file changed, 36 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/hw/serial.c b/hw/serial.c
> >> index d840eebc..00fb3aa8 100644
> >> --- a/hw/serial.c
> >> +++ b/hw/serial.c
> >> @@ -13,6 +13,24 @@
> >>  
> >>  #include 
> >>  
> >> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> >> +#define serial_iobase(nr) (0x100 + (nr) * 0x1000)
> >> +#define serial_irq(nr)(32 + (nr))
> >> +#define SERIAL8250_BUS_TYPE   DEVICE_BUS_MMIO
> >> +#else
> >> +#define serial_iobase_0   0x3f8
> >> +#define serial_iobase_1   0x2f8
> >> +#define serial_iobase_2   0x3e8
> >> +#define serial_iobase_3   0x2e8
> >> +#define serial_irq_0  4
> >> +#define serial_irq_1  3
> >> +#define serial_irq_2  4
> >> +#define serial_irq_3  3  
> > Nitpick: serial_iobase_* and serial_irq_* could be changed to have two 
> > leading
> > underscores, to stress the fact that they are helpers for serial_iobase() 
> > and
> > serial_irq() and are not meant to be used by themselves. But that's just 
> > personal
> > preference, otherwise the patch looks really nice and clean:
> >
> > Reviewed-by: Alexandru Elisei   
> 
> I gave it more thought and I think I spoke too soon. I think it would be good 
> to
> document in the commit message how you came to the base MMIO address and size
> (even if it was just a nice looking number). Also, the CFI flash device has 
> its
> memory range described in the memory layout for the architecture, in
> arm/include/arm-common/kvm-arch.h. I think it would be a good idea to put the
> serial region there, to prevent regressions in the future.

Yes, that's very true, the memory map should be described in one place
only. Will try to do that.


> 
> I did the math to make sure there's no overlap with the ioport region (16M >> 
> 16K)
> or the flash region (16M + 0x4000 < 32M). I also tested the changes with a 
> kernel
> with 4K and 64K pages (I can't access any hardware with 16K pages at the 
> moment),
> and everything worked as expected.

Thanks!
Andre

> >> +#define serial_iobase(nr) serial_iobase_##nr
> >> +#define serial_irq(nr)serial_irq_##nr
> >> +#define SERIAL8250_BUS_TYPE   DEVICE_BUS_IOPORT
> >> +#endif
> >> +
> >>  /*
> >>   * This fakes a U6_16550A. The fifo len needs to be 64 as the kernel
> >>   * expects that for autodetection.
> >> @@ -27,7 +45,7 @@ struct serial8250_device {
> >>struct mutexmutex;
> >>u8  id;
> >>  
> >> -  u16 iobase;
> >> +  u32 iobase;
> >>u8  irq;
> >>u8  irq_state;
> >>int txcnt;
> >> @@ -65,56 +83,56 @@ static struct serial8250_device devices[] = {
> >>/* ttyS0 */
> >>[0] = {
> >>.dev_hdr = {
> >> -  .bus_type   = DEVICE_BUS_IOPORT,
> >> +  .bus_type   = SERIAL8250_BUS_TYPE,
> >>.data   = serial8250_generate_fdt_node,
> >>},
> >>.mutex  = MUTEX_INITIALIZER,
> >>  
> >>.id = 0,
> >> -  .iobase = 0x3f8,
> >> -  .irq= 4,
> >> +  .iobase = serial_iobase(0),
> >> +  .irq= serial_irq(0),
> >>  
> >>SERIAL_REGS_SETTING
> >>},
> >>/* ttyS1 */
> >>[1] = {
> >>.dev_hdr = {
> >> -  .bus_type   = DEVICE_BUS_IOPORT,
> >> +  .bus_type   = SERIAL8250_BUS_TYPE,
> >>.data   = serial8250_generate_fdt_node,
> >>},
> >>.mutex  = MUTEX_INITIALIZER,
> >>  
> >>.id = 1,
> >> -  .iobase = 0x2f8,
> >> - 

Re: [PATCH kvmtool 19/21] Remove ioport specific routines

2021-02-18 Thread Andre Przywara
On Wed, 17 Feb 2021 16:11:51 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 2/17/21 3:49 PM, Alexandru Elisei wrote:
> > Hi Andre,
> >
> > On 12/10/20 2:29 PM, Andre Przywara wrote:  
> >> Now that all users of the dedicated ioport trap handler interface are
> >> gone, we can retire the code associated with it.
> >>
> >> This removes ioport.c and ioport.h, along with removing prototypes from
> >> other header files.
> >>
> >> This also transfers the responsibility for port I/O trap handling
> >> entirely into the new routine in mmio.c.
> >>
> >> Signed-off-by: Andre Przywara 
> >> ---
> >>  Makefile |   1 -
> >>  include/kvm/ioport.h |  20 -
> >>  include/kvm/kvm.h|   2 -
> >>  ioport.c | 173 ---
> >>  mmio.c   |   2 +-
> >>  5 files changed, 1 insertion(+), 197 deletions(-)
> >>  delete mode 100644 ioport.c
> >>
> >> diff --git a/Makefile b/Makefile
> >> index 35bb1182..94ff5da6 100644
> >> --- a/Makefile
> >> +++ b/Makefile
> >> @@ -56,7 +56,6 @@ OBJS += framebuffer.o
> >>  OBJS  += guest_compat.o
> >>  OBJS  += hw/rtc.o
> >>  OBJS  += hw/serial.o
> >> -OBJS  += ioport.o
> >>  OBJS  += irq.o
> >>  OBJS  += kvm-cpu.o
> >>  OBJS  += kvm.o
> >> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> >> index a61038e2..38636553 100644
> >> --- a/include/kvm/ioport.h
> >> +++ b/include/kvm/ioport.h
> >> @@ -17,28 +17,8 @@
> >>  
> >>  struct kvm;  
> > Looks to me like the above forward declaration can be removed; same for all 
> > the
> > includes except linux/byteorder.h, needed for the lexx_to_cpu/cpu_to_lexx
> > functions, and linux/types.h for the uxx typedefs. Otherwise looks good.  
> 
> Actually, ignore the part about removing the includes, it opens a new can of 
> worms
> - byteorder.h doesn't include compiler.h where __always_inline is defined, and
> various files where struct kvm_cpu is used don't include kvm-cpu.h (like 
> pci.c,
> hw/serial.c, etc). The header removal is not trivial and I think it should be 
> part
> of another cleanup patch.

Well, it looks like I can remove some obvious headers like for fdt and
rbtree. Will do that.

Cheers,
Andre

> >
> > Thanks,
> >
> > Alex
> >  
> >>  
> >> -struct ioport {
> >> -  struct rb_int_node  node;
> >> -  struct ioport_operations*ops;
> >> -  void*priv;
> >> -  struct device_headerdev_hdr;
> >> -  u32 refcount;
> >> -  boolremove;
> >> -};
> >> -
> >> -struct ioport_operations {
> >> -  bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> >> void *data, int size);
> >> -  bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> >> void *data, int size);
> >> -};
> >> -
> >>  void ioport__map_irq(u8 *irq);
> >>  
> >> -int __must_check ioport__register(struct kvm *kvm, u16 port, struct 
> >> ioport_operations *ops,
> >> -int count, void *param);
> >> -int ioport__unregister(struct kvm *kvm, u16 port);
> >> -int ioport__init(struct kvm *kvm);
> >> -int ioport__exit(struct kvm *kvm);
> >> -
> >>  static inline u8 ioport__read8(u8 *data)
> >>  {
> >>return *data;
> >> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> >> index 14f9d58b..e70f8ef6 100644
> >> --- a/include/kvm/kvm.h
> >> +++ b/include/kvm/kvm.h
> >> @@ -119,8 +119,6 @@ void kvm__irq_line(struct kvm *kvm, int irq, int 
> >> level);
> >>  void kvm__irq_trigger(struct kvm *kvm, int irq);
> >>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int 
> >> direction, int size, u32 count);
> >>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 
> >> len, u8 is_write);
> >> -bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
> >> -int direction, int size, u32 count);
> >>  int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> >> *userspace_addr);
> >>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> >> *userspace_addr,
> >>  enum kvm_mem_type type);
> >> diff --git a/ioport.c b/ioport.c
> >> deleted file mode 100644
> >> index 204d8103..
> >> --- a/ioport.c
> >> +++ /dev/null
> >> @@ -1,173 +0,0 @@
> >> -#include "kvm/ioport.h"
> >> -
> >> -#include "kvm/kvm.h"
> >> -#include "kvm/util.h"
> >> -#include "kvm/rbtree-interval.h"
> >> -#include "kvm/mutex.h"
> >> -
> >> -#include /* for KVM_EXIT_* */
> >> -#include 
> >> -
> >> -#include 
> >> -#include 
> >> -#include 
> >> -#include 
> >> -
> >> -#define ioport_node(n) rb_entry(n, struct ioport, node)
> >> -
> >> -static DEFINE_MUTEX(ioport_lock);
> >> -
> >> -static struct rb_root ioport_tree = RB_ROOT;
> >> -
> >> -static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> >> -{
> >> -  struct rb_int_node *node;
> >> -
> >> -  node = rb_int_search_single(root, addr);
> >> -  if 

Re: [PATCH kvmtool 17/21] virtio: Switch trap handling to use MMIO handler

2021-02-18 Thread Andre Przywara
On Tue, 16 Feb 2021 17:03:04 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> Nitpick below, otherwise looks good.
> 
> On 12/10/20 2:29 PM, Andre Przywara wrote:
> > With the planned retirement of the special ioport emulation code, we
> > need to provide an emulation function compatible with the MMIO prototype.
> >
> > Adjust the existing MMIO callback routine to automatically determine
> > the region this trap came through, and call the existing I/O handlers.
> > Register the ioport region using the new registration function.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  virtio/pci.c | 42 ++
> >  1 file changed, 10 insertions(+), 32 deletions(-)
> >
> > diff --git a/virtio/pci.c b/virtio/pci.c
> > index 6eea6c68..49d3f4d5 100644
> > --- a/virtio/pci.c
> > +++ b/virtio/pci.c
> > @@ -178,15 +178,6 @@ static bool virtio_pci__data_in(struct kvm_cpu *vcpu, 
> > struct virtio_device *vdev
> > return ret;
> >  }
> >  
> > -static bool virtio_pci__io_in(struct ioport *ioport, struct kvm_cpu *vcpu, 
> > u16 port, void *data, int size)
> > -{
> > -   struct virtio_device *vdev = ioport->priv;
> > -   struct virtio_pci *vpci = vdev->virtio;
> > -   unsigned long offset = port - virtio_pci__port_addr(vpci);
> > -
> > -   return virtio_pci__data_in(vcpu, vdev, offset, data, size);
> > -}
> > -
> >  static void update_msix_map(struct virtio_pci *vpci,
> > struct msix_table *msix_entry, u32 vecnum)
> >  {
> > @@ -334,20 +325,6 @@ static bool virtio_pci__data_out(struct kvm_cpu *vcpu, 
> > struct virtio_device *vde
> > return ret;
> >  }
> >  
> > -static bool virtio_pci__io_out(struct ioport *ioport, struct kvm_cpu 
> > *vcpu, u16 port, void *data, int size)
> > -{
> > -   struct virtio_device *vdev = ioport->priv;
> > -   struct virtio_pci *vpci = vdev->virtio;
> > -   unsigned long offset = port - virtio_pci__port_addr(vpci);
> > -
> > -   return virtio_pci__data_out(vcpu, vdev, offset, data, size);
> > -}
> > -
> > -static struct ioport_operations virtio_pci__io_ops = {
> > -   .io_in  = virtio_pci__io_in,
> > -   .io_out = virtio_pci__io_out,
> > -};
> > -
> >  static void virtio_pci__msix_mmio_callback(struct kvm_cpu *vcpu,
> >u64 addr, u8 *data, u32 len,
> >u8 is_write, void *ptr)
> > @@ -455,12 +432,15 @@ static void virtio_pci__io_mmio_callback(struct 
> > kvm_cpu *vcpu,
> >  {
> > struct virtio_device *vdev = ptr;
> > struct virtio_pci *vpci = vdev->virtio;
> > -   u32 mmio_addr = virtio_pci__mmio_addr(vpci);
> > +   u32 base_addr = virtio_pci__mmio_addr(vpci);
> > +
> > +   if (addr < base_addr || addr >= base_addr + PCI_IO_SIZE)
> > +   base_addr = virtio_pci__port_addr(vpci);  
> 
> There are only two BARs that use this callback, the ioport BAR (BAR0) and the
> memory BAR (BAR1) (MSIX uses a different emulation callback). The condition 
> above
> says that if addr is not inside the region described by the memory BAR, then 
> it's
> an ioport BAR. How about checking explicitly that it is inside the ioport 
> region
> like this (compile tested only), which looks a bit more intuitive for me:

Fair enough.

Cheers,
Andre

> 
> diff --git a/virtio/pci.c b/virtio/pci.c
> index 49d3f4d524b2..4024bcd709cd 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -432,10 +432,15 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu 
> *vcpu,
>  {
>     struct virtio_device *vdev = ptr;
>     struct virtio_pci *vpci = vdev->virtio;
> -   u32 base_addr = virtio_pci__mmio_addr(vpci);
> +   u32 mmio_addr = virtio_pci__mmio_addr(vpci);
> +   u32 ioport_addr = virtio_pci__port_addr(vpci);
> +   u32 base_addr;
>  
> -   if (addr < base_addr || addr >= base_addr + PCI_IO_SIZE)
> -   base_addr = virtio_pci__port_addr(vpci);
> +   if (addr >= ioport_addr &&
> +   addr < ioport_addr + pci__bar_size(>pci_hdr, 0))
> +   base_addr = ioport_addr;
> +   else
> +   base_addr = mmio_addr;
>  
>     if (!is_write)
>     virtio_pci__data_in(vcpu, vdev, addr - base_addr, data, len);
> 
> Thanks,
> 
> Alex
> 
> >  
> > if (!is_write)
> > -   virtio_pci__data_in(vcpu, vdev, addr - mmio_addr, data, len);
> > +   virtio_pci__data_in(vcpu, vdev, addr - base_addr, data, len);
> > else
> > -   virtio_pci__data_out(vcpu, vdev, addr - mmio_addr, data, len);
> > +   virtio_pci__data_out(vcpu, vdev, addr - base_addr, data, len);
> >  }
> >  
> >  static int virtio_pci__bar_activate(struct kvm *kvm,
> > @@ -478,10 +458,8 @@ static int virtio_pci__bar_activate(struct kvm *kvm,
> >  
> > switch (bar_num) {
> > case 0:
> > -   r = ioport__register(kvm, bar_addr, _pci__io_ops,
> > -bar_size, vdev);
> > -   if (r > 0)
> > -   r = 0;
> > +   r = kvm__register_pio(kvm, 

Re: [PATCH kvmtool 15/21] vfio: Refactor ioport trap handler

2021-02-18 Thread Andre Przywara
On Tue, 16 Feb 2021 14:47:31 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> Looks good, one nitpick below.
> 
> On 12/10/20 2:29 PM, Andre Przywara wrote:
> > With the planned retirement of the special ioport emulation code, we
> > need to provide an emulation function compatible with the MMIO prototype.
> >
> > Adjust the I/O port trap handler to use that new function, and provide
> > shims to implement the old ioport interface, for now.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  vfio/core.c | 51 ---
> >  1 file changed, 36 insertions(+), 15 deletions(-)
> >
> > diff --git a/vfio/core.c b/vfio/core.c
> > index 0b45e78b..f55f1f87 100644
> > --- a/vfio/core.c
> > +++ b/vfio/core.c
> > @@ -81,15 +81,12 @@ out_free_buf:
> > return ret;
> >  }
> >  
> > -static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
> > -  u16 port, void *data, int len)
> > +static bool _vfio_ioport_in(struct vfio_region *region, u32 offset,
> > +   void *data, int len)
> >  {
> > -   u32 val;
> > -   ssize_t nr;
> > -   struct vfio_region *region = ioport->priv;
> > struct vfio_device *vdev = region->vdev;
> > -
> > -   u32 offset = port - region->port_base;
> > +   ssize_t nr;
> > +   u32 val;
> >  
> > if (!(region->info.flags & VFIO_REGION_INFO_FLAG_READ))
> > return false;
> > @@ -97,7 +94,7 @@ static bool vfio_ioport_in(struct ioport *ioport, struct 
> > kvm_cpu *vcpu,
> > nr = pread(vdev->fd, , len, region->info.offset + offset);
> > if (nr != len) {
> > vfio_dev_err(vdev, "could not read %d bytes from I/O port 
> > 0x%x\n",
> > -len, port);
> > +len, offset);  
> 
> To keep things functionally identical, shouldn't that be offset +
> region->port_base? I think it's easier to identify the device when we have 
> the PCI
> ioport address.

Yeah, true. Although I think "vfio_dev_err(vdev, ..." already indicates
the device at fault, but indeed the actual ioport address is more
canonical to use.

Thanks,
Andre
 

> 
> Thanks,
> 
> Alex
> 
> > return false;
> > }
> >  
> > @@ -118,15 +115,13 @@ static bool vfio_ioport_in(struct ioport *ioport, 
> > struct kvm_cpu *vcpu,
> > return true;
> >  }
> >  
> > -static bool vfio_ioport_out(struct ioport *ioport, struct kvm_cpu *vcpu,
> > -   u16 port, void *data, int len)
> > +static bool _vfio_ioport_out(struct vfio_region *region, u32 offset,
> > +void *data, int len)
> >  {
> > -   u32 val;
> > -   ssize_t nr;
> > -   struct vfio_region *region = ioport->priv;
> > struct vfio_device *vdev = region->vdev;
> > +   ssize_t nr;
> > +   u32 val;
> >  
> > -   u32 offset = port - region->port_base;
> >  
> > if (!(region->info.flags & VFIO_REGION_INFO_FLAG_WRITE))
> > return false;
> > @@ -148,11 +143,37 @@ static bool vfio_ioport_out(struct ioport *ioport, 
> > struct kvm_cpu *vcpu,
> > nr = pwrite(vdev->fd, , len, region->info.offset + offset);
> > if (nr != len)
> > vfio_dev_err(vdev, "could not write %d bytes to I/O port 0x%x",
> > -len, port);
> > +len, offset);
> >  
> > return nr == len;
> >  }
> >  
> > +static void vfio_ioport_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 
> > len,
> > +u8 is_write, void *ptr)
> > +{
> > +   struct vfio_region *region = ptr;
> > +   u32 offset = addr - region->port_base;
> > +
> > +   if (is_write)
> > +   _vfio_ioport_out(region, offset, data, len);
> > +   else
> > +   _vfio_ioport_in(region, offset, data, len);
> > +}
> > +
> > +static bool vfio_ioport_out(struct ioport *ioport, struct kvm_cpu *vcpu,
> > +   u16 port, void *data, int len)
> > +{
> > +   vfio_ioport_mmio(vcpu, port, data, len, true, ioport->priv);
> > +   return true;
> > +}
> > +
> > +static bool vfio_ioport_in(struct ioport *ioport, struct kvm_cpu *vcpu,
> > +  u16 port, void *data, int len)
> > +{
> > +   vfio_ioport_mmio(vcpu, port, data, len, false, ioport->priv);
> > +   return true;
> > +}
> > +
> >  static struct ioport_operations vfio_ioport_ops = {
> > .io_in  = vfio_ioport_in,
> > .io_out = vfio_ioport_out,  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 13/21] hw/serial: Refactor trap handler

2021-02-18 Thread Andre Przywara
On Tue, 16 Feb 2021 14:22:05 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> Patch looks good, nitpicks below.
> 
> On 12/10/20 2:29 PM, Andre Przywara wrote:
> > With the planned retirement of the special ioport emulation code, we
> > need to provide an emulation function compatible with the MMIO prototype.
> >
> > Adjust the trap handler to use that new function, and provide shims to
> > implement the old ioport interface, for now.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  hw/serial.c | 97 +++--
> >  1 file changed, 65 insertions(+), 32 deletions(-)
> >
> > diff --git a/hw/serial.c b/hw/serial.c
> > index b0465d99..2907089c 100644
> > --- a/hw/serial.c
> > +++ b/hw/serial.c
> > @@ -242,36 +242,31 @@ void serial8250__inject_sysrq(struct kvm *kvm, char 
> > sysrq)
> > sysrq_pending = sysrq;
> >  }
> >  
> > -static bool serial8250_out(struct ioport *ioport, struct kvm_cpu *vcpu, 
> > u16 port,
> > -  void *data, int size)
> > +static bool serial8250_out(struct serial8250_device *dev, struct kvm_cpu 
> > *vcpu,
> > +  u16 offset, u8 data)
> >  {
> > -   struct serial8250_device *dev = ioport->priv;
> > -   u16 offset;
> > bool ret = true;
> > -   char *addr = data;
> >  
> > mutex_lock(>mutex);
> >  
> > -   offset = port - dev->iobase;
> > -
> > switch (offset) {
> > case UART_TX:
> > if (dev->lcr & UART_LCR_DLAB) {
> > -   dev->dll = ioport__read8(data);
> > +   dev->dll = data;
> > break;
> > }
> >  
> > /* Loopback mode */
> > if (dev->mcr & UART_MCR_LOOP) {
> > if (dev->rxcnt < FIFO_LEN) {
> > -   dev->rxbuf[dev->rxcnt++] = *addr;
> > +   dev->rxbuf[dev->rxcnt++] = data;
> > dev->lsr |= UART_LSR_DR;
> > }
> > break;
> > }
> >  
> > if (dev->txcnt < FIFO_LEN) {
> > -   dev->txbuf[dev->txcnt++] = *addr;
> > +   dev->txbuf[dev->txcnt++] = data;
> > dev->lsr &= ~UART_LSR_TEMT;
> > if (dev->txcnt == FIFO_LEN / 2)
> > dev->lsr &= ~UART_LSR_THRE;
> > @@ -283,18 +278,18 @@ static bool serial8250_out(struct ioport *ioport, 
> > struct kvm_cpu *vcpu, u16 port
> > break;
> > case UART_IER:
> > if (!(dev->lcr & UART_LCR_DLAB))
> > -   dev->ier = ioport__read8(data) & 0x0f;
> > +   dev->ier = data & 0x0f;
> > else
> > -   dev->dlm = ioport__read8(data);
> > +   dev->dlm = data;
> > break;
> > case UART_FCR:
> > -   dev->fcr = ioport__read8(data);
> > +   dev->fcr = data;
> > break;
> > case UART_LCR:
> > -   dev->lcr = ioport__read8(data);
> > +   dev->lcr = data;
> > break;
> > case UART_MCR:
> > -   dev->mcr = ioport__read8(data);
> > +   dev->mcr = data;
> > break;
> > case UART_LSR:
> > /* Factory test */
> > @@ -303,7 +298,7 @@ static bool serial8250_out(struct ioport *ioport, 
> > struct kvm_cpu *vcpu, u16 port
> > /* Not used */
> > break;
> > case UART_SCR:
> > -   dev->scr = ioport__read8(data);
> > +   dev->scr = data;
> > break;
> > default:
> > ret = false;
> > @@ -336,46 +331,43 @@ static void serial8250_rx(struct serial8250_device 
> > *dev, void *data)
> > }
> >  }
> >  
> > -static bool serial8250_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> > port, void *data, int size)
> > +static bool serial8250_in(struct serial8250_device *dev, struct kvm_cpu 
> > *vcpu,
> > + u16 offset, u8 *data)
> >  {
> > -   struct serial8250_device *dev = ioport->priv;
> > -   u16 offset;
> > bool ret = true;
> >  
> > mutex_lock(>mutex);
> >  
> > -   offset = port - dev->iobase;
> > -
> > switch (offset) {
> > case UART_RX:
> > if (dev->lcr & UART_LCR_DLAB)
> > -   ioport__write8(data, dev->dll);
> > +   *data = dev->dll;
> > else
> > serial8250_rx(dev, data);
> > break;
> > case UART_IER:
> > if (dev->lcr & UART_LCR_DLAB)
> > -   ioport__write8(data, dev->dlm);
> > +   *data = dev->dlm;
> > else
> > -   ioport__write8(data, dev->ier);
> > +   *data = dev->ier;
> > break;
> > case UART_IIR:
> > -   ioport__write8(data, dev->iir | UART_IIR_TYPE_BITS);
> > +   *data = dev->iir | UART_IIR_TYPE_BITS;
> > break;
> > case UART_LCR:
> > -   ioport__write8(data, dev->lcr);
> > +  

Re: [PATCH kvmtool 09/21] x86/ioport: Switch to new trap handlers

2021-02-18 Thread Andre Przywara
On Fri, 12 Feb 2021 11:27:59 +
Alexandru Elisei  wrote:

Hi Alex,

> On 12/10/20 2:28 PM, Andre Przywara wrote:
> > Now that the x86 I/O ports have trap handlers adhering to the MMIO fault
> > handler prototype, let's switch over to the joint registration routine.
> >
> > This allows us to get rid of the ioport shim routines.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  x86/ioport.c | 113 ++-
> >  1 file changed, 30 insertions(+), 83 deletions(-)
> >
> > diff --git a/x86/ioport.c b/x86/ioport.c
> > index 932da20a..87955da1 100644
> > --- a/x86/ioport.c
> > +++ b/x86/ioport.c
> > @@ -8,16 +8,6 @@ static void dummy_mmio(struct kvm_cpu *vcpu, u64 addr, u8 
> > *data, u32 len,
> >  {
> >  }
> >  
> > -static bool debug_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> > port, void *data, int size)
> > -{
> > -   dummy_mmio(vcpu, port, data, size, true, NULL);
> > -   return 0;
> > -}  
> 
> 0 is false in boolean logic, which means that emulation fails according to the
> (old) ioport emulation code (ioport.c::kvm__emulate_io()).
> 
> So I guess I have a few questions:
> 
> - Is this a bug in the emulation code, where the author thought that
> debug_io_out() returns an int, and in that case 0 actually means success?

Wading through the swamp of the git history it looks like failure is
intentional, it should be used together with --debug-ioport, to trigger
that print there.
 
> - If writing to the debug port is rightfully considered an error, do we care
> enough about it to print something to stdout like kvm__emulate_io() does when
> debug_io_out() returns false?

I think you are right, we should print something here.

Cheers,
Andre

> > -
> > -static struct ioport_operations debug_ops = {
> > -   .io_out = debug_io_out,
> > -};
> > -
> >  static void seabios_debug_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
> >u32 len, u8 is_write, void *ptr)
> >  {
> > @@ -31,37 +21,6 @@ static void seabios_debug_mmio(struct kvm_cpu *vcpu, u64 
> > addr, u8 *data,
> > putchar(ch);
> >  }
> >  
> > -static bool seabios_debug_io_out(struct ioport *ioport, struct kvm_cpu 
> > *vcpu, u16 port, void *data, int size)
> > -{
> > -   seabios_debug_mmio(vcpu, port, data, size, true, NULL);
> > -   return 0;
> > -}
> > -
> > -static struct ioport_operations seabios_debug_ops = {
> > -   .io_out = seabios_debug_io_out,
> > -};
> > -
> > -static bool dummy_io_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> > port, void *data, int size)
> > -{
> > -   dummy_mmio(vcpu, port, data, size, false, NULL);
> > -   return true;
> > -}
> > -
> > -static bool dummy_io_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> > port, void *data, int size)
> > -{
> > -   dummy_mmio(vcpu, port, data, size, true, NULL);
> > -   return true;
> > -}
> > -
> > -static struct ioport_operations dummy_read_write_ioport_ops = {
> > -   .io_in  = dummy_io_in,
> > -   .io_out = dummy_io_out,
> > -};
> > -
> > -static struct ioport_operations dummy_write_only_ioport_ops = {
> > -   .io_out = dummy_io_out,
> > -};
> > -
> >  /*
> >   * The "fast A20 gate"
> >   */
> > @@ -76,17 +35,6 @@ static void ps2_control_mmio(struct kvm_cpu *vcpu, u64 
> > addr, u8 *data, u32 len,
> > ioport__write8(data, 0x02);
> >  }
> >  
> > -static bool ps2_control_a_io_in(struct ioport *ioport, struct kvm_cpu 
> > *vcpu, u16 port, void *data, int size)
> > -{
> > -   ps2_control_mmio(vcpu, port, data, size, false, NULL);
> > -   return true;
> > -}
> > -
> > -static struct ioport_operations ps2_control_a_ops = {
> > -   .io_in  = ps2_control_a_io_in,
> > -   .io_out = dummy_io_out,
> > -};
> > -
> >  void ioport__map_irq(u8 *irq)
> >  {
> >  }
> > @@ -98,75 +46,75 @@ static int ioport__setup_arch(struct kvm *kvm)
> > /* Legacy ioport setup */
> >  
> > /*  - 001F - DMA1 controller */
> > -   r = ioport__register(kvm, 0x, _read_write_ioport_ops, 32, 
> > NULL);
> > +   r = kvm__register_pio(kvm, 0x, 32, dummy_mmio, NULL);
> > if (r < 0)
> > return r;
> >  
> > /* 0x0020 - 0x003F - 8259A PIC 1 */
> > -   r = ioport__register(kvm, 0x0020, _read_write_ioport_ops, 2, 
> > NULL);
> > +   r = kvm__register_pio(kvm, 0x0020, 2, dummy_mmio, NULL);
> > if (r < 0)
> > return r;
> >  
> > /* PORT 0040-005F - PIT - PROGRAMMABLE INTERVAL TIMER (8253, 8254) */
> > -   r = ioport__register(kvm, 0x0040, _read_write_ioport_ops, 4, 
> > NULL);
> > +   r = kvm__register_pio(kvm, 0x0040, 4, dummy_mmio, NULL);
> > if (r < 0)
> > return r;
> >  
> > /* 0092 - PS/2 system control port A */
> > -   r = ioport__register(kvm, 0x0092, _control_a_ops, 1, NULL);
> > +   r = kvm__register_pio(kvm, 0x0092, 1, ps2_control_mmio, NULL);
> > if (r < 0)
> > return r;
> >  
> > /* 0x00A0 - 0x00AF - 8259A PIC 2 */
> > -   r = ioport__register(kvm, 

Re: [PATCH kvmtool 21/21] hw/rtc: ARM/arm64: Use MMIO at higher addresses

2021-02-18 Thread Alexandru Elisei
Hi Andre,

On 12/10/20 2:29 PM, Andre Przywara wrote:
> Using the RTC device at its legacy I/O address as set by IBM in 1981
> was a kludge we used for simplicity on ARM platforms as well.
> However this imposes problems due to their missing alignment and overlap
> with the PCI I/O address space.
>
> Now that we can switch a device easily between using ioports and
> MMIO, let's move the RTC out of the first 4K of memory on ARM platforms.
>
> That should be transparent for well behaved guests, since the change is
> naturally reflected in the device tree.
>
> Signed-off-by: Andre Przywara 
> ---
>  hw/rtc.c | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/hw/rtc.c b/hw/rtc.c
> index ee4c9102..bdb88f0f 100644
> --- a/hw/rtc.c
> +++ b/hw/rtc.c
> @@ -5,6 +5,15 @@
>  
>  #include 
>  
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +#define RTC_BUS_TYPE DEVICE_BUS_MMIO
> +#define RTC_BASE_ADDRESS 0x101

This looks correct, the base address is the serial base address + 64k, so they
don't overlap, and it doesn't overlap with the flash memory either. Same comment
as for the serial, I think the reason for choosing this address should be
mentioned, and the region should be put in the arch memory layout file. Other 
than
that, the patch looks good.

Thanks,

Alex

> +#else
> +/* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> +#define RTC_BUS_TYPE DEVICE_BUS_IOPORT
> +#define RTC_BASE_ADDRESS 0x70
> +#endif
> +
>  /*
>   * MC146818 RTC registers
>   */
> @@ -49,7 +58,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, u8 
> *data,
>   time_t ti;
>  
>   if (is_write) {
> - if (addr == 0x70) { /* index register */
> + if (addr == RTC_BASE_ADDRESS) { /* index register */
>   u8 value = ioport__read8(data);
>  
>   vcpu->kvm->nmi_disabled = value & (1UL << 7);
> @@ -70,7 +79,7 @@ static void cmos_ram_io(struct kvm_cpu *vcpu, u64 addr, u8 
> *data,
>   return;
>   }
>  
> - if (addr == 0x70)
> + if (addr == RTC_BASE_ADDRESS)   /* index register is write-only */
>   return;
>  
>   time();
> @@ -127,7 +136,7 @@ static void generate_rtc_fdt_node(void *fdt,
>   u8 irq,
>   enum irq_type))
>  {
> - u64 reg_prop[2] = { cpu_to_fdt64(0x70), cpu_to_fdt64(2) };
> + u64 reg_prop[2] = { cpu_to_fdt64(RTC_BASE_ADDRESS), cpu_to_fdt64(2) };
>  
>   _FDT(fdt_begin_node(fdt, "rtc"));
>   _FDT(fdt_property_string(fdt, "compatible", "motorola,mc146818"));
> @@ -139,7 +148,7 @@ static void generate_rtc_fdt_node(void *fdt,
>  #endif
>  
>  struct device_header rtc_dev_hdr = {
> - .bus_type = DEVICE_BUS_IOPORT,
> + .bus_type = RTC_BUS_TYPE,
>   .data = generate_rtc_fdt_node,
>  };
>  
> @@ -151,8 +160,8 @@ int rtc__init(struct kvm *kvm)
>   if (r < 0)
>   return r;
>  
> - /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> - r = kvm__register_pio(kvm, 0x0070, 2, cmos_ram_io, NULL);
> + r = kvm__register_iotrap(kvm, RTC_BASE_ADDRESS, 2, cmos_ram_io, NULL,
> +  RTC_BUS_TYPE);
>   if (r < 0)
>   goto out_device;
>  
> @@ -170,8 +179,7 @@ dev_init(rtc__init);
>  
>  int rtc__exit(struct kvm *kvm)
>  {
> - /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */
> - kvm__deregister_pio(kvm, 0x0070);
> + kvm__deregister_iotrap(kvm, RTC_BASE_ADDRESS, RTC_BUS_TYPE);
>  
>   return 0;
>  }
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 20/21] hw/serial: ARM/arm64: Use MMIO at higher addresses

2021-02-18 Thread Alexandru Elisei
Hi Andre,

On 2/17/21 4:48 PM, Alexandru Elisei wrote:
> Hi Andre,
>
> On 12/10/20 2:29 PM, Andre Przywara wrote:
>> Using the UART devices at their legacy I/O addresses as set by IBM in
>> 1981 was a kludge we used for simplicity on ARM platforms as well.
>> However this imposes problems due to their missing alignment and overlap
>> with the PCI I/O address space.
>>
>> Now that we can switch a device easily between using ioports and MMIO,
>> let's move the UARTs out of the first 4K of memory on ARM platforms.
>>
>> That should be transparent for well behaved guests, since the change is
>> naturally reflected in the device tree. Even "earlycon" keeps working,
>> as the stdout-path property is adjusted automatically.
>>
>> People providing direct earlycon parameters via the command line need to
>> adjust it to: "earlycon=uart,mmio,0x100".
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  hw/serial.c | 52 
>>  1 file changed, 36 insertions(+), 16 deletions(-)
>>
>> diff --git a/hw/serial.c b/hw/serial.c
>> index d840eebc..00fb3aa8 100644
>> --- a/hw/serial.c
>> +++ b/hw/serial.c
>> @@ -13,6 +13,24 @@
>>  
>>  #include 
>>  
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> +#define serial_iobase(nr)   (0x100 + (nr) * 0x1000)
>> +#define serial_irq(nr)  (32 + (nr))
>> +#define SERIAL8250_BUS_TYPE DEVICE_BUS_MMIO
>> +#else
>> +#define serial_iobase_0 0x3f8
>> +#define serial_iobase_1 0x2f8
>> +#define serial_iobase_2 0x3e8
>> +#define serial_iobase_3 0x2e8
>> +#define serial_irq_04
>> +#define serial_irq_13
>> +#define serial_irq_24
>> +#define serial_irq_33
> Nitpick: serial_iobase_* and serial_irq_* could be changed to have two leading
> underscores, to stress the fact that they are helpers for serial_iobase() and
> serial_irq() and are not meant to be used by themselves. But that's just 
> personal
> preference, otherwise the patch looks really nice and clean:
>
> Reviewed-by: Alexandru Elisei 

I gave it more thought and I think I spoke too soon. I think it would be good to
document in the commit message how you came to the base MMIO address and size
(even if it was just a nice looking number). Also, the CFI flash device has its
memory range described in the memory layout for the architecture, in
arm/include/arm-common/kvm-arch.h. I think it would be a good idea to put the
serial region there, to prevent regressions in the future.

I did the math to make sure there's no overlap with the ioport region (16M >> 
16K)
or the flash region (16M + 0x4000 < 32M). I also tested the changes with a 
kernel
with 4K and 64K pages (I can't access any hardware with 16K pages at the 
moment),
and everything worked as expected.

Thanks,

Alex

> Thanks,
>
> Alex
>
>> +#define serial_iobase(nr)   serial_iobase_##nr
>> +#define serial_irq(nr)  serial_irq_##nr
>> +#define SERIAL8250_BUS_TYPE DEVICE_BUS_IOPORT
>> +#endif
>> +
>>  /*
>>   * This fakes a U6_16550A. The fifo len needs to be 64 as the kernel
>>   * expects that for autodetection.
>> @@ -27,7 +45,7 @@ struct serial8250_device {
>>  struct mutexmutex;
>>  u8  id;
>>  
>> -u16 iobase;
>> +u32 iobase;
>>  u8  irq;
>>  u8  irq_state;
>>  int txcnt;
>> @@ -65,56 +83,56 @@ static struct serial8250_device devices[] = {
>>  /* ttyS0 */
>>  [0] = {
>>  .dev_hdr = {
>> -.bus_type   = DEVICE_BUS_IOPORT,
>> +.bus_type   = SERIAL8250_BUS_TYPE,
>>  .data   = serial8250_generate_fdt_node,
>>  },
>>  .mutex  = MUTEX_INITIALIZER,
>>  
>>  .id = 0,
>> -.iobase = 0x3f8,
>> -.irq= 4,
>> +.iobase = serial_iobase(0),
>> +.irq= serial_irq(0),
>>  
>>  SERIAL_REGS_SETTING
>>  },
>>  /* ttyS1 */
>>  [1] = {
>>  .dev_hdr = {
>> -.bus_type   = DEVICE_BUS_IOPORT,
>> +.bus_type   = SERIAL8250_BUS_TYPE,
>>  .data   = serial8250_generate_fdt_node,
>>  },
>>  .mutex  = MUTEX_INITIALIZER,
>>  
>>  .id = 1,
>> -.iobase = 0x2f8,
>> -.irq= 3,
>> +.iobase = serial_iobase(1),
>> +.irq= serial_irq(1),
>>  
>>  SERIAL_REGS_SETTING
>>  },
>>  /* ttyS2 */
>>  [2] = {
>>  .dev_hdr = {
>> -.bus_type   

Re: [PATCH kvmtool 07/21] hw/i8042: Switch to new trap handlers

2021-02-18 Thread Andre Przywara
On Fri, 12 Feb 2021 10:41:20 +
Alexandru Elisei  wrote:

Hi,

> On 12/10/20 2:28 PM, Andre Przywara wrote:
> > Now that the PC keyboard has a trap handler adhering to the MMIO fault
> > handler prototype, let's switch over to the joint registration routine.
> >
> > This allows us to get rid of the ioport shim routines.
> >
> > Make the kbd_init() function static on the way.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  hw/i8042.c  | 30 --
> >  include/kvm/i8042.h |  1 -
> >  2 files changed, 4 insertions(+), 27 deletions(-)
> >
> > diff --git a/hw/i8042.c b/hw/i8042.c
> > index eb1f9d28..91d79dc4 100644
> > --- a/hw/i8042.c
> > +++ b/hw/i8042.c
> > @@ -325,40 +325,18 @@ static void kbd_io(struct kvm_cpu *vcpu, u64 addr, u8 
> > *data, u32 len,
> > ioport__write8(data, value);
> >  }
> >  
> > -/*
> > - * Called when the OS has written to one of the keyboard's ports (0x60 or 
> > 0x64)
> > - */
> > -static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size)
> > -{
> > -   kbd_io(vcpu, port, data, size, false, NULL);
> > -
> > -   return true;
> > -}
> > -
> > -static bool kbd_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size)
> > -{
> > -   kbd_io(vcpu, port, data, size, true, NULL);
> > -
> > -   return true;
> > -}
> > -
> > -static struct ioport_operations kbd_ops = {
> > -   .io_in  = kbd_in,
> > -   .io_out = kbd_out,
> > -};
> > -
> > -int kbd__init(struct kvm *kvm)
> > +static int kbd__init(struct kvm *kvm)
> >  {
> > int r;
> >  
> > kbd_reset();
> > state.kvm = kvm;
> > -   r = ioport__register(kvm, I8042_DATA_REG, _ops, 2, NULL);
> > +   r = kvm__register_pio(kvm, I8042_DATA_REG, 2, kbd_io, NULL);  
> 
> I guess you are registering two addresses here to cover I8042_PORT_B_REG, 
> right?
> Might be worth a comment.

I am registering two ports because the original code did, and I didn't
dare to touch this. I guess we put this on the wishlist for the device
emulation fixup series? ;-)

Maybe the intention was to just *reserve* those ports?

> 
> > if (r < 0)
> > return r;
> > -   r = ioport__register(kvm, I8042_COMMAND_REG, _ops, 2, NULL);
> > +   r = kvm__register_pio(kvm, I8042_COMMAND_REG, 2, kbd_io, NULL);  
> 
> Shouldn't length be 1? The emulation should work only for address 0x64
> (command/status register), right? Or am I missing something?

I don't think you are, same as above. Maybe some weird guest is using
half-word accesses (outw)?

Cheers,
Andre


> 
> Thanks,
> 
> Alex
> 
> > if (r < 0) {
> > -   ioport__unregister(kvm, I8042_DATA_REG);
> > +   kvm__deregister_pio(kvm, I8042_DATA_REG);
> > return r;
> > }
> >  
> > diff --git a/include/kvm/i8042.h b/include/kvm/i8042.h
> > index 3b4ab688..cd4ae6bb 100644
> > --- a/include/kvm/i8042.h
> > +++ b/include/kvm/i8042.h
> > @@ -7,6 +7,5 @@ struct kvm;
> >  
> >  void mouse_queue(u8 c);
> >  void kbd_queue(u8 c);
> > -int kbd__init(struct kvm *kvm);
> >  
> >  #endif  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 06/21] hw/i8042: Refactor trap handler

2021-02-18 Thread Andre Przywara
On Thu, 18 Feb 2021 11:17:58 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 2/18/21 10:34 AM, Andre Przywara wrote:
> > On Thu, 11 Feb 2021 17:23:13 +
> > Alexandru Elisei  wrote:
> >  
> >> Hi Andre,
> >>
> >> On 12/10/20 2:28 PM, Andre Przywara wrote:  
> >>> With the planned retirement of the special ioport emulation code, we
> >>> need to provide an emulation function compatible with the MMIO
> >>> prototype.
> >>>
> >>> Adjust the trap handler to use that new function, and provide shims to
> >>> implement the old ioport interface, for now.
> >>>
> >>> Signed-off-by: Andre Przywara 
> >>> ---
> >>>  hw/i8042.c | 68 +++---
> >>>  1 file changed, 34 insertions(+), 34 deletions(-)
> >>>
> >>> diff --git a/hw/i8042.c b/hw/i8042.c
> >>> index 36ee183f..eb1f9d28 100644
> >>> --- a/hw/i8042.c
> >>> +++ b/hw/i8042.c
> >>> @@ -292,52 +292,52 @@ static void kbd_reset(void)
> >>>   };
> >>>  }
> >>>  
> >>> -/*
> >>> - * Called when the OS has written to one of the keyboard's ports (0x60 
> >>> or 0x64)
> >>> - */
> >>> -static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> >>> port, void *data, int size)
> >>> +static void kbd_io(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> >>> +u8 is_write, void *ptr)
> >>>  {
> >>> - switch (port) {
> >>> - case I8042_COMMAND_REG: {
> >>> - u8 value = kbd_read_status();
> >>> - ioport__write8(data, value);
> >>> + u8 value;
> >>> +
> >>> + if (is_write)
> >>> + value = ioport__read8(data);
> >>> +
> >>> + switch (addr) {
> >>> + case I8042_COMMAND_REG:
> >>> + if (is_write)
> >>> + kbd_write_command(vcpu->kvm, value);
> >>> + else
> >>> + value = kbd_read_status();
> >>>   break;
> >>> - }
> >>> - case I8042_DATA_REG: {
> >>> - u8 value = kbd_read_data();
> >>> - ioport__write8(data, value);
> >>> + case I8042_DATA_REG:
> >>> + if (is_write)
> >>> + kbd_write_data(value);
> >>> + else
> >>> + value = kbd_read_data();
> >>>   break;
> >>> - }
> >>> - case I8042_PORT_B_REG: {
> >>> - ioport__write8(data, 0x20);
> >>> + case I8042_PORT_B_REG:
> >>> + if (!is_write)
> >>> + value = 0x20;
> >>>   break;
> >>> - }
> >>>   default:
> >>> - return false;
> >>> + return;
> >> Any particular reason for removing the false return value? I don't see it
> >> mentioned in the commit message. Otherwise this is identical to the two 
> >> functions
> >> it replaces.  
> > Because the MMIO handler prototype is void, in contrast to the PIO one.
> > Since on returning "false" we only seem to panic kvmtool, this is of
> > quite limited use, I'd say.  
> 
> Actually, in ioport.c::kvm__emulate_io(), if kvm->cfg.ioport_debug is true, it
> will print an error and then panic in kvm_cpu__start(); otherwise the error is
> silently ignored. serial.c is another device where an unknown register returns
> false. In rtc.c, the unknown register is ignored. cfi_flash.c prints debugging
> information. So I guess kvmtool implements all possible methods of handling an
> unknown register *at the same time*, so it's up to you how you want to handle 
> it.

Well, the MMIO prototype we are going to use is void anyway, so it's
just one patch earlier that we get this new behaviour.
For handling MMIO errors:
- Hardware MMIO doesn't have a back channel: if the MMIO write triggers
  some error condition, the device would need to deal with it (setting
  internal error state, ignore, etc.). On some systems the device could
  throw some kind of bus error or SError, but this is a rather drastic
  measure, and is certainly not exercised by those ancient devices.
- Any kind of error reporting which can be triggered by a guest is
  frowned upon: it could spam the console or some log file, and so
  impact host operation. At the end an administrator can't do much about
  it, anyway.
- Which leaves the only use to some kvmtool developer debugging some
  device emulation or investigating weird guest behaviour. And in this
  case we can more easily have a debug message *inside* the device
  emulation code, can't we?

And since the MMIO handler prototype is void, we have no choice anyway,
at least not without another huge (and pointless) series to change
those user as well ;-)

Cheers,
Andre

> >>>   }
> >>>  
> >>> + if (!is_write)
> >>> + ioport__write8(data, value);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Called when the OS has written to one of the keyboard's ports (0x60 
> >>> or 0x64)
> >>> + */
> >>> +static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 
> >>> port, void *data, int size)
> >>> +{
> >>> + kbd_io(vcpu, port, data, size, false, NULL);
> >> is_write is an u8, not a bool.  
> > Right, will fix this.
> >  
> >> I never could wrap my head around the ioport convention of "in" (read) and 
> >> 

Re: [PATCH kvmtool 06/21] hw/i8042: Refactor trap handler

2021-02-18 Thread Alexandru Elisei
Hi Andre,

On 2/18/21 10:34 AM, Andre Przywara wrote:
> On Thu, 11 Feb 2021 17:23:13 +
> Alexandru Elisei  wrote:
>
>> Hi Andre,
>>
>> On 12/10/20 2:28 PM, Andre Przywara wrote:
>>> With the planned retirement of the special ioport emulation code, we
>>> need to provide an emulation function compatible with the MMIO
>>> prototype.
>>>
>>> Adjust the trap handler to use that new function, and provide shims to
>>> implement the old ioport interface, for now.
>>>
>>> Signed-off-by: Andre Przywara 
>>> ---
>>>  hw/i8042.c | 68 +++---
>>>  1 file changed, 34 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/hw/i8042.c b/hw/i8042.c
>>> index 36ee183f..eb1f9d28 100644
>>> --- a/hw/i8042.c
>>> +++ b/hw/i8042.c
>>> @@ -292,52 +292,52 @@ static void kbd_reset(void)
>>> };
>>>  }
>>>  
>>> -/*
>>> - * Called when the OS has written to one of the keyboard's ports (0x60 or 
>>> 0x64)
>>> - */
>>> -static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>>> void *data, int size)
>>> +static void kbd_io(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
>>> +  u8 is_write, void *ptr)
>>>  {
>>> -   switch (port) {
>>> -   case I8042_COMMAND_REG: {
>>> -   u8 value = kbd_read_status();
>>> -   ioport__write8(data, value);
>>> +   u8 value;
>>> +
>>> +   if (is_write)
>>> +   value = ioport__read8(data);
>>> +
>>> +   switch (addr) {
>>> +   case I8042_COMMAND_REG:
>>> +   if (is_write)
>>> +   kbd_write_command(vcpu->kvm, value);
>>> +   else
>>> +   value = kbd_read_status();
>>> break;
>>> -   }
>>> -   case I8042_DATA_REG: {
>>> -   u8 value = kbd_read_data();
>>> -   ioport__write8(data, value);
>>> +   case I8042_DATA_REG:
>>> +   if (is_write)
>>> +   kbd_write_data(value);
>>> +   else
>>> +   value = kbd_read_data();
>>> break;
>>> -   }
>>> -   case I8042_PORT_B_REG: {
>>> -   ioport__write8(data, 0x20);
>>> +   case I8042_PORT_B_REG:
>>> +   if (!is_write)
>>> +   value = 0x20;
>>> break;
>>> -   }
>>> default:
>>> -   return false;
>>> +   return;  
>> Any particular reason for removing the false return value? I don't see it
>> mentioned in the commit message. Otherwise this is identical to the two 
>> functions
>> it replaces.
> Because the MMIO handler prototype is void, in contrast to the PIO one.
> Since on returning "false" we only seem to panic kvmtool, this is of
> quite limited use, I'd say.

Actually, in ioport.c::kvm__emulate_io(), if kvm->cfg.ioport_debug is true, it
will print an error and then panic in kvm_cpu__start(); otherwise the error is
silently ignored. serial.c is another device where an unknown register returns
false. In rtc.c, the unknown register is ignored. cfi_flash.c prints debugging
information. So I guess kvmtool implements all possible methods of handling an
unknown register *at the same time*, so it's up to you how you want to handle 
it.

>>> }
>>>  
>>> +   if (!is_write)
>>> +   ioport__write8(data, value);
>>> +}
>>> +
>>> +/*
>>> + * Called when the OS has written to one of the keyboard's ports (0x60 or 
>>> 0x64)
>>> + */
>>> +static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>>> void *data, int size)
>>> +{
>>> +   kbd_io(vcpu, port, data, size, false, NULL);  
>> is_write is an u8, not a bool.
> Right, will fix this.
>
>> I never could wrap my head around the ioport convention of "in" (read) and 
>> "out"
>> (write). To be honest, changing is_write changed to an enum so it's crystal 
>> clear
>> what is happening would help with that a lot, but I guess that's a separate 
>> patch.
> "in" and "out" are the x86 assembly mnemonics, but it's indeed
> confusing, because the device side has a different view (CPU "in" means
> pushing data "out" of the device). I usually look at the code to see
> what it's actually meant to do.
> So yeah, I feel like a lot of those device emulations could use
> some update. but that's indeed something for another day.

Agreed.

Thanks,

Alex

>
> Cheers,
> Andre
>
>>> +
>>> return true;
>>>  }
>>>  
>>>  static bool kbd_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>>> void *data, int size)
>>>  {
>>> -   switch (port) {
>>> -   case I8042_COMMAND_REG: {
>>> -   u8 value = ioport__read8(data);
>>> -   kbd_write_command(vcpu->kvm, value);
>>> -   break;
>>> -   }
>>> -   case I8042_DATA_REG: {
>>> -   u8 value = ioport__read8(data);
>>> -   kbd_write_data(value);
>>> -   break;
>>> -   }
>>> -   case I8042_PORT_B_REG: {
>>> -   break;
>>> -   }
>>> -   default:
>>> -   return false;
>>> -   }
>>> +   kbd_io(vcpu, port, data, size, true, NULL);
>>>  
>>> return true;
>>>  }  

Re: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Auger Eric
Hi Shameer,

On 2/18/21 11:36 AM, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>>> -Original Message-
>>> From: Eric Auger [mailto:eric.au...@redhat.com]
>>> Sent: 16 November 2020 11:00
>>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>>> io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>>> k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; w...@kernel.org;
>>> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
>>> alex.william...@redhat.com
>>> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
>>> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
>>> Thodi ;
>>> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
>>> nicoleots...@gmail.com; yuzenghui 
>>> Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
>>> region
>>>
>>> In preparation for vSVA, let's register a DMA fault response region,
>>> where the userspace will push the page responses and increment the
>>> head of the buffer. The kernel will pop those responses and inject
>>> them on iommu side.
>>>
>>> Signed-off-by: Eric Auger 
>>> ---
>>>  drivers/vfio/pci/vfio_pci.c | 114 +---
>>>  drivers/vfio/pci/vfio_pci_private.h |   5 ++
>>>  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
>>>  include/uapi/linux/vfio.h   |  32 
>>>  4 files changed, 181 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
>>> index 65a83fd0e8c0..e9a904ce3f0d 100644
>>> --- a/drivers/vfio/pci/vfio_pci.c
>>> +++ b/drivers/vfio/pci/vfio_pci.c
>>> @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
>>> vfio_pci_device *vdev,
>>> kfree(vdev->fault_pages);
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> -  struct vfio_pci_region *region,
>>> -  struct vm_area_struct *vma)
>>> +static void
>>> +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region) {
>>> +   if (vdev->dma_fault_response_wq)
>>> +   destroy_workqueue(vdev->dma_fault_response_wq);
>>> +   kfree(vdev->fault_response_pages);
>>> +   vdev->fault_response_pages = NULL;
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +struct vfio_pci_region *region,
>>> +struct vm_area_struct *vma,
>>> +u8 *pages)
>>>  {
>>> u64 phys_len, req_len, pgoff, req_start;
>>> unsigned long long addr;
>>> @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
>>> req_start = pgoff << PAGE_SHIFT;
>>>
>>> -   /* only the second page of the producer fault region is mmappable */
>>> +   /* only the second page of the fault region is mmappable */
>>> if (req_start < PAGE_SIZE)
>>> return -EINVAL;
>>>
>>> if (req_start + req_len > phys_len)
>>> return -EINVAL;
>>>
>>> -   addr = virt_to_phys(vdev->fault_pages);
>>> +   addr = virt_to_phys(pages);
>>> vma->vm_private_data = vdev;
>>> vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
>>>
>>> @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
>>> vfio_pci_device *vdev,
>>> return ret;
>>>  }
>>>
>>> -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
>>> -struct vfio_pci_region *region,
>>> -struct vfio_info_cap *caps)
>>> +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_pages);
>>> +}
>>> +
>>> +static int
>>> +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
>>> +   struct vfio_pci_region *region,
>>> +   struct vm_area_struct *vma)
>>> +{
>>> +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
>>> vdev->fault_response_pages);
>>> +}
>>> +
>>> +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
>>> *vdev,
>>> +  struct vfio_pci_region *region,
>>> +  struct vfio_info_cap *caps,
>>> +  u32 cap_id)
>>>  {
>>> struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
>>> struct vfio_region_info_cap_fault cap = {
>>> -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
>>> +   .header.id = cap_id,
>>> .header.version = 1,
>>> .version = 1,
>>> };
>>> @@ -383,6 +410,14 @@ static int
>>> vfio_pci_dma_fault_add_capability(struct
>>> 

RE: [PATCH v11 12/13] vfio/pci: Register a DMA fault response region

2021-02-18 Thread Shameerali Kolothum Thodi
Hi Eric,

> > -Original Message-
> > From: Eric Auger [mailto:eric.au...@redhat.com]
> > Sent: 16 November 2020 11:00
> > To: eric.auger@gmail.com; eric.au...@redhat.com;
> > io...@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> > k...@vger.kernel.org; kvmarm@lists.cs.columbia.edu; w...@kernel.org;
> > j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> > alex.william...@redhat.com
> > Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> > zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> > Thodi ;
> > jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> > nicoleots...@gmail.com; yuzenghui 
> > Subject: [PATCH v11 12/13] vfio/pci: Register a DMA fault response
> > region
> >
> > In preparation for vSVA, let's register a DMA fault response region,
> > where the userspace will push the page responses and increment the
> > head of the buffer. The kernel will pop those responses and inject
> > them on iommu side.
> >
> > Signed-off-by: Eric Auger 
> > ---
> >  drivers/vfio/pci/vfio_pci.c | 114 +---
> >  drivers/vfio/pci/vfio_pci_private.h |   5 ++
> >  drivers/vfio/pci/vfio_pci_rdwr.c|  39 ++
> >  include/uapi/linux/vfio.h   |  32 
> >  4 files changed, 181 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 65a83fd0e8c0..e9a904ce3f0d 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -318,9 +318,20 @@ static void vfio_pci_dma_fault_release(struct
> > vfio_pci_device *vdev,
> > kfree(vdev->fault_pages);
> >  }
> >
> > -static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > -  struct vfio_pci_region *region,
> > -  struct vm_area_struct *vma)
> > +static void
> > +vfio_pci_dma_fault_response_release(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region) {
> > +   if (vdev->dma_fault_response_wq)
> > +   destroy_workqueue(vdev->dma_fault_response_wq);
> > +   kfree(vdev->fault_response_pages);
> > +   vdev->fault_response_pages = NULL;
> > +}
> > +
> > +static int __vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +struct vfio_pci_region *region,
> > +struct vm_area_struct *vma,
> > +u8 *pages)
> >  {
> > u64 phys_len, req_len, pgoff, req_start;
> > unsigned long long addr;
> > @@ -333,14 +344,14 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
> > req_start = pgoff << PAGE_SHIFT;
> >
> > -   /* only the second page of the producer fault region is mmappable */
> > +   /* only the second page of the fault region is mmappable */
> > if (req_start < PAGE_SIZE)
> > return -EINVAL;
> >
> > if (req_start + req_len > phys_len)
> > return -EINVAL;
> >
> > -   addr = virt_to_phys(vdev->fault_pages);
> > +   addr = virt_to_phys(pages);
> > vma->vm_private_data = vdev;
> > vma->vm_pgoff = (addr >> PAGE_SHIFT) + pgoff;
> >
> > @@ -349,13 +360,29 @@ static int vfio_pci_dma_fault_mmap(struct
> > vfio_pci_device *vdev,
> > return ret;
> >  }
> >
> > -static int vfio_pci_dma_fault_add_capability(struct vfio_pci_device *vdev,
> > -struct vfio_pci_region *region,
> > -struct vfio_info_cap *caps)
> > +static int vfio_pci_dma_fault_mmap(struct vfio_pci_device *vdev,
> > +  struct vfio_pci_region *region,
> > +  struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_pages);
> > +}
> > +
> > +static int
> > +vfio_pci_dma_fault_response_mmap(struct vfio_pci_device *vdev,
> > +   struct vfio_pci_region *region,
> > +   struct vm_area_struct *vma)
> > +{
> > +   return __vfio_pci_dma_fault_mmap(vdev, region, vma,
> > vdev->fault_response_pages);
> > +}
> > +
> > +static int __vfio_pci_dma_fault_add_capability(struct vfio_pci_device 
> > *vdev,
> > +  struct vfio_pci_region *region,
> > +  struct vfio_info_cap *caps,
> > +  u32 cap_id)
> >  {
> > struct vfio_region_info_cap_sparse_mmap *sparse = NULL;
> > struct vfio_region_info_cap_fault cap = {
> > -   .header.id = VFIO_REGION_INFO_CAP_DMA_FAULT,
> > +   .header.id = cap_id,
> > .header.version = 1,
> > .version = 1,
> > };
> > @@ -383,6 +410,14 @@ static int
> > vfio_pci_dma_fault_add_capability(struct
> > vfio_pci_device *vdev,
> > return ret;
> >  }
> >
> > +static int 

Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Auger Eric
Hi Keqian,

On 2/18/21 9:43 AM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/12 16:55, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/18 19:21, Eric Auger wrote:
 On ARM, MSI are translated by the SMMU. An IOVA is allocated
 for each MSI doorbell. If both the host and the guest are exposed
 with SMMUs, we end up with 2 different IOVAs allocated by each.
 guest allocates an IOVA (gIOVA) to map onto the guest MSI
 doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
 onto the physical doorbell (hDB).

 So we end up with 2 untied mappings:
  S1S2
 gIOVA->gDB
   hIOVA->hDB

 Currently the PCI device is programmed by the host with hIOVA
 as MSI doorbell. So this does not work.

 This patch introduces an API to pass gIOVA/gDB to the host so
 that gIOVA can be reused by the host instead of re-allocating
 a new IOVA. So the goal is to create the following nested mapping:
>>> Does the gDB can be reused under non-nested mode?
>>
>> Under non nested mode the hIOVA is allocated within the MSI reserved
>> region exposed by the SMMU driver, [0x800, 80f]. see
>> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
>> is programmed in the physical device so that the physical SMMU
>> translates it into the physical doorbell (hDB = host physical ITS
> So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of 
> non-virtualization scenario.
Without virtualization, the host kernel also transparently allocates an
iova to map the doorbell. With standard passthrough withou vIOMMU, the
iova window is different (MSI RESV region).

Thanks

Eric
> 
>> doorbell). The gDB is not used at pIOMMU programming level. It is only
>> used when setting up the KVM irq route.
>>
>> Hope this answers your question.
> Thanks for your explanation!
>>
> 
> Thanks,
> Keqian
> 
>>>

  S1S2
 gIOVA->gDB ->hDB

 and program the PCI device with gIOVA MSI doorbell.

 In case we have several devices attached to this nested domain
 (devices belonging to the same group), they cannot be isolated
 on guest side either. So they should also end up in the same domain
 on guest side. We will enforce that all the devices attached to
 the host iommu domain use the same physical doorbell and similarly
 a single virtual doorbell mapping gets registered (1 single
 virtual doorbell is used on guest as well).

>>> [...]
>>>
 + *
 + * The associated IOVA can be reused by the host to create a nested
 + * stage2 binding mapping translating into the physical doorbell used
 + * by the devices attached to the domain.
 + *
 + * All devices within the domain must share the same physical doorbell.
 + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
 + */
 +
 +int iommu_bind_guest_msi(struct iommu_domain *domain,
 +   dma_addr_t giova, phys_addr_t gpa, size_t size)
 +{
 +  if (unlikely(!domain->ops->bind_guest_msi))
 +  return -ENODEV;
 +
 +  return domain->ops->bind_guest_msi(domain, giova, gpa, size);
 +}
 +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
 +
 +void iommu_unbind_guest_msi(struct iommu_domain *domain,
 +  dma_addr_t iova)
>>> nit: s/iova/giova
>> sure
>>>
 +{
 +  if (unlikely(!domain->ops->unbind_guest_msi))
 +  return;
 +
 +  domain->ops->unbind_guest_msi(domain, iova);
 +}
 +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
 +
>>> [...]
>>>
>>> Thanks,
>>> Keqian
>>>
>>
>> Thanks
>>
>> Eric
>>
>> .
>>
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 06/21] hw/i8042: Refactor trap handler

2021-02-18 Thread Andre Przywara
On Thu, 11 Feb 2021 17:23:13 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 12/10/20 2:28 PM, Andre Przywara wrote:
> > With the planned retirement of the special ioport emulation code, we
> > need to provide an emulation function compatible with the MMIO
> > prototype.
> >
> > Adjust the trap handler to use that new function, and provide shims to
> > implement the old ioport interface, for now.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  hw/i8042.c | 68 +++---
> >  1 file changed, 34 insertions(+), 34 deletions(-)
> >
> > diff --git a/hw/i8042.c b/hw/i8042.c
> > index 36ee183f..eb1f9d28 100644
> > --- a/hw/i8042.c
> > +++ b/hw/i8042.c
> > @@ -292,52 +292,52 @@ static void kbd_reset(void)
> > };
> >  }
> >  
> > -/*
> > - * Called when the OS has written to one of the keyboard's ports (0x60 or 
> > 0x64)
> > - */
> > -static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size)
> > +static void kbd_io(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len,
> > +  u8 is_write, void *ptr)
> >  {
> > -   switch (port) {
> > -   case I8042_COMMAND_REG: {
> > -   u8 value = kbd_read_status();
> > -   ioport__write8(data, value);
> > +   u8 value;
> > +
> > +   if (is_write)
> > +   value = ioport__read8(data);
> > +
> > +   switch (addr) {
> > +   case I8042_COMMAND_REG:
> > +   if (is_write)
> > +   kbd_write_command(vcpu->kvm, value);
> > +   else
> > +   value = kbd_read_status();
> > break;
> > -   }
> > -   case I8042_DATA_REG: {
> > -   u8 value = kbd_read_data();
> > -   ioport__write8(data, value);
> > +   case I8042_DATA_REG:
> > +   if (is_write)
> > +   kbd_write_data(value);
> > +   else
> > +   value = kbd_read_data();
> > break;
> > -   }
> > -   case I8042_PORT_B_REG: {
> > -   ioport__write8(data, 0x20);
> > +   case I8042_PORT_B_REG:
> > +   if (!is_write)
> > +   value = 0x20;
> > break;
> > -   }
> > default:
> > -   return false;
> > +   return;  
> 
> Any particular reason for removing the false return value? I don't see it
> mentioned in the commit message. Otherwise this is identical to the two 
> functions
> it replaces.

Because the MMIO handler prototype is void, in contrast to the PIO one.
Since on returning "false" we only seem to panic kvmtool, this is of
quite limited use, I'd say.

> > }
> >  
> > +   if (!is_write)
> > +   ioport__write8(data, value);
> > +}
> > +
> > +/*
> > + * Called when the OS has written to one of the keyboard's ports (0x60 or 
> > 0x64)
> > + */
> > +static bool kbd_in(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size)
> > +{
> > +   kbd_io(vcpu, port, data, size, false, NULL);  
> 
> is_write is an u8, not a bool.

Right, will fix this.

> I never could wrap my head around the ioport convention of "in" (read) and 
> "out"
> (write). To be honest, changing is_write changed to an enum so it's crystal 
> clear
> what is happening would help with that a lot, but I guess that's a separate 
> patch.

"in" and "out" are the x86 assembly mnemonics, but it's indeed
confusing, because the device side has a different view (CPU "in" means
pushing data "out" of the device). I usually look at the code to see
what it's actually meant to do.
So yeah, I feel like a lot of those device emulations could use
some update. but that's indeed something for another day.

Cheers,
Andre

> > +
> > return true;
> >  }
> >  
> >  static bool kbd_out(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size)
> >  {
> > -   switch (port) {
> > -   case I8042_COMMAND_REG: {
> > -   u8 value = ioport__read8(data);
> > -   kbd_write_command(vcpu->kvm, value);
> > -   break;
> > -   }
> > -   case I8042_DATA_REG: {
> > -   u8 value = ioport__read8(data);
> > -   kbd_write_data(value);
> > -   break;
> > -   }
> > -   case I8042_PORT_B_REG: {
> > -   break;
> > -   }
> > -   default:
> > -   return false;
> > -   }
> > +   kbd_io(vcpu, port, data, size, true, NULL);
> >  
> > return true;
> >  }  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v13 02/15] iommu: Introduce bind/unbind_guest_msi

2021-02-18 Thread Keqian Zhu
Hi Eric,

On 2021/2/12 16:55, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/1/21 12:52 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/18 19:21, Eric Auger wrote:
>>> On ARM, MSI are translated by the SMMU. An IOVA is allocated
>>> for each MSI doorbell. If both the host and the guest are exposed
>>> with SMMUs, we end up with 2 different IOVAs allocated by each.
>>> guest allocates an IOVA (gIOVA) to map onto the guest MSI
>>> doorbell (gDB). The Host allocates another IOVA (hIOVA) to map
>>> onto the physical doorbell (hDB).
>>>
>>> So we end up with 2 untied mappings:
>>>  S1S2
>>> gIOVA->gDB
>>>   hIOVA->hDB
>>>
>>> Currently the PCI device is programmed by the host with hIOVA
>>> as MSI doorbell. So this does not work.
>>>
>>> This patch introduces an API to pass gIOVA/gDB to the host so
>>> that gIOVA can be reused by the host instead of re-allocating
>>> a new IOVA. So the goal is to create the following nested mapping:
>> Does the gDB can be reused under non-nested mode?
> 
> Under non nested mode the hIOVA is allocated within the MSI reserved
> region exposed by the SMMU driver, [0x800, 80f]. see
> iommu_dma_prepare_msi/iommu_dma_get_msi_page in dma_iommu.c. this hIOVA
> is programmed in the physical device so that the physical SMMU
> translates it into the physical doorbell (hDB = host physical ITS
So, AFAIU, under non-nested mode, at smmu side, we reuse the workflow of 
non-virtualization scenario.

> doorbell). The gDB is not used at pIOMMU programming level. It is only
> used when setting up the KVM irq route.
> 
> Hope this answers your question.
Thanks for your explanation!
> 

Thanks,
Keqian

>>
>>>
>>>  S1S2
>>> gIOVA->gDB ->hDB
>>>
>>> and program the PCI device with gIOVA MSI doorbell.
>>>
>>> In case we have several devices attached to this nested domain
>>> (devices belonging to the same group), they cannot be isolated
>>> on guest side either. So they should also end up in the same domain
>>> on guest side. We will enforce that all the devices attached to
>>> the host iommu domain use the same physical doorbell and similarly
>>> a single virtual doorbell mapping gets registered (1 single
>>> virtual doorbell is used on guest as well).
>>>
>> [...]
>>
>>> + *
>>> + * The associated IOVA can be reused by the host to create a nested
>>> + * stage2 binding mapping translating into the physical doorbell used
>>> + * by the devices attached to the domain.
>>> + *
>>> + * All devices within the domain must share the same physical doorbell.
>>> + * A single MSI GIOVA/GPA mapping can be attached to an iommu_domain.
>>> + */
>>> +
>>> +int iommu_bind_guest_msi(struct iommu_domain *domain,
>>> +dma_addr_t giova, phys_addr_t gpa, size_t size)
>>> +{
>>> +   if (unlikely(!domain->ops->bind_guest_msi))
>>> +   return -ENODEV;
>>> +
>>> +   return domain->ops->bind_guest_msi(domain, giova, gpa, size);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_bind_guest_msi);
>>> +
>>> +void iommu_unbind_guest_msi(struct iommu_domain *domain,
>>> +   dma_addr_t iova)
>> nit: s/iova/giova
> sure
>>
>>> +{
>>> +   if (unlikely(!domain->ops->unbind_guest_msi))
>>> +   return;
>>> +
>>> +   domain->ops->unbind_guest_msi(domain, iova);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unbind_guest_msi);
>>> +
>> [...]
>>
>> Thanks,
>> Keqian
>>
> 
> Thanks
> 
> Eric
> 
> .
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm