Re: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-09 Thread Wei Liu
On Wed, Apr 08, 2020 at 01:58:59PM +0200, Christoph Hellwig wrote:
> Use the designated helper for allocating executable kernel memory, and
> remove the now unused PAGE_KERNEL_RX define.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Wei Liu 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc

2020-04-14 Thread Wei Liu
On Tue, Apr 14, 2020 at 03:13:40PM +0200, Christoph Hellwig wrote:
> The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove
> it.
> 
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Michael Kelley  [hyperv]
> Acked-by: Gao Xiang  [erofs]
> Acked-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/hyperv/hv_init.c  |  3 +--
[...]
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 5a4b363ba67b..a3d689dfc745 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -95,8 +95,7 @@ static int hv_cpu_init(unsigned int cpu)
>* not be stopped in the case of CPU offlining and the VM will hang.
>*/
>   if (!*hvp) {
> - *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
> -  PAGE_KERNEL);
> + *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
>   }

Acked-by: Wei Liu 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/2] iommu/hyper-v: Constify hyperv_ir_domain_ops

2020-05-26 Thread Wei Liu
On Mon, May 25, 2020 at 11:49:57PM +0200, Rikard Falkeborn wrote:
> The struct hyperv_ir_domain_ops is not modified and can be made const to
> allow the compiler to put it in read-only memory.
> 
> Before:
>textdata bss dec hex filename
>29161180112052161460 drivers/iommu/hyperv-iommu.o
> 
> After:
>textdata bss dec hex filename
>30441052112052161460 drivers/iommu/hyperv-iommu.o
> 
> Signed-off-by: Rikard Falkeborn 

Acked-by: Wei Liu 

> ---
>  drivers/iommu/hyperv-iommu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index a386b83e0e34..3c0c67a99c7b 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -131,7 +131,7 @@ static int hyperv_irq_remapping_activate(struct 
> irq_domain *domain,
>   return 0;
>  }
>  
> -static struct irq_domain_ops hyperv_ir_domain_ops = {
> +static const struct irq_domain_ops hyperv_ir_domain_ops = {
>   .alloc = hyperv_irq_remapping_alloc,
>   .free = hyperv_irq_remapping_free,
>   .activate = hyperv_irq_remapping_activate,
> -- 
> 2.26.2
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 14/46] x86/ioapic: Consolidate IOAPIC allocation

2020-09-08 Thread Wei Liu
On Wed, Aug 26, 2020 at 01:16:42PM +0200, Thomas Gleixner wrote:
...
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -101,7 +101,7 @@ static int hyperv_irq_remapping_alloc(st
>* in the chip_data and hyperv_irq_remapping_activate()/hyperv_ir_set_
>* affinity() set vector and dest_apicid directly into IO-APIC entry.
>*/
> - irq_data->chip_data = info->ioapic_entry;
> + irq_data->chip_data = info->ioapic.entry;

Not sure if it is required for such a trivial change but here you go:

Acked-by: Wei Liu 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [patch V2 18/46] x86/msi: Consolidate MSI allocation

2020-09-08 Thread Wei Liu
On Wed, Aug 26, 2020 at 01:16:46PM +0200, Thomas Gleixner wrote:
[...]
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip =
>  static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info 
> *info,
>  msi_alloc_info_t *arg)
>  {
> - return arg->msi_hwirq;
> + return arg->hwirq;
>  }

Acked-by: Wei Liu 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 0/2] Fix coherence for VMbus and PCI pass-thru devices in Hyper-V VM

2022-03-29 Thread Wei Liu
On Thu, Mar 24, 2022 at 09:14:50AM -0700, Michael Kelley wrote:
> Hyper-V VMs have VMbus synthetic devices and PCI pass-thru devices that are 
> added
> dynamically via the VMbus protocol and are not represented in the ACPI DSDT. 
> Only
> the top level VMbus node exists in the DSDT. As such, on ARM64 these devices 
> don't
> pick up coherence information and default to not hardware coherent.  This 
> results
> in extra software coherence management overhead since the synthetic devices 
> are
> always hardware coherent. PCI pass-thru devices are also hardware coherent in 
> all
> current usage scenarios.
> 
> Fix this by propagating coherence information from the top level VMbus node in
> the DSDT to all VMbus synthetic devices and PCI pass-thru devices. While 
> smaller
> granularity of control would be better, basing on the VMbus node in the DSDT
> gives as escape path if a future scenario arises with devices that are not
> hardware coherent.
> 
> Changes since v2:
> * Move coherence propagation for VMbus synthetic devices to a separate
>   .dma_configure function instead of the .probe fucntion [Robin Murphy]
> 
> Changes since v1:
> * Use device_get_dma_attr() instead of acpi_get_dma_attr(), eliminating the
>   need to export acpi_get_dma_attr() [Robin Murphy]
> * Use arch_setup_dma_ops() to set device coherence [Robin Murphy]
> * Move handling of missing _CCA to vmbus_acpi_add() so it is only done once
> * Rework handling of PCI devices so existing code in pci_dma_configure()
>   just works
> 
> Michael Kelley (2):
>   Drivers: hv: vmbus: Propagate VMbus coherence to each VMbus device
>   PCI: hv: Propagate coherence from VMbus device to PCI device

Patch 2 will not be very useful without patch 1 so I've applied the
whole series to hyperv-fixes. Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-03 Thread Wei Liu
On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > 
> > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > domain to handle mapping and unmapping of IO-APIC interrupts.
> > 
> > Signed-off-by: Wei Liu 
> > ---
> >  arch/x86/hyperv/irqdomain.c |  54 ++
> >  arch/x86/include/asm/mshyperv.h |   4 +
> >  drivers/iommu/hyperv-iommu.c| 179 +++-
> >  3 files changed, 233 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > index 19637cd60231..8e2b4e478b70 100644
> > --- a/arch/x86/hyperv/irqdomain.c
> > +++ b/arch/x86/hyperv/irqdomain.c
> > @@ -330,3 +330,57 @@ struct irq_domain * __init 
> > hv_create_pci_msi_domain(void)
> >  }
> > 
> >  #endif /* CONFIG_PCI_MSI */
> > +
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > *entry)
> > +{
> > +   union hv_device_id device_id;
> > +
> > +   device_id.as_uint64 = 0;
> > +   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +   device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +   return hv_unmap_interrupt(device_id.as_uint64, entry) & 
> > HV_HYPERCALL_RESULT_MASK;
> 
> The masking is already done in hv_unmap_interrupt.

Fixed.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > +
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > vector,
> > +   struct hv_interrupt_entry *entry)
> > +{
> > +   unsigned long flags;
> > +   struct hv_input_map_device_interrupt *input;
> > +   struct hv_output_map_device_interrupt *output;
> > +   union hv_device_id device_id;
> > +   struct hv_device_interrupt_descriptor *intr_desc;
> > +   u16 status;
> > +
> > +   device_id.as_uint64 = 0;
> > +   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > +   device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > +
> > +   local_irq_save(flags);
> > +   input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +   output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > +   memset(input, 0, sizeof(*input));
> > +   intr_desc = &input->interrupt_descriptor;
> > +   input->partition_id = hv_current_partition_id;
> > +   input->device_id = device_id.as_uint64;
> > +   intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > +   intr_desc->target.vector = vector;
> > +   intr_desc->vector_count = 1;
> > +
> > +   if (level)
> > +   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > +   else
> > +   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > +
> > +   __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > +
> > +   status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, 
> > output) &
> > +HV_HYPERCALL_RESULT_MASK;
> > +   local_irq_restore(flags);
> > +
> > +   *entry = output->interrupt_entry;
> > +
> > +   return status;
> 
> As a cross-check, I was comparing this code against hv_map_msi_interrupt().  
> They are
> mostly parallel, though some of the assignments are done in a different 
> order.  It's a nit,
> but making them as parallel as possible would be nice. :-)
> 

Indeed. I will see about factoring out a function.

> Same 64 vCPU comment applies here as well.
> 

This is changed to use vpset instead. Took me a bit of time to get it
working because document is a bit lacking.

> 
> > +}
> > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > diff --git a/arch/x86/include/asm/mshyperv.h 
> > b/arch/x86/include/asm/mshyperv.h
> > index ccc849e25d5e..345d7c6f8c37 100644
> > --- a/arch/x86/include/asm/mshyperv.h
> > +++ b/arch/x86/include/asm/mshyperv.h
> > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > hv_msi_entry *msi_entry,
> > 
> >  struct irq_domain *hv_create_pci_msi_domain(void);
> > 
> > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > vector,
> > +   struct hv_interrupt_entry *entry);
> > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > *entry);
> > +
> >  #else /* CONFIG_HYPERV */
> >  static inline void hyperv_init(void) {}
> >  static inline void hyperv_setup_mmu_ops(void) {}
> > diff --

[PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-03 Thread Wei Liu
Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.

Signed-off-by: Wei Liu 
---
v6:
1. Simplify code due to changes in a previous patch.
---
 arch/x86/hyperv/irqdomain.c |  25 +
 arch/x86/include/asm/mshyperv.h |   4 +
 drivers/iommu/hyperv-iommu.c| 177 +++-
 3 files changed, 203 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 117f17e8c88a..0cabc9aece38 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -360,3 +360,28 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
 }
 
 #endif /* CONFIG_PCI_MSI */
+
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
+{
+   union hv_device_id device_id;
+
+   device_id.as_uint64 = 0;
+   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+   device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+   return hv_unmap_interrupt(device_id.as_uint64, entry);
+}
+EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
+
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int cpu, int vector,
+   struct hv_interrupt_entry *entry)
+{
+   union hv_device_id device_id;
+
+   device_id.as_uint64 = 0;
+   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+   device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+   return hv_map_interrupt(device_id, level, cpu, vector, entry);
+}
+EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccc849e25d5e..345d7c6f8c37 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union 
hv_msi_entry *msi_entry,
 
 struct irq_domain *hv_create_pci_msi_domain(void);
 
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+   struct hv_interrupt_entry *entry);
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..e285a220c913 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -115,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = 
{
.free = hyperv_irq_remapping_free,
 };
 
+static const struct irq_domain_ops hyperv_root_ir_domain_ops;
 static int __init hyperv_prepare_irq_remapping(void)
 {
struct fwnode_handle *fn;
int i;
+   const char *name;
+   const struct irq_domain_ops *ops;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
x86_init.hyper.msi_ext_dest_id() ||
!x2apic_supported())
return -ENODEV;
 
-   fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+   if (hv_root_partition) {
+   name = "HYPERV-ROOT-IR";
+   ops = &hyperv_root_ir_domain_ops;
+   } else {
+   name = "HYPERV-IR";
+   ops = &hyperv_ir_domain_ops;
+   }
+
+   fn = irq_domain_alloc_named_id_fwnode(name, 0);
if (!fn)
return -ENOMEM;
 
ioapic_ir_domain =
irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
-   0, IOAPIC_REMAPPING_ENTRY, fn,
-   &hyperv_ir_domain_ops, NULL);
+   0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
 
if (!ioapic_ir_domain) {
irq_domain_free_fwnode(fn);
return -ENOMEM;
}
 
+   if (hv_root_partition)
+   return 0; /* The rest is only relevant to guests */
+
/*
 * Hyper-V doesn't provide irq remapping function for
 * IO-APIC and so IO-APIC only accepts 8-bit APIC ID.
@@ -166,4 +180,161 @@ struct irq_remap_ops hyperv_irq_remap_ops = {
.enable = hyperv_enable_irq_remapping,
 };
 
+/* IRQ remapping domain when Linux runs as the root partition */
+struct hyperv_root_ir_data {
+   u8 ioapic_id;
+   bool is_level;
+   struct hv_interrupt_entry entry;
+};
+
+static void
+hyperv_root_ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
+{
+   u64 status;
+   u32 vector;
+   struct irq_cfg *cfg;
+   int ioapic_id;
+   struct cpumask *affinity;
+   int cpu;
+   struct hv_interrupt_entry entry;
+   struct hyperv_root_ir_data *data = irq_data->chip_data;
+   struct IO_APIC_route_entry e;
+
+   cfg = irqd_cfg(irq_data);
+   affinity =

Re: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Wei Liu
On Thu, Feb 04, 2021 at 04:41:47PM +, Michael Kelley wrote:
> From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
[...]
> > > > +
> > > > +   if (level)
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > > +   else
> > > > +   intr_desc->trigger_mode = 
> > > > HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > > +
> > > > +   __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > > +
> > > > +   status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, 
> > > > input,
> > output) &
> > > > +HV_HYPERCALL_RESULT_MASK;
> > > > +   local_irq_restore(flags);
> > > > +
> > > > +   *entry = output->interrupt_entry;
> > > > +
> > > > +   return status;
> > >
> > > As a cross-check, I was comparing this code against 
> > > hv_map_msi_interrupt().  They are
> > > mostly parallel, though some of the assignments are done in a different 
> > > order.  It's a nit,
> > > but making them as parallel as possible would be nice. :-)
> > >
> > 
> > Indeed. I will see about factoring out a function.
> 
> If factoring out a separate helper function is clumsy, just having the 
> parallel code
> in the two functions be as similar as possible makes it easier to see what's 
> the
> same and what's different.
> 

No. It is not clumsy at all. I've done it in the newly posted v6.

I was baffled why I wrote hv_unmap_interrupt helper to be used by both
hv_unmap_ioapic_interrupt and hv_unmap_msi_interrupt in the previous
patch, but didn't write a hv_map_interrupt. Maybe I didn't have enough
coffee that day. :-/

Thanks for pointing out that issue. It definitely helped improve the
quality of this series.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC v1 3/8] intel/vt-d: make DMAR table parsing code more flexible

2021-07-09 Thread Wei Liu
Microsoft Hypervisor provides a set of hypercalls to manage device
domains. The root kernel should parse the DMAR so that it can program
the IOMMU (with hypercalls) correctly.

The DMAR code was designed to work with Intel IOMMU only. Add two more
parameters to make it useful to Microsoft Hypervisor. Microsoft
Hypervisor does not need the DMAR parsing code to allocate an Intel
IOMMU structure; it also wishes to always reparse the DMAR table even
after it has been parsed before.

Adjust Intel IOMMU code to use the new dmar_table_init. There should be
no functional change to Intel IOMMU code.

Signed-off-by: Wei Liu 
---
We may be able to combine alloc and force_parse?
---
 drivers/iommu/intel/dmar.c  | 38 -
 drivers/iommu/intel/iommu.c |  2 +-
 drivers/iommu/intel/irq_remapping.c |  2 +-
 include/linux/dmar.h|  2 +-
 4 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 84057cb9596c..bd72f47c728b 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -408,7 +408,8 @@ dmar_find_dmaru(struct acpi_dmar_hardware_unit *drhd)
  * structure which uniquely represent one DMA remapping hardware unit
  * present in the platform
  */
-static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
+static int dmar_parse_one_drhd_internal(struct acpi_dmar_header *header,
+   void *arg, bool alloc)
 {
struct acpi_dmar_hardware_unit *drhd;
struct dmar_drhd_unit *dmaru;
@@ -440,12 +441,14 @@ static int dmar_parse_one_drhd(struct acpi_dmar_header 
*header, void *arg)
return -ENOMEM;
}
 
-   ret = alloc_iommu(dmaru);
-   if (ret) {
-   dmar_free_dev_scope(&dmaru->devices,
-   &dmaru->devices_cnt);
-   kfree(dmaru);
-   return ret;
+   if (alloc) {
+   ret = alloc_iommu(dmaru);
+   if (ret) {
+   dmar_free_dev_scope(&dmaru->devices,
+   &dmaru->devices_cnt);
+   kfree(dmaru);
+   return ret;
+   }
}
dmar_register_drhd_unit(dmaru);
 
@@ -456,6 +459,16 @@ static int dmar_parse_one_drhd(struct acpi_dmar_header 
*header, void *arg)
return 0;
 }
 
+static int dmar_parse_one_drhd(struct acpi_dmar_header *header, void *arg)
+{
+   return dmar_parse_one_drhd_internal(header, arg, true);
+}
+
+int dmar_parse_one_drhd_noalloc(struct acpi_dmar_header *header, void *arg)
+{
+   return dmar_parse_one_drhd_internal(header, arg, false);
+}
+
 static void dmar_free_drhd(struct dmar_drhd_unit *dmaru)
 {
if (dmaru->devices && dmaru->devices_cnt)
@@ -633,7 +646,7 @@ static inline int dmar_walk_dmar_table(struct 
acpi_table_dmar *dmar,
  * parse_dmar_table - parses the DMA reporting table
  */
 static int __init
-parse_dmar_table(void)
+parse_dmar_table(bool alloc)
 {
struct acpi_table_dmar *dmar;
int drhd_count = 0;
@@ -650,6 +663,9 @@ parse_dmar_table(void)
.cb[ACPI_DMAR_TYPE_SATC] = &dmar_parse_one_satc,
};
 
+   if (!alloc)
+   cb.cb[ACPI_DMAR_TYPE_HARDWARE_UNIT] = 
&dmar_parse_one_drhd_noalloc;
+
/*
 * Do it again, earlier dmar_tbl mapping could be mapped with
 * fixed map.
@@ -840,13 +856,13 @@ void __init dmar_register_bus_notifier(void)
 }
 
 
-int __init dmar_table_init(void)
+int __init dmar_table_init(bool alloc, bool force_parse)
 {
static int dmar_table_initialized;
int ret;
 
-   if (dmar_table_initialized == 0) {
-   ret = parse_dmar_table();
+   if (dmar_table_initialized == 0 || force_parse) {
+   ret = parse_dmar_table(alloc);
if (ret < 0) {
if (ret != -ENODEV)
pr_info("Parse DMAR table failure.\n");
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index be35284a2016..a4294d310b93 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4310,7 +4310,7 @@ int __init intel_iommu_init(void)
}
 
down_write(&dmar_global_lock);
-   if (dmar_table_init()) {
+   if (dmar_table_init(true, false)) {
if (force_on)
panic("tboot: Failed to initialize DMAR table\n");
goto out_free_dmar;
diff --git a/drivers/iommu/intel/irq_remapping.c 
b/drivers/iommu/intel/irq_remapping.c
index f912fe45bea2..0e8abef862e4 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -732,7 +732,7 @@ static int __init intel_prepare_irq_remapping(void)
return -ENODEV;
}
 
-   if (dmar_table_init() < 0)
+   if (dmar_table_init

[RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-07-09 Thread Wei Liu
Some devices may have been claimed by the hypervisor already. One such
example is a user can assign a NIC for debugging purpose.

Ideally Linux should be able to tell retrieve that information, but
there is no way to do that yet. And designing that new mechanism is
going to take time.

Provide a command line option for skipping devices. This is a stopgap
solution, so it is intentionally undocumented. Hopefully we can retire
it in the future.

Signed-off-by: Wei Liu 
---
 drivers/iommu/hyperv-iommu.c | 45 
 1 file changed, 45 insertions(+)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 043dcff06511..353da5036387 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -349,6 +349,16 @@ static const struct irq_domain_ops 
hyperv_root_ir_domain_ops = {
 
 #ifdef CONFIG_HYPERV_ROOT_PVIOMMU
 
+/* The IOMMU will not claim these PCI devices. */
+static char *pci_devs_to_skip;
+static int __init mshv_iommu_setup_skip(char *str) {
+   pci_devs_to_skip = str;
+
+   return 0;
+}
+/* mshv_iommu_skip=(:BB:DD.F)(:BB:DD.F) */
+__setup("mshv_iommu_skip=", mshv_iommu_setup_skip);
+
 /* DMA remapping support */
 struct hv_iommu_domain {
struct iommu_domain domain;
@@ -774,6 +784,41 @@ static struct iommu_device *hv_iommu_probe_device(struct 
device *dev)
if (!dev_is_pci(dev))
return ERR_PTR(-ENODEV);
 
+   /*
+* Skip the PCI device specified in `pci_devs_to_skip`. This is a
+* temporary solution until we figure out a way to extract information
+* from the hypervisor what devices it is already using.
+*/
+   if (pci_devs_to_skip && *pci_devs_to_skip) {
+   int pos = 0;
+   int parsed;
+   int segment, bus, slot, func;
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   do {
+   parsed = 0;
+
+   sscanf(pci_devs_to_skip + pos,
+   " (%x:%x:%x.%x) %n",
+   &segment, &bus, &slot, &func, &parsed);
+
+   if (parsed <= 0)
+   break;
+
+   if (pci_domain_nr(pdev->bus) == segment &&
+   pdev->bus->number == bus &&
+   PCI_SLOT(pdev->devfn) == slot &&
+   PCI_FUNC(pdev->devfn) == func)
+   {
+   dev_info(dev, "skipped by MSHV IOMMU\n");
+   return ERR_PTR(-ENODEV);
+   }
+
+   pos += parsed;
+
+   } while (pci_devs_to_skip[pos]);
+   }
+
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev)
return ERR_PTR(-ENOMEM);
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC v1 4/8] intel/vt-d: export intel_iommu_get_resv_regions

2021-07-09 Thread Wei Liu
When Microsoft Hypervisor runs on Intel platforms it needs to know the
reserved regions to program devices correctly. There is no reason to
duplicate intel_iommu_get_resv_regions. Export it.

Signed-off-by: Wei Liu 
---
 drivers/iommu/intel/iommu.c | 5 +++--
 include/linux/intel-iommu.h | 4 
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a4294d310b93..01973bc20080 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5176,8 +5176,8 @@ static void intel_iommu_probe_finalize(struct device *dev)
set_dma_ops(dev, NULL);
 }
 
-static void intel_iommu_get_resv_regions(struct device *device,
-struct list_head *head)
+void intel_iommu_get_resv_regions(struct device *device,
+struct list_head *head)
 {
int prot = DMA_PTE_READ | DMA_PTE_WRITE;
struct iommu_resv_region *reg;
@@ -5232,6 +5232,7 @@ static void intel_iommu_get_resv_regions(struct device 
*device,
return;
list_add_tail(®->list, head);
 }
+EXPORT_SYMBOL_GPL(intel_iommu_get_resv_regions);
 
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev)
 {
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 03faf20a6817..f91869f765bc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -814,6 +814,8 @@ extern int iommu_calculate_max_sagaw(struct intel_iommu 
*iommu);
 extern int dmar_disabled;
 extern int intel_iommu_enabled;
 extern int intel_iommu_gfx_mapped;
+extern void intel_iommu_get_resv_regions(struct device *device,
+struct list_head *head);
 #else
 static inline int iommu_calculate_agaw(struct intel_iommu *iommu)
 {
@@ -825,6 +827,8 @@ static inline int iommu_calculate_max_sagaw(struct 
intel_iommu *iommu)
 }
 #define dmar_disabled  (1)
 #define intel_iommu_enabled (0)
+static inline void intel_iommu_get_resv_regions(struct device *device,
+struct list_head *head) {}
 #endif
 
 #endif
-- 
2.30.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[RFC v1 5/8] mshv: add paravirtualized IOMMU support

2021-07-09 Thread Wei Liu
Microsoft Hypervisor provides a set of hypercalls to manage device
domains. Implement a type-1 IOMMU using those hypercalls.

Implement DMA remapping as the first step for this driver. Interrupt
remapping will come in a later stage.

Signed-off-by: Wei Liu 
---
 drivers/iommu/Kconfig|  14 +
 drivers/iommu/hyperv-iommu.c | 628 +++
 2 files changed, 642 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1f111b399bca..e230c4536825 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -397,6 +397,20 @@ config HYPERV_IOMMU
  Stub IOMMU driver to handle IRQs as to allow Hyper-V Linux
  guests to run with x2APIC mode enabled.
 
+config HYPERV_ROOT_PVIOMMU
+   bool "Microsoft Hypervisor paravirtualized IOMMU support"
+   depends on HYPERV && X86
+   select IOMMU_API
+   select IOMMU_DMA
+   select DMA_OPS
+   select IOMMU_IOVA
+   select INTEL_IOMMU
+   select DMAR_TABLE
+   default HYPERV
+   help
+ A paravirtualized IOMMU for Microsoft Hypervisor. It supports DMA
+ remapping.
+
 config VIRTIO_IOMMU
tristate "Virtio IOMMU driver"
depends on VIRTIO
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..043dcff06511 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -12,7 +12,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -21,6 +26,9 @@
 #include 
 #include 
 #include 
+#include 
+
+#include 
 
 #include "irq_remapping.h"
 
@@ -338,3 +346,623 @@ static const struct irq_domain_ops 
hyperv_root_ir_domain_ops = {
 };
 
 #endif
+
+#ifdef CONFIG_HYPERV_ROOT_PVIOMMU
+
+/* DMA remapping support */
+struct hv_iommu_domain {
+   struct iommu_domain domain;
+   struct hv_iommu_dev *hv_iommu;
+
+   struct hv_input_device_domain device_domain;
+
+   spinlock_t mappings_lock;
+   struct rb_root_cached mappings;
+
+   u32 map_flags;
+   u64 pgsize_bitmap;
+};
+
+/* What hardware IOMMU? */
+enum {
+   INVALID = 0,
+   INTEL, /* Only Intel for now */
+} hw_iommu_type = { INVALID };
+
+static struct hv_iommu_domain hv_identity_domain, hv_null_domain;
+
+#define to_hv_iommu_domain(d) \
+   container_of(d, struct hv_iommu_domain, domain)
+
+struct hv_iommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node iova;
+   u32 flags;
+};
+
+struct hv_iommu_dev {
+   struct iommu_device iommu;
+
+   struct ida domain_ids;
+
+   /* Device configuration */
+   struct iommu_domain_geometry geometry;
+   u64 first_domain;
+   u64 last_domain;
+
+   u32 map_flags;
+   u64 pgsize_bitmap;
+};
+
+static struct hv_iommu_dev *hv_iommu_device;
+
+struct hv_iommu_endpoint {
+   struct device *dev;
+   struct hv_iommu_dev *hv_iommu;
+   struct hv_iommu_domain *domain;
+};
+
+static void __init hv_initialize_special_domains(void)
+{
+   struct hv_iommu_domain *domain;
+
+   /* Default passthrough domain */
+   domain = &hv_identity_domain;
+
+   memset(domain, 0, sizeof(*domain));
+
+   domain->device_domain.partition_id = HV_PARTITION_ID_SELF;
+   domain->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;
+   domain->device_domain.domain_id.id = HV_DEVICE_DOMAIN_ID_S2_DEFAULT;
+
+   domain->domain.geometry = hv_iommu_device->geometry;
+
+   /* NULL domain that blocks all DMA transactions */
+   domain = &hv_null_domain;
+
+   memset(domain, 0, sizeof(*domain));
+
+   domain->device_domain.partition_id = HV_PARTITION_ID_SELF;
+   domain->device_domain.domain_id.type = HV_DEVICE_DOMAIN_TYPE_S2;
+   domain->device_domain.domain_id.id = HV_DEVICE_DOMAIN_ID_S2_NULL;
+
+   domain->domain.geometry = hv_iommu_device->geometry;
+}
+
+static bool is_identity_domain(struct hv_iommu_domain *d)
+{
+   return d->device_domain.domain_id.id == HV_DEVICE_DOMAIN_ID_S2_DEFAULT;
+}
+
+static bool is_null_domain(struct hv_iommu_domain *d)
+{
+   return d->device_domain.domain_id.id == HV_DEVICE_DOMAIN_ID_S2_NULL;
+}
+
+static struct iommu_domain *hv_iommu_domain_alloc(unsigned type)
+{
+   struct hv_iommu_domain *domain;
+   int ret;
+   u64 status;
+   unsigned long flags;
+   struct hv_input_create_device_domain *input;
+
+   if (type == IOMMU_DOMAIN_IDENTITY)
+   return &hv_identity_domain.domain;
+
+   if (type == IOMMU_DOMAIN_BLOCKED)
+   return &hv_null_domain.domain;
+
+   domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+   if (!domain)
+   goto out;
+
+   spin_lock_init(&domain->mappings_lock);
+   domain->mappings = RB_ROOT_CACHED;
+
+   if (type == IOMMU_DOMAIN_DMA &&
+  

Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-07-09 Thread Wei Liu
On Fri, Jul 09, 2021 at 01:46:19PM +0100, Robin Murphy wrote:
> On 2021-07-09 12:43, Wei Liu wrote:
> > Some devices may have been claimed by the hypervisor already. One such
> > example is a user can assign a NIC for debugging purpose.
> > 
> > Ideally Linux should be able to tell retrieve that information, but
> > there is no way to do that yet. And designing that new mechanism is
> > going to take time.
> > 
> > Provide a command line option for skipping devices. This is a stopgap
> > solution, so it is intentionally undocumented. Hopefully we can retire
> > it in the future.
> 
> Huh? If the host is using a device, why the heck is it exposing any
> knowledge of that device to the guest at all, let alone allowing the guest
> to do anything that could affect its operation!?

The host in this setup consists of the hypervisor, the root kernel and a
bunch of user space programs.

Root is not an ordinary guest. It does need to know all the hardware to
manage the platform. Hypervisor does not claim more devices than it
needs to, nor does it try to hide hardware details from the root.

The hypervisor can protect itself just fine. Any attempt to use the
already claimed devices will be blocked or rejected, so are the attempts
to attach them to device domains.

That, however, leads to some interesting interactions between the
hypervisor and Linux kernel.  When kernel initializes IOMMU during boot,
it will try to attach all devices in one go. Any failure there will
cause kernel to detach the already attached devices. That's not fatal to
kernel, and is only a minor annoyance to our current use case, because
the default domain is a passthrough domain anyway. It will become
problematic once we switch the default domain to a DMA domain to further
tighten security during Linux boot.

Wei.

> 
> Robin.
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 3/8] intel/vt-d: make DMAR table parsing code more flexible

2021-07-09 Thread Wei Liu
On Fri, Jul 09, 2021 at 01:56:46PM +0100, Robin Murphy wrote:
> On 2021-07-09 12:43, Wei Liu wrote:
> > Microsoft Hypervisor provides a set of hypercalls to manage device
> > domains. The root kernel should parse the DMAR so that it can program
> > the IOMMU (with hypercalls) correctly.
> > 
> > The DMAR code was designed to work with Intel IOMMU only. Add two more
> > parameters to make it useful to Microsoft Hypervisor. Microsoft
> > Hypervisor does not need the DMAR parsing code to allocate an Intel
> > IOMMU structure; it also wishes to always reparse the DMAR table even
> > after it has been parsed before.
> 
> We've recently defined the VIOT table for describing paravirtualised IOMMUs
> - would it make more sense to extend that to support the Microsoft
> implementation than to abuse a hardware-specific table? Am I right in

I searched for VIOT and believed I found the correct link
https://lwn.net/Articles/859291/. My understanding is based on the
reading of that series.

VIOT is useful. I think it solves the problem for guests.

It does not solve the problem we have though. The DMAR tables are not
conjured up by some backend software running on the host side. They are
the real tables provided by the firmware. The kernel here is part of the
host setup, dealing with physical hardware.

No matter how much I wish all vendors unified their tables, I don't see
how that's going to happen for readily available servers. :-(

> assuming said hypervisor isn't intended to only ever run on Intel hardware?

Yes, that's correct. We also plan to add support AMD and ARM64.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 4/8] intel/vt-d: export intel_iommu_get_resv_regions

2021-07-09 Thread Wei Liu
On Fri, Jul 09, 2021 at 10:17:25PM +0800, Lu Baolu wrote:
> On 2021/7/9 19:43, Wei Liu wrote:
> > When Microsoft Hypervisor runs on Intel platforms it needs to know the
> > reserved regions to program devices correctly. There is no reason to
> > duplicate intel_iommu_get_resv_regions. Export it.
> 
> Why not using iommu_get_resv_regions()?

That calls into ops->get_resv_regions.

In this patch series, get_resv_regions is hv_iommu_resv_regions, which
wants to use intel_iommu_get_resv_regions when it detects the underlying
hardware platform is from Intel.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 5/8] mshv: add paravirtualized IOMMU support

2021-08-03 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:10:45AM +0530, Praveen Kumar wrote:
> On 09-07-2021 17:13, Wei Liu wrote:
> > +static void hv_iommu_domain_free(struct iommu_domain *d)
> > +{
> > +   struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
> > +   unsigned long flags;
> > +   u64 status;
> > +   struct hv_input_delete_device_domain *input;
> > +
> > +   if (is_identity_domain(domain) || is_null_domain(domain))
> > +   return;
> > +
> > +   local_irq_save(flags);
> > +   input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +   memset(input, 0, sizeof(*input));
> > +
> > +   input->device_domain= domain->device_domain;
> > +
> > +   status = hv_do_hypercall(HVCALL_DELETE_DEVICE_DOMAIN, input, NULL);
> > +
> > +   local_irq_restore(flags);
> > +
> > +   if (!hv_result_success(status))
> > +   pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> 
> Is it OK to deallocate the resources, if hypercall has failed ?

It should be fine. We leak some resources in the hypervisor, but Linux
is in a rather wedged state anyway. Refusing to free up resources in
Linux does not much good.

> Do we have any specific error code EBUSY (kind of) which we need to wait upon 
> ?
> 

I have not found a circumstance that can happen.

> > +
> > +   ida_free(&domain->hv_iommu->domain_ids, 
> > domain->device_domain.domain_id.id);
> > +
> > +   iommu_put_dma_cookie(d);
> > +
> > +   kfree(domain);
> > +}
> > +
> > +static int hv_iommu_attach_dev(struct iommu_domain *d, struct device *dev)
> > +{
> > +   struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
> > +   u64 status;
> > +   unsigned long flags;
> > +   struct hv_input_attach_device_domain *input;
> > +   struct pci_dev *pdev;
> > +   struct hv_iommu_endpoint *vdev = dev_iommu_priv_get(dev);
> > +
> > +   /* Only allow PCI devices for now */
> > +   if (!dev_is_pci(dev))
> > +   return -EINVAL;
> > +
> > +   pdev = to_pci_dev(dev);
> > +
> > +   dev_dbg(dev, "Attaching (%strusted) to %d\n", pdev->untrusted ? "un" : 
> > "",
> > +   domain->device_domain.domain_id.id);
> > +
> > +   local_irq_save(flags);
> > +   input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +   memset(input, 0, sizeof(*input));
> > +
> > +   input->device_domain = domain->device_domain;
> > +   input->device_id = hv_build_pci_dev_id(pdev);
> > +
> > +   status = hv_do_hypercall(HVCALL_ATTACH_DEVICE_DOMAIN, input, NULL);
> > +   local_irq_restore(flags);
> > +
> > +   if (!hv_result_success(status))
> > +   pr_err("%s: hypercall failed, status %lld\n", __func__, status);
> 
> Does it make sense to vdev->domain = NULL ?
>

It is already NULL -- there is no other code path that sets it and the
detach path always sets the field to NULL.

> > +   else
> > +   vdev->domain = domain;
> > +
> > +   return hv_status_to_errno(status);
> > +}
> > +
[...]
> > +static size_t hv_iommu_unmap(struct iommu_domain *d, unsigned long iova,
> > +  size_t size, struct iommu_iotlb_gather *gather)
> > +{
> > +   size_t unmapped;
> > +   struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
> > +   unsigned long flags, npages;
> > +   struct hv_input_unmap_device_gpa_pages *input;
> > +   u64 status;
> > +
> > +   unmapped = hv_iommu_del_mappings(domain, iova, size);
> > +   if (unmapped < size)
> > +   return 0;
> 
> Is there a case where unmapped > 0 && unmapped < size ?
> 

There could be such a case -- hv_iommu_del_mappings' return value is >= 0.
Is there a problem with this predicate?

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-08-03 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:20:42AM +0530, Praveen Kumar wrote:
> On 09-07-2021 17:13, Wei Liu wrote:
> > Some devices may have been claimed by the hypervisor already. One such
> > example is a user can assign a NIC for debugging purpose.
> > 
> > Ideally Linux should be able to tell retrieve that information, but
> > there is no way to do that yet. And designing that new mechanism is
> > going to take time.
> > 
> > Provide a command line option for skipping devices. This is a stopgap
> > solution, so it is intentionally undocumented. Hopefully we can retire
> > it in the future.
> > 
> > Signed-off-by: Wei Liu 
> > ---
> >  drivers/iommu/hyperv-iommu.c | 45 
> >  1 file changed, 45 insertions(+)
> > 
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > index 043dcff06511..353da5036387 100644
> > --- a/drivers/iommu/hyperv-iommu.c
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -349,6 +349,16 @@ static const struct irq_domain_ops 
> > hyperv_root_ir_domain_ops = {
> >  
> >  #ifdef CONFIG_HYPERV_ROOT_PVIOMMU
> >  
> > +/* The IOMMU will not claim these PCI devices. */
> > +static char *pci_devs_to_skip;
> > +static int __init mshv_iommu_setup_skip(char *str) {
> > +   pci_devs_to_skip = str;
> > +
> > +   return 0;
> > +}
> > +/* mshv_iommu_skip=(:BB:DD.F)(:BB:DD.F) */
> > +__setup("mshv_iommu_skip=", mshv_iommu_setup_skip);
> > +
> >  /* DMA remapping support */
> >  struct hv_iommu_domain {
> > struct iommu_domain domain;
> > @@ -774,6 +784,41 @@ static struct iommu_device 
> > *hv_iommu_probe_device(struct device *dev)
> > if (!dev_is_pci(dev))
> > return ERR_PTR(-ENODEV);
> >  
> > +   /*
> > +* Skip the PCI device specified in `pci_devs_to_skip`. This is a
> > +* temporary solution until we figure out a way to extract information
> > +* from the hypervisor what devices it is already using.
> > +*/
> > +   if (pci_devs_to_skip && *pci_devs_to_skip) {
> > +   int pos = 0;
> > +   int parsed;
> > +   int segment, bus, slot, func;
> > +   struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > +   do {
> > +   parsed = 0;
> > +
> > +   sscanf(pci_devs_to_skip + pos,
> > +   " (%x:%x:%x.%x) %n",
> > +   &segment, &bus, &slot, &func, &parsed);
> > +
> > +   if (parsed <= 0)
> > +   break;
> > +
> > +   if (pci_domain_nr(pdev->bus) == segment &&
> > +   pdev->bus->number == bus &&
> > +   PCI_SLOT(pdev->devfn) == slot &&
> > +   PCI_FUNC(pdev->devfn) == func)
> > +   {
> > +   dev_info(dev, "skipped by MSHV IOMMU\n");
> > +   return ERR_PTR(-ENODEV);
> > +   }
> > +
> > +   pos += parsed;
> > +
> > +   } while (pci_devs_to_skip[pos]);
> 
> Is there a possibility of pci_devs_to_skip + pos > sizeof(pci_devs_to_skip)
> and also a valid memory ?

pos should point to the last parsed position. If parsing fails pos does
not get updated and the code breaks out of the loop. If parsing is
success pos should point to either the start of next element of '\0'
(end of string). To me this is good enough.

> I would recommend to have a check of size as well before accessing the
> array content, just to be safer accessing any memory.
> 

What check do you have in mind?

Wei.

> > +   }
> > +
> > vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
> > if (!vdev)
> > return ERR_PTR(-ENOMEM);
> > 
> 
> Regards,
> 
> ~Praveen.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-08-10 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:33:54PM +0530, Praveen Kumar wrote:
> On 04-08-2021 03:26, Wei Liu wrote:
> >>>   struct iommu_domain domain;
> >>> @@ -774,6 +784,41 @@ static struct iommu_device 
> >>> *hv_iommu_probe_device(struct device *dev)
> >>>   if (!dev_is_pci(dev))
> >>>   return ERR_PTR(-ENODEV);
> >>>  
> >>> + /*
> >>> +  * Skip the PCI device specified in `pci_devs_to_skip`. This is a
> >>> +  * temporary solution until we figure out a way to extract information
> >>> +  * from the hypervisor what devices it is already using.
> >>> +  */
> >>> + if (pci_devs_to_skip && *pci_devs_to_skip) {
> >>> + int pos = 0;
> >>> + int parsed;
> >>> + int segment, bus, slot, func;
> >>> + struct pci_dev *pdev = to_pci_dev(dev);
> >>> +
> >>> + do {
> >>> + parsed = 0;
> >>> +
> >>> + sscanf(pci_devs_to_skip + pos,
> >>> + " (%x:%x:%x.%x) %n",
> >>> + &segment, &bus, &slot, &func, &parsed);
> >>> +
> >>> + if (parsed <= 0)
> >>> + break;
> >>> +
> >>> + if (pci_domain_nr(pdev->bus) == segment &&
> >>> + pdev->bus->number == bus &&
> >>> + PCI_SLOT(pdev->devfn) == slot &&
> >>> + PCI_FUNC(pdev->devfn) == func)
> >>> + {
> >>> + dev_info(dev, "skipped by MSHV IOMMU\n");
> >>> + return ERR_PTR(-ENODEV);
> >>> + }
> >>> +
> >>> + pos += parsed;
> >>> +
> >>> + } while (pci_devs_to_skip[pos]);
> >>
> >> Is there a possibility of pci_devs_to_skip + pos > sizeof(pci_devs_to_skip)
> >> and also a valid memory ?
> > 
> > pos should point to the last parsed position. If parsing fails pos does
> > not get updated and the code breaks out of the loop. If parsing is
> > success pos should point to either the start of next element of '\0'
> > (end of string). To me this is good enough.
> 
> The point is, hypothetically the address to pci_devs_to_skip + pos can
> be valid address (later to '\0'), and thus there is a possibility,
> that parsing may not fail.

Have you found an example how at any given point in time
pci_devs_to_skip + pos can point outside of user provided string?

> Another, there is also a possibility of sscanf faulting accessing the
> illegal address, if pci_devs_to_skip[pos] turns out to be not NULL or
> valid address.

That depends on pci_devs_to_skip + pos can point to an invalid address
in the first place, so that goes back to the question above.

> 
> > 
> >> I would recommend to have a check of size as well before accessing the
> >> array content, just to be safer accessing any memory.
> >>
> > 
> > What check do you have in mind?
> 
> Something like,
> size_t len = strlen(pci_devs_to_skip);
> do {
> 
>   len -= parsed;
> } while (len);
> 
> OR
> 
> do {
> ...
>   pos += parsed;
> } while (pos < len);
> 
> Further, I'm also fine with the existing code, if you think this won't
> break and already been taken care. Thanks.

But in the loop somewhere you will still need to parse pci_devs_to_skip
+ some_offset. The new code structure does not remove that, right?

Given this is for debugging and is supposed to be temporary, I think the
code is good enough. But I want to make sure if there is anything I
missed.

Wei.

> 
> Regards,
> 
> ~Praveen.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v1 5/8] mshv: add paravirtualized IOMMU support

2021-08-10 Thread Wei Liu
On Wed, Aug 04, 2021 at 12:13:42PM +0530, Praveen Kumar wrote:
> On 04-08-2021 03:17, Wei Liu wrote:
> >>> +static size_t hv_iommu_unmap(struct iommu_domain *d, unsigned long iova,
> >>> +size_t size, struct iommu_iotlb_gather *gather)
> >>> +{
> >>> + size_t unmapped;
> >>> + struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
> >>> + unsigned long flags, npages;
> >>> + struct hv_input_unmap_device_gpa_pages *input;
> >>> + u64 status;
> >>> +
> >>> + unmapped = hv_iommu_del_mappings(domain, iova, size);
> >>> + if (unmapped < size)
> >>> + return 0;
> >> Is there a case where unmapped > 0 && unmapped < size ?
> >>
> > There could be such a case -- hv_iommu_del_mappings' return value is >= 0.
> > Is there a problem with this predicate?
> 
> What I understand, if we are unmapping and return 0, means nothing was
> unmapped, and will that not cause any corruption or illegal access of
> unmapped memory later?  From __iommu_unmap

Those pages are not really unmapped. The hypercall is skipped.

> ...
> 13 while (unmapped < size) {
> 12 size_t pgsize = iommu_pgsize(domain, iova, size - 
> unmapped);
> 11
> 10 unmapped_page = ops->unmap(domain, iova, pgsize, 
> iotlb_gather);
>  9 if (!unmapped_page)
>  8 break; <<< we just break here, 
> thinking there is nothing unmapped, but actually hv_iommu_del_mappings has 
> removed some pages.
>  7
>  6 pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
>  5 ¦iova, unmapped_page);
>  4
>  3 iova += unmapped_page;
>  2 unmapped += unmapped_page;
>  1 }
> ...
> 
> Am I missing something ?
> 
> Regards,
> 
> ~Praveen.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-10 Thread Wei Liu
On Mon, Aug 09, 2021 at 01:56:05PM -0400, Tianyu Lan wrote:
[...]
>  static int hv_cpu_init(unsigned int cpu)
>  {
>   union hv_vp_assist_msr_contents msr = { 0 };
> @@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
>   }
>   }
>  
> + hyperv_init_ghcb();
> +

Why is the return value not checked here? If that's not required, can
you leave a comment?

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-10 Thread Wei Liu
On Mon, Aug 09, 2021 at 01:56:07PM -0400, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.
> 
> Signed-off-by: Tianyu Lan 
> ---
> Change since v2:
>* Rework __set_memory_enc_dec() and call Hyper-V and AMD function
>  according to platform check.
> 
> Change since v1:
>* Use new staic call x86_set_memory_enc to avoid add Hyper-V
>  specific check in the set_memory code.
> ---
>  arch/x86/hyperv/Makefile   |   2 +-
>  arch/x86/hyperv/hv_init.c  |   6 ++
>  arch/x86/hyperv/ivm.c  | 114 +
>  arch/x86/include/asm/hyperv-tlfs.h |  20 +
>  arch/x86/include/asm/mshyperv.h|   4 +-
>  arch/x86/mm/pat/set_memory.c   |  19 +++--
>  include/asm-generic/hyperv-tlfs.h  |   1 +
>  include/asm-generic/mshyperv.h |   1 +
>  8 files changed, 160 insertions(+), 7 deletions(-)
>  create mode 100644 arch/x86/hyperv/ivm.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 48e2c51464e8..5d2de10809ae 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o irqdomain.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o ivm.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
>  
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;

Nit: false instead of 0.

> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;
> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
>  
[...]
> +int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
> +enum hv_mem_host_visibility visibility)
> +{
> + struct hv_gpa_range_for_visibility **input_pcpu, *input;
> + u16 pages_processed;
> + u64 hv_status;
> + unsigned long flags;
> +
> + /* no-op if partition isolation is not enabled */
> + if (!hv_is_isolation_supported())
> + return 0;
> +
> + if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
> + pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
> + HV_MAX_MODIFY_GPA_REP_COUNT);
> + return -EINVAL;
> + }
> +
> + local_irq_save(flags);
> + input_pcpu = (struct hv_gpa_range_for_visibility **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> + input = *input_pcpu;
> + if (unlikely(!input)) {
> + local_irq_restore(flags);
> + return -EINVAL;
> + }
> +
> + input->partition_id = HV_PARTITION_ID_SELF;
> + input->host_visibility = visibility;
> + input->reserved0 = 0;
> + input->reserved1 = 0;
> + memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
> + hv_status = hv_do_rep_hypercall(
> + HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
> + 0, input, &pages_processed);
> + local_irq_restore(flags);
> +
> + if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
> + return 0;
> +
> + return hv_status & HV_HYPERCALL_RESULT_MASK;

Joseph introduced a few helper functions in 753ed9c95c37d. They will
make the code simpler.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH RFC v1 04/18] iommu/hyperv: don't setup IRQ remapping when running as root

2020-09-14 Thread Wei Liu
The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root.

Signed-off-by: Wei Liu 
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 8919c1c70b68..4da684ab292c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -143,7 +144,7 @@ static int __init hyperv_prepare_irq_remapping(void)
int i;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
-   !x2apic_supported())
+   !x2apic_supported() || hv_root_partition)
return -ENODEV;
 
fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 04/17] iommu/hyperv: don't setup IRQ remapping when running as root

2020-11-05 Thread Wei Liu
The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root.

Signed-off-by: Wei Liu 
Acked-by: Joerg Roedel 
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..8d3ce3add57d 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -143,7 +144,7 @@ static int __init hyperv_prepare_irq_remapping(void)
int i;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
-   !x2apic_supported())
+   !x2apic_supported() || hv_root_partition)
return -ENODEV;
 
fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 04/17] iommu/hyperv: don't setup IRQ remapping when running as root

2020-11-13 Thread Wei Liu
On Thu, Nov 12, 2020 at 04:27:14PM +0100, Vitaly Kuznetsov wrote:
> Wei Liu  writes:
> 
> > The IOMMU code needs more work. We're sure for now the IRQ remapping
> > hooks are not applicable when Linux is the root.
> 
> Super-nitpick: I would suggest we always say 'root partition' as 'root'
> has a 'slightly different' meaning in Linux and this commit message may
> sound confusing to an unprepared reader.

Fixed.

> 
> >
> > Signed-off-by: Wei Liu 
> > Acked-by: Joerg Roedel 
> > ---
> >  drivers/iommu/hyperv-iommu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > index e09e2d734c57..8d3ce3add57d 100644
> > --- a/drivers/iommu/hyperv-iommu.c
> > +++ b/drivers/iommu/hyperv-iommu.c
> > @@ -20,6 +20,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "irq_remapping.h"
> >  
> > @@ -143,7 +144,7 @@ static int __init hyperv_prepare_irq_remapping(void)
> > int i;
> >  
> > if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
> > -   !x2apic_supported())
> > +   !x2apic_supported() || hv_root_partition)
> > return -ENODEV;
> >  
> > fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> 
> Reviewed-by: Vitaly Kuznetsov 

Thanks.

Wei.

> 
> -- 
> Vitaly
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 04/17] iommu/hyperv: don't setup IRQ remapping when running as root

2020-11-24 Thread Wei Liu
The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root partition.

Signed-off-by: Wei Liu 
Acked-by: Joerg Roedel 
Reviewed-by: Vitaly Kuznetsov 
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e09e2d734c57..8d3ce3add57d 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -143,7 +144,7 @@ static int __init hyperv_prepare_irq_remapping(void)
int i;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
-   !x2apic_supported())
+   !x2apic_supported() || hv_root_partition)
return -ENODEV;
 
fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v4 04/17] iommu/hyperv: don't setup IRQ remapping when running as root

2021-01-06 Thread Wei Liu
The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root partition.

Signed-off-by: Wei Liu 
Acked-by: Joerg Roedel 
Reviewed-by: Vitaly Kuznetsov 
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..b7db6024e65c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
x86_init.hyper.msi_ext_dest_id() ||
-   !x2apic_supported())
+   !x2apic_supported() || hv_root_partition)
return -ENODEV;
 
fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root

2021-01-20 Thread Wei Liu
The IOMMU code needs more work. We're sure for now the IRQ remapping
hooks are not applicable when Linux is the root partition.

Signed-off-by: Wei Liu 
Acked-by: Joerg Roedel 
Reviewed-by: Vitaly Kuznetsov 
---
 drivers/iommu/hyperv-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index 1d21a0b5f724..b7db6024e65c 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "irq_remapping.h"
 
@@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
x86_init.hyper.msi_ext_dest_id() ||
-   !x2apic_supported())
+   !x2apic_supported() || hv_root_partition)
return -ENODEV;
 
fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-01-20 Thread Wei Liu
Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
Hypervisor when Linux runs as the root partition. Implement an IRQ
domain to handle mapping and unmapping of IO-APIC interrupts.

Signed-off-by: Wei Liu 
---
 arch/x86/hyperv/irqdomain.c |  54 ++
 arch/x86/include/asm/mshyperv.h |   4 +
 drivers/iommu/hyperv-iommu.c| 179 +++-
 3 files changed, 233 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
index 19637cd60231..8e2b4e478b70 100644
--- a/arch/x86/hyperv/irqdomain.c
+++ b/arch/x86/hyperv/irqdomain.c
@@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
 }
 
 #endif /* CONFIG_PCI_MSI */
+
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry)
+{
+   union hv_device_id device_id;
+
+   device_id.as_uint64 = 0;
+   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+   device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+   return hv_unmap_interrupt(device_id.as_uint64, entry) & 
HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
+
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+   struct hv_interrupt_entry *entry)
+{
+   unsigned long flags;
+   struct hv_input_map_device_interrupt *input;
+   struct hv_output_map_device_interrupt *output;
+   union hv_device_id device_id;
+   struct hv_device_interrupt_descriptor *intr_desc;
+   u16 status;
+
+   device_id.as_uint64 = 0;
+   device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
+   device_id.ioapic.ioapic_id = (u8)ioapic_id;
+
+   local_irq_save(flags);
+   input = *this_cpu_ptr(hyperv_pcpu_input_arg);
+   output = *this_cpu_ptr(hyperv_pcpu_output_arg);
+   memset(input, 0, sizeof(*input));
+   intr_desc = &input->interrupt_descriptor;
+   input->partition_id = hv_current_partition_id;
+   input->device_id = device_id.as_uint64;
+   intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
+   intr_desc->target.vector = vector;
+   intr_desc->vector_count = 1;
+
+   if (level)
+   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
+   else
+   intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
+
+   __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
+
+   status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, 
output) &
+HV_HYPERCALL_RESULT_MASK;
+   local_irq_restore(flags);
+
+   *entry = output->interrupt_entry;
+
+   return status;
+}
+EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index ccc849e25d5e..345d7c6f8c37 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union 
hv_msi_entry *msi_entry,
 
 struct irq_domain *hv_create_pci_msi_domain(void);
 
+int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
+   struct hv_interrupt_entry *entry);
+int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry *entry);
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index b7db6024e65c..6d35e4c303c6 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops = 
{
.free = hyperv_irq_remapping_free,
 };
 
+static const struct irq_domain_ops hyperv_root_ir_domain_ops;
 static int __init hyperv_prepare_irq_remapping(void)
 {
struct fwnode_handle *fn;
int i;
+   const char *name;
+   const struct irq_domain_ops *ops;
 
if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
x86_init.hyper.msi_ext_dest_id() ||
-   !x2apic_supported() || hv_root_partition)
+   !x2apic_supported())
return -ENODEV;
 
-   fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
+   if (hv_root_partition) {
+   name = "HYPERV-ROOT-IR";
+   ops = &hyperv_root_ir_domain_ops;
+   } else {
+   name = "HYPERV-IR";
+   ops = &hyperv_ir_domain_ops;
+   }
+
+   fn = irq_domain_alloc_named_id_fwnode(name, 0);
if (!fn)
return -ENOMEM;
 
ioapic_ir_domain =
irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
-   0, IOAPIC_REMAPPING_ENTRY, fn,
-   &hyperv_ir_domain_ops, NULL);
+   0, IOAPIC_REMAPPING_ENTRY, fn, ops, NULL);
 
if (!ioapic_ir_domain) {
  

Re: [PATCH V3 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-02 Thread Wei Liu
On Wed, Dec 01, 2021 at 11:02:53AM -0500, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Hyper-V provides Isolation VM which has memory encrypt support. Add
> hyperv_cc_platform_has() and return true for check of GUEST_MEM_ENCRYPT
> attribute.
> 
> Signed-off-by: Tianyu Lan 
> ---
>  arch/x86/kernel/cc_platform.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> index 03bb2f343ddb..f3bb0431f5c5 100644
> --- a/arch/x86/kernel/cc_platform.c
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  
>  static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
> @@ -58,9 +59,23 @@ static bool amd_cc_platform_has(enum cc_attr attr)
>  #endif
>  }
>  
> +static bool hyperv_cc_platform_has(enum cc_attr attr)
> +{
> +#ifdef CONFIG_HYPERV
> + if (attr == CC_ATTR_GUEST_MEM_ENCRYPT)
> + return true;
> + else
> + return false;

This can be simplified as

return attr == CC_ATTR_GUEST_MEM_ENCRYPT;


Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V3 3/5] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-12-02 Thread Wei Liu
On Wed, Dec 01, 2021 at 11:02:54AM -0500, Tianyu Lan wrote:
[...]
> diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
> index 46df59aeaa06..30fd0600b008 100644
> --- a/arch/x86/xen/pci-swiotlb-xen.c
> +++ b/arch/x86/xen/pci-swiotlb-xen.c
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
>  EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
>  
>  IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
> -   NULL,
> +   hyperv_swiotlb_detect,

It is not immediately obvious why this is needed just by reading the
code. Please consider copying some of the text in the commit message to
a comment here.

Thanks,
Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V7 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()

2021-12-14 Thread Wei Liu
On Mon, Dec 13, 2021 at 02:14:03AM -0500, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Hyper-V provides Isolation VM for confidential computing support and
> guest memory is encrypted in it. Places checking cc_platform_has()
> with GUEST_MEM_ENCRYPT attr should return "True" in Isolation vm. e.g,
> swiotlb bounce buffer size needs to adjust according to memory size
> in the sev_setup_arch(). Add GUEST_MEM_ENCRYPT check for Hyper-V Isolation
> VM.
> 
> Signed-off-by: Tianyu Lan 

x86 maintainers, any comment on this patch?

> ---
> Change since v6:
>   * Change the order in the cc_platform_has() and check sev first.
> 
> Change since v3:
>   * Change code style of checking GUEST_MEM attribute in the
> hyperv_cc_platform_has().
> ---
>  arch/x86/kernel/cc_platform.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
> index 03bb2f343ddb..6cb3a675e686 100644
> --- a/arch/x86/kernel/cc_platform.c
> +++ b/arch/x86/kernel/cc_platform.c
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  
>  static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
> @@ -58,12 +59,19 @@ static bool amd_cc_platform_has(enum cc_attr attr)
>  #endif
>  }
>  
> +static bool hyperv_cc_platform_has(enum cc_attr attr)
> +{
> + return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
> +}
>  
>  bool cc_platform_has(enum cc_attr attr)
>  {
>   if (sme_me_mask)
>   return amd_cc_platform_has(attr);
>  
> + if (hv_is_isolation_supported())
> + return hyperv_cc_platform_has(attr);
> +
>   return false;
>  }
>  EXPORT_SYMBOL_GPL(cc_platform_has);
> -- 
> 2.25.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM

2021-12-16 Thread Wei Liu
On Wed, Dec 15, 2021 at 01:00:38PM +0800, Tianyu Lan wrote:
> 
> 
> On 12/15/2021 6:40 AM, Dave Hansen wrote:
> > On 12/14/21 2:23 PM, Tom Lendacky wrote:
> > > > I don't really understand how this can be more general any *not* get
> > > > utilized by the existing SEV support.
> > > 
> > > The Virtual Top-of-Memory (VTOM) support is an SEV-SNP feature that is
> > > meant to be used with a (relatively) un-enlightened guest. The idea is
> > > that the C-bit in the guest page tables must be 0 for all accesses. It
> > > is only the physical address relative to VTOM that determines if the
> > > access is encrypted or not. So setting sme_me_mask will actually cause
> > > issues when running with this feature. Since all DMA for an SEV-SNP
> > > guest must still be to shared (unencrypted) memory, some enlightenment
> > > is needed. In this case, memory mapped above VTOM will provide that via
> > > the SWIOTLB update. For SEV-SNP guests running with VTOM, they are
> > > likely to also be running with the Reflect #VC feature, allowing a
> > > "paravisor" to handle any #VCs generated by the guest.
> > > 
> > > See sections 15.36.8 "Virtual Top-of-Memory" and 15.36.9 "Reflect #VC"
> > > in volume 2 of the AMD APM [1].
> > 
> > Thanks, Tom, that's pretty much what I was looking for.
> > 
> > The C-bit normally comes from the page tables.  But, the hardware also
> > provides an alternative way to effectively get C-bit behavior without
> > actually setting the bit in the page tables: Virtual Top-of-Memory
> > (VTOM).  Right?
> > 
> > It sounds like Hyper-V has chosen to use VTOM instead of requiring the
> > guest to do the C-bit in its page tables.
> > 
> > But, the thing that confuses me is when you said: "it (VTOM) is meant to
> > be used with a (relatively) un-enlightened guest".  We don't have an
> > unenlightened guest here.  We have Linux, which is quite enlightened.
> > 
> > > Is VTOM being used because there's something that completely rules out
> > > using the C-bit in the page tables?  What's that "something"?
> 
> 
> For "un-enlightened" guest, there is an another system running insider
> the VM to emulate some functions(tsc, timer, interrupt and so on) and
> this helps not to modify OS(Linux/Windows) a lot. In Hyper-V Isolation
> VM, we called the new system as HCL/paravisor. HCL runs in the VMPL0 and
> Linux runs in VMPL2. This is similar with nested virtualization. HCL
> plays similar role as L1 hypervisor to emulate some general functions
> (e.g, rdmsr/wrmsr accessing and interrupt injection) which needs to be
> enlightened in the enlightened guest. Linux kernel needs to handle
> #vc/#ve exception directly in the enlightened guest. HCL handles such
> exception in un-enlightened guest and emulate interrupt injection which
> helps not to modify OS core part code. Using vTOM also is same purpose.
> Hyper-V uses vTOM avoid changing page table related code in OS(include
> Windows and Linux)and just needs to map memory into decrypted address
> space above vTOM in the driver code.
> 
> Linux has generic swiotlb bounce buffer implementation and so introduce
> swiotlb_unencrypted_base here to set shared memory boundary or vTOM.
> Hyper-V Isolation VM is un-enlightened guest. Hyper-V doesn't expose sev/sme
> capability to guest and so SEV code actually doesn't work.
> So we also can't interact current existing SEV code and these code is
> for enlightened guest support without HCL/paravisor. If other platforms
> or SEV want to use similar vTOM feature, swiotlb_unencrypted_base can
> be reused. So swiotlb_unencrypted_base is a general solution for all
> platforms besides SEV and Hyper-V.
> 

Thanks for the detailed explanation.

Dave, are you happy with this?

The code looks pretty solid to my untrained eyes. And the series has
collected necessary acks from stakeholders. If I don't hear objection by
EOD Friday I will apply this series to hyperv-next.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V7 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)

2021-12-17 Thread Wei Liu
On Mon, Dec 13, 2021 at 02:14:01AM -0500, Tianyu Lan wrote:
> From: Tianyu Lan 
[...]
> 
> Tianyu Lan (5):
>   swiotlb: Add swiotlb bounce buffer remap function for HV IVM
>   x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
>   hyper-v: Enable swiotlb bounce buffer for Isolation VM
>   scsi: storvsc: Add Isolation VM support for storvsc driver
>   net: netvsc: Add Isolation VM support for netvsc driver
> 

Applied to hyperv-next. Thanks.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Swiotlb: Add CONFIG_HAS_IOMEM check around memremap() in the swiotlb_mem_remap()

2022-01-02 Thread Wei Liu
On Fri, Dec 31, 2021 at 11:56:40AM -0500, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> HAS_IOMEM option may not be selected on some platforms(e.g, s390) and this
> will cause compile error due to miss memremap() implementation. Fix it via

miss -> missingk

> adding HAS_IOMEM check around memremap() in the swiotlb.c.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Tianyu Lan 

Hi Christoph

Are you happy with this fix?

Thanks,
Wei.

> ---
>  kernel/dma/swiotlb.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b36c1cdd0c4f..3de651ba38cc 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -167,6 +167,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, 
> unsigned long bytes)
>  {
>   void *vaddr = NULL;
>  
> +#ifdef CONFIG_HAS_IOMEM
>   if (swiotlb_unencrypted_base) {
>   phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
>  
> @@ -175,6 +176,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, 
> unsigned long bytes)
>   pr_err("Failed to map the unencrypted memory %pa size 
> %lx.\n",
>  &paddr, bytes);
>   }
> +#endif
>  
>   return vaddr;
>  }
> -- 
> 2.25.1
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Swiotlb: Add CONFIG_HAS_IOMEM check around memremap() in the swiotlb_mem_remap()

2022-01-04 Thread Wei Liu
On Sun, Jan 02, 2022 at 11:54:46PM -0800, Christoph Hellwig wrote:
> On Fri, Dec 31, 2021 at 11:56:40AM -0500, Tianyu Lan wrote:
> > From: Tianyu Lan 
> > 
> > HAS_IOMEM option may not be selected on some platforms(e.g, s390) and this
> > will cause compile error due to miss memremap() implementation. Fix it via
> > adding HAS_IOMEM check around memremap() in the swiotlb.c.
> > 
> > Reported-by: kernel test robot 
> > Signed-off-by: Tianyu Lan 
> > ---
> >  kernel/dma/swiotlb.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index b36c1cdd0c4f..3de651ba38cc 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -167,6 +167,7 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, 
> > unsigned long bytes)
> >  {
> > void *vaddr = NULL;
> >  
> > +#ifdef CONFIG_HAS_IOMEM
> 
> Please stub out all of swiotlb_mem_remap instead of the ifdef inside the
> function.

Does this look okay to you?

---8<---
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b36c1cdd0c4f..f1e7ea160b43 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -163,6 +163,7 @@ static inline unsigned long nr_slots(u64 val)
  * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
  * Isolation VMs).
  */
+#ifdef CONFIG_HAS_IOMEM
 static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
 {
void *vaddr = NULL;
@@ -178,6 +179,12 @@ static void *swiotlb_mem_remap(struct io_tlb_mem *mem, 
unsigned long bytes)
 
return vaddr;
 }
+#else
+static void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+   return NULL;
+}
+#endif
 
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] Swiotlb: Add CONFIG_HAS_IOMEM check around memremap() in the swiotlb_mem_remap()

2022-01-04 Thread Wei Liu
On Tue, Jan 04, 2022 at 04:03:07PM +0100, Christoph Hellwig wrote:
> On Tue, Jan 04, 2022 at 02:51:55PM +0000, Wei Liu wrote:
> > > Please stub out all of swiotlb_mem_remap instead of the ifdef inside the
> > > function.
> > 
> > Does this look okay to you?
> 
> Yes, thanks!

Thanks for confirming!

I will turn this into an ack and apply the diff to hyperv-next.

Wei.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu