[PATCH 1/1] iommu: Fix deferred domain attachment in iommu_probe_device()

2020-10-25 Thread Lu Baolu
The IOMMU core supports deferring the attachment of the default domain
to a device. Fix a missing deferred-attach check in iommu_probe_device().

Fixes: cf193888bfbd3 ("iommu: Move new probe_device path to separate function")
Cc: sta...@vger.kernel.org # v5.8+
Signed-off-by: Lu Baolu 
---
 drivers/iommu/iommu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6d847027d35e..690abf60239d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -93,6 +93,8 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
   struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
+static bool iommu_is_attach_deferred(struct iommu_domain *domain,
+                                     struct device *dev);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -264,7 +266,8 @@ int iommu_probe_device(struct device *dev)
 	 */
 	iommu_alloc_default_domain(group, dev);
 
-	if (group->default_domain)
+	if (group->default_domain &&
+	    !iommu_is_attach_deferred(group->default_domain, dev))
 		ret = __iommu_attach_device(group->default_domain, dev);
 
 	iommu_create_device_direct_mappings(group, dev);
-- 
2.17.1
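
For context, a minimal sketch of what the iommu_is_attach_deferred() helper
referenced above does in kernels of this vintage (an illustration rather than
guaranteed verbatim upstream code): it simply defers the decision to the IOMMU
driver's is_attach_deferred() callback when one is provided.

	static bool iommu_is_attach_deferred(struct iommu_domain *domain,
					     struct device *dev)
	{
		/* Ask the IOMMU driver whether attaching the default domain
		 * must wait until the device first uses the DMA API. */
		if (domain->ops->is_attach_deferred)
			return domain->ops->is_attach_deferred(domain, dev);

		return false;
	}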



Re: [PATCH] iommu/amd: Increase interrupt remapping table limit to 512 entries

2020-10-25 Thread Suravee Suthikulpanit

Hi Joerg,

Do you have any concerns regarding this patch?

Thanks,
Suravee

On 10/15/20 9:50 AM, Suravee Suthikulpanit wrote:

Certain device drivers allocate IO queues on a per-CPU basis.
On the AMD EPYC platform, which can support up to 256 CPU threads,
this can exceed the current MAX_IRQS_PER_TABLE limit of 256
and result in the error message:

 AMD-Vi: Failed to allocate IRTE

This has been observed with certain NVMe devices.

AMD IOMMU hardware can actually support up to 512 interrupt
remapping table entries. Therefore, update the driver to
match the hardware limit.

Please note that this also increases the size of the interrupt
remapping table to 8KB per device when using the 128-bit IRTE format.

Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/amd_iommu_types.h | 6 +++++-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..427484c45589 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -406,7 +406,11 @@ extern bool amd_iommu_np_cache;
  /* Only true if all IOMMUs support device IOTLBs */
  extern bool amd_iommu_iotlb_sup;
  
-#define MAX_IRQS_PER_TABLE	256
+/*
+ * AMD IOMMU hardware only supports 512 IRTEs despite
+ * the architectural limitation of 2048 entries.
+ */
+#define MAX_IRQS_PER_TABLE	512
  #define IRQ_TABLE_ALIGNMENT	128
  
  struct irq_remap_table {
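
For reference, a minimal sketch of the table-size arithmetic behind the
"8KB per device" note in the changelog above (macro names other than
MAX_IRQS_PER_TABLE are made up for illustration; the 16-byte figure assumes
the 128-bit IRTE format):

	#define MAX_IRQS_PER_TABLE	512	/* new limit from this patch */
	#define IRTE_SIZE_128BIT	16	/* bytes per 128-bit IRTE */

	/* 512 entries * 16 bytes = 8192 bytes = 8KB of table per device */
	#define IRQ_TABLE_SIZE		(MAX_IRQS_PER_TABLE * IRTE_SIZE_128BIT)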





RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Laight
From: David Woodhouse
> Sent: 25 October 2020 10:26
> To: David Laight ; x...@kernel.org
> 
> On Sun, 2020-10-25 at 09:49 +, David Laight wrote:
> > Just looking at a random one of these patches...
> >
> > Does the compiler manage to optimise that reasonably?
> > Or does it generate a lot of shifts and masks as each
> > bitfield is set?
> >
> > The code generation for bitfields is often a lot worse
> > than that for |= setting bits in a word.
> 
> Indeed, it appears to be utterly appalling. That was one of my
> motivations for doing it with masks and shifts in the first place.

I thought it would be.

I'm not even sure using bitfields to map hardware registers
makes the code any more readable (or foolproof).

I suspect using #define constants and explicit |= and &= ~
is actually best - provided the names are descriptive enough.

If you set all the fields the compiler will merge the values
and do a single write - provided nothing you read might
alias the target.

The only way to get strongly typed integers is to cast them
to a structure pointer type.
Together with a static inline function to remove the casts
and | the values together it might make things foolproof.
But I've never tried it.

ISTR once doing something like that with error codes, but
it never made it into source control.

David
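
For illustration, a minimal sketch of that wrapper-struct idea, with made-up
register and field names (not code from this series):

	#include <linux/types.h>

	/* Field encodings built from #define masks rather than C bitfields. */
	#define REG_VECTOR(v)		((u32)(v) & 0xff)
	#define REG_DELIVERY_MODE(m)	(((u32)(m) & 0x7) << 8)
	#define REG_LEVEL_ASSERT	(1u << 14)

	/* Wrapper struct so a bare u32 can't be passed where a composed
	 * register value is expected. */
	struct reg_data { u32 raw; };

	static inline struct reg_data reg_compose(u8 vector, u8 delivery_mode)
	{
		/* Everything is ORed together in one expression, so the
		 * compiler can emit a single store instead of per-field
		 * read-modify-write sequences. */
		return (struct reg_data){ REG_VECTOR(vector) |
					  REG_DELIVERY_MODE(delivery_mode) |
					  REG_LEVEL_ASSERT };
	}

A caller would then write the composed value in one go, e.g.
writel(reg_compose(vec, mode).raw, reg_addr), and passing a raw u32 in its
place becomes a type error.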



Re: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Woodhouse
On Sun, 2020-10-25 at 09:49 +, David Laight wrote:
> Just looking at a random one of these patches...
> 
> Does the compiler manage to optimise that reasonably?
> Or does it generate a lot of shifts and masks as each
> bitfield is set?
> 
> The code generation for bitfields is often a lot worse
> than that for |= setting bits in a word.

Indeed, it appears to be utterly appalling. That was one of my
motivations for doing it with masks and shifts in the first place.

Because in ioapic_setup_msg_from_msi(), for example, these two are
consecutive in both source and destination:

entry->vector   = msg.arch_data.vector;
entry->delivery_mode= msg.arch_data.delivery_mode;

And so are those:

entry->ir_format= msg.arch_addr_lo.dmar_format;
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;

But GCC 7.5.0 here is doing them all separately...

11e0 :
{
11e0:   53  push   %rbx
11e1:   48 89 f3mov%rsi,%rbx
11e4:   48 83 ec 18 sub$0x18,%rsp
irq_chip_compose_msi_msg(irq_data, &msg);
11e8:   48 89 e6mov%rsp,%rsi
{
11eb:   65 48 8b 04 25 28 00mov%gs:0x28,%rax
11f2:   00 00 
11f4:   48 89 44 24 10  mov%rax,0x10(%rsp)
11f9:   31 c0   xor%eax,%eax
irq_chip_compose_msi_msg(irq_data, &msg);
11fb:   e8 00 00 00 00  callq  1200 

entry->vector   = msg.arch_data.vector;
1200:   0f b6 44 24 08  movzbl 0x8(%rsp),%eax
entry->delivery_mode= msg.arch_data.delivery_mode;
1205:   0f b6 53 01 movzbl 0x1(%rbx),%edx
1209:   0f b6 74 24 09  movzbl 0x9(%rsp),%esi
entry->vector   = msg.arch_data.vector;
120e:   88 03   mov%al,(%rbx)
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
1210:   0f b6 04 24 movzbl (%rsp),%eax
entry->delivery_mode= msg.arch_data.delivery_mode;
1214:   83 e2 f0and$0xfff0,%edx
1217:   83 e6 07and$0x7,%esi
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
121a:   09 f2   or %esi,%edx
121c:   8d 0c 00lea(%rax,%rax,1),%ecx
entry->ir_format= msg.arch_addr_lo.dmar_format;
121f:   c0 e8 04shr$0x4,%al
entry->dest_mode_logical= msg.arch_addr_lo.dest_mode_logical;
1222:   83 e1 08and$0x8,%ecx
1225:   09 ca   or %ecx,%edx
1227:   88 53 01mov%dl,0x1(%rbx)
entry->ir_format= msg.arch_addr_lo.dmar_format;
122a:   89 c2   mov%eax,%edx
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;
122c:   8b 04 24mov(%rsp),%eax
122f:   83 e2 01and$0x1,%edx
1232:   c1 e8 05shr$0x5,%eax
1235:   66 25 ff 7f and$0x7fff,%ax
1239:   8d 0c 00lea(%rax,%rax,1),%ecx
123c:   66 c1 e8 07 shr$0x7,%ax
1240:   88 43 07mov%al,0x7(%rbx)
1243:   09 ca   or %ecx,%edx
}
1245:   48 8b 44 24 10  mov0x10(%rsp),%rax
124a:   65 48 33 04 25 28 00xor%gs:0x28,%rax
1251:   00 00 
entry->ir_index_0_14= msg.arch_addr_lo.dmar_index_0_14;
1253:   88 53 06mov%dl,0x6(%rbx)
}
1256:   75 06   jne125e 

1258:   48 83 c4 18 add$0x18,%rsp
125c:   5b  pop%rbx
125d:   c3  retq   
125e:   e8 00 00 00 00  callq  1263 

1263:   0f 1f 00nopl   (%rax)
1266:   66 2e 0f 1f 84 00 00nopw   %cs:0x0(%rax,%rax,1)
126d:   00 00 00 


Still, this isn't really a fast path so I'm content to file the GCC PR
for the really *obvious* misses and let it take its course.
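
For anyone wanting to experiment, a stand-alone sketch of the kind of testcase
such a PR might carry (made-up field names; the point is that the source and
destination bitfield layouts match, so the two copies could in principle be
merged into a single load/store of the shared storage unit):

	struct src {
		unsigned int vector        : 8;
		unsigned int delivery_mode : 3;
	};

	struct dst {
		unsigned int vector        : 8;
		unsigned int delivery_mode : 3;
	};

	void copy_fields(struct dst *d, const struct src *s)
	{
		/* Copied field by field, as in ioapic_setup_msg_from_msi(). */
		d->vector        = s->vector;
		d->delivery_mode = s->delivery_mode;
	}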



RE: [PATCH v3 17/35] x86/pci/xen: Use msi_msg shadow structs

2020-10-25 Thread David Laight
From: David Woodhouse
> Sent: 24 October 2020 22:35
> 
> From: Thomas Gleixner 
> 
> Use the msi_msg shadow structs and compose the message with named bitfields
> instead of the unreadable macro maze.
> 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: David Woodhouse 
> ---
>  arch/x86/pci/xen.c | 26 +++---
>  1 file changed, 11 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
> index c552cd2d0632..3d41a09c2c14 100644
> --- a/arch/x86/pci/xen.c
> +++ b/arch/x86/pci/xen.c
> @@ -152,7 +152,6 @@ static int acpi_register_gsi_xen(struct device *dev, u32 gsi,
> 
>  #if defined(CONFIG_PCI_MSI)
>  #include 
> -#include 
> 
>  struct xen_pci_frontend_ops *xen_pci_frontend;
>  EXPORT_SYMBOL_GPL(xen_pci_frontend);
> @@ -210,23 +209,20 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
>   return ret;
>  }
> 
> -#define XEN_PIRQ_MSI_DATA  (MSI_DATA_TRIGGER_EDGE | \
> - MSI_DATA_LEVEL_ASSERT | (3 << 8) | MSI_DATA_VECTOR(0))
> -
>  static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
>   struct msi_msg *msg)
>  {
> - /* We set vector == 0 to tell the hypervisor we don't care about it,
> -  * but we want a pirq setup instead.
> -  * We use the dest_id field to pass the pirq that we want. */
> - msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
> - msg->address_lo =
> - MSI_ADDR_BASE_LO |
> - MSI_ADDR_DEST_MODE_PHYSICAL |
> - MSI_ADDR_REDIRECTION_CPU |
> - MSI_ADDR_DEST_ID(pirq);
> -
> - msg->data = XEN_PIRQ_MSI_DATA;
> + /*
> +  * We set vector == 0 to tell the hypervisor we don't care about
> +  * it, but we want a pirq setup instead.  We use the dest_id fields
> +  * to pass the pirq that we want.
> +  */
> + memset(msg, 0, sizeof(*msg));
> + msg->address_hi = X86_MSI_BASE_ADDRESS_HIGH;
> + msg->arch_addr_hi.destid_8_31 = pirq >> 8;
> + msg->arch_addr_lo.destid_0_7 = pirq & 0xFF;
> + msg->arch_addr_lo.base_address = X86_MSI_BASE_ADDRESS_LOW;
> + msg->arch_data.delivery_mode = APIC_DELIVERY_MODE_EXTINT;
>  }

Just looking at a random one of these patches...

Does the compiler manage to optimise that reasonably?
Or does it generate a lot of shifts and masks as each
bitfield is set?

The code generation for bitfields is often a lot worse
than that for |= setting bits in a word.

David




Re: [PATCH v3 22/35] genirq/irqdomain: Implement get_name() method on irqchip fwnodes

2020-10-25 Thread Marc Zyngier
Hi David,

nit: please use my kernel.org address for kernel related stuff.

On Sat, 24 Oct 2020 22:35:22 +0100,
David Woodhouse  wrote:
> 
> From: David Woodhouse 
> 
> Prerequesite to make x86 more irqdomain compliant.

Prerequisite?

> 
> Signed-off-by: David Woodhouse 
> Signed-off-by: Thomas Gleixner 
> ---
>  kernel/irq/irqdomain.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index cf8b374b892d..673fa64c1c44 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -42,7 +42,16 @@ static inline void debugfs_add_domain_dir(struct irq_domain *d) { }
>  static inline void debugfs_remove_domain_dir(struct irq_domain *d) { }
>  #endif
>  
> -const struct fwnode_operations irqchip_fwnode_ops;
> +static const char *irqchip_fwnode_get_name(const struct fwnode_handle *fwnode)
> +{
> + struct irqchip_fwid *fwid = container_of(fwnode, struct irqchip_fwid, fwnode);
> +
> + return fwid->name;
> +}
> +
> +const struct fwnode_operations irqchip_fwnode_ops = {
> + .get_name = irqchip_fwnode_get_name,
> +};
>  EXPORT_SYMBOL_GPL(irqchip_fwnode_ops);
>  
>  /**

Acked-by: Marc Zyngier 

-- 
Without deviation from the norm, progress is not possible.
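
For what it's worth, once the .get_name op is wired up as in the patch above,
generic fwnode helpers can resolve a name for these irqchip fwnodes; a minimal
illustration (not code from this series):

	/* fwnode_get_name() dispatches to the fwnode's get_name() op, so for
	 * an irqchip fwnode it now returns fwid->name. */
	static void print_irqchip_name(struct fwnode_handle *fwnode)
	{
		pr_info("irqchip fwnode name: %s\n", fwnode_get_name(fwnode));
	}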


Re: [PATCH v3 00/35] Fix x2apic enablement and allow more CPUs, clean up I/OAPIC and MSI bitfields

2020-10-25 Thread David Woodhouse
On Sat, 2020-10-24 at 22:35 +0100, David Woodhouse wrote:
> Fix the conditions for enabling x2apic on guests without interrupt 
> remapping, and support 15-bit Extended Destination ID to allow 32768 
> CPUs without IR on hypervisors that support it.
> 
> Make the I/OAPIC code generate its RTE directly from the MSI message
> created by the parent irqchip, and fix up a bunch of magic mask/shift
> macros to use bitfields for MSI messages and I/OAPIC RTEs while we're
> at it.

Forgot to mention (since I thought I'd posted it in a previous series)
that v3 also ditches irq_remapping_get_irq_domain() and some icky
special cases of hard-coding "x86_vector_domain", and makes HPET and
I/OAPIC use irq_find_matching_fwspec() to find their parent irqdomain.
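
Roughly the lookup pattern that refers to, sketched with a placeholder fwnode
argument (illustrative only; the real call sites are in the HPET and I/OAPIC
changes in this series):

	/* Find the parent MSI/remapping irqdomain through the fwnode the
	 * parent irqchip registered, instead of hard-coding
	 * x86_vector_domain.  "parent_fwnode" is a placeholder name. */
	static struct irq_domain *find_parent_domain(struct fwnode_handle *parent_fwnode)
	{
		struct irq_fwspec fwspec = {
			.fwnode      = parent_fwnode,
			.param_count = 0,
		};

		return irq_find_matching_fwspec(&fwspec, DOMAIN_BUS_ANY);
	}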

> v3:
>  • Lots of bitfield cleanups from Thomas.
>  • Disable hyperv-iommu if 15-bit extension is present.
>  • Fix inconsistent CONFIG_PCI_MSI/CONFIG_GENERIC_MSI_IRQ in hpet.c
>  • Split KVM_FEATURE_MSI_EXT_DEST_ID patch, half of which is going upstream
>through KVM tree (and the other half needs to wait, or have an #ifdef) so
>is left at the top of the tree.
> 
> v2:
>  • Minor cleanups.
>  • Move __irq_msi_compose_msg() to apic.c, make virt_ext_dest_id static.
>  • Generate I/OAPIC RTE directly from parent irqchip's MSI messages.
>  • Clean up HPET MSI support into hpet.c now that we can.


