Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Huang Adrian
On Wed, Nov 13, 2019 at 12:05 AM Joerg Roedel  wrote:
>
> On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:
>
> > > So there are a couple of options here:
> > >
> > > 1) Bail out and disable the IOMMU as the BIOS screwed up
> > >
> > > 2) Treat per-device exclusion ranges just as r/w unity-mapped
> > >regions.
> > >
> > >
> > > I think option 2) is the best fix here.
> >
> > Yes. This is what this patch does (The first exclusion region still
> > uses the exclusion range while the remaining exclusion regions are
> > modified to be r/w unity-mapped regions when detecting multiple
> > exclusion regions) .
>
> Yeah, but I think it is better to just stop using the exclusion-range
> feature of the hardware (because it meand BIOSes are correct) and just
> treat every exclusion range (also the first one) as an r/w unity range.

Okay, I see. If you're okay with this (treat per-device exclusion
ranges as r/w unity-mapped regions - including the first one), I can
prepare the patch.

>
> > But, I'm guessing you're talking about that BIOS has to define r/w
> > unity-mapped regions instead of the per-device exclusions (Fix from
> > BIOS, not prevent the failure from kernel). Right?
>
> That would be best, but I fear this is too much to wish for.

Totally agree. That's why I submit this patch. :-)

-- Adrian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Joerg Roedel
On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote:
> > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:

> > So there are a couple of options here:
> >
> > 1) Bail out and disable the IOMMU as the BIOS screwed up
> >
> > 2) Treat per-device exclusion ranges just as r/w unity-mapped
> >regions.
> >
> >
> > I think option 2) is the best fix here.
> 
> Yes. This is what this patch does (The first exclusion region still
> uses the exclusion range while the remaining exclusion regions are
> modified to be r/w unity-mapped regions when detecting multiple
> exclusion regions) .

Yeah, but I think it is better to just stop using the exclusion-range
feature of the hardware (because it meand BIOSes are correct) and just
treat every exclusion range (also the first one) as an r/w unity range.

> But, I'm guessing you're talking about that BIOS has to define r/w
> unity-mapped regions instead of the per-device exclusions (Fix from
> BIOS, not prevent the failure from kernel). Right?

That would be best, but I fear this is too much to wish for.

Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-12 Thread Huang Adrian
Hi Joerg,

> On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel  wrote:
>
> Hi Adrian,
>
> On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
> >   2) When set_device_exclusion_range() parses the IVMD of devce id
> >  '4200', the exclusion range of the amd_iommu struct becomes:
> >
> >   iommu->exclusion_start = 0x9F58D000;
> >   iommu->exclusion_length = 0x804;
> >
> >   3) When parsing the second IVMD (device id '4300') in
> >  set_device_exclusion_range(), the exclusion range of the
> >  amd_iommu struct becomes:
> >
> >   iommu->exclusion_start = 0x9754D000;
> >   iommu->exclusion_length = 0x804;
> >
> >  This overwrites the first IVMD configuration, which leads to
> >  the failure of the first RAID controller.
>
> Okay, I think this is clearly a BIOS bug as the BIOS should not define
> two different exclusion ranges in the IVRS table.

Thanks for the review.

Yes. I understand this is a BIOS bug. The purpose of this patch is to
prevent the failure event though the system boots with the buggy BIOS.

> So there are a couple of options here:
>
> 1) Bail out and disable the IOMMU as the BIOS screwed up
>
> 2) Treat per-device exclusion ranges just as r/w unity-mapped
>regions.
>
>
> I think option 2) is the best fix here.

Yes. This is what this patch does (The first exclusion region still
uses the exclusion range while the remaining exclusion regions are
modified to be r/w unity-mapped regions when detecting multiple
exclusion regions) .

But, I'm guessing you're talking about that BIOS has to define r/w
unity-mapped regions instead of the per-device exclusions (Fix from
BIOS, not prevent the failure from kernel). Right?

-- Adrian
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-11 Thread Joerg Roedel
Hi Adrian,

On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote:
>   2) When set_device_exclusion_range() parses the IVMD of devce id
>  '4200', the exclusion range of the amd_iommu struct becomes:
> 
>   iommu->exclusion_start = 0x9F58D000;
>   iommu->exclusion_length = 0x804;
> 
>   3) When parsing the second IVMD (device id '4300') in
>  set_device_exclusion_range(), the exclusion range of the
>  amd_iommu struct becomes:
> 
>   iommu->exclusion_start = 0x9754D000;
>   iommu->exclusion_length = 0x804;
> 
>  This overwrites the first IVMD configuration, which leads to
>  the failure of the first RAID controller.

Okay, I think this is clearly a BIOS bug as the BIOS should not define
two different exclusion ranges in the IVRS table.

So there are a couple of options here:

1) Bail out and disable the IOMMU as the BIOS screwed up

2) Treat per-device exclusion ranges just as r/w unity-mapped
   regions.


I think option 2) is the best fix here.


Regards,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs

2019-11-04 Thread Adrian Huang
When attaching two Broadcom RAID controllers to a server, the first one
reports the failure during booting (the disks connecting to the RAID
controller cannot be detected):

  megaraid_sas :42:00.0: Init cmd return status FAILED for SCSI host 0
  megaraid_sas :42:00.0: Failed from megasas_init_fw 6376

Root-cause of the issue:
1) Those two RAID controllers define their own IVMDs with the
   valid exclusion range, and they are associated with the same
   IOMMU hardware:

 Subtable Type : 10 [Hardware Definition Block]
 Flags : B0
Length : 0028
  DeviceId : 4002

 Capability Offset : 0040
  Base Address : B410
 PCI Segment Group : 
   Virtualization Info : 
  Reserved : 80048F6F

Entry Type : 03
 Device ID : 4008
  Data Setting : 00

Entry Type : 04
 Device ID : 7FFE
  Data Setting : 00

 Subtable Type : 21 [Memory Definition Block]
 Flags : 08
Length : 0020
  DeviceId : 4200

Auxiliary Data : 
  Reserved : 
 Start Address : 9F58D000
 Memory Length : 0804

 Subtable Type : 21 [Memory Definition Block]
 Flags : 08
Length : 0020
  DeviceId : 4300

Auxiliary Data : 
  Reserved : 
 Start Address : 9754D000
 Memory Length : 0804

2) When set_device_exclusion_range() parses the IVMD of devce id
   '4200', the exclusion range of the amd_iommu struct becomes:

iommu->exclusion_start = 0x9F58D000;
iommu->exclusion_length = 0x804;

3) When parsing the second IVMD (device id '4300') in
   set_device_exclusion_range(), the exclusion range of the
   amd_iommu struct becomes:

iommu->exclusion_start = 0x9754D000;
iommu->exclusion_length = 0x804;

   This overwrites the first IVMD configuration, which leads to
   the failure of the first RAID controller.

This patch fixes the issue by using unity map for multiple IVMDs if
those IVMDs define the valid exclusion range (different exclusion range)
and they are associated with the same IOMMU hardware. Note that the first
IVMD still uses the exclusion range.

Signed-off-by: Adrian Huang 
---
 drivers/iommu/amd_iommu_init.c | 33 +++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 568c52317757..d65b548a42f5 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -71,6 +71,8 @@
 #define IVHD_FLAG_ISOC_EN_MASK  0x08
 
 #define IVMD_FLAG_EXCL_RANGE0x08
+#define IVMD_FLAG_IW0x04
+#define IVMD_FLAG_IR0x02
 #define IVMD_FLAG_UNITY_MAP 0x01
 
 #define ACPI_DEVFLAG_INITPASS   0x01
@@ -1110,6 +1112,32 @@ static int __init add_early_maps(void)
return 0;
 }
 
+static int __init exclusion_range_has_configured(struct amd_iommu *iommu,
+   struct ivmd_header *m)
+{
+   /* Not configure yet. */
+   if (!iommu->exclusion_start) {
+   iommu->exclusion_start = m->range_start;
+   iommu->exclusion_length = m->range_length;
+
+   return 0;
+   }
+
+   if (iommu->exclusion_start == m->range_start &&
+   iommu->exclusion_length == m->range_length)
+   return 0;
+
+   /*
+* The exclusion range of the iommu has been configured
+* by the other IVMD, so we need to use unity map for this
+* IVMD to avoid the overwritten exclusion range members of the
+* amd_iommu structure.
+*/
+   m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP);
+
+   return 1;
+}
+
 /*
  * Reads the device exclusion range from ACPI and initializes the IOMMU with
  * it
@@ -1122,14 +1150,15 @@ static void __init set_device_exclusion_range(u16 
devid, struct ivmd_header *m)
return;
 
if (iommu) {
+   if (exclusion_range_has_configured(iommu, m))
+   return;
+
/*
 * We only can configure exclusion ranges per IOMMU, not
 * per device. But we can enable the exclusion range per
 * device. This is done here
 */
set_dev_entry_bit(devid, DEV_ENTRY_EX);
-