Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
On Wed, Nov 13, 2019 at 12:05 AM Joerg Roedel wrote: > > On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote: > > > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote: > > > > So there are a couple of options here: > > > > > > 1) Bail out and disable the IOMMU as the BIOS screwed up > > > > > > 2) Treat per-device exclusion ranges just as r/w unity-mapped > > >regions. > > > > > > > > > I think option 2) is the best fix here. > > > > Yes. This is what this patch does (The first exclusion region still > > uses the exclusion range while the remaining exclusion regions are > > modified to be r/w unity-mapped regions when detecting multiple > > exclusion regions) . > > Yeah, but I think it is better to just stop using the exclusion-range > feature of the hardware (because it meand BIOSes are correct) and just > treat every exclusion range (also the first one) as an r/w unity range. Okay, I see. If you're okay with this (treat per-device exclusion ranges as r/w unity-mapped regions - including the first one), I can prepare the patch. > > > But, I'm guessing you're talking about that BIOS has to define r/w > > unity-mapped regions instead of the per-device exclusions (Fix from > > BIOS, not prevent the failure from kernel). Right? > > That would be best, but I fear this is too much to wish for. Totally agree. That's why I submit this patch. :-) -- Adrian ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
On Tue, Nov 12, 2019 at 05:32:31PM +0800, Huang Adrian wrote: > > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote: > > So there are a couple of options here: > > > > 1) Bail out and disable the IOMMU as the BIOS screwed up > > > > 2) Treat per-device exclusion ranges just as r/w unity-mapped > >regions. > > > > > > I think option 2) is the best fix here. > > Yes. This is what this patch does (The first exclusion region still > uses the exclusion range while the remaining exclusion regions are > modified to be r/w unity-mapped regions when detecting multiple > exclusion regions) . Yeah, but I think it is better to just stop using the exclusion-range feature of the hardware (because it meand BIOSes are correct) and just treat every exclusion range (also the first one) as an r/w unity range. > But, I'm guessing you're talking about that BIOS has to define r/w > unity-mapped regions instead of the per-device exclusions (Fix from > BIOS, not prevent the failure from kernel). Right? That would be best, but I fear this is too much to wish for. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
Hi Joerg, > On Mon, Nov 11, 2019 at 11:22 PM Joerg Roedel wrote: > > Hi Adrian, > > On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote: > > 2) When set_device_exclusion_range() parses the IVMD of devce id > > '4200', the exclusion range of the amd_iommu struct becomes: > > > > iommu->exclusion_start = 0x9F58D000; > > iommu->exclusion_length = 0x804; > > > > 3) When parsing the second IVMD (device id '4300') in > > set_device_exclusion_range(), the exclusion range of the > > amd_iommu struct becomes: > > > > iommu->exclusion_start = 0x9754D000; > > iommu->exclusion_length = 0x804; > > > > This overwrites the first IVMD configuration, which leads to > > the failure of the first RAID controller. > > Okay, I think this is clearly a BIOS bug as the BIOS should not define > two different exclusion ranges in the IVRS table. Thanks for the review. Yes. I understand this is a BIOS bug. The purpose of this patch is to prevent the failure event though the system boots with the buggy BIOS. > So there are a couple of options here: > > 1) Bail out and disable the IOMMU as the BIOS screwed up > > 2) Treat per-device exclusion ranges just as r/w unity-mapped >regions. > > > I think option 2) is the best fix here. Yes. This is what this patch does (The first exclusion region still uses the exclusion range while the remaining exclusion regions are modified to be r/w unity-mapped regions when detecting multiple exclusion regions) . But, I'm guessing you're talking about that BIOS has to define r/w unity-mapped regions instead of the per-device exclusions (Fix from BIOS, not prevent the failure from kernel). Right? -- Adrian ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
Hi Adrian, On Mon, Nov 04, 2019 at 01:58:52PM +0800, Adrian Huang wrote: > 2) When set_device_exclusion_range() parses the IVMD of devce id > '4200', the exclusion range of the amd_iommu struct becomes: > > iommu->exclusion_start = 0x9F58D000; > iommu->exclusion_length = 0x804; > > 3) When parsing the second IVMD (device id '4300') in > set_device_exclusion_range(), the exclusion range of the > amd_iommu struct becomes: > > iommu->exclusion_start = 0x9754D000; > iommu->exclusion_length = 0x804; > > This overwrites the first IVMD configuration, which leads to > the failure of the first RAID controller. Okay, I think this is clearly a BIOS bug as the BIOS should not define two different exclusion ranges in the IVRS table. So there are a couple of options here: 1) Bail out and disable the IOMMU as the BIOS screwed up 2) Treat per-device exclusion ranges just as r/w unity-mapped regions. I think option 2) is the best fix here. Regards, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/1] iommu/amd: Fix the overwritten exclusion range with multiple IVMDs
When attaching two Broadcom RAID controllers to a server, the first one reports the failure during booting (the disks connecting to the RAID controller cannot be detected): megaraid_sas :42:00.0: Init cmd return status FAILED for SCSI host 0 megaraid_sas :42:00.0: Failed from megasas_init_fw 6376 Root-cause of the issue: 1) Those two RAID controllers define their own IVMDs with the valid exclusion range, and they are associated with the same IOMMU hardware: Subtable Type : 10 [Hardware Definition Block] Flags : B0 Length : 0028 DeviceId : 4002 Capability Offset : 0040 Base Address : B410 PCI Segment Group : Virtualization Info : Reserved : 80048F6F Entry Type : 03 Device ID : 4008 Data Setting : 00 Entry Type : 04 Device ID : 7FFE Data Setting : 00 Subtable Type : 21 [Memory Definition Block] Flags : 08 Length : 0020 DeviceId : 4200 Auxiliary Data : Reserved : Start Address : 9F58D000 Memory Length : 0804 Subtable Type : 21 [Memory Definition Block] Flags : 08 Length : 0020 DeviceId : 4300 Auxiliary Data : Reserved : Start Address : 9754D000 Memory Length : 0804 2) When set_device_exclusion_range() parses the IVMD of devce id '4200', the exclusion range of the amd_iommu struct becomes: iommu->exclusion_start = 0x9F58D000; iommu->exclusion_length = 0x804; 3) When parsing the second IVMD (device id '4300') in set_device_exclusion_range(), the exclusion range of the amd_iommu struct becomes: iommu->exclusion_start = 0x9754D000; iommu->exclusion_length = 0x804; This overwrites the first IVMD configuration, which leads to the failure of the first RAID controller. This patch fixes the issue by using unity map for multiple IVMDs if those IVMDs define the valid exclusion range (different exclusion range) and they are associated with the same IOMMU hardware. Note that the first IVMD still uses the exclusion range. Signed-off-by: Adrian Huang --- drivers/iommu/amd_iommu_init.c | 33 +++-- 1 file changed, 31 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c index 568c52317757..d65b548a42f5 100644 --- a/drivers/iommu/amd_iommu_init.c +++ b/drivers/iommu/amd_iommu_init.c @@ -71,6 +71,8 @@ #define IVHD_FLAG_ISOC_EN_MASK 0x08 #define IVMD_FLAG_EXCL_RANGE0x08 +#define IVMD_FLAG_IW0x04 +#define IVMD_FLAG_IR0x02 #define IVMD_FLAG_UNITY_MAP 0x01 #define ACPI_DEVFLAG_INITPASS 0x01 @@ -1110,6 +1112,32 @@ static int __init add_early_maps(void) return 0; } +static int __init exclusion_range_has_configured(struct amd_iommu *iommu, + struct ivmd_header *m) +{ + /* Not configure yet. */ + if (!iommu->exclusion_start) { + iommu->exclusion_start = m->range_start; + iommu->exclusion_length = m->range_length; + + return 0; + } + + if (iommu->exclusion_start == m->range_start && + iommu->exclusion_length == m->range_length) + return 0; + + /* +* The exclusion range of the iommu has been configured +* by the other IVMD, so we need to use unity map for this +* IVMD to avoid the overwritten exclusion range members of the +* amd_iommu structure. +*/ + m->flags = (IVMD_FLAG_IW | IVMD_FLAG_IR | IVMD_FLAG_UNITY_MAP); + + return 1; +} + /* * Reads the device exclusion range from ACPI and initializes the IOMMU with * it @@ -1122,14 +1150,15 @@ static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m) return; if (iommu) { + if (exclusion_range_has_configured(iommu, m)) + return; + /* * We only can configure exclusion ranges per IOMMU, not * per device. But we can enable the exclusion range per * device. This is done here */ set_dev_entry_bit(devid, DEV_ENTRY_EX); -