RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

2020-12-06 Thread Merger, Edgar [AUTOSOL/MAS/AUGS]
Hi Alex,

I believe in the patch file, this
+   (pdev->subsystem_device == 0x0c19 ||
+pdev->subsystem_device == 0x0c10))

Has to be changed to:
+   (pdev->subsystem_device == 0xce19 ||
+pdev->subsystem_device == 0xcc10))

Because our SSIDs are "ea50:ce19" and "ea50:cc10" respectively and another one 
would "ea50:cc08". 

I will apply that patch and feedback the results soon plus the patch file that 
I actually had applied.


-Original Message-
From: Deucher, Alexander  
Sent: Montag, 30. November 2020 19:36
To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; Huang, Ray 
; Kuehling, Felix 
Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
linux-...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas 
; Joerg Roedel ; Zhu, Changfeng 

Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken

[AMD Public Use]

> -Original Message-
> From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
> Sent: Thursday, November 26, 2020 4:24 AM
> To: Deucher, Alexander ; Huang, Ray 
> ; Kuehling, Felix 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> Alex,
> 
> This is pretty much the same patch as what I have received from Joerg 
> previously, except that it is tied to the particular Emerson platform 
> and its derivatives (listed with Subsystem IDs).

Right.  As per my original point, I don't want to disable ATS on all Picasso 
chips because doing so would break GPU compute on them, so I'd like to apply 
this quirk as narrowly as possible.

> 
> Below patch was what Joerg provided me and I successfully tested.
> 
> This diff to the kernel should do that:
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index 
> f70692ac79c5..3911b0ec57ba 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5176,6 +5176,8 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI,
> 0x6900, quirk_amd_harvest_no_ats);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7312, 
> quirk_amd_harvest_no_ats);
>  /* AMD Navi14 dGPU */
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7340, 
> quirk_amd_harvest_no_ats);
> +/* AMD Raven platform iGPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x15d8, 
> +quirk_amd_harvest_no_ats);
>  #endif /* CONFIG_PCI_ATS */
> 
>  /* Freescale PCIe doesn't support MSI in RC mode */
> 
> So far I have seen this issue on two instances of this chip, but I 
> must admit that I did test only two of them to this extent, so I guess 
> it is not a bad chip in particular, but the chips we use are from the 
> same production lot, so it might be a systematical problem of that production 
> lot?
> 
> UEFI-Setup shows:
> Processor Family: 17h
> Procossor Model: 20h - 2Fh
> CPUID: 00820F01
> Microcode Patch Level: 8200103
> 
> Looking at the chip-die I found that this is a fully qualified IP 
> Silicon (according to Ryzen Embedded R1000 SOC Interlock).
> YE1305C9T20FG
> BI2015SUY
> 9JB6496P00123
> 2016 AMD
> DIFFUSED IN USA
> MADE IN CHINA
> 
> Currently used SBIOS is a branch from "EmbeddedPI-FP5 1.2.0.3RC3".
> 
> In the future our SBIOS might merge with EmbeddedPI-FP5_1.2.0.5RC3.
> 

I think it's more likely an sbios issue, so hopefully the new release fixes it.

Alex

> 
> 
> 
> -Original Message-
> From: Deucher, Alexander 
> Sent: Mittwoch, 25. November 2020 17:08
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] ; 
> Huang, Ray ; Kuehling, Felix 
> 
> Cc: Will Deacon ; linux-ker...@vger.kernel.org; 
> linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> Helgaas ; Joerg Roedel ; Zhu, 
> Changfeng 
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> broken
> 
> [AMD Public Use]
> 
> > -Original Message-
> > From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> 
> > Sent: Wednesday, November 25, 2020 5:04 AM
> > To: Deucher, Alexander ; Huang, Ray 
> > ; Kuehling, Felix 
> > Cc: Will Deacon ; linux-ker...@vger.kernel.org;
> > linux- p...@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn 
> > Helgaas ; Joerg Roedel ; Zhu, 
> > Changfeng 
> > Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as 
> > broken
> >
> > I do have also other problems with this unit, when IOMMU is enabled 
> > and pci=noats is not set as kernel parameter.
> >
> > [ 2004.265906] amdgpu :0b:00.0: [drm:amdgpu_ib_ring_tests 
> > [amdgpu]]
> > *ERROR* IB test failed on gfx (-110).
> > [ 2004.266024] [drm:amdgpu_device_delayed_init_work_handler
> [amdgpu]]
> > *ERROR* ib ring test failed (-110).
> >
> 
> Is this seen on all instances of this chip or only specific silicon?  
> I.e., could this be a bad chip?  Would it be possible to test a newer 
> sbios?  I think the attached patch should work if we can't get it 
> fixed on the platform side.  It should only enable the quirk on your 
> partic

Re: [PATCH] iommu/amd: Increase interrupt remapping table limit to 512 entries

2020-12-06 Thread Suravee Suthikulpanit

Jerry,

On 12/2/20 6:53 AM, Jerry Snitselaar wrote:


Suravee Suthikulpanit @ 2020-10-14 19:50 MST:


Certain device drivers allocate IO queues on a per-cpu basis.
On AMD EPYC platform, which can support up-to 256 cpu threads,
this can exceed the current MAX_IRQ_PER_TABLE limit of 256,
and result in the error message:

 AMD-Vi: Failed to allocate IRTE

This has been observed with certain NVME devices.

AMD IOMMU hardware can actually support upto 512 interrupt
remapping table entries. Therefore, update the driver to
match the hardware limit.

Please note that this also increases the size of interrupt remapping
table to 8KB per device when using the 128-bit IRTE format.

Signed-off-by: Suravee Suthikulpanit 
---
  drivers/iommu/amd/amd_iommu_types.h | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/amd_iommu_types.h 
b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..427484c45589 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -406,7 +406,11 @@ extern bool amd_iommu_np_cache;
  /* Only true if all IOMMUs support device IOTLBs */
  extern bool amd_iommu_iotlb_sup;
  
-#define MAX_IRQS_PER_TABLE	256

+/*
+ * AMD IOMMU hardware only support 512 IRTEs despite
+ * the architectural limitation of 2048 entries.
+ */
+#define MAX_IRQS_PER_TABLE 512
  #define IRQ_TABLE_ALIGNMENT   128
  
  struct irq_remap_table {


With this change should DTE_IRQ_TABLE_LEN be changed to 9? IIUC the spec
correctly leaving it at 8 is saying the table is 256 entries long.


You are correct. Sorry I missed this part. I'll send the fix-up patch ASAP.

Thank you,
Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] drivers/iommu: fix a null-ptr-deref bug in msm_iommu.c

2020-12-06 Thread tangzhenhao
At line 600 in drivers/iommu/msm_iommu.c, the ret-val of kzalloc should be 
checked to avoid null-ptr-deref bug.

Signed-off-by: tangzhenhao 
---
 drivers/iommu/msm_iommu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 3615cd6241c4..e3c576e5babb 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -598,6 +598,10 @@ static void insert_iommu_master(struct device *dev,
 
if (list_empty(&(*iommu)->ctx_list)) {
master = kzalloc(sizeof(*master), GFP_ATOMIC);
+   if (!master) {
+   dev_err(dev, "Failed to allocate IOMMU context bank 
instance\n");
+   return;
+   }
master->of_node = dev->of_node;
list_add(&master->list, &(*iommu)->ctx_list);
dev_iommu_priv_set(dev, master);
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu