Re: [PATCH v2 3/4] iommu/iova: Extend rbtree node caching

2017-07-31 Thread Robin Murphy
Hi Nate,

On 29/07/17 04:57, Nate Watterson wrote:
> Hi Robin,
> I am seeing a crash when performing very basic testing on this series
> with a Mellanox CX4 NIC. I dug into the crash a bit, and think this
> patch is the culprit, but this rcache business is still mostly
> witchcraft to me.
> 
> # ifconfig eth5 up
> # ifconfig eth5 down
> Unable to handle kernel NULL pointer dereference at virtual address
> 0020
> user pgtable: 64k pages, 48-bit VAs, pgd = 8007dbf47c00
> [0020] *pgd=0006efab0003, *pud=0006efab0003,
> *pmd=0007d8720003, *pte=
> Internal error: Oops: 9607 [#1] SMP
> Modules linked in:
> CPU: 47 PID: 5082 Comm: ifconfig Not tainted 4.13.0-rtp-enablement+ #3
> task: 8007da1e5780 task.stack: 8007ddcb8000
> PC is at __cached_rbnode_delete_update+0x2c/0x58
> LR is at private_free_iova+0x2c/0x60
> pc : [] lr : [] pstate: 204001c5
> sp : 8007ddcbba00
> x29: 8007ddcbba00 x28: 8007c8350210
> x27: 8007d1a8 x26: 8007dcc20800
> x25: 0140 x24: 8007c98f0008
> x23: fe4e x22: 0140
> x21: 8007c98f0008 x20: 8007c9adb240
> x19: 8007c98f0018 x18: 0010
> x17:  x16: 
> x15: 4000 x14: 
> x13:  x12: 0001
> x11: dead0200 x10: 
> x9 :  x8 : 8007c9adb1c0
> x7 : 40002000 x6 : 00210d00
> x5 :  x4 : c57e
> x3 : ffcf x2 : ffcf
> x1 : 8007c9adb240 x0 : 
> [...]
> [] __cached_rbnode_delete_update+0x2c/0x58
> [] private_free_iova+0x2c/0x60
> [] iova_magazine_free_pfns+0x4c/0xa0
> [] free_iova_fast+0x1b0/0x230
> [] iommu_dma_free_iova+0x5c/0x80
> [] __iommu_dma_unmap+0x5c/0x98
> [] iommu_dma_unmap_resource+0x24/0x30
> [] iommu_dma_unmap_page+0xc/0x18
> [] __iommu_unmap_page+0x40/0x60
> [] mlx5e_page_release+0xbc/0x128
> [] mlx5e_dealloc_rx_wqe+0x30/0x40
> [] mlx5e_close_channel+0x70/0x1f8
> [] mlx5e_close_channels+0x2c/0x50
> [] mlx5e_close_locked+0x54/0x68
> [] mlx5e_close+0x30/0x58
> [...]
> 
> ** Disassembly for __cached_rbnode_delete_update() near the fault **
>   92|if (free->pfn_hi < iovad->dma_32bit_pfn)
> 0852C6C4|ldr x3,[x1,#0x18]; x3,[free,#24]
> 0852C6C8|ldr x2,[x0,#0x30]; x2,[iovad,#48]
> 0852C6CC|cmp x3,x2
> 0852C6D0|b.cs0x0852C708
> |curr = >cached32_node;
>   94|if (!curr)
> 0852C6D4|addsx19,x0,#0x18 ; x19,iovad,#24
> 0852C6D8|b.eq0x0852C708
> |
> |cached_iova = rb_entry(*curr, struct iova, node);
> |
>   99|if (free->pfn_lo >= cached_iova->pfn_lo)
> 0852C6DC|ldr x0,[x19] ; xiovad,[curr]
> 0852C6E0|ldr x2,[x1,#0x20]; x2,[free,#32]
> 0852C6E4|ldr x0,[x0,#0x20]; x0,[x0,#32]
> Apparently cached_iova was NULL so the pfn_lo access faulted.
> 
> 0852C6E8|cmp x2,x0
> 0852C6EC|b.cc0x0852C6FC
> 0852C6F0|mov x0,x1; x0,free
>  100|*curr = rb_next(>node);
> After instrumenting the code a bit, this seems to be the culprit. In the
> previous call, free->pfn_lo was 0x_ which is actually the
> dma_limit for the domain so rb_next() returns NULL.
> 
> Let me know if you have any questions or would like additional tests
> run. I also applied your "DMA domain debug info" patches and dumped the
> contents of the domain at each of the steps above in case that would be
> useful. If nothing else, they reinforce how thirsty the CX4 NIC is
> especially when using 64k pages and many CPUs.

Thanks for the report - I somehow managed to reason myself out of
keeping the "no cached node" check in __cached_rbnode_delete_update() on
the assumption that it must always be set by a previous allocation.
However, there is indeed just one case case for which that fails: when
you free any IOVA immediately after freeing the very topmost one. Which
is something that freeing an entire magazine's worth of IOVAs back to
the tree all at once has a very real chance of doing...

The obvious straightforward fix is inline below, but I'm now starting to
understand the appeal of reserving a sentinel node to ensure the tree
can never be empty, so I might have a quick go at that to see if it
results in nicer code overall.

Robin.

>
> -Nate
> 
> On 7/21/2017 7:42 AM, Robin Murphy wrote:
>> The cached node mechanism provides a significant performance benefit 

Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

2017-07-31 Thread Baoquan He
On 07/31/17 at 12:21pm, Joerg Roedel wrote:
> Hi Baoquan,
> 
> On Mon, Jul 31, 2017 at 06:15:30PM +0800, Baoquan He wrote:
> > I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
> > early_amd_iommu_init() as below. Then in kdump kernel we don't need to
> > worry if the old amd_iommu_dev_table could be above 4G, right? And might
> > not need to check if it's above 4G, right?
> > 
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index 781a138..85d6445 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)
> >  
> > /* Device table - directly used by all IOMMUs */
> > ret = -ENOMEM;
> > -   amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > +   amd_iommu_dev_table = (void *)__get_free_pages(
> > + GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
> >   get_order(dev_table_size));
> > if (amd_iommu_dev_table == NULL)
> > goto out;
> 
> Yeah, adding GFP_DMA32 is right. But you still need to check it in the
> kdump path. Not checking it would mean you trust the old kernel, but
> since it paniced there is no reason to put any trust in what happened
> before.

You are right, it could be touched accidentally. It must be checked.
Thanks a lot for your answer!
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

2017-07-31 Thread Joerg Roedel
Hi Baoquan,

On Mon, Jul 31, 2017 at 06:15:30PM +0800, Baoquan He wrote:
> I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
> early_amd_iommu_init() as below. Then in kdump kernel we don't need to
> worry if the old amd_iommu_dev_table could be above 4G, right? And might
> not need to check if it's above 4G, right?
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index 781a138..85d6445 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)
>  
>   /* Device table - directly used by all IOMMUs */
>   ret = -ENOMEM;
> - amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> + amd_iommu_dev_table = (void *)__get_free_pages(
> +   GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
> get_order(dev_table_size));
>   if (amd_iommu_dev_table == NULL)
>   goto out;

Yeah, adding GFP_DMA32 is right. But you still need to check it in the
kdump path. Not checking it would mean you trust the old kernel, but
since it paniced there is no reason to put any trust in what happened
before.



Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 10/13] iommu/amd: Allocate memory below 4G for dev table if translation pre-enabled

2017-07-31 Thread Baoquan He
Hi Joerg,

On 07/28/17 at 05:06pm, Baoquan He wrote:
> Hi Joerg,
> 
> On 07/27/17 at 05:55pm, Joerg Roedel wrote:
> > On Fri, Jul 21, 2017 at 04:59:08PM +0800, Baoquan He wrote:
> > > AMD pointed out it's unsafe to update the device-table while iommu
> > > is enabled. It turns out that device-table pointer update is split
> > > up into two 32bit writes in the IOMMU hardware. So updating it while
> > > the IOMMU is enabled could have some nasty side effects.
> > > 
> > > The only way to work around this is to allocate the device-table below
> > > 4GB if translation is pre-enabled in kdump kernel. If allocation failed,
> > > still use the old one.
> > 
> > Not only for the kdump kernel. The old device table must also be below
> > 4GB so that its pointer can be updated with a 32bit write.
> > 
> > If the old table is above 4GB you still need the second write to zero
> > the upper parts of the pointer in hardware.
> 
> Do you mean the allocation of amd_iommu_dev_table in
> early_amd_iommu_init() also need be addressed for 1st kernel? Seems we
> don't make sure that for 1st kernel, like adding GFP_DMA32 flag when
> allocate amd_iommu_dev_table in amd_iommu_dev_table
> early_amd_iommu_init().

I plan to add GFP_DMA32 when allocate amd_iommu_dev_table in
early_amd_iommu_init() as below. Then in kdump kernel we don't need to
worry if the old amd_iommu_dev_table could be above 4G, right? And might
not need to check if it's above 4G, right?

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 781a138..85d6445 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2436,7 +2436,8 @@ static int __init early_amd_iommu_init(void)
 
/* Device table - directly used by all IOMMUs */
ret = -ENOMEM;
-   amd_iommu_dev_table = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+   amd_iommu_dev_table = (void *)__get_free_pages(
+ GFP_KERNEL | __GFP_ZERO | GFP_DMA32,
  get_order(dev_table_size));
if (amd_iommu_dev_table == NULL)
goto out;
-- 
2.9.4


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 02/13] iommu/amd: add several helper functions

2017-07-31 Thread Baoquan He
Hi Joerg,

On 07/27/17 at 05:06pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:00PM +0800, Baoquan He wrote:
> > Move single iommu enabling codes into a wrapper function 
> > early_enable_iommu().
> > This can make later kdump change easier.
> > 
> > And also add iommu_disable_command_buffer and iommu_disable_event_buffer
> > for later usage.
> > 
> > Signed-off-by: Baoquan He 
> > ---
> >  drivers/iommu/amd_iommu_init.c | 42 
> > +++---
> >  1 file changed, 31 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index e39857ce6481..4ca6e3257d92 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -634,6 +634,14 @@ static void iommu_enable_command_buffer(struct 
> > amd_iommu *iommu)
> > amd_iommu_reset_cmd_buffer(iommu);
> >  }
> >  
> > +/*
> > + * This function disables the command buffer
> > + */
> > +static void iommu_disable_command_buffer(struct amd_iommu *iommu)
> > +{
> > +   iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
> > +}
> > +
> >  static void __init free_command_buffer(struct amd_iommu *iommu)
> >  {
> > free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
> > @@ -666,6 +674,14 @@ static void iommu_enable_event_buffer(struct amd_iommu 
> > *iommu)
> > iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
> >  }
> >  
> > +/*
> > + * This function disables the command buffer
> 
> s/command buffer/event log/
> 
> > + */
> > +static void iommu_disable_event_buffer(struct amd_iommu *iommu)
> 
> Please also use event_log here.

I found the event log related handling functions all take
xxx_event_buffer names, like:
alloc_event_buffer()
iommu_enable_event_buffer()
free_event_buffer()

So for consistency, I plan to still use iommu_disable_event_buffer().
Maybe later we can change them in a clean up patch independently.

Thanks
Baoquan

> 
> > +{
> > +   iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
> > +}
> > +
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/3] memory: mtk-smi: add larbid handle routine

2017-07-31 Thread Matthias Brugger



On 07/28/2017 12:55 PM, honghui.zh...@mediatek.com wrote:

From: Honghui Zhang 

In the commit 3c8f4ad85c4b ("memory/mediatek: add support for mt2701"),
the larb->larbid was added but not initialized.
Mediatek's gen1 smi need this hardware larbid information to get the
register offset which controls whether enable iommu for this larb.
This patch add the initialize routine for larbid.

Signed-off-by: Honghui Zhang 


Reviewed-by: Matthias Brugger 


---
  drivers/memory/mtk-smi.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
index 2b798bb4..8f75aaa 100644
--- a/drivers/memory/mtk-smi.c
+++ b/drivers/memory/mtk-smi.c
@@ -42,6 +42,7 @@
  #define SMI_SECUR_CON_VAL_DOMAIN(id)  (0x3 << id) & 0x7) << 2) + 1))
  
  struct mtk_smi_larb_gen {

+   bool need_larbid;
int port_in_larb[MTK_LARB_NR_MAX + 1];
void (*config_port)(struct device *);
  };
@@ -214,6 +215,7 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt8173 = {
  };
  
  static const struct mtk_smi_larb_gen mtk_smi_larb_mt2701 = {

+   .need_larbid = true,
.port_in_larb = {
LARB0_PORT_OFFSET, LARB1_PORT_OFFSET,
LARB2_PORT_OFFSET, LARB3_PORT_OFFSET
@@ -240,6 +242,7 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
struct device *dev = >dev;
struct device_node *smi_node;
struct platform_device *smi_pdev;
+   int err;
  
  	if (!dev->pm_domain)

return -EPROBE_DEFER;
@@ -263,6 +266,15 @@ static int mtk_smi_larb_probe(struct platform_device *pdev)
return PTR_ERR(larb->smi.clk_smi);
larb->smi.dev = dev;
  
+	if (larb->larb_gen->need_larbid) {

+   err = of_property_read_u32(dev->of_node, "mediatek,larbid",
+  >larbid);
+   if (err) {
+   dev_err(dev, "missing larbid property\n");
+   return err;
+   }
+   }
+
smi_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0);
if (!smi_node)
return -EINVAL;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu