Re: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted

2019-10-16 Thread Jason Wang


On 2019/10/15 3:35 PM, Christoph Hellwig wrote:

On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:

From: Thiago Jung Bauermann 

Normally, virtio enables the DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
be set by both the device and the guest driver. However, as a hack, when the
DMA API returns physical addresses, the guest driver can use the DMA API even
though the device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
addresses.
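
For context, the guest-side decision being described boils down to something
like the sketch below (2019-era virtio_ring naming; the encryption check is
only an illustration of the proposed hack, not the exact patch):

/*
 * Sketch only: roughly how drivers/virtio/virtio_ring.c decides whether to
 * go through the DMA API.  mem_encrypt_active() stands in for whatever
 * "memory is encrypted" test the patch actually uses.
 */
static bool vring_use_dma_api(struct virtio_device *vdev)
{
	/* Device negotiated VIRTIO_F_IOMMU_PLATFORM: always use the DMA API. */
	if (!virtio_has_iommu_quirk(vdev))
		return true;

	/* Xen grant mappings need the DMA API regardless. */
	if (xen_domain())
		return true;

	/*
	 * Proposed hack: with encrypted guest memory, use the DMA API for
	 * bounce buffering even though the device only speaks physical
	 * addresses.
	 */
	if (mem_encrypt_active())
		return true;

	return false;
}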

Sorry, but this is a complete bullshit hack.  Drivers must always use
the DMA API if they do DMA, and if virtio devices use physical addresses
that needs to be returned through the platform firmware interfaces for
the DMA setup.  If you don't do that yet (which, based on previous
information, you don't), you need to fix it, and we can then quirk
old implementations that are already out in the field.

In other words: we finally need to fix that virtio mess and not pile
hacks on top of hacks.



I agree; the only reason for IOMMU_PLATFORM is to make sure the guest works
with some old and buggy QEMU when vIOMMU is enabled, which seems useless
(note we don't even support vIOMMU migration in that case).


Thanks


[PATCH] drivers: iommu: hyperv: Make HYPERV_IOMMU only available on x86

2019-10-16 Thread Boqun Feng
Currently hyperv-iommu is implemented in an x86-specific way; for example,
the APIC is used. So make the HYPERV_IOMMU Kconfig option depend on X86 as
a preparation for enabling Hyper-V on architectures other than x86.

Cc: Lan Tianyu 
Cc: Michael Kelley 
Cc: linux-hyp...@vger.kernel.org
Signed-off-by: Boqun Feng (Microsoft) 
---

Without this patch, I could observe the following compile error after
applying Michael's ARM64-on-Hyper-V enablement patchset:

| drivers/iommu/hyperv-iommu.c:17:10: fatal error: asm/apic.h: No such
| file or directory
|   17 | #include <asm/apic.h>
|      |          ^~~~~~~~~~~~

 drivers/iommu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index e3842eabcfdd..f1086eaed41c 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -467,7 +467,7 @@ config QCOM_IOMMU
 
 config HYPERV_IOMMU
bool "Hyper-V x2APIC IRQ Handling"
-   depends on HYPERV
+   depends on HYPERV && X86
select IOMMU_API
default HYPERV
help
-- 
2.23.0



Re: [PATCH -next] iommu/amd: fix a warning in increase_address_space

2019-10-16 Thread Jerry Snitselaar

On Wed Oct 16 19, Jerry Snitselaar wrote:

On Wed Oct 16 19, Qian Cai wrote:


BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated"
looks suspicious. I'm not sure what the purpose of it is.

*updated = increase_address_space(domain, gfp) || *updated;



Looking at it again, I think that isn't really an issue; it just avoids
losing 'updated' having been set in a previous loop iteration. But now
I'm wondering about the loop itself: in the cases where
increase_address_space() returns false, how does the evaluation of the
condition for the while loop change?



I guess the PAGE_MODE_6_LEVEL check is really for other potential callers
of increase_address_space() (none exist at the moment), and the condition
of the while loop in alloc_pte() should fail once the mode level is 6.
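
To make the loop behaviour concrete, here is a minimal stand-alone model of
what is being discussed (not the driver code; PM_LEVEL_SIZE() mirrors the
macro in amd_iommu_types.h):

#include <stdbool.h>
#include <stdio.h>

#define PAGE_MODE_6_LEVEL 6
/* address range covered by a page table of the given mode, as in the driver */
#define PM_LEVEL_SIZE(x) (((x) < 6) ? ((1ULL << (12 + (x) * 9)) - 1) : ~0ULL)

static int mode = 5;

static bool increase_address_space(void)
{
	if (mode == PAGE_MODE_6_LEVEL)
		return false;	/* address space already 64 bit large */
	mode += 1;
	return true;
}

int main(void)
{
	unsigned long long address = 1ULL << 57;	/* needs mode 6 */
	bool updated = false;

	/* same shape as the loop in alloc_pte() */
	while (address > PM_LEVEL_SIZE(mode))
		updated = increase_address_space() || updated;

	/*
	 * Once mode reaches 6, PM_LEVEL_SIZE(6) is ~0ULL, so the condition
	 * goes false even when increase_address_space() returns false: the
	 * loop terminates instead of spinning.
	 */
	printf("mode=%d updated=%d\n", mode, updated);
	return 0;
}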



Re: [PATCH -next] iommu/amd: fix a warning in increase_address_space

2019-10-16 Thread Jerry Snitselaar

On Wed Oct 16 19, Qian Cai wrote:

After commit 754265bcab78 ("iommu/amd: Fix race in
increase_address_space()"), it is still possible to trigger a race
condition under heavy memory pressure, as shown below. The race that
triggers the warning is:

CPU0:                                           CPU1:
in alloc_pte():                                 in increase_address_space():
while (address > PM_LEVEL_SIZE(domain->mode)) [1]
                                                spin_lock_irqsave(&domain->lock
                                                domain->mode += 1;
                                                spin_unlock_irqrestore(&domain->lock
in increase_address_space():
spin_lock_irqsave(&domain->lock
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))

[1] domain->mode = 5

It is unclear yet whether triggering the warning is the root cause of the
smartpqi offline, but let's fix it first by lifting the locking.

WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474
iommu_map_page+0x718/0x7e0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec flags=0x0010]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1000 flags=0x0010]
CPU: 57 PID: 124314 Comm: oom01 Tainted: G   O
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
RIP: 0010:iommu_map_page+0x718/0x7e0
Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8
08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48
8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
RSP: 0018:888da4816cb8 EFLAGS: 00010046
RAX:  RBX: 8885fe689000 RCX: 96f4a6c4
RDX: 0007 RSI: dc00 RDI: 8885fe689124
RBP: 888da4816da8 R08: ed10bfcd120e R09: ed10bfcd120e
R10: ed10bfcd120d R11: 8885fe68906b R12: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1a00 flags=0x0010]
R13: 8885fe689068 R14: 8885fe689124 R15: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1e00 flags=0x0010]
FS:  7f29722ba700() GS:88902f88()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f27f82d8000 CR3: 00102ed9c000 CR4: 003406e0
Call Trace:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2000 flags=0x0010]
map_sg+0x1ce/0x2f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2400 flags=0x0010]
scsi_dma_map+0xd7/0x160
pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2800 flags=0x0010]
pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2c00 flags=0x0010]
scsi_queue_rq+0xd19/0x1360
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3000 flags=0x0010]
__blk_mq_try_issue_directly+0x295/0x3f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3400 flags=0x0010]
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3800 flags=0x0010]
blk_mq_request_issue_directly+0xb5/0x100
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3c00 flags=0x0010]
blk_mq_try_issue_list_directly+0xa9/0x160
blk_mq_sched_insert_requests+0x228/0x380
blk_mq_flush_plug_list+0x448/0x7e0
blk_flush_plug_list+0x1eb/0x230
blk_finish_plug+0x43/0x5d
shrink_node_memcg+0x9c5/0x1550
smartpqi :23:00.0: controller is offline: status code 0x14803
smartpqi :23:00.0: controller offline

Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Qian Cai 
---

BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated"
looks suspicious. I'm not sure what the purpose of it is.

*updated = increase_address_space(domain, gfp) || *updated;



Looking at it again, I think that isn't really an issue; it just avoids
losing 'updated' having been set in a previous loop iteration. But now
I'm wondering about the loop itself: in the cases where
increase_address_space() returns false, how does the evaluation of the
condition for the while loop change?


drivers/iommu/amd_iommu.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2369b8af81f3..a5754068aa29 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain 
*domain)
static bool increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
{
-   unsigned long flags;
bool ret = false;
u64 *pte;

-   spin_lock_irqsave(&domain->lock, flags);
-
if (WARN_ON_ONCE(domain->mode == 

[PATCH -next] iommu/amd: fix a warning in increase_address_space

2019-10-16 Thread Qian Cai
After commit 754265bcab78 ("iommu/amd: Fix race in
increase_address_space()"), it is still possible to trigger a race
condition under heavy memory pressure, as shown below. The race that
triggers the warning is:

CPU0:                                           CPU1:
in alloc_pte():                                 in increase_address_space():
while (address > PM_LEVEL_SIZE(domain->mode)) [1]
                                                spin_lock_irqsave(&domain->lock
                                                domain->mode += 1;
                                                spin_unlock_irqrestore(&domain->lock
in increase_address_space():
spin_lock_irqsave(&domain->lock
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))

[1] domain->mode = 5

It is unclear yet whether triggering the warning is the root cause of the
smartpqi offline, but let's fix it first by lifting the locking.

WARNING: CPU: 57 PID: 124314 at drivers/iommu/amd_iommu.c:1474
iommu_map_page+0x718/0x7e0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec flags=0x0010]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1000 flags=0x0010]
CPU: 57 PID: 124314 Comm: oom01 Tainted: G   O
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40
07/10/2019
RIP: 0010:iommu_map_page+0x718/0x7e0
Code: 88 a5 70 ff ff ff e9 5d fa ff ff 48 8b b5 70 ff ff ff 4c 89 ef e8
08 32 2f 00 41 80 fc 01 0f 87 b7 3d 00 00 41 83 e4 01 eb be <0f> 0b 48
8b b5 70 ff ff ff 4c 89 ef e8 e7 31 2f 00 eb dd 0f 0b 48
RSP: 0018:888da4816cb8 EFLAGS: 00010046
RAX:  RBX: 8885fe689000 RCX: 96f4a6c4
RDX: 0007 RSI: dc00 RDI: 8885fe689124
RBP: 888da4816da8 R08: ed10bfcd120e R09: ed10bfcd120e
R10: ed10bfcd120d R11: 8885fe68906b R12: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1a00 flags=0x0010]
R13: 8885fe689068 R14: 8885fe689124 R15: 
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec1e00 flags=0x0010]
FS:  7f29722ba700() GS:88902f88()
knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f27f82d8000 CR3: 00102ed9c000 CR4: 003406e0
Call Trace:
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2000 flags=0x0010]
 map_sg+0x1ce/0x2f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2400 flags=0x0010]
 scsi_dma_map+0xd7/0x160
 pqi_raid_submit_scsi_cmd_with_io_request+0x1b8/0x420 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2800 flags=0x0010]
 pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec2c00 flags=0x0010]
 scsi_queue_rq+0xd19/0x1360
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3000 flags=0x0010]
 __blk_mq_try_issue_directly+0x295/0x3f0
smartpqi :23:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x
address=0xffec3400 flags=0x0010]
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3800 flags=0x0010]
 blk_mq_request_issue_directly+0xb5/0x100
AMD-Vi: Event logged [IO_PAGE_FAULT device=23:00.0 domain=0x
address=0xffec3c00 flags=0x0010]
 blk_mq_try_issue_list_directly+0xa9/0x160
 blk_mq_sched_insert_requests+0x228/0x380
 blk_mq_flush_plug_list+0x448/0x7e0
 blk_flush_plug_list+0x1eb/0x230
 blk_finish_plug+0x43/0x5d
 shrink_node_memcg+0x9c5/0x1550
smartpqi :23:00.0: controller is offline: status code 0x14803
smartpqi :23:00.0: controller offline

Fixes: 754265bcab78 ("iommu/amd: Fix race in increase_address_space()")
Signed-off-by: Qian Cai 
---

BTW, Joerg, this line from the commit "iommu/amd: Remove domain->updated"
looks suspicious. I'm not sure what the purpose of it is.

*updated = increase_address_space(domain, gfp) || *updated;

 drivers/iommu/amd_iommu.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 2369b8af81f3..a5754068aa29 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1465,12 +1465,9 @@ static void free_pagetable(struct protection_domain 
*domain)
 static bool increase_address_space(struct protection_domain *domain,
   gfp_t gfp)
 {
-   unsigned long flags;
bool ret = false;
u64 *pte;
 
-   spin_lock_irqsave(&domain->lock, flags);
-
if (WARN_ON_ONCE(domain->mode == PAGE_MODE_6_LEVEL))
/* address space already 64 bit large */
goto out;
@@ -1487,8 +1484,6 @@ static bool increase_address_space(struct 
protection_domain *domain,
ret = true;
 
 out:
-   spin_unlock_irqrestore(&domain->lock, flags);
-
return ret;
 }
 
@@ -1499,14 

[bug report] iommu/amd: Pass gfp-flags to iommu_map_page()

2019-10-16 Thread Dan Carpenter
Hello Joerg Roedel,

The patch b911b89b6d01: "iommu/amd: Pass gfp-flags to
iommu_map_page()" from Jul 5, 2016, leads to the following static
checker warning:

drivers/iommu/amd_iommu.c:2545 amd_iommu_map()
warn: use 'gfp' here instead of GFP_XXX?

drivers/iommu/amd_iommu.c
  2529  static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
  2530   phys_addr_t paddr, size_t page_size, int 
iommu_prot,
  2531   gfp_t gfp)
 ^
I don't know why I'm suddenly getting this warning even though the code
is three years old...

  2532  {
  2533  struct protection_domain *domain = to_pdomain(dom);
  2534  int prot = 0;
  2535  int ret;
  2536  
  2537  if (domain->mode == PAGE_MODE_NONE)
  2538  return -EINVAL;
  2539  
  2540  if (iommu_prot & IOMMU_READ)
  2541  prot |= IOMMU_PROT_IR;
  2542  if (iommu_prot & IOMMU_WRITE)
  2543  prot |= IOMMU_PROT_IW;
  2544  
  2545  ret = iommu_map_page(domain, iova, paddr, page_size, prot, 
GFP_KERNEL);
   
^^
But it does seem like maybe gfp was intended here?

  2546  
  2547  domain_flush_np_cache(domain, iova, page_size);
  2548  
  2549  return ret;
  2550  }

See also:
drivers/iommu/dma-iommu.c:625 iommu_dma_alloc_remap() warn: use 'gfp' here 
instead of GFP_XXX?


regards,
dan carpenter


[linux-next:master 4470/4908] drivers/iommu/iommu.c:1857:5: sparse: sparse: symbol '__iommu_map' was not declared. Should it be static?

2019-10-16 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 
master
head:   78662bffde37ccbb66ac3311fa709d8960435e98
commit: 781ca2de89bae1b1d2c96df9ef33e9a324415995 [4470/4908] iommu: Add gfp 
parameter to iommu_ops::map
reproduce:
# apt-get install sparse
# sparse version: v0.6.1-dirty
git checkout 781ca2de89bae1b1d2c96df9ef33e9a324415995
make ARCH=x86_64 allmodconfig
make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 


sparse warnings: (new ones prefixed by >>)

>> drivers/iommu/iommu.c:1857:5: sparse: sparse: symbol '__iommu_map' was not declared. Should it be static?
   drivers/iommu/iommu.c:292:5: sparse: sparse: symbol 'iommu_insert_resv_region' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


[RFC PATCH linux-next] iommu: Add gfp parameter to iommu_ops:: __iommu_map() can be static

2019-10-16 Thread kbuild test robot


Fixes: 781ca2de89ba ("iommu: Add gfp parameter to iommu_ops::map")
Signed-off-by: kbuild test robot 
---
 iommu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f8853dbf1c4eb..89bfdbbfaa11b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1854,7 +1854,7 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
return pgsize;
 }
 
-int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
  phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
 {
const struct iommu_ops *ops = domain->ops;


Re: [PATCH] iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved

2019-10-16 Thread Chen, Yian




On 10/16/2019 12:51 AM, Joerg Roedel wrote:

Hi,

On Tue, Oct 15, 2019 at 09:49:32AM -0700, Yian Chen wrote:

VT-d RMRR (Reserved Memory Region Reporting) regions are reserved for
device use only and should not be part of the allocable memory pool of
the OS.

The BIOS e820_table reports the complete memory map to the OS, including
OS-usable memory ranges and BIOS-reserved memory ranges, etc.

The x86 BIOS may not be trusted to include RMRR regions as reserved-type
memory in its e820 memory map, hence validate every RMRR entry against
the e820 memory map to make sure the RMRR regions will not be used by
the OS for any other purpose.

Are there real systems in the wild where this is a problem?
Firmware reports e820 and RMRR in separate structures. The system will
not work stably if an RMRR region is not in the e820 table as
reserved-type memory and can therefore be used for other purposes.
During the system engineering phase I ran into some BIOS bugs of this
kind, though not exactly the same as the one here. Please consider this
a generic patch to avoid subsequent failures at runtime.

+static inline int __init
+arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
+{
+   u64 start = rmrr->base_address;
+   u64 end = rmrr->end_address + 1;
+
+   if (e820__mapped_all(start, end, E820_TYPE_RESERVED))
+   return 0;
+
+   pr_err(FW_BUG "No firmware reserved region can cover this RMRR 
[%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
+  start, end - 1);
+   return -EFAULT;
+}

Why -EFAULT? There is no fault involved. Using -EINVAL seems to be a better
choice.

-EFAULT could be used for address-related errors. For this case, I agree,
-EINVAL seems better if we consider it an input problem from the firmware.
I will make the change.
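
For completeness, the revised helper would then presumably look like this
(only the return value changes from the snippet quoted above; untested):

static inline int __init
arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
{
	u64 start = rmrr->base_address;
	u64 end = rmrr->end_address + 1;

	if (e820__mapped_all(start, end, E820_TYPE_RESERVED))
		return 0;

	pr_err(FW_BUG "No firmware reserved region can cover this RMRR [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
	       start, end - 1);
	return -EINVAL;	/* invalid memory map handed to us by firmware */
}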



Re: [PATCH 1/2] iommu/dmar: collect fault statistics

2019-10-16 Thread Bjorn Helgaas
Hi Yuri,

On Tue, Oct 15, 2019 at 05:11:11PM +0200, Yuri Volchkov wrote:
> Currently dmar_fault handler only prints a message in the dmesg. This
> commit introduces counters - how many faults have happened, and
> exposes them via sysfs. Each pci device will have an entry
> 'dmar_faults' reading from which will give user 3 lines
>   remap: xxx
>   read: xxx
>   write: xxx

I think you should have three files instead of putting all these in a
single file.  See https://lore.kernel.org/r/20190621072911.ga21...@kroah.com
They should also be documented in Documentation/ABI/

I'm not sure this should be DMAR-specific.  Couldn't we count similar
events for other IOMMUs as well?

> This functionality is targeted for health monitoring daemons.
> 
> Signed-off-by: Yuri Volchkov 
> ---
>  drivers/iommu/dmar.c| 133 +++-
>  drivers/pci/pci-sysfs.c |  20 ++
>  include/linux/intel-iommu.h |   3 +
>  include/linux/pci.h |  11 +++
>  4 files changed, 150 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..0749873e3e41 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1107,6 +1107,7 @@ static void free_iommu(struct intel_iommu *iommu)
>   }
>  
>   if (iommu->irq) {
> + destroy_workqueue(iommu->fault_wq);
>   if (iommu->pr_irq) {
>   free_irq(iommu->pr_irq, iommu);
>   dmar_free_hwirq(iommu->pr_irq);
> @@ -1672,9 +1673,46 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
>   raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>  }
>  
> -static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> - u8 fault_reason, int pasid, u16 source_id,
> - unsigned long long addr)
> +struct dmar_fault_info {
> + struct work_struct work;
> + struct intel_iommu *iommu;
> + int type;
> + int pasid;
> + u16 source_id;
> + unsigned long long addr;
> + u8 fault_reason;
> +};
> +
> +static struct kmem_cache *dmar_fault_info_cache;
> +int __init dmar_fault_info_cache_init(void)
> +{
> + int ret = 0;
> +
> + dmar_fault_info_cache =
> + kmem_cache_create("dmar_fault_info",
> +   sizeof(struct dmar_fault_info), 0,
> +   SLAB_HWCACHE_ALIGN, NULL);
> + if (!dmar_fault_info_cache) {
> + pr_err("Couldn't create dmar_fault_info cache\n");
> + ret = -ENOMEM;
> + }
> +
> + return ret;
> +}
> +
> +static inline void *alloc_dmar_fault_info(void)
> +{
> + return kmem_cache_alloc(dmar_fault_info_cache, GFP_ATOMIC);
> +}
> +
> +static inline void free_dmar_fault_info(void *vaddr)
> +{
> + kmem_cache_free(dmar_fault_info_cache, vaddr);
> +}
> +
> +static int dmar_fault_dump_one(struct intel_iommu *iommu, int type,
> +u8 fault_reason, int pasid, u16 source_id,
> +unsigned long long addr)
>  {
>   const char *reason;
>   int fault_type;
> @@ -1695,6 +1733,57 @@ static int dmar_fault_do_one(struct intel_iommu 
> *iommu, int type,
>   return 0;
>  }
>  
> +static int dmar_fault_handle_one(struct dmar_fault_info *info)
> +{
> + struct pci_dev *pdev;
> + u8 devfn;
> + atomic_t *pcnt;
> +
> + devfn = PCI_DEVFN(PCI_SLOT(info->source_id), PCI_FUNC(info->source_id));
> + pdev = pci_get_domain_bus_and_slot(info->iommu->segment,
> +PCI_BUS_NUM(info->source_id), devfn);

I'm sure you've considered this already, but it's not completely clear
to me whether these counters should be in the pci_dev (as in this
patch) or in something IOMMU-related.

The pci_dev is nice because you automatically have counters for each
PCI device, and most faults can be tied back to a device.

But on the other hand, it's not the PCI device actually detecting and
reporting the error.  It's the IOMMU reporting the fault, and while
it's *likely* there's a pci_dev corresponding to the
bus/device/function info in the IOMMU error registers, there's no
guarantee: the device may have been hot-removed between reading the
IOMMU registers and doing the pci_get_domain_bus_and_slot(), or (I
suspect) faults could be caused by corrupted or malicious TLPs.

Another possible issue is that if the counts are in the pci_dev,
they're lost if the device is removed, which might happen while
diagnosing faulty hardware.

So I tend to think this is really IOMMU error information and the
IOMMU driver should handle it itself, including logging and exposing
it via sysfs.
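
To sketch what that alternative could look like (purely hypothetical names,
not existing kernel symbols): per-IOMMU counters kept in the driver and
exposed through the IOMMU's own sysfs directory, e.g.

#include <linux/atomic.h>
#include <linux/device.h>
#include <linux/sysfs.h>

/* hypothetical per-IOMMU fault counters, owned by the IOMMU driver */
struct dmar_fault_counts {
	atomic64_t remap;
	atomic64_t read;
	atomic64_t write;
};

static struct dmar_fault_counts fault_counts;	/* one per IOMMU instance in practice */

static ssize_t faults_read_show(struct device *dev,
				struct device_attribute *attr, char *buf)
{
	return sprintf(buf, "%lld\n",
		       (long long)atomic64_read(&fault_counts.read));
}
static DEVICE_ATTR_RO(faults_read);

/* one attribute per counter, as suggested, grouped under the IOMMU device */
static struct attribute *dmar_fault_attrs[] = {
	&dev_attr_faults_read.attr,
	/* faults_write and faults_remap would be analogous */
	NULL,
};

static const struct attribute_group dmar_fault_group = {
	.name  = "faults",
	.attrs = dmar_fault_attrs,
};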

> + if (!pdev)
> + return -ENXIO;
> +
> + if (info->fault_reason == INTR_REMAP)
> + pcnt = &pdev->faults_cnt.remap;
> + else if (info->type)
> + pcnt = &pdev->faults_cnt.read;
> + else
> + pcnt = &pdev->faults_cnt.write;
> +
> + atomic_inc(pcnt);

pci_get_domain_bus_and_slot() increments pdev's 

[PATCH] iommu/dma: Relax locking in iommu_dma_prepare_msi()

2019-10-16 Thread Robin Murphy
Since commit ece6e6f0218b ("iommu/dma-iommu: Split iommu_dma_map_msi_msg()
in two parts"), iommu_dma_prepare_msi() should no longer have to worry
about preempting itself, nor being called in atomic context at all. Thus
we can downgrade the IRQ-safe locking to a simple mutex to avoid angering
the new might_sleep() check in iommu_map().

Reported-by: Qian Cai 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f321279baf9e..4ba3c49de017 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include <linux/mutex.h>
 #include 
 #include 
 #include 
@@ -43,7 +44,6 @@ struct iommu_dma_cookie {
dma_addr_t  msi_iova;
};
struct list_headmsi_page_list;
-   spinlock_t  msi_lock;
 
/* Domain for flush queue callback; NULL if flush queue not in use */
struct iommu_domain *fq_domain;
@@ -62,7 +62,6 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
 
cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
if (cookie) {
-   spin_lock_init(&cookie->msi_lock);
INIT_LIST_HEAD(&cookie->msi_page_list);
cookie->type = type;
}
@@ -1150,7 +1149,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
if (msi_page->phys == msi_addr)
return msi_page;
 
-   msi_page = kzalloc(sizeof(*msi_page), GFP_ATOMIC);
+   msi_page = kzalloc(sizeof(*msi_page), GFP_KERNEL);
if (!msi_page)
return NULL;
 
@@ -1180,7 +1179,7 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, 
phys_addr_t msi_addr)
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct iommu_dma_cookie *cookie;
struct iommu_dma_msi_page *msi_page;
-   unsigned long flags;
+   static DEFINE_MUTEX(msi_prepare_lock);
 
if (!domain || !domain->iova_cookie) {
desc->iommu_cookie = NULL;
@@ -1190,13 +1189,12 @@ int iommu_dma_prepare_msi(struct msi_desc *desc, 
phys_addr_t msi_addr)
cookie = domain->iova_cookie;
 
/*
-* We disable IRQs to rule out a possible inversion against
-* irq_desc_lock if, say, someone tries to retarget the affinity
-* of an MSI from within an IPI handler.
+* In fact we should already be serialised by irq_domain_mutex
+* here, but that's way subtle, so belt and braces...
 */
-   spin_lock_irqsave(&cookie->msi_lock, flags);
+   mutex_lock(&msi_prepare_lock);
msi_page = iommu_dma_get_msi_page(dev, msi_addr, domain);
-   spin_unlock_irqrestore(&cookie->msi_lock, flags);
+   mutex_unlock(&msi_prepare_lock);
 
msi_desc_set_iommu_cookie(desc, msi_page);
 
-- 
2.21.0.dirty



Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Julien Grall

Hi Robin,

On 16/10/2019 17:09, Robin Murphy wrote:

On 16/10/2019 17:03, Joerg Roedel wrote:

On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:

On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:



The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
unsigned long iova,
 if (iommu_prot & IOMMU_WRITE)
 prot |= IOMMU_PROT_IW;
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);


Yeah, that is a bug, I spotted that too.


@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
*iommu_dma_get_msi_page(struct device *dev,
 if (!iova)
 goto out_free_page;
-   if (iommu_map(domain, iova, msi_addr, size, prot))
+   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
 goto out_free_iova;


Not so sure this is a bug, this code is only about setting up MSIs on
ARM. It probably doesn't need to be atomic.
Right, since the iommu_dma_prepare_msi() operation was broken out to happen at 
MSI domain setup time, I don't think the comment in there applies any more, so 
we can probably stop disabling irqs around the iommu_dma_get_msi_page() call.


Yes, I agree that it does not need to be atomic anymore.

Cheers,

--
Julien Grall


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Robin Murphy

On 16/10/2019 17:11, Qian Cai wrote:

On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote:

On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:

On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
unsigned long iova,
 if (iommu_prot & IOMMU_WRITE)
 prot |= IOMMU_PROT_IW;
  
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);

+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);


Yeah, that is a bug, I spotted that too.


@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
*iommu_dma_get_msi_page(struct device *dev,
 if (!iova)
 goto out_free_page;
  
-   if (iommu_map(domain, iova, msi_addr, size, prot))

+   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
 goto out_free_iova;


Not so sure this is a bug, this code is only about setting up MSIs on
ARM. It probably doesn't need to be atomic.


The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It can be
called from an atomic context, as shown in the arm64 call traces:

+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+   might_sleep();
+   return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+}


Also note that it's *only* the might_sleep() at issue here - none of the 
arm64 IOMMU drivers have been converted to look at the new gfp argument 
yet, so anything they actually allocate while mapping will still be 
GFP_ATOMIC anyway.


(Carrying that flag down through the whole io-pgtable stack is on my 
to-do list...)
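
For the curious, a rough and purely hypothetical sketch of what carrying the
flag down might look like (not what the tree contains at this point): the
io-pgtable map callback grows a gfp_t argument, and an SMMU driver's ->map()
simply forwards whatever __iommu_map() was given.

/* hypothetical future io-pgtable interface; names are illustrative only */
struct io_pgtable_ops {
	int (*map)(struct io_pgtable_ops *ops, unsigned long iova,
		   phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
	/* unmap, iova_to_phys, etc. unchanged */
};

/* a driver's ->map() implementation would then just pass its gfp through */
static int foo_smmu_map(struct iommu_domain *domain, unsigned long iova,
			phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
{
	/* to_foo_domain() is made up for the sketch */
	struct io_pgtable_ops *ops = to_foo_domain(domain)->pgtbl_ops;

	return ops->map(ops, iova, paddr, size, prot, gfp);
}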


Robin.


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 18:03 +0200, Joerg Roedel wrote:
> On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
> > On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> > The x86 one might just be a mistake.
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index ad05484d0c80..63c4b894751d 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
> > unsigned long iova,
> > if (iommu_prot & IOMMU_WRITE)
> > prot |= IOMMU_PROT_IW;
> >  
> > -   ret = iommu_map_page(domain, iova, paddr, page_size, prot, 
> > GFP_KERNEL);
> > +   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
> 
> Yeah, that is a bug, I spotted that too.
> 
> > @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
> > *iommu_dma_get_msi_page(struct device *dev,
> > if (!iova)
> > goto out_free_page;
> >  
> > -   if (iommu_map(domain, iova, msi_addr, size, prot))
> > +   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
> > goto out_free_iova;
> 
> Not so sure this is a bug, this code is only about setting up MSIs on
> ARM. It probably doesn't need to be atomic.

The patch "iommu: Add gfp parameter to iommu_ops::map" does this. It can be
called from an atomic context, as shown in the arm64 call traces:

+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+   might_sleep();
+   return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+}
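
For reference, the same series appears to add an atomic counterpart right
next to it, which is presumably what callers in atomic context are expected
to switch to:

+int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC);
+}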


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Robin Murphy

On 16/10/2019 17:03, Joerg Roedel wrote:

On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:

On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:



The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
unsigned long iova,
 if (iommu_prot & IOMMU_WRITE)
 prot |= IOMMU_PROT_IW;
  
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);

+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);


Yeah, that is a bug, I spotted that too.


@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
*iommu_dma_get_msi_page(struct device *dev,
 if (!iova)
 goto out_free_page;
  
-   if (iommu_map(domain, iova, msi_addr, size, prot))

+   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
 goto out_free_iova;


Not so sure this is a bug, this code is only about setting up MSIs on
ARM. It probably doesn't need to be atomic.
Right, since the iommu_dma_prepare_msi() operation was broken out to 
happen at MSI domain setup time, I don't think the comment in there 
applies any more, so we can probably stop disabling irqs around the 
iommu_dma_get_msi_page() call.


Robin.


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Joerg Roedel
On Wed, Oct 16, 2019 at 11:53:33AM -0400, Qian Cai wrote:
> On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:

> The x86 one might just be a mistake.
> 
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index ad05484d0c80..63c4b894751d 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
> unsigned long iova,
> if (iommu_prot & IOMMU_WRITE)
> prot |= IOMMU_PROT_IW;
>  
> -   ret = iommu_map_page(domain, iova, paddr, page_size, prot, 
> GFP_KERNEL);
> +   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);

Yeah, that is a bug, I spotted that too.

> @@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
> *iommu_dma_get_msi_page(struct device *dev,
> if (!iova)
> goto out_free_page;
>  
> -   if (iommu_map(domain, iova, msi_addr, size, prot))
> +   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
> goto out_free_iova;

Not so sure this is a bug, this code is only about setting up MSIs on
ARM. It probably doesn't need to be atomic.

Regards,

Joerg


Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 17:31 +0200, Joerg Roedel wrote:
> Hi Qian,
> 
> thanks for the report!
> 
> On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
> > On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
> > > Today's linux-next generates a lot of warnings on multiple servers during 
> > > boot
> > > due to the series "iommu/amd: Convert the AMD iommu driver to the 
> > > dma-iommu api"
> > > [1]. Reverting the whole thing fixed them.
> > > 
> > > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
> > > 
> > 
> > BTW, the previous x86 warning was from reverting only one patch, "iommu: Add
> > gfp parameter to iommu_ops::map", which proved to be insufficient. Now
> > pasting the correct warning.

The x86 one might just be a mistake.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index ad05484d0c80..63c4b894751d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2542,7 +2542,7 @@ static int amd_iommu_map(struct iommu_domain *dom,
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
 
domain_flush_np_cache(domain, iova, page_size);

The arm64 one -- does it forget to do this?

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ecc08aef9b58..8dd0ef0656f4 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1185,7 +1185,7 @@ static struct iommu_dma_msi_page
*iommu_dma_get_msi_page(struct device *dev,
if (!iova)
goto out_free_page;
 
-   if (iommu_map(domain, iova, msi_addr, size, prot))
+   if (iommu_map_atomic(domain, iova, msi_addr, size, prot))
goto out_free_iova;
 
INIT_LIST_HEAD(&msi_page->list);

Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Joerg Roedel
On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
> BTW, the previous x86 warning was from reverting only one patch, "iommu: Add gfp
> parameter to iommu_ops::map", which proved to be insufficient. Now pasting the
> correct warning.

Can you please test this small fix:

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 78a2cca3ac5c..e7a4464e8594 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2562,7 +2562,7 @@ static int amd_iommu_map(struct iommu_domain *dom, 
unsigned long iova,
if (iommu_prot & IOMMU_WRITE)
prot |= IOMMU_PROT_IW;
 
-   ret = iommu_map_page(domain, iova, paddr, page_size, prot, GFP_KERNEL);
+   ret = iommu_map_page(domain, iova, paddr, page_size, prot, gfp);
 
domain_flush_np_cache(domain, iova, page_size);
 



Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Joerg Roedel
Hi Qian,

thanks for the report!

On Wed, Oct 16, 2019 at 10:59:42AM -0400, Qian Cai wrote:
> On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
> > Today's linux-next generates a lot of warnings on multiple servers during 
> > boot
> > due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu 
> > api"
> > [1]. Reverting the whole thing fixed them.
> > 
> > [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
> > 
> 
> BTW, the previous x86 warning was from reverting only one patch, "iommu: Add gfp
> parameter to iommu_ops::map", which proved to be insufficient. Now pasting the
> correct warning.

I am looking into it and may send you fixes to test.

Thanks,

Joerg



Re: "Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
On Wed, 2019-10-16 at 10:55 -0400, Qian Cai wrote:
> Today's linux-next generates a lot of warnings on multiple servers during boot
> due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu 
> api"
> [1]. Reverting the whole thing fixed them.
> 
> [1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/
> 

BTW, the previous x86 warning was from reverting only one patch, "iommu: Add gfp
parameter to iommu_ops::map", which proved to be insufficient. Now pasting the
correct warning.

[  564.365768][ T6222] BUG: sleeping function called from invalid context at
mm/page_alloc.c:4692
[  564.374447][ T6222] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
6222, name: git
[  564.382969][ T6222] INFO: lockdep is turned off.
[  564.387644][ T6222] CPU: 25 PID: 6222 Comm: git Tainted:
GW 5.4.0-rc3-next-20191016 #6
[  564.397011][ T6222] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[  564.406291][ T6222] Call Trace:
[  564.409470][ T6222]  dump_stack+0x86/0xca
[  564.413517][ T6222]  ___might_sleep.cold.92+0xd2/0x122
[  564.418694][ T6222]  __might_sleep+0x73/0xe0
[  564.422999][ T6222]  __alloc_pages_nodemask+0x442/0x720
[  564.428265][ T6222]  ? __alloc_pages_slowpath+0x18d0/0x18d0
[  564.433883][ T6222]  ? arch_stack_walk+0x7f/0xf0
[  564.438534][ T6222]  ? create_object+0x4a2/0x540
[  564.443188][ T6222]  alloc_pages_current+0x9c/0x110
[  564.448098][ T6222]  __get_free_pages+0x12/0x60
[  564.452659][ T6222]  get_zeroed_page+0x16/0x20
[  564.457137][ T6222]  amd_iommu_map+0x504/0x850
[  564.461612][ T6222]  ? amd_iommu_domain_direct_map+0x60/0x60
[  564.467312][ T6222]  ? lockdep_hardirqs_on+0x16/0x2a0
[  564.472400][ T6222]  ? alloc_iova+0x189/0x210
[  564.476790][ T6222]  __iommu_map+0x1c1/0x4e0
[  564.481090][ T6222]  ? iommu_get_dma_domain+0x40/0x40
[  564.486181][ T6222]  ? alloc_iova_fast+0x258/0x3d1
[  564.491009][ T6222]  ? create_object+0x4a2/0x540
[  564.495656][ T6222]  __iommu_map_sg+0xa5/0x130
[  564.500135][ T6222]  iommu_map_sg_atomic+0x14/0x20
[  564.504958][ T6222]  iommu_dma_map_sg+0x2c3/0x450
[  564.509699][ T6222]  scsi_dma_map+0xd7/0x160
[  564.514010][ T6222]  pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420
[smartpqi]
[  564.521811][ T6222]  ? pqi_alloc_io_request+0x127/0x140 [smartpqi]
[  564.528037][ T6222]  pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
[  564.534264][ T6222]  ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi]
[  564.541100][ T6222]  ? sd_init_command+0xa25/0x1346 [sd_mod]
[  564.546802][ T6222]  scsi_queue_rq+0xd19/0x1360
[  564.551372][ T6222]  __blk_mq_try_issue_directly+0x295/0x3f0
[  564.557071][ T6222]  ? blk_mq_request_bypass_insert+0xd0/0xd0
[  564.562860][ T6222]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  564.568384][ T6222]  blk_mq_try_issue_directly+0xad/0x130
[  564.573821][ T6222]  ? __blk_mq_try_issue_directly+0x3f0/0x3f0
[  564.579693][ T6222]  ? blk_add_rq_to_plug+0xcd/0x110
[  564.584693][ T6222]  blk_mq_make_request+0xcee/0x1120
[  564.589777][ T6222]  ? lock_downgrade+0x3c0/0x3c0
[  564.594517][ T6222]  ? blk_mq_try_issue_directly+0x130/0x130
[  564.600218][ T6222]  ? blk_queue_enter+0x78d/0x810
[  564.605041][ T6222]  ? generic_make_request_checks+0xf30/0xf30
[  564.610915][ T6222]  ? lock_downgrade+0x3c0/0x3c0
[  564.615655][ T6222]  ? __srcu_read_unlock+0x24/0x50
[  564.620565][ T6222]  ? generic_make_request+0x150/0x650
[  564.625833][ T6222]  generic_make_request+0x196/0x650
[  564.630921][ T6222]  ? blk_queue_enter+0x810/0x810
[  564.635747][ T6222]  submit_bio+0xaa/0x270
[  564.639873][ T6222]  ? submit_bio+0xaa/0x270
[  564.644172][ T6222]  ? generic_make_request+0x650/0x650
[  564.649437][ T6222]  ? iomap_readpage+0x260/0x260
[  564.654173][ T6222]  iomap_readpages+0x154/0x3d0
[  564.658823][ T6222]  ? iomap_zero_range_actor+0x330/0x330
[  564.664257][ T6222]  ? __might_sleep+0x73/0xe0
[  564.668836][ T6222]  xfs_vm_readpages+0xaf/0x1f0 [xfs]
[  564.674016][ T6222]  read_pages+0xe2/0x3b0
[  564.678142][ T6222]  ? read_cache_pages+0x350/0x350
[  564.683057][ T6222]  ? __page_cache_alloc+0x12c/0x230
[  564.688148][ T6222]  __do_page_cache_readahead+0x346/0x3a0
[  564.693670][ T6222]  ? read_pages+0x3b0/0x3b0
[  564.698059][ T6222]  ? lockdep_hardirqs_on+0x16/0x2a0
[  564.703247][ T6222]  ? __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  564.708947][ T6222]  filemap_fault+0xa13/0xe70
[  564.713528][ T6222]  __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  564.719059][ T6222]  ? kmemleak_alloc+0x57/0x90
[  564.723724][ T6222]  ? xfs_file_read_iter+0x3c0/0x3c0 [xfs]
[  564.729337][ T6222]  ? debug_check_no_locks_freed+0x2c/0xe0
[  564.734946][ T6222]  ? lockdep_init_map+0x8b/0x2b0
[  564.739872][ T6222]  xfs_filemap_fault+0x68/0x70 [xfs]
[  564.745046][ T6222]  __do_fault+0x83/0x220
[  564.749172][ T6222]  __handle_mm_fault+0xd76/0x1f40
[  564.754084][ T6222]  ? __pmd_alloc+0x280/0x280
[  564.758559][ T6222]  ? debug_lockdep_

"Convert the AMD iommu driver to the dma-iommu api" is buggy

2019-10-16 Thread Qian Cai
Today's linux-next generates a lot of warnings on multiple servers during boot
due to the series "iommu/amd: Convert the AMD iommu driver to the dma-iommu api"
[1]. Reverting the whole thing fixed them.

[1] https://lore.kernel.org/lkml/20190908165642.22253-1-murph...@tcd.ie/

[  257.785749][ T6184] BUG: sleeping function called from invalid context at
mm/page_alloc.c:4692
[  257.794886][ T6184] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
6184, name: readelf
[  257.803574][ T6184] INFO: lockdep is turned off.
[  257.808233][ T6184] CPU: 86 PID: 6184 Comm: readelf Tainted:
GW 5.4.0-rc3-next-20191016+ #7
[  257.818035][ T6184] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385
Gen10, BIOS A40 07/10/2019
[  257.827313][ T6184] Call Trace:
[  257.830487][ T6184]  dump_stack+0x86/0xca
[  257.834530][ T6184]  ___might_sleep.cold.92+0xd2/0x122
[  257.839708][ T6184]  __might_sleep+0x73/0xe0
[  257.844011][ T6184]  __alloc_pages_nodemask+0x442/0x720
[  257.849274][ T6184]  ? __alloc_pages_slowpath+0x18d0/0x18d0
[  257.854886][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  257.860415][ T6184]  ? lock_downgrade+0x3c0/0x3c0
[  257.865156][ T6184]  alloc_pages_current+0x9c/0x110
[  257.870071][ T6184]  __get_free_pages+0x12/0x60
[  257.874634][ T6184]  get_zeroed_page+0x16/0x20
[  257.879112][ T6184]  amd_iommu_map+0x504/0x850
[  257.883588][ T6184]  ? amd_iommu_domain_direct_map+0x60/0x60
[  257.889286][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  257.894373][ T6184]  ? alloc_iova+0x189/0x210
[  257.898765][ T6184]  ? trace_hardirqs_on+0x3a/0x160
[  257.903677][ T6184]  iommu_map+0x1b3/0x4d0
[  257.907802][ T6184]  ? iommu_unmap+0xf0/0xf0
[  257.912104][ T6184]  ? alloc_iova_fast+0x258/0x3d1
[  257.916929][ T6184]  ? create_object+0x4a2/0x540
[  257.921579][ T6184]  iommu_map_sg+0x9d/0x120
[  257.925882][ T6184]  iommu_dma_map_sg+0x2c3/0x450
[  257.930627][ T6184]  scsi_dma_map+0xd7/0x160
[  257.934936][ T6184]  pqi_raid_submit_scsi_cmd_with_io_request+0x392/0x420
[smartpqi]
[  257.942735][ T6184]  ? pqi_alloc_io_request+0x127/0x140 [smartpqi]
[  257.948962][ T6184]  pqi_scsi_queue_command+0x8ab/0xe00 [smartpqi]
[  257.955192][ T6184]  ? pqi_eh_device_reset_handler+0x9c0/0x9c0 [smartpqi]
[  257.962029][ T6184]  ? sd_init_command+0xa25/0x1346 [sd_mod]
[  257.967730][ T6184]  scsi_queue_rq+0xd19/0x1360
[  257.972298][ T6184]  __blk_mq_try_issue_directly+0x295/0x3f0
[  257.977999][ T6184]  ? blk_mq_request_bypass_insert+0xd0/0xd0
[  257.983787][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  257.989312][ T6184]  blk_mq_request_issue_directly+0xb5/0x100
[  257.995098][ T6184]  ? blk_mq_flush_plug_list+0x7e0/0x7e0
[  258.000537][ T6184]  ? blk_mq_sched_insert_requests+0xd6/0x380
[  258.006409][ T6184]  ? lock_downgrade+0x3c0/0x3c0
[  258.011147][ T6184]  blk_mq_try_issue_list_directly+0xa9/0x160
[  258.017023][ T6184]  blk_mq_sched_insert_requests+0x228/0x380
[  258.022810][ T6184]  blk_mq_flush_plug_list+0x448/0x7e0
[  258.028073][ T6184]  ? blk_mq_insert_requests+0x380/0x380
[  258.033516][ T6184]  blk_flush_plug_list+0x1eb/0x230
[  258.038515][ T6184]  ? blk_insert_cloned_request+0x1b0/0x1b0
[  258.044215][ T6184]  blk_finish_plug+0x43/0x5d
[  258.048695][ T6184]  read_pages+0xf6/0x3b0
[  258.052823][ T6184]  ? read_cache_pages+0x350/0x350
[  258.057737][ T6184]  ? __page_cache_alloc+0x12c/0x230
[  258.062826][ T6184]  __do_page_cache_readahead+0x346/0x3a0
[  258.068348][ T6184]  ? read_pages+0x3b0/0x3b0
[  258.072738][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  258.077928][ T6184]  ? __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  258.083625][ T6184]  filemap_fault+0xa13/0xe70
[  258.088201][ T6184]  __xfs_filemap_fault+0x167/0x4a0 [xfs]
[  258.093731][ T6184]  ? kmemleak_alloc+0x57/0x90
[  258.098397][ T6184]  ? xfs_file_read_iter+0x3c0/0x3c0 [xfs]
[  258.104009][ T6184]  ? debug_check_no_locks_freed+0x2c/0xe0
[  258.109618][ T6184]  ? lockdep_init_map+0x8b/0x2b0
[  258.114543][ T6184]  xfs_filemap_fault+0x68/0x70 [xfs]
[  258.119720][ T6184]  __do_fault+0x83/0x220
[  258.123847][ T6184]  __handle_mm_fault+0xd76/0x1f40
[  258.128757][ T6184]  ? __pmd_alloc+0x280/0x280
[  258.133231][ T6184]  ? debug_lockdep_rcu_enabled+0x27/0x60
[  258.138755][ T6184]  ? handle_mm_fault+0x178/0x4c0
[  258.143581][ T6184]  ? lockdep_hardirqs_on+0x16/0x2a0
[  258.148674][ T6184]  ? __do_page_fault+0x29c/0x640
[  258.153501][ T6184]  handle_mm_fault+0x205/0x4c0
[  258.158151][ T6184]  __do_page_fault+0x29c/0x640
[  258.162800][ T6184]  do_page_fault+0x50/0x37f
[  258.167189][ T6184]  page_fault+0x34/0x40
[  258.171228][ T6184] RIP: 0010:__clear_user+0x3b/0x70

[  183.553150] BUG: sleeping function called from invalid context at
drivers/iommu/iommu.c:1950
[  183.562306] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 1486,
name: kworker/0:3
[  183.571450] 5 locks held by kworker/0:3/1486:
[  183.576510]  #0: 44ff0008000ce128 ((wq_completion)events){+.+.}, at:
process_one_work+0

Re: [PATCH 1/5] iommu: Implement iommu_put_resv_regions_simple()

2019-10-16 Thread Thierry Reding
On Wed, Sep 18, 2019 at 04:37:38PM +0100, Will Deacon wrote:
> On Thu, Aug 29, 2019 at 01:17:48PM +0200, Thierry Reding wrote:
> > From: Thierry Reding 
> > 
> > Implement a generic function for removing reserved regions. This can be
> > used by drivers that don't do anything fancy with these regions other
> > than allocating memory for them.
> > 
> > Signed-off-by: Thierry Reding 
> > ---
> >  drivers/iommu/iommu.c | 19 +++
> >  include/linux/iommu.h |  2 ++
> >  2 files changed, 21 insertions(+)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 0f585b614657..73a2a6b13507 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2170,6 +2170,25 @@ void iommu_put_resv_regions(struct device *dev, 
> > struct list_head *list)
> > ops->put_resv_regions(dev, list);
> >  }
> >  
> > +/**
> > + * iommu_put_resv_regions_simple - Reserved region driver helper
> > + * @dev: device for which to free reserved regions
> > + * @list: reserved region list for device
> > + *
> > + * IOMMU drivers can use this to implement their .put_resv_regions() 
> > callback
> > + * for simple reservations. Memory allocated for each reserved region will 
> > be
> > + * freed. If an IOMMU driver allocates additional resources per region, it 
> > is
> > + * going to have to implement a custom callback.
> > + */
> > +void iommu_put_resv_regions_simple(struct device *dev, struct list_head 
> > *list)
> > +{
> > +   struct iommu_resv_region *entry, *next;
> > +
> > +   list_for_each_entry_safe(entry, next, list, list)
> > +   kfree(entry);
> > +}
> > +EXPORT_SYMBOL(iommu_put_resv_regions_simple);
> 
> Can you call this directly from iommu_put_resv_regions() if the function
> pointer in ops is NULL? That would save having to plumb the default callback
> into a bunch of drivers.

I probably could, but I don't think that necessarily improves things.
The reason is that it would cause the helper to get called even if the
driver doesn't support reserved regions. That's likely harmless, because
in that case the list of regions passed to it would be empty. However, I
think the current way of doing this, where we have to implement both the
->get_resv_regions() and ->put_resv_regions() hooks, is nicely symmetric.
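
As an illustration of that symmetry, a driver with no per-region resources
would wire up both hooks roughly like this (the foo_* names are hypothetical;
iommu_put_resv_regions_simple() is the helper from the patch above):

static void foo_iommu_get_resv_regions(struct device *dev,
				       struct list_head *head)
{
	struct iommu_resv_region *region;

	/* e.g. reserve an MSI doorbell window for this platform */
	region = iommu_alloc_resv_region(0x08000000, SZ_1M, 0,
					 IOMMU_RESV_SW_MSI);
	if (region)
		list_add_tail(&region->list, head);
}

static const struct iommu_ops foo_iommu_ops = {
	/* ... */
	.get_resv_regions = foo_iommu_get_resv_regions,
	.put_resv_regions = iommu_put_resv_regions_simple,
};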

Thierry



Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-10-16 Thread John Garry



Hi Robin,


Two significant concerns right off the bat:

- It seems more common than not for silicon designers to fail to
implement IIDR correctly, so it's only a matter of time before
inevitably needing to bring back some firmware-level identifier
abstraction (if not already - does Hi161x have PMCGs?)


Maybe there's a way that we can switch to this method, and leave the
door open for an easy way to support a firmware-level identifier again,
if ever needed. I'm not too pushed - this was secondary to just
allowing the PMCG driver to know the associated SMMU model.


But that's the part I'm not buying - there's no clear advantage to
pushing that complexity down into the PMCG driver, vs. leaving the IORT
code responsible for translating an SMMU model into a PMCG model, yet
the aforementioned disadvantages jump out right away.



One advantage is that the next piece of quirky hw with a properly 
implemented IIDR does not require a new IORT model.


And today, this handling is only for hi1620, and since we can use the hi1620
IIDR to identify it, it seems good to remove the code outside the PMCG driver
that exists specifically to handle it.


But if you think it's going to be needed again, then it makes sense not 
to remove it.



And, no, hi161x does not have any PMCGs.


Hooray, I guess :)



- This seems like a step in entirely the wrong direction for supporting
.


So to support PMCGs that reference a Named Component or Root Complex,
I thought that the IORT parsing code would have to do some secondary
lookup to the associated SMMU, through the Named Component or Root
Complex node.

What was your idea here?


The associated SMMU has no relevance in that context - the reason for
the Node Reference to point to a non-SMMU node is for devices that
implement their own embedded TLB (e.g. AMBA DTI masters) and expose a
standard PMCG interface to monitor it. It isn't reasonable to expect any
old PCIe controller or on-chip-accelerator driver to expose a fake SMMU
IIDR just to keep some other driver happy.


But won't there still be an SMMU associated with the AMBA DTI masters, 
in your example?


It's this SMMU which the PMCG driver would reference as the "parent" 
device, and the IORT parsing would need to do the lookup for this reference.


But then, this becomes something that the DT parsing would need to 
handle also.





Note: I do acknowledge that an overall issue is that we assume all
PMCG IMP DEF events are the same for a given SMMU model.


That assumption does technically fail already - I know MMU-600 has
different IMP-DEF events for its TCU and TBUs, however as long as we can
get as far as "this is some part of an MMU-600" the driver should be
able to figure out the rest (annoyingly it looks like both PMCG types
expose the same PMCG_ID_REGS information, but they should be
distinguishable by PMCG_CEIDn).


JFYI, PMCG_CEIDn contents for hi1620 are all zero, apart from PMDEVARCH
and PMDEVTYPE, which are the same as the Arm implementation according to
the spec - sigh...





Interpreting the Node Reference is definitely a welcome improvement over
matching table headers, but absent a truly compelling argument to the
contrary, I'd rather retain the "PMCG model" abstraction in between that
and the driver itself (especially since those can trivially be hung off
compatibles once it comes to DT support).


For DT, I would assume that just using compatible strings would allow
us to identify the PMCG model.


Right, that was largely my point - DT probing can start with a PMCG
model, so it's a lot more logical for ACPI probing to do the same, with
the actual PMCG model determination hidden away in the ACPI code. That's
the basis of the current design.

I have been nagging the architects that PMCGs not having their own IIDR
is an unwelcome hole in the spec, so hopefully this might get a bit
easier some day.


For sure. The spec reads that the PMCGs may be "independently-designed", 
hence no general id method. I don't get this.





On a related matter, is there still a need to deal with scenarios of
the PMCG being located within the SMMU register map? As you may
remember, we did have this issue but relocated the PMCG to outside the
SMMU register map in a later chip rev.


MMU-600 has its TCU PMCG page 0 in the middle of its SMMU page 0 space,
but given that it's an Arm IP, I expect that when the heat gets turned
up for making it work, it's most likely to be under me ;)


OK, so this is another reason why I thought that having a reference to 
the SMMU device could be useful in terms of solving that problem.




Robin.

.


Thanks again,
John








[PATCH 1/3] iommu/tegra-smmu: Use non-secure register for flushing

2019-10-16 Thread Thierry Reding
From: Navneet Kumar 

Use PTB_ASID instead of SMMU_CONFIG to flush the SMMU.
PTB_ASID can be accessed from non-secure mode, whereas SMMU_CONFIG cannot.
Using SMMU_CONFIG could pose a problem when the kernel doesn't have
secure-mode access enabled from boot.

Signed-off-by: Navneet Kumar 
Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Thierry Reding 
---
 drivers/iommu/tegra-smmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 99f85fb5a704..03e667480ec6 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -240,7 +240,7 @@ static inline void smmu_flush_tlb_group(struct tegra_smmu 
*smmu,
 
 static inline void smmu_flush(struct tegra_smmu *smmu)
 {
-   smmu_readl(smmu, SMMU_CONFIG);
+   smmu_readl(smmu, SMMU_PTB_ASID);
 }
 
 static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp)
-- 
2.23.0



[PATCH 3/3] iommu/tegra-smmu: Fix page tables in > 4 GiB memory

2019-10-16 Thread Thierry Reding
From: Thierry Reding 

Page tables that reside in physical memory beyond the 4 GiB boundary are
currently not working properly. The reason is that when the physical
address for page directory entries is read, it gets truncated at 32 bits
and can cause crashes when passing that address to the DMA API.

Fix this by first casting the PDE value to a dma_addr_t and then using
the page frame number mask for the SMMU instance to mask out the invalid
bits, which are typically used for mapping attributes, etc.

Signed-off-by: Thierry Reding 
---
 drivers/iommu/tegra-smmu.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 9425d01a95ac..63a147b623e6 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -159,9 +159,9 @@ static bool smmu_dma_addr_valid(struct tegra_smmu *smmu, 
dma_addr_t addr)
return (addr & smmu->pfn_mask) == addr;
 }
 
-static dma_addr_t smmu_pde_to_dma(u32 pde)
+static dma_addr_t smmu_pde_to_dma(struct tegra_smmu *smmu, u32 pde)
 {
-   return pde << 12;
+   return (dma_addr_t)(pde & smmu->pfn_mask) << 12;
 }
 
 static void smmu_flush_ptc_all(struct tegra_smmu *smmu)
@@ -554,6 +554,7 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, 
unsigned long iova,
  dma_addr_t *dmap)
 {
unsigned int pd_index = iova_pd_index(iova);
+   struct tegra_smmu *smmu = as->smmu;
struct page *pt_page;
u32 *pd;
 
@@ -562,7 +563,7 @@ static u32 *tegra_smmu_pte_lookup(struct tegra_smmu_as *as, 
unsigned long iova,
return NULL;
 
pd = page_address(as->pd);
-   *dmap = smmu_pde_to_dma(pd[pd_index]);
+   *dmap = smmu_pde_to_dma(smmu, pd[pd_index]);
 
return tegra_smmu_pte_offset(pt_page, iova);
 }
@@ -604,7 +605,7 @@ static u32 *as_get_pte(struct tegra_smmu_as *as, dma_addr_t 
iova,
} else {
u32 *pd = page_address(as->pd);
 
-   *dmap = smmu_pde_to_dma(pd[pde]);
+   *dmap = smmu_pde_to_dma(smmu, pd[pde]);
}
 
return tegra_smmu_pte_offset(as->pts[pde], iova);
@@ -629,7 +630,7 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as 
*as, unsigned long iova)
if (--as->count[pde] == 0) {
struct tegra_smmu *smmu = as->smmu;
u32 *pd = page_address(as->pd);
-   dma_addr_t pte_dma = smmu_pde_to_dma(pd[pde]);
+   dma_addr_t pte_dma = smmu_pde_to_dma(smmu, pd[pde]);
 
tegra_smmu_set_pde(as, iova, 0);
 
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/3] iommu/tegra-smmu: Fix client enablement order

2019-10-16 Thread Thierry Reding
From: Navneet Kumar 

Enable clients' translation only after setting up the swgroups.

Signed-off-by: Navneet Kumar 
Signed-off-by: Thierry Reding 
---
 drivers/iommu/tegra-smmu.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 03e667480ec6..9425d01a95ac 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -351,6 +351,20 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, 
unsigned int swgroup,
unsigned int i;
u32 value;
 
+   group = tegra_smmu_find_swgroup(smmu, swgroup);
+   if (group) {
+   value = smmu_readl(smmu, group->reg);
+   value &= ~SMMU_ASID_MASK;
+   value |= SMMU_ASID_VALUE(asid);
+   value |= SMMU_ASID_ENABLE;
+   smmu_writel(smmu, value, group->reg);
+   } else {
+   pr_warn("%s group from swgroup %u not found\n", __func__,
+   swgroup);
+   /* No point moving ahead if group was not found */
+   return;
+   }
+
for (i = 0; i < smmu->soc->num_clients; i++) {
const struct tegra_mc_client *client = &smmu->soc->clients[i];
 
@@ -361,15 +375,6 @@ static void tegra_smmu_enable(struct tegra_smmu *smmu, 
unsigned int swgroup,
value |= BIT(client->smmu.bit);
smmu_writel(smmu, value, client->smmu.reg);
}
-
-   group = tegra_smmu_find_swgroup(smmu, swgroup);
-   if (group) {
-   value = smmu_readl(smmu, group->reg);
-   value &= ~SMMU_ASID_MASK;
-   value |= SMMU_ASID_VALUE(asid);
-   value |= SMMU_ASID_ENABLE;
-   smmu_writel(smmu, value, group->reg);
-   }
 }
 
 static void tegra_smmu_disable(struct tegra_smmu *smmu, unsigned int swgroup,
-- 
2.23.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-10-16 Thread Robin Murphy

On 2019-10-16 9:47 am, John Garry wrote:

On 15/10/2019 19:00, Robin Murphy wrote:

Hi John,

On 30/09/2019 15:33, John Garry wrote:

This patchset adds IMP DEF event support for the SMMUv3 PMCG.

It is marked as an RFC as the method used to identify the PMCG
implementation may be quite disliked. And, in general, the series is
somewhat incomplete.

So the background is that the PMCG supports IMP DEF events, yet we have no
method to identify the PMCG implementation and hence which IMP DEF events
it supports.

A method for identifying the PMCG implementation could be using
PMDEVARCH, but we cannot rely on this being set properly, as whether this
is implemented is not defined in the SMMUv3 spec.

Another method would be perf event aliasing, but this method of event
matching is based on the CPU id, which would not uniquely identify the
PMCG implementation.

Yet another method could be to continue using ACPI OEM ID in the IORT
code, but this does not scale. And it is not suitable if we ever add DT
support to the PMCG driver.

The method used in this series is based on matching on the parent SMMUv3
IIDR. We store this IIDR contents in the arm smmu structure as the first
element, which means that we don't have to expose SMMU APIs - this is
the part which may be disliked.

The final two patches switch the pre-existing PMCG model identification
from ACPI OEM ID to the same parent SMMUv3 IIDR matching.

For now, we only consider SMMUv3 nodes as the associated node for a
PMCG.




Hi Robin,


Two significant concerns right off the bat:

- It seems more common than not for silicon designers to fail to
implement IIDR correctly, so it's only a matter of time before
inevitably needing to bring back some firmware-level identifier
abstraction (if not already - does Hi161x have PMCGs?)


Maybe there's a way that we can switch to this method, and leave the 
door open for an easy way to support a firmware-level identifier again, 
if ever needed. I'm not too pushed - this was secondary to just allowing 
the PMCG driver to know the associated SMMU model.


But that's the part I'm not buying - there's no clear advantage to 
pushing that complexity down into the PMCG driver, vs. leaving the IORT 
code responsible for translating an SMMU model into a PMCG model, yet 
the aforementioned disadvantages jump out right away.



And, no, hi161x does not have any PMCGs.


Hooray, I guess :)



- This seems like a step in entirely the wrong direction for supporting
PMCGs whose Node Reference points to a Named Component or Root Complex
rather than an SMMU.


So to support PMCGs that reference a Named Component or Root Complex, I 
thought that the IORT parsing code would have to do some secondary 
lookup to the associated SMMU, through the Named Component or Root 
Complex node.


What was your idea here?


The associated SMMU has no relevance in that context - the reason for 
the Node Reference to point to a non-SMMU node is for devices that 
implement their own embedded TLB (e.g. AMBA DTI masters) and expose a 
standard PMCG interface to monitor it. It isn't reasonable to expect any 
old PCIe controller or on-chip-accelerator driver to expose a fake SMMU 
IIDR just to keep some other driver happy.


Note: I do acknowledge that an overall issue is that we assume all PMCG 
IMP DEF events are the same for a given SMMU model.


That assumption does technically fail already - I know MMU-600 has 
different IMP-DEF events for its TCU and TBUs, however as long as we can 
get as far as "this is some part of an MMU-600" the driver should be 
able to figure out the rest (annoyingly it looks like both PMCG types 
expose the same PMCG_ID_REGS information, but they should be 
distinguishable by PMCG_CEIDn).



Interpreting the Node Reference is definitely a welcome improvement over
matching table headers, but absent a truly compelling argument to the
contrary, I'd rather retain the "PMCG model" abstraction in between that
and the driver itself (especially since those can trivially be hung off
compatibles once it comes to DT support).


For DT, I would assume that just using compatible strings would allow 
us to identify the PMCG model.


Right, that was largely my point - DT probing can start with a PMCG 
model, so it's a lot more logical for ACPI probing to do the same, with 
the actual PMCG model determination hidden away in the ACPI code. That's 
the basis of the current design.


I have been nagging the architects that PMCGs not having their own IIDR 
is an unwelcome hole in the spec, so hopefully this might get a bit 
easier some day.


On a related matter, is there still a need to deal with scenarios of the 
PMCG being located within the SMMU register map? As you may remember, we 
did have this issue but relocated the PMCG to outside the SMMU register 
map in a later chip rev.


MMU-600 has its TCU PMCG page 0 in the middle of its SMMU page 0 space, 
but given that it's an Arm IP, I expect that when the heat gets turned 
up for making it work, it's most likely to be under me ;)


Robin.


Re: [PATCH] kernel: dma: Make CMA boot parameters __ro_after_init

2019-10-16 Thread Shyam Saini
Hi Nathan,

On Mon, Oct 14, 2019 at 7:55 AM Nathan Chancellor
 wrote:
>
> On Sat, Oct 12, 2019 at 05:59:18PM +0530, Shyam Saini wrote:
> > This parameters are not changed after early boot.
> > By making them __ro_after_init will reduce any attack surface in the
> > kernel.
> >
> > Link: https://lwn.net/Articles/676145/
> > Cc: Christoph Hellwig 
> > Cc: Marek Szyprowski 
> > Cc: Robin Murphy 
> > Cc: Matthew Wilcox 
> > Cc: Christopher Lameter 
> > Cc: Kees Cook 
> > Signed-off-by: Shyam Saini 
> > ---
> >  kernel/dma/contiguous.c | 8 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> > index 69cfb4345388..1b689b1303cd 100644
> > --- a/kernel/dma/contiguous.c
> > +++ b/kernel/dma/contiguous.c
> > @@ -42,10 +42,10 @@ struct cma *dma_contiguous_default_area;
> >   * Users, who want to set the size of global CMA area for their system
> >   * should use cma= kernel parameter.
> >   */
> > -static const phys_addr_t size_bytes = (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
> > -static phys_addr_t size_cmdline = -1;
> > -static phys_addr_t base_cmdline;
> > -static phys_addr_t limit_cmdline;
> > +static const phys_addr_t __ro_after_init size_bytes = 
> > (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
>
> The 0day bot reported an issue with this change with clang:
>
> https://groups.google.com/d/msgid/clang-built-linux/201910140334.nhultlt8%25lkp%40intel.com
>
> kernel/dma/contiguous.c:46:36: error: 'size_cmdline' causes a section type 
> conflict with 'size_bytes'
> static phys_addr_t __ro_after_init size_cmdline = -1;
>^
> kernel/dma/contiguous.c:45:42: note: declared here
> static const phys_addr_t __ro_after_init size_bytes = 
> (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
>  ^
> kernel/dma/contiguous.c:47:36: error: 'base_cmdline' causes a section type 
> conflict with 'size_bytes'
> static phys_addr_t __ro_after_init base_cmdline;
>^
> kernel/dma/contiguous.c:45:42: note: declared here
> static const phys_addr_t __ro_after_init size_bytes = 
> (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
>  ^
> kernel/dma/contiguous.c:48:36: error: 'limit_cmdline' causes a section type 
> conflict with 'size_bytes'
> static phys_addr_t __ro_after_init limit_cmdline;
>^
> kernel/dma/contiguous.c:45:42: note: declared here
> static const phys_addr_t __ro_after_init size_bytes = 
> (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
>  ^
> 3 errors generated.

Thanks for your feedback and reporting this error.

> The errors seem kind of cryptic at first, but something that is const
> should automatically be in the read-only section, so this part of the
> commit seems unnecessary. Removing that part of the change fixes the error.

I overlooked the size_bytes variable.
It shouldn't be const if it is declared as __ro_after_init.

I will fix and resend it.
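
Something like the following, perhaps (untested sketch of the
kernel/dma/contiguous.c declarations; size_bytes keeps a plain const and
only the command-line values get __ro_after_init):

static const phys_addr_t size_bytes = (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M;
static phys_addr_t size_cmdline __ro_after_init = -1;
static phys_addr_t base_cmdline __ro_after_init;
static phys_addr_t limit_cmdline __ro_after_init;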


Re: [RFC PATCH 0/6] SMMUv3 PMCG IMP DEF event support

2019-10-16 Thread John Garry

On 15/10/2019 19:00, Robin Murphy wrote:

Hi John,

On 30/09/2019 15:33, John Garry wrote:

This patchset adds IMP DEF event support for the SMMUv3 PMCG.

It is marked as an RFC as the method used to identify the PMCG
implementation may be quite disliked. And, in general, the series is
somewhat incomplete.

So the background is that the PMCG supports IMP DEF events, yet we have no
method to identify the PMCG implementation and hence which IMP DEF events
it supports.

A method for identifying the PMCG implementation could be using
PMDEVARCH, but we cannot rely on this being set properly, as whether this
is implemented is not defined in the SMMUv3 spec.

Another method would be perf event aliasing, but this method of event
matching is based on the CPU id, which would not uniquely identify the
PMCG implementation.

Yet another method could be to continue using ACPI OEM ID in the IORT
code, but this does not scale. And it is not suitable if we ever add DT
support to the PMCG driver.

The method used in this series is based on matching on the parent SMMUv3
IIDR. We store this IIDR contents in the arm smmu structure as the first
element, which means that we don't have to expose SMMU APIs - this is
the part which may be disliked.

The final two patches switch the pre-existing PMCG model identification
from ACPI OEM ID to the same parent SMMUv3 IIDR matching.

For now, we only consider SMMUv3 nodes as the associated node for a
PMCG.




Hi Robin,


Two significant concerns right off the bat:

- It seems more common than not for silicon designers to fail to
implement IIDR correctly, so it's only a matter of time before
inevitably needing to bring back some firmware-level identifier
abstraction (if not already - does Hi161x have PMCGs?)


Maybe there's a way that we can switch to this method, and leave the 
door open for an easy way to support a firmware-level identifier again, 
if ever needed. I'm not too pushed - this was secondary to just allowing 
the PMCG driver to know the associated SMMU model.
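
Roughly, the idea was along these lines (sketch only, not the actual
patches; the struct layout and helper name are simplified/invented for
illustration):

#include <linux/device.h>
#include <linux/types.h>

/* Keep the IIDR as the first member of the SMMU driver data, so the
 * PMCG driver can read it from its parent device without a dedicated
 * SMMU API.
 */
struct arm_smmu_device {
	u32 iidr;	/* must remain the first member */
	/* ... rest of the SMMU driver state ... */
};

/* In the PMCG driver, assuming the SMMU device was set as the parent: */
static u32 smmu_pmu_parent_iidr(struct device *pmcg_dev)
{
	u32 *iidr = dev_get_drvdata(pmcg_dev->parent);

	return iidr ? *iidr : 0;
}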


And, no, hi161x does not have any PMCGs.



- This seems like a step in entirely the wrong direction for supporting
PMCGs whose Node Reference points to a Named Component or Root Complex
rather than an SMMU.


So to support PMCGs that reference a Named Component or Root Complex, I 
thought that the IORT parsing code would have to do some secondary 
lookup to the associated SMMU, through the Named Component or Root 
Complex node.


What was your idea here?

Note: I do acknowledge that an overall issue is that we assume all PMCG 
IMP DEF events are the same for a given SMMU model.




Interpreting the Node Reference is definitely a welcome improvement over
matching table headers, but absent a truly compelling argument to the
contrary, I'd rather retain the "PMCG model" abstraction in between that
and the driver itself (especially since those can trivially be hung off
compatibles once it comes to DT support).


For DT, I would assume that just using compatible strings would allow 
us to identify the PMCG model.
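
e.g. something along these lines (hypothetical sketch only; the
compatible strings and model IDs below are invented for illustration,
not taken from any binding):

#include <linux/module.h>
#include <linux/of.h>

/* Hypothetical model IDs - for illustration only */
enum smmu_pmcg_model {
	SMMU_PMCG_GENERIC,
	SMMU_PMCG_HISI_HIP08,
};

static const struct of_device_id smmu_pmcg_of_match[] = {
	{ .compatible = "arm,smmu-v3-pmcg",
	  .data = (void *)(unsigned long)SMMU_PMCG_GENERIC },
	{ .compatible = "hisilicon,hip08-pmcg",
	  .data = (void *)(unsigned long)SMMU_PMCG_HISI_HIP08 },
	{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, smmu_pmcg_of_match);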


On a related matter, is there still a need to deal with scenarios of the 
PMCG being located within the SMMU register map? As you may remember, we 
did have this issue but relocated the PMCG to outside the SMMU register 
map in a later chip rev.


Cheers,
John



Thanks,
Robin.



John Garry (6):
   ACPI/IORT: Set PMCG device parent
   iommu/arm-smmu-v3: Record IIDR in arm_smmu_device structure
   perf/smmuv3: Retrieve parent SMMUv3 IIDR
   perf/smmuv3: Support HiSilicon hip08 (hi1620) IMP DEF events
   perf/smmuv3: Match implementation options based on parent SMMU IIDR
   ACPI/IORT: Drop code to set the PMCG software-defined model

  drivers/acpi/arm64/iort.c | 69 ++--
  drivers/iommu/arm-smmu-v3.c   |  5 +++
  drivers/perf/arm_smmuv3_pmu.c | 84 ++-
  include/linux/acpi_iort.h |  8 
  4 files changed, 112 insertions(+), 54 deletions(-)









RE: [PATCH 2/2] virtio_ring: Use DMA API if memory is encrypted

2019-10-16 Thread Ram Pai
On Tue, Oct 15, 2019 at 09:35:01AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 11, 2019 at 06:25:19PM -0700, Ram Pai wrote:
> > From: Thiago Jung Bauermann 
> > 
> > Normally, virtio enables DMA API with VIRTIO_F_IOMMU_PLATFORM, which must
> > be set by both device and guest driver. However, as a hack, when DMA API
> > returns physical addresses, guest driver can use the DMA API; even though
> > device does not set VIRTIO_F_IOMMU_PLATFORM and just uses physical
> > addresses.
> 
> Sorry, but this is a complete bullshit hack.  Driver must always use
> the DMA API if they do DMA, and if virtio devices use physical addresses
> that needs to be returned through the platform firmware interfaces for
> the dma setup.  If you don't do that yet (which based on previous
> informations you don't), you need to fix it, and we can then quirk
> old implementations that already are out in the field.
> 
> In other words: we finally need to fix that virtio mess and not pile
> hacks on top of hacks.

So force all virtio devices to use DMA API, except when
VIRTIO_F_IOMMU_PLATFORM is not enabled?

Any help detailing the idea will enable us to fix this issue once and for all.

Will something like the below work? It removes the prior hacks and
always uses the DMA API, except when VIRTIO_F_IOMMU_PLATFORM is not enabled.

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c8be1c4..b593d3d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -240,22 +240,10 @@ static inline bool virtqueue_use_indirect(struct 
virtqueue *_vq,
 
 static bool vring_use_dma_api(struct virtio_device *vdev)
 {
-   if (!virtio_has_iommu_quirk(vdev))
-   return true;
-
-   /* Otherwise, we are left to guess. */
-   /*
-* In theory, it's possible to have a buggy QEMU-supposed
-* emulated Q35 IOMMU and Xen enabled at the same time.  On
-* such a configuration, virtio has never worked and will
-* not work without an even larger kludge.  Instead, enable
-* the DMA API if we're a Xen guest, which at least allows
-* all of the sensible Xen configurations to work correctly.
-*/
-   if (xen_domain())
-   return true;
+   if (virtio_has_iommu_quirk(vdev))
+   return false;
 
-   return false;
+   return true;
 }
 
 size_t virtio_max_dma_size(struct virtio_device *vdev)


-- 
Ram Pai

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/vt-d: Check VT-d RMRR region in BIOS is reported as reserved

2019-10-16 Thread Joerg Roedel
Hi,

On Tue, Oct 15, 2019 at 09:49:32AM -0700, Yian Chen wrote:
> VT-d RMRR (Reserved Memory Region Reporting) regions are reserved
> for device use only and should not be part of allocable memory pool of OS.
> 
> BIOS e820_table reports complete memory map to OS, including OS usable
> memory ranges and BIOS reserved memory ranges etc.
> 
> x86 BIOS may not be trusted to include RMRR regions as reserved type
> of memory in its e820 memory map, hence validate every RMRR entry
> with the e820 memory map to make sure the RMRR regions will not be
> used by OS for any other purposes.

Are there real systems in the wild where this is a problem?

> +static inline int __init
> +arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
> +{
> + u64 start = rmrr->base_address;
> + u64 end = rmrr->end_address + 1;
> +
> + if (e820__mapped_all(start, end, E820_TYPE_RESERVED))
> + return 0;
> +
> + pr_err(FW_BUG "No firmware reserved region can cover this RMRR 
> [%#018Lx-%#018Lx], contact BIOS vendor for fixes\n",
> +start, end - 1);
> + return -EFAULT;
> +}

Why -EFAULT? There is no fault involved. Using -EINVAL seems to be a better
choice.



Re: [PATCH] iommu: rockchip: Free domain on .domain_free

2019-10-16 Thread Joerg Roedel
On Wed, Oct 02, 2019 at 02:29:23PM -0300, Ezequiel Garcia wrote:
> IOMMU domain resource life is well-defined, managed
> by .domain_alloc and .domain_free.
> 
> Therefore, domain-specific resources shouldn't be tied to
> the device life, but instead to its domain.
> 
> Signed-off-by: Ezequiel Garcia 
> ---
>  drivers/iommu/rockchip-iommu.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 5/6] iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers

2019-10-16 Thread Geert Uytterhoeven
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda
 wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch adds helper functions ipmmu_utlb_reg() and
> ipmmu_imu{asid,ctr}_write() for the "uTLB" registers. No behavior change.
>
> Signed-off-by: Yoshihiro Shimoda 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 4/6] iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro

2019-10-16 Thread Geert Uytterhoeven
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda
 wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch uses ipmmu_features values instead of a macro to
> calculate the context registers' offset. No behavior change.
>
> Signed-off-by: Yoshihiro Shimoda 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 3/6] iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers

2019-10-16 Thread Geert Uytterhoeven
On Tue, Oct 15, 2019 at 2:10 PM Yoshihiro Shimoda
 wrote:
> Since the memory mapping of the IPMMU will change in the future,
> this patch adds helper functions ipmmu_ctx_{reg,read,write}()
> for the MMU "context" registers. No behavior change.
>
> Signed-off-by: Yoshihiro Shimoda 

Reviewed-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu