RE: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Yoshihiro Shimoda
Hi Christoph,

> From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 8:31 PM
> 
> On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote:
> > Hi Christoph,
> >
> > > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> > >
> > > First things first:
> > >
> > > Yoshihiro, can you try this git branch?  The new bits are just the three
> > > patches at the end, but they sit on top of a few patches already sent
> > > out to the list, so a branch is probably either:
> > >
> > >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes
> >
> > Thank you for the patches!
> > Unfortunately, the three patches did not resolve this issue.
> > However, they gave me a hint, and I then found the root cause:
> >  - slave_configure() in drivers/usb/storage/scsiglue.c calls
> >    blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when the device
> >    is USB_SPEED_SUPER or faster.
> >  -- Therefore, even your patches (which I also fixed slightly [1]) could
> >    not resolve the issue, because max_sectors is overwritten by the code
> >    above.
> >
> > So, I think we should fix slave_configure() by using
> > dma_max_mapping_size().
> > What do you think? If so, I can make such a patch.
> 
> Yes, please do.

Thank you for your comment. I have sent a patch to the related mailing lists and to you.

Best regards,
Yoshihiro Shimoda



[PATCH 1/3] iommu/amd: make iommu_disable safer

2019-06-12 Thread Kevin Mitchell via iommu
Make it safe to call iommu_disable during early init error conditions
before mmio_base is set, but after the struct amd_iommu has been added
to the amd_iommu_list. For example, this happens if firmware fails to
fill in mmio_phys in the ACPI table leading to a NULL pointer
dereference in iommu_feature_disable.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index f773792d77fd..3798d7303c99 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -424,6 +424,9 @@ static void iommu_enable(struct amd_iommu *iommu)
 
 static void iommu_disable(struct amd_iommu *iommu)
 {
+   if (!iommu->mmio_base)
+   return;
+
/* Disable command buffer */
iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
 
-- 
2.20.1



[PATCH 3/3] iommu/amd: only free resources once on init error

2019-06-12 Thread Kevin Mitchell via iommu
When amd_iommu=off was specified on the command line, free_X_resources
functions were called immediately after early_amd_iommu_init. They were
then called again when amd_iommu_init also failed (as expected).

Instead, call them only once: at the end of state_next() whenever
there's an error. These functions should be safe to call any time and
any number of times. However, since state_next is never called again in
an error state, the cleanup will only ever be run once.

This also ensures that cleanup code is run as soon as possible after an
error is detected rather than waiting for amd_iommu_init() to be called.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 27 +--
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 5f3df5ae6ba8..24fc060fe596 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2638,8 +2638,6 @@ static int __init state_next(void)
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_ACPI_FINISHED;
if (init_state == IOMMU_ACPI_FINISHED && amd_iommu_disabled) {
pr_info("AMD IOMMU disabled on kernel command-line\n");
-   free_dma_resources();
-   free_iommu_resources();
init_state = IOMMU_CMDLINE_DISABLED;
ret = -EINVAL;
}
@@ -2680,6 +2678,19 @@ static int __init state_next(void)
BUG();
}
 
+   if (ret) {
+   free_dma_resources();
+   if (!irq_remapping_enabled) {
+   disable_iommus();
+   free_iommu_resources();
+   } else {
+   struct amd_iommu *iommu;
+
+   uninit_device_table_dma();
+   for_each_iommu(iommu)
+   iommu_flush_all_caches(iommu);
+   }
+   }
return ret;
 }
 
@@ -2753,18 +2764,6 @@ static int __init amd_iommu_init(void)
int ret;
 
ret = iommu_go_to_state(IOMMU_INITIALIZED);
-   if (ret) {
-   free_dma_resources();
-   if (!irq_remapping_enabled) {
-   disable_iommus();
-   free_iommu_resources();
-   } else {
-   uninit_device_table_dma();
-   for_each_iommu(iommu)
-   iommu_flush_all_caches(iommu);
-   }
-   }
-
 #ifdef CONFIG_GART_IOMMU
	if (ret && list_empty(&amd_iommu_list)) {
/*
-- 
2.20.1



[PATCH 2/3] iommu/amd: move gart fallback to amd_iommu_init

2019-06-12 Thread Kevin Mitchell via iommu
The fallback to the GART driver in case amd_iommu doesn't work was
executed from a function called free_iommu_resources, which didn't really
make sense. It was even being called twice if amd_iommu=off was
specified on the command line.

The only complication is that we need to verify that amd_iommu has
fully relinquished control, by calling free_iommu_resources and emptying
the amd_iommu_list.

Signed-off-by: Kevin Mitchell 
---
 drivers/iommu/amd_iommu_init.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 3798d7303c99..5f3df5ae6ba8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -2345,15 +2345,6 @@ static void __init free_iommu_resources(void)
amd_iommu_dev_table = NULL;
 
free_iommu_all();
-
-#ifdef CONFIG_GART_IOMMU
-   /*
-* We failed to initialize the AMD IOMMU - try fallback to GART
-* if possible.
-*/
-   gart_iommu_init();
-
-#endif
 }
 
 /* SB IOAPIC is always on this device in AMD systems */
@@ -2774,6 +2765,16 @@ static int __init amd_iommu_init(void)
}
}
 
+#ifdef CONFIG_GART_IOMMU
+   if (ret && list_empty(&amd_iommu_list)) {
+   /*
+* We failed to initialize the AMD IOMMU - try fallback
+* to GART if possible.
+*/
+   gart_iommu_init();
+   }
+#endif
+
for_each_iommu(iommu)
amd_iommu_debugfs_setup(iommu);
 
-- 
2.20.1



[PATCH 0/3] handle init errors more gracefully in amd_iommu

2019-06-12 Thread Kevin Mitchell via iommu
This series makes error handling more robust in the amd_iommu init
code. It was initially motivated by problematic firmware that does not
set up the physical address of the iommu. This led to a NULL dereference
panic when iommu_disable was called during cleanup.

While the first patch is sufficient to avoid the panic, the subsequent
two move the cleanup closer to the actual error and avoid calling the
cleanup code twice when amd_iommu=off is specified on the command line.

I have tested this series on a variety of AMD CPUs with firmware
exhibiting the issue. I have additionally tested on platforms where the
firmware has been fixed. I tried both with and without amd_iommu=off. I
have also tested on older CPUs where no IOMMU is detected and even one
where the GART driver ends up running.

Thanks,

Kevin

Kevin Mitchell (3):
  iommu/amd: make iommu_disable safer
  iommu/amd: move gart fallback to amd_iommu_init
  iommu/amd: only free resources once on init error

 drivers/iommu/amd_iommu_init.c | 45 ++
 1 file changed, 24 insertions(+), 21 deletions(-)

-- 
2.20.1



Re: [PATCH] iommu: Add padding to struct iommu_fault

2019-06-12 Thread Auger Eric
Hi Jean,

On 6/12/19 7:59 PM, Jean-Philippe Brucker wrote:
> Ease future extensions of struct iommu_fault_page_request and struct
> iommu_fault_unrecoverable by adding a few bytes of padding. That way, a
> new field can be added to either of these structures by simply introducing
> a new flag. To extend it after the size limit is reached, a new fault
> reporting structure will have to be negotiated with userspace.
> 
> With 56 bytes of padding, the total size of iommu_fault is 64 bytes and
> fits in a cache line on a lot of contemporary machines, while providing 16
> and 24 bytes of extension to structures iommu_fault_page_request and
> iommu_fault_unrecoverable respectively.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  include/uapi/linux/iommu.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index f45d8e9e59c3..fc00c5d4741b 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -105,15 +105,17 @@ struct iommu_fault_page_request {
>   * @type: fault type from  iommu_fault_type
>   * @padding: reserved for future use (should be zero)
>   * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
>   * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> + * @padding2: sets the fault size to allow for future extensions
>   */
>  struct iommu_fault {
>   __u32   type;
>   __u32   padding;
>   union {
>   struct iommu_fault_unrecoverable event;
>   struct iommu_fault_page_request prm;
> + __u8 padding2[56];
>   };
>  };
>  
>  /**
> 


Re: [RFC 1/2] iommu: arm-smmu: Handoff SMR registers and context banks

2019-06-12 Thread Jeffrey Hugo

On 6/12/2019 12:42 PM, Bjorn Andersson wrote:

On Wed 12 Jun 11:07 PDT 2019, Jeffrey Hugo wrote:


On Wed, Jun 5, 2019 at 3:09 PM Bjorn Andersson
 wrote:


Boot splash screen or EFI framebuffer requires the display hardware to
operate while the Linux iommu driver probes. Therefore, we cannot simply
wipe out the SMR register settings programmed by the bootloader.

Detect which SMR registers are in use during probe, and which context
banks they are associated with. Reserve these context banks for the
first Linux device whose stream-id matches the SMR register.

Any existing page-tables will be discarded.

Heavily based on downstream implementation by Patrick Daly
.

Signed-off-by: Bjorn Andersson 


Reviewed-by: Jeffrey Hugo 

This is very similar to the hacked up version I did, and I'm glad to see it
cleaned up and posted.

This is important work, and I want to see it move forward, however it doesn't
completely address everything, IMO.  Fixing the remaining issues certainly
can be follow on work, but I don't know if they would end up affecting this
implementation.

So, with this series, we've gone from a crash on msm8998/sdm845, to causing
context faults.  This is because while we are now not nuking the config, we
are preventing the bootloader installed translations from working.  Essentially
the display already has a mapping (technically passthrough) that is likely being
used by EFIFB.  The instant the SMMU inits, that mapping becomes invalid,
and the display is going to generate context faults.  While not fatal,
this provides
a bad user experience as there are nasty messages, and EFIFB stops working.

The situation does get resolved once the display driver inits and takes over the
HW and SMMU mappings, so we are not stuck with it, but it would be nice to
have that addressed as well, ie have EFIFB working up until the Linux display
driver takes over.



But do you see this even though you don't enable the mdss driver?


The only way I can see that happening is if the SMMU leaves the context bank
alone, with M == 0, and the iommu framework defines a domain attribute or
some other mechanism to allow the driver to flip the M bit in the context bank
after installing the necessary handover translations.



 From what I can tell this implementation leaves the framebuffer mapping
in working condition until the attach_dev of the display driver, at
which time we do get context faults until the display driver is done
initializing things.

So we're reducing the problem to a question of how to seamlessly carry
over the mapping during the attach.


Actually, you are correct.  Without mdss this won't occur.  The window 
is from the creation of the default domain for the mdss device, to the 
point that the display is fully init'd (and EFIFB is shut down).




Regards,
Bjorn


---
  drivers/iommu/arm-smmu-regs.h |  2 +
  drivers/iommu/arm-smmu.c  | 80 ---
  2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index e9132a926761..8c1fd84032a2 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -105,7 +105,9 @@
  #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
  #define SMR_VALID  (1 << 31)
  #define SMR_MASK_SHIFT 16
+#define SMR_MASK_MASK  0x7fff
  #define SMR_ID_SHIFT   0
+#define SMR_ID_MASK0x

  #define ARM_SMMU_GR0_S2CR(n)   (0xc00 + ((n) << 2))
  #define S2CR_CBNDX_SHIFT   0
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 5e54cc0a28b3..c8629a656b42 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -135,6 +135,7 @@ struct arm_smmu_s2cr {
 enum arm_smmu_s2cr_type type;
 enum arm_smmu_s2cr_privcfg  privcfg;
 u8  cbndx;
+   boolhandoff;
  };

  #define s2cr_init_val (struct arm_smmu_s2cr){  \
@@ -399,9 +400,22 @@ static int arm_smmu_register_legacy_master(struct device 
*dev,
 return err;
  }

-static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
+  struct device *dev)
  {
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   unsigned long *map = smmu->context_map;
+   int end = smmu->num_context_banks;
+   int cbndx;
 int idx;
+   int i;
+
+   for_each_cfg_sme(fwspec, i, idx) {
+   if (smmu->s2crs[idx].handoff) {
+   cbndx = smmu->s2crs[idx].cbndx;
+   goto found_handoff;
+   }
+   }

 do {
 idx = find_next_zero_bit(map, end, start);
@@ -410,6 +424,17 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, int 
start, int end)
 

Re: [PATCH] iommu: Add padding to struct iommu_fault

2019-06-12 Thread Jacob Pan
On Wed, 12 Jun 2019 18:59:38 +0100
Jean-Philippe Brucker  wrote:

> Ease future extensions of struct iommu_fault_page_request and struct
> iommu_fault_unrecoverable by adding a few bytes of padding. That way,
> a new field can be added to either of these structures by simply
> introducing a new flag. To extend it after the size limit is reached,
> a new fault reporting structure will have to be negotiated with
> userspace.
> 
> With 56 bytes of padding, the total size of iommu_fault is 64 bytes
> and fits in a cache line on a lot of contemporary machines, while
> providing 16 and 24 bytes of extension to structures
> iommu_fault_page_request and iommu_fault_unrecoverable respectively.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  include/uapi/linux/iommu.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index f45d8e9e59c3..fc00c5d4741b 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -105,15 +105,17 @@ struct iommu_fault_page_request {
>   * @type: fault type from  iommu_fault_type
>   * @padding: reserved for future use (should be zero)
>   * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
>   * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
> + * @padding2: sets the fault size to allow for future extensions
>   */
>  struct iommu_fault {
>   __u32   type;
>   __u32   padding;
>   union {
>   struct iommu_fault_unrecoverable event;
>   struct iommu_fault_page_request prm;
> + __u8 padding2[56];
>   };
>  };
>  

Acked-by: Jacob Pan 

>  /**

[Jacob Pan]


Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-12 Thread Jacob Pan
On Wed, 12 Jun 2019 15:11:43 +0200
Joerg Roedel  wrote:

> On Wed, Jun 12, 2019 at 12:54:51PM +0100, Jean-Philippe Brucker wrote:
> > Thanks! As discussed I think we need to add padding into the
> > iommu_fault structure before this reaches mainline, to make the
> > UAPI easier to extend in the future. It's already possible to
> > extend but requires introducing a new ABI version number and
> > support two structures. Adding some padding would only require
> > introducing new flags. If there is no objection I'll send a
> > one-line patch bumping the structure size to 64 bytes (currently
> > 48)  
> 
> Sounds good, please submit the patch.
> 
Could you also add padding to page response per our discussion here?
https://lkml.org/lkml/2019/6/12/1131

> Regards,
> 
>   Joerg

[Jacob Pan]


Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-12 Thread Jacob Pan
On Tue, 11 Jun 2019 14:14:33 +0100
Jean-Philippe Brucker  wrote:

> On 10/06/2019 22:31, Jacob Pan wrote:
> > On Mon, 10 Jun 2019 13:45:02 +0100
> > Jean-Philippe Brucker  wrote:
> >   
> >> On 07/06/2019 18:43, Jacob Pan wrote:  
> > So it seems we agree on the following:
> > - iommu_unregister_device_fault_handler() will never fail
> > - iommu driver cleans up all pending faults when handler is
> > unregistered
> > - assume device driver or guest not sending more page response
> > _after_ handler is unregistered.
> > - system will tolerate rare spurious response
> >
> > Sounds right?  
> 
>  Yes, I'll add that to the fault series
> >>> Hold on a second please, I think we need more clarifications.
> >>> Ashok pointed out to me that the spurious response can be harmful
> >>> to other devices when it comes to mdev, where PRQ group id is not
> >>> per PASID, device may reuse the group number and receiving
> >>> spurious page response can confuse the entire PF. 
> >>
> >> I don't understand how mdev differs from the non-mdev situation
> >> (but I also still don't fully get how mdev+PASID will be
> >> implemented). Is the following the case you're worried about?
> >>
> >>   M#: mdev #
> >>
> >> # Dev Hostmdev drv   VFIO/QEMUGuest
> >> 
> >> 1 <- reg(handler)
> >> 2 PR1 G1 P1-> M1 PR1 G1inject -> M1 PR1 G1
> >> 3 <- unreg(handler)
> >> 4   <- PS1 G1 P1 (F)  |
> >> 5unreg(handler)
> >> 6 <- reg(handler)
> >> 7 PR2 G1 P1-> M2 PR2 G1inject -> M2 PR2 G1
> >> 8 <- M1 PS1 G1
> >> 9 accept ??<- PS1 G1 P1
> >> 10<- M2 PS2 G1
> >> 11accept   <- PS2 G1 P1
> >>  
> > Not really. I am not worried about PASID reuse or unbind. Just
> > within the same PASID bind lifetime of a single mdev, back to back
> > register/unregister fault handler.
> > After Step 4, device will think G1 is done. Device could reuse G1
> > for the next PR, if we accept PS1 in step 9, device will terminate
> > G1 before the real G1 PS arrives in Step 11. The real G1 PS might
> > have a different response code. Then we just drop the PS in Step
> > 11?  
> 
> Yes, I think we do. Two possibilities:
> 
> * G1 is reused at step 7 for the same PASID context, which means that
> it is for the same mdev. The problem is then identical to the non-mdev
> case, new page faults and old page response may cross:
> 
> # Dev Hostmdev drv   VFIO/QEMUGuest
> 
> 7 PR2 G1 P1  --.
> 8   \ .- M1 PS1 G1
> 9'->  PR2 G1 P1  ->  /   inject  --> M1 PR2 G1
> 10   accept <---  PS1 G1 P1  <--'
> 11   reject <---  PS2 G1 P1  <-- M1 PS2 G1
> 
> And the incorrect page response is returned to the guest. However it
> affects a single mdev/guest context, it doesn't affect other mdevs.
> 
> * Or G1 is reused at step 7 for a different PASID. At step 10 the
> fault handler rejects the page response because the PASID is
> different, and step 11 is accepted.
> 
> 
> >>> Having spurious page response is also not
> >>> abiding the PCIe spec. exactly.
> >>
> >> We are following the PCI spec though, in that we don't send page
> >> responses for PRGIs that aren't in flight.
> >>  
> > You are right, the worst case of the spurious PS is to terminate the
> > group prematurely. Need to know the scope of the HW damage in case
> > of mdev where group IDs can be shared among mdevs belong to the
> > same PF.  
> 
> But from the IOMMU fault API point of view, the full page request is
> identified by both PRGI and PASID. Given that each mdev has its own
> set of PASIDs, it should be easy to isolate page responses per mdev.
> 
On Intel platforms, devices sending a page request with private data must
receive a page response with matching private data. If we solely depend
on PRGI and PASID, we may send stale private data to the device in
those incorrect page responses. Since private data may represent
PF-device-wide contexts, sending a page response with wrong private data
may affect other mdevs/PASIDs.

One solution we are considering is to inject a sequence number (e.g. a
ktime raw monotonic clock value) as vIOMMU private data into the guest.
The guest would return this fake private data in its page response, and
the host would then send the page response back to the device, matching
on PRG1, PASID and private_data.

This solution does not expose HW-context-related private data to the
guest but needs an extension of the page response in the iommu uapi.

/**
 * struct iommu_page_response - Generic page response information
 * @version: 
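To make the idea above concrete, here is a rough sketch of the kind of page
response extension being discussed. The struct name, field names, the flag
and the size of the private-data cookie are hypothetical, not the actual
include/uapi/linux/iommu.h definition:

/*
 * Sketch only: a page response that can echo back an opaque cookie so the
 * host can match PRGI + PASID + private data before forwarding the
 * response to the device.  Names and layout are hypothetical.
 */
struct iommu_page_response_sketch {
	__u32	version;
	__u32	flags;
#define IOMMU_PAGE_RESP_PRIVATE_DATA	(1 << 0)	/* hypothetical flag */
	__u32	pasid;
	__u32	grpid;
	__u32	code;
	__u64	private_data[2];	/* opaque cookie injected by the host */
};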

Re: [RFC 1/2] iommu: arm-smmu: Handoff SMR registers and context banks

2019-06-12 Thread Bjorn Andersson
On Wed 12 Jun 11:07 PDT 2019, Jeffrey Hugo wrote:

> On Wed, Jun 5, 2019 at 3:09 PM Bjorn Andersson
>  wrote:
> >
> > Boot splash screen or EFI framebuffer requires the display hardware to
> > operate while the Linux iommu driver probes. Therefore, we cannot simply
> > wipe out the SMR register settings programmed by the bootloader.
> >
> > Detect which SMR registers are in use during probe, and which context
> > banks they are associated with. Reserve these context banks for the
> > first Linux device whose stream-id matches the SMR register.
> >
> > Any existing page-tables will be discarded.
> >
> > Heavily based on downstream implementation by Patrick Daly
> > .
> >
> > Signed-off-by: Bjorn Andersson 
> 
> Reviewed-by: Jeffrey Hugo 
> 
> This is very similar to the hacked up version I did, and I'm glad to see it
> cleaned up and posted.
> 
> This is important work, and I want to see it move forward, however it doesn't
> completely address everything, IMO.  Fixing the remaining issues certainly
> can be follow on work, but I don't know if they would end up affecting this
> implementation.
> 
> So, with this series, we've gone from a crash on msm8998/sdm845, to causing
> context faults.  This is because while we are now not nuking the config, we
> are preventing the bootloader installed translations from working.  
> Essentially
> the display already has a mapping (technically passthrough) that is likely 
> being
> used by EFIFB.  The instant the SMMU inits, that mapping becomes invalid,
> and the display is going to generate context faults.  While not fatal,
> this provides
> a bad user experience as there are nasty messages, and EFIFB stops working.
> 
> The situation does get resolved once the display driver inits and takes over 
> the
> HW and SMMU mappings, so we are not stuck with it, but it would be nice to
> have that addressed as well, ie have EFIFB working up until the Linux display
> driver takes over.
> 

But do you see this even though you don't enable the mdss driver?

> The only way I can see that happening is if the SMMU leaves the context bank
> alone, with M == 0, and the iommu framework defines a domain attribute or
> some other mechanism to allow the driver to flip the M bit in the context bank
> after installing the necessary handover translations.
> 

>From what I can tell this implementation leaves the framebuffer mapping
in working condition until the attach_dev of the display driver, at
which time we do get context faults until the display driver is done
initializing things.

So we're reducing the problem to a question of how to seamlessly carry
over the mapping during the attach.

Regards,
Bjorn

> > ---
> >  drivers/iommu/arm-smmu-regs.h |  2 +
> >  drivers/iommu/arm-smmu.c  | 80 ---
> >  2 files changed, 77 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> > index e9132a926761..8c1fd84032a2 100644
> > --- a/drivers/iommu/arm-smmu-regs.h
> > +++ b/drivers/iommu/arm-smmu-regs.h
> > @@ -105,7 +105,9 @@
> >  #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
> >  #define SMR_VALID  (1 << 31)
> >  #define SMR_MASK_SHIFT 16
> > +#define SMR_MASK_MASK  0x7fff
> >  #define SMR_ID_SHIFT   0
> > +#define SMR_ID_MASK0x
> >
> >  #define ARM_SMMU_GR0_S2CR(n)   (0xc00 + ((n) << 2))
> >  #define S2CR_CBNDX_SHIFT   0
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 5e54cc0a28b3..c8629a656b42 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -135,6 +135,7 @@ struct arm_smmu_s2cr {
> > enum arm_smmu_s2cr_type type;
> > enum arm_smmu_s2cr_privcfg  privcfg;
> > u8  cbndx;
> > +   boolhandoff;
> >  };
> >
> >  #define s2cr_init_val (struct arm_smmu_s2cr){  \
> > @@ -399,9 +400,22 @@ static int arm_smmu_register_legacy_master(struct 
> > device *dev,
> > return err;
> >  }
> >
> > -static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
> > +static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
> > +  struct device *dev)
> >  {
> > +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> > +   unsigned long *map = smmu->context_map;
> > +   int end = smmu->num_context_banks;
> > +   int cbndx;
> > int idx;
> > +   int i;
> > +
> > +   for_each_cfg_sme(fwspec, i, idx) {
> > +   if (smmu->s2crs[idx].handoff) {
> > +   cbndx = smmu->s2crs[idx].cbndx;
> > +   goto found_handoff;
> > +   }
> > +   }
> >
> > do {
> > idx = find_next_zero_bit(map, end, start);
> > @@ -410,6 +424,17 @@ static int 

Re: [RFC 2/2] iommu: arm-smmu: Don't blindly use first SMR to calculate mask

2019-06-12 Thread Bjorn Andersson
On Wed 12 Jun 10:58 PDT 2019, Jeffrey Hugo wrote:

> On Wed, Jun 5, 2019 at 3:09 PM Bjorn Andersson
>  wrote:
> >
> > With the SMRs inherited from the bootloader the first SMR might actually
> > be valid and in use. As such probing the SMR mask using the first SMR
> > might break a stream in use. Search for an unused stream and use this to
> > probe the SMR mask.
> >
> > Signed-off-by: Bjorn Andersson 
> 
> Reviewed-by: Jeffrey Hugo 
> 
> I don't quite like the situation where there is no SMR to compute the
> mask, but I think the way you've handled it is the best option.
> 

Right, if this happens we would end up using the smr_mask that was
previously calculated. We just won't update it based on the hardware.

> I'm curious, why is this not included in patch #1?  Seems like patch
> #1 introduces
> the issue, yet doesn't also fix it.
> 

You're right, didn't think about that. This needs to either predate that
patch or be included in it.

Thanks,
Bjorn

> > ---
> >  drivers/iommu/arm-smmu.c | 20 
> >  1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index c8629a656b42..0c6f5fe6f382 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1084,23 +1084,35 @@ static void arm_smmu_test_smr_masks(struct 
> > arm_smmu_device *smmu)
> >  {
> > void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> > u32 smr;
> > +   int idx;
> >
> > if (!smmu->smrs)
> > return;
> >
> > +   for (idx = 0; idx < smmu->num_mapping_groups; idx++) {
> > +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> > +   if (!(smr & SMR_VALID))
> > +   break;
> > +   }
> > +
> > +   if (idx == smmu->num_mapping_groups) {
> > +   dev_err(smmu->dev, "Unable to compute streamid_mask\n");
> > +   return;
> > +   }
> > +
> > /*
> >  * SMR.ID bits may not be preserved if the corresponding MASK
> >  * bits are set, so check each one separately. We can reject
> >  * masters later if they try to claim IDs outside these masks.
> >  */
> > smr = smmu->streamid_mask << SMR_ID_SHIFT;
> > -   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
> > -   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
> > +   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
> > +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> > smmu->streamid_mask = smr >> SMR_ID_SHIFT;
> >
> > smr = smmu->streamid_mask << SMR_MASK_SHIFT;
> > -   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
> > -   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
> > +   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
> > +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> > smmu->smr_mask_mask = smr >> SMR_MASK_SHIFT;
> >  }
> >
> > --
> > 2.18.0
> >
> >


Re: [RFC 1/2] iommu: arm-smmu: Handoff SMR registers and context banks

2019-06-12 Thread Jeffrey Hugo
On Wed, Jun 5, 2019 at 3:09 PM Bjorn Andersson
 wrote:
>
> Boot splash screen or EFI framebuffer requires the display hardware to
> operate while the Linux iommu driver probes. Therefore, we cannot simply
> wipe out the SMR register settings programmed by the bootloader.
>
> Detect which SMR registers are in use during probe, and which context
> banks they are associated with. Reserve these context banks for the
> first Linux device whose stream-id matches the SMR register.
>
> Any existing page-tables will be discarded.
>
> Heavily based on downstream implementation by Patrick Daly
> .
>
> Signed-off-by: Bjorn Andersson 

Reviewed-by: Jeffrey Hugo 

This is very similar to the hacked up version I did, and I'm glad to see it
cleaned up and posted.

This is important work, and I want to see it move forward, however it doesn't
completely address everything, IMO.  Fixing the remaining issues certainly
can be follow on work, but I don't know if they would end up affecting this
implementation.

So, with this series, we've gone from a crash on msm8998/sdm845, to causing
context faults.  This is because while we are now not nuking the config, we
are preventing the bootloader installed translations from working.  Essentially
the display already has a mapping (technically passthrough) that is likely being
used by EFIFB.  The instant the SMMU inits, that mapping becomes invalid,
and the display is going to generate context faults.  While not fatal,
this provides
a bad user experience as there are nasty messages, and EFIFB stops working.

The situation does get resolved once the display driver inits and takes over the
HW and SMMU mappings, so we are not stuck with it, but it would be nice to
have that addressed as well, ie have EFIFB working up until the Linux display
driver takes over.

The only way I can see that happening is if the SMMU leaves the context bank
alone, with M == 0, and the iommu framework defines a domain attribute or
some other mechanism to allow the driver to flip the M bit in the context bank
after installing the necessary handover translations.

> ---
>  drivers/iommu/arm-smmu-regs.h |  2 +
>  drivers/iommu/arm-smmu.c  | 80 ---
>  2 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> index e9132a926761..8c1fd84032a2 100644
> --- a/drivers/iommu/arm-smmu-regs.h
> +++ b/drivers/iommu/arm-smmu-regs.h
> @@ -105,7 +105,9 @@
>  #define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
>  #define SMR_VALID  (1 << 31)
>  #define SMR_MASK_SHIFT 16
> +#define SMR_MASK_MASK  0x7fff
>  #define SMR_ID_SHIFT   0
> +#define SMR_ID_MASK0x
>
>  #define ARM_SMMU_GR0_S2CR(n)   (0xc00 + ((n) << 2))
>  #define S2CR_CBNDX_SHIFT   0
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 5e54cc0a28b3..c8629a656b42 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -135,6 +135,7 @@ struct arm_smmu_s2cr {
> enum arm_smmu_s2cr_type type;
> enum arm_smmu_s2cr_privcfg  privcfg;
> u8  cbndx;
> +   boolhandoff;
>  };
>
>  #define s2cr_init_val (struct arm_smmu_s2cr){  \
> @@ -399,9 +400,22 @@ static int arm_smmu_register_legacy_master(struct device 
> *dev,
> return err;
>  }
>
> -static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
> +static int __arm_smmu_alloc_cb(struct arm_smmu_device *smmu, int start,
> +  struct device *dev)
>  {
> +   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +   unsigned long *map = smmu->context_map;
> +   int end = smmu->num_context_banks;
> +   int cbndx;
> int idx;
> +   int i;
> +
> +   for_each_cfg_sme(fwspec, i, idx) {
> +   if (smmu->s2crs[idx].handoff) {
> +   cbndx = smmu->s2crs[idx].cbndx;
> +   goto found_handoff;
> +   }
> +   }
>
> do {
> idx = find_next_zero_bit(map, end, start);
> @@ -410,6 +424,17 @@ static int __arm_smmu_alloc_bitmap(unsigned long *map, 
> int start, int end)
> } while (test_and_set_bit(idx, map));
>
> return idx;
> +
> +found_handoff:
> +   for (i = 0; i < smmu->num_mapping_groups; i++) {
> +   if (smmu->s2crs[i].cbndx == cbndx) {
> +   smmu->s2crs[i].cbndx = 0;
> +   smmu->s2crs[i].handoff = false;
> +   smmu->s2crs[i].count--;
> +   }
> +   }
> +
> +   return cbndx;
>  }
>
>  static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
> @@ -759,7 +784,8 @@ static void arm_smmu_write_context_bank(struct 
> arm_smmu_device *smmu, int idx)
>  }
>
>  static int 

[PATCH] iommu: Add padding to struct iommu_fault

2019-06-12 Thread Jean-Philippe Brucker
Ease future extensions of struct iommu_fault_page_request and struct
iommu_fault_unrecoverable by adding a few bytes of padding. That way, a
new field can be added to either of these structures by simply introducing
a new flag. To extend it after the size limit is reached, a new fault
reporting structure will have to be negotiated with userspace.

With 56 bytes of padding, the total size of iommu_fault is 64 bytes and
fits in a cache line on a lot of contemporary machines, while providing 16
and 24 bytes of extension to structures iommu_fault_page_request and
iommu_fault_unrecoverable respectively.

Signed-off-by: Jean-Philippe Brucker 
---
 include/uapi/linux/iommu.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index f45d8e9e59c3..fc00c5d4741b 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -105,15 +105,17 @@ struct iommu_fault_page_request {
  * @type: fault type from  iommu_fault_type
  * @padding: reserved for future use (should be zero)
  * @event: fault event, when @type is %IOMMU_FAULT_DMA_UNRECOV
  * @prm: Page Request message, when @type is %IOMMU_FAULT_PAGE_REQ
+ * @padding2: sets the fault size to allow for future extensions
  */
 struct iommu_fault {
__u32   type;
__u32   padding;
union {
struct iommu_fault_unrecoverable event;
struct iommu_fault_page_request prm;
+   __u8 padding2[56];
};
 };
 
 /**
-- 
2.21.0



Re: [RFC 2/2] iommu: arm-smmu: Don't blindly use first SMR to calculate mask

2019-06-12 Thread Jeffrey Hugo
On Wed, Jun 5, 2019 at 3:09 PM Bjorn Andersson
 wrote:
>
> With the SMRs inherited from the bootloader the first SMR might actually
> be valid and in use. As such probing the SMR mask using the first SMR
> might break a stream in use. Search for an unused stream and use this to
> probe the SMR mask.
>
> Signed-off-by: Bjorn Andersson 

Reviewed-by: Jeffrey Hugo 

I don't quite like the situation where there is no SMR to compute the mask,
but I think the way you've handled it is the best option.

I'm curious, why is this not included in patch #1?  Seems like patch
#1 introduces
the issue, yet doesn't also fix it.

> ---
>  drivers/iommu/arm-smmu.c | 20 
>  1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8629a656b42..0c6f5fe6f382 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1084,23 +1084,35 @@ static void arm_smmu_test_smr_masks(struct 
> arm_smmu_device *smmu)
>  {
> void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> u32 smr;
> +   int idx;
>
> if (!smmu->smrs)
> return;
>
> +   for (idx = 0; idx < smmu->num_mapping_groups; idx++) {
> +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> +   if (!(smr & SMR_VALID))
> +   break;
> +   }
> +
> +   if (idx == smmu->num_mapping_groups) {
> +   dev_err(smmu->dev, "Unable to compute streamid_mask\n");
> +   return;
> +   }
> +
> /*
>  * SMR.ID bits may not be preserved if the corresponding MASK
>  * bits are set, so check each one separately. We can reject
>  * masters later if they try to claim IDs outside these masks.
>  */
> smr = smmu->streamid_mask << SMR_ID_SHIFT;
> -   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
> -   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
> +   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
> +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> smmu->streamid_mask = smr >> SMR_ID_SHIFT;
>
> smr = smmu->streamid_mask << SMR_MASK_SHIFT;
> -   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
> -   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
> +   writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(idx));
> +   smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(idx));
> smmu->smr_mask_mask = smr >> SMR_MASK_SHIFT;
>  }
>
> --
> 2.18.0
>
>


[PATCH] swiotlb: no need to check return value of debugfs_create functions

2019-06-12 Thread Greg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Konrad Rzeszutek Wilk 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Robin Murphy 
Cc: iommu@lists.linux-foundation.org
Signed-off-by: Greg Kroah-Hartman 
---
 kernel/dma/swiotlb.c | 25 -
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 13f0cb080a4d..62fa5a82a065 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -696,29 +696,12 @@ bool is_swiotlb_active(void)
 
 static int __init swiotlb_create_debugfs(void)
 {
-   struct dentry *d_swiotlb_usage;
-   struct dentry *ent;
-
-   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
-
-   if (!d_swiotlb_usage)
-   return -ENOMEM;
-
-   ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
-  d_swiotlb_usage, &io_tlb_nslabs);
-   if (!ent)
-   goto fail;
-
-   ent = debugfs_create_ulong("io_tlb_used", 0400,
-  d_swiotlb_usage, &io_tlb_used);
-   if (!ent)
-   goto fail;
+   struct dentry *root;
 
+   root = debugfs_create_dir("swiotlb", NULL);
+   debugfs_create_ulong("io_tlb_nslabs", 0400, root, _tlb_nslabs);
+   debugfs_create_ulong("io_tlb_used", 0400, root, _tlb_used);
return 0;
-
-fail:
-   debugfs_remove_recursive(d_swiotlb_usage);
-   return -ENOMEM;
 }
 
 late_initcall(swiotlb_create_debugfs);
-- 
2.22.0



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Alan Stern
On Wed, 12 Jun 2019, Christoph Hellwig wrote:

> On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote:
> > > Thay is someething the virt_boundary prevents.  But could still give
> > > you something like:
> > > 
> > >   1536 4096 4096 1024
> > > 
> > > or
> > >   1536 16384 8192 4096 16384 512
> > 
> > That would kill the driver, if maxpacket were 1024.
> > 
> > USB has really two kinds of requirements
> > 
> > 1. What comes from the protocol
> > 2. What comes from the HCD
> > 
> > The protocol wants just multiples of maxpacket. XHCI can satisfy
> > that in arbitrary scatter/gather. Other HCs cannot.
> 
> We have no real way to enforce that for the other HCs unfortunately.
> I can't really think of any better way to handle their limitations
> except for setting max_segments to 1 or bounce buffering.

Would it be okay to rely on the assumption that USB block devices never 
have block size < 512?  (We could even add code to the driver to 
enforce this, although refusing to handle such devices at all might be 
worse than getting an occasional error.)

As I mentioned before, the only HCD that sometimes ends up with
maxpacket = 1024 but is unable to do full SG is vhci-hcd, and that one
shouldn't be too hard to fix.

Alan Stern



Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-12 Thread Joerg Roedel
On Wed, Jun 12, 2019 at 12:54:51PM +0100, Jean-Philippe Brucker wrote:
> Thanks! As discussed I think we need to add padding into the iommu_fault
> structure before this reaches mainline, to make the UAPI easier to
> extend in the future. It's already possible to extend but requires
> introducing a new ABI version number and support two structures. Adding
> some padding would only require introducing new flags. If there is no
> objection I'll send a one-line patch bumping the structure size to 64
> bytes (currently 48)

Sounds good, please submit the patch.

Regards,

Joerg


Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 01:46:06PM +0200, Oliver Neukum wrote:
> > Thay is someething the virt_boundary prevents.  But could still give
> > you something like:
> > 
> > 1536 4096 4096 1024
> > 
> > or
> > 1536 16384 8192 4096 16384 512
> 
> That would kill the driver, if maxpacket were 1024.
> 
> USB has really two kinds of requirements
> 
> 1. What comes from the protocol
> 2. What comes from the HCD
> 
> The protocol wants just multiples of maxpacket. XHCI can satisfy
> that in arbitrary scatter/gather. Other HCs cannot.

We have no real way to enforce that for the other HCs unfortunately.
I can't really think of any better way to handle their limitations
except for setting max_segments to 1 or bounce buffering.


Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-12 Thread Jean-Philippe Brucker
On 12/06/2019 09:19, Joerg Roedel wrote:
> On Mon, Jun 03, 2019 at 03:57:45PM +0100, Jean-Philippe Brucker wrote:
>> Jacob Pan (3):
>>   driver core: Add per device iommu param
>>   iommu: Introduce device fault data
>>   iommu: Introduce device fault report API
>>
>> Jean-Philippe Brucker (1):
>>   iommu: Add recoverable fault reporting
>>
>>  drivers/iommu/iommu.c  | 236 -
>>  include/linux/device.h |   3 +
>>  include/linux/iommu.h  |  87 ++
>>  include/uapi/linux/iommu.h | 153 
>>  4 files changed, 476 insertions(+), 3 deletions(-)
>>  create mode 100644 include/uapi/linux/iommu.h
> 
> Applied, thanks.

Thanks! As discussed I think we need to add padding into the iommu_fault
structure before this reaches mainline, to make the UAPI easier to
extend in the future. It's already possible to extend but requires
introducing a new ABI version number and support two structures. Adding
some padding would only require introducing new flags. If there is no
objection I'll send a one-line patch bumping the structure size to 64
bytes (currently 48)

Thanks,
Jean


Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Oliver Neukum
Am Mittwoch, den 12.06.2019, 09:30 +0200 schrieb Christoph Hellwig:
> 
> So based on the above I'm a little confused about the actual requirement
> again.  Can you still split the SCSI command into multiple URBs?  And

Yes. The device sees only a number of packets over the wire. They can
come from an arbitrary number of URBs with the two restrictions that
- we cannot split a packet among URBs
- every packet but the last must be a multiple of maxpacket

> is the boundary for that split still the scatterlist entry as in the
> description above?  If so I don't really see how the virt_boundary
> helps you at all. as it only guarnatees that in a bio, each subsequent
> segment start as the advertised virt_boundary.  It says nothing about
> the size of each segment.

That is problematic.

> Thay is someething the virt_boundary prevents.  But could still give
> you something like:
> 
>   1536 4096 4096 1024
> 
> or
>   1536 16384 8192 4096 16384 512

That would kill the driver, if maxpacket were 1024.

USB has really two kinds of requirements

1. What comes from the protocol
2. What comes from the HCD

The protocol wants just multiples of maxpacket. XHCI can satisfy
that in arbitrary scatter/gather. Other HCs cannot.
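To illustrate the HCD-side restriction in code, here is a small hedged helper
that checks whether a scatterlist obeys the "every element except the last is
a multiple of maxpacket" layout that a non-SG-capable host controller needs,
so that no packet is split across URBs. This is only a sketch of the
constraint being discussed, not code from the USB stack:

#include <linux/scatterlist.h>
#include <linux/types.h>

/*
 * Returns true if every scatterlist element except the last one has a
 * length that is a multiple of the endpoint's maxpacket size, i.e. a
 * layout that can be submitted as one URB per element without splitting
 * a USB packet across URBs.  Illustrative sketch only.
 */
static bool sg_fits_non_sg_hcd(struct scatterlist *sgl, int nents,
			       unsigned int maxpacket)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (i == nents - 1)		/* last element may be short */
			break;
		if (sg->length % maxpacket)
			return false;
	}
	return true;
}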

Regards
Oliver



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
On Wed, Jun 12, 2019 at 08:52:21AM +, Yoshihiro Shimoda wrote:
> Hi Christoph,
> 
> > From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> > 
> > First things first:
> > 
> > Yoshihiro, can you try this git branch?  The new bits are just the three
> > patches at the end, but they sit on top of a few patches already sent
> > out to the list, so a branch is probably either:
> > 
> >git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes
> 
> Thank you for the patches!
> Unfortunately, the three patches did not resolve this issue.
> However, they gave me a hint, and I then found the root cause:
>  - slave_configure() in drivers/usb/storage/scsiglue.c calls
>    blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when the device
>    is USB_SPEED_SUPER or faster.
>  -- Therefore, even your patches (which I also fixed slightly [1]) could
>    not resolve the issue, because max_sectors is overwritten by the code
>    above.
> 
> So, I think we should fix slave_configure() by using
> dma_max_mapping_size().
> What do you think? If so, I can make such a patch.

Yes, please do.


Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-12 Thread Jean-Philippe Brucker
On 11/06/2019 18:10, Jacob Pan wrote:
>> The issue is theoretical at the moment because no users do this, but
>> I'd be more comfortable taking the xa_lock, which prevents a
>> concurrent xa_erase()+free(). (I commented on your v3 but you might
>> have missed it)
>>
> Did you reply to my v3? I did not see it. I only saw your comments about
> v3 in your commit message.

My fault, I sneaked the comments in a random reply three levels down the
thread:
https://lore.kernel.org/linux-iommu/836caf0d-699e-33ba-5303-b1c9c949c...@arm.com/

(Great, linux-iommu is indexed by lore! I won't have to Cc lkml anymore)

 +  ioasid_data = xa_load(&ioasid_xa, ioasid);
 +  if (ioasid_data)
 +  rcu_assign_pointer(ioasid_data->private, data);  
>>> it is good to publish and have barrier here. But I just wonder even
>>> for weakly ordered machine, this pointer update is quite far away
>>> from its data update.  
>>
>> I don't know, it could be right before calling ioasid_set_data():
>>
>>  mydata = kzalloc(sizeof(*mydata));
>>  mydata->ops = _ops;  (1)
>>  ioasid_set_data(ioasid, mydata);
>>  ... /* no write barrier here */
>>  data->private = mydata; (2)
>>
>> And then another thread calls ioasid_find():
>>
>>  mydata = ioasid_find(ioasid);
>>  if (mydata)
>>  mydata->ops->do_something();
>>
>> On a weakly ordered machine, this thread could observe the pointer
>> assignment (2) before the ops assignment (1), and dereference NULL.
>> Using rcu_assign_pointer() should fix that
>>
> I agree it is better to have the barrier. Just thought there is already
> a rcu_read_lock() in xa_load() in between. rcu_read_lock() may have
> barrier in some case but better not count on it. 

Yes, and even if rcu_read_lock() provided a barrier I don't think it
would be sufficient, because acquire semantics don't guarantee that
prior writes appear to happen before the barrier, only the other way
round. A lock operation with release semantics, for example
spin_unlock(), should work.
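To condense the above into one place, here is a sketch of the publish/read
pairing, following the simplified ioasid_set_data()/ioasid_find() calls from
the quoted example; struct my_data, my_ops and do_something() are
hypothetical consumer-side names, not part of the series:

	/* Writer: initialise everything, then publish. */
	mydata = kzalloc(sizeof(*mydata), GFP_KERNEL);
	mydata->ops = &my_ops;              /* (1) fields written first */
	ioasid_set_data(ioasid, mydata);    /* rcu_assign_pointer() inside
	                                     * orders (1) before the store */

	/* Reader, possibly on another CPU: */
	mydata = ioasid_find(ioasid);       /* the matching RCU dereference of
	                                     * the private pointer (per the fix
	                                     * discussed above) pairs with the
	                                     * release in the writer */
	if (mydata)
		mydata->ops->do_something();/* ->ops is seen initialised */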

Thanks,
Jean

> No issues here. I will
> integrate this in the next version.
> 
>> Thanks,
>> Jean
> 
> [Jacob Pan]
> 



RE: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Yoshihiro Shimoda
Hi Christoph,

> From: Christoph Hellwig, Sent: Wednesday, June 12, 2019 4:31 PM
> 
> First things first:
> 
> Yoshihiro, can you try this git branch?  The new bits are just the three
> patches at the end, but they sit on top of a few patches already sent
> out to the list, so a branch is probably either:
> 
>git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes

Thank you for the patches!
Unfortunately, the three patches did not resolve this issue.
However, they gave me a hint, and I then found the root cause:
 - slave_configure() in drivers/usb/storage/scsiglue.c calls
   blk_queue_max_hw_sectors() with 2048 sectors (1 MiB) when the device is
   USB_SPEED_SUPER or faster.
 -- Therefore, even your patches (which I also fixed slightly [1]) could not
    resolve the issue, because max_sectors is overwritten by the code above.

So, I think we should fix slave_configure() by using dma_max_mapping_size().
What do you think? If so, I can make such a patch.

[1]
In the "scsi: take the DMA max mapping size into account" patch,
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_max_mapping_size(dev) << SECTOR_SHIFT);

it should be:
+   dma_max_mapping_size(dev) >> SECTOR_SHIFT);

But, if we fix the slave_configure(), we don't need this patch, IIUC.
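
For illustration, here is a minimal sketch of the slave_configure() change
proposed above, capping the hard-coded 2048-sector limit with
dma_max_mapping_size(). The helper name, the device pointer used for the DMA
query and the exact placement in usb-storage are assumptions for this sketch,
not the submitted patch:

/*
 * Sketch only: limit the request size by the DMA layer's maximum mapping
 * size (e.g. the swiotlb limit) instead of always allowing 2048 sectors
 * (1 MiB) for USB_SPEED_SUPER and faster devices.
 */
#include <linux/blkdev.h>
#include <linux/dma-mapping.h>
#include <linux/kernel.h>
#include <scsi/scsi_device.h>

static void usb_stor_cap_hw_sectors(struct scsi_device *sdev,
				    struct device *dma_dev)
{
	unsigned int max_sectors = 2048;	/* current 1 MiB cap */
	size_t dma_max = dma_max_mapping_size(dma_dev);

	/* dma_max_mapping_size() returns SIZE_MAX when there is no limit */
	if (dma_max != SIZE_MAX)
		max_sectors = min_t(unsigned int, max_sectors,
				    dma_max >> SECTOR_SHIFT);

	blk_queue_max_hw_sectors(sdev->request_queue, max_sectors);
}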

Best regards,
Yoshihiro Shimoda



Re: [PATCH v2 0/7] iommu/vt-d: Fixes and cleanups for linux-next

2019-06-12 Thread Joerg Roedel
On Wed, Jun 12, 2019 at 08:28:44AM +0800, Lu Baolu wrote:
> Lu Baolu (6):
>   iommu/vt-d: Don't return error when device gets right domain
>   iommu/vt-d: Set domain type for a private domain
>   iommu/vt-d: Don't enable iommu's which have been ignored
>   iommu/vt-d: Allow DMA domain attaching to rmrr locked device
>   iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
>   iommu/vt-d: Consolidate domain_init() to avoid duplication

Applied, thanks.


Re: [PATCH v6 0/7] RMRR related fixes and enhancements

2019-06-12 Thread Joerg Roedel
On Mon, Jun 03, 2019 at 08:53:29AM +0200, Eric Auger wrote:
> Eric Auger (7):
>   iommu: Fix a leak in iommu_insert_resv_region
>   iommu/vt-d: Duplicate iommu_resv_region objects per device list
>   iommu/vt-d: Introduce is_downstream_to_pci_bridge helper
>   iommu/vt-d: Handle RMRR with PCI bridge device scopes
>   iommu/vt-d: Handle PCI bridge RMRR device scopes in
> intel_iommu_get_resv_regions
>   iommu: Introduce IOMMU_RESV_DIRECT_RELAXABLE reserved memory regions
>   iommu/vt-d: Differentiate relaxable and non relaxable RMRRs

Applied, thanks.


Re: [PATCH v2 0/4] iommu: Add device fault reporting API

2019-06-12 Thread Joerg Roedel
On Mon, Jun 03, 2019 at 03:57:45PM +0100, Jean-Philippe Brucker wrote:
> Jacob Pan (3):
>   driver core: Add per device iommu param
>   iommu: Introduce device fault data
>   iommu: Introduce device fault report API
> 
> Jean-Philippe Brucker (1):
>   iommu: Add recoverable fault reporting
> 
>  drivers/iommu/iommu.c  | 236 -
>  include/linux/device.h |   3 +
>  include/linux/iommu.h  |  87 ++
>  include/uapi/linux/iommu.h | 153 
>  4 files changed, 476 insertions(+), 3 deletions(-)
>  create mode 100644 include/uapi/linux/iommu.h

Applied, thanks.


Re: [PATCH v3] iommu/arm-smmu: Avoid constant zero in TLBI writes

2019-06-12 Thread Joerg Roedel
On Mon, Jun 03, 2019 at 02:15:37PM +0200, Marc Gonzalez wrote:
>  drivers/iommu/arm-smmu.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)

Applied, thanks. It should show up in linux-next in the next days.



Re: [PATCH -next v2] iommu/amd: fix a null-ptr-deref in map_sg()

2019-06-12 Thread Joerg Roedel
On Mon, May 06, 2019 at 12:44:40PM -0400, Qian Cai wrote:
> The commit 1a1079011da3 ("iommu/amd: Flush not present cache in
> iommu_map_page") 

That patch was reverted by me in

97a18f548548a6ee1b9be14c6fc72090b8839875

because it caused issues for testers. So maybe re-submit the above patch
with this fix included?

Regards,

Joerg



Re: How to resolve an issue in swiotlb environment?

2019-06-12 Thread Christoph Hellwig
First things first:

Yoshihiro, can you try this git branch?  The new bits are just the three
patches at the end, but they sit on top of a few patches already sent
out to the list, so a branch is probably either:

   git://git.infradead.org/users/hch/misc.git scsi-virt-boundary-fixes

Gitweb:

   
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/scsi-virt-boundary-fixes

And now on to the rest:

> We would like to avoid the extra I/O overhead for host controllers that
> can't handle SG.  In fact, switching to sg_tablesize = 1 would probably
> be considered a regression.

Ok, makes sense.

> >  - set the virt boundary as-is for devices supporting "basic" scatterlist,
> >although that still assumes they can rejiggle them because for example
> >you could still get a smaller than expected first segment ala (assuming
> >a 1024 byte packet size and thus 1023 virt_boundary_mask):
> > 
> > | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |
> > 
> >as the virt_bondary does not guarantee that the first segment is
> >the same size as all the mid segments.
> 
> But that is exactly the problem we need to solve.

So based on the above I'm a little confused about the actual requirement
again.  Can you still split the SCSI command into multiple URBs?  And
is the boundary for that split still the scatterlist entry as in the
description above?  If so I don't really see how the virt_boundary
helps you at all. as it only guarnatees that in a bio, each subsequent
segment start as the advertised virt_boundary.  It says nothing about
the size of each segment.

> The issue which prompted the commit this thread is about arose in a
> situation where the block layer set up a scatterlist containing buffer
> sizes something like:
> 
>   4096 4096 1536 1024
> 
> and the maximum packet size was 1024.  The situation was a little 
> unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't 
> matter much in normal practice because:

Thay is someething the virt_boundary prevents.  But could still give
you something like:

1536 4096 4096 1024

or
1536 16384 8192 4096 16384 512

> The ->sysdev field points to the device used for DMA mapping.  It is
> often the same as ->controller, but sometimes it is
> ->controller->parent because of the peculiarities of some platforms.

Thanks, taken into account in the above patches!


[PATCH v3 0/4] Qcom smmu-500 wait-for-safe handling for sdm845

2019-06-12 Thread Vivek Gautam
Subject changed; the older subject was
"Qcom smmu-500 TLB invalidation errata for sdm845".
Previous versions of the patches are at [1].

Qcom's implementation of smmu-500 on sdm845 adds hardware logic called
wait-for-safe. This logic helps meet the invalidation requirements of
'real-time clients', such as display and camera, by ensuring that
invalidations happen only after getting an ack from these devices.
In this patch series we disable this wait-for-safe logic from the arm-smmu
driver's probe because, with it enabled, the hardware throttles
invalidations from 'non-real-time clients', such as USB and UFS.
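
As a rough sketch of that probe-time hook (the SCM call is the one added in
patch 2/4 of this series; how the sdm845 SMMU instance is detected here and
the error handling are assumptions, not the actual patch 3/4):

#include <linux/of.h>
#include <linux/device.h>
#include <linux/qcom_scm.h>

/*
 * Sketch only: turn off the wait-for-safe logic once at probe time so
 * that invalidations from non-real-time clients are not throttled.
 * struct arm_smmu_device is the driver-local type from arm-smmu.c.
 */
static void qcom_smmu500_disable_safe_sketch(struct arm_smmu_device *smmu)
{
	if (!of_device_is_compatible(smmu->dev->of_node,
				     "qcom,sdm845-smmu-500"))	/* assumed check */
		return;

	if (qcom_scm_qsmmu500_wait_safe_toggle(0))
		dev_warn(smmu->dev,
			 "Failed to turn off SAFE logic via firmware\n");
}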

For detailed information please refer to patch [3/4] in this series.
I have also included the device tree patch in this series for anyone who
would like to test this out. Here's a branch [2] that gets the display
working on an MTP SDM845 device.

This patch series is inspired by downstream work to handle under-performance
issues for real-time clients on sdm845. Downstream, we add separate page table
ops to handle TLB maintenance and toggle wait-for-safe in the tlb_sync call so
as to achieve the required performance for display and camera [3, 4].

Changes since v2:
 * Dropped the patch to add atomic io_read/write scm API.
 * Removed support for any separate page table ops to handle wait-for-safe.
   Currently just disabling this wait-for-safe logic from 
arm_smmu_device_probe()
   to achieve performance on USB/UFS on sdm845.
 * Added a device tree patch to add smmu option for fw-implemented support
   for SCM call to take care of SAFE toggling.

Changes since v1:
 * Addressed Will and Robin's comments:
- Dropped the patch[4] that forked out __arm_smmu_tlb_inv_range_nosync(),
  and __arm_smmu_tlb_sync().
- Cleaned up the errata patch further to use downstream polling mechanism
  for tlb sync.
 * No change in SCM call patches - patches 1 to 3.

[1] https://lore.kernel.org/patchwork/cover/983913/
[2] https://github.com/vivekgautam1/linux/tree/v5.2-rc4/sdm845-display-working
[3] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9=da765c6c75266b38191b38ef086274943f353ea7
[4] https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/msm-4.9=8696005aaaf745de68f57793c1a534a34345c30a

Vivek Gautam (4):
  firmware: qcom_scm-64: Add atomic version of qcom_scm_call
  firmware/qcom_scm: Add scm call to handle smmu errata
  iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic
  arm64: dts/sdm845: Enable FW implemented safe sequence handler on MTP

 arch/arm64/boot/dts/qcom/sdm845.dtsi |   1 +
 drivers/firmware/qcom_scm-32.c   |   5 ++
 drivers/firmware/qcom_scm-64.c   | 149 ---
 drivers/firmware/qcom_scm.c  |   6 ++
 drivers/firmware/qcom_scm.h  |   5 ++
 drivers/iommu/arm-smmu.c |  16 
 include/linux/qcom_scm.h |   2 +
 7 files changed, 140 insertions(+), 44 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 2/4] firmware/qcom_scm: Add scm call to handle smmu errata

2019-06-12 Thread Vivek Gautam
Qcom's smmu-500 needs to toggle the wait-for-safe logic to
handle TLB invalidations.
Some firmwares allow doing that through the SCM interface.
Add an API to toggle wait-for-safe from firmware through an
SCM call.

Signed-off-by: Vivek Gautam 
Reviewed-by: Bjorn Andersson 
---
 drivers/firmware/qcom_scm-32.c |  5 +
 drivers/firmware/qcom_scm-64.c | 13 +
 drivers/firmware/qcom_scm.c|  6 ++
 drivers/firmware/qcom_scm.h|  5 +
 include/linux/qcom_scm.h   |  2 ++
 5 files changed, 31 insertions(+)

diff --git a/drivers/firmware/qcom_scm-32.c b/drivers/firmware/qcom_scm-32.c
index 215061c581e1..bee8729525ec 100644
--- a/drivers/firmware/qcom_scm-32.c
+++ b/drivers/firmware/qcom_scm-32.c
@@ -614,3 +614,8 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
return qcom_scm_call_atomic2(QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 addr, val);
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool enable)
+{
+   return -ENODEV;
+}
diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index b6dca32c5ac4..23de54b75cd7 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -550,3 +550,16 @@ int __qcom_scm_io_writel(struct device *dev, phys_addr_t addr, unsigned int val)
return qcom_scm_call(dev, QCOM_SCM_SVC_IO, QCOM_SCM_IO_WRITE,
 , );
 }
+
+int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev, bool en)
+{
+   struct qcom_scm_desc desc = {0};
+   struct arm_smccc_res res;
+
+   desc.args[0] = QCOM_SCM_CONFIG_SAFE_EN_CLIENT_ALL;
+   desc.args[1] = en;
+   desc.arginfo = QCOM_SCM_ARGS(2);
+
+   return qcom_scm_call_atomic(dev, QCOM_SCM_SVC_SMMU_PROGRAM,
+   QCOM_SCM_CONFIG_SAFE_EN, , );
+}
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index 2ddc118dba1b..2b3b7a8c4270 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -344,6 +344,12 @@ int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare)
 }
 EXPORT_SYMBOL(qcom_scm_iommu_secure_ptbl_init);
 
+int qcom_scm_qsmmu500_wait_safe_toggle(bool en)
+{
+   return __qcom_scm_qsmmu500_wait_safe_toggle(__scm->dev, en);
+}
+EXPORT_SYMBOL(qcom_scm_qsmmu500_wait_safe_toggle);
+
 int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val)
 {
return __qcom_scm_io_readl(__scm->dev, addr, val);
diff --git a/drivers/firmware/qcom_scm.h b/drivers/firmware/qcom_scm.h
index 99506bd873c0..0b63ded89b41 100644
--- a/drivers/firmware/qcom_scm.h
+++ b/drivers/firmware/qcom_scm.h
@@ -91,10 +91,15 @@ extern int __qcom_scm_restore_sec_cfg(struct device *dev, 
u32 device_id,
  u32 spare);
 #define QCOM_SCM_IOMMU_SECURE_PTBL_SIZE3
 #define QCOM_SCM_IOMMU_SECURE_PTBL_INIT4
+#define QCOM_SCM_SVC_SMMU_PROGRAM  0x15
+#define QCOM_SCM_CONFIG_SAFE_EN0x3
+#define QCOM_SCM_CONFIG_SAFE_EN_CLIENT_ALL 0x2
 extern int __qcom_scm_iommu_secure_ptbl_size(struct device *dev, u32 spare,
 size_t *size);
 extern int __qcom_scm_iommu_secure_ptbl_init(struct device *dev, u64 addr,
 u32 size, u32 spare);
+extern int __qcom_scm_qsmmu500_wait_safe_toggle(struct device *dev,
+   bool enable);
 #define QCOM_MEM_PROT_ASSIGN_ID0x16
 extern int  __qcom_scm_assign_mem(struct device *dev,
  phys_addr_t mem_region, size_t mem_sz,
diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h
index 3f12cc77fb58..aee3d8580d89 100644
--- a/include/linux/qcom_scm.h
+++ b/include/linux/qcom_scm.h
@@ -57,6 +57,7 @@ extern int qcom_scm_set_remote_state(u32 state, u32 id);
 extern int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare);
 extern int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size);
 extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare);
+extern int qcom_scm_qsmmu500_wait_safe_toggle(bool en);
 extern int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val);
 extern int qcom_scm_io_writel(phys_addr_t addr, unsigned int val);
 #else
@@ -96,6 +97,7 @@ qcom_scm_set_remote_state(u32 state,u32 id) { return -ENODEV; }
 static inline int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size) { return -ENODEV; }
 static inline int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare) { return -ENODEV; }
+static inline int qcom_scm_qsmmu500_wait_safe_toggle(bool en) { return -ENODEV; }
 static inline int qcom_scm_io_readl(phys_addr_t addr, unsigned int *val) { return -ENODEV; }
 static inline int qcom_scm_io_writel(phys_addr_t addr, unsigned int val) { return -ENODEV; }
 #endif
-- 
QUALCOMM INDIA, on behalf of Qualcomm 

[PATCH v3 3/4] iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic

2019-06-12 Thread Vivek Gautam
Qcom's implementation of arm,mmu-500 adds WAIT-FOR-SAFE logic
to address under-performance issues in real-time clients, such as
display and camera.
On receiving an invalidation request, the SMMU forwards a SAFE request
to these clients and waits for a SAFE ack signal from them.
The SAFE signal from such clients is used to qualify the start of
invalidation.
This logic is controlled by chicken bits, one for each of MDP (display),
IFE0, and IFE1 (camera), which can be accessed only from secure software
on sdm845.

This configuration, however, degrades the performance of non-real-time
clients, such as USB and UFS. This happens because, with the wait-for-safe
logic enabled, the hardware tries to throttle non-real-time clients while
waiting for SAFE ack signals from real-time clients.

On MTP sdm845 devices, with the wait-for-safe logic enabled at boot time
by the bootloaders, we see degraded performance of USB and UFS when the
kernel enables smmu stage-1 translations for these clients.
Turning off this wait-for-safe logic from the kernel gets us back the
performance of USB and UFS devices, until we revisit this when we start
seeing performance issues on display/camera on upstream-supported SDM845
platforms.

Now, different bootloaders with their access control policies allow this
register access differently through secure monitor calls -
1) With one we can issue an io-read/write secure monitor call (qcom-scm)
   to update the register, while,
2) With the other, such as the one on MTP sdm845, we should use the
   specific qcom-scm command to send a request to do the complete register
   configuration.
Add a separate device tree flag for arm-smmu to identify which of the two
firmware configurations mentioned above is in use.
No code change is added to allow type-(1) bootloaders to toggle SAFE using
the io-read/write qcom-scm call; a hypothetical sketch of that path is
shown below for reference.
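
For reference only, a hypothetical sketch of what that type-(1) path could
look like using the existing io-access SCM helpers. The chicken-bit register
address and bit position below are placeholders, not real values:

#include <linux/bits.h>
#include <linux/qcom_scm.h>

/* Placeholders for illustration only - not the real register or bit. */
#define EXAMPLE_SAFE_CHICKEN_REG	0x0
#define EXAMPLE_SAFE_EN_BIT		BIT(0)

static int example_toggle_safe_direct(bool enable)
{
	unsigned int val;
	int ret;

	/* Read-modify-write the (placeholder) chicken-bit register. */
	ret = qcom_scm_io_readl(EXAMPLE_SAFE_CHICKEN_REG, &val);
	if (ret)
		return ret;

	if (enable)
		val |= EXAMPLE_SAFE_EN_BIT;
	else
		val &= ~EXAMPLE_SAFE_EN_BIT;

	return qcom_scm_io_writel(EXAMPLE_SAFE_CHICKEN_REG, val);
}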

This change is inspired by the downstream change from Patrick Daly
to address performance issues with display and camera by handling
this wait-for-safe within separate io-pagetable ops to do TLB
maintenance. So a big thanks to him for the change.

Without this change the UFS reads are pretty slow:
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
10+0 records in
10+0 records out
10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
real0m 22.39s
user0m 0.00s
sys 0m 0.01s

With this change they are back to rock!
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
300+0 records in
300+0 records out
314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
real0m 1.03s
user0m 0.00s
sys 0m 0.54s

Signed-off-by: Vivek Gautam 
---
 drivers/iommu/arm-smmu.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0ad086da399c..3c3ad43eda97 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -177,6 +178,7 @@ struct arm_smmu_device {
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
u32 options;
enum arm_smmu_arch_version  version;
enum arm_smmu_implementationmodel;
@@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+   { ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
{ 0, NULL},
 };
 
@@ -2292,6 +2295,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
arm_smmu_device_reset(smmu);
arm_smmu_test_smr_masks(smmu);
 
+   /*
+* To address performance degradation in non-real time clients,
+* such as USB and UFS, turn off wait-for-safe on sdm845 platforms,
+* such as MTP, whose firmwares implement corresponding secure monitor
+* call handlers.
+*/
+   if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500") &&
+   smmu->options & ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA) {
+   err = qcom_scm_qsmmu500_wait_safe_toggle(0);
+   if (err)
+   dev_warn(dev, "Failed to turn off SAFE logic\n");
+   }
+
/*
 * We want to avoid touching dev->power.lock in fastpaths unless
 * it's really going to do something useful - pm_runtime_enabled()
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v3 1/4] firmware: qcom_scm-64: Add atomic version of qcom_scm_call

2019-06-12 Thread Vivek Gautam
There are scenarios where drivers are required to make an
SCM call in atomic context, such as in one of Qcom's
arm-smmu-500 errata [1].

[1] ("https://source.codeaurora.org/quic/la/kernel/msm-4.9/commit/
  drivers/iommu/arm-smmu.c?h=CogSystems-msm-49/
  msm-4.9=da765c6c75266b38191b38ef086274943f353ea7")

Signed-off-by: Vivek Gautam 
Reviewed-by: Bjorn Andersson 
---
 drivers/firmware/qcom_scm-64.c | 136 -
 1 file changed, 92 insertions(+), 44 deletions(-)

diff --git a/drivers/firmware/qcom_scm-64.c b/drivers/firmware/qcom_scm-64.c
index 91d5ad7cf58b..b6dca32c5ac4 100644
--- a/drivers/firmware/qcom_scm-64.c
+++ b/drivers/firmware/qcom_scm-64.c
@@ -62,32 +62,71 @@ static DEFINE_MUTEX(qcom_scm_lock);
 #define FIRST_EXT_ARG_IDX 3
 #define N_REGISTER_ARGS (MAX_QCOM_SCM_ARGS - N_EXT_QCOM_SCM_ARGS + 1)
 
-/**
- * qcom_scm_call() - Invoke a syscall in the secure world
- * @dev:   device
- * @svc_id:service identifier
- * @cmd_id:command identifier
- * @desc:  Descriptor structure containing arguments and return values
- *
- * Sends a command to the SCM and waits for the command to finish processing.
- * This should *only* be called in pre-emptible context.
-*/
-static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
-const struct qcom_scm_desc *desc,
-struct arm_smccc_res *res)
+static void __qcom_scm_call_do(const struct qcom_scm_desc *desc,
+  struct arm_smccc_res *res, u32 fn_id,
+  u64 x5, u32 type)
+{
+   u64 cmd;
+   struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};
+
+   cmd = ARM_SMCCC_CALL_VAL(type, qcom_smccc_convention,
+ARM_SMCCC_OWNER_SIP, fn_id);
+
+   quirk.state.a6 = 0;
+
+   do {
+   arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
+   desc->args[1], desc->args[2], x5,
+   quirk.state.a6, 0, res, );
+
+   if (res->a0 == QCOM_SCM_INTERRUPTED)
+   cmd = res->a0;
+
+   } while (res->a0 == QCOM_SCM_INTERRUPTED);
+}
+
+static void qcom_scm_call_do(const struct qcom_scm_desc *desc,
+struct arm_smccc_res *res, u32 fn_id,
+u64 x5, bool atomic)
+{
+   int retry_count = 0;
+
+   if (!atomic) {
+   do {
+   mutex_lock(_scm_lock);
+
+   __qcom_scm_call_do(desc, res, fn_id, x5,
+  ARM_SMCCC_STD_CALL);
+
+   mutex_unlock(_scm_lock);
+
+   if (res->a0 == QCOM_SCM_V2_EBUSY) {
+   if (retry_count++ > QCOM_SCM_EBUSY_MAX_RETRY)
+   break;
+   msleep(QCOM_SCM_EBUSY_WAIT_MS);
+   }
+   }  while (res->a0 == QCOM_SCM_V2_EBUSY);
+   } else {
+   __qcom_scm_call_do(desc, res, fn_id, x5, ARM_SMCCC_FAST_CALL);
+   }
+}
+
+static int ___qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
+   const struct qcom_scm_desc *desc,
+   struct arm_smccc_res *res, bool atomic)
 {
int arglen = desc->arginfo & 0xf;
-   int retry_count = 0, i;
+   int i;
u32 fn_id = QCOM_SCM_FNID(svc_id, cmd_id);
-   u64 cmd, x5 = desc->args[FIRST_EXT_ARG_IDX];
+   u64 x5 = desc->args[FIRST_EXT_ARG_IDX];
dma_addr_t args_phys = 0;
void *args_virt = NULL;
size_t alloc_len;
-   struct arm_smccc_quirk quirk = {.id = ARM_SMCCC_QUIRK_QCOM_A6};
+   gfp_t flag = atomic ? GFP_ATOMIC : GFP_KERNEL;
 
if (unlikely(arglen > N_REGISTER_ARGS)) {
alloc_len = N_EXT_QCOM_SCM_ARGS * sizeof(u64);
-   args_virt = kzalloc(PAGE_ALIGN(alloc_len), GFP_KERNEL);
+   args_virt = kzalloc(PAGE_ALIGN(alloc_len), flag);
 
if (!args_virt)
return -ENOMEM;
@@ -117,33 +156,7 @@ static int qcom_scm_call(struct device *dev, u32 svc_id, u32 cmd_id,
x5 = args_phys;
}
 
-   do {
-   mutex_lock(_scm_lock);
-
-   cmd = ARM_SMCCC_CALL_VAL(ARM_SMCCC_STD_CALL,
-qcom_smccc_convention,
-ARM_SMCCC_OWNER_SIP, fn_id);
-
-   quirk.state.a6 = 0;
-
-   do {
-   arm_smccc_smc_quirk(cmd, desc->arginfo, desc->args[0],
- desc->args[1], desc->args[2], x5,
- quirk.state.a6, 0, res, );
-
-   if (res->a0 == QCOM_SCM_INTERRUPTED)
-   cmd = res->a0;
-
-   } while (res->a0 == QCOM_SCM_INTERRUPTED);
-
-  

[PATCH v3 4/4] arm64: dts/sdm845: Enable FW implemented safe sequence handler on MTP

2019-06-12 Thread Vivek Gautam
Indicate on MTP SDM845 that the firmware implements a handler for
the TLB-invalidate erratum SCM call, where the SAFE sequence is
toggled to achieve optimum performance on real-time clients, such as
display and camera.

Signed-off-by: Vivek Gautam 
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 78ec373a2b18..6a73d9744a71 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -2368,6 +2368,7 @@
compatible = "qcom,sdm845-smmu-500", "arm,mmu-500";
reg = <0 0x1500 0 0x8>;
#iommu-cells = <2>;
+   qcom,smmu-500-fw-impl-safe-errata;
#global-interrupts = <1>;
interrupts = ,
 ,
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code

2019-06-12 Thread Leizhen (ThunderTown)



On 2019/6/12 13:16, Borislav Petkov wrote:
> On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
>> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
>> improve readability.
> 
> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.

OK, thanks.

> 
> Also, do
> 
> $ git grep 'This patch' Documentation/process
> 
> for more details.
> 
>> Signed-off-by: Zhen Lei 
>> ---
>>  arch/x86/kernel/pci-dma.c | 7 ++-
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
>> index dcd272dbd0a9330..9f2b19c35a060df 100644
>> --- a/arch/x86/kernel/pci-dma.c
>> +++ b/arch/x86/kernel/pci-dma.c
>> @@ -43,11 +43,8 @@
>>   * It is also possible to disable by default in kernel config, and enable with
>>   * iommu=nopt at boot time.
>>   */
>> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
>> -int iommu_pass_through __read_mostly = 1;
>> -#else
>> -int iommu_pass_through __read_mostly;
>> -#endif
>> +int iommu_pass_through __read_mostly =
>> +IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);
> 
> Let that line stick out.

OK, I will merge them on the same line.
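
For reference, the merged single-line form would presumably look like
this (intentionally letting it exceed 80 columns, as suggested):

int iommu_pass_through __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);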

> 
> Thx.
> 



Re: [PATCH v4 0/9] iommu: Bounce page for untrusted devices

2019-06-12 Thread Mika Westerberg
On Wed, Jun 12, 2019 at 11:00:06AM +0800, Lu Baolu wrote:
> > What kind of devices did you test it with?
> 
> Most test work was done by Xu Pengfei (cc'ed). He has run the code
> on real platforms with various thunderbolt peripherals (usb, disk,
> network, etc.).

In addition to that, we are also working to build a real Thunderclap
platform to verify that a device can only access the bounce buffer when
the DMA transfer targets memory that does not fill the whole page.