Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-19 Thread Raj, Ashok
Hi Jean

On Mon, Oct 19, 2020 at 04:08:24PM +0200, Jean-Philippe Brucker wrote:
> On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):
> > > 
> > >   To stop [using a PASID] without using a Stop Marker Message, the
> > >   function shall:
> > >   1. Stop queueing new Page Request Messages for this PASID.
> > 
> > The device driver would need to tell the device to stop sending any new PRs.
> > 
> > >   2. Finish transmitting any multi-page Page Request Messages for this
> > >  PASID (i.e. send the Page Request Message with the L bit Set).
> > >   3. Wait for PRG Response Messages associated with any outstanding Page
> > >  Request Messages for the PASID.
> > > 
> > > So they have to flush their PR themselves. And since the device driver
> > > completes this sequence before calling unbind(), then there shouldn't be
> > > any outstanding PR for the PASID, and unbind() doesn't need to flush,
> > > right?
> > 
> > I can see how the device can complete #2 and #3 above. But the device driver
> > isn't the one managing page responses, right? So in order for the device to
> > know the above sequence is complete, it would need some assistance from the
> > IOMMU driver?
> 
> No, the device driver just waits for the device to indicate that it has
> completed the sequence. That's what the magic stop-PASID mechanism
> described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> says:

The goal is that we do this when the device is in a messed-up state. So we
can't take for granted that the device is operating properly, which is why we
are going to whack the device with an FLR.

The only thing that's supposed to work without a brain freeze is the
invalidation logic. The spec requires devices to respond to invalidations even
while an FLR is in progress.

So when the IOMMU does an invalidation wait with a Page-Drain, the IOMMU waits
until the response for that arrives before completing the descriptor. Due to
the posted semantics, this ensures any PRs issued and still in the fabric are
flushed out to memory.

I suppose you can wait for the device to vouch for all responses, but that
assumes the device is still functioning properly. Given that we use it in two
places:

* Reclaiming a PASID - only during a tear down sequence, skipping it
  doesn't really buy us much.
* During FLR this can't be skipped anyway due to the above sequence
  requirement. 

> 
> "A Function must have a mechanism to request that it gracefully stop using
>  a specific PASID. This mechanism is device specific but must satisfy the
>  following rules:
>  [...]
>  * When the stop request mechanism indicates completion, the Function has:
>[...]
>* Complied with additional rules described in Address Translation
>  Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
>  or Page Requests were issued on the behalf of this PASID."
> 
> So after the device driver initiates this mechanism in the device, the
> device must be able to indicate completion of the mechanism, which
> includes completing all in-flight Page Requests. At that point the device
> driver can call unbind() knowing there is no pending PR for this PASID.
> 
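The handshake Jean-Philippe describes above can be modelled in a few lines of C. This is purely an illustrative sketch under stated assumptions: all names here are hypothetical (the actual stop mechanism is device specific), and the point is only the ordering, namely that the driver never calls unbind() until the device itself reports its PR pipeline is empty.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative model of the PCIe "gracefully stop using a PASID" handshake
 * (6.20.1 / 10.4.1.2). All names are hypothetical; the real stop mechanism
 * is device specific.
 */
struct fake_dev {
	int queued_prs;      /* PRs queued but not yet transmitted */
	int outstanding_prs; /* PRs transmitted, awaiting a PRG Response */
	bool stopped;        /* device-reported completion of the stop */
};

/* Device side: steps 1-3 of 10.4.1.2, then indicate completion. */
static void fake_dev_handle_stop(struct fake_dev *d)
{
	d->queued_prs = 0;      /* 1. stop queueing new PRs for this PASID */
	d->outstanding_prs = 0; /* 2-3. finish multi-page PRs, wait for PRG responses */
	d->stopped = true;
}

/*
 * Driver side: kick the device-specific stop mechanism and wait for the
 * device to report completion; only then is unbind() safe, with no PR
 * pending for the PASID.
 */
static bool driver_stop_pasid(struct fake_dev *d)
{
	fake_dev_handle_stop(d); /* in reality: poll or wait for an IRQ */
	return d->stopped && d->outstanding_prs == 0 && d->queued_prs == 0;
}
```

A driver would call driver_stop_pasid() and, on success, proceed to unbind() without any PRI queue flush of its own, which is exactly the argument being made in the thread.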

Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-19 Thread Robin Murphy

On 2020-10-19 19:23, Bjorn Andersson wrote:

The firmware found in some Qualcomm platforms intercepts writes to S2CR
in order to replace bypass-type streams with fault, and to ignore S2CR
updates of type fault.

Detect this behavior and implement a custom write_s2cr function in order
to trick the firmware into supporting bypass streams by means of
configuring the stream for translation using a reserved and disabled
context bank.

Also circumvent the problem of configuring faulting streams by
configuring the stream as bypass.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Made the bypass_cbndx an integer...
- Separated out the "quirk enabled or not" into a bool, rather than reusing
   (the valid) context bank 0 to represent this.
- Dropped the unused EXIDS handling.

  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 67 ++
  1 file changed, 67 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 48627fcf6bed..66ba4870659f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,8 +10,15 @@
  
  struct qcom_smmu {

struct arm_smmu_device smmu;
+   bool bypass_quirk;
+   u8 bypass_cbndx;
  };
  
+static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct qcom_smmu, smmu);
+}
+
  static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -25,9 +32,33 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
  
  static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
  {
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 reg;
u32 smr;
int i;
  
+	/*

+* With some firmware versions writes to S2CR of type FAULT are
+* ignored, and writing BYPASS will end up written as FAULT in the
+* register. Perform a write to S2CR to detect if this is the case and
+* if so reserve a context bank to emulate bypass streams.
+*/
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
+   arm_smmu_gr0_write(smmu, last_s2cr, reg);
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
+   qsmmu->bypass_quirk = true;
+   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;


FWIW you could arguably just calculate that at point of use. Or store 
the index as an int and use a negative value to indicate when it's 
irrelevant to save the separate flag. But there's also nothing *wrong* 
with having it all spelled out, so regardless,
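Robin's second alternative, storing the index as an int with a negative value meaning "quirk not present", could look like the stripped-down sketch below. This is not the actual driver code: the struct, function, and enum values are hypothetical stand-ins used only to show how the sentinel folds the separate flag into the index.

```c
#include <assert.h>

/* Hypothetical stand-ins for the S2CR stream type values. */
enum { S2CR_TYPE_TRANS, S2CR_TYPE_BYPASS, S2CR_TYPE_FAULT };

struct fake_qcom_smmu {
	int bypass_cbndx; /* -1: firmware quirk absent; >= 0: reserved bank */
};

/*
 * Decide what to actually program for a requested stream type: with the
 * quirk present, a BYPASS request is redirected to translation through
 * the reserved context bank instead.
 */
static int effective_type(const struct fake_qcom_smmu *q, int requested)
{
	if (q->bypass_cbndx >= 0 && requested == S2CR_TYPE_BYPASS)
		return S2CR_TYPE_TRANS;
	return requested;
}
```

The sentinel saves a field at the cost of a slightly less obvious invariant, which is presumably why spelling it out with a separate bool was kept in the end.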


Acked-by: Robin Murphy 

Cheers,
Robin.


+
+   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
+
+   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
+   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+   }
+
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
  
@@ -45,6 +76,41 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

return 0;
  }
  
+static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
+{
+   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 cbndx = s2cr->cbndx;
+   u32 type = s2cr->type;
+   u32 reg;
+
+   if (qsmmu->bypass_quirk) {
+   if (type == S2CR_TYPE_BYPASS) {
+   /*
+* Firmware with quirky S2CR handling will substitute
+* BYPASS writes with FAULT, so point the stream to the
+* reserved context bank and ask for translation on the
+* stream
+*/
+   type = S2CR_TYPE_TRANS;
+   cbndx = qsmmu->bypass_cbndx;
+   } else if (type == S2CR_TYPE_FAULT) {
+   /*
+* Firmware with quirky S2CR handling will ignore FAULT
+* writes, so trick it to write FAULT by asking for a
+* BYPASS.
+*/
+   type = S2CR_TYPE_BYPASS;
+   cbndx = 0xff;
+   }
+   }
+
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+}
+
  static int 

Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-19 Thread Ross Philipson
On 10/19/20 1:06 PM, Arvind Sankar wrote:
> On Mon, Oct 19, 2020 at 10:38:08AM -0400, Ross Philipson wrote:
>> On 10/16/20 4:51 PM, Arvind Sankar wrote:
>>> On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:

 I am discussing with Ross the other option. We can create
 .rodata.mle_header section and put it at fixed offset as
 kernel_info is. So, we would have, e.g.:

 arch/x86/boot/compressed/vmlinux.lds.S:
 .rodata.kernel_info KERNEL_INFO_OFFSET : {
 *(.rodata.kernel_info)
 }
 ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")

 .rodata.mle_header MLE_HEADER_OFFSET : {
 *(.rodata.mle_header)
 }
 ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")

 arch/x86/boot/compressed/sl_stub.S:
 #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)

 .section ".rodata.mle_header", "a"

 SYM_DATA_START(mle_header)
 .long   0x9082ac5a/* UUID0 */
 .long   0x74a7476f/* UUID1 */
 .long   0xa2555c0f/* UUID2 */
 .long   0x42b651cb/* UUID3 */
 .long   0x0034/* MLE header size */
 .long   0x00020002/* MLE version 2.2 */
 .long   mleh_rva(sl_stub_entry)/* Linear entry point of MLE (virt. address) */
 .long   0x/* First valid page of MLE */
 .long   0x/* Offset within binary of first byte of MLE */
 .long   0x/* Offset within binary of last byte + 1 of MLE */
 .long   0x0223/* Bit vector of MLE-supported capabilities */
 .long   0x/* Starting linear address of command line (unused) */
 .long   0x/* Ending linear address of command line (unused) */
 SYM_DATA_END(mle_header)

 Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
 Anyway, is it acceptable?

 There is also another problem. We have to put into mle_header size of
 the Linux kernel image. Currently it is done by the bootloader but
 I think it is not a role of the bootloader. The kernel image should
 provide all data describing its properties and do not rely on the
 bootloader to do that. Ross and I investigated various options but we
 did not find a good/simple way to do that. Could you suggest how we
 should do that or at least where we should take a look to get some
 ideas?

 Daniel
>>>
>>> What exactly is the size you need here? Is it just the size of the
>>> protected mode image, that's startup_32 to _edata. Or is it the size of
>>
>> It is the size of the protected mode image. Though how to reference
>> those symbols to get the size might add more relocation issues.
>>
> 
> Ok, then I think mleh_rva(_edata) should get you that -- I assume you
> don't want to include the uninitialized data in the size? The kernel
> will access memory beyond _edata (upto the init_size in the setup
> header), but that's not part of the image itself.

Yeah, we basically want the size of the image. There is nothing to measure
beyond the image as loaded into memory by the bootloader. rva(_edata)
seems to be the ticket.
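Since the compressed image is linked from address 0, rva(_edata) is itself the size of the protected-mode image (startup_32 through _edata), so the header field could be filled in at link time rather than by the bootloader. A sketch of what that field might look like, assuming the rva()/mleh_rva() helper defined earlier in the thread (the exact field position would follow the MLE header layout quoted above):

```asm
/* Sketch only: let the linker fill in the image size instead of the
 * bootloader. With the compressed image linked from 0, rva(_edata) is
 * the protected-mode image size, excluding uninitialized data. */
	.long	rva(_edata)	/* Size of the MLE (startup_32 .. _edata) */
```

This keeps the kernel image self-describing, which is the property Daniel was asking for.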

Thanks
Ross

> 



Re: [PATCH v5 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-19 Thread Robin Murphy

On 2020-10-19 19:23, Bjorn Andersson wrote:

The Qualcomm boot loader configures stream mapping for the peripherals
that it accesses and in particular it sets up the stream mapping for the
display controller to be allowed to scan out a splash screen or EFI
framebuffer.

Read back the stream mappings during initialization and make the
arm-smmu driver maintain the streams in bypass mode.


Acked-by: Robin Murphy 


Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Don't increment s2cr[i]->count, as this is not actually needed to survive
   probe deferral

  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 23 ++
  1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..48627fcf6bed 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -23,6 +23,28 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ }
  };
  
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
+{
+   u32 smr;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   }
+   }
+
+   return 0;
+}
+
  static int qcom_smmu_def_domain_type(struct device *dev)
  {
const struct of_device_id *match =
@@ -61,6 +83,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
  }
  
  static const struct arm_smmu_impl qcom_smmu_impl = {
+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
  };




Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-19 Thread Jacob Pan
Hi Jean-Philippe,

On Mon, 19 Oct 2020 16:08:24 +0200, Jean-Philippe Brucker
 wrote:

> On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > > For devices that *don't* use a stop marker, the PCIe spec says
> > > (10.4.1.2):
> > > 
> > >   To stop [using a PASID] without using a Stop Marker Message, the
> > >   function shall:
> > >   1. Stop queueing new Page Request Messages for this PASID.  
> > 
> > The device driver would need to tell the device to stop sending any new PRs.
> >   
> > >   2. Finish transmitting any multi-page Page Request Messages for this
> > >  PASID (i.e. send the Page Request Message with the L bit Set).
> > >   3. Wait for PRG Response Messages associated with any outstanding Page
> > >  Request Messages for the PASID.
> > > 
> > > So they have to flush their PR themselves. And since the device driver
> > > completes this sequence before calling unbind(), then there shouldn't
> > > be any outstanding PR for the PASID, and unbind() doesn't need to
> > > flush, right?  
> > 
> > I can see how the device can complete #2 and #3 above. But the device driver
> > isn't the one managing page responses, right? So in order for the device
> > to know the above sequence is complete, it would need some
> > assistance from the IOMMU driver?  
> 
> No, the device driver just waits for the device to indicate that it has
> completed the sequence. That's what the magic stop-PASID mechanism
> described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
> says:
> 
> "A Function must have a mechanism to request that it gracefully stop using
>  a specific PASID. This mechanism is device specific but must satisfy the
>  following rules:
>  [...]
>  * When the stop request mechanism indicates completion, the Function has:
>[...]
>* Complied with additional rules described in Address Translation
>  Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
>  or Page Requests were issued on the behalf of this PASID."
> 
> So after the device driver initiates this mechanism in the device, the
> device must be able to indicate completion of the mechanism, which
> includes completing all in-flight Page Requests. At that point the device
> driver can call unbind() knowing there is no pending PR for this PASID.
> 
In step #3, I think it is possible that the device received a page response
as part of the IOMMU's auto page response, so it may not be guaranteed that
all the in-flight PRQs have completed inside the IOMMU. Therefore, a drain is
_always_ needed to be sure?

> Thanks,
> Jean
> 
> > 
> > How does the driver know that everything the host received has been
> > responded to back at the device?
> >   
> > >   
> > > > I'm not sure how other IOMMUs behave. When there is no
> > > > space in the PRQ, the IOMMU auto-responds to the device. This puts the
> > > > device in a while (1) loop. The fake successful response will let
> > > > the device do an ATS lookup, and that would fail, forcing the device
> > > > to do another PRQ.  
> > > 
> > > But in the sequence above, step 1 should ensure that the device will
> > > not send another PR for any successful response coming back at step
> > > 3.  
> > 
> > True, but there could be some page-requests in flight on their way to the
> > IOMMU. By draining and getting that round trip back to the IOMMU we
> > guarantee things in flight are flushed to the PRQ after that Drain
> > completes.  
> > > 
> > > So I agree with the below if we suspect there could be pending PR, but
> > > given that pending PR are a stop marker thing and we don't know any
> > > device using stop markers, I wondered why I bothered implementing
> > > PRIq flush at all for SMMUv3, hence this RFC.
> > >   
> > 
> > Cheers,
> > Ashok  


Thanks,

Jacob


[PATCH v5 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-19 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to S2CR
in order to replace bypass-type streams with fault, and to ignore S2CR
updates of type fault.

Detect this behavior and implement a custom write_s2cr function in order
to trick the firmware into supporting bypass streams by means of
configuring the stream for translation using a reserved and disabled
context bank.

Also circumvent the problem of configuring faulting streams by
configuring the stream as bypass.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Made the bypass_cbndx an integer...
- Separated out the "quirk enabled or not" into a bool, rather than reusing
  (the valid) context bank 0 to represent this.
- Dropped the unused EXIDS handling.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 67 ++
 1 file changed, 67 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 48627fcf6bed..66ba4870659f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,8 +10,15 @@
 
 struct qcom_smmu {
struct arm_smmu_device smmu;
+   bool bypass_quirk;
+   u8 bypass_cbndx;
 };
 
+static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
+{
+   return container_of(smmu, struct qcom_smmu, smmu);
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -25,9 +32,33 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 
 static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
 {
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 reg;
u32 smr;
int i;
 
+   /*
+* With some firmware versions writes to S2CR of type FAULT are
+* ignored, and writing BYPASS will end up written as FAULT in the
+* register. Perform a write to S2CR to detect if this is the case and
+* if so reserve a context bank to emulate bypass streams.
+*/
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
+   arm_smmu_gr0_write(smmu, last_s2cr, reg);
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
+   qsmmu->bypass_quirk = true;
+   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;
+
+   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
+
+   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
+   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+   }
+
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
 
@@ -45,6 +76,41 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
return 0;
 }
 
+static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
+{
+   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 cbndx = s2cr->cbndx;
+   u32 type = s2cr->type;
+   u32 reg;
+
+   if (qsmmu->bypass_quirk) {
+   if (type == S2CR_TYPE_BYPASS) {
+   /*
+* Firmware with quirky S2CR handling will substitute
+* BYPASS writes with FAULT, so point the stream to the
+* reserved context bank and ask for translation on the
+* stream
+*/
+   type = S2CR_TYPE_TRANS;
+   cbndx = qsmmu->bypass_cbndx;
+   } else if (type == S2CR_TYPE_FAULT) {
+   /*
+* Firmware with quirky S2CR handling will ignore FAULT
+* writes, so trick it to write FAULT by asking for a
+* BYPASS.
+*/
+   type = S2CR_TYPE_BYPASS;
+   cbndx = 0xff;
+   }
+   }
+
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -86,6 +152,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
.cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
+   .write_s2cr = qcom_smmu_write_s2cr,
 };
 
 struct arm_smmu_device *qcom_smmu_impl_init(struct 

[PATCH v5 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-19 Thread Bjorn Andersson
The Qualcomm boot loader configures stream mapping for the peripherals
that it accesses and in particular it sets up the stream mapping for the
display controller to be allowed to scan out a splash screen or EFI
framebuffer.

Read back the stream mappings during initialization and make the
arm-smmu driver maintain the streams in bypass mode.

Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Don't increment s2cr[i]->count, as this is not actually needed to survive
  probe deferral

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 23 ++
 1 file changed, 23 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..48627fcf6bed 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -23,6 +23,28 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ }
 };
 
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
+{
+   u32 smr;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   }
+   }
+
+   return 0;
+}
+
 static int qcom_smmu_def_domain_type(struct device *dev)
 {
const struct of_device_id *match =
@@ -61,6 +83,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
 }
 
 static const struct arm_smmu_impl qcom_smmu_impl = {
+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
 };
-- 
2.28.0



[PATCH v5 1/3] iommu/arm-smmu: Allow implementation specific write_s2cr

2020-10-19 Thread Bjorn Andersson
The firmware found in some Qualcomm platforms intercepts writes to the
S2CR register in order to replace the BYPASS type with FAULT. Furthermore,
it treats faults at this level as catastrophic and restarts the
device.

Add support for providing implementation specific versions of the S2CR
write function, to allow the Qualcomm driver to work around this
behavior.

Reviewed-by: Robin Murphy 
Signed-off-by: Bjorn Andersson 
---

Changes since v4:
- Return early instead of indenting the rest of the function

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 ++---
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..bcbacf22331d 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -929,9 +929,16 @@ static void arm_smmu_write_smr(struct arm_smmu_device *smmu, int idx)
 static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
 {
struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
-   u32 reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
- FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
- FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+   u32 reg;
+
+   if (smmu->impl && smmu->impl->write_s2cr) {
+   smmu->impl->write_s2cr(smmu, idx);
+   return;
+   }
+
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
 
if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
smmu->smrs[idx].valid)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 1a746476927c..b71647eaa319 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -436,6 +436,7 @@ struct arm_smmu_impl {
int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
  struct arm_smmu_device *smmu,
  struct device *dev, int start);
+   void (*write_s2cr)(struct arm_smmu_device *smmu, int idx);
 };
 
 #define INVALID_SMENDX -1
-- 
2.28.0



[PATCH v5 0/3] iommu/arm-smmu-qcom: Support maintaining bootloader mappings

2020-10-19 Thread Bjorn Andersson
This is the revised fourth attempt at inheriting the stream mapping for
the framebuffer on many Qualcomm platforms, in order to not hit
catastrophic faults during arm-smmu initialization.

Based on Robin's suggestion, the new version takes a much more direct
approach: it allocates a context bank for bypass emulation and keeps the
use of this context bank pretty much isolated to the Qualcomm-specific
implementation.

The patchset has been tested to boot DB845c and RB5 (with splash screen)
and Lenovo Yoga C630 (with EFI framebuffer).

Bjorn Andersson (3):
  iommu/arm-smmu: Allow implementation specific write_s2cr
  iommu/arm-smmu-qcom: Read back stream mappings
  iommu/arm-smmu-qcom: Implement S2CR quirk

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 90 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c  | 13 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 +
 3 files changed, 101 insertions(+), 3 deletions(-)

-- 
2.28.0



Re: [PATCH v4 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-19 Thread Bjorn Andersson
On Mon 19 Oct 09:04 CDT 2020, Robin Murphy wrote:

> On 2020-10-17 05:39, Bjorn Andersson wrote:
> > The firmware found in some Qualcomm platforms intercepts writes to S2CR
> > in order to replace bypass-type streams with fault, and to ignore S2CR
> > updates of type fault.
> > 
> > Detect this behavior and implement a custom write_s2cr function in order
> > to trick the firmware into supporting bypass streams by means of
> > configuring the stream for translation using a reserved and disabled
> > context bank.
> > 
> > Also circumvent the problem of configuring faulting streams by
> > configuring the stream as bypass.
> > 
> > Signed-off-by: Bjorn Andersson 
> > ---
> > 
> > Changes since v3:
> > - Move the reservation of the "identity context bank" to the Qualcomm specific
> >   implementation.
> > - Implement the S2CR quirk with the newly introduced write_s2cr callback.
> > 
> >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 68 ++
> >   1 file changed, 68 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index 0089048342dd..c0f42d6a6e01 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -10,8 +10,14 @@
> >   struct qcom_smmu {
> > struct arm_smmu_device smmu;
> > +   bool bypass_cbndx;
> 
> Nit: variables named "*ndx" usually hold an actual index value. If it's just
> a flag then maybe name it something like "use_bypass_context"?
> 
> >   };
> > +static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)
> > +{
> > +   return container_of(smmu, struct qcom_smmu, smmu);
> > +}
> > +
> >   static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> > { .compatible = "qcom,adreno" },
> > { .compatible = "qcom,mdp4" },
> > @@ -25,9 +31,32 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
> >   static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
> >   {
> > +   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
> > +   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> > +   u32 reg;
> > u32 smr;
> > int i;
> > +   /*
> > +* With some firmware versions writes to S2CR of type FAULT are
> > +* ignored, and writing BYPASS will end up written as FAULT in the
> > +* register. Perform a write to S2CR to detect if this is the case and
> > +* if so reserve a context bank to emulate bypass streams.
> > +*/
> > +   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
> > + FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
> > + FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
> > +   arm_smmu_gr0_write(smmu, last_s2cr, reg);
> > +   reg = arm_smmu_gr0_read(smmu, last_s2cr);
> > +   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
> > +   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;
> 
> Oh, so maybe the name is in fact OK but the type is wrong :/
> 
> I guess this does happen to work out, but for the wrong reason...
> 

Odd, but "it works on my machine"... Sorry about that.

> > +
> > +   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
> > +
> > +   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
> > +   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
> > +   }
> > +
> > for (i = 0; i < smmu->num_mapping_groups; i++) {
> > smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
> > @@ -46,6 +75,44 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
> > return 0;
> >   }
> > +static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
> > +{
> > +   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
> > +   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
> > +   u32 cbndx = s2cr->cbndx;
> > +   u32 type = s2cr->type;
> > +   u32 reg;
> > +
> > +   if (qsmmu->bypass_cbndx) {
> 
> Note that if we are talking indices here then 0 would be perfectly valid in
> general. This works out OK in practice given that we're always reserving the
> last implemented context above, and if we ever *did* only have one such that
> index 0 is the last then we're going to have a bad time either way, but it's
> not necessarily the most obvious.
> 

Right. In the event that we have an SMMU instance with a single context
bank hitting this quirk would probably be bad regardless, as the
cfg_probe would have just stolen the only available context bank for
bypass purposes :)

But I've updated this to keep track of the need for bypass separate from
the index. We don't have a lot of SMMU controllers, so it's not a big
waste.

> > +   if (type == S2CR_TYPE_BYPASS) {
> > +   /*
> > +* Firmware with quirky S2CR handling will substitute
> > +* BYPASS writes with FAULT, so point the stream to the
> > +  

Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-19 Thread Arvind Sankar
On Mon, Oct 19, 2020 at 04:51:53PM +0200, Daniel Kiper wrote:
> On Fri, Oct 16, 2020 at 04:51:51PM -0400, Arvind Sankar wrote:
> > On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:
> > >
> > > I am discussing with Ross the other option. We can create
> > > .rodata.mle_header section and put it at fixed offset as
> > > kernel_info is. So, we would have, e.g.:
> > >
> > > arch/x86/boot/compressed/vmlinux.lds.S:
> > > .rodata.kernel_info KERNEL_INFO_OFFSET : {
> > > *(.rodata.kernel_info)
> > > }
> > > ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")
> > >
> > > .rodata.mle_header MLE_HEADER_OFFSET : {
> > > *(.rodata.mle_header)
> > > }
> > > ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")
> > >
> > > arch/x86/boot/compressed/sl_stub.S:
> > > #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)
> > >
> > > .section ".rodata.mle_header", "a"
> > >
> > > SYM_DATA_START(mle_header)
> > > .long   0x9082ac5a/* UUID0 */
> > > .long   0x74a7476f/* UUID1 */
> > > .long   0xa2555c0f/* UUID2 */
> > > .long   0x42b651cb/* UUID3 */
> > > .long   0x0034/* MLE header size */
> > > .long   0x00020002/* MLE version 2.2 */
> > > .long   mleh_rva(sl_stub_entry)/* Linear entry point of MLE 
> > > (virt. address) */
> > > .long   0x/* First valid page of MLE */
> > > .long   0x/* Offset within binary of first byte of 
> > > MLE */
> > > .long   0x/* Offset within binary of last byte + 1 of 
> > > MLE */
> > > .long   0x0223/* Bit vector of MLE-supported capabilities 
> > > */
> > > .long   0x/* Starting linear address of command line 
> > > (unused) */
> > > .long   0x/* Ending linear address of command line 
> > > (unused) */
> > > SYM_DATA_END(mle_header)
> > >
> > > Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
> > > Anyway, is it acceptable?
> 
> What do you think about my MLE_HEADER_OFFSET and related stuff proposal?
> 

I'm wondering if it would be easier to just allow relocations in these
special "header" sections. I need to check how easy/hard it is to do
that without triggering linker warnings.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-19 Thread Arvind Sankar
On Mon, Oct 19, 2020 at 10:38:08AM -0400, Ross Philipson wrote:
> On 10/16/20 4:51 PM, Arvind Sankar wrote:
> > On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:
> >>
> >> I am discussing with Ross the other option. We can create
> >> .rodata.mle_header section and put it at fixed offset as
> >> kernel_info is. So, we would have, e.g.:
> >>
> >> arch/x86/boot/compressed/vmlinux.lds.S:
> >> .rodata.kernel_info KERNEL_INFO_OFFSET : {
> >> *(.rodata.kernel_info)
> >> }
> >> ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")
> >>
> >> .rodata.mle_header MLE_HEADER_OFFSET : {
> >> *(.rodata.mle_header)
> >> }
> >> ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")
> >>
> >> arch/x86/boot/compressed/sl_stub.S:
> >> #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)
> >>
> >> .section ".rodata.mle_header", "a"
> >>
> >> SYM_DATA_START(mle_header)
> >> .long   0x9082ac5a/* UUID0 */
> >> .long   0x74a7476f/* UUID1 */
> >> .long   0xa2555c0f/* UUID2 */
> >> .long   0x42b651cb/* UUID3 */
> >> .long   0x0034/* MLE header size */
> >> .long   0x00020002/* MLE version 2.2 */
> >> .long   mleh_rva(sl_stub_entry)/* Linear entry point of MLE 
> >> (virt. address) */
> >> .long   0x/* First valid page of MLE */
> >> .long   0x/* Offset within binary of first byte of MLE 
> >> */
> >> .long   0x/* Offset within binary of last byte + 1 of 
> >> MLE */
> >> .long   0x0223/* Bit vector of MLE-supported capabilities 
> >> */
> >> .long   0x/* Starting linear address of command line 
> >> (unused) */
> >> .long   0x/* Ending linear address of command line 
> >> (unused) */
> >> SYM_DATA_END(mle_header)
> >>
> >> Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
> >> Anyway, is it acceptable?
> >>
> >> There is also another problem. We have to put into mle_header size of
> >> the Linux kernel image. Currently it is done by the bootloader but
> >> I think it is not a role of the bootloader. The kernel image should
> >> provide all data describing its properties and do not rely on the
> >> bootloader to do that. Ross and I investigated various options but we
> >> did not find a good/simple way to do that. Could you suggest how we
> >> should do that or at least where we should take a look to get some
> >> ideas?
> >>
> >> Daniel
> > 
> > What exactly is the size you need here? Is it just the size of the
> > protected mode image, that's startup_32 to _edata. Or is it the size of
> 
> It is the size of the protected mode image, though how to reference
> those symbols to get the size might add more relocation issues.
> 

Ok, then I think mleh_rva(_edata) should get you that -- I assume you
don't want to include the uninitialized data in the size? The kernel
will access memory beyond _edata (upto the init_size in the setup
header), but that's not part of the image itself.


Re: [PATCH v3 00/14] IOASID extensions for guest SVA

2020-10-19 Thread Jacob Pan
Hi,

Any comments on this? I know we have some opens w.r.t. the PASID
management UAPI, but I think having these common kernel API features is
justified.

Thanks!

Jacob


On Mon, 28 Sep 2020 14:38:27 -0700, Jacob Pan 
wrote:

> IOASID was introduced in v5.5 as a generic kernel allocator service for
> both PCIe Process Address Space ID (PASID) and ARM SMMU's Sub Stream
> ID. In addition to basic ID allocation, ioasid_set was defined as a
> token that is shared by a group of IOASIDs. This set token can be used
> for permission checking, but it lacks some features needed to address
> the following needs of guest Shared Virtual Address (SVA):
> - Manage IOASIDs by group, group ownership, quota, etc.
> - State synchronization among IOASID users
> - Non-identity guest-host IOASID mapping
> - Lifecycle management across many users
> 
> This patchset introduces the following extensions as solutions to the
> problems above.
> - Redefine and extend IOASID set such that IOASIDs can be managed by
> groups.
> - Add notifications for IOASID state synchronization
> - Add reference counting for life cycle alignment among users
> - Support ioasid_set private IDs, which can be used as guest IOASIDs
> Please refer to Documentation/ioasid.rst in enclosed patch 1/9 for more
> details.
> 
> This patchset only included VT-d driver as users of some of the new APIs.
> VFIO and KVM patches are coming up to fully utilize the APIs introduced
> here.
> 
> You can find this series at:
> https://github.com/jacobpan/linux.git ioasid_v3
> (VFIO and KVM patches will be available at this branch when published.)
> 
> This work is a result of collaboration with many people:
> Liu, Yi L 
> Wu Hao 
> Ashok Raj 
> Kevin Tian 
> 
> Thanks,
> 
> Jacob
> 
> Changelog:
> 
> V3:
> - Use consistent ioasid_set_ prefix for ioasid_set level APIs
> - Make SPID and private detach/attach APIs symmetric
> - Use the same ioasid_put semantics as Jean-Philippe's IOASID reference
> patch
> - Take away the public ioasid_notify() function, notifications are now
> emitted by IOASID core as a result of certain IOASID APIs
> - Partition into finer incremental patches
> - Miscellaneous cleanup, locking, exception handling fixes based on v2
> reviews
> 
> V2:
> - Redesigned ioasid_set APIs, removed set ID
> - Added set private ID (SPID) for guest PASID usage.
> - Add per ioasid_set notification and priority support.
> - Back to use spinlocks and atomic notifications.
> - Added async work in VT-d driver to perform teardown outside atomic
> context
> 
> Jacob Pan (14):
>   docs: Document IO Address Space ID (IOASID) APIs
>   iommu/ioasid: Rename ioasid_set_data()
>   iommu/ioasid: Add a separate function for detach data
>   iommu/ioasid: Support setting system-wide capacity
>   iommu/ioasid: Redefine IOASID set and allocation APIs
>   iommu/ioasid: Introduce API to adjust the quota of an ioasid_set
>   iommu/ioasid: Add an iterator API for ioasid_set
>   iommu/ioasid: Add reference counting functions
>   iommu/ioasid: Introduce ioasid_set private ID
>   iommu/ioasid: Introduce notification APIs
>   iommu/ioasid: Support mm type ioasid_set notifications
>   iommu/vt-d: Remove mm reference for guest SVA
>   iommu/vt-d: Listen to IOASID notifications
>   iommu/vt-d: Store guest PASID during bind
> 
>  Documentation/driver-api/ioasid.rst | 648 ++
>  drivers/iommu/intel/iommu.c |  29 +-
>  drivers/iommu/intel/pasid.h |   1 +
>  drivers/iommu/intel/svm.c   | 132 +-
>  drivers/iommu/ioasid.c  | 890 ++--
>  include/linux/intel-iommu.h |   2 +
>  include/linux/ioasid.h  | 197 +++-
>  7 files changed, 1830 insertions(+), 69 deletions(-)
>  create mode 100644 Documentation/driver-api/ioasid.rst
> 


Thanks,

Jacob


[git pull] IOMMU Fixes for Linux since iommu-updates-v5.10

2020-10-19 Thread Joerg Roedel
Hi Linus,

The following changes since commit 7e3c3883c381aeda903778d7e99fc4cd523be610:

  Merge branches 'arm/allwinner', 'arm/mediatek', 'arm/renesas', 'arm/tegra', 
'arm/qcom', 'arm/smmu', 'ppc/pamu', 'x86/amd', 'x86/vt-d' and 'core' into next 
(2020-10-07 11:51:59 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git tags/iommu-fix-v5.10

for you to fetch changes up to 9def3b1a07c41e21c68a0eb353e3e569fdd1d2b1:

  iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built 
(2020-10-19 14:16:02 +0200)


IOMMU Fix for Linux v5.10:

- Fix a build regression with !CONFIG_IOMMU_API


Bartosz Golaszewski (1):
  iommu/vt-d: Don't dereference iommu_device if IOMMU_API is not built

 drivers/iommu/intel/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Please pull.

Thanks,

Joerg



Re: [PATCH v4 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-19 Thread Bjorn Andersson
On Mon 19 Oct 09:03 CDT 2020, Robin Murphy wrote:

> On 2020-10-17 05:39, Bjorn Andersson wrote:
> > The Qualcomm boot loader configures stream mapping for the peripherals
> > that it accesses and in particular it sets up the stream mapping for the
> > display controller to be allowed to scan out a splash screen or EFI
> > framebuffer.
> > 
> > Read back the stream mappings during initialization and make the
> > arm-smmu driver maintain the streams in bypass mode.
> > 
> > Signed-off-by: Bjorn Andersson 
> > ---
> > 
> > Changes since v3:
> > - Extracted from different patch in v3.
> > - Now configures the stream as BYPASS, rather than translate, which should 
> > work
> >for platforms with working S2CR handling as well.
> > 
> >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 24 ++
> >   1 file changed, 24 insertions(+)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > index be4318044f96..0089048342dd 100644
> > --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> > @@ -23,6 +23,29 @@ static const struct of_device_id 
> > qcom_smmu_client_of_match[] __maybe_unused = {
> > { }
> >   };
> > +static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)
> > +{
> > +   u32 smr;
> > +   int i;
> > +
> > +   for (i = 0; i < smmu->num_mapping_groups; i++) {
> > +   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
> > +
> > +   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
> > +   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
> > +   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
> > +   smmu->smrs[i].valid = true;
> > +
> > +   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
> > +   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
> > +   smmu->s2crs[i].cbndx = 0xff;
> > +   smmu->s2crs[i].count++;
> 
> FWIW I don't think you actually need to adjust the count here - the SMR
> being valid should already prevent the whole SME from being reassigned until
> the display probes, at which point it should "take over" the SMR based on
> matching values and claim the "initial" refcount. After that you're back
> into the standard flow. It might be a little unintuitive to have something
> in a valid but "unused" state, but arguably it's entirely accurate in terms
> of the software abstraction here.
> 
> Otherwise, you end up making boot-time SMRs - so potentially all SMRs after
> a kexec - effectively immutable, since even after Linux has taken control of
> the whole system such that they *could* be reassigned safely, there's still
> this undroppable refcount hanging around preventing it.
> 

I did increment the count here to make sure the stream mappings do
survive a probe deferral of the display controller (which is rather
common when you have some bridge chip hanging off it).

But after digging through the code further I've convinced myself that
the sme won't be freed while the device is pending probe deferral.

So I will drop this.

> That said, for a mobile SoC use-case if you have enough SMRs for all your
> stream IDs and don't have any kind of device hotplug, that restriction
> shouldn't make much difference in practice, so I'm not too concerned either
> way. Otherwise this is as nice and tidy as I'd hoped :)

I agree, I'm quite happy with where we are now!

Thanks,
Bjorn

> 
> Robin.
> 
> > +   }
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >   static int qcom_smmu_def_domain_type(struct device *dev)
> >   {
> > const struct of_device_id *match =
> > @@ -61,6 +84,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device 
> > *smmu)
> >   }
> >   static const struct arm_smmu_impl qcom_smmu_impl = {
> > +   .cfg_probe = qcom_smmu_cfg_probe,
> > .def_domain_type = qcom_smmu_def_domain_type,
> > .reset = qcom_smmu500_reset,
> >   };
> > 


Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-19 Thread Daniel Kiper
On Fri, Oct 16, 2020 at 04:51:51PM -0400, Arvind Sankar wrote:
> On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:
> >
> > I am discussing with Ross the other option. We can create
> > .rodata.mle_header section and put it at fixed offset as
> > kernel_info is. So, we would have, e.g.:
> >
> > arch/x86/boot/compressed/vmlinux.lds.S:
> > .rodata.kernel_info KERNEL_INFO_OFFSET : {
> > *(.rodata.kernel_info)
> > }
> > ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")
> >
> > .rodata.mle_header MLE_HEADER_OFFSET : {
> > *(.rodata.mle_header)
> > }
> > ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")
> >
> > arch/x86/boot/compressed/sl_stub.S:
> > #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)
> >
> > .section ".rodata.mle_header", "a"
> >
> > SYM_DATA_START(mle_header)
> > .long   0x9082ac5a/* UUID0 */
> > .long   0x74a7476f/* UUID1 */
> > .long   0xa2555c0f/* UUID2 */
> > .long   0x42b651cb/* UUID3 */
> > .long   0x0034/* MLE header size */
> > .long   0x00020002/* MLE version 2.2 */
> > .long   mleh_rva(sl_stub_entry)/* Linear entry point of MLE 
> > (virt. address) */
> > .long   0x/* First valid page of MLE */
> > .long   0x/* Offset within binary of first byte of MLE 
> > */
> > .long   0x/* Offset within binary of last byte + 1 of 
> > MLE */
> > .long   0x0223/* Bit vector of MLE-supported capabilities */
> > .long   0x/* Starting linear address of command line 
> > (unused) */
> > .long   0x/* Ending linear address of command line 
> > (unused) */
> > SYM_DATA_END(mle_header)
> >
> > Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
> > Anyway, is it acceptable?

What do you think about my MLE_HEADER_OFFSET and related stuff proposal?

> > There is also another problem. We have to put into mle_header size of
> > the Linux kernel image. Currently it is done by the bootloader but
> > I think it is not a role of the bootloader. The kernel image should
> > provide all data describing its properties and do not rely on the
> > bootloader to do that. Ross and I investigated various options but we
> > did not find a good/simple way to do that. Could you suggest how we
> > should do that or at least where we should take a look to get some
> > ideas?
> >
> > Daniel
>
> What exactly is the size you need here? Is it just the size of the
> protected mode image, that's startup_32 to _edata. Or is it the size of
> the whole bzImage file, or something else? I guess the same question
> applies to "first valid page of MLE" and "first byte of MLE", and the
> linear entry point -- are those all relative to startup_32 or do they
> need to be relative to the start of the bzImage, i.e. you have to add
> the size of the real-mode boot stub?
>
> If you need to include the size of the bzImage file, that's not known
> when the files in boot/compressed are built. It's only known after the
> real-mode stub is linked. arch/x86/boot/tools/build.c fills in various
> details in the setup header and creates the bzImage file, but it does
> not currently modify anything in the protected-mode portion of the
> compressed kernel (i.e. arch/x86/boot/compressed/vmlinux, which then
> gets converted to binary format as arch/x86/boot/vmlinux.bin), so it
> would need to be extended if you need to modify the MLE header to
> include the bzImage size or anything depending on the size of the
> real-mode stub.

Ross clarified this, so I do not have to add much here.

Daniel


Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-10-19 Thread Ross Philipson
On 10/16/20 4:51 PM, Arvind Sankar wrote:
> On Thu, Oct 15, 2020 at 08:26:54PM +0200, Daniel Kiper wrote:
>>
>> I am discussing with Ross the other option. We can create
>> .rodata.mle_header section and put it at fixed offset as
>> kernel_info is. So, we would have, e.g.:
>>
>> arch/x86/boot/compressed/vmlinux.lds.S:
>> .rodata.kernel_info KERNEL_INFO_OFFSET : {
>> *(.rodata.kernel_info)
>> }
>> ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad address!")
>>
>> .rodata.mle_header MLE_HEADER_OFFSET : {
>> *(.rodata.mle_header)
>> }
>> ASSERT(ABSOLUTE(mle_header) == MLE_HEADER_OFFSET, "mle_header at bad address!")
>>
>> arch/x86/boot/compressed/sl_stub.S:
>> #define mleh_rva(X) (((X) - mle_header) + MLE_HEADER_OFFSET)
>>
>> .section ".rodata.mle_header", "a"
>>
>> SYM_DATA_START(mle_header)
>> .long   0x9082ac5a/* UUID0 */
>> .long   0x74a7476f/* UUID1 */
>> .long   0xa2555c0f/* UUID2 */
>> .long   0x42b651cb/* UUID3 */
>> .long   0x0034/* MLE header size */
>> .long   0x00020002/* MLE version 2.2 */
>> .long   mleh_rva(sl_stub_entry)/* Linear entry point of MLE 
>> (virt. address) */
>> .long   0x/* First valid page of MLE */
>> .long   0x/* Offset within binary of first byte of MLE */
>> .long   0x/* Offset within binary of last byte + 1 of 
>> MLE */
>> .long   0x0223/* Bit vector of MLE-supported capabilities */
>> .long   0x/* Starting linear address of command line 
>> (unused) */
>> .long   0x/* Ending linear address of command line 
>> (unused) */
>> SYM_DATA_END(mle_header)
>>
>> Of course MLE_HEADER_OFFSET has to be defined as a constant somewhere.
>> Anyway, is it acceptable?
>>
>> There is also another problem. We have to put into mle_header size of
>> the Linux kernel image. Currently it is done by the bootloader but
>> I think it is not a role of the bootloader. The kernel image should
>> provide all data describing its properties and do not rely on the
>> bootloader to do that. Ross and I investigated various options but we
>> did not find a good/simple way to do that. Could you suggest how we
>> should do that or at least where we should take a look to get some
>> ideas?
>>
>> Daniel
> 
> What exactly is the size you need here? Is it just the size of the
> protected mode image, that's startup_32 to _edata. Or is it the size of

It is the size of the protected mode image, though how to reference
those symbols to get the size might add more relocation issues.

> the whole bzImage file, or something else? I guess the same question
> applies to "first valid page of MLE" and "first byte of MLE", and the

Because of the way the launch environment is set up, both "first valid
page of MLE" and "first byte of MLE" are always zero, so there is
nothing to sort out there. The only fields that need to be updated are "Linear
entry point of MLE" and "Offset within binary of last byte + 1 of MLE".

Thanks
Ross

> linear entry point -- are those all relative to startup_32 or do they
> need to be relative to the start of the bzImage, i.e. you have to add
> the size of the real-mode boot stub?
> 
> If you need to include the size of the bzImage file, that's not known
> when the files in boot/compressed are built. It's only known after the
> real-mode stub is linked. arch/x86/boot/tools/build.c fills in various
> details in the setup header and creates the bzImage file, but it does
> not currently modify anything in the protected-mode portion of the
> compressed kernel (i.e. arch/x86/boot/compressed/vmlinux, which then
> gets converted to binary format as arch/x86/boot/vmlinux.bin), so it
> would need to be extended if you need to modify the MLE header to
> include the bzImage size or anything depending on the size of the
> real-mode stub.
> 



Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-19 Thread Jason Gunthorpe
On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote:
> Hi Jason,
> 
> Good to see your response.

Ah, I was away

> > > > Second, IOMMU nested translation is a per IOMMU domain
> > > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > > >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > > > etc.), reporting/enabling the nesting capability is an natural
> > > > extension to the domain uAPI of existing passthrough frameworks.
> > > > Actually, VFIO already includes a nesting enable interface even
> > > > before this series. So it doesn't make sense to generalize this uAPI
> > > > out.
> > 
> > The subsystem that obtains an IOMMU domain for a device would have to
> > register it with an open FD of the '/dev/sva'. That is the connection
> > between the two subsystems. It would be some simple kernel internal
> > stuff:
> > 
> >   sva = get_sva_from_file(fd);
> 
> Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs
> which will finally program page table to host iommu driver. As far as I know,
> it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after
> it sets a proper iommu type to the opened container. VFIO container already
> stands for an iommu context with which userspace could program page mapping
> to host iommu.

Again the point is to dis-aggregate the vIOMMU related stuff from VFIO
so it can be shared between more subsystems that need it. I'm sure
there will be some weird overlaps because we can't delete any of the
existing VFIO APIs, but that should not be a blocker.

Having VFIO run in a mode where '/dev/sva' provides all the IOMMU
handling is a possible path.

If your plan is to just opencode everything into VFIO then I don't see
how VDPA will work well, and if proper in-kernel abstractions are
built I fail to see how routing some of it through userspace is a
fundamental problem.

> >   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);
> 
> So this is supposed to be called by VFIO/VDPA to register the info to 
> /dev/sva, right? And in /dev/sva, it will also maintain the
> device/iommu_domain and pasid info? Will it be duplicated with VFIO/VDPA?

Each part needs to have the information it needs? 

> > > > Moreover, mapping page fault to subdevice requires pre-
> > > > registering subdevice fault data to IOMMU layer when binding
> > > > guest page table, while such fault data can be only retrieved from
> > > > parent driver through VFIO/VDPA.
> > 
> > Not sure what this means, page fault should be tied to the PASID, any
> > hookup needed for that should be done in-kernel when the device is
> > connected to the PASID.
> 
> you may refer to chapter 7.4.1.1 of VT-d spec. Page request is reported to
> software together with the requestor id of the device. For the page request
> injects to guest, it should have the device info.

Whoever provides the vIOMMU emulation and relays the page fault to the
guest has to translate the RID - what does that have to do with VFIO?

How will VPDA provide the vIOMMU emulation?

Jason


Re: [RFC PATCH 0/2] iommu: Avoid unnecessary PRI queue flushes

2020-10-19 Thread Jean-Philippe Brucker
On Sat, Oct 17, 2020 at 04:25:25AM -0700, Raj, Ashok wrote:
> > For devices that *don't* use a stop marker, the PCIe spec says (10.4.1.2):
> > 
> >   To stop [using a PASID] without using a Stop Marker Message, the
> >   function shall:
> >   1. Stop queueing new Page Request Messages for this PASID.
> 
> The device driver would need to tell the device to stop sending any new PRs.
> 
> >   2. Finish transmitting any multi-page Page Request Messages for this
> >  PASID (i.e. send the Page Request Message with the L bit Set).
> >   3. Wait for PRG Response Messages associated with any outstanding Page
> >  Request Messages for the PASID.
> > 
> > So they have to flush their PR themselves. And since the device driver
> > completes this sequence before calling unbind(), then there shouldn't be
> > any outstanding PR for the PASID, and unbind() doesn't need to flush,
> > right?
> 
> I can see how the device can complete #2,3 above. But the device driver
> isn't the one managing page responses, right? So in order for the device to
> know the above sequence is complete, it would need to get some assist from
> the IOMMU driver?

No, the device driver just waits for the device to indicate that it has
completed the sequence. That's what the magic stop-PASID mechanism
described by PCIe does. In 6.20.1 "Managing PASID TLP Prefix Usage" it
says:

"A Function must have a mechanism to request that it gracefully stop using
 a specific PASID. This mechanism is device specific but must satisfy the
 following rules:
 [...]
 * When the stop request mechanism indicates completion, the Function has:
   [...]
   * Complied with additional rules described in Address Translation
 Services (Chapter 10 [10.4.1.2 quoted above]) if Address Translations
 or Page Requests were issued on the behalf of this PASID."

So after the device driver initiates this mechanism in the device, the
device must be able to indicate completion of the mechanism, which
includes completing all in-flight Page Requests. At that point the device
driver can call unbind() knowing there is no pending PR for this PASID.

Thanks,
Jean

> 
> How does the driver know that everything the host received has been
> responded back to the device?
> 
> > 
> > > I'm not sure about other IOMMU's how they behave, When there is no space 
> > > in
> > > the PRQ, IOMMU auto-responds to the device. This puts the device in a
> > > while (1) loop. The fake successful response will let the device do a ATS
> > > lookup, and that would fail forcing the device to do another PRQ.
> > 
> > But in the sequence above, step 1 should ensure that the device will not
> > send another PR for any successful response coming back at step 3.
> 
> True, but there could be some page requests in flight on their way to the
> IOMMU. By draining and getting that round trip back to the IOMMU we
> guarantee things in flight are flushed to the PRQ after that Drain completes.
> > 
> > So I agree with the below if we suspect there could be pending PR, but
> > given that pending PR are a stop marker thing and we don't know any device
> > using stop markers, I wondered why I bothered implementing PRIq flush at
> > all for SMMUv3, hence this RFC.
> > 
> 
> Cheers,
> Ashok


Re: [PATCH v4 3/3] iommu/arm-smmu-qcom: Implement S2CR quirk

2020-10-19 Thread Robin Murphy

On 2020-10-17 05:39, Bjorn Andersson wrote:

The firmware found in some Qualcomm platforms intercepts writes to S2CR
in order to replace bypass-type streams with fault, and ignores S2CR
updates of type fault.

Detect this behavior and implement a custom write_s2cr function in order
to trick the firmware into supporting bypass streams by the means of
configuring the stream for translation using a reserved and disabled
context bank.

Also circumvent the problem of configuring faulting streams by
configuring the stream as bypass.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Move the reservation of the "identity context bank" to the Qualcomm specific
   implementation.
- Implement the S2CR quirk with the newly introduced write_s2cr callback.

  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 68 ++
  1 file changed, 68 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 0089048342dd..c0f42d6a6e01 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -10,8 +10,14 @@
  
  struct qcom_smmu {

struct arm_smmu_device smmu;
+   bool bypass_cbndx;


Nit: variables named "*ndx" usually hold an actual index value. If it's 
just a flag then maybe name it something like "use_bypass_context"?



  };
  
+static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu)

+{
+   return container_of(smmu, struct qcom_smmu, smmu);
+}
+
  static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = 
{
{ .compatible = "qcom,adreno" },
{ .compatible = "qcom,mdp4" },
@@ -25,9 +31,32 @@ static const struct of_device_id qcom_smmu_client_of_match[] 
__maybe_unused = {
  
  static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

  {
+   unsigned int last_s2cr = ARM_SMMU_GR0_S2CR(smmu->num_mapping_groups - 1);
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 reg;
u32 smr;
int i;
  
+	/*

+* With some firmware versions writes to S2CR of type FAULT are
+* ignored, and writing BYPASS will end up written as FAULT in the
+* register. Perform a write to S2CR to detect if this is the case and
+* if so reserve a context bank to emulate bypass streams.
+*/
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, S2CR_TYPE_BYPASS) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, 0xff) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, S2CR_PRIVCFG_DEFAULT);
+   arm_smmu_gr0_write(smmu, last_s2cr, reg);
+   reg = arm_smmu_gr0_read(smmu, last_s2cr);
+   if (FIELD_GET(ARM_SMMU_S2CR_TYPE, reg) != S2CR_TYPE_BYPASS) {
+   qsmmu->bypass_cbndx = smmu->num_context_banks - 1;


Oh, so maybe the name is in fact OK but the type is wrong :/

I guess this does happen to work out, but for the wrong reason...


+
+   set_bit(qsmmu->bypass_cbndx, smmu->context_map);
+
+   reg = FIELD_PREP(ARM_SMMU_CBAR_TYPE, CBAR_TYPE_S1_TRANS_S2_BYPASS);
+   arm_smmu_gr1_write(smmu, ARM_SMMU_GR1_CBAR(qsmmu->bypass_cbndx), reg);
+   }
+
for (i = 0; i < smmu->num_mapping_groups; i++) {
smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
  
@@ -46,6 +75,44 @@ static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

return 0;
  }
  
+static void qcom_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)

+{
+   struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
+   struct qcom_smmu *qsmmu = to_qcom_smmu(smmu);
+   u32 cbndx = s2cr->cbndx;
+   u32 type = s2cr->type;
+   u32 reg;
+
+   if (qsmmu->bypass_cbndx) {


Note that if we are talking indices here then 0 would be perfectly valid 
in general. This works out OK in practice given that we're always 
reserving the last implemented context above, and if we ever *did* only 
have one such that index 0 is the last then we're going to have a bad 
time either way, but it's not necessarily the most obvious.



+   if (type == S2CR_TYPE_BYPASS) {
+   /*
+* Firmware with quirky S2CR handling will substitute
+* BYPASS writes with FAULT, so point the stream to the
+* reserved context bank and ask for translation on the
+* stream
+*/
+   type = S2CR_TYPE_TRANS;
+   cbndx = qsmmu->bypass_cbndx;
+   } else if (type == S2CR_TYPE_FAULT) {
+   /*
+* Firmware with quirky S2CR handling will ignore FAULT
+* writes, so trick it to write FAULT by asking for a
+* BYPASS.
+*/
+   type = S2CR_TYPE_BYPASS;


Ha, that's brilliant :)


+   cbndx = 0xff;
+   }
+   }
+
+   reg = 

Re: [PATCH v4 2/3] iommu/arm-smmu-qcom: Read back stream mappings

2020-10-19 Thread Robin Murphy

On 2020-10-17 05:39, Bjorn Andersson wrote:

The Qualcomm boot loader configures stream mapping for the peripherals
that it accesses and in particular it sets up the stream mapping for the
display controller to be allowed to scan out a splash screen or EFI
framebuffer.

Read back the stream mappings during initialization and make the
arm-smmu driver maintain the streams in bypass mode.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- Extracted from different patch in v3.
- Now configures the stream as BYPASS, rather than translate, which should work
   for platforms with working S2CR handling as well.

  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 24 ++
  1 file changed, 24 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index be4318044f96..0089048342dd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -23,6 +23,29 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
{ }
  };
  
+static int qcom_smmu_cfg_probe(struct arm_smmu_device *smmu)

+{
+   u32 smr;
+   int i;
+
+   for (i = 0; i < smmu->num_mapping_groups; i++) {
+   smr = arm_smmu_gr0_read(smmu, ARM_SMMU_GR0_SMR(i));
+
+   if (FIELD_GET(ARM_SMMU_SMR_VALID, smr)) {
+   smmu->smrs[i].id = FIELD_GET(ARM_SMMU_SMR_ID, smr);
+   smmu->smrs[i].mask = FIELD_GET(ARM_SMMU_SMR_MASK, smr);
+   smmu->smrs[i].valid = true;
+
+   smmu->s2crs[i].type = S2CR_TYPE_BYPASS;
+   smmu->s2crs[i].privcfg = S2CR_PRIVCFG_DEFAULT;
+   smmu->s2crs[i].cbndx = 0xff;
+   smmu->s2crs[i].count++;


FWIW I don't think you actually need to adjust the count here - the SMR 
being valid should already prevent the whole SME from being reassigned 
until the display probes, at which point it should "take over" the SMR 
based on matching values and claim the "initial" refcount. After that 
you're back into the standard flow. It might be a little unintuitive to 
have something in a valid but "unused" state, but arguably it's entirely 
accurate in terms of the software abstraction here.


Otherwise, you end up making boot-time SMRs - so potentially all SMRs 
after a kexec - effectively immutable, since even after Linux has taken 
control of the whole system such that they *could* be reassigned safely, 
there's still this undroppable refcount hanging around preventing it.


That said, for a mobile SoC use-case if you have enough SMRs for all 
your stream IDs and don't have any kind of device hotplug, that 
restriction shouldn't make much difference in practice, so I'm not too 
concerned either way. Otherwise this is as nice and tidy as I'd hoped :)


Robin.


+   }
+   }
+
+   return 0;
+}
+
  static int qcom_smmu_def_domain_type(struct device *dev)
  {
const struct of_device_id *match =
@@ -61,6 +84,7 @@ static int qcom_smmu500_reset(struct arm_smmu_device *smmu)
  }
  
  static const struct arm_smmu_impl qcom_smmu_impl = {

+   .cfg_probe = qcom_smmu_cfg_probe,
.def_domain_type = qcom_smmu_def_domain_type,
.reset = qcom_smmu500_reset,
  };


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 1/3] iommu/arm-smmu: Allow implementation specific write_s2cr

2020-10-19 Thread Robin Murphy

On 2020-10-17 05:39, Bjorn Andersson wrote:

The firmware found in some Qualcomm platforms intercepts writes to the
S2CR register in order to replace the BYPASS type with FAULT. Furthermore,
it treats faults at this level as catastrophic and restarts the
device.

Add support for providing implementation specific versions of the S2CR
write function, to allow the Qualcomm driver to work around this
behavior.

Signed-off-by: Bjorn Andersson 
---

Changes since v3:
- New patch

  drivers/iommu/arm/arm-smmu/arm-smmu.c | 22 ++
  drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
  2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dad7fa86fbd4..ed3f0428c110 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -929,14 +929,20 @@ static void arm_smmu_write_smr(struct arm_smmu_device *smmu, int idx)
  static void arm_smmu_write_s2cr(struct arm_smmu_device *smmu, int idx)
  {
struct arm_smmu_s2cr *s2cr = smmu->s2crs + idx;
-   u32 reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
- FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
- FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
-
-   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
-   smmu->smrs[idx].valid)
-   reg |= ARM_SMMU_S2CR_EXIDVALID;
-   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   u32 reg;
+
+   if (smmu->impl && smmu->impl->write_s2cr) {
+   smmu->impl->write_s2cr(smmu, idx);


Nit: just add an early return here to avoid reindenting the whole 
function. Otherwise this looks like a reasonable level of abstraction to 
me - we'll still have plenty of flexibility to adjust things in future 
if necessary.


With that change,

Reviewed-by: Robin Murphy 


+   } else {
+   reg = FIELD_PREP(ARM_SMMU_S2CR_TYPE, s2cr->type) |
+ FIELD_PREP(ARM_SMMU_S2CR_CBNDX, s2cr->cbndx) |
+ FIELD_PREP(ARM_SMMU_S2CR_PRIVCFG, s2cr->privcfg);
+
+   if (smmu->features & ARM_SMMU_FEAT_EXIDS && smmu->smrs &&
+   smmu->smrs[idx].valid)
+   reg |= ARM_SMMU_S2CR_EXIDVALID;
+   arm_smmu_gr0_write(smmu, ARM_SMMU_GR0_S2CR(idx), reg);
+   }
  }
  
  static void arm_smmu_write_sme(struct arm_smmu_device *smmu, int idx)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 1a746476927c..b71647eaa319 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -436,6 +436,7 @@ struct arm_smmu_impl {
int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
  struct arm_smmu_device *smmu,
  struct device *dev, int start);
+   void (*write_s2cr)(struct arm_smmu_device *smmu, int idx);
  };
  
  #define INVALID_SMENDX			-1





Re: [PATCH next] iommu: intel: don't dereference iommu_device if IOMMU_API is not built

2020-10-19 Thread Joerg Roedel
On Tue, Oct 13, 2020 at 09:30:55AM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Since commit c40c1018 ("iommu/vt-d: Gracefully handle DMAR units
> with no supported address widths") dmar.c needs struct iommu_device to
> be selected. We can drop this dependency by not dereferencing struct
> iommu_device if IOMMU_API is not selected and by reusing the information
> stored in iommu->drhd->ignored instead.
> 
> This fixes the following build error when IOMMU_API is not selected:
> 
> drivers/iommu/intel/dmar.c: In function ‘free_iommu’:
> drivers/iommu/intel/dmar.c:1139:41: error: ‘struct iommu_device’ has no 
> member named ‘ops’
>  1139 |  if (intel_iommu_enabled && iommu->iommu.ops) {
> ^
> 
> Fixes: c40c1018 ("iommu/vt-d: Gracefully handle DMAR units with no 
> supported address widths")
> Signed-off-by: Bartosz Golaszewski 
> ---
>  drivers/iommu/intel/dmar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.


Re: [PATCH v4 0/4] Add system mmu support for Armada-806

2020-10-19 Thread Denis Odintsov

Hi,

2) The second issue I hit (btw I have a ClearFog GT 8k armada-8040 based board) is
with a mini-PCIe ath10k card.
It is found, it is enumerated, it is visible in lspci, but it fails to be
initialised. Here is the log:
[1.743754] armada8k-pcie f260.pcie: host bridge /cp0/pcie@f260 
ranges:
[1.751116] armada8k-pcie f260.pcie:  MEM 0x00f600..0x00f6ef 
-> 0x00f600
[1.964690] armada8k-pcie f260.pcie: Link up
[1.969379] armada8k-pcie f260.pcie: PCI host bridge to bus :00
[1.976026] pci_bus :00: root bus resource [bus 00-ff]
[1.981537] pci_bus :00: root bus resource [mem 0xf600-0xf6ef]
[1.988462] pci :00:00.0: [11ab:0110] type 01 class 0x060400
[1.994504] pci :00:00.0: reg 0x10: [mem 0x-0x000f]
[2.000843] pci :00:00.0: supports D1 D2
[2.005132] pci :00:00.0: PME# supported from D0 D1 D3hot
[2.011853] pci :01:00.0: [168c:003c] type 00 class 0x028000
[2.018001] pci :01:00.0: reg 0x10: [mem 0x-0x001f 64bit]
[2.025002] pci :01:00.0: reg 0x30: [mem 0x-0x pref]
[2.032111] pci :01:00.0: supports D1 D2
[2.049409] pci :00:00.0: BAR 14: assigned [mem 0xf600-0xf61f]
[2.056322] pci :00:00.0: BAR 0: assigned [mem 0xf620-0xf62f]
[2.063142] pci :00:00.0: BAR 15: assigned [mem 0xf630-0xf63f 
pref]
[2.070484] pci :01:00.0: BAR 0: assigned [mem 0xf600-0xf61f 
64bit]
[2.077880] pci :01:00.0: BAR 6: assigned [mem 0xf630-0xf630 
pref]
[2.085135] pci :00:00.0: PCI bridge to [bus 01-ff]
[2.090384] pci :00:00.0:   bridge window [mem 0xf600-0xf61f]
[2.097202] pci :00:00.0:   bridge window [mem 0xf630-0xf63f 
pref]
[2.104539] pcieport :00:00.0: Adding to iommu group 4
[2.110232] pcieport :00:00.0: PME: Signaling with IRQ 38
[2.116141] pcieport :00:00.0: AER: enabled with IRQ 38
[8.131135] ath10k_pci :01:00.0: Adding to iommu group 4
[8.131874] ath10k_pci :01:00.0: enabling device ( -> 0002)
[8.132203] ath10k_pci :01:00.0: pci irq msi oper_irq_mode 2 irq_mode 0 
reset_mode 0
Up to that point the log is the same as without the SMMU enabled, except for the
"Adding to iommu group N" lines, and the IRQ being 37.

Does forcing ath10k to use legacy interrupts rather than MSIs make a difference?

Judging by the DT it looks like MSIs ought to be targeting the GICv2M widget, 
but if things somehow end up trying to use the PCIe controller's internal MSI 
doorbell (upstream of SMMU translation) instead, then that might account for 
general interrupt-related weirdness.

Robin.


Frankly speaking, you've quickly exceeded my depth of knowledge here; this is
already far beyond what I understand about PCI devices. But I did my best to
try out what you suggested.
Putting 0 into /sys/bus/pci/devices/\:01\:00.0/msi_bus (the bus of the ath10k)
and reloading the driver didn't make a difference, with almost the same messages:

[  103.245287] ath10k_pci :01:00.0: MSI/MSI-X disallowed for future drivers
[  145.938562] ath10k_pci :01:00.0: pci irq legacy oper_irq_mode 1 irq_mode 
0 reset_mode 0
[  146.053590] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.161637] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.269515] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.453633] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.561589] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.669550] ath10k_pci :01:00.0: failed to poke copy engine: -16
[  146.753456] ath10k_pci :01:00.0: Failed to get pcie state addr: -16
[  146.760129] ath10k_pci :01:00.0: failed to setup init config: -16
[  146.766695] ath10k_pci :01:00.0: could not power on hif bus (-16)
[  146.773196] ath10k_pci :01:00.0: could not probe fw (-16)

lspci -v says
Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit-

So it seems it did what you suggested and disabled MSI, with not much
difference unfortunately. =(

I also compared the lspci output between the non-SMMU dts (with which ath10k
works) and the SMMU dts, and there is a difference, though I'm not sure how
significant it is.
With the non-SMMU dts, the host controller (served by the pcieport driver) IRQ
is 37 and the ath10k IRQ is 82:

00:00.0 PCI bridge: Marvell Technology Group Ltd. Device 0110 (prog-if 00 
[Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 37
...
01:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless 
Network Adapter
Flags: bus master, fast devsel, latency 0, IRQ 82

With the SMMU dts, though, both IRQs are the same, 38:

00:00.0 PCI bridge: Marvell Technology Group Ltd. Device 0110 (prog-if 00 
[Normal decode])
Flags: bus master, fast devsel, latency 0, IRQ 38
...
01:00.0 Network controller: Qualcomm Atheros QCA986x/988x 802.11ac Wireless 
Network Adapter
Flags: bus master, fast devsel, 

[PATCH 2/4] iommu/mediatek: Add iotlb_sync_range() support

2020-10-19 Thread Chao Hao
The MTK_IOMMU driver currently writes one page entry and does a TLB flush at a
time. It would be more optimal to aggregate the page table writes and flush the
BUS buffer at the end.
For a 50MB buffer mapping, if the mtk_iommu driver uses iotlb_sync_range()
instead of tlb_add_range() and tlb_flush_walk/leaf(), performance improves by
50% or more (depending on the page sizes involved) compared with flushing after
each page entry update. So we prefer to use iotlb_sync_range() to replace
iotlb_sync(), tlb_add_range() and tlb_flush_walk/leaf() for MTK platforms.

Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 785b228d39a6..d3400c15ff7b 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -224,6 +224,11 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
}
 }
 
+static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
+{
+   mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
+}
+
 static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
unsigned long iova, size_t granule,
void *cookie)
@@ -536,6 +541,7 @@ static const struct iommu_ops mtk_iommu_ops = {
.map= mtk_iommu_map,
.unmap  = mtk_iommu_unmap,
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
+   .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
.iotlb_sync = mtk_iommu_iotlb_sync,
.iova_to_phys   = mtk_iommu_iova_to_phys,
.probe_device   = mtk_iommu_probe_device,
-- 
2.18.0


[PATCH 0/4] MTK_IOMMU: Optimize mapping / unmapping performance

2020-10-19 Thread Chao Hao
For MTK platforms, mtk_iommu uses iotlb_sync(), tlb_add_range() and
tlb_flush_walk/leaf() to do TLB sync when the iommu driver runs iova
mapping/unmapping. But if the buffer is large, it may consist of many pages
(4K/8K/64K/1MB..), so the iommu driver may run TLB sync many times for a single
mapping, which degrades performance seriously. In order to resolve the issue,
we add an iotlb_sync_range() callback to iommu_ops, which lets the driver
specify an iova range to sync. MTK_IOMMU will use the iotlb_sync_range()
callback once the whole mapping/unmapping is completed, and drop iotlb_sync(),
tlb_add_range() and tlb_flush_walk/leaf().
So this patchset replaces iotlb_sync(), tlb_add_range() and
tlb_flush_walk/leaf() with the iotlb_sync_range() callback.

Chao Hao (4):
  iommu: Introduce iotlb_sync_range callback
  iommu/mediatek: Add iotlb_sync_range() support
  iommu/mediatek: Remove unnecessary tlb sync
  iommu/mediatek: Adjust iotlb_sync_range

drivers/iommu/dma-iommu.c |  9 +
drivers/iommu/iommu.c |  7 +++
drivers/iommu/mtk_iommu.c | 36 
include/linux/iommu.h |  2 ++
4 files changed, 26 insertions(+), 28 deletions(-)

--
2.18.0


[PATCH 3/4] iommu/mediatek: Remove unnecessary tlb sync

2020-10-19 Thread Chao Hao
As described in "[PATCH 2/4]", we use iotlb_sync_range() to replace
iotlb_sync(), tlb_add_range() and tlb_flush_walk/leaf() to improve
performance, so remove the implementations of iotlb_sync(),
tlb_add_range() and tlb_flush_walk/leaf().

Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d3400c15ff7b..bca1f53c0ab9 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -229,21 +229,15 @@ static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
 	mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
 }
 
-static void mtk_iommu_tlb_flush_page_nosync(struct iommu_iotlb_gather *gather,
-   unsigned long iova, size_t granule,
-   void *cookie)
+static void mtk_iommu_tlb_flush_skip(unsigned long iova, size_t size,
+size_t granule, void *cookie)
 {
-   struct mtk_iommu_data *data = cookie;
-   struct iommu_domain *domain = &data->m4u_dom->domain;
-
-   iommu_iotlb_gather_add_page(domain, gather, iova, granule);
 }
 
 static const struct iommu_flush_ops mtk_iommu_flush_ops = {
.tlb_flush_all = mtk_iommu_tlb_flush_all,
-   .tlb_flush_walk = mtk_iommu_tlb_flush_range_sync,
-   .tlb_flush_leaf = mtk_iommu_tlb_flush_range_sync,
-   .tlb_add_page = mtk_iommu_tlb_flush_page_nosync,
+   .tlb_flush_walk = mtk_iommu_tlb_flush_skip,
+   .tlb_flush_leaf = mtk_iommu_tlb_flush_skip,
 };
 
 static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
@@ -443,19 +437,6 @@ static void mtk_iommu_flush_iotlb_all(struct iommu_domain *domain)
mtk_iommu_tlb_flush_all(mtk_iommu_get_m4u_data());
 }
 
-static void mtk_iommu_iotlb_sync(struct iommu_domain *domain,
-struct iommu_iotlb_gather *gather)
-{
-   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
-   size_t length = gather->end - gather->start;
-
-   if (gather->start == ULONG_MAX)
-   return;
-
-   mtk_iommu_tlb_flush_range_sync(gather->start, length, gather->pgsize,
-  data);
-}
-
 static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
  dma_addr_t iova)
 {
@@ -542,7 +523,6 @@ static const struct iommu_ops mtk_iommu_ops = {
.unmap  = mtk_iommu_unmap,
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
.iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
-   .iotlb_sync = mtk_iommu_iotlb_sync,
.iova_to_phys   = mtk_iommu_iova_to_phys,
.probe_device   = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
-- 
2.18.0


[PATCH 1/4] iommu: Introduce iotlb_sync_range callback

2020-10-19 Thread Chao Hao
Add an iotlb_sync_range callback so that a driver can specify an iova
and size to do TLB sync on.
The IOMMU core will call iotlb_sync_range() after the whole
mapping/unmapping is completed, passing the start iova and the total
buffer size. At the same time, iotlb_sync() and tlb_flush_walk/leaf()
can be skipped. So iotlb_sync_range() improves performance by reducing
the time spent on TLB sync.

Signed-off-by: Chao Hao 
---
 drivers/iommu/dma-iommu.c | 9 +
 drivers/iommu/iommu.c | 7 +++
 include/linux/iommu.h | 2 ++
 3 files changed, 18 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..e2e9114c4ae2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -479,6 +479,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
size_t size, int prot, u64 dma_mask)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   const struct iommu_ops *ops = domain->ops;
struct iommu_dma_cookie *cookie = domain->iova_cookie;
	struct iova_domain *iovad = &cookie->iovad;
size_t iova_off = iova_offset(iovad, phys);
@@ -497,6 +498,10 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
iommu_dma_free_iova(cookie, iova, size);
return DMA_MAPPING_ERROR;
}
+
+   if (ops->iotlb_sync_range)
+   ops->iotlb_sync_range(iova, size);
+
return iova + iova_off;
 }
 
@@ -1165,6 +1170,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size)
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
phys_addr_t msi_addr, struct iommu_domain *domain)
 {
+   const struct iommu_ops *ops = domain->ops;
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi_page;
dma_addr_t iova;
@@ -1187,6 +1193,9 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (iommu_map(domain, iova, msi_addr, size, prot))
goto out_free_iova;
 
+   if (ops->iotlb_sync_range)
+   ops->iotlb_sync_range(iova, size);
+
	INIT_LIST_HEAD(&msi_page->list);
msi_page->phys = msi_addr;
msi_page->iova = iova;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 609bd25bf154..e399a238d1e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2304,6 +2304,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain,
unmapped += unmapped_page;
}
 
+   if (ops->iotlb_sync_range)
+   ops->iotlb_sync_range(iova, size);
+
trace_unmap(orig_iova, size, unmapped);
return unmapped;
 }
@@ -2334,6 +2337,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 struct scatterlist *sg, unsigned int nents, int prot,
 gfp_t gfp)
 {
+   const struct iommu_ops *ops = domain->ops;
size_t len = 0, mapped = 0;
phys_addr_t start;
unsigned int i = 0;
@@ -2364,6 +2368,9 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
sg = sg_next(sg);
}
 
+   if (ops->iotlb_sync_range)
+   ops->iotlb_sync_range(iova, mapped);
+
return mapped;
 
 out_err:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fee209efb756..4be90324bd23 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -192,6 +192,7 @@ struct iommu_iotlb_gather {
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain
+ * @iotlb_sync_range: Sync specific iova and size mappings to the hardware
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *queue
@@ -244,6 +245,7 @@ struct iommu_ops {
size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 size_t size, struct iommu_iotlb_gather *iotlb_gather);
void (*flush_iotlb_all)(struct iommu_domain *domain);
+   void (*iotlb_sync_range)(unsigned long iova, size_t size);
void (*iotlb_sync_map)(struct iommu_domain *domain);
void (*iotlb_sync)(struct iommu_domain *domain,
   struct iommu_iotlb_gather *iotlb_gather);
-- 
2.18.0


[PATCH 4/4] iommu/mediatek: Adjust iotlb_sync_range

2020-10-19 Thread Chao Hao
As the title says, this patch only adjusts the structure of
iotlb_sync_range().

No functional change.

Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index bca1f53c0ab9..66e5b9d3c575 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -191,10 +191,9 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
}
 }
 
-static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
-  size_t granule, void *cookie)
+static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
 {
-   struct mtk_iommu_data *data = cookie;
+   struct mtk_iommu_data *data = mtk_iommu_get_m4u_data();
unsigned long flags;
int ret;
u32 tmp;
@@ -216,7 +215,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
if (ret) {
		dev_warn(data->dev,
			 "Partial TLB flush timed out, falling back to full flush\n");
-   mtk_iommu_tlb_flush_all(cookie);
+   mtk_iommu_tlb_flush_all(data);
}
/* Clear the CPE status */
writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
@@ -224,11 +223,6 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
}
 }
 
-static void __mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size)
-{
-   mtk_iommu_tlb_flush_range_sync(iova, size, 0, NULL);
-}
-
 static void mtk_iommu_tlb_flush_skip(unsigned long iova, size_t size,
 size_t granule, void *cookie)
 {
@@ -522,7 +516,7 @@ static const struct iommu_ops mtk_iommu_ops = {
.map= mtk_iommu_map,
.unmap  = mtk_iommu_unmap,
.flush_iotlb_all = mtk_iommu_flush_iotlb_all,
-   .iotlb_sync_range = __mtk_iommu_tlb_flush_range_sync,
+   .iotlb_sync_range = mtk_iommu_tlb_flush_range_sync,
.iova_to_phys   = mtk_iommu_iova_to_phys,
.probe_device   = mtk_iommu_probe_device,
.release_device = mtk_iommu_release_device,
-- 
2.18.0


Re: [PATCH v7 3/3] iommu/tegra-smmu: Add PCI support

2020-10-19 Thread Robin Murphy

On 2020-10-17 02:56, Nicolin Chen wrote:

On Fri, Oct 16, 2020 at 03:10:26PM +0100, Robin Murphy wrote:

On 2020-10-16 04:53, Nicolin Chen wrote:

On Thu, Oct 15, 2020 at 10:55:52AM +0100, Robin Murphy wrote:

On 2020-10-15 05:13, Nicolin Chen wrote:

On Wed, Oct 14, 2020 at 06:42:36PM +0100, Robin Murphy wrote:

On 2020-10-09 17:19, Nicolin Chen wrote:

This patch simply adds support for PCI devices.

Reviewed-by: Dmitry Osipenko 
Tested-by: Dmitry Osipenko 
Signed-off-by: Nicolin Chen 
---

Changelog
v6->v7
 * Renamed goto labels, suggested by Thierry.
v5->v6
 * Added Dmitry's Reviewed-by and Tested-by.
v4->v5
 * Added Dmitry's Reviewed-by
v3->v4
 * Dropped !iommu_present() check
 * Added CONFIG_PCI check in the exit path
v2->v3
 * Replaced ternary conditional operator with if-else in .device_group()
 * Dropped change in tegra_smmu_remove()
v1->v2
 * Added error-out labels in tegra_smmu_probe()
 * Dropped pci_request_acs() since IOMMU core would call it.

 drivers/iommu/tegra-smmu.c | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index be29f5977145..2941d6459076 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -865,7 +866,11 @@ static struct iommu_group *tegra_smmu_device_group(struct 
device *dev)
group->smmu = smmu;
group->soc = soc;
-   group->group = iommu_group_alloc();
+   if (dev_is_pci(dev))
+   group->group = pci_device_group(dev);


Just to check, is it OK to have two or more swgroups "owning" the same
iommu_group if an existing one gets returned here? It looks like that might
not play nice with the use of iommu_group_set_iommudata().


Do you mean by "gets returned here" the "IS_ERR" check below?


I mean that unlike iommu_group_alloc()/generic_device_group(),
pci_device_group() may give you back a group that already contains another
device and has already been set up from that device's perspective. This can
happen for topological reasons like requester ID aliasing through a PCI-PCIe
bridge or lack of isolation between functions.


Okay.. but we don't really have two swgroups owning the same group in
the case of PCI devices. For Tegra210, all PCI devices inherit the
same swgroup from the PCI controller, and I'd think previous chips do
the same. The only current use case of 2+ swgroups owning the same
iommu_group is the display controller.

Or do you suggest we need an additional check for pci_device_group?


Ah, OK - I still don't have the best comprehension of what exactly swgroups


The "swgroup" stands for "software group"; it is associated with an
ASID (Address Space Identifier) that the SMMU uses to determine whether
this client goes through SMMU translation or not.


So in Arm SMMU analogy terms it's more like a context bank index than a 
stream ID - got it.



are, and the path through .of_xlate looked like you might be using the PCI
requester ID as the swgroup identifier, but I guess that ultimately depends
on what your "iommu-map" is supposed to look like. If pci_device_group()


This is copied from pcie node in our downstream dtsi file:

iommus = < TEGRA_SWGROUP_AFI>;
iommu-map = <0x0  TEGRA_SWGROUP_AFI 0x1000>;
iommu-map-mask = <0x0>;


Aha, indeed that iommu-map-mask is the trick :)


will effectively only ever get called once regardless of how many endpoints
exist, then indeed this won't be a concern (although if that's *guaranteed*
to be the case then you may as well just stick with calling
iommu_group_alloc() directly). Thanks for clarifying.


All PCI devices are supposed to get this swgroup of the pcie node
above. So the function will return the existing group of the pcie
controller, before giving a chance to call iommu_group_alloc().


Yes, the "iommus" property will mean that the group always gets created 
first for the platform device owning the host bridge, and thus won't be 
visible to pci_device_group() anyway. Subsequent tegra_smmu_group_get() 
calls for the PCI devices (including the PCI side of the host bridge 
itself) are then going to match TEGRA_SWGROUP_AFI in the smmu->groups 
list lookup and return early, so the dev_is_pci() condition will never 
be true, and the call to pci_device_group() is in fact entirely dead code.


(I was assuming a case where you didn't have the "iommus" property, in 
which case you would reach that path exactly once for the first PCI 
device probed, wherein pci_device_group() is still only going to fall 
through to calling iommu_group_alloc() anyway).


Cheers,
Robin.


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-19 Thread Liu, Yi L
Hi Jason,

Good to see your response.

> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 11:37 PM
> 
> On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote:
> > Hi, Alex and Jason (G),
> >
> > How about your opinion for this new proposal? For now looks both
> > Jason (W) and Jean are OK with this direction and more discussions
> > are possibly required for the new /dev/ioasid interface. Internally
> > we're doing a quick prototype to see any unforeseen issue with this
> > separation.
> 
> Assuming VDPA and VFIO will be the only two users so duplicating
> everything only twice sounds pretty restricting to me.
> 
> > > Second, IOMMU nested translation is a per IOMMU domain
> > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > > etc.), reporting/enabling the nesting capability is an natural
> > > extension to the domain uAPI of existing passthrough frameworks.
> > > Actually, VFIO already includes a nesting enable interface even
> > > before this series. So it doesn't make sense to generalize this uAPI
> > > out.
> 
> The subsystem that obtains an IOMMU domain for a device would have to
> register it with an open FD of the '/dev/sva'. That is the connection
> between the two subsystems. It would be some simple kernel internal
> stuff:
> 
>   sva = get_sva_from_file(fd);

Is this fd provided by userspace? I suppose /dev/sva has a set of uAPIs
which will finally program page tables into the host iommu driver. As far
as I can see, that is odd for a VFIO user: why should a VFIO user connect
to a /dev/sva fd after it has already set a proper iommu type on the opened
container? The VFIO container already stands for an iommu context through
which userspace can program page mappings into the host iommu.

>   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);

So this is supposed to be called by VFIO/VDPA to register the info with
/dev/sva, right? And /dev/sva would then also maintain the device/iommu_domain
and PASID info? Wouldn't that duplicate what VFIO/VDPA already tracks?

> Not sure why this is a roadblock?
> 
> How would this be any different from having some kernel libsva that
> VDPA and VFIO would both rely on?
> 
> You don't plan to just open code all this stuff in VFIO, do you?
> 
> > > Then the tricky part comes with the remaining operations (3/4/5),
> > > which are all backed by iommu_ops thus effective only within an
> > > IOMMU domain. To generalize them, the first thing is to find a way
> > > to associate the sva_FD (opened through generic /dev/sva) with an
> > > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > > to replicate {domain<->device/subdevice} association in /dev/sva
> > > path because some operations (e.g. page fault) are triggered/handled
> > > per device/subdevice. Therefore, /dev/sva must provide both per-
> > > domain and per-device uAPIs similar to what VFIO/VDPA already
> > > does.
> 
> Yes, the point here was to move the general APIs out of VFIO and into
> a sharable location. So, of course one would expect some duplication
> during the transition period.
> 
> > > Moreover, mapping page fault to subdevice requires pre-
> > > registering subdevice fault data to IOMMU layer when binding
> > > guest page table, while such fault data can be only retrieved from
> > > parent driver through VFIO/VDPA.
> 
> Not sure what this means, page fault should be tied to the PASID, any
> hookup needed for that should be done in-kernel when the device is
> connected to the PASID.

You may refer to chapter 7.4.1.1 of the VT-d spec. A page request is reported
to software together with the requester ID of the device. To inject the page
request into the guest, the device info is needed.

Regards,
Yi Liu

> 
> > > space but they may be organized in multiple IOMMU domains based
> > > on their bus type. How (should we let) the userspace know the
> > > domain information and open an sva_FD for each domain is the main
> > > problem here.
> 
> Why is one sva_FD per iommu domain required? The HW can attach the
> same PASID to multiple iommu domains, right?
> 
> > > In the end we just realized that doing such generalization doesn't
> > > really lead to a clear design and instead requires tight coordination
> > > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > > (especially about synchronization when the domain/device
> > > association is changed or when the device/subdevice is being reset/
> > > drained). Finally it may become a usability burden to the userspace
> > > on proper use of the two interfaces on the assigned device.
> 
> If you have a list of things that needs to be done to attach a PCI
> device to a PASID then of course they should be tidy kernel APIs
> already, and not just hard wired into VFIO.
> 
> The worst outcome would be to have VDPA and VFIO have two different
> ways to do all of this with a different set of bugs. Bug fixes/new
> features in VFIO won't flow over to VDPA.
> 
> Jason