Re: [RFC 0/2] arm-smmu-v3 tlbi-on-map option

2017-07-13 Thread Michael S. Tsirkin
On Thu, Jul 13, 2017 at 08:17:49PM +0100, Jean-Philippe Brucker wrote:
> But it is now mostly obsolete. Lots of things had to change while writing
> a prototype and following public discussions. At the moment my focus is on
> tidying up the base, but I will send another RFC afterwards.

Thanks!  Pls remember to copy virtio-dev at oasis (you need to subscribe
to post) as that's a requirement for anything involving host/guest APIs.

-- 
MST


Re: [RFC 0/2] arm-smmu-v3 tlbi-on-map option

2017-07-13 Thread Jean-Philippe Brucker
On 13/07/17 18:44, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2017 at 10:29:42AM +0100, Jean-Philippe Brucker wrote:
>> On 12/07/17 23:07, Michael S. Tsirkin wrote:
>> [...]
>>> I think using hardware support for nesting is the right final
>>> solution. It will take some time though. Given this, what should
>>> we do meanwhile?
>>>
>>> Assuming that's the final destination, a simple quirk like this
>>> seems to be preferable to virtio-iommu which we'll never be
>>> able to offload to hardware.
>>
>> That's not entirely true. virtio-iommu will have an extension for hardware
>> nesting support. It was presented in my initial proposal, and I've made
>> significant progress since then.
>>
>> Thanks,
>> Jean
> 
> I don't recall seeing this.
> 
> Hardware specific extensions to virtio would be interesting, the
> difficulty is in finding the balance between enabling minor quirks and
> each vendor going their own way.

Yes, in order to avoid too many quirks and vendor-specific formats, we'd
like the guest to only manage page tables, which have a relatively stable
format, while the host kernel manages context tables (PASID tables) and
other structures that vary more across implementations.

Unavoidably, a few architecture-specific details need to be described in
the API, but it seems manageable. The host tells the guest which page
table format is supported by hardware, and the guest sends back a page
directory pointer along with a few configuration parameters.

In the guest, page table operations may be abstracted in a module and used
by multiple IOMMU drivers (what ARM does in drivers/iommu/io-pgtable-arm.c
for the various vendor drivers). This abstraction is quite helpful to find
out which information needs to be exchanged between virtio-iommu device
and driver. For ARM-based IOMMUs it amounts to 6 configuration registers
and 4 quirk bits.

The other problem is forwarding TLB invalidations to the host kernel.
There is an ongoing discussion [1] about the best way to do it in VFIO,
and it seems like the format can stay mostly generic, with a few
optimization hints depending on the hardware IOMMU.
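
To make this concrete, here is a purely illustrative sketch of the kind of
information that crosses the interface for such page table sharing. None of
these structure or field names come from the actual proposal; they are all
hypothetical:

#include <linux/types.h>

/* Host -> guest: which page table format the hardware supports.
 * Hypothetical layout, for illustration only. */
struct hyp_iommu_probe_pgtable {
        u32 format;     /* e.g. an ARM LPAE stage-1 format identifier */
        u32 quirks;     /* bitmap covering the few quirk bits mentioned above */
        u8  ias;        /* supported input address size */
        u8  oas;        /* supported output address size */
};

/* Guest -> host: attach a page directory to an endpoint. Roughly the handful
 * of configuration registers mentioned above for an ARM stage-1 context. */
struct hyp_iommu_attach_pgtable {
        u32 endpoint;
        u32 asid;
        u64 ttbr0;      /* page directory pointer */
        u64 ttbr1;
        u64 tcr;        /* translation control: granule size, T0SZ, ... */
        u64 mair;       /* memory attribute indirection */
        u32 config;     /* enable/bypass and quirk selection */
};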

> Is this the proposal you refer to?
> https://www.spinics.net/lists/kvm/msg147990.html

Yes, I'm referring to "II. Page table sharing" in
https://www.spinics.net/lists/kvm/msg147993.html

But it is now mostly obsolete. Lots of things had to change while writing
a prototype and following public discussions. At the moment my focus is on
tidying up the base, but I will send another RFC afterwards.

Thanks,
Jean

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg01117.html


Re: [RFC 0/2] arm-smmu-v3 tlbi-on-map option

2017-07-13 Thread Michael S. Tsirkin
On Thu, Jul 13, 2017 at 10:29:42AM +0100, Jean-Philippe Brucker wrote:
> On 12/07/17 23:07, Michael S. Tsirkin wrote:
> [...]
> > I think using hardware support for nesting is the right final
> > solution. It will take some time though. Given this, what should
> > we do meanwhile?
> > 
> > Assuming that's the final destination, a simple quirk like this
> > seems to be preferable to virtio-iommu which we'll never be
> > able to offload to hardware.
> 
> That's not entirely true. virtio-iommu will have an extension for hardware
> nesting support. It was presented in my initial proposal, and I've made
> significant progress since then.
> 
> Thanks,
> Jean

I don't recall seeing this.

Hardware specific extensions to virtio would be interesting, the
difficulty is in finding the balance between enabling minor quirks and
each vendor going their own way.

Is this the proposal you refer to?
https://www.spinics.net/lists/kvm/msg147990.html

I couldn't find any mention of nesting; it seems to
say that map requests are relayed through the host.

-- 
MST


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Rob Clark
On Thu, Jul 13, 2017 at 9:53 AM, Sricharan R  wrote:
> Hi,
>
> On 7/13/2017 5:20 PM, Rob Clark wrote:
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  
>> wrote:
>>> Hi Vivek,
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
 Hi Stephen,


 On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> On 07/06, Vivek Gautam wrote:
>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
>> *domain, unsigned long iova,
>>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned 
>> long iova,
>>size_t size)
>>   {
>> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +size_t ret;
>> if (!ops)
>>   return 0;
>>   -return ops->unmap(ops, iova, size);
>> +pm_runtime_get_sync(smmu_domain->smmu->dev);
> Can these map/unmap ops be called from an atomic context? I seem
> to recall that being a problem before.

 That's something which was dropped in the following patch merged in master:
 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

 Looks like we don't  need locks here anymore?
>>>
>>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>>  from unmap. Somehow looks like some path in the master using that
>>>  should have enabled the pm ?
>>>
>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).  On the gpu side we
>> opportunistically keep a buffer mapping until the buffer is freed
>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> an exported dmabuf while some other driver holds a reference to it
>> (which can be dropped when the v4l2 device is suspended).
>>
>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>> driver *definitely* needs a pm_runtime_get_sync().
>
>  Ok, with that being the case, there are two things here,
>
>  1) If the device links are still intact at these places where unmap is 
> called,
> then pm_runtime from the master would setup the all the clocks. That would
> avoid reintroducing the locking indirectly here.
>
>  2) If not, then doing it here is the only way. But for both cases, since
> the unmap can be called from atomic context, resume handler here should
> avoid doing clk_prepare_enable , instead move the clk_prepare to the init.
>

I do kinda like the approach Marek suggested, of deferring the TLB
flush until resume.  I'm wondering if we could combine that with
putting the MMU in a stalled state when we suspend (and not resuming the
MMU until after the pending TLB flush)?

BR,
-R


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Vivek Gautam
On Thu, Jul 13, 2017 at 7:27 PM, Vivek Gautam
 wrote:
> On Thu, Jul 13, 2017 at 11:05 AM, Sricharan R  
> wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
 On 07/06, Vivek Gautam wrote:
> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
> *domain, unsigned long iova,
>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
> iova,
>size_t size)
>   {
> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +size_t ret;
> if (!ops)
>   return 0;
>   -return ops->unmap(ops, iova, size);
> +pm_runtime_get_sync(smmu_domain->smmu->dev);
 Can these map/unmap ops be called from an atomic context? I seem
 to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>
> Right, the master should have done a runtime_get(), and with
> device links the iommu will also resume.
>
> The master will call the unmap when it is attached to the iommu
> and therefore the iommu should be in resume state.
> We shouldn't have an unmap without the master attached anyways.
> Will investigate this further if we need the pm_runtime() calls
> around unmap or not.

My apologies. My email client didn't update the thread. So please ignore
this comment.

>
> Best regards
> Vivek
>
>>
>> Regards,
>>  Sricharan
>>
>> --
>> "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member 
>> of Code Aurora Forum, hosted by The Linux Foundation
>>
>
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Vivek Gautam
On Thu, Jul 13, 2017 at 11:05 AM, Sricharan R  wrote:
> Hi Vivek,
>
> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> Hi Stephen,
>>
>>
>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>> On 07/06, Vivek Gautam wrote:
 @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
 *domain, unsigned long iova,
   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
 iova,
size_t size)
   {
 -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
 +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 +size_t ret;
 if (!ops)
   return 0;
   -return ops->unmap(ops, iova, size);
 +pm_runtime_get_sync(smmu_domain->smmu->dev);
>>> Can these map/unmap ops be called from an atomic context? I seem
>>> to recall that being a problem before.
>>
>> That's something which was dropped in the following patch merged in master:
>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>
>> Looks like we don't  need locks here anymore?
>
>  Apart from the locking, wonder why a explicit pm_runtime is needed
>  from unmap. Somehow looks like some path in the master using that
>  should have enabled the pm ?

Right, the master should have done a runtime_get(), and with
device links the iommu will also resume.

The master will call unmap while it is attached to the iommu,
and therefore the iommu should be in a resumed state.
We shouldn't have an unmap without the master attached anyway.
I will investigate further whether we need the pm_runtime() calls
around unmap or not.
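
For reference, the device link mentioned above is created with something like
the fragment below (the consumer/supplier variable names and the exact call
site in this series are illustrative, not taken from the patches):

        /* Tie the master (consumer) to the SMMU (supplier) for runtime PM,
         * so a pm_runtime_get() on the master also resumes the SMMU. */
        struct device_link *link;

        link = device_link_add(master_dev, smmu->dev, DL_FLAG_PM_RUNTIME);
        if (!link)
                dev_err(master_dev, "unable to create device link to SMMU\n");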

Best regards
Vivek

>
> Regards,
>  Sricharan
>
> --
> "QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
> Code Aurora Forum, hosted by The Linux Foundation
>



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Sricharan R
Hi,

On 7/13/2017 5:20 PM, Rob Clark wrote:
> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  wrote:
>> Hi Vivek,
>>
>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
 On 07/06, Vivek Gautam wrote:
> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
> *domain, unsigned long iova,
>   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
> iova,
>size_t size)
>   {
> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +size_t ret;
> if (!ops)
>   return 0;
>   -return ops->unmap(ops, iova, size);
> +pm_runtime_get_sync(smmu_domain->smmu->dev);
 Can these map/unmap ops be called from an atomic context? I seem
 to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>
>>  Apart from the locking, wonder why a explicit pm_runtime is needed
>>  from unmap. Somehow looks like some path in the master using that
>>  should have enabled the pm ?
>>
> 
> Yes, there are a bunch of scenarios where unmap can happen with
> disabled master (but not in atomic context).  On the gpu side we
> opportunistically keep a buffer mapping until the buffer is freed
> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
> an exported dmabuf while some other driver holds a reference to it
> (which can be dropped when the v4l2 device is suspended).
> 
> Since unmap triggers tbl flush which touches iommu regs, the iommu
> driver *definitely* needs a pm_runtime_get_sync().

 Ok, with that being the case, there are two things here:

 1) If the device links are still intact at the places where unmap is called,
then pm_runtime from the master would set up all the clocks. That would
avoid reintroducing the locking indirectly here.

 2) If not, then doing it here is the only way. But in both cases, since
unmap can be called from atomic context, the resume handler here should
avoid doing clk_prepare_enable() and instead move the clk_prepare() to init
(see the sketch below).
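
A minimal sketch of the split described in 2), assuming "iface"/"bus" clocks
as in this series (function and field names are illustrative):

static int smmu_init_clocks(struct arm_smmu_device *smmu)
{
        int ret;

        /* clk_prepare() may sleep, so do it once at init/probe time. */
        ret = clk_prepare(smmu->iface_clk);
        if (ret)
                return ret;

        ret = clk_prepare(smmu->bus_clk);
        if (ret)
                clk_unprepare(smmu->iface_clk);

        return ret;
}

static int smmu_runtime_resume(struct device *dev)
{
        struct arm_smmu_device *smmu = dev_get_drvdata(dev);
        int ret;

        /* Only clk_enable() here, which is safe in atomic context
         * (unlike clk_prepare_enable()). */
        ret = clk_enable(smmu->iface_clk);
        if (ret)
                return ret;

        ret = clk_enable(smmu->bus_clk);
        if (ret)
                clk_disable(smmu->iface_clk);

        return ret;
}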

Regards,
 Sricharan


-- 
"QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation




Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Marek Szyprowski

Hi Rob,

On 2017-07-13 14:10, Rob Clark wrote:

On Thu, Jul 13, 2017 at 8:02 AM, Marek Szyprowski
 wrote:

On 2017-07-13 13:50, Rob Clark wrote:

On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R 
wrote:

On 7/13/2017 10:43 AM, Vivek Gautam wrote:

On 07/13/2017 04:24 AM, Stephen Boyd wrote:

On 07/06, Vivek Gautam wrote:

@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain
*domain, unsigned long iova,
static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned
long iova,
 size_t size)
{
-struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+size_t ret;
  if (!ops)
return 0;
-return ops->unmap(ops, iova, size);
+pm_runtime_get_sync(smmu_domain->smmu->dev);

Can these map/unmap ops be called from an atomic context? I seem
to recall that being a problem before.

That's something which was dropped in the following patch merged in
master:
523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

Looks like we don't  need locks here anymore?

   Apart from the locking, wonder why a explicit pm_runtime is needed
   from unmap. Somehow looks like some path in the master using that
   should have enabled the pm ?


Yes, there are a bunch of scenarios where unmap can happen with
disabled master (but not in atomic context).  On the gpu side we
opportunistically keep a buffer mapping until the buffer is freed
(which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
an exported dmabuf while some other driver holds a reference to it
(which can be dropped when the v4l2 device is suspended).

Since unmap triggers tbl flush which touches iommu regs, the iommu
driver *definitely* needs a pm_runtime_get_sync().


Afair unmap might be called from atomic context as well, for example as
a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
PM state of IOMMU device. TLB flush is performed only when IOMMU is in
active
state. If it is suspended, I assume that the IOMMU controller's context
is already lost and its respective power domain might be already turned off,
so there is no point in touching IOMMU registers.


that seems like an interesting approach.. although I wonder if there
can be some race w/ new device memory access once clks are enabled
before tlb flush completes?  That would be rather bad, since this
approach is letting the backing pages of memory be freed before tlb
flush.


Exynos IOMMU has a spinlock ensuring that there is no race between PM
runtime suspend and unmap/TLB flush.
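
Roughly, the pattern being described looks like the sketch below. This is an
illustrative fragment, not the actual exynos-iommu code; the structure, field
and register names are made up:

static void my_iommu_tlb_invalidate(struct my_iommu *data,
                                    unsigned long iova, size_t size)
{
        unsigned long flags;

        /* The runtime-suspend callback takes the same lock, so a flush can
         * never race with the power domain going down. */
        spin_lock_irqsave(&data->lock, flags);
        if (data->is_active) {
                /* Powered: flush the range from the IOTLB. */
                writel(iova, data->base + REG_FLUSH_ADDR);
                writel(size, data->base + REG_FLUSH_SIZE);
        }
        /* If suspended, the context is already lost; nothing to flush. */
        spin_unlock_irqrestore(&data->lock, flags);
}

static int my_iommu_runtime_suspend(struct device *dev)
{
        struct my_iommu *data = dev_get_drvdata(dev);
        unsigned long flags;

        spin_lock_irqsave(&data->lock, flags);
        data->is_active = false;
        spin_unlock_irqrestore(&data->lock, flags);

        return 0;
}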

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Rob Clark
On Thu, Jul 13, 2017 at 8:02 AM, Marek Szyprowski
 wrote:
> Hi All,
>
> On 2017-07-13 13:50, Rob Clark wrote:
>>
>> On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R 
>> wrote:
>>>
>>> On 7/13/2017 10:43 AM, Vivek Gautam wrote:

 On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>
> On 07/06, Vivek Gautam wrote:
>>
>> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain
>> *domain, unsigned long iova,
>>static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned
>> long iova,
>> size_t size)
>>{
>> -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
>> +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>> +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
>> +size_t ret;
>>  if (!ops)
>>return 0;
>>-return ops->unmap(ops, iova, size);
>> +pm_runtime_get_sync(smmu_domain->smmu->dev);
>
> Can these map/unmap ops be called from an atomic context? I seem
> to recall that being a problem before.

 That's something which was dropped in the following patch merged in
 master:
 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

 Looks like we don't  need locks here anymore?
>>>
>>>   Apart from the locking, wonder why a explicit pm_runtime is needed
>>>   from unmap. Somehow looks like some path in the master using that
>>>   should have enabled the pm ?
>>>
>> Yes, there are a bunch of scenarios where unmap can happen with
>> disabled master (but not in atomic context).  On the gpu side we
>> opportunistically keep a buffer mapping until the buffer is freed
>> (which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
>> an exported dmabuf while some other driver holds a reference to it
>> (which can be dropped when the v4l2 device is suspended).
>>
>> Since unmap triggers tbl flush which touches iommu regs, the iommu
>> driver *definitely* needs a pm_runtime_get_sync().
>
>
> Afair unmap might be called from atomic context as well, for example as
> a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
> PM state of IOMMU device. TLB flush is performed only when IOMMU is in
> active
> state. If it is suspended, I assume that the IOMMU controller's context
> is already lost and its respective power domain might be already turned off,
> so there is no point in touching IOMMU registers.
>

That seems like an interesting approach, although I wonder if there
can be some race with new device memory access once clocks are enabled
but before the TLB flush completes?  That would be rather bad, since this
approach lets the backing pages of memory be freed before the TLB
flush.

BR,
-R


[RESEND PATCH 1/4] Docs: dt: document qcom iommu bindings

2017-07-13 Thread Rob Clark
Cc: devicet...@vger.kernel.org
Signed-off-by: Rob Clark 
Reviewed-by: Rob Herring 
---
 .../devicetree/bindings/iommu/qcom,iommu.txt   | 121 +
 1 file changed, 121 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt

diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt 
b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
new file mode 100644
index ..b2641ceb2b40
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt
@@ -0,0 +1,121 @@
+* QCOM IOMMU v1 Implementation
+
+Qualcomm "B" family devices which are not compatible with arm-smmu have
+a similar looking IOMMU but without access to the global register space,
+and optionally requiring additional configuration to route context irqs
+to non-secure vs secure interrupt line.
+
+** Required properties:
+
+- compatible   : Should be one of:
+
+"qcom,msm8916-iommu"
+
+ Followed by "qcom,msm-iommu-v1".
+
+- clock-names  : Should be a pair of "iface" (required for IOMMUs
+ register group access) and "bus" (required for
+ the IOMMUs underlying bus access).
+
+- clocks   : Phandles for respective clocks described by
+ clock-names.
+
+- #address-cells   : must be 1.
+
+- #size-cells  : must be 1.
+
+- #iommu-cells : Must be 1.  Index identifies the context-bank #.
+
+- ranges   : Base address and size of the iommu context banks.
+
+- qcom,iommu-secure-id  : secure-id.
+
+- List of sub-nodes, one per translation context bank.  Each sub-node
+  has the following required properties:
+
+  - compatible : Should be one of:
+- "qcom,msm-iommu-v1-ns"  : non-secure context bank
+- "qcom,msm-iommu-v1-sec" : secure context bank
+  - reg: Base address and size of context bank within the iommu
+  - interrupts : The context fault irq.
+
+** Optional properties:
+
+- reg  : Base address and size of the SMMU local base, should
+ be only specified if the iommu requires configuration
+ for routing of context bank irq's to secure vs non-
+ secure lines.  (Ie. if the iommu contains secure
+ context banks)
+
+
+** Examples:
+
+   apps_iommu: iommu@1e2 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+   compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
+   ranges = <0 0x1e2 0x4>;
+   reg = <0x1ef 0x3000>;
+   clocks = <&gcc GCC_SMMU_CFG_CLK>,
+<&gcc GCC_APSS_TCU_CLK>;
+   clock-names = "iface", "bus";
+   qcom,iommu-secure-id = <17>;
+
+   // mdp_0:
+   iommu-ctx@4000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x4000 0x1000>;
+   interrupts = ;
+   };
+
+   // venus_ns:
+   iommu-ctx@5000 {
+   compatible = "qcom,msm-iommu-v1-sec";
+   reg = <0x5000 0x1000>;
+   interrupts = ;
+   };
+   };
+
+   gpu_iommu: iommu@1f08000 {
+   #address-cells = <1>;
+   #size-cells = <1>;
+   #iommu-cells = <1>;
+   compatible = "qcom,msm8916-iommu", "qcom,msm-iommu-v1";
+   ranges = <0 0x1f08000 0x1>;
+   clocks = <&gcc GCC_SMMU_CFG_CLK>,
+<&gcc GCC_GFX_TCU_CLK>;
+   clock-names = "iface", "bus";
+   qcom,iommu-secure-id = <18>;
+
+   // gfx3d_user:
+   iommu-ctx@1000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x1000 0x1000>;
+   interrupts = ;
+   };
+
+   // gfx3d_priv:
+   iommu-ctx@2000 {
+   compatible = "qcom,msm-iommu-v1-ns";
+   reg = <0x2000 0x1000>;
+   interrupts = ;
+   };
+   };
+
+   ...
+
+   venus: video-codec@1d0 {
+   ...
+   iommus = <&apps_iommu 5>;
+   };
+
+   mdp: mdp@1a01000 {
+   ...
+   iommus = <&apps_iommu 4>;
+   };
+
+   gpu@01c0 {
+   ...
+   iommus = <&gpu_iommu 1>, <&gpu_iommu 2>;
+   };
-- 
2.13.0



[RESEND PATCH 4/4] iommu: qcom: initialize secure page table

2017-07-13 Thread Rob Clark
From: Stanimir Varbanov 

This basically gets the secure page table size, allocates memory for
secure pagetables and passes the physical address to the trusted zone.

Signed-off-by: Stanimir Varbanov 
Signed-off-by: Rob Clark 
---
 drivers/iommu/qcom_iommu.c | 64 ++
 1 file changed, 64 insertions(+)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 860cad1cb167..48b62aa52787 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -604,6 +604,51 @@ static void qcom_iommu_disable_clocks(struct 
qcom_iommu_dev *qcom_iommu)
clk_disable_unprepare(qcom_iommu->iface_clk);
 }
 
+static int qcom_iommu_sec_ptbl_init(struct device *dev)
+{
+   size_t psize = 0;
+   unsigned int spare = 0;
+   void *cpu_addr;
+   dma_addr_t paddr;
+   unsigned long attrs;
+   static bool allocated = false;
+   int ret;
+
+   if (allocated)
+   return 0;
+
+   ret = qcom_scm_iommu_secure_ptbl_size(spare, &psize);
+   if (ret) {
+   dev_err(dev, "failed to get iommu secure pgtable size (%d)\n",
+   ret);
+   return ret;
+   }
+
+   dev_info(dev, "iommu sec: pgtable size: %zu\n", psize);
+
+   attrs = DMA_ATTR_NO_KERNEL_MAPPING;
+
+   cpu_addr = dma_alloc_attrs(dev, psize, &paddr, GFP_KERNEL, attrs);
+   if (!cpu_addr) {
+   dev_err(dev, "failed to allocate %zu bytes for pgtable\n",
+   psize);
+   return -ENOMEM;
+   }
+
+   ret = qcom_scm_iommu_secure_ptbl_init(paddr, psize, spare);
+   if (ret) {
+   dev_err(dev, "failed to init iommu pgtable (%d)\n", ret);
+   goto free_mem;
+   }
+
+   allocated = true;
+   return 0;
+
+free_mem:
+   dma_free_attrs(dev, psize, cpu_addr, paddr, attrs);
+   return ret;
+}
+
 static int get_asid(const struct device_node *np)
 {
u32 reg;
@@ -700,6 +745,17 @@ static struct platform_driver qcom_iommu_ctx_driver = {
.remove = qcom_iommu_ctx_remove,
 };
 
+static bool qcom_iommu_has_secure_context(struct qcom_iommu_dev *qcom_iommu)
+{
+   struct device_node *child;
+
+   for_each_child_of_node(qcom_iommu->dev->of_node, child)
+   if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec"))
+   return true;
+
+   return false;
+}
+
 static int qcom_iommu_device_probe(struct platform_device *pdev)
 {
struct device_node *child;
@@ -744,6 +800,14 @@ static int qcom_iommu_device_probe(struct platform_device 
*pdev)
return -ENODEV;
}
 
+   if (qcom_iommu_has_secure_context(qcom_iommu)) {
+   ret = qcom_iommu_sec_ptbl_init(dev);
+   if (ret) {
+   dev_err(dev, "cannot init secure pg table(%d)\n", ret);
+   return ret;
+   }
+   }
+
platform_set_drvdata(pdev, qcom_iommu);
 
pm_runtime_enable(dev);
-- 
2.13.0



[RESEND PATCH 0/4] iommu: add qcom_iommu for early "B" family devices

2017-07-13 Thread Rob Clark
An iommu driver for Qualcomm "B" family devices which do implement the
ARM SMMU spec, but not in a way that arm-smmu can support.

(I initially added support to arm-smmu, but it was decided that approach
was too intrusive and it would be cleaner to have a separate driver.)

I should note that all the dependencies for this driver have been merged
since 4.12, and it is the last thing needed for having another fully-
enabled (gpu/display/video codec/etc) ARM device that is fully upstream.

Rob Clark (3):
  Docs: dt: document qcom iommu bindings
  iommu: arm-smmu: split out register defines
  iommu: add qcom_iommu

Stanimir Varbanov (1):
  iommu: qcom: initialize secure page table

 .../devicetree/bindings/iommu/qcom,iommu.txt   | 121 +++
 drivers/iommu/Kconfig  |  10 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/arm-smmu-regs.h  | 227 +
 drivers/iommu/arm-smmu.c   | 203 +
 drivers/iommu/qcom_iommu.c | 932 +
 6 files changed, 1292 insertions(+), 202 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/qcom,iommu.txt
 create mode 100644 drivers/iommu/arm-smmu-regs.h
 create mode 100644 drivers/iommu/qcom_iommu.c

-- 
2.13.0



[RESEND PATCH 2/4] iommu: arm-smmu: split out register defines

2017-07-13 Thread Rob Clark
I want to re-use some of these for qcom_iommu, which has (roughly) the
same context-bank registers.

Signed-off-by: Rob Clark 
---
 drivers/iommu/arm-smmu-regs.h | 227 ++
 drivers/iommu/arm-smmu.c  | 203 +
 2 files changed, 228 insertions(+), 202 deletions(-)
 create mode 100644 drivers/iommu/arm-smmu-regs.h

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
new file mode 100644
index ..87589c863068
--- /dev/null
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -0,0 +1,227 @@
+/*
+ * IOMMU API for ARM architected SMMU implementations.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon 
+ */
+
+#ifndef _ARM_SMMU_REGS_H
+#define _ARM_SMMU_REGS_H
+
+/* Configuration registers */
+#define ARM_SMMU_GR0_sCR0  0x0
+#define sCR0_CLIENTPD  (1 << 0)
+#define sCR0_GFRE  (1 << 1)
+#define sCR0_GFIE  (1 << 2)
+#define sCR0_EXIDENABLE(1 << 3)
+#define sCR0_GCFGFRE   (1 << 4)
+#define sCR0_GCFGFIE   (1 << 5)
+#define sCR0_USFCFG(1 << 10)
+#define sCR0_VMIDPNE   (1 << 11)
+#define sCR0_PTM   (1 << 12)
+#define sCR0_FB(1 << 13)
+#define sCR0_VMID16EN  (1 << 31)
+#define sCR0_BSU_SHIFT 14
+#define sCR0_BSU_MASK  0x3
+
+/* Auxiliary Configuration register */
+#define ARM_SMMU_GR0_sACR  0x10
+
+/* Identification registers */
+#define ARM_SMMU_GR0_ID0   0x20
+#define ARM_SMMU_GR0_ID1   0x24
+#define ARM_SMMU_GR0_ID2   0x28
+#define ARM_SMMU_GR0_ID3   0x2c
+#define ARM_SMMU_GR0_ID4   0x30
+#define ARM_SMMU_GR0_ID5   0x34
+#define ARM_SMMU_GR0_ID6   0x38
+#define ARM_SMMU_GR0_ID7   0x3c
+#define ARM_SMMU_GR0_sGFSR 0x48
+#define ARM_SMMU_GR0_sGFSYNR0  0x50
+#define ARM_SMMU_GR0_sGFSYNR1  0x54
+#define ARM_SMMU_GR0_sGFSYNR2  0x58
+
+#define ID0_S1TS   (1 << 30)
+#define ID0_S2TS   (1 << 29)
+#define ID0_NTS(1 << 28)
+#define ID0_SMS(1 << 27)
+#define ID0_ATOSNS (1 << 26)
+#define ID0_PTFS_NO_AARCH32(1 << 25)
+#define ID0_PTFS_NO_AARCH32S   (1 << 24)
+#define ID0_CTTW   (1 << 14)
+#define ID0_NUMIRPT_SHIFT  16
+#define ID0_NUMIRPT_MASK   0xff
+#define ID0_NUMSIDB_SHIFT  9
+#define ID0_NUMSIDB_MASK   0xf
+#define ID0_EXIDS  (1 << 8)
+#define ID0_NUMSMRG_SHIFT  0
+#define ID0_NUMSMRG_MASK   0xff
+
+#define ID1_PAGESIZE   (1 << 31)
+#define ID1_NUMPAGENDXB_SHIFT  28
+#define ID1_NUMPAGENDXB_MASK   7
+#define ID1_NUMS2CB_SHIFT  16
+#define ID1_NUMS2CB_MASK   0xff
+#define ID1_NUMCB_SHIFT0
+#define ID1_NUMCB_MASK 0xff
+
+#define ID2_OAS_SHIFT  4
+#define ID2_OAS_MASK   0xf
+#define ID2_IAS_SHIFT  0
+#define ID2_IAS_MASK   0xf
+#define ID2_UBS_SHIFT  8
+#define ID2_UBS_MASK   0xf
+#define ID2_PTFS_4K(1 << 12)
+#define ID2_PTFS_16K   (1 << 13)
+#define ID2_PTFS_64K   (1 << 14)
+#define ID2_VMID16 (1 << 15)
+
+#define ID7_MAJOR_SHIFT4
+#define ID7_MAJOR_MASK 0xf
+
+/* Global TLB invalidation */
+#define ARM_SMMU_GR0_TLBIVMID  0x64
+#define ARM_SMMU_GR0_TLBIALLNSNH   0x68
+#define ARM_SMMU_GR0_TLBIALLH  0x6c
+#define ARM_SMMU_GR0_sTLBGSYNC 0x70
+#define ARM_SMMU_GR0_sTLBGSTATUS   0x74
+#define sTLBGSTATUS_GSACTIVE   (1 << 0)
+#define TLB_LOOP_TIMEOUT   100 /* 1s! */
+#define TLB_SPIN_COUNT 10
+
+/* Stream mapping registers */
+#define ARM_SMMU_GR0_SMR(n)(0x800 + ((n) << 2))
+#define SMR_VALID   

[RESEND PATCH 3/4] iommu: add qcom_iommu

2017-07-13 Thread Rob Clark
An iommu driver for Qualcomm "B" family devices which do implement the
ARM SMMU spec, but not in a way that is compatible with how the arm-smmu
driver is designed.  It seems SMMU_SCR1.GASRAE=1 so the global register
space is not accessible.  This means it needs to get configuration from
devicetree instead of setting it up dynamically.

In the end, other than register definitions, there is not much code to
share with arm-smmu (other than what has already been refactored out
into the pgtable helpers).

Signed-off-by: Rob Clark 
Tested-by: Riku Voipio 
---
v1: original
v2: bindings cleanups and kconfig issues that kbuild robot pointed out
v3: fix issues pointed out by Rob H. and actually make device removal
work
v4: fix WARN_ON() splats reported by Archit
v5: some fixes to build as a module.. note that it cannot actually
be built as a module yet (at minimum a bunch of other iommu syms
that are needed are not exported, but there may be more to it
than that), but at least qcom_iommu is ready should it become
possible to build iommu drivers as modules.
v6: Add additional pm-runtime get/puts around paths that can hit
TLB inv, to avoid unclocked register access if device using the
iommu is not powered on.  And pre-emptively clear interrupts
before registering IRQ handler just in case the bootloader has
left us a surprise.
v7: Address review comments from Robin (don't associate iommu_group
with context bank, table lookup instead of list to find context
bank, etc)
v8: Fix silly bug on detach.  Actually Robin already pointed it out
but I somehow overlooked that comment when preparing v7.

 drivers/iommu/Kconfig  |  10 +
 drivers/iommu/Makefile |   1 +
 drivers/iommu/qcom_iommu.c | 868 +
 3 files changed, 879 insertions(+)
 create mode 100644 drivers/iommu/qcom_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 6ee3a25ae731..aa4b62893fe1 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -367,4 +367,14 @@ config MTK_IOMMU_V1
 
  if unsure, say N here.
 
+config QCOM_IOMMU
+   # Note: iommu drivers cannot (yet?) be built as modules
+   bool "Qualcomm IOMMU Support"
+   depends on ARCH_QCOM || COMPILE_TEST
+   select IOMMU_API
+   select IOMMU_IO_PGTABLE_LPAE
+   select ARM_DMA_USE_IOMMU
+   help
+ Support for IOMMU on certain Qualcomm SoCs.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 195f7b997d8e..b910aea813a1 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
+obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
new file mode 100644
index ..860cad1cb167
--- /dev/null
+++ b/drivers/iommu/qcom_iommu.c
@@ -0,0 +1,868 @@
+/*
+ * IOMMU API for QCOM secure IOMMUs.  Somewhat based on arm-smmu.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Copyright (C) 2013 ARM Limited
+ * Copyright (C) 2017 Red Hat
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "io-pgtable.h"
+#include "arm-smmu-regs.h"
+
+#define SMMU_INTR_SEL_NS 0x2000
+
+struct qcom_iommu_ctx;
+
+struct qcom_iommu_dev {
+   /* IOMMU core code handle */
+   struct iommu_device  iommu;
+   struct device   *dev;
+   struct clk  *iface_clk;
+   struct clk  *bus_clk;
+   void __iomem*local_base;
+   u32  sec_id;
+   u8   num_ctxs;
+   struct qcom_iommu_ctx   *ctxs[0];   /* indexed by asid-1 */
+};
+
+struct qcom_iommu_ctx {
+   struct device   *dev;
+   void __iomem*base;
+   bool secure_init;
+   u8   asid;  /* asid and ctx bank # are 1:1 */
+};
+
+struct qcom_iommu_domain {
+   struct io_pgtable_ops   *pgtbl_ops;
+   

Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Marek Szyprowski

Hi All,

On 2017-07-13 13:50, Rob Clark wrote:

On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  wrote:

On 7/13/2017 10:43 AM, Vivek Gautam wrote:

On 07/13/2017 04:24 AM, Stephen Boyd wrote:

On 07/06, Vivek Gautam wrote:

@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain *domain, 
unsigned long iova,
   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size)
   {
-struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+size_t ret;
 if (!ops)
   return 0;
   -return ops->unmap(ops, iova, size);
+pm_runtime_get_sync(smmu_domain->smmu->dev);

Can these map/unmap ops be called from an atomic context? I seem
to recall that being a problem before.

That's something which was dropped in the following patch merged in master:
523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock

Looks like we don't  need locks here anymore?

  Apart from the locking, wonder why a explicit pm_runtime is needed
  from unmap. Somehow looks like some path in the master using that
  should have enabled the pm ?


Yes, there are a bunch of scenarios where unmap can happen with
disabled master (but not in atomic context).  On the gpu side we
opportunistically keep a buffer mapping until the buffer is freed
(which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
an exported dmabuf while some other driver holds a reference to it
(which can be dropped when the v4l2 device is suspended).

Since unmap triggers tbl flush which touches iommu regs, the iommu
driver *definitely* needs a pm_runtime_get_sync().


Afair unmap might be called from atomic context as well, for example as
a result of dma_unmap_page(). In exynos IOMMU I simply check the runtime
PM state of IOMMU device. TLB flush is performed only when IOMMU is in 
active

state. If it is suspended, I assume that the IOMMU controller's context
is already lost and its respective power domain might be already turned off,
so there is no point in touching IOMMU registers.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Rob Clark
On Thu, Jul 13, 2017 at 5:50 AM, Robin Murphy  wrote:
> On 13/07/17 07:48, Stephen Boyd wrote:
>> On 07/13, Vivek Gautam wrote:
>>> Hi Stephen,
>>>
>>>
>>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
 On 07/06, Vivek Gautam wrote:
> @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
> *domain, unsigned long iova,
>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
> iova,
> size_t size)
>  {
> -  struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +  struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +  struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> +  size_t ret;
>if (!ops)
>return 0;
> -  return ops->unmap(ops, iova, size);
> +  pm_runtime_get_sync(smmu_domain->smmu->dev);
 Can these map/unmap ops be called from an atomic context? I seem
 to recall that being a problem before.
>>>
>>> That's something which was dropped in the following patch merged in master:
>>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>>
>>> Looks like we don't  need locks here anymore?
>>>
>>
>> While removing the spinlock around the map/unmap path may be one
>> thing, I'm not sure that's all of them. Is there a path from an
>> atomic DMA allocation (GFP_ATOMIC sort of thing) mapped into an
>> IOMMU for a device that can eventually get down to here and
>> attempt to turn a clk on?
>
> Yes, in the DMA path map/unmap will frequently be called from IRQ
> handlers (think e.g. network packets). The whole point of removing the
> lock was to allow multiple maps/unmaps to execute in parallel (since we
> know they will be safely operating on different areas of the pagetable).
> AFAICS this change is going to largely reintroduce that bottleneck via
> dev->power_lock, which is anything but what we want :(
>

Maybe __pm_runtime_resume() needs some sort of fast-path if already
enabled?  Or otherwise we need some sort of flag to tell the iommu
that it cannot rely on the unmapping device to be resumed?
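
For what it's worth, one possible shape of such a fast path is sketched below.
This is only an illustration, and it still leaves the not-yet-active case to
be solved; whether this is acceptable here is exactly the open question:

        /* Bump the usage count without a full resume if the SMMU is already
         * active; fall back to the (potentially sleeping) resume otherwise. */
        if (pm_runtime_get_if_in_use(smmu->dev) <= 0)
                pm_runtime_get_sync(smmu->dev);

        ret = ops->unmap(ops, iova, size);

        pm_runtime_put(smmu->dev);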

BR,
-R


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Rob Clark
On Thu, Jul 13, 2017 at 1:35 AM, Sricharan R  wrote:
> Hi Vivek,
>
> On 7/13/2017 10:43 AM, Vivek Gautam wrote:
>> Hi Stephen,
>>
>>
>> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
>>> On 07/06, Vivek Gautam wrote:
 @@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
 *domain, unsigned long iova,
   static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
 iova,
size_t size)
   {
 -struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
 +struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 +struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 +size_t ret;
 if (!ops)
   return 0;
   -return ops->unmap(ops, iova, size);
 +pm_runtime_get_sync(smmu_domain->smmu->dev);
>>> Can these map/unmap ops be called from an atomic context? I seem
>>> to recall that being a problem before.
>>
>> That's something which was dropped in the following patch merged in master:
>> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
>>
>> Looks like we don't  need locks here anymore?
>
>  Apart from the locking, wonder why a explicit pm_runtime is needed
>  from unmap. Somehow looks like some path in the master using that
>  should have enabled the pm ?
>

Yes, there are a bunch of scenarios where unmap can happen with
disabled master (but not in atomic context).  On the gpu side we
opportunistically keep a buffer mapping until the buffer is freed
(which can happen after gpu is disabled).  Likewise, v4l2 won't unmap
an exported dmabuf while some other driver holds a reference to it
(which can be dropped when the v4l2 device is suspended).

Since unmap triggers tbl flush which touches iommu regs, the iommu
driver *definitely* needs a pm_runtime_get_sync().
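
The hunk quoted above is cut off in the archive; presumably the complete
pattern under discussion looks roughly like this, with a matching put after
the unmap (a sketch only; the exact put variant used in the series may differ):

static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
                             size_t size)
{
        struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
        struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
        size_t ret;

        if (!ops)
                return 0;

        /* Keep the SMMU powered across the unmap and its TLB maintenance. */
        pm_runtime_get_sync(smmu_domain->smmu->dev);
        ret = ops->unmap(ops, iova, size);
        pm_runtime_put_sync(smmu_domain->smmu->dev);

        return ret;
}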

BR,
-R


Re: [RFC 0/2] arm-smmu-v3 tlbi-on-map option

2017-07-13 Thread Jean-Philippe Brucker
On 12/07/17 23:07, Michael S. Tsirkin wrote:
[...]
> I think using hardware support for nesting is the right final
> solution. It will take some time though. Given this, what should
> we do meanwhile?
> 
> Assuming that's the final destination, a simple quirk like this
> seems to be preferable to virtio-iommu which we'll never be
> able to offload to hardware.

That's not entirely true. virtio-iommu will have an extension for hardware
nesting support. It was presented in my initial proposal, and I've made
significant progress since then.

Thanks,
Jean


Re: IOMMU - DMA debugging

2017-07-13 Thread Federico Vaga
On Wednesday, July 12, 2017 8:02:42 PM CEST Robin Murphy wrote:
> On 12/07/17 18:20, Federico Vaga wrote:
> > On Wednesday, July 12, 2017 2:15:34 PM CEST Federico Vaga wrote:
> >> Thank you Robin
> >> 
> >> (inline comments)
> >> 
> >> On Wednesday, July 12, 2017 1:10:51 PM CEST Robin Murphy wrote:
> >>> On 12/07/17 08:11, Federico Vaga wrote:
>  Hello,
>  
>  kernel version 4.4.x
>  
>  I'm facing an issue with the INTEL IOMMU driver and DMA mapping. I have
>  an
>  Ethernet driver that uses `dma_alloc_coherent()` to allocate and map
>  some
>  memory for DMA transfers.
> >>> 
> >>> Assuming 02:00.0 is your actual endpoint and not some upstream aliasing
> >>> bridge, is your driver definitely using the correct struct device
> >>> pointer corresponding to that for its DMA API calls?
> > 
> > I had a look at this point. The driver is using the device 02:08.0 (which
> > is the one that should use) but the errors refers to the 02:00.0. I have
> > a rough idea about how the IOMMU works but I do not know the details
> > involved in the process.
> > 
> > \-[:00]-+-00.0
> > 
> >  +-01.0-[01-02]00.0-[02]08.0  <<
> >  +-01.1-[03]00.0
> 
> OK, this is what I suspected might be happening, thanks for confirming.
> 
> > Then among all the other devices, I have this from `dmesg`
> > 
> > [...]
> > [2.212107] DMAR: Hardware identity mapping for device :00:1f.3
> > [2.219113] DMAR: Hardware identity mapping for device :03:00.0
> > [2.226118] DMAR: Hardware identity mapping for device :04:00.0
> > [2.233123] DMAR: Hardware identity mapping for device :04:00.1
> > [...]
> > [2.693295] iommu: Adding device :00:1f.3 to group 22
> > [2.699350] iommu: Adding device :01:00.0 to group 23
> > [2.705389] iommu: Adding device :02:08.0 to group 23
> > [2.711444] iommu: Adding device :03:00.0 to group 24
> > [2.717552] iommu: Adding device :04:00.0 to group 25
> > [...]
> > 
> > 
> > It misses the message "Hardware identity mapping for device :02:08.0".
> > Is it possible that there is not a valid DMAR table?
> 
> I'm a bit sketchy on intel-iommu details as well, but based on a quick
> scan through the code I'd assume your endpoint doesn't get an identity
> mapping because it's a PCI device behind a PCIe-to-PCI bridge (which
> ties in with the RID alias to DevFn 00.0). AFAICS that then means that
> the DMA ops should always give back a remapped address (i.e. iommu=pt
> ends up behaving the same as iommu=on), at which point it does start to
> look like your device is simply making bogus accesses.
> 
> The IOVA allocator will allocate DMA addresses downwards from
> 0xf000, but your reported fault addresses don't look anything like
> that, so I'd imagine that either some part of the driver is bypassing
> the DMA API and erroneously passing physical addresses to the hardware,

I don't think that this is the case. The addresses returned by 
dma_alloc_coherent are consistent with the IOVA logic:

[...]
[67594.620282] DMA addr 0xd000
[67594.620294] DMA addr 0xc000
[67594.620305] DMA addr 0xb000
[67594.620314] DMA addr 0xa000
[...]

and this is the address that is going to be programmed in the hardware 
register.
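
To illustrate the distinction Robin is drawing between DMA addresses and CPU
physical addresses, a fragment like the following is the correct pattern (the
register names and SZ_4K size here are invented for the example):

        void *cpu_addr;
        dma_addr_t dma_handle;

        cpu_addr = dma_alloc_coherent(&pdev->dev, SZ_4K, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;

        /* Correct: program the device with the IOVA returned by the DMA API. */
        writel(lower_32_bits(dma_handle), base + MY_DESC_BASE_LO);
        writel(upper_32_bits(dma_handle), base + MY_DESC_BASE_HI);

        /* Wrong once an IOMMU is in the way: a CPU physical address is not a
         * DMA address, and the device will fault on it.
         *
         *      writel(virt_to_phys(cpu_addr), base + MY_DESC_BASE_LO);
         */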


> or alternatively the hardware itself is going wrong somehow (e.g. trying
> to read a buffer address from an in-memory descriptor, getting back
> junk, and going downhill from there).

It is probably worth having a look there.
 
> Hope that helps,

Thank you

-- 
Federico Vaga
http://www.federicovaga.it/


Re: [PATCH V4 3/6] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2017-07-13 Thread Stephen Boyd
On 07/13, Vivek Gautam wrote:
> Hi Stephen,
> 
> 
> On 07/13/2017 04:24 AM, Stephen Boyd wrote:
> >On 07/06, Vivek Gautam wrote:
> >>@@ -1231,12 +1237,18 @@ static int arm_smmu_map(struct iommu_domain 
> >>*domain, unsigned long iova,
> >>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long 
> >> iova,
> >> size_t size)
> >>  {
> >>-   struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> >>+   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >>+   struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
> >>+   size_t ret;
> >>if (!ops)
> >>return 0;
> >>-   return ops->unmap(ops, iova, size);
> >>+   pm_runtime_get_sync(smmu_domain->smmu->dev);
> >Can these map/unmap ops be called from an atomic context? I seem
> >to recall that being a problem before.
> 
> That's something which was dropped in the following patch merged in master:
> 523d7423e21b iommu/arm-smmu: Remove io-pgtable spinlock
> 
> Looks like we don't  need locks here anymore?
> 

While removing the spinlock around the map/unmap path may be one
thing, I'm not sure that's all of them. Is there a path from an
atomic DMA allocation (GFP_ATOMIC sort of thing) mapped into an
IOMMU for a device that can eventually get down to here and
attempt to turn a clk on?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project