RE: [PATCH 00/10] iommu/vt-d: Add scalable mode support

2018-07-16 Thread Liu, Yi L
Hi Jean,

> From: Jean-Philippe Brucker
> Sent: Monday, July 16, 2018 6:52 PM
> On 16/07/18 07:49, Lu Baolu wrote:
> > Intel vt-d rev3.0 [1] introduces a new translation mode called
> > 'scalable mode', which enables PASID-granular translations for first
> > level, second level, nested and pass-through modes. The vt-d scalable
> > mode is the key ingredient to enable Scalable I/O Virtualization
> > (Scalable IOV) [2] [3], which allows sharing a device in minimal
> > possible granularity (ADI - Assignable Device Interface). It also
> > includes all the capabilities required to enable Shared Virtual
> > Addressing (SVA). As a result, previous Extended Context (ECS) mode is
> > deprecated (no production ever implements ECS).
> >
> > Each scalable mode pasid table entry is 64 bytes in length, with
> > fields point to the first level page table and the second level page
> > table. The PGTT (Pasid Granular Translation Type) field is used by
> > hardware to determine the translation type.
> 
> Looks promising! Since the 2nd level page tables are in the PASID entry, the
> hypervisor traps guest accesses to the PASID tables instead of passing
> through the whole PASID directory? Are you still planning to use the VFIO
> BIND_PASID_TABLE interface in this mode, or a slightly different one for
> individual PASIDs?

You are right. For Intel VT-d, we don't need to give access to the whole
guest PASID table in Scalable Mode. However, VFIO BIND_PASID_TABLE may still
be needed for other vendors, so it may still be in the proposed list. This
would be covered in the new vSVA patchset from Jacob and me.

Thanks,
Yi Liu
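
For reference, the 64-byte scalable-mode PASID table entry discussed above can
be viewed as eight 64-bit words, with the PGTT field in bits 8:6 of the first
word (matching the pasid_set_translation_type() helper added in patch 04 of
this series). A minimal sketch with illustrative names; the PGTT encodings for
second-level (2) and pass-through (4) are visible in patch 04, while 1 and 3
are taken from the VT-d 3.0 spec:

/* Sketch only: a 64-byte scalable-mode PASID table entry viewed as
 * eight 64-bit words.  Field positions follow the helpers in patch 04
 * (PGTT lives in bits 8:6 of val[0]); the names are illustrative.
 */
struct sm_pasid_entry {
	u64 val[8];
};

/* PGTT (PASID Granular Translation Type) encodings. */
enum sm_pgtt {
	SM_PGTT_FIRST_LEVEL  = 1,	/* first-level translation only */
	SM_PGTT_SECOND_LEVEL = 2,	/* second-level translation only */
	SM_PGTT_NESTED       = 3,	/* nested first/second level */
	SM_PGTT_PASS_THROUGH = 4,	/* pass-through */
};

static inline enum sm_pgtt sm_pasid_get_pgtt(const struct sm_pasid_entry *pe)
{
	return (pe->val[0] >> 6) & 0x7;		/* bits 8:6 */
}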


Re: [PATCH] x86/pci: Some buggy virtual functions incorrectly report 1 for intx.

2018-07-16 Thread Raj, Ashok
Hi Alex

On Mon, Jul 16, 2018 at 03:17:57PM -0600, Alex Williamson wrote:
> >  static bool vfio_pci_nointx(struct pci_dev *pdev)
> >  {
> > +   /*
> > +* Per PCI, no VF's should have INTx
> > +* Simply disable it here
> > +*/
> > +   if (pdev->is_virtfn)
> > +   return true;
> > +
> > switch (pdev->vendor) {
> > case PCI_VENDOR_ID_INTEL:
> > switch (pdev->device) {
> 
> Nak, this is not what vfio_pci_nointx() is meant for, it's been tried
> before and it was broken:

That seems to make sense. I'll spin a new patch incorporating the suggestions
below and have it tested before reposting.

> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/vfio/pci/vfio_pci.c?id=834814e80268c818f354c8f402e0c6604ed75589
> 
> This would cause *every* VF to print:
> 
>   dev_info(&pdev->dev, "Masking broken INTx support\n");
> 
> If the device is incapable of generating INTx then it's probably better
> to drop it at vfio_pci_get_irq_count().  Please also note the subject is
> misidentified, this is a "vfio/pci" patch, not an "x86/pci" patch.
> kvm@vger is the official list for vfio patches.  Thanks,
> 
> Alex


Re: [PATCH] x86/pci: Some buggy virtual functions incorrectly report 1 for intx.

2018-07-16 Thread Alex Williamson
On Mon, 16 Jul 2018 13:42:25 -0700
Ashok Raj  wrote:

> PCI_INTERRUPT_PIN should always read  0 for SRIOV Virtual Functions.
> 
> Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> instead of 0 for the PIN.
> 
> We could enforce it by default in vfio_pci_nointx.
> 
> Reported-by: Gage Eads 
> Tested-by: Gage Eads 
> Signed-off-by: Ashok Raj 
> Cc: linux-ker...@vger.kernel.org
> Cc: sta...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org
> Cc: Joerg Roedel 
> Cc: Bjorn Helgaas 
> Cc: Gage Eads 
> ---
>  drivers/vfio/pci/vfio_pci.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b423a30..bc3f4fa 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -192,6 +192,13 @@ static void vfio_pci_disable(struct vfio_pci_device 
> *vdev);
>   */
>  static bool vfio_pci_nointx(struct pci_dev *pdev)
>  {
> + /*
> +  * Per PCI, no VF's should have INTx
> +  * Simply disable it here
> +  */
> + if (pdev->is_virtfn)
> + return true;
> +
>   switch (pdev->vendor) {
>   case PCI_VENDOR_ID_INTEL:
>   switch (pdev->device) {

Nak, this is not what vfio_pci_nointx() is meant for, it's been tried
before and it was broken:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/vfio/pci/vfio_pci.c?id=834814e80268c818f354c8f402e0c6604ed75589

This would cause *every* VF to print:

	dev_info(&pdev->dev, "Masking broken INTx support\n");

If the device is incapable of generating INTx then it's probably better
to drop it at vfio_pci_get_irq_count().  Please also note the subject is
misidentified, this is a "vfio/pci" patch, not an "x86/pci" patch.
kvm@vger is the official list for vfio patches.  Thanks,

Alex
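
For illustration only, the direction suggested above could look roughly like
the sketch below: report zero INTx interrupts for VFs when counting IRQs
instead of routing them through the nointx quirk. The helper name is made up
for the sketch; the real logic would live in vfio_pci_get_irq_count(), and
this is not the change that was eventually merged.

/*
 * Sketch: when counting INTx IRQs, treat SR-IOV VFs as having none,
 * regardless of what a buggy VF reports in PCI_INTERRUPT_PIN.
 * Assumes the vfio_pci_device layout from drivers/vfio/pci.
 */
static int vfio_pci_intx_count(struct vfio_pci_device *vdev)
{
	u8 pin;

	/* Per the SR-IOV spec, VFs must not implement INTx. */
	if (vdev->pdev->is_virtfn)
		return 0;

	pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);

	return (pin && !vdev->nointx) ? 1 : 0;
}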


[PATCH] x86/pci: Some buggy virtual functions incorrectly report 1 for intx.

2018-07-16 Thread Ashok Raj
PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.

Some SRIOV devices have bugs in their RTL and the VFs end up reading 1
instead of 0 for the PIN.

We could enforce it by default in vfio_pci_nointx.

Reported-by: Gage Eads 
Tested-by: Gage Eads 
Signed-off-by: Ashok Raj 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Joerg Roedel 
Cc: Bjorn Helgaas 
Cc: Gage Eads 
---
 drivers/vfio/pci/vfio_pci.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b423a30..bc3f4fa 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -192,6 +192,13 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev);
  */
 static bool vfio_pci_nointx(struct pci_dev *pdev)
 {
+   /*
+* Per PCI, no VF's should have INTx
+* Simply disable it here
+*/
+   if (pdev->is_virtfn)
+   return true;
+
switch (pdev->vendor) {
case PCI_VENDOR_ID_INTEL:
switch (pdev->device) {
-- 
2.7.4



Re: [PATCH 3/3] arm64: dts: stratix10: Add SMMU Node

2018-07-16 Thread Thor Thayer

Hi Robin,

On 07/13/2018 01:09 PM, Robin Murphy wrote:

On 13/07/18 17:27, thor.tha...@linux.intel.com wrote:

From: Thor Thayer 

Add the SMMU node and IOMMU parameters to the
Stratix10 Device Tree.

Signed-off-by: Thor Thayer 
---
  arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi | 44 
+++

  1 file changed, 44 insertions(+)

diff --git a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi 
b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi

index ca67ecb5866e..9b6ead87ae70 100644
--- a/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
+++ b/arch/arm64/boot/dts/altera/socfpga_stratix10.dtsi
@@ -162,6 +162,8 @@
  reset-names = "stmmaceth", "stmmaceth-ocp";
  clocks = <&clkmgr STRATIX10_EMAC0_CLK>;
  clock-names = "stmmaceth";
+    #stream-id-cells = <1>;


The #stream-id-cells property is part of the deprecated mmu-masters 
binding, so you don't need to add these.



OK. Thank you.


+    iommus = <&smmu 1>;
  status = "disabled";
  };





  status = "disabled";
  };
@@ -323,6 +331,8 @@
  #dma-requests = <32>;
  clocks = <&clkmgr STRATIX10_L4_MAIN_CLK>;
  clock-names = "apb_pclk";
+    #stream-id-cells = <1>;
+    iommus = <&smmu 8>;


Just to double-check, all the channel threads and the manager thread 
share the one stream ID? (I'm accustomed to seeing DMA-330 integrated 
with an SMMU by tapping the AxID outputs off to the stream ID input.)


Yes, we have only one stream ID for the DMA. I'll forward the 
differences you noted to our architecture team as something to consider 
for future chips.



  };
  rst: rstmgr@ffd11000 {
@@ -332,6 +342,36 @@
  altr,modrst-offset = <0x20>;
  };
+    smmu: iommu@fa00 {
+    compatible = "arm,mmu-500", "arm,smmu-v2";
+    reg = <0xfa00 0x4>;
+    #global-interrupts = <9>;
+    #iommu-cells = <1>;
+    clocks = <&clkmgr STRATIX10_L4_MAIN_CLK>;
+    clock-names = "smmu_clk";
+    interrupt-parent = <&intc>;
+    interrupts = <0 128 4>,    /* Global Secure Fault */
+    <0 129 4>, /* Global Non-secure Fault */
+    <0 130 4>, /* FPGA Performance Counter */
+    <0 131 4>, /* DMA Performance Counter */
+    <0 132 4>, /* EMAC Performance Counter */
+    <0 133 4>, /* IO Performance Counter */
+    <0 134 4>, /* SDM Performance Counter */


Note that there isn't much benefit to adding the secure or PMU 
interrupts here other than to document the hardware - FWIW I have 
actually been working on a PMU driver, and needless to say it turns out 
not to be sufficient just having those munged into the SMMU global fault 
handler.


Thanks for pointing this out. I was following other SMMU-500 device 
trees. Just to clarify, how should I simplify this? Should I replace all 
the above with the following?


<0 129 4>, /* Global Non-secure Fault */

Or will your upcoming PMU driver need the PMU interrupts? It sounded like
using just the Global fault was not sufficient.



+    <0 136 4>, /* Non-secure Combined Interrupt */
+    <0 137 4>, /* Secure Combined Interrupt */


Similarly the combined interrupt; that's literally just all these other 
interrupt lines ORed together at the SMMU end, and would generally only 
be useful if you *didn't* have the individual lines wired up. As it 
stands with everything listed, any event will also generate a spurious 
global fault IRQ, which isn't ideal (not that you should get many 
interrupts during normal operation, but still...)



And I'd remove both of these then, right?

Thanks for the review and helpful comments (and also pointing out the 
existing clock patch in my patch series summary)!


Thor


Robin.


+    /* Non-secure Context Interrupts (32) */
+    <0 138 4>, <0 139 4>, <0 140 4>, <0 141 4>,
+    <0 142 4>, <0 143 4>, <0 144 4>, <0 145 4>,
+    <0 146 4>, <0 147 4>, <0 148 4>, <0 149 4>,
+    <0 150 4>, <0 151 4>, <0 152 4>, <0 153 4>,
+    <0 154 4>, <0 155 4>, <0 156 4>, <0 157 4>,
+    <0 158 4>, <0 159 4>, <0 160 4>, <0 161 4>,
+    <0 162 4>, <0 163 4>, <0 164 4>, <0 165 4>,
+    <0 166 4>, <0 167 4>, <0 168 4>, <0 169 4>;
+    stream-match-mask = <0x7ff0>;
+    status = "disabled";
+    };
+




Re: [PATCH v4] dt-bindings: mediatek: Add binding for mt2712 IOMMU and SMI

2018-07-16 Thread Matthias Brugger



On 21/06/18 08:27, Yong Wu wrote:
> Hi Matthias,
> 
>   A gentle ping on this.
> 
> On Thu, 2018-05-24 at 20:35 +0800, Yong Wu wrote:
>> This patch adds descriptions for mt2712 IOMMU and SMI.
>>
>> In order to balance the bandwidth, mt2712 has two M4Us, two
>> smi-commons and 10 smi-larbs. mt2712 is also MTK IOMMU gen2, which
>> uses the ARM Short-Descriptor translation table format.
>>
>> The mt2712 M4U-SMI HW diagram is as below:
>>
>>                         EMI
>>                          |
>>              --------------------------
>>              |                        |
>>             M4U0                     M4U1
>>              |                        |
>>         smi-common0              smi-common1
>>              |                        |
>>   -------------------------    -----------------------------------
>>   |     |     |     |     |    |          |          |     |     |
>> larb0 larb1 larb2 larb3 larb6  larb4      larb5      larb7 larb8 larb9
>> disp0 vdec  cam   venc  jpg    mdp1/disp1 mdp2/disp2 mdp3  vdo/nr tvd
>>
>> All the connections are HW fixed, SW can NOT adjust it.
>>
>> Signed-off-by: Yong Wu 
>> Acked-by: Rob Herring 
>> ---
>> change notes:
>> v4: change the license of the new file in this patch to SPDX.
>>
>> v3: http://lists.infradead.org/pipermail/linux-mediatek/2018-May/013279.html
>> Add a new ECO port(DISP_RDMA2) in larb0/port7.
>>
>> v2:
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023848.html
>>
>> v1:
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023665.html
>> ---
>>  .../devicetree/bindings/iommu/mediatek,iommu.txt   |  6 +-
>>  .../memory-controllers/mediatek,smi-common.txt |  6 +-
>>  .../memory-controllers/mediatek,smi-larb.txt   |  5 +-
>>  include/dt-bindings/memory/mt2712-larb-port.h  | 95 
>> ++
>>  4 files changed, 106 insertions(+), 6 deletions(-)
>>  create mode 100644 include/dt-bindings/memory/mt2712-larb-port.h
>>

Joerg, will you take this through your tree, or would you mind providing an
Acked-by?

Thanks,
Matthias

>> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
>> b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
>> index 53c20ca..df5db73 100644
>> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
>> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
>> @@ -40,6 +40,7 @@ video decode local arbiter, all these ports are according 
>> to the video HW.
>>  Required properties:
>>  - compatible : must be one of the following string:
>>  "mediatek,mt2701-m4u" for mt2701 which uses generation one m4u HW.
>> +"mediatek,mt2712-m4u" for mt2712 which uses generation two m4u HW.
>>  "mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
>>  - reg : m4u register base and size.
>>  - interrupts : the interrupt of m4u.
>> @@ -50,8 +51,9 @@ Required properties:
>>  according to the local arbiter index, like larb0, larb1, larb2...
>>  - iommu-cells : must be 1. This is the mtk_m4u_id according to the HW.
>>  Specifies the mtk_m4u_id as defined in
>> -dt-binding/memory/mt2701-larb-port.h for mt2701 and
>> -dt-binding/memory/mt8173-larb-port.h for mt8173
>> +dt-binding/memory/mt2701-larb-port.h for mt2701,
>> +dt-binding/memory/mt2712-larb-port.h for mt2712, and
>> +dt-binding/memory/mt8173-larb-port.h for mt8173.
>>  
>>  Example:
>>  iommu: iommu@10205000 {
>> diff --git 
>> a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
>>  
>> b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
>> index aa614b2..615abdd 100644
>> --- 
>> a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
>> +++ 
>> b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.txt
>> @@ -2,8 +2,9 @@ SMI (Smart Multimedia Interface) Common
>>  
>>  The hardware block diagram please check bindings/iommu/mediatek,iommu.txt
>>  
>> -Mediatek SMI have two generations of HW architecture, mt8173 uses the second
>> -generation of SMI HW while mt2701 uses the first generation HW of SMI.
>> +Mediatek SMI have two generations of HW architecture, mt2712 and mt8173 use
>> +the second generation of SMI HW while mt2701 uses the first generation HW of
>> +SMI.
>>  
>>  There's slight differences between the two SMI, for generation 2, the
>>  register which control the iommu port is at each larb's register base. But
>> @@ -15,6 +16,7 @@ not needed for SMI generation 2.
>>  Required properties:
>>  - compatible : must be one of :
>>  "mediatek,mt2701-smi-common"
>> +"mediatek,mt2712-smi-common"
>>  "mediatek,mt8173-smi-common"
>>  - reg : the register and size of the SMI block.
>>  - power-domains : a phandle to the power domain of this local arbiter.
>> diff --git 
>> 

Re: [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-16 Thread Vivek Gautam




On 7/16/2018 2:25 PM, Rafael J. Wysocki wrote:

On Thu, Jul 12, 2018 at 2:41 PM, Vivek Gautam
 wrote:

Hi Rafael,


On Wed, Jul 11, 2018 at 4:06 PM, Vivek Gautam
 wrote:

Hi Rafael,



On 7/11/2018 3:23 PM, Rafael J. Wysocki wrote:

On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote:

From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
Cc: Rafael J. Wysocki 
Cc: Lukas Wunner 
---

   - Change since v11
 * Replaced DL_FLAG_AUTOREMOVE flag with DL_FLAG_AUTOREMOVE_SUPPLIER.

   drivers/iommu/arm-smmu.c | 12 
   1 file changed, 12 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 09265e206e2d..916cde4954d2 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device *dev)
 iommu_device_link(&smmu->iommu, dev);
   + if (pm_runtime_enabled(smmu->dev) &&

Why does the creation of the link depend on whether or not runtime PM
is enabled for the MMU device?


The main purpose of this device link is to handle the runtime PM
synchronization between the supplier (iommu) and consumer (client devices,
such as GPU/display). Moreover, runtime PM is conditionally enabled only for
smmu devices that support it [1].

Is there something you would like me to modify in this patch?

Not really, as long as you are sure that it is correct. :-)

You need to remember, however, that if you add system-wide PM
callbacks to the driver, the ordering between them and the client
device callbacks during system-wide suspend matters as well.  Don't
you need the link to ensure the correct system-wide suspend ordering
too?


Given that we currently handle clocks only through runtime PM callbacks,
would it be better to call runtime PM get/put in the system-wide PM callbacks?
This would be the same as I mentioned in the other thread.

Best regards
Vivek
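
For context, the link under discussion is created with the runtime-PM flag so
that resuming a master also resumes the SMMU, and with
DL_FLAG_AUTOREMOVE_SUPPLIER as noted in the changelog. A minimal sketch of
such a call; the helper name and error handling are illustrative, not the
exact hunk from the patch:

/*
 * Sketch: create a runtime-PM device link from a master (consumer) to
 * the SMMU (supplier), only when runtime PM is enabled for the SMMU.
 */
static int arm_smmu_link_master(struct device *master,
				struct arm_smmu_device *smmu)
{
	struct device_link *link;

	if (!pm_runtime_enabled(smmu->dev))
		return 0;

	link = device_link_add(master, smmu->dev,
			       DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
	if (!link) {
		dev_warn(master, "Unable to create device link to SMMU\n");
		return -ENODEV;
	}

	return 0;
}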






Re: [PATCH 00/10] iommu/vt-d: Add scalable mode support

2018-07-16 Thread Jean-Philippe Brucker
Hi,

On 16/07/18 07:49, Lu Baolu wrote:
> Intel vt-d rev3.0 [1] introduces a new translation mode called
> 'scalable mode', which enables PASID-granular translations for
> first level, second level, nested and pass-through modes. The
> vt-d scalable mode is the key ingredient to enable Scalable I/O
> Virtualization (Scalable IOV) [2] [3], which allows sharing a
> device in minimal possible granularity (ADI - Assignable Device
> Interface). It also includes all the capabilities required to
> enable Shared Virtual Addressing (SVA). As a result, previous
> Extended Context (ECS) mode is deprecated (no production ever
> implements ECS).
> 
> Each scalable mode pasid table entry is 64 bytes in length, with
> fields point to the first level page table and the second level
> page table. The PGTT (Pasid Granular Translation Type) field is
> used by hardware to determine the translation type.

Looks promising! Since the 2nd level page tables are in the PASID entry,
the hypervisor traps guest accesses to the PASID tables instead of
passing through the whole PASID directory? Are you still planning to use
the VFIO BIND_PASID_TABLE interface in this mode, or a slightly
different one for individual PASIDs?

Thanks,
Jean


Re: [PATCH v12 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-07-16 Thread Rafael J. Wysocki
Hi,

On Mon, Jul 16, 2018 at 12:11 PM, Vivek Gautam
 wrote:
> HI Rafael,
>
>
>
> On 7/16/2018 2:21 PM, Rafael J. Wysocki wrote:
>>
>> On Thu, Jul 12, 2018 at 12:57 PM, Vivek Gautam
>>  wrote:

[cut]

 Although, given the PM
 subsystem internals, the suspend function wouldn't be called on SMMU
 implementations that need power control (since they would have runtime PM
 enabled), and on others it would be called but do nothing (since there are
 no clocks).

> Honestly, I just don't know. :-)
>
> It just looks odd the way it is done.  I think the clock should be
> gated during system-wide suspend too, because the system can spend
> much more time in a sleep state than in the working state, on average.
>
> And note that you cannot rely on runtime PM to always do it for you,
> because it may be disabled at a client device or even blocked by user
> space via power/control in sysfs and that shouldn't matter for
> system-wide PM.

 User space blocking runtime PM through sysfs is a good point. I'm not
 100% sure how the PM subsystem deals with that in case of system-wide
 suspend. I guess for consistency and safety, we should have the
 suspend callback.
>>>
>>> Will add the following suspend callback (same as
>>> arm_smmu_runtime_suspend):
>>>
>>>   static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
>>>   {
>>>   struct arm_smmu_device *smmu = dev_get_drvdata(dev);
>>>
>>>   clk_bulk_disable(smmu->num_clks, smmu->clks);
>>>
>>>   return 0;
>>>   }
>>
>> I think you also need to check if the clock has already been disabled
>> by runtime PM.  Otherwise you may end up disabling it twice in a row.
>
>
> Should I rather call pm_runtime_put() in the suspend callback?

That wouldn't work as runtime PM may be effectively disabled by user
space via sysfs.  That's one of the reasons why you need the extra
system-wide suspend callback in the first place. :-)

> Or an expanded form something similar to:
> https://elixir.bootlin.com/linux/v4.18-rc5/source/drivers/slimbus/qcom-ctrl.c#L695

Yes, you can do something like that, but be careful to make sure that
the state of the device after system-wide resume is consistent with
its runtime PM status in all cases.
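
Putting the two points above together, one possible shape for the sleep
callbacks is to skip the clock handling when runtime PM has already suspended
the device, e.g. the sketch below, which reuses the runtime callbacks from the
patch; whether pm_runtime_suspended() is the right check in every
configuration would still need to be confirmed:

static int __maybe_unused arm_smmu_pm_suspend(struct device *dev)
{
	/*
	 * If runtime PM already suspended the SMMU, its clocks are
	 * already disabled; don't disable them a second time.
	 */
	if (pm_runtime_suspended(dev))
		return 0;

	return arm_smmu_runtime_suspend(dev);
}

static int __maybe_unused arm_smmu_pm_resume(struct device *dev)
{
	/* Keep the hardware state consistent with the runtime PM status. */
	if (pm_runtime_suspended(dev))
		return 0;

	return arm_smmu_runtime_resume(dev);
}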


Re: [PATCH v12 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-07-16 Thread Vivek Gautam

HI Rafael,


On 7/16/2018 2:21 PM, Rafael J. Wysocki wrote:

On Thu, Jul 12, 2018 at 12:57 PM, Vivek Gautam
 wrote:

Hi,


On Wed, Jul 11, 2018 at 6:21 PM, Tomasz Figa  wrote:

On Wed, Jul 11, 2018 at 8:11 PM Rafael J. Wysocki  wrote:

On Wed, Jul 11, 2018 at 12:55 PM, Vivek Gautam
 wrote:

Hi Rafael,


On Wed, Jul 11, 2018 at 3:20 PM, Rafael J. Wysocki  wrote:

On Sunday, July 8, 2018 7:34:10 PM CEST Vivek Gautam wrote:

From: Sricharan R 

The smmu needs to be functional only when the respective
masters using it are active. The device_link feature
helps to track such functional dependencies, so that the
iommu gets powered when the master device enables itself
using pm_runtime. So by adapting the smmu driver for
runtime pm, above said dependency can be addressed.

This patch adds the pm runtime/sleep callbacks to the
driver and also the functions to parse the smmu clocks
from DT and enable them in resume/suspend.

Signed-off-by: Sricharan R 
Signed-off-by: Archit Taneja 
[vivek: Clock rework to request bulk of clocks]
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
---

  - No change since v11.

  drivers/iommu/arm-smmu.c | 60 ++--
  1 file changed, 58 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index f7a96bcf94a6..a01d0dde21dd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -48,6 +48,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 

@@ -205,6 +206,8 @@ struct arm_smmu_device {
   u32 num_global_irqs;
   u32 num_context_irqs;
   unsigned int*irqs;
+ struct clk_bulk_data*clks;
+ int num_clks;

   u32 cavium_id_base; /* Specific to Cavium */

@@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct 
arm_smmu_device *smmu)
  struct arm_smmu_match_data {
   enum arm_smmu_arch_version version;
   enum arm_smmu_implementation model;
+ const char * const *clks;
+ int num_clks;
  };

  #define ARM_SMMU_MATCH_DATA(name, ver, imp)  \
-static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+static const struct arm_smmu_match_data name = { .version = ver, .model = imp }

  ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
  ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
@@ -1919,6 +1924,23 @@ static const struct of_device_id arm_smmu_of_match[] = {
  };
  MODULE_DEVICE_TABLE(of, arm_smmu_of_match);

+static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu,
+const char * const *clks)
+{
+ int i;
+
+ if (smmu->num_clks < 1)
+ return;
+
+ smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks,
+   sizeof(*smmu->clks), GFP_KERNEL);
+ if (!smmu->clks)
+ return;
+
+ for (i = 0; i < smmu->num_clks; i++)
+ smmu->clks[i].id = clks[i];
+}
+
  #ifdef CONFIG_ACPI
  static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
  {
@@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct 
platform_device *pdev,
   data = of_device_get_match_data(dev);
   smmu->version = data->version;
   smmu->model = data->model;
+ smmu->num_clks = data->num_clks;
+
+ arm_smmu_fill_clk_data(smmu, data->clks);

   parse_driver_options(smmu);

@@ -2099,6 +2124,14 @@ static int arm_smmu_device_probe(struct platform_device 
*pdev)
   smmu->irqs[i] = irq;
   }

+ err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
+ if (err)
+ return err;
+
+ err = clk_bulk_prepare(smmu->num_clks, smmu->clks);
+ if (err)
+ return err;
+
   err = arm_smmu_device_cfg_probe(smmu);
   if (err)
   return err;
@@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct platform_device 
*pdev)

   /* Turn the thing off */
   writel(sCR0_CLIENTPD, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
+
+ clk_bulk_unprepare(smmu->num_clks, smmu->clks);
+
   return 0;
  }

@@ -2197,7 +2233,27 @@ static int __maybe_unused arm_smmu_pm_resume(struct 
device *dev)
   return 0;
  }

-static SIMPLE_DEV_PM_OPS(arm_smmu_pm_ops, NULL, arm_smmu_pm_resume);
+static int __maybe_unused arm_smmu_runtime_resume(struct device *dev)
+{
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+ return clk_bulk_enable(smmu->num_clks, smmu->clks);
+}
+
+static int __maybe_unused arm_smmu_runtime_suspend(struct device *dev)
+{
+ struct arm_smmu_device *smmu = dev_get_drvdata(dev);
+
+ clk_bulk_disable(smmu->num_clks, smmu->clks);
+
+ return 0;
+}
+
+static const struct dev_pm_ops arm_smmu_pm_ops = {
+ SET_SYSTEM_SLEEP_PM_OPS(NULL, arm_smmu_pm_resume)

This is suspicious.

If you need a runtime suspend method, why do you 

Re: [PATCH v12 3/4] iommu/arm-smmu: Add the device_link between masters and smmu

2018-07-16 Thread Rafael J. Wysocki
On Thu, Jul 12, 2018 at 2:41 PM, Vivek Gautam
 wrote:
> Hi Rafael,
>
>
> On Wed, Jul 11, 2018 at 4:06 PM, Vivek Gautam
>  wrote:
>> Hi Rafael,
>>
>>
>>
>> On 7/11/2018 3:23 PM, Rafael J. Wysocki wrote:
>>>
>>> On Sunday, July 8, 2018 7:34:12 PM CEST Vivek Gautam wrote:

 From: Sricharan R 

 Finally add the device link between the master device and
 smmu, so that the smmu gets runtime enabled/disabled only when the
 master needs it. This is done from add_device callback which gets
 called once when the master is added to the smmu.

 Signed-off-by: Sricharan R 
 Signed-off-by: Vivek Gautam 
 Reviewed-by: Tomasz Figa 
 Cc: Rafael J. Wysocki 
 Cc: Lukas Wunner 
 ---

   - Change since v11
 * Replaced DL_FLAG_AUTOREMOVE flag with DL_FLAG_AUTOREMOVE_SUPPLIER.

   drivers/iommu/arm-smmu.c | 12 
   1 file changed, 12 insertions(+)

 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index 09265e206e2d..916cde4954d2 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -1461,8 +1461,20 @@ static int arm_smmu_add_device(struct device *dev)
 iommu_device_link(&smmu->iommu, dev);
   + if (pm_runtime_enabled(smmu->dev) &&
>>>
>>> Why does the creation of the link depend on whether or not runtime PM
>>> is enabled for the MMU device?
>>
>>
>> The main purpose of this device link is to handle the runtime PM
>> synchronization
>> between the supplier (iommu) and consumer (client devices, such as
>> GPU/display).
>> Moreover, the runtime pm is conditionally enabled for smmu devices that
>> support
>> such [1].
>
> Is there something you would like me to modify in this patch?

Not really, as long as you are sure that it is correct. :-)

You need to remember, however, that if you add system-wide PM
callbacks to the driver, the ordering between them and the client
device callbacks during system-wide suspend matters as well.  Don't
you need the link to ensure the correct system-wide suspend ordering
too?


Re: [PATCH v12 1/4] iommu/arm-smmu: Add pm_runtime/sleep ops

2018-07-16 Thread Rafael J. Wysocki
On Thu, Jul 12, 2018 at 12:57 PM, Vivek Gautam
 wrote:
> Hi,
>
>
> On Wed, Jul 11, 2018 at 6:21 PM, Tomasz Figa  wrote:
>> On Wed, Jul 11, 2018 at 8:11 PM Rafael J. Wysocki  wrote:
>>>
>>> On Wed, Jul 11, 2018 at 12:55 PM, Vivek Gautam
>>>  wrote:
>>> > Hi Rafael,
>>> >
>>> >
>>> > On Wed, Jul 11, 2018 at 3:20 PM, Rafael J. Wysocki  
>>> > wrote:
>>> >> On Sunday, July 8, 2018 7:34:10 PM CEST Vivek Gautam wrote:
>>> >>> From: Sricharan R 
>>> >>>
>>> >>> The smmu needs to be functional only when the respective
>>> >>> masters using it are active. The device_link feature
>>> >>> helps to track such functional dependencies, so that the
>>> >>> iommu gets powered when the master device enables itself
>>> >>> using pm_runtime. So by adapting the smmu driver for
>>> >>> runtime pm, above said dependency can be addressed.
>>> >>>
>>> >>> This patch adds the pm runtime/sleep callbacks to the
>>> >>> driver and also the functions to parse the smmu clocks
>>> >>> from DT and enable them in resume/suspend.
>>> >>>
>>> >>> Signed-off-by: Sricharan R 
>>> >>> Signed-off-by: Archit Taneja 
>>> >>> [vivek: Clock rework to request bulk of clocks]
>>> >>> Signed-off-by: Vivek Gautam 
>>> >>> Reviewed-by: Tomasz Figa 
>>> >>> ---
>>> >>>
>>> >>>  - No change since v11.
>>> >>>
>>> >>>  drivers/iommu/arm-smmu.c | 60 
>>> >>> ++--
>>> >>>  1 file changed, 58 insertions(+), 2 deletions(-)
>>> >>>
>>> >>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> >>> index f7a96bcf94a6..a01d0dde21dd 100644
>>> >>> --- a/drivers/iommu/arm-smmu.c
>>> >>> +++ b/drivers/iommu/arm-smmu.c
>>> >>> @@ -48,6 +48,7 @@
>>> >>>  #include 
>>> >>>  #include 
>>> >>>  #include 
>>> >>> +#include 
>>> >>>  #include 
>>> >>>  #include 
>>> >>>
>>> >>> @@ -205,6 +206,8 @@ struct arm_smmu_device {
>>> >>>   u32 num_global_irqs;
>>> >>>   u32 num_context_irqs;
>>> >>>   unsigned int*irqs;
>>> >>> + struct clk_bulk_data*clks;
>>> >>> + int num_clks;
>>> >>>
>>> >>>   u32 cavium_id_base; /* Specific to 
>>> >>> Cavium */
>>> >>>
>>> >>> @@ -1897,10 +1900,12 @@ static int arm_smmu_device_cfg_probe(struct 
>>> >>> arm_smmu_device *smmu)
>>> >>>  struct arm_smmu_match_data {
>>> >>>   enum arm_smmu_arch_version version;
>>> >>>   enum arm_smmu_implementation model;
>>> >>> + const char * const *clks;
>>> >>> + int num_clks;
>>> >>>  };
>>> >>>
>>> >>>  #define ARM_SMMU_MATCH_DATA(name, ver, imp)  \
>>> >>> -static struct arm_smmu_match_data name = { .version = ver, .model = 
>>> >>> imp }
>>> >>> +static const struct arm_smmu_match_data name = { .version = ver, 
>>> >>> .model = imp }
>>> >>>
>>> >>>  ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
>>> >>>  ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
>>> >>> @@ -1919,6 +1924,23 @@ static const struct of_device_id 
>>> >>> arm_smmu_of_match[] = {
>>> >>>  };
>>> >>>  MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
>>> >>>
>>> >>> +static void arm_smmu_fill_clk_data(struct arm_smmu_device *smmu,
>>> >>> +const char * const *clks)
>>> >>> +{
>>> >>> + int i;
>>> >>> +
>>> >>> + if (smmu->num_clks < 1)
>>> >>> + return;
>>> >>> +
>>> >>> + smmu->clks = devm_kcalloc(smmu->dev, smmu->num_clks,
>>> >>> +   sizeof(*smmu->clks), GFP_KERNEL);
>>> >>> + if (!smmu->clks)
>>> >>> + return;
>>> >>> +
>>> >>> + for (i = 0; i < smmu->num_clks; i++)
>>> >>> + smmu->clks[i].id = clks[i];
>>> >>> +}
>>> >>> +
>>> >>>  #ifdef CONFIG_ACPI
>>> >>>  static int acpi_smmu_get_data(u32 model, struct arm_smmu_device *smmu)
>>> >>>  {
>>> >>> @@ -2001,6 +2023,9 @@ static int arm_smmu_device_dt_probe(struct 
>>> >>> platform_device *pdev,
>>> >>>   data = of_device_get_match_data(dev);
>>> >>>   smmu->version = data->version;
>>> >>>   smmu->model = data->model;
>>> >>> + smmu->num_clks = data->num_clks;
>>> >>> +
>>> >>> + arm_smmu_fill_clk_data(smmu, data->clks);
>>> >>>
>>> >>>   parse_driver_options(smmu);
>>> >>>
>>> >>> @@ -2099,6 +2124,14 @@ static int arm_smmu_device_probe(struct 
>>> >>> platform_device *pdev)
>>> >>>   smmu->irqs[i] = irq;
>>> >>>   }
>>> >>>
>>> >>> + err = devm_clk_bulk_get(smmu->dev, smmu->num_clks, smmu->clks);
>>> >>> + if (err)
>>> >>> + return err;
>>> >>> +
>>> >>> + err = clk_bulk_prepare(smmu->num_clks, smmu->clks);
>>> >>> + if (err)
>>> >>> + return err;
>>> >>> +
>>> >>>   err = arm_smmu_device_cfg_probe(smmu);
>>> >>>   if (err)
>>> >>>   return err;
>>> >>> @@ -2181,6 +2214,9 @@ static int arm_smmu_device_remove(struct 
>>> >>> platform_device *pdev)
>>> >>>
>>> >>>   /* Turn the thing off */
>>> >>>

Re: [PATCH 1/1] Revert "iommu/vt-d: Clean up pasid quirk for pre-production devices"

2018-07-16 Thread Zhenyu Wang
On 2018.07.16 14:02:12 +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> The graphic guys are looking forward to having this in 4.18.
> Is it possible to take it in the following rcs?
> 

This breaks the Intel gfx driver in 4.18 when gfx DMAR is on.
Please include this fix ASAP.

Tested-by: Zhenyu Wang 

thanks!

> 
> On 07/08/2018 02:23 PM, Lu Baolu wrote:
> > This reverts commit ab96746aaa344fb720a198245a837e266fad3b62.
> >
> > The commit ab96746aaa34 ("iommu/vt-d: Clean up pasid quirk for
> > pre-production devices") triggers ECS mode on some platforms
> > which have broken ECS support. As the result, graphic device
> > will be inoperable on boot.
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107017
> >
> > Cc: Ashok Raj 
> > Signed-off-by: Lu Baolu 
> > ---
> >  drivers/iommu/intel-iommu.c | 32 ++--
> >  include/linux/intel-iommu.h |  1 +
> >  2 files changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index b344a88..115ff26 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -484,14 +484,37 @@ static int dmar_forcedac;
> >  static int intel_iommu_strict;
> >  static int intel_iommu_superpage = 1;
> >  static int intel_iommu_ecs = 1;
> > +static int intel_iommu_pasid28;
> >  static int iommu_identity_mapping;
> >  
> >  #define IDENTMAP_ALL   1
> >  #define IDENTMAP_GFX   2
> >  #define IDENTMAP_AZALIA4
> >  
> > -#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap))
> > -#define pasid_enabled(iommu)   (ecs_enabled(iommu) && 
> > ecap_pasid(iommu->ecap))
> > +/* Broadwell and Skylake have broken ECS support — normal so-called "second
> > + * level" translation of DMA requests-without-PASID doesn't actually happen
> > + * unless you also set the NESTE bit in an extended context-entry. Which of
> > + * course means that SVM doesn't work because it's trying to do nested
> > + * translation of the physical addresses it finds in the process page 
> > tables,
> > + * through the IOVA->phys mapping found in the "second level" page tables.
> > + *
> > + * The VT-d specification was retroactively changed to change the 
> > definition
> > + * of the capability bits and pretend that Broadwell/Skylake never 
> > happened...
> > + * but unfortunately the wrong bit was changed. It's ECS which is broken, 
> > but
> > + * for some reason it was the PASID capability bit which was redefined 
> > (from
> > + * bit 28 on BDW/SKL to bit 40 in future).
> > + *
> > + * So our test for ECS needs to eschew those implementations which set the 
> > old
> > + * PASID capabiity bit 28, since those are the ones on which ECS is broken.
> > + * Unless we are working around the 'pasid28' limitations, that is, by 
> > putting
> > + * the device into passthrough mode for normal DMA and thus masking the 
> > bug.
> > + */
> > +#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap) && \
> > +   (intel_iommu_pasid28 || 
> > !ecap_broken_pasid(iommu->ecap)))
> > +/* PASID support is thus enabled if ECS is enabled and *either* of the old
> > + * or new capability bits are set. */
> > +#define pasid_enabled(iommu) (ecs_enabled(iommu) &&
> > \
> > + (ecap_pasid(iommu->ecap) || 
> > ecap_broken_pasid(iommu->ecap)))
> >  
> >  int intel_iommu_gfx_mapped;
> >  EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
> > @@ -554,6 +577,11 @@ static int __init intel_iommu_setup(char *str)
> > printk(KERN_INFO
> > "Intel-IOMMU: disable extended context table 
> > support\n");
> > intel_iommu_ecs = 0;
> > +   } else if (!strncmp(str, "pasid28", 7)) {
> > +   printk(KERN_INFO
> > +   "Intel-IOMMU: enable pre-production PASID 
> > support\n");
> > +   intel_iommu_pasid28 = 1;
> > +   iommu_identity_mapping |= IDENTMAP_GFX;
> > } else if (!strncmp(str, "tboot_noforce", 13)) {
> > printk(KERN_INFO
> > "Intel-IOMMU: not forcing on after tboot. This 
> > could expose security risk for tboot\n");
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 1df9401..ef169d6 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -121,6 +121,7 @@
> >  #define ecap_srs(e)((e >> 31) & 0x1)
> >  #define ecap_ers(e)((e >> 30) & 0x1)
> >  #define ecap_prs(e)((e >> 29) & 0x1)
> > +#define ecap_broken_pasid(e)   ((e >> 28) & 0x1)
> >  #define ecap_dis(e)((e >> 27) & 0x1)
> >  #define ecap_nest(e)   ((e >> 26) & 0x1)
> >  #define ecap_mts(e)((e >> 25) & 0x1)
> 

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net 

Re: [PATCH 1/2] mm/cma: remove unsupported gfp_mask parameter from cma_alloc()

2018-07-16 Thread Vlastimil Babka
On 07/09/2018 02:19 PM, Marek Szyprowski wrote:
> cma_alloc() function doesn't really support gfp flags other than
> __GFP_NOWARN, so convert gfp_mask parameter to boolean no_warn parameter.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski 

Acked-by: Vlastimil Babka 


Re: [PATCH 2/2] dma: remove unsupported gfp_mask parameter from dma_alloc_from_contiguous()

2018-07-16 Thread Vlastimil Babka
On 07/09/2018 02:19 PM, Marek Szyprowski wrote:
> The CMA memory allocator doesn't support standard gfp flags for memory
> allocation, so there is no point having it as a parameter for
> dma_alloc_from_contiguous() function. Replace it by a boolean no_warn
> argument, which covers all the underlaying cma_alloc() function supports.
> 
> This will help to avoid giving false feeling that this function supports
> standard gfp flags and callers can pass __GFP_ZERO to get zeroed buffer,
> what has already been an issue: see commit dd65a941f6ba ("arm64:
> dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag").
> 
> Signed-off-by: Marek Szyprowski 

Acked-by: Vlastimil Babka 


[PATCH 10/10] iommu/vt-d: Remove deferred invalidation

2018-07-16 Thread Lu Baolu
Deferred invalidation is an ECS-specific feature. It will not be
supported when the IOMMU works in scalable mode. As ECS support has
been deprecated, remove deferred invalidation and clean up the code.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c |  1 -
 drivers/iommu/intel-svm.c   | 45 -
 include/linux/intel-iommu.h |  8 
 3 files changed, 54 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 88ec860..0b0209e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1698,7 +1698,6 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
-   intel_svm_exit(iommu);
}
 #endif
 }
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a16a421..da16a74 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -30,15 +30,8 @@
 
 static irqreturn_t prq_event_thread(int irq, void *d);
 
-struct pasid_state_entry {
-   u64 val;
-};
-
 int intel_svm_init(struct intel_iommu *iommu)
 {
-   struct page *pages;
-   int order;
-
if (cpu_feature_enabled(X86_FEATURE_GBPAGES) &&
!cap_fl1gp_support(iommu->cap))
return -EINVAL;
@@ -47,39 +40,6 @@ int intel_svm_init(struct intel_iommu *iommu)
!cap_5lp_support(iommu->cap))
return -EINVAL;
 
-   /* Start at 2 because it's defined as 2^(1+PSS) */
-   iommu->pasid_max = 2 << ecap_pss(iommu->ecap);
-
-   /* Eventually I'm promised we will get a multi-level PASID table
-* and it won't have to be physically contiguous. Until then,
-* limit the size because 8MiB contiguous allocations can be hard
-* to come by. The limit of 0x20000, which is 1MiB for each of
-* the PASID and PASID-state tables, is somewhat arbitrary. */
-   if (iommu->pasid_max > 0x20000)
-   iommu->pasid_max = 0x20000;
-
-   order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
-   if (ecap_dis(iommu->ecap)) {
-   pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
-   if (pages)
-   iommu->pasid_state_table = page_address(pages);
-   else
-   pr_warn("IOMMU: %s: Failed to allocate PASID state 
table\n",
-   iommu->name);
-   }
-
-   return 0;
-}
-
-int intel_svm_exit(struct intel_iommu *iommu)
-{
-   int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
-
-   if (iommu->pasid_state_table) {
-   free_pages((unsigned long)iommu->pasid_state_table, order);
-   iommu->pasid_state_table = NULL;
-   }
-
return 0;
 }
 
@@ -197,11 +157,6 @@ static void intel_flush_svm_range(struct intel_svm *svm, 
unsigned long address,
 {
struct intel_svm_dev *sdev;
 
-   /* Try deferred invalidate if available */
-   if (svm->iommu->pasid_state_table &&
-   !cmpxchg64(&svm->iommu->pasid_state_table[svm->pasid].val, 0, 1ULL << 63))
-   return;
-
rcu_read_lock();
-   list_for_each_entry_rcu(sdev, &svm->devs, list)
intel_flush_svm_range_dev(svm, sdev, address, pages, ih, gl);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4b58946..9fbd1a7 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -451,15 +451,8 @@ struct intel_iommu {
struct iommu_flush flush;
 #endif
 #ifdef CONFIG_INTEL_IOMMU_SVM
-   /* These are large and need to be contiguous, so we allocate just
-* one for now. We'll maybe want to rethink that if we truly give
-* devices away to userspace processes (e.g. for DPDK) and don't
-* want to trust that userspace will use *only* the PASID it was
-* told to. But while it's all driver-arbitrated, we're fine. */
-   struct pasid_state_entry *pasid_state_table;
struct page_req_dsc *prq;
unsigned char prq_name[16];/* Name for PRQ interrupt */
-   u32 pasid_max;
 #endif
struct q_inval  *qi;/* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
@@ -573,7 +566,6 @@ int for_each_device_domain(int (*fn)(struct 
device_domain_info *info,
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_svm_init(struct intel_iommu *iommu);
-int intel_svm_exit(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 
-- 
2.7.4



[PATCH 09/10] iommu/vt-d: Shared virtual address in scalable mode

2018-07-16 Thread Lu Baolu
This patch enables the current SVA (Shared Virtual Address)
implementation to work in the scalable mode.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c   | 38 --
 drivers/iommu/intel-svm.c | 24 ++--
 include/linux/dma_remapping.h |  9 +
 3 files changed, 3 insertions(+), 68 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 13f3d17..88ec860 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5228,18 +5228,6 @@ static void intel_iommu_put_resv_regions(struct device 
*dev,
 }
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
-static inline unsigned long intel_iommu_get_pts(struct device *dev)
-{
-   int pts, max_pasid;
-
-   max_pasid = intel_pasid_get_dev_max_id(dev);
-   pts = find_first_bit((unsigned long *)&max_pasid, MAX_NR_PASID_BITS);
-   if (pts < 5)
-   return 0;
-
-   return pts - 5;
-}
-
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev 
*sdev)
 {
struct device_domain_info *info;
@@ -5271,33 +5259,7 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, 
struct intel_svm_dev *sd
sdev->sid = PCI_DEVID(info->bus, info->devfn);
 
if (!(ctx_lo & CONTEXT_PASIDE)) {
-   if (iommu->pasid_state_table)
-   context[1].hi = 
(u64)virt_to_phys(iommu->pasid_state_table);
-   context[1].lo = (u64)virt_to_phys(info->pasid_table->table) |
-   intel_iommu_get_pts(sdev->dev);
-
-   wmb();
-   /* CONTEXT_TT_MULTI_LEVEL and CONTEXT_TT_DEV_IOTLB are both
-* extended to permit requests-with-PASID if the PASIDE bit
-* is set. which makes sense. For CONTEXT_TT_PASS_THROUGH,
-* however, the PASIDE bit is ignored and requests-with-PASID
-* are unconditionally blocked. Which makes less sense.
-* So convert from CONTEXT_TT_PASS_THROUGH to one of the new
-* "guest mode" translation types depending on whether ATS
-* is available or not. Annoyingly, we can't use the new
-* modes *unless* PASIDE is set. */
-   if ((ctx_lo & CONTEXT_TT_MASK) == (CONTEXT_TT_PASS_THROUGH << 
2)) {
-   ctx_lo &= ~CONTEXT_TT_MASK;
-   if (info->ats_supported)
-   ctx_lo |= CONTEXT_TT_PT_PASID_DEV_IOTLB << 2;
-   else
-   ctx_lo |= CONTEXT_TT_PT_PASID << 2;
-   }
ctx_lo |= CONTEXT_PASIDE;
-   if (iommu->pasid_state_table)
-   ctx_lo |= CONTEXT_DINVE;
-   if (info->pri_supported)
-   ctx_lo |= CONTEXT_PRS;
context[0].lo = ctx_lo;
wmb();
iommu->flush.flush_context(iommu, sdev->did, sdev->sid,
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8d4a911..a16a421 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -28,9 +28,6 @@
 
 #include "intel-pasid.h"
 
-#define PASID_ENTRY_P  BIT_ULL(0)
-#define PASID_ENTRY_SREBIT_ULL(11)
-
 static irqreturn_t prq_event_thread(int irq, void *d);
 
 struct pasid_state_entry {
@@ -280,11 +277,9 @@ static LIST_HEAD(global_svm_list);
 int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct 
svm_dev_ops *ops)
 {
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
-   struct pasid_entry *entry;
struct intel_svm_dev *sdev;
struct intel_svm *svm = NULL;
struct mm_struct *mm = NULL;
-   u64 pasid_entry_val;
int pasid_max;
int ret;
 
@@ -393,23 +388,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int 
flags, struct svm_dev_
kfree(sdev);
goto out;
}
-   pasid_entry_val = (u64)__pa(mm->pgd) | PASID_ENTRY_P;
-   } else
-   pasid_entry_val = (u64)__pa(init_mm.pgd) |
- PASID_ENTRY_P | PASID_ENTRY_SRE;
-   if (cpu_feature_enabled(X86_FEATURE_LA57))
-   pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
-
-   entry = intel_pasid_get_entry(dev, svm->pasid);
-   WRITE_ONCE(entry->val[0], pasid_entry_val);
-
-   /*
-* Flush PASID cache when a PASID table entry becomes
-* present.
-*/
-   if (cap_caching_mode(iommu->cap))
-   intel_flush_pasid_dev(svm, sdev, svm->pasid);
-
+   }
+   intel_pasid_setup_first_level(iommu, mm, dev, svm->pasid);
   

[PATCH 07/10] iommu/vt-d: Setup context and enable RID2PASID support

2018-07-16 Thread Lu Baolu
This patch enables the translation for requests without PASID in
the scalable mode by setting up the root and context entries.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c | 109 +---
 drivers/iommu/intel-pasid.h |   1 +
 include/linux/intel-iommu.h |   1 +
 3 files changed, 95 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index add7e3e..13f3d17 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1217,6 +1217,8 @@ static void iommu_set_root_entry(struct intel_iommu 
*iommu)
unsigned long flag;
 
addr = virt_to_phys(iommu->root_entry);
+   if (sm_supported(iommu))
+   addr |= DMA_RTADDR_SMT;
 
	raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1916,6 +1918,55 @@ static void domain_exit(struct dmar_domain *domain)
free_domain_mem(domain);
 }
 
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static inline unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+   int pds, max_pde;
+
+   max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+   pds = find_first_bit((unsigned long *)&max_pde, MAX_NR_PASID_BITS);
+   if (pds < 7)
+   return 0;
+
+   return pds - 7;
+}
+
+/*
+ * Set the RID_PASID field of a scalable mode context entry. The
+ * IOMMU hardware will use the PASID value set in this field for
+ * DMA translations of DMA requests without PASID.
+ */
+static inline void
+context_set_sm_rid2pasid(struct context_entry *context, unsigned long pasid)
+{
+   context->hi |= pasid & ((1 << 20) - 1);
+}
+
+/*
+ * Set the DTE(Device-TLB Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_dte(struct context_entry *context)
+{
+   context->lo |= (1 << 2);
+}
+
+/*
+ * Set the PRE(Page Request Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_pre(struct context_entry *context)
+{
+   context->lo |= (1 << 4);
+}
+
+/* Convert value to context PASID directory size field coding. */
+#define context_pdts(pds)  (((pds) & 0x7) << 9)
+
 static int domain_context_mapping_one(struct dmar_domain *domain,
  struct intel_iommu *iommu,
  struct pasid_table *table,
@@ -1974,9 +2025,7 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain,
}
 
pgd = domain->pgd;
-
context_clear_entry(context);
-   context_set_domain_id(context, did);
 
/*
 * Skip top levels of page tables for iommu which has less agaw
@@ -1989,25 +2038,54 @@ static int domain_context_mapping_one(struct 
dmar_domain *domain,
if (!dma_pte_present(pgd))
goto out_unlock;
}
+   }
 
-   info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
-   if (info && info->ats_supported)
-   translation = CONTEXT_TT_DEV_IOTLB;
-   else
-   translation = CONTEXT_TT_MULTI_LEVEL;
+   if (sm_supported(iommu)) {
+   unsigned long pds;
+
+   WARN_ON(!table);
+
+   /* Setup the PASID DIR pointer: */
+   pds = context_get_sm_pds(table);
+   context->lo = (u64)virt_to_phys(table->table) |
+   context_pdts(pds);
+
+   /* Setup the RID_PASID field: */
+   context_set_sm_rid2pasid(context, PASID_RID2PASID);
 
-   context_set_address_root(context, virt_to_phys(pgd));
-   context_set_address_width(context, iommu->agaw);
-   } else {
/*
-* In pass through mode, AW must be programmed to
-* indicate the largest AGAW value supported by
-* hardware. And ASR is ignored by hardware.
+* Setup the Device-TLB enable bit and Page request
+* Enable bit:
 */
-   context_set_address_width(context, iommu->msagaw);
+   info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
+   if (info && info->ats_supported)
+   context_set_sm_dte(context);
+   if (info && info->pri_supported)
+   context_set_sm_pre(context);
+   } else {
+   context_set_domain_id(context, did);
+
+   if (translation != CONTEXT_TT_PASS_THROUGH) {
+   info = iommu_support_dev_iotlb(domain, iommu,
+  bus, 

[PATCH 04/10] iommu/vt-d: Add second level page table interface

2018-07-16 Thread Lu Baolu
This adds an interface to setup the structures for second
level page table translation type. This includes the types
of second level translation only and pass through.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-pasid.c | 158 
 drivers/iommu/intel-pasid.h |   4 ++
 include/linux/intel-iommu.h |   1 +
 3 files changed, 163 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index d6e90cd..da504576 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -9,6 +9,7 @@
 
 #define pr_fmt(fmt)"DMAR: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -291,3 +292,160 @@ void intel_pasid_clear_entry(struct device *dev, int 
pasid)
 
pasid_clear_entry(pe);
 }
+
+static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
+{
+   u64 old;
+
+   old = READ_ONCE(*ptr);
+   WRITE_ONCE(*ptr, (old & ~mask) | bits);
+}
+
+/*
+ * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
+ * PASID entry.
+ */
+static inline void
+pasid_set_domain_id(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
+}
+
+/*
+ * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_address_root(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
+}
+
+/*
+ * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
+ * entry.
+ */
+static inline void
+pasid_set_address_width(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
+}
+
+/*
+ * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_translation_type(struct pasid_entry *pe, u64 value)
+{
+   pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
+}
+
+/*
+ * Enable fault processing by clearing the FPD(Fault Processing
+ * Disable) field (Bit 1) of a scalable mode PASID entry.
+ */
+static inline void pasid_set_fault_enable(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[0], 1 << 1, 0);
+}
+
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
+/*
+ * Setup the P(Present) field (Bit 0) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_present(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[0], 1 << 0, 1);
+}
+
+/*
+ * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
+{
+   pasid_set_bits(&pe->val[1], 1 << 23, value);
+}
+
+static inline void
+flush_pasid_cache(struct intel_iommu *iommu, int did, int pasid)
+{
+   struct qi_desc desc;
+
+   desc.high = 0;
+   desc.low = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+
+   qi_submit_sync(&desc, iommu);
+}
+
+/*
+ * Set up the scalable mode pasid table entry for second only or
+ * passthrough translation type.
+ */
+void intel_pasid_setup_second_level(struct intel_iommu *iommu,
+   struct dmar_domain *domain,
+   struct device *dev, int pasid,
+   bool pass_through)
+{
+   u16 did = domain->iommu_did[iommu->seq_id];
+   struct pasid_entry *pte;
+   struct dma_pte *pgd;
+   u64 pgd_val;
+   int agaw;
+
+   /*
+* Skip top levels of page tables for iommu which has less agaw
+* than default.  Unnecessary for PT mode.
+*/
+   pgd = domain->pgd;
+   if (!pass_through) {
+   for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+   pgd = phys_to_virt(dma_pte_addr(pgd));
+   if (!dma_pte_present(pgd)) {
+   dev_err(dev, "Invalid domain page table\n");
+   return;
+   }
+   }
+   }
+   pgd_val = pass_through ? 0 : virt_to_phys(pgd);
+
+   pte = intel_pasid_get_entry(dev, pasid);
+   if (!pte) {
+   dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+   return;
+   }
+
+   pasid_clear_entry(pte);
+   pasid_set_domain_id(pte, did);
+
+   if (!pass_through)
+   pasid_set_address_root(pte, pgd_val);
+
+   pasid_set_address_width(pte, iommu->agaw);
+   pasid_set_translation_type(pte, pass_through ? 4 : 2);
+   pasid_set_fault_enable(pte);
+   pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+   /*
+* Since it is a second level only translation setup, we should

[PATCH 03/10] iommu/vt-d: Move page table helpers into header

2018-07-16 Thread Lu Baolu
So that they could also be used in other source files.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c | 43 ---
 include/linux/intel-iommu.h | 43 +++
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f9036c8..a139a45 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,49 +315,6 @@ static inline void context_clear_entry(struct context_entry *context)
 }
 
 /*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-10: available
- * 11: snoop behavior
- * 12-63: Host physical address
- */
-struct dma_pte {
-   u64 val;
-};
-
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
-   pte->val = 0;
-}
-
-static inline u64 dma_pte_addr(struct dma_pte *pte)
-{
-#ifdef CONFIG_64BIT
-   return pte->val & VTD_PAGE_MASK;
-#else
-   /* Must have a full atomic 64-bit read */
-   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-#endif
-}
-
-static inline bool dma_pte_present(struct dma_pte *pte)
-{
-   return (pte->val & 3) != 0;
-}
-
-static inline bool dma_pte_superpage(struct dma_pte *pte)
-{
-   return (pte->val & DMA_PTE_LARGE_PAGE);
-}
-
-static inline int first_pte_in_page(struct dma_pte *pte)
-{
-   return !((unsigned long)pte & ~VTD_PAGE_MASK);
-}
-
-/*
  * This domain is a statically identity mapping domain.
  * 1. This domain creats a static 1:1 mapping to all usable memory.
  * 2. It maps to each iommu if successful.
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4124cd9..7818b1c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
 }
 
+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-10: available
+ * 11: snoop behavior
+ * 12-63: Host physical address
+ */
+struct dma_pte {
+   u64 val;
+};
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+   pte->val = 0;
+}
+
+static inline u64 dma_pte_addr(struct dma_pte *pte)
+{
+#ifdef CONFIG_64BIT
+   return pte->val & VTD_PAGE_MASK;
+#else
+   /* Must have a full atomic 64-bit read */
+   return  __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
+#endif
+}
+
+static inline bool dma_pte_present(struct dma_pte *pte)
+{
+   return (pte->val & 3) != 0;
+}
+
+static inline bool dma_pte_superpage(struct dma_pte *pte)
+{
+   return (pte->val & DMA_PTE_LARGE_PAGE);
+}
+
+static inline int first_pte_in_page(struct dma_pte *pte)
+{
+   return !((unsigned long)pte & ~VTD_PAGE_MASK);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
 
-- 
2.7.4



[PATCH 02/10] iommu/vt-d: Manage scalable mode PASID tables

2018-07-16 Thread Lu Baolu
In scalable mode, the PASID structure is a two-level table consisting
of a PASID directory table and a PASID table. Any PASID entry can be
identified by a PASID value as shown below.

   1
   9   6 5  0
.---.---.
|  PASID|   |
'---'---'.-.
 ||  | |
 ||  | |
 ||  | |
 | .---.  |  .-.
 | |   |  |->| PASID Entry |
 | |   |  |  '-'
 | |   |  |Plus  | |
 | .---.  |  | |
 |>| DIR Entry |>| |
 | '---' '-'
.-.  |Plus |   |
| Context |  | |   |
|  Entry  |--->|   |
'-''---'

This changes the pasid table APIs to support scalable mode
PASID directory and PASID table. It also adds a helper to
get the PASID table entry according to the pasid value.
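
A minimal sketch of the lookup implied by the figure, assuming the low
PASID_PDE_SHIFT bits index the PASID table and the remaining high bits
index the PASID directory. The series adds such a helper as
intel_pasid_get_entry(); the function below is only an illustration
built on the get_pasid_table_from_pde() helper added by this patch:

    static struct pasid_entry *
    pasid_lookup_sketch(struct pasid_dir_entry *dir, int pasid)
    {
            /* High bits select the PASID directory entry. */
            struct pasid_dir_entry *pde = &dir[pasid >> PASID_PDE_SHIFT];
            /* A present directory entry points to a PASID table. */
            struct pasid_entry *table = get_pasid_table_from_pde(pde);

            if (!table)
                    return NULL;

            /* Low bits select the PASID entry within that table. */
            return &table[pasid & ((1 << PASID_PDE_SHIFT) - 1)];
    }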

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c |  2 +-
 drivers/iommu/intel-pasid.c | 72 +++--
 drivers/iommu/intel-pasid.h | 10 ++-
 drivers/iommu/intel-svm.c   |  6 +---
 4 files changed, 74 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0a7362b..f9036c8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2463,7 +2463,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev)
dev->archdata.iommu = info;
 
-   if (dev && dev_is_pci(dev) && info->pasid_supported) {
+   if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index fe95c9b..d6e90cd 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
int ret, order;
 
info = dev->archdata.iommu;
-   if (WARN_ON(!info || !dev_is_pci(dev) ||
-   !info->pasid_supported || info->pasid_table))
+   if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
return -EINVAL;
 
/* DMA alias device already has a pasid table, use it: */
@@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
return -ENOMEM;
INIT_LIST_HEAD(&pasid_table->dev);
 
-   size = sizeof(struct pasid_entry);
+   size = sizeof(struct pasid_dir_entry);
count = min_t(int, pci_max_pasids(to_pci_dev(dev)), intel_pasid_max_id);
+   count >>= PASID_PDE_SHIFT;
order = get_order(size * count);
pages = alloc_pages_node(info->iommu->node,
 GFP_ATOMIC | __GFP_ZERO,
@@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
 
pasid_table->table = page_address(pages);
pasid_table->order = order;
-   pasid_table->max_pasid = count;
+   pasid_table->max_pasid = count << PASID_PDE_SHIFT;
 
 attach_out:
device_attach_pasid_table(info, pasid_table);
@@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
return 0;
 }
 
+/* Get PRESENT bit of a PASID directory entry. */
+static inline bool
+pasid_pde_is_present(struct pasid_dir_entry *pde)
+{
+   return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
+}
+
+/* Get PASID table from a PASID directory entry. */
+static inline struct pasid_entry *
+get_pasid_table_from_pde(struct pasid_dir_entry *pde)
+{
+   if (!pasid_pde_is_present(pde))
+   return NULL;
+
+   return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
+}
+
 void intel_pasid_free_table(struct device *dev)
 {
struct device_domain_info *info;
struct pasid_table *pasid_table;
+   struct pasid_dir_entry *dir;
+   struct pasid_entry *table;
+   int i, max_pde;
 
info = dev->archdata.iommu;
-   if (!info || !dev_is_pci(dev) ||
-   !info->pasid_supported || !info->pasid_table)
+   if (!info || !dev_is_pci(dev) || !info->pasid_table)
return;
 
pasid_table = info->pasid_table;
@@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev)
if (!list_empty(&pasid_table->dev))
return;
 
+   /* Free scalable mode PASID directory tables: */
+   dir = pasid_table->table;
+   max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
+   for (i = 0; i < max_pde; i++) 

[PATCH 01/10] iommu/vt-d: Enumerate the scalable mode capability

2018-07-16 Thread Lu Baolu
The Intel vt-d spec rev3.0 introduces a new translation
mode called scalable mode, which enables PASID-granular
translations for first level, second level, nested and
pass-through modes. At the same time, the previous
Extended Context (ECS) mode is deprecated (no production
ever implements ECS).

This patch adds enumeration for scalable mode and removes
the deprecated ECS enumeration. It also provides a boot-time
option to disable scalable mode as a chicken bit, in case that
is ever needed (see the example below).
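
For example, assuming the usual intel_iommu= kernel command line
syntax, scalable mode can then be disabled at boot with:

    intel_iommu=sm_off

which takes the place of the ecs_off option removed by this patch.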

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Signed-off-by: Sanjay Kumar 
Signed-off-by: Lu Baolu 
Reviewed-by: Ashok Raj 
---
 drivers/iommu/intel-iommu.c | 35 ---
 include/linux/intel-iommu.h |  1 +
 2 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fed67c6..0a7362b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -429,15 +429,15 @@ static int dmar_map_gfx = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
-static int intel_iommu_ecs = 1;
+static int intel_iommu_sm = 1;
 static int iommu_identity_mapping;
 
 #define IDENTMAP_ALL   1
 #define IDENTMAP_GFX   2
 #define IDENTMAP_AZALIA4
 
-#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap))
-#define pasid_enabled(iommu)   (ecs_enabled(iommu) && ecap_pasid(iommu->ecap))
+#define sm_supported(iu)   (intel_iommu_sm && ecap_smts((iu)->ecap))
+#define pasid_supported(iu)(sm_supported(iu) && ecap_pasid((iu)->ecap))
 
 int intel_iommu_gfx_mapped;
 EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -517,10 +517,9 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
-   } else if (!strncmp(str, "ecs_off", 7)) {
-   printk(KERN_INFO
-   "Intel-IOMMU: disable extended context table 
support\n");
-   intel_iommu_ecs = 0;
+   } else if (!strncmp(str, "sm_off", 6)) {
+   pr_info("Intel-IOMMU: disable scalable mode support\n");
+   intel_iommu_sm = 0;
} else if (!strncmp(str, "tboot_noforce", 13)) {
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This 
could expose security risk for tboot\n");
@@ -767,7 +766,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
u64 *entry;
 
entry = &root->lo;
-   if (ecs_enabled(iommu)) {
+   if (sm_supported(iommu)) {
if (devfn >= 0x80) {
devfn -= 0x80;
entry = &root->hi;
@@ -909,7 +908,7 @@ static void free_context_table(struct intel_iommu *iommu)
if (context)
free_pgtable_page(context);
 
-   if (!ecs_enabled(iommu))
+   if (!sm_supported(iommu))
continue;
 
context = iommu_context_addr(iommu, i, 0x80, 0);
@@ -1261,8 +1260,6 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
unsigned long flag;
 
addr = virt_to_phys(iommu->root_entry);
-   if (ecs_enabled(iommu))
-   addr |= DMA_RTADDR_RTT;
 
raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1739,7 +1736,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
free_context_table(iommu);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
-   if (pasid_enabled(iommu)) {
+   if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
intel_svm_exit(iommu);
@@ -2418,8 +2415,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
dmar_find_matched_atsr_unit(pdev))
info->ats_supported = 1;
 
-   if (ecs_enabled(iommu)) {
-   if (pasid_enabled(iommu)) {
+   if (sm_supported(iommu)) {
+   if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
if (features >= 0)
info->pasid_supported = features | 1;
@@ -3231,7 +3228,7 @@ static int __init init_dmars(void)
 * We need to ensure the system pasid table is no bigger
 * than the smallest supported.
 */
-   if (pasid_enabled(iommu)) {
+   if (pasid_supported(iommu)) {
u32 temp = 2 << ecap_pss(iommu->ecap);
 
intel_pasid_max_id = min_t(u32, temp,
@@ -3292,7 

[PATCH 00/10] iommu/vt-d: Add scalable mode support

2018-07-16 Thread Lu Baolu
Hi,

Intel vt-d rev3.0 [1] introduces a new translation mode called
'scalable mode', which enables PASID-granular translations for
first level, second level, nested and pass-through modes. The
vt-d scalable mode is the key ingredient to enable Scalable I/O
Virtualization (Scalable IOV) [2] [3], which allows sharing a
device in minimal possible granularity (ADI - Assignable Device
Interface). It also includes all the capabilities required to
enable Shared Virtual Addressing (SVA). As a result, previous
Extended Context (ECS) mode is deprecated (no production ever
implements ECS).

Each scalable mode PASID table entry is 64 bytes in length, with
fields pointing to the first level page table and the second level
page table. The PGTT (PASID Granular Translation Type) field is
used by hardware to determine the translation type.
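
As a rough sketch, such an entry can be thought of as eight 64-bit
words (the helpers in this series address it as pe->val[0], pe->val[1],
...; for example, the domain id lands in bits 15:0 of val[1] and the
PGTT in bits 8:6 of val[0]):

    struct pasid_entry {
            u64 val[8];     /* 64 bytes = 512 bits per PASID entry */
    };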


  A Scalable Mode.-.
   PASID Entry .-| |
   .--.  .-| | 1st Level   |
  7|  |  | | | Page Table  |
   .--.  | | | |
  6|  |  | | | |
   '--'  | | '-'
  5|  |  | '-'
   '--'  '-'
  4|  |^
   '--'   /
  3|  |  /   .-.
   ..---.-. /  .-| |
  2|| FLPTR | |/ .-| | 2nd Level   |
   .'---'-.  | | | Page Table  |
  1|  |  | | | |
   .-.---..--..  | | | |
  0| | SLPTR || PGTT ||> | | '-'
   '-'---''--''  | '-'
   6 |0  '-'
   3 v
 .------------------------------------.
 | PASID Granular Translation Type    |
 |------------------------------------|
 | 001b: 1st level translation only   |
 | 010b: 2nd level translation only   |
 | 011b: Nested translation           |
 | 100b: Pass through                 |
 '------------------------------------'
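
A hedged sketch of how these encodings map onto the helpers in this
series (the macro names below are illustrative, not taken from the
patches; the raw values match intel_pasid_setup_second_level(), which
passes 2 for second-level-only and 4 for pass-through setups):

    #define PGTT_FIRST_LEVEL        1       /* 001b */
    #define PGTT_SECOND_LEVEL       2       /* 010b */
    #define PGTT_NESTED             3       /* 011b */
    #define PGTT_PASS_THROUGH       4       /* 100b */

    /* e.g. second-level-only vs. pass-through: */
    pasid_set_translation_type(pte, pass_through ? PGTT_PASS_THROUGH
                                                 : PGTT_SECOND_LEVEL);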

This patch series adds scalable mode support to the Intel
IOMMU driver and makes all the Intel IOMMU features work
in scalable mode. The changes are all constrained within the
Intel IOMMU driver, as this is purely an internal format change.

This patch series depends on a patch set titled "iommu/vt-d:
Improve PASID id and table management", posted here [4], which
implements the global pasid namespace and per-device pasid table
APIs.

References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[4] https://lkml.org/lkml/2018/7/14/69

Best regards,
Lu Baolu

Lu Baolu (10):
  iommu/vt-d: Enumerate the scalable mode capability
  iommu/vt-d: Manage scalable mode PASID tables
  iommu/vt-d: Move page table helpers into header
  iommu/vt-d: Add second level page table interface
  iommu/vt-d: Setup pasid entry for RID2PASID support
  iommu/vt-d: Pass pasid table to context mapping
  iommu/vt-d: Setup context and enable RID2PASID support
  iommu/vt-d: Add first level page table interface
  iommu/vt-d: Shared virtual address in scalable mode
  iommu/vt-d: Remove deferred invalidation

 drivers/iommu/intel-iommu.c   | 252 +++-
 drivers/iommu/intel-pasid.c   | 295 --
 drivers/iommu/intel-pasid.h   |  20 ++-
 drivers/iommu/intel-svm.c |  74 +--
 include/linux/dma_remapping.h |   9 +-
 include/linux/intel-iommu.h   |  54 ++--
 6 files changed, 485 insertions(+), 219 deletions(-)

-- 
2.7.4



Re: [PATCH 1/1] Revert "iommu/vt-d: Clean up pasid quirk for pre-production devices"

2018-07-16 Thread Lu Baolu
Hi Joerg,

The graphics folks are looking forward to having this in 4.18.
Is it possible to take it in one of the following rcs?

Best regards,
Lu Baolu

On 07/08/2018 02:23 PM, Lu Baolu wrote:
> This reverts commit ab96746aaa344fb720a198245a837e266fad3b62.
>
> The commit ab96746aaa34 ("iommu/vt-d: Clean up pasid quirk for
> pre-production devices") triggers ECS mode on some platforms
> which have broken ECS support. As a result, the graphic device
> will be inoperable on boot.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107017
>
> Cc: Ashok Raj 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/intel-iommu.c | 32 ++--
>  include/linux/intel-iommu.h |  1 +
>  2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b344a88..115ff26 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -484,14 +484,37 @@ static int dmar_forcedac;
>  static int intel_iommu_strict;
>  static int intel_iommu_superpage = 1;
>  static int intel_iommu_ecs = 1;
> +static int intel_iommu_pasid28;
>  static int iommu_identity_mapping;
>  
>  #define IDENTMAP_ALL 1
>  #define IDENTMAP_GFX 2
>  #define IDENTMAP_AZALIA  4
>  
> -#define ecs_enabled(iommu)   (intel_iommu_ecs && ecap_ecs(iommu->ecap))
> -#define pasid_enabled(iommu) (ecs_enabled(iommu) && ecap_pasid(iommu->ecap))
> +/* Broadwell and Skylake have broken ECS support — normal so-called "second
> + * level" translation of DMA requests-without-PASID doesn't actually happen
> + * unless you also set the NESTE bit in an extended context-entry. Which of
> + * course means that SVM doesn't work because it's trying to do nested
> + * translation of the physical addresses it finds in the process page tables,
> + * through the IOVA->phys mapping found in the "second level" page tables.
> + *
> + * The VT-d specification was retroactively changed to change the definition
> + * of the capability bits and pretend that Broadwell/Skylake never happened...
> + * but unfortunately the wrong bit was changed. It's ECS which is broken, but
> + * for some reason it was the PASID capability bit which was redefined (from
> + * bit 28 on BDW/SKL to bit 40 in future).
> + *
> + * So our test for ECS needs to eschew those implementations which set the old
> + * PASID capability bit 28, since those are the ones on which ECS is broken.
> + * Unless we are working around the 'pasid28' limitations, that is, by putting
> + * the device into passthrough mode for normal DMA and thus masking the bug.
> + */
> +#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap) && \
> + (intel_iommu_pasid28 || !ecap_broken_pasid(iommu->ecap)))
> +/* PASID support is thus enabled if ECS is enabled and *either* of the old
> + * or new capability bits are set. */
> +#define pasid_enabled(iommu) (ecs_enabled(iommu) &&  \
> +   (ecap_pasid(iommu->ecap) || ecap_broken_pasid(iommu->ecap)))
>  
>  int intel_iommu_gfx_mapped;
>  EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
> @@ -554,6 +577,11 @@ static int __init intel_iommu_setup(char *str)
>   printk(KERN_INFO
>   "Intel-IOMMU: disable extended context table 
> support\n");
>   intel_iommu_ecs = 0;
> + } else if (!strncmp(str, "pasid28", 7)) {
> + printk(KERN_INFO
> + "Intel-IOMMU: enable pre-production PASID 
> support\n");
> + intel_iommu_pasid28 = 1;
> + iommu_identity_mapping |= IDENTMAP_GFX;
>   } else if (!strncmp(str, "tboot_noforce", 13)) {
>   printk(KERN_INFO
>   "Intel-IOMMU: not forcing on after tboot. This 
> could expose security risk for tboot\n");
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1df9401..ef169d6 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -121,6 +121,7 @@
>  #define ecap_srs(e)  ((e >> 31) & 0x1)
>  #define ecap_ers(e)  ((e >> 30) & 0x1)
>  #define ecap_prs(e)  ((e >> 29) & 0x1)
> +#define ecap_broken_pasid(e) ((e >> 28) & 0x1)
>  #define ecap_dis(e)  ((e >> 27) & 0x1)
>  #define ecap_nest(e) ((e >> 26) & 0x1)
>  #define ecap_mts(e)  ((e >> 25) & 0x1)
