Re: [PATCH v2] iommu/arm-smmu-v3: limit use of 2-level stream tables

2016-07-14 Thread nwatters

On 2016-07-14 09:31, Will Deacon wrote:

> On Tue, Jul 12, 2016 at 02:19:20PM -0400, Nate Watterson wrote:
>> In the current arm-smmu-v3 driver, all smmus that support 2-level
>> stream tables are being forced to use them. This is suboptimal for
>> smmus that support fewer stream id bits than would fill in a single
>> second level table. This patch limits the use of 2-level tables to
>> smmus that both support the feature and whose first level table can
>> possibly contain more than a single entry.
>
> Just to be clear, what exactly are you seeing as being suboptimal here?
> Is it the memory wastage from overallocating the L2 table, or something
> more?


Disregarding the config cache, fetching an STE when 2-level tables are
being used will require the hw to perform more memory accesses than it
would have to with a linear table since the L1 descriptor must also be
fetched. Presumably this is why the spec states, "ARM recommends that
a more efficient linear table is used instead of programming SPLIT >
LOG2SIZE".
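
As a rough illustration of the extra access, here is a user-space sketch
(not driver code). The split value of 8 and both helper names are
assumptions made for the example. It models how a stream id is split into
an L1 index and an L2 index, and how many table reads are needed to reach
one STE when the config cache misses.

```c
#include <assert.h>
#include <stdint.h>

#define STRTAB_SPLIT 8 /* assumed split value, for illustration only */

/*
 * Hypothetical helper: number of table reads needed to reach one STE.
 * A linear table needs only the STE read; a 2-level table needs the
 * L1 descriptor read first, then the STE read.
 */
static unsigned int ste_fetch_reads(int two_level)
{
	return two_level ? 2u : 1u;
}

/* Hypothetical helper: split a stream id into L1 and L2 table indices. */
static void split_sid(uint32_t sid, uint32_t *l1_idx, uint32_t *l2_idx)
{
	assert(l1_idx && l2_idx);
	*l1_idx = sid >> STRTAB_SPLIT;
	*l2_idx = sid & ((1u << STRTAB_SPLIT) - 1u);
}
```

Under these assumptions, any stream id narrower than STRTAB_SPLIT bits
always yields an L1 index of 0, so the L1 descriptor read only adds an
access and never selects between tables.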

My understanding is that the only benefit to using 2-level tables is
that it can save space when stream ids are sparsely distributed. Are
there any other compelling reasons to use them?



> if it's just the memory allocation, I'd sooner restrict the span field
> in the L1 desc.


Although I am not especially concerned about the memory allocation, even
if the span was reduced, we would still be wasting a page for the L1
table unless L1 and L2 tables were allocated in a single
dmam_alloc_coherent call.
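
To put numbers on the page being wasted, here is a small user-space
sketch. It assumes page-granular allocations with a 4 KiB page, 8 bytes
per L1 descriptor and 64 bytes per STE; those sizes and the function
names are assumptions for the example, and alignment requirements of the
tables are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ    4096u /* assumed page size */
#define L1_DESC_SZ 8u    /* assumed bytes per L1 descriptor */
#define STE_SZ     64u   /* assumed bytes per STE */

/* Round a byte count up to a whole number of pages. */
static size_t round_up_page(size_t n)
{
	return (n + PAGE_SZ - 1) / PAGE_SZ * PAGE_SZ;
}

/* One L1 descriptor and one reduced-span L2 table, allocated separately. */
static size_t separate_alloc_bytes(unsigned int sid_bits)
{
	assert(sid_bits < 16);
	size_t l1 = L1_DESC_SZ;
	size_t l2 = ((size_t)1 << sid_bits) * STE_SZ;
	return round_up_page(l1) + round_up_page(l2);
}

/* The same two tables carved out of a single allocation. */
static size_t combined_alloc_bytes(unsigned int sid_bits)
{
	assert(sid_bits < 16);
	size_t l1 = L1_DESC_SZ;
	size_t l2 = ((size_t)1 << sid_bits) * STE_SZ;
	return round_up_page(l1 + l2);
}
```

With 4 stream id bits, for example, the separate allocations come to two
pages under these assumptions while the combined one fits in a single
page, which is also what a linear table of the same size would need.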



> Will


Nate

--
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux
Foundation Collaborative Project.


Re: [PATCH v2] iommu/arm-smmu-v3: limit use of 2-level stream tables

2016-07-14 Thread Will Deacon
On Tue, Jul 12, 2016 at 02:19:20PM -0400, Nate Watterson wrote:
> In the current arm-smmu-v3 driver, all smmus that support 2-level
> stream tables are being forced to use them. This is suboptimal for
> smmus that support fewer stream id bits than would fill in a single
> second level table. This patch limits the use of 2-level tables to
> smmus that both support the feature and whose first level table can
> possibly contain more than a single entry.

Just to be clear, what exactly are you seeing as being suboptimal here?
Is it the memory wastage from overallocating the L2 table, or something
more?

if it's just the memory allocation, I'd sooner restrict the span field
in the L1 desc.

Will


[PATCH v2] iommu/arm-smmu-v3: limit use of 2-level stream tables

2016-07-12 Thread Nate Watterson
In the current arm-smmu-v3 driver, all smmus that support 2-level
stream tables are being forced to use them. This is suboptimal for
smmus that support fewer stream id bits than would fill in a single
second level table. This patch limits the use of 2-level tables to
smmus that both support the feature and whose first level table can
possibly contain more than a single entry.

Signed-off-by: Nate Watterson 
---
 drivers/iommu/arm-smmu-v3.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 5f6b3bc..f27b8dc 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2033,17 +2033,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 	u32 size, l1size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
 
-	/*
-	 * If we can resolve everything with a single L2 table, then we
-	 * just need a single L1 descriptor. Otherwise, calculate the L1
-	 * size, capped to the SIDSIZE.
-	 */
-	if (smmu->sid_bits < STRTAB_SPLIT) {
-		size = 0;
-	} else {
-		size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
-		size = min(size, smmu->sid_bits - STRTAB_SPLIT);
-	}
+	/* Calculate the L1 size, capped to the SIDSIZE. */
+	size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3);
+	size = min(size, smmu->sid_bits - STRTAB_SPLIT);
 	cfg->num_l1_ents = 1 << size;
 
 	size += STRTAB_SPLIT;
@@ -2531,6 +2523,13 @@ static int arm_smmu_device_probe(struct arm_smmu_device *smmu)
 	smmu->ssid_bits = reg >> IDR1_SSID_SHIFT & IDR1_SSID_MASK;
 	smmu->sid_bits = reg >> IDR1_SID_SHIFT & IDR1_SID_MASK;
 
+	/*
+	 * If the SMMU supports fewer bits than would fill a single L2 stream
+	 * table, use a linear table instead.
+	 */
+	if (smmu->sid_bits <= STRTAB_SPLIT)
+		smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB;
+
 	/* IDR5 */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5);
 
-- 
Qualcomm Datacenter Technologies, Inc. on behalf of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux
Foundation Collaborative Project.
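
For readers following the logic outside the kernel tree, the selection
rule in this patch can be sketched in user space as below. The constant
values (a split of 8 and an L1 size shift of 20) and the function names
are assumptions for the example, not taken from this posting.

```c
#include <assert.h>
#include <stdbool.h>

#define STRTAB_SPLIT       8u  /* assumed value */
#define STRTAB_L1_SZ_SHIFT 20u /* assumed value */
#define L1_DESC_SZ_LOG2    3u  /* 8-byte L1 descriptors, assumed */

/* Use 2-level tables only if supported and the L1 table can hold > 1 entry. */
static bool use_2lvl_strtab(bool hw_supports_2lvl, unsigned int sid_bits)
{
	return hw_supports_2lvl && sid_bits > STRTAB_SPLIT;
}

/* Number of L1 descriptors, capped by the stream id width. */
static unsigned int num_l1_ents(unsigned int sid_bits)
{
	/* Callers must have checked use_2lvl_strtab() first. */
	assert(sid_bits > STRTAB_SPLIT);
	unsigned int size = STRTAB_L1_SZ_SHIFT - L1_DESC_SZ_LOG2;
	unsigned int avail = sid_bits - STRTAB_SPLIT;

	if (avail < size)
		size = avail;
	return 1u << size;
}
```

One consequence of doing the check at probe time, at least in this
sketch, is that the unsigned subtraction of the split from the stream id
width can no longer wrap, because the 2-level path is only reached when
the stream id width exceeds the split.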


