Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-09 Thread Robin Murphy

[ +Lorenzo ]

On 09/03/18 04:50, Tomasz Figa wrote:
[...]

>>> Now we need a way to do the check. Perhaps for the time being it would
>>> be enough to just check for the power-domains property in DT?



>> AFAICS, it might be as simple as arm_smmu_probe() doing this:
>>
>>     /*
>>      * We want to avoid touching dev->power.lock in fastpaths unless
>>      * it's really going to do something useful - pm_runtime_enabled()
>>      * can serve as an ideal proxy for that decision.
>>      */
>>     if (dev->pm_domain)
>>         pm_runtime_enable(dev);
>>
>> or maybe even just gate all the calls with "if (smmu->dev->pm_domain)"
>> directly (like pcie-mediatek does), but I'm not sure which would be
>> conceptually cleaner.
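A minimal userspace model of the gating suggested above, assuming simplified stand-ins for struct device and the runtime PM core (all model_* names are hypothetical, not kernel API). Runtime PM is enabled at probe only when a PM domain is attached, and the fast-path helpers bail out on a plain flag test, which is the role pm_runtime_enabled() would play in the driver, so devices without a domain never reach the (modelled) dev->power.lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of struct device that matter here. */
struct model_device {
	const void *pm_domain;   /* non-NULL when a PM domain was attached */
	bool rpm_enabled;        /* what pm_runtime_enabled() would report */
	int usage_count;         /* stands in for the runtime PM usage counter */
	int lock_acquisitions;   /* trips through the modelled dev->power.lock */
};

/* Probe-time decision: enable runtime PM only if a PM domain is present. */
static void model_smmu_probe(struct model_device *dev)
{
	if (dev->pm_domain)
		dev->rpm_enabled = true;
}

/* Fast-path "get": a cheap flag test first, PM core only when enabled. */
static void model_smmu_rpm_get(struct model_device *dev)
{
	if (!dev->rpm_enabled)
		return;
	dev->lock_acquisitions++;
	dev->usage_count++;
}

/* Fast-path "put": mirrors the get helper. */
static void model_smmu_rpm_put(struct model_device *dev)
{
	if (!dev->rpm_enabled)
		return;
	dev->lock_acquisitions++;
	dev->usage_count--;
}
```

Gating every call site on smmu->dev->pm_domain instead would give the same observable behaviour in this model; the difference is only where the decision is expressed.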


> Okay, that was easier than I expected. Thanks. :)
>
> Actually, there is one more thing that might need rechecking. Are you
> sure that dev->pm_domain is NULL for the devices for which we don't
> want runtime PM to be enabled? I think ACPI was mentioned, and ACPI
> includes the concept of PM domains.


Thanks for pointing that out - thankfully, I've confirmed that the SMMUs 
on my Juno don't have dev->pm_domain set when booting with ACPI, and 
double-checking the ACPI code I think we're OK here. Since the SMMUs are 
only described in the static IORT table and not in the ACPI namespace, 
they won't have the ACPI companion device that acpi_dev_pm_attach() 
looks for, and thus should always be ignored. Lorenzo, do I have that right?


Robin.
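To make the ACPI reasoning concrete, here is a simplified model of how dev->pm_domain gets populated. It assumes the generic attach path tries the ACPI PM domain first (which requires an ACPI companion) and then the generic PM domain code (which requires a DT node with a power-domains property). The model_* types and boolean flags are hypothetical; in the kernel these checks correspond roughly to has_acpi_companion() and parsing the device's OF node. An SMMU described only in the IORT table has neither a companion nor an OF node, so the model leaves its PM domain NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PM domain objects standing in for the ACPI and genpd ones. */
static const int model_acpi_pm_domain;
static const int model_genpd_domain;

/* Hypothetical summary of how firmware described a device. */
struct model_fw_desc {
	bool has_acpi_companion;  /* device exists in the ACPI namespace */
	bool has_of_node;         /* device was created from DT */
	bool has_power_domains;   /* DT node carries "power-domains" */
};

/*
 * Simplified attach order: ACPI first, then generic PM domains.
 * Returns what would end up in dev->pm_domain, or NULL for "none".
 */
static const void *model_pm_domain_attach(const struct model_fw_desc *fw)
{
	if (fw->has_acpi_companion)
		return &model_acpi_pm_domain;
	if (fw->has_of_node && fw->has_power_domains)
		return &model_genpd_domain;
	return NULL;
}
```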


Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-08 Thread Tomasz Figa
On Thu, Mar 8, 2018 at 9:12 PM, Robin Murphy  wrote:
> On 08/03/18 04:33, Tomasz Figa wrote:
>>
>> On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy  wrote:
>>>
>>> On 07/03/18 13:52, Tomasz Figa wrote:


 On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>
>
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>>
>>
>> From: Sricharan R 
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> get called when the smmu is not linked to its master, that is, without
>> the context of the master device. So call the runtime PM APIs
>> separately in those places.
>>
>> Signed-off-by: Sricharan R 
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam 
>> ---
>> drivers/iommu/arm-smmu.c | 96
>> 
>> 1 file changed, 88 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8b16f53f597..3d6a1875431f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>   struct clk_bulk_data *clks;
>>   int num_clks;
>> + bool rpm_supported;
>> +
>
> Can we not automatically infer this from whether clocks and/or power
> domains are specified or not, then just use pm_runtime_enabled() as the
> fast-path check as Tomasz originally proposed?



 I wouldn't tie this to presence of clocks, since as a next step we
 would want to actually control the clocks separately. (As far as I
 understand, on QCom SoCs we might want to have runtime PM active for
 the translation to work, but clocks gated whenever access to SMMU
 registers is not needed.) Moreover, you might still have some super
 high scale thousand-core systems that require clocks to be
 prepare-enabled, but runtime PM would be undesirable for the reasons
 we discussed before.
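Tomasz's next step, keeping runtime PM active while translation is needed but gating clocks outside register accesses, amounts to tracking power and clocks with independent references. A hypothetical userspace sketch of that split, assuming register accesses need both power and clocks while translation needs only power; the model_* names are illustrative only, and the comments point at the kernel calls each step loosely corresponds to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SMMU state with power and clocks tracked separately. */
struct model_smmu {
	int rpm_usage;   /* runtime PM references: domain must stay powered */
	int clk_users;   /* clock references: only for register access */
};

static bool model_powered(const struct model_smmu *s)
{
	return s->rpm_usage > 0;
}

static bool model_clocks_on(const struct model_smmu *s)
{
	return s->clk_users > 0;
}

/* Translation for an active master: keep power up, leave clocks alone. */
static void model_translation_begin(struct model_smmu *s)
{
	s->rpm_usage++;   /* cf. a runtime PM get */
}

static void model_translation_end(struct model_smmu *s)
{
	s->rpm_usage--;   /* cf. a runtime PM put */
}

/* Register access: needs power and clocks, but only for its duration. */
static void model_reg_access_begin(struct model_smmu *s)
{
	s->rpm_usage++;   /* cf. a runtime PM get */
	s->clk_users++;   /* cf. clk_bulk_enable() */
}

static void model_reg_access_end(struct model_smmu *s)
{
	s->clk_users--;   /* cf. clk_bulk_disable() */
	s->rpm_usage--;   /* cf. a runtime PM put */
}
```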

>
> I worry that relying on statically-defined matchdata is just going to blow
> up the driver and DT binding into a maintenance nightmare; I really don't
> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
> within the SoC is in a separate controllable power domain while the others
> aren't.



 I don't see a reason why both couldn't just have RPM supported
 regardless of whether there is a real power domain. It would
 effectively be just a no-op for those that don't have one.
>>>
>>> Because you're then effectively defining "compatible" values for the sake of
>>> attaching software policy to them, rather than actually describing different
>>> hardware implementations.
>>>
>>> The fact that RPM can't do anything meaningful unless relevant clock/power
>>> aspects *are* described, however, means that we shouldn't need additional
>>> information redundant with that. Much like the fact that we don't *already*
>>> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
>>> integrated such that IDR0.CTTW has the wrong value, since the presence or
>>> not of the "dma-coherent" property already describes the truth in that
>>> regard.
>>
>>
>> Fair enough.
>>
>>>
 IMHO the only reason to avoid having the RPM enabled is the scalability
 issue we discussed before.
>>>
>>> Yes, but that's kind of my point; in reality high throughput/minimal latency
>>> and aggressive power management are more or less mutually exclusive. Mobile
>>> SoCs with fine-grained clock trees and power domains won't have multiple
>>> 40GbE/NVMf/whatever links running flat out in parallel; conversely
>>> networking/infrastructure/server SoCs aren't designed around saving every
>>> last microamp of leakage current - even in the (fairly unlikely) case of the
>>> interconnect clocks being software-gateable at all I would be very surprised
>>> if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
>>> *requires* clocks to be abstracted behind firmware).
>>>
>>> Realistically then, explicit clocks are only expected on systems which care
>>> about power management. We can always revisit that assumption if anything
>>> crazy where it isn't the case ever becomes non-theoretical, but for now it's
>>> one I'm entirely comfortable with. If on the other hand it turns out that we
>>> can rely on just a power domain being present wherever we want RPM, making
>>> clocks moot, then all the better.
>>
>>

Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-08 Thread Robin Murphy

On 08/03/18 04:33, Tomasz Figa wrote:

On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy  wrote:

On 07/03/18 13:52, Tomasz Figa wrote:


On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy  wrote:


On 02/03/18 10:10, Vivek Gautam wrote:



From: Sricharan R 

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime PM APIs
separately in those places.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
---
drivers/iommu/arm-smmu.c | 96

1 file changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b16f53f597..3d6a1875431f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -209,6 +209,8 @@ struct arm_smmu_device {
  struct clk_bulk_data *clks;
  int num_clks;
+ bool rpm_supported;
+




Can we not automatically infer this from whether clocks and/or power
domains are specified or not, then just use pm_runtime_enabled() as the
fast-path check as Tomasz originally proposed?



I wouldn't tie this to presence of clocks, since as a next step we
would want to actually control the clocks separately. (As far as I
understand, on QCom SoCs we might want to have runtime PM active for
the translation to work, but clocks gated whenever access to SMMU
registers is not needed.) Moreover, you might still have some super
high scale thousand-core systems that require clocks to be
prepare-enabled, but runtime PM would be undesirable for the reasons
we discussed before.



I worry that relying on statically-defined matchdata is just going to blow
up the driver and DT binding into a maintenance nightmare; I really don't
want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
within the SoC is in a separate controllable power domain while the others
aren't.



I don't see a reason why both couldn't just have RPM supported
regardless of whether there is a real power domain. It would
effectively be just a no-op for those that don't have one.



Because you're then effectively defining "compatible" values for the sake of
attaching software policy to them, rather than actually describing different
hardware implementations.

The fact that RPM can't do anything meaningful unless relevant clock/power
aspects *are* described, however, means that we shouldn't need additional
information redundant with that. Much like the fact that we don't *already*
have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
integrated such that IDR0.CTTW has the wrong value, since the presence or
not of the "dma-coherent" property already describes the truth in that
regard.


Fair enough.




IMHO the only reason to avoid having the RPM enabled is the scalability
issue we discussed before.



Yes, but that's kind of my point; in reality high throughput/minimal latency
and aggressive power management are more or less mutually exclusive. Mobile
SoCs with fine-grained clock trees and power domains won't have multiple
40GbE/NVMf/whatever links running flat out in parallel; conversely
networking/infrastructure/server SoCs aren't designed around saving every
last microamp of leakage current - even in the (fairly unlikely) case of the
interconnect clocks being software-gateable at all I would be very surprised
if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
*requires* clocks to be abstracted behind firmware).

Realistically then, explicit clocks are only expected on systems which care
about power management. We can always revisit that assumption if anything
crazy where it isn't the case ever becomes non-theoretical, but for now it's
one I'm entirely comfortable with. If on the other hand it turns out that we
can rely on just a power domain being present wherever we want RPM, making
clocks moot, then all the better.


Alright. Since Qcom would be the only user of clock and power handling
for the time being, I think checking power domain presence could work
for us. +/- the fact that clocks need to be handled even if power
domain is not present, but we should normally always have both.


Great! (the issue of Qcom-specific clock handling is a separate argument 
which I don't feel like reigniting just now...)



Now we need a way to do the check. Perhaps for the time being it would
be enough to just check for the power-domains property in DT?


AFAICS, it might be as simple as arm_smmu_probe() doing this:

/*
 * We want to avoid touching dev->power.lock in fastpaths unless
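 * it's really going to do something useful - pm_runtime_enabled()
 * can serve as an ideal proxy for that decision.
 */
if (dev->pm_domain)
    pm_runtime_enable(dev);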

Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-07 Thread Tomasz Figa
On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy  wrote:
> On 07/03/18 13:52, Tomasz Figa wrote:
>>
>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy  wrote:
>>>
>>> On 02/03/18 10:10, Vivek Gautam wrote:


 From: Sricharan R 

 The smmu device probe/remove and add/remove master device callbacks
 get called when the smmu is not linked to its master, that is, without
 the context of the master device. So call the runtime PM APIs
 separately in those places.

 Signed-off-by: Sricharan R 
 [vivek: Cleanup pm runtime calls]
 Signed-off-by: Vivek Gautam 
 ---
drivers/iommu/arm-smmu.c | 96
 
1 file changed, 88 insertions(+), 8 deletions(-)

 diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
 index c8b16f53f597..3d6a1875431f 100644
 --- a/drivers/iommu/arm-smmu.c
 +++ b/drivers/iommu/arm-smmu.c
 @@ -209,6 +209,8 @@ struct arm_smmu_device {
  struct clk_bulk_data *clks;
  int num_clks;
+ bool rpm_supported;
 +
>>>
>>>
>>>
>>> Can we not automatically infer this from whether clocks and/or power
>>> domains are specified or not, then just use pm_runtime_enabled() as the
>>> fast-path check as Tomasz originally proposed?
>>
>>
>> I wouldn't tie this to presence of clocks, since as a next step we
>> would want to actually control the clocks separately. (As far as I
>> understand, on QCom SoCs we might want to have runtime PM active for
>> the translation to work, but clocks gated whenever access to SMMU
>> registers is not needed.) Moreover, you might still have some super
>> high scale thousand-core systems that require clocks to be
>> prepare-enabled, but runtime PM would be undesirable for the reasons
>> we discussed before.
>>
>>>
>>> I worry that relying on statically-defined matchdata is just going to blow
>>> up the driver and DT binding into a maintenance nightmare; I really don't
>>> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
>>> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
>>> within the SoC is in a separate controllable power domain while the others
>>> aren't.
>>
>>
>> I don't see a reason why both couldn't just have RPM supported
>> regardless of whether there is a real power domain. It would
>> effectively be just a no-op for those that don't have one.
>
>
> Because you're then effectively defining "compatible" values for the sake of
> attaching software policy to them, rather than actually describing different
> hardware implementations.
>
> The fact that RPM can't do anything meaningful unless relevant clock/power
> aspects *are* described, however, means that we shouldn't need additional
> information redundant with that. Much like the fact that we don't *already*
> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
> integrated such that IDR0.CTTW has the wrong value, since the presence or
> not of the "dma-coherent" property already describes the truth in that
> regard.

Fair enough.

>
>> IMHO the only reason to avoid having the RPM enabled is the scalability
>> issue we discussed before.
>
>
> Yes, but that's kind of my point; in reality high throughput/minimal latency
> and aggressive power management are more or less mutually exclusive. Mobile
> SoCs with fine-grained clock trees and power domains won't have multiple
> 40GbE/NVMf/whatever links running flat out in parallel; conversely
> networking/infrastructure/server SoCs aren't designed around saving every
> last microamp of leakage current - even in the (fairly unlikely) case of the
> interconnect clocks being software-gateable at all I would be very surprised
> if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
> *requires* clocks to be abstracted behind firmware).
>
> Realistically then, explicit clocks are only expected on systems which care
> about power management. We can always revisit that assumption if anything
> crazy where it isn't the case ever becomes non-theoretical, but for now it's
> one I'm entirely comfortable with. If on the other hand it turns out that we
> can rely on just a power domain being present wherever we want RPM, making
> clocks moot, then all the better.

Alright. Since Qcom would be the only user of clock and power handling
for the time being, I think checking power domain presence could work
for us. +/- the fact that clocks need to be handled even if power
domain is not present, but we should normally always have both.
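One way to reconcile "clocks need to be handled even if power domain is not present" with gating runtime PM on the power domain is to enable clocks at probe unconditionally and let only runtime-PM-enabled instances gate them later. A hypothetical userspace sketch of that arrangement: the model_* names are illustrative, and while in the kernel the PM core would simply never invoke the suspend/resume callbacks with runtime PM disabled, the model checks the flag explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an SMMU instance with an optional PM domain. */
struct model_smmu_dev {
	const void *pm_domain;  /* NULL when no power domain was described */
	bool rpm_enabled;
	bool clocks_on;
};

/* Probe: clocks are needed either way, so turn them on unconditionally. */
static void model_probe(struct model_smmu_dev *s)
{
	s->clocks_on = true;            /* cf. clk_bulk_prepare_enable() */
	if (s->pm_domain)
		s->rpm_enabled = true;  /* cf. pm_runtime_set_active() + pm_runtime_enable() */
}

/* Runtime suspend: only has an effect when runtime PM is enabled. */
static void model_runtime_suspend(struct model_smmu_dev *s)
{
	if (s->rpm_enabled)
		s->clocks_on = false;   /* cf. clk_bulk_disable() */
}

/* Runtime resume: the counterpart of model_runtime_suspend(). */
static void model_runtime_resume(struct model_smmu_dev *s)
{
	if (s->rpm_enabled)
		s->clocks_on = true;    /* cf. clk_bulk_enable() */
}
```

With this split, an instance without a power domain keeps its clocks running for its whole lifetime, which matches the expectation that such systems do not care about fine-grained power management.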

Now we need a way to do the check. Perhaps for the time being it would
be enough to just check for the power-domains property in DT?

Best regards,
Tomasz


Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-07 Thread Robin Murphy

On 07/03/18 13:52, Tomasz Figa wrote:

On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy  wrote:

On 02/03/18 10:10, Vivek Gautam wrote:


From: Sricharan R 

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime PM APIs separately
in those places.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
---
 drivers/iommu/arm-smmu.c | 96

 1 file changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b16f53f597..3d6a1875431f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -209,6 +209,8 @@ struct arm_smmu_device {
 struct clk_bulk_data*clks;
 int num_clks;
   + boolrpm_supported;
+



Can we not automatically infer this from whether clocks and/or power domains
are specified or not, then just use pm_runtime_enabled() as the fast-path
check as Tomasz originally proposed?


I wouldn't tie this to presence of clocks, since as a next step we
would want to actually control the clocks separately. (As far as I
understand, on QCom SoCs we might want to have runtime PM active for
the translation to work, but clocks gated whenever access to SMMU
registers is not needed.) Moreover, you might still have some super
high scale thousand-core systems that require clocks to be
prepare-enabled, but runtime PM would be undesirable for the reasons
we discussed before.



I worry that relying on statically-defined matchdata is just going to blow
up the driver and DT binding into a maintenance nightmare; I really don't
want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
within the SoC is in a separate controllable power domain while the others
aren't.


I don't see a reason why both couldn't just have RPM supported
regardless of whether there is a real power domain. It would
effectively be just a no-op for those that don't have one.


Because you're then effectively defining "compatible" values for the 
sake of attaching software policy to them, rather than actually 
describing different hardware implementations.


The fact that RPM can't do anything meaningful unless relevant 
clock/power aspects *are* described, however, means that we shouldn't 
need additional information redundant with that. Much like the fact that 
we don't *already* have an "arm,juno-hdlcd-mmu-401" compatible to 
account for those being integrated such that IDR0.CTTW has the wrong 
value, since the presence or not of the "dma-coherent" property already 
describes the truth in that regard.



IMHO the
only reason to avoid having the RPM enabled is the scalability issue
we discussed before.


Yes, but that's kind of my point; in reality high throughput/minimal 
latency and aggressive power management are more or less mutually 
exclusive. Mobile SoCs with fine-grained clock trees and power domains 
won't have multiple 40GBe/NVMf/whatever links running flat out in 
parallel; conversely networking/infrastructure/server SoCs aren't 
designed around saving every last microamp of leakage current - even in 
the (fairly unlikely) case of the interconnect clocks being 
software-gateable at all I would be very surprised if that were ever 
exposed directly to Linux (FWIW I believe ACPI essentially *requires* 
clocks to be abstracted behind firmware).


Realistically then, explicit clocks are only expected on systems which 
care about power management. We can always revisit that assumption if 
anything crazy where it isn't the case ever becomes non-theoretical, but 
for now it's one I'm entirely comfortable with. If on the other hand it 
turns out that we can rely on just a power domain being present wherever 
we want RPM, making clocks moot, then all the better.


Robin.


Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-07 Thread Tomasz Figa
On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy  wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> From: Sricharan R 
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> get called when the smmu is not linked to its master, that is, without
>> the context of the master device. So call the runtime PM APIs in those
>> places separately.
>>
>> Signed-off-by: Sricharan R 
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam 
>> ---
>>  drivers/iommu/arm-smmu.c | 96
>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8b16f53f597..3d6a1875431f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>  	struct clk_bulk_data		*clks;
>>  	int				num_clks;
>>  
>> +	bool				rpm_supported;
>> +
>
>
> Can we not automatically infer this from whether clocks and/or power domains
> are specified or not, then just use pm_runtime_enabled() as the fast-path
> check as Tomasz originally proposed?

I wouldn't tie this to presence of clocks, since as a next step we
would want to actually control the clocks separately. (As far as I
understand, on QCom SoCs we might want to have runtime PM active for
the translation to work, but clocks gated whenever access to SMMU
registers is not needed.) Moreover, you might still have some super
high scale thousand-core systems that require clocks to be
prepare-enabled, but runtime PM would be undesirable for the reasons
we discussed before.

>
> I worry that relying on statically-defined matchdata is just going to blow
> up the driver and DT binding into a maintenance nightmare; I really don't
> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
> within the SoC is in a separate controllable power domain while the others
> aren't.

I don't see a reason why both couldn't just have RPM supported
regardless of whether there is a real power domain. It would
effectively be just a no-op for those that don't have one. IMHO the
only reason to avoid having the RPM enabled is the scalability issue
we discussed before.

Best regards,
Tomasz


Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-07 Thread Robin Murphy

On 02/03/18 10:10, Vivek Gautam wrote:

> From: Sricharan R 
>
> The smmu device probe/remove and add/remove master device callbacks
> get called when the smmu is not linked to its master, that is, without
> the context of the master device. So call the runtime PM APIs in those
> places separately.
>
> Signed-off-by: Sricharan R 
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm-smmu.c | 96
>  1 file changed, 88 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8b16f53f597..3d6a1875431f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>  	struct clk_bulk_data		*clks;
>  	int				num_clks;
>  
> +	bool				rpm_supported;
> +


Can we not automatically infer this from whether clocks and/or power 
domains are specified or not, then just use pm_runtime_enabled() as the 
fast-path check as Tomasz originally proposed?


I worry that relying on statically-defined matchdata is just going to 
blow up the driver and DT binding into a maintenance nightmare; I really 
don't want to start needing separate definitions for e.g. 
"arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one 
otherwise-identical instance within the SoC is in a separate 
controllable power domain while the others aren't.


Robin.


>  	u32				cavium_id_base; /* Specific to Cavium */
>  
>  	spinlock_t			global_sync_lock;
> @@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>  	{ 0, NULL},
>  };
>  
> +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> +{
> +	if (smmu->rpm_supported)
> +		return pm_runtime_get_sync(smmu->dev);
> +
> +	return 0;
> +}
> +
> +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
> +{
> +	if (smmu->rpm_supported)
> +		pm_runtime_put(smmu->dev);
> +}
> +
>  static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>  {
>  	return container_of(dom, struct arm_smmu_domain, domain);
> @@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	int irq;
> +	int ret, irq;
>  
>  	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
>  		return;
>  
> +	ret = arm_smmu_rpm_get(smmu);
> +	if (ret < 0)
> +		return;
> +
>  	/*
>  	 * Disable the context bank and free the page tables before freeing
>  	 * it.
> @@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  
>  	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>  	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
> +
> +	arm_smmu_rpm_put(smmu);
>  }
>  
>  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> @@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		return -ENODEV;
>  
>  	smmu = fwspec_smmu(fwspec);
> +
> +	ret = arm_smmu_rpm_get(smmu);
> +	if (ret < 0)
> +		return ret;
> +
>  	/* Ensure that the domain is finalised */
>  	ret = arm_smmu_init_domain_context(domain, smmu);
>  	if (ret < 0)
> -		return ret;
> +		goto rpm_put;
>  
>  	/*
>  	 * Sanity check the domain. We don't support domains across
> @@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	}
>  
>  	/* Looks ok, so add the device to the domain */
> -	return arm_smmu_domain_add_master(smmu_domain, fwspec);
> +	ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
> +
> +	arm_smmu_rpm_put(smmu);
> +
> +	return ret;
>  
>  destroy_domain:
>  	arm_smmu_destroy_domain_context(domain);
> +rpm_put:
> +	arm_smmu_rpm_put(smmu);
> +
>  	return ret;
>  }
>  
> @@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>  			phys_addr_t paddr, size_t size, int prot)
>  {
>  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	int ret;
>  
>  	if (!ops)
>  		return -ENODEV;
>  
> -	return ops->map(ops, iova, paddr, size, prot);
> +	arm_smmu_rpm_get(smmu);
> +	ret = ops->map(ops, iova, paddr, size, prot);
> +	arm_smmu_rpm_put(smmu);
> +
> +	return ret;
>  }
>  
>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>  			     size_t size)
>  {
>  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct 

[PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device

2018-03-02 Thread Vivek Gautam
From: Sricharan R 

The smmu device probe/remove and add/remove master device callbacks
get called when the smmu is not linked to its master, that is, without
the context of the master device. So call the runtime PM APIs in those
places separately.

Signed-off-by: Sricharan R 
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam 
---
 drivers/iommu/arm-smmu.c | 96 
 1 file changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b16f53f597..3d6a1875431f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -209,6 +209,8 @@ struct arm_smmu_device {
 	struct clk_bulk_data		*clks;
 	int				num_clks;
 
+	bool				rpm_supported;
+
 	u32				cavium_id_base; /* Specific to Cavium */
 
 	spinlock_t			global_sync_lock;
@@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ 0, NULL},
 };
 
+static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+	if (smmu->rpm_supported)
+		return pm_runtime_get_sync(smmu->dev);
+
+	return 0;
+}
+
+static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+	if (smmu->rpm_supported)
+		pm_runtime_put(smmu->dev);
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct arm_smmu_domain, domain);
@@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	int irq;
+	int ret, irq;
 
 	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
 		return;
 
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		return;
+
 	/*
 	 * Disable the context bank and free the page tables before freeing
 	 * it.
@@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+	arm_smmu_rpm_put(smmu);
 }
 
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return -ENODEV;
 
 	smmu = fwspec_smmu(fwspec);
+
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		return ret;
+
 	/* Ensure that the domain is finalised */
 	ret = arm_smmu_init_domain_context(domain, smmu);
 	if (ret < 0)
-		return ret;
+		goto rpm_put;
 
 	/*
 	 * Sanity check the domain. We don't support domains across
@@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 
 	/* Looks ok, so add the device to the domain */
-	return arm_smmu_domain_add_master(smmu_domain, fwspec);
+	ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 
 destroy_domain:
 	arm_smmu_destroy_domain_context(domain);
+rpm_put:
+	arm_smmu_rpm_put(smmu);
+
 	return ret;
 }
 
@@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 			phys_addr_t paddr, size_t size, int prot)
 {
 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	int ret;
 
 	if (!ops)
 		return -ENODEV;
 
-	return ops->map(ops, iova, paddr, size, prot);
+	arm_smmu_rpm_get(smmu);
+	ret = ops->map(ops, iova, paddr, size, prot);
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 }
 
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size)
 {
 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	size_t ret;
 
 	if (!ops)
 		return 0;
 
-	return ops->unmap(ops, iova, size);
+	arm_smmu_rpm_get(smmu);
+	ret = ops->unmap(ops, iova, size);
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 }
 
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -1412,14 +1460,22 @@ static int arm_smmu_add_device(struct device *dev)
 	while (i--)
 		cfg->smendx[i] = INVALID_SMENDX;
 
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		goto