Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
[ +Lorenzo ]

On 09/03/18 04:50, Tomasz Figa wrote:
[...]
>>> Now we need a way to do the check. Perhaps for the time being it would
>>> be enough to just check for the power-domains property in DT?
>>
>> AFAICS, it might be as simple as arm_smmu_probe() doing this:
>>
>>         /*
>>          * We want to avoid touching dev->power.lock in fastpaths unless
>>          * it's really going to do something useful - pm_runtime_enabled()
>>          * can serve as an ideal proxy for that decision.
>>          */
>>         if (dev->pm_domain)
>>                 pm_runtime_enable(dev);
>>
>> or maybe even just gate all the calls with "if (smmu->dev.pm_domain)"
>> directly (like pcie-mediatek does), but I'm not sure which would be
>> conceptually cleaner.
>
> Okay, that was easier than I expected. Thanks. :)
>
> Actually, there is one more thing that might need rechecking. Are you
> sure that dev->pm_domain is NULL for the devices, for which we don't
> want runtime PM to be enabled? I think ACPI was mentioned and ACPI
> includes the concept of PM domains.

Thanks for pointing that out - thankfully, I've confirmed that the SMMUs
on my Juno don't have dev->pm_domain set when booting with ACPI, and
double-checking the ACPI code I think we're OK here. Since the SMMUs are
only described in the static IORT table and not in the ACPI namespace,
they won't have the ACPI companion device that acpi_dev_pm_attach() looks
for, and thus should always be ignored. Lorenzo, do I have that right?

Robin.
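For reference, the gating idea discussed in this message can be tried out in isolation. What follows is only a userspace sketch, not driver code: struct device, struct dev_pm_domain and the pm_runtime_*() helpers are simplified mock-ups of the kernel ones, and smmu_probe_rpm(), smmu_rpm_get() and smmu_rpm_put() are hypothetical names that do not appear in the patch. It assumes the approach suggested above, i.e. enabling runtime PM only when dev->pm_domain is set and then using pm_runtime_enabled() as the fast-path check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct dev_pm_domain { int unused; };

struct device {
	struct dev_pm_domain *pm_domain; /* NULL when no PM domain is attached */
	bool rpm_enabled;                /* mock of the runtime PM enabled state */
	int usage_count;                 /* mock of the runtime PM usage counter */
};

/* Mock helpers mirroring the names of the kernel runtime PM API. */
static void pm_runtime_enable(struct device *dev)
{
	dev->rpm_enabled = true;
}

static bool pm_runtime_enabled(struct device *dev)
{
	return dev->rpm_enabled;
}

static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return 0;
}

static int pm_runtime_put(struct device *dev)
{
	/* Catch unbalanced puts in this mock. */
	assert(dev->usage_count > 0);
	dev->usage_count--;
	return 0;
}

/* Probe-time decision (hypothetical helper): enable runtime PM only
 * when a PM domain is actually attached to the device. */
static void smmu_probe_rpm(struct device *dev)
{
	if (dev->pm_domain)
		pm_runtime_enable(dev);
}

/* Fast-path wrappers (hypothetical names): skip the runtime PM calls
 * entirely unless runtime PM was enabled at probe time. */
static int smmu_rpm_get(struct device *dev)
{
	if (pm_runtime_enabled(dev))
		return pm_runtime_get_sync(dev);
	return 0;
}

static void smmu_rpm_put(struct device *dev)
{
	if (pm_runtime_enabled(dev))
		pm_runtime_put(dev);
}
```

With this shape, a device without a PM domain never reaches the get/put calls, which is the fast-path property the comment in the arm_smmu_probe() snippet is after.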
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Thu, Mar 8, 2018 at 9:12 PM, Robin Murphy wrote:
> On 08/03/18 04:33, Tomasz Figa wrote:
>> On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy wrote:
>>> On 07/03/18 13:52, Tomasz Figa wrote:
>>>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>>>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>>> From: Sricharan R
>>>>>>
>>>>>> The smmu device probe/remove and add/remove master device callbacks
>>>>>> gets called when the smmu is not linked to its master, that is without
>>>>>> the context of the master device. So calling runtime apis in those
>>>>>> places separately.
>>>>>>
>>>>>> Signed-off-by: Sricharan R
>>>>>> [vivek: Cleanup pm runtime calls]
>>>>>> Signed-off-by: Vivek Gautam
>>>>>> ---
>>>>>>  drivers/iommu/arm-smmu.c | 96
>>>>>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>>> index c8b16f53f597..3d6a1875431f 100644
>>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>>>>          struct clk_bulk_data *clks;
>>>>>>          int num_clks;
>>>>>> +        bool rpm_supported;
>>>>>> +
>>>>>
>>>>> Can we not automatically infer this from whether clocks and/or power
>>>>> domains are specified or not, then just use pm_runtime_enabled() as
>>>>> the fast-path check as Tomasz originally proposed?
>>>>
>>>> I wouldn't tie this to presence of clocks, since as a next step we
>>>> would want to actually control the clocks separately. (As far as I
>>>> understand, on QCom SoCs we might want to have runtime PM active for
>>>> the translation to work, but clocks gated whenever access to SMMU
>>>> registers is not needed.) Moreover, you might still have some super
>>>> high scale thousand-core systems that require clocks to be
>>>> prepare-enabled, but runtime PM would be undesirable for the reasons
>>>> we discussed before.
>>>>
>>>>> I worry that relying on statically-defined matchdata is just going to
>>>>> blow up the driver and DT binding into a maintenance nightmare; I
>>>>> really don't want to start needing separate definitions for e.g.
>>>>> "arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
>>>>> otherwise-identical instance within the SoC is in a separate
>>>>> controllable power domain while the others aren't.
>>>>
>>>> I don't see a reason why both couldn't just have RPM supported
>>>> regardless of whether there is a real power domain. It would
>>>> effectively be just a no-op for those that don't have one.
>>>
>>> Because you're then effectively defining "compatible" values for the
>>> sake of attaching software policy to them, rather than actually
>>> describing different hardware implementations.
>>>
>>> The fact that RPM can't do anything meaningful unless relevant
>>> clock/power aspects *are* described, however, means that we shouldn't
>>> need additional information redundant with that. Much like the fact
>>> that we don't *already* have an "arm,juno-hdlcd-mmu-401" compatible to
>>> account for those being integrated such that IDR0.CTTW has the wrong
>>> value, since the presence or not of the "dma-coherent" property already
>>> describes the truth in that regard.
>>
>> Fair enough.
>>
>>>> IMHO the only reason to avoid having the RPM enabled is the
>>>> scalability issue we discussed before.
>>>
>>> Yes, but that's kind of my point; in reality high throughput/minimal
>>> latency and aggressive power management are more or less mutually
>>> exclusive. Mobile SoCs with fine-grained clock trees and power domains
>>> won't have multiple 40GBe/NVMf/whatever links running flat out in
>>> parallel; conversely networking/infrastructure/server SoCs aren't
>>> designed around saving every last microamp of leakage current - even in
>>> the (fairly unlikely) case of the interconnect clocks being
>>> software-gateable at all I would be very surprised if that were ever
>>> exposed directly to Linux (FWIW I believe ACPI essentially *requires*
>>> clocks to be abstracted behind firmware).
>>>
>>> Realistically then, explicit clocks are only expected on systems which
>>> care about power management. We can always revisit that assumption if
>>> anything crazy where it isn't the case ever becomes non-theoretical,
>>> but for now it's one I'm entirely comfortable with. If on the other
>>> hand it turns out that we can rely on just a power domain being present
>>> wherever we want RPM, making clocks moot, then all the better.
>>
>> Alright. Since Qcom would be the only user of clock and power handling
>> for the time being, I think checking power domain presence could work
>> [...]
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 08/03/18 04:33, Tomasz Figa wrote:
> On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy wrote:
>> On 07/03/18 13:52, Tomasz Figa wrote:
>>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>>> From: Sricharan R
>>>>>
>>>>> The smmu device probe/remove and add/remove master device callbacks
>>>>> gets called when the smmu is not linked to its master, that is without
>>>>> the context of the master device. So calling runtime apis in those
>>>>> places separately.
>>>>>
>>>>> Signed-off-by: Sricharan R
>>>>> [vivek: Cleanup pm runtime calls]
>>>>> Signed-off-by: Vivek Gautam
>>>>> ---
>>>>>  drivers/iommu/arm-smmu.c | 96
>>>>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>>> index c8b16f53f597..3d6a1875431f 100644
>>>>> --- a/drivers/iommu/arm-smmu.c
>>>>> +++ b/drivers/iommu/arm-smmu.c
>>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>>>          struct clk_bulk_data *clks;
>>>>>          int num_clks;
>>>>> +        bool rpm_supported;
>>>>> +
>>>>
>>>> Can we not automatically infer this from whether clocks and/or power
>>>> domains are specified or not, then just use pm_runtime_enabled() as
>>>> the fast-path check as Tomasz originally proposed?
>>>
>>> I wouldn't tie this to presence of clocks, since as a next step we
>>> would want to actually control the clocks separately. (As far as I
>>> understand, on QCom SoCs we might want to have runtime PM active for
>>> the translation to work, but clocks gated whenever access to SMMU
>>> registers is not needed.) Moreover, you might still have some super
>>> high scale thousand-core systems that require clocks to be
>>> prepare-enabled, but runtime PM would be undesirable for the reasons
>>> we discussed before.
>>>
>>>> I worry that relying on statically-defined matchdata is just going to
>>>> blow up the driver and DT binding into a maintenance nightmare; I
>>>> really don't want to start needing separate definitions for e.g.
>>>> "arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
>>>> otherwise-identical instance within the SoC is in a separate
>>>> controllable power domain while the others aren't.
>>>
>>> I don't see a reason why both couldn't just have RPM supported
>>> regardless of whether there is a real power domain. It would
>>> effectively be just a no-op for those that don't have one.
>>
>> Because you're then effectively defining "compatible" values for the
>> sake of attaching software policy to them, rather than actually
>> describing different hardware implementations.
>>
>> The fact that RPM can't do anything meaningful unless relevant
>> clock/power aspects *are* described, however, means that we shouldn't
>> need additional information redundant with that. Much like the fact
>> that we don't *already* have an "arm,juno-hdlcd-mmu-401" compatible to
>> account for those being integrated such that IDR0.CTTW has the wrong
>> value, since the presence or not of the "dma-coherent" property already
>> describes the truth in that regard.
>
> Fair enough.
>
>>> IMHO the only reason to avoid having the RPM enabled is the
>>> scalability issue we discussed before.
>>
>> Yes, but that's kind of my point; in reality high throughput/minimal
>> latency and aggressive power management are more or less mutually
>> exclusive. Mobile SoCs with fine-grained clock trees and power domains
>> won't have multiple 40GBe/NVMf/whatever links running flat out in
>> parallel; conversely networking/infrastructure/server SoCs aren't
>> designed around saving every last microamp of leakage current - even in
>> the (fairly unlikely) case of the interconnect clocks being
>> software-gateable at all I would be very surprised if that were ever
>> exposed directly to Linux (FWIW I believe ACPI essentially *requires*
>> clocks to be abstracted behind firmware).
>>
>> Realistically then, explicit clocks are only expected on systems which
>> care about power management. We can always revisit that assumption if
>> anything crazy where it isn't the case ever becomes non-theoretical,
>> but for now it's one I'm entirely comfortable with. If on the other
>> hand it turns out that we can rely on just a power domain being present
>> wherever we want RPM, making clocks moot, then all the better.
>
> Alright. Since Qcom would be the only user of clock and power handling
> for the time being, I think checking power domain presence could work
> for us. +/- the fact that clocks need to be handled even if power domain
> is not present, but we should normally always have both.

Great! (the issue of Qcom-specific clock handling is a separate argument
which I don't feel like reigniting just now...)

> Now we need a way to do the check. Perhaps for the time being it would
> be enough to just check for the power-domains property in DT?

AFAICS, it might be as simple as arm_smmu_probe() doing this:

        /*
         * We want to avoid touching dev->power.lock in fastpaths unless
         * it's really going to do something useful - pm_runtime_enabled()
         * can serve as an ideal proxy for that decision.
         */
        if [...]
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Thu, Mar 8, 2018 at 1:58 AM, Robin Murphy wrote:
> On 07/03/18 13:52, Tomasz Figa wrote:
>> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>>> On 02/03/18 10:10, Vivek Gautam wrote:
>>>> From: Sricharan R
>>>>
>>>> The smmu device probe/remove and add/remove master device callbacks
>>>> gets called when the smmu is not linked to its master, that is without
>>>> the context of the master device. So calling runtime apis in those
>>>> places separately.
>>>>
>>>> Signed-off-by: Sricharan R
>>>> [vivek: Cleanup pm runtime calls]
>>>> Signed-off-by: Vivek Gautam
>>>> ---
>>>>  drivers/iommu/arm-smmu.c | 96
>>>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>>> index c8b16f53f597..3d6a1875431f 100644
>>>> --- a/drivers/iommu/arm-smmu.c
>>>> +++ b/drivers/iommu/arm-smmu.c
>>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>>          struct clk_bulk_data *clks;
>>>>          int num_clks;
>>>> +        bool rpm_supported;
>>>> +
>>>
>>> Can we not automatically infer this from whether clocks and/or power
>>> domains are specified or not, then just use pm_runtime_enabled() as the
>>> fast-path check as Tomasz originally proposed?
>>
>> I wouldn't tie this to presence of clocks, since as a next step we
>> would want to actually control the clocks separately. (As far as I
>> understand, on QCom SoCs we might want to have runtime PM active for
>> the translation to work, but clocks gated whenever access to SMMU
>> registers is not needed.) Moreover, you might still have some super
>> high scale thousand-core systems that require clocks to be
>> prepare-enabled, but runtime PM would be undesirable for the reasons
>> we discussed before.
>>
>>> I worry that relying on statically-defined matchdata is just going to
>>> blow up the driver and DT binding into a maintenance nightmare; I
>>> really don't want to start needing separate definitions for e.g.
>>> "arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
>>> otherwise-identical instance within the SoC is in a separate
>>> controllable power domain while the others aren't.
>>
>> I don't see a reason why both couldn't just have RPM supported
>> regardless of whether there is a real power domain. It would
>> effectively be just a no-op for those that don't have one.
>
> Because you're then effectively defining "compatible" values for the sake of
> attaching software policy to them, rather than actually describing different
> hardware implementations.
>
> The fact that RPM can't do anything meaningful unless relevant clock/power
> aspects *are* described, however, means that we shouldn't need additional
> information redundant with that. Much like the fact that we don't *already*
> have an "arm,juno-hdlcd-mmu-401" compatible to account for those being
> integrated such that IDR0.CTTW has the wrong value, since the presence or
> not of the "dma-coherent" property already describes the truth in that
> regard.

Fair enough.

>> IMHO the
>> only reason to avoid having the RPM enabled is the scalability issue
>> we discussed before.
>
> Yes, but that's kind of my point; in reality high throughput/minimal latency
> and aggressive power management are more or less mutually exclusive. Mobile
> SoCs with fine-grained clock trees and power domains won't have multiple
> 40GBe/NVMf/whatever links running flat out in parallel; conversely
> networking/infrastructure/server SoCs aren't designed around saving every
> last microamp of leakage current - even in the (fairly unlikely) case of the
> interconnect clocks being software-gateable at all I would be very surprised
> if that were ever exposed directly to Linux (FWIW I believe ACPI essentially
> *requires* clocks to be abstracted behind firmware).
>
> Realistically then, explicit clocks are only expected on systems which care
> about power management. We can always revisit that assumption if anything
> crazy where it isn't the case ever becomes non-theoretical, but for now it's
> one I'm entirely comfortable with. If on the other hand it turns out that we
> can rely on just a power domain being present wherever we want RPM, making
> clocks moot, then all the better.

Alright. Since Qcom would be the only user of clock and power handling
for the time being, I think checking power domain presence could work
for us. +/- the fact that clocks need to be handled even if power domain
is not present, but we should normally always have both.

Now we need a way to do the check. Perhaps for the time being it would
be enough to just check for the power-domains property in DT?

Best regards,
Tomasz
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 07/03/18 13:52, Tomasz Figa wrote:
> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>> On 02/03/18 10:10, Vivek Gautam wrote:
>>> From: Sricharan R
>>>
>>> The smmu device probe/remove and add/remove master device callbacks
>>> gets called when the smmu is not linked to its master, that is without
>>> the context of the master device. So calling runtime apis in those
>>> places separately.
>>>
>>> Signed-off-by: Sricharan R
>>> [vivek: Cleanup pm runtime calls]
>>> Signed-off-by: Vivek Gautam
>>> ---
>>>  drivers/iommu/arm-smmu.c | 96
>>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index c8b16f53f597..3d6a1875431f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>          struct clk_bulk_data *clks;
>>>          int num_clks;
>>> +        bool rpm_supported;
>>> +
>>
>> Can we not automatically infer this from whether clocks and/or power
>> domains are specified or not, then just use pm_runtime_enabled() as the
>> fast-path check as Tomasz originally proposed?
>
> I wouldn't tie this to presence of clocks, since as a next step we
> would want to actually control the clocks separately. (As far as I
> understand, on QCom SoCs we might want to have runtime PM active for
> the translation to work, but clocks gated whenever access to SMMU
> registers is not needed.) Moreover, you might still have some super
> high scale thousand-core systems that require clocks to be
> prepare-enabled, but runtime PM would be undesirable for the reasons
> we discussed before.
>
>> I worry that relying on statically-defined matchdata is just going to
>> blow up the driver and DT binding into a maintenance nightmare; I
>> really don't want to start needing separate definitions for e.g.
>> "arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
>> otherwise-identical instance within the SoC is in a separate
>> controllable power domain while the others aren't.
>
> I don't see a reason why both couldn't just have RPM supported
> regardless of whether there is a real power domain. It would
> effectively be just a no-op for those that don't have one.

Because you're then effectively defining "compatible" values for the sake
of attaching software policy to them, rather than actually describing
different hardware implementations.

The fact that RPM can't do anything meaningful unless relevant
clock/power aspects *are* described, however, means that we shouldn't
need additional information redundant with that. Much like the fact that
we don't *already* have an "arm,juno-hdlcd-mmu-401" compatible to account
for those being integrated such that IDR0.CTTW has the wrong value, since
the presence or not of the "dma-coherent" property already describes the
truth in that regard.

> IMHO the only reason to avoid having the RPM enabled is the
> scalability issue we discussed before.

Yes, but that's kind of my point; in reality high throughput/minimal
latency and aggressive power management are more or less mutually
exclusive. Mobile SoCs with fine-grained clock trees and power domains
won't have multiple 40GBe/NVMf/whatever links running flat out in
parallel; conversely networking/infrastructure/server SoCs aren't
designed around saving every last microamp of leakage current - even in
the (fairly unlikely) case of the interconnect clocks being
software-gateable at all I would be very surprised if that were ever
exposed directly to Linux (FWIW I believe ACPI essentially *requires*
clocks to be abstracted behind firmware).

Realistically then, explicit clocks are only expected on systems which
care about power management. We can always revisit that assumption if
anything crazy where it isn't the case ever becomes non-theoretical, but
for now it's one I'm entirely comfortable with. If on the other hand it
turns out that we can rely on just a power domain being present wherever
we want RPM, making clocks moot, then all the better.

Robin.
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 07/03/18 13:52, Tomasz Figa wrote:
> On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
>> On 02/03/18 10:10, Vivek Gautam wrote:
>>> From: Sricharan R
>>>
>>> The smmu device probe/remove and add/remove master device callbacks
>>> gets called when the smmu is not linked to its master, that is without
>>> the context of the master device. So calling runtime apis in those places
>>> separately.
>>>
>>> Signed-off-by: Sricharan R
>>> [vivek: Cleanup pm runtime calls]
>>> Signed-off-by: Vivek Gautam
>>> ---
>>>  drivers/iommu/arm-smmu.c | 96
>>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>>> index c8b16f53f597..3d6a1875431f 100644
>>> --- a/drivers/iommu/arm-smmu.c
>>> +++ b/drivers/iommu/arm-smmu.c
>>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>>  	struct clk_bulk_data *clks;
>>>  	int num_clks;
>>> +	bool rpm_supported;
>>> +
>>
>> Can we not automatically infer this from whether clocks and/or power
>> domains are specified or not, then just use pm_runtime_enabled() as the
>> fast-path check as Tomasz originally proposed?
>
> I wouldn't tie this to presence of clocks, since as a next step we
> would want to actually control the clocks separately. (As far as I
> understand, on QCom SoCs we might want to have runtime PM active for
> the translation to work, but clocks gated whenever access to SMMU
> registers is not needed.) Moreover, you might still have some super
> high scale thousand-core systems that require clocks to be
> prepare-enabled, but runtime PM would be undesirable for the reasons
> we discussed before.
>
>> I worry that relying on statically-defined matchdata is just going to
>> blow up the driver and DT binding into a maintenance nightmare; I really
>> don't want to start needing separate definitions for e.g.
>> "arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
>> otherwise-identical instance within the SoC is in a separate
>> controllable power domain while the others aren't.
>
> I don't see a reason why both couldn't just have RPM supported
> regardless of whether there is a real power domain. It would
> effectively be just a no-op for those that don't have one.

Because you're then effectively defining "compatible" values for the
sake of attaching software policy to them, rather than actually
describing different hardware implementations. The fact that RPM can't
do anything meaningful unless relevant clock/power aspects *are*
described, however, means that we shouldn't need additional information
redundant with that. Much like the fact that we don't *already* have an
"arm,juno-hdlcd-mmu-401" compatible to account for those being
integrated such that IDR0.CTTW has the wrong value, since the presence
or not of the "dma-coherent" property already describes the truth in
that regard.

> IMHO the only reason to avoid having the RPM enabled is the
> scalability issue we discussed before.

Yes, but that's kind of my point; in reality high throughput/minimal
latency and aggressive power management are more or less mutually
exclusive. Mobile SoCs with fine-grained clock trees and power domains
won't have multiple 40GBe/NVMf/whatever links running flat out in
parallel; conversely networking/infrastructure/server SoCs aren't
designed around saving every last microamp of leakage current - even in
the (fairly unlikely) case of the interconnect clocks being
software-gateable at all I would be very surprised if that were ever
exposed directly to Linux (FWIW I believe ACPI essentially *requires*
clocks to be abstracted behind firmware).

Realistically then, explicit clocks are only expected on systems which
care about power management. We can always revisit that assumption if
anything crazy where it isn't the case ever becomes non-theoretical, but
for now it's one I'm entirely comfortable with. If on the other hand it
turns out that we can rely on just a power domain being present wherever
we want RPM, making clocks moot, then all the better.

Robin.
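For readers following the thread, the approach argued for here (infer
runtime PM support from what the firmware describes, then use
pm_runtime_enabled() as the fast-path check) can be sketched roughly as
below. This is only a user-space mock under stated assumptions, not
driver code: struct mock_device and every mock_* function are
hypothetical stand-ins for struct device and the kernel's pm_runtime_*()
helpers, and the pm_calls counter exists purely to show when the slow
path is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct device; NOT the kernel type. */
struct mock_device {
	void *pm_domain;   /* non-NULL when a power domain is attached */
	bool rpm_enabled;  /* models what pm_runtime_enabled() would report */
	int usage_count;   /* models the runtime PM usage counter */
	int pm_calls;      /* counts trips through the (costly) PM path */
};

/* Mock of pm_runtime_enable(): just records that runtime PM is on. */
static void mock_pm_runtime_enable(struct mock_device *dev)
{
	dev->rpm_enabled = true;
}

/* Mock of pm_runtime_enabled(): a cheap check that takes no lock here. */
static bool mock_pm_runtime_enabled(const struct mock_device *dev)
{
	return dev->rpm_enabled;
}

/* Mock of pm_runtime_get_sync(): always succeeds in this sketch. */
static int mock_pm_runtime_get_sync(struct mock_device *dev)
{
	dev->pm_calls++;
	dev->usage_count++;
	return 0;
}

/* Mock of pm_runtime_put(): drops the reference taken by get_sync. */
static void mock_pm_runtime_put(struct mock_device *dev)
{
	assert(dev->usage_count > 0);
	dev->pm_calls++;
	dev->usage_count--;
}

/* Probe time: enable runtime PM only if a power domain was described. */
static void mock_smmu_probe(struct mock_device *dev)
{
	if (dev->pm_domain)
		mock_pm_runtime_enable(dev);
}

/* Fast paths: skip the PM core entirely unless runtime PM is enabled. */
static int mock_smmu_rpm_get(struct mock_device *dev)
{
	if (mock_pm_runtime_enabled(dev))
		return mock_pm_runtime_get_sync(dev);

	return 0;
}

static void mock_smmu_rpm_put(struct mock_device *dev)
{
	if (mock_pm_runtime_enabled(dev))
		mock_pm_runtime_put(dev);
}
```

With this shape no per-compatible "rpm_supported" flag is needed: an
instance without a described power domain never enters the PM path at
all, which is the property being asked for above.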
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On Wed, Mar 7, 2018 at 9:38 PM, Robin Murphy wrote:
> On 02/03/18 10:10, Vivek Gautam wrote:
>>
>> From: Sricharan R
>>
>> The smmu device probe/remove and add/remove master device callbacks
>> gets called when the smmu is not linked to its master, that is without
>> the context of the master device. So calling runtime apis in those places
>> separately.
>>
>> Signed-off-by: Sricharan R
>> [vivek: Cleanup pm runtime calls]
>> Signed-off-by: Vivek Gautam
>> ---
>>  drivers/iommu/arm-smmu.c | 96
>>
>>  1 file changed, 88 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8b16f53f597..3d6a1875431f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>>  	struct clk_bulk_data *clks;
>>  	int num_clks;
>> +	bool rpm_supported;
>> +
>
>
> Can we not automatically infer this from whether clocks and/or power domains
> are specified or not, then just use pm_runtime_enabled() as the fast-path
> check as Tomasz originally proposed?

I wouldn't tie this to presence of clocks, since as a next step we
would want to actually control the clocks separately. (As far as I
understand, on QCom SoCs we might want to have runtime PM active for
the translation to work, but clocks gated whenever access to SMMU
registers is not needed.) Moreover, you might still have some super
high scale thousand-core systems that require clocks to be
prepare-enabled, but runtime PM would be undesirable for the reasons
we discussed before.

>
> I worry that relying on statically-defined matchdata is just going to blow
> up the driver and DT binding into a maintenance nightmare; I really don't
> want to start needing separate definitions for e.g. "arm,juno-etr-mmu-401"
> and "arm,juno-hdlcd-mmu-401" just because one otherwise-identical instance
> within the SoC is in a separate controllable power domain while the others
> aren't.

I don't see a reason why both couldn't just have RPM supported
regardless of whether there is a real power domain. It would
effectively be just a no-op for those that don't have one.

IMHO the only reason to avoid having the RPM enabled is the
scalability issue we discussed before.

Best regards,
Tomasz
Re: [PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
On 02/03/18 10:10, Vivek Gautam wrote:
> From: Sricharan R
>
> The smmu device probe/remove and add/remove master device callbacks
> gets called when the smmu is not linked to its master, that is without
> the context of the master device. So calling runtime apis in those places
> separately.
>
> Signed-off-by: Sricharan R
> [vivek: Cleanup pm runtime calls]
> Signed-off-by: Vivek Gautam
> ---
>  drivers/iommu/arm-smmu.c | 96
>  1 file changed, 88 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8b16f53f597..3d6a1875431f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -209,6 +209,8 @@ struct arm_smmu_device {
>  	struct clk_bulk_data *clks;
>  	int num_clks;
> +	bool rpm_supported;
> +

Can we not automatically infer this from whether clocks and/or power
domains are specified or not, then just use pm_runtime_enabled() as the
fast-path check as Tomasz originally proposed?

I worry that relying on statically-defined matchdata is just going to
blow up the driver and DT binding into a maintenance nightmare; I really
don't want to start needing separate definitions for e.g.
"arm,juno-etr-mmu-401" and "arm,juno-hdlcd-mmu-401" just because one
otherwise-identical instance within the SoC is in a separate
controllable power domain while the others aren't.

Robin.

>  	u32 cavium_id_base; /* Specific to Cavium */
>  	spinlock_t global_sync_lock;
> @@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>  	{ 0, NULL},
>  };
> +static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> +{
> +	if (smmu->rpm_supported)
> +		return pm_runtime_get_sync(smmu->dev);
> +
> +	return 0;
> +}
> +
> +static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
> +{
> +	if (smmu->rpm_supported)
> +		pm_runtime_put(smmu->dev);
> +}
> +
>  static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>  {
>  	return container_of(dom, struct arm_smmu_domain, domain);
> @@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	int irq;
> +	int ret, irq;
>  	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
>  		return;
> +	ret = arm_smmu_rpm_get(smmu);
> +	if (ret < 0)
> +		return;
> +
>  	/*
>  	 * Disable the context bank and free the page tables before freeing
>  	 * it.
> @@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
>  	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
>  	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
> +
> +	arm_smmu_rpm_put(smmu);
>  }
>  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
> @@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		return -ENODEV;
>  	smmu = fwspec_smmu(fwspec);
> +
> +	ret = arm_smmu_rpm_get(smmu);
> +	if (ret < 0)
> +		return ret;
> +
>  	/* Ensure that the domain is finalised */
>  	ret = arm_smmu_init_domain_context(domain, smmu);
>  	if (ret < 0)
> -		return ret;
> +		goto rpm_put;
>  	/*
>  	 * Sanity check the domain. We don't support domains across
> @@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	}
>  	/* Looks ok, so add the device to the domain */
> -	return arm_smmu_domain_add_master(smmu_domain, fwspec);
> +	ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
> +
> +	arm_smmu_rpm_put(smmu);
> +
> +	return ret;
>  destroy_domain:
>  	arm_smmu_destroy_domain_context(domain);
> +rpm_put:
> +	arm_smmu_rpm_put(smmu);
> +
>  	return ret;
>  }
> @@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
>  			phys_addr_t paddr, size_t size, int prot)
>  {
>  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	int ret;
>  	if (!ops)
>  		return -ENODEV;
> -	return ops->map(ops, iova, paddr, size, prot);
> +	arm_smmu_rpm_get(smmu);
> +	ret = ops->map(ops, iova, paddr, size, prot);
> +	arm_smmu_rpm_put(smmu);
> +
> +	return ret;
>  }
>  static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>  			     size_t size)
>  {
>  	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
> +	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> +	struct
[PATCH v8 3/5] iommu/arm-smmu: Invoke pm_runtime during probe, add/remove device
From: Sricharan R

The smmu device probe/remove and add/remove master device callbacks
gets called when the smmu is not linked to its master, that is without
the context of the master device. So calling runtime apis in those places
separately.

Signed-off-by: Sricharan R
[vivek: Cleanup pm runtime calls]
Signed-off-by: Vivek Gautam
---
 drivers/iommu/arm-smmu.c | 96
 1 file changed, 88 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8b16f53f597..3d6a1875431f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -209,6 +209,8 @@ struct arm_smmu_device {
 	struct clk_bulk_data *clks;
 	int num_clks;
+	bool rpm_supported;
+
 	u32 cavium_id_base; /* Specific to Cavium */
 	spinlock_t global_sync_lock;
@@ -268,6 +270,20 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ 0, NULL},
 };
+static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
+{
+	if (smmu->rpm_supported)
+		return pm_runtime_get_sync(smmu->dev);
+
+	return 0;
+}
+
+static inline void arm_smmu_rpm_put(struct arm_smmu_device *smmu)
+{
+	if (smmu->rpm_supported)
+		pm_runtime_put(smmu->dev);
+}
+
 static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct arm_smmu_domain, domain);
@@ -913,11 +929,15 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	int irq;
+	int ret, irq;
 	if (!smmu || domain->type == IOMMU_DOMAIN_IDENTITY)
 		return;
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		return;
+
 	/*
 	 * Disable the context bank and free the page tables before freeing
 	 * it.
@@ -932,6 +952,8 @@ static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 	__arm_smmu_free_bitmap(smmu->context_map, cfg->cbndx);
+
+	arm_smmu_rpm_put(smmu);
 }
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
@@ -1213,10 +1235,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return -ENODEV;
 	smmu = fwspec_smmu(fwspec);
+
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		return ret;
+
 	/* Ensure that the domain is finalised */
 	ret = arm_smmu_init_domain_context(domain, smmu);
 	if (ret < 0)
-		return ret;
+		goto rpm_put;
 	/*
 	 * Sanity check the domain. We don't support domains across
@@ -1231,10 +1258,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 	/* Looks ok, so add the device to the domain */
-	return arm_smmu_domain_add_master(smmu_domain, fwspec);
+	ret = arm_smmu_domain_add_master(smmu_domain, fwspec);
+
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 destroy_domain:
 	arm_smmu_destroy_domain_context(domain);
+rpm_put:
+	arm_smmu_rpm_put(smmu);
+
 	return ret;
 }
@@ -1242,22 +1276,36 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
 			phys_addr_t paddr, size_t size, int prot)
 {
 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	int ret;
 	if (!ops)
 		return -ENODEV;
-	return ops->map(ops, iova, paddr, size, prot);
+	arm_smmu_rpm_get(smmu);
+	ret = ops->map(ops, iova, paddr, size, prot);
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 }
 static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 			     size_t size)
 {
 	struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	size_t ret;
 	if (!ops)
 		return 0;
-	return ops->unmap(ops, iova, size);
+	arm_smmu_rpm_get(smmu);
+	ret = ops->unmap(ops, iova, size);
+	arm_smmu_rpm_put(smmu);
+
+	return ret;
 }
 static void arm_smmu_iotlb_sync(struct iommu_domain *domain)
@@ -1412,14 +1460,22 @@ static int arm_smmu_add_device(struct device *dev)
 	while (i--)
 		cfg->smendx[i] = INVALID_SMENDX;
+	ret = arm_smmu_rpm_get(smmu);
+	if (ret < 0)
+		goto