Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
Hi Robin, On 7/6/2022 10:15 PM, Robin Murphy wrote: On 2022-05-26 05:14, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 7820711c4560..bb68aa85b28b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -5,13 +5,27 @@ #include #include +#include #include #include #include "arm-smmu.h" +#define QCOM_DUMMY_VAL -1 + +enum qcom_smmu_impl_reg_offset { + QCOM_SMMU_TBU_PWR_STATUS, + QCOM_SMMU_STATS_SYNC_INV_TBU_ACK, + QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR, +}; + +struct qcom_smmu_config { + const u32 *reg_offset; +}; + struct qcom_smmu { struct arm_smmu_device smmu; + const struct qcom_smmu_config *cfg; bool bypass_quirk; u8 bypass_cbndx; u32 stall_enabled; @@ -22,6 +36,56 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) return container_of(smmu, struct qcom_smmu, smmu); } +static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) +{ + int ret; + unsigned int spin_cnt, delay; + u32 reg, tbu_pwr_status, sync_inv_ack, sync_inv_progress; + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); + const struct qcom_smmu_config *cfg; + + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); + for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { + for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE)) + return; + cpu_relax(); + } + udelay(delay); + } + + dev_err_ratelimited(smmu->dev, + "TLB sync timed out -- SMMU may be deadlocked\n"); Maybe consider a single ratelimit state for the whole function so all the output stays together. If things go sufficiently wrong, mixed up bits of partial output from different events may be misleadingly unhelpful (and at the very least it'll be up to 5x more effective at the intent of limiting log spam). Right, makes sense. Will change it. + cfg = qsmmu->cfg; + if (!cfg) + return; + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_TBU_PWR_STATUS], + &tbu_pwr_status); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU power status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_STATS_SYNC_INV_TBU_ACK], + &sync_inv_ack); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU sync/inv ack status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR], + &sync_inv_progress); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TCU syn/inv progress: %d\n", ret); + + dev_err_ratelimited(smmu->dev, + "TBU: power_status %#x sync_inv_ack %#x sync_inv_progress %#x\n", + tbu_pwr_status, sync_inv_ack, sync_inv_progress); +} + static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx, u32 reg) { @@ -374,6 +438,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = { .def_domain_type = qcom_smmu_def_domain_type, .reset = qcom_smmu500_reset, .write_s2cr = qcom_smmu_write_s2cr, + .tlb_sync = qcom_smmu_tlb_sync, }; static const struct arm_smmu_impl qcom_adreno_smmu_impl = { @@ -382,12 +447,84 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = { .reset = qcom_smmu500_reset, .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank, .write_sctlr = qcom_adreno_smmu_write_sctlr, + .tlb_sync = qcom_smmu_tlb_sync, +}; + +/* Implementation Defined Register Space 0 register offsets */ +static const u32 qcom_smmu_impl0_reg_offset[] = { + [QC
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On 7/6/2022 5:26 PM, Will Deacon wrote: On Thu, May 26, 2022 at 09:44:03AM +0530, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) If this is useful to you, then I suppose it's something we could support, however I'm pretty worried about our ability to maintain/scale this stuff as it is extended to support additional SoCs and other custom debugging features. Perhaps you could stick it all in arm-smmu-qcom-debug.c and have a new config option for that, so at least it's even further out of the way? Will Sounds good to me, will do that. Thanks, Sai ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On 2022-05-26 05:14, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c index 7820711c4560..bb68aa85b28b 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c @@ -5,13 +5,27 @@ #include #include +#include #include #include #include "arm-smmu.h" +#define QCOM_DUMMY_VAL -1 + +enum qcom_smmu_impl_reg_offset { + QCOM_SMMU_TBU_PWR_STATUS, + QCOM_SMMU_STATS_SYNC_INV_TBU_ACK, + QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR, +}; + +struct qcom_smmu_config { + const u32 *reg_offset; +}; + struct qcom_smmu { struct arm_smmu_device smmu; + const struct qcom_smmu_config *cfg; bool bypass_quirk; u8 bypass_cbndx; u32 stall_enabled; @@ -22,6 +36,56 @@ static struct qcom_smmu *to_qcom_smmu(struct arm_smmu_device *smmu) return container_of(smmu, struct qcom_smmu, smmu); } +static void qcom_smmu_tlb_sync(struct arm_smmu_device *smmu, int page, + int sync, int status) +{ + int ret; + unsigned int spin_cnt, delay; + u32 reg, tbu_pwr_status, sync_inv_ack, sync_inv_progress; + struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); + const struct qcom_smmu_config *cfg; + + arm_smmu_writel(smmu, page, sync, QCOM_DUMMY_VAL); + for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) { + for (spin_cnt = TLB_SPIN_COUNT; spin_cnt > 0; spin_cnt--) { + reg = arm_smmu_readl(smmu, page, status); + if (!(reg & ARM_SMMU_sTLBGSTATUS_GSACTIVE)) + return; + cpu_relax(); + } + udelay(delay); + } + + dev_err_ratelimited(smmu->dev, + "TLB sync timed out -- SMMU may be deadlocked\n"); Maybe consider a single ratelimit state for the whole function so all the output stays together. If things go sufficiently wrong, mixed up bits of partial output from different events may be misleadingly unhelpful (and at the very least it'll be up to 5x more effective at the intent of limiting log spam). + cfg = qsmmu->cfg; + if (!cfg) + return; + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_TBU_PWR_STATUS], + &tbu_pwr_status); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU power status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_STATS_SYNC_INV_TBU_ACK], + &sync_inv_ack); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TBU sync/inv ack status: %d\n", ret); + + ret = qcom_scm_io_readl(smmu->ioaddr + cfg->reg_offset[QCOM_SMMU_MMU2QSS_AND_SAFE_WAIT_CNTR], + &sync_inv_progress); + if (ret) + dev_err_ratelimited(smmu->dev, + "Failed to read TCU syn/inv progress: %d\n", ret); + + dev_err_ratelimited(smmu->dev, + "TBU: power_status %#x sync_inv_ack %#x sync_inv_progress %#x\n", + tbu_pwr_status, sync_inv_ack, sync_inv_progress); +} + static void qcom_adreno_smmu_write_sctlr(struct arm_smmu_device *smmu, int idx, u32 reg) { @@ -374,6 +438,7 @@ static const struct arm_smmu_impl qcom_smmu_impl = { .def_domain_type = qcom_smmu_def_domain_type, .reset = qcom_smmu500_reset, .write_s2cr = qcom_smmu_write_s2cr, + .tlb_sync = qcom_smmu_tlb_sync, }; static const struct arm_smmu_impl qcom_adreno_smmu_impl = { @@ -382,12 +447,84 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = { .reset = qcom_smmu500_reset, .alloc_context_bank = qco
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On Thu, May 26, 2022 at 09:44:03AM +0530, Sai Prakash Ranjan wrote: > TLB sync timeouts can be due to various reasons such as TBU power down > or pending TCU/TBU invalidation/sync and so on. Debugging these often > require dumping of some implementation defined registers to know the > status of TBU/TCU operations and some of these registers are not > accessible in non-secure world such as from kernel and requires SMC > calls to read them in the secure world. So, add this debug support > to dump implementation defined registers for TLB sync timeout issues. > > Signed-off-by: Sai Prakash Ranjan > --- > > Changes in v2: > * Use scm call consistently so that it works on older chipsets where >some of these regs are secure registers. > * Add device specific data to get the implementation defined register >offsets. > > --- > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + > drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + > 3 files changed, 146 insertions(+), 18 deletions(-) If this is useful to you, then I suppose it's something we could support, however I'm pretty worried about our ability to maintain/scale this stuff as it is extended to support additional SoCs and other custom debugging features. Perhaps you could stick it all in arm-smmu-qcom-debug.c and have a new config option for that, so at least it's even further out of the way? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On 6/23/2022 11:32 AM, Sai Prakash Ranjan wrote: On 5/26/2022 9:44 AM, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) Any comments on this patch? Gentle Ping !! Thanks, Sai ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
On 5/26/2022 9:44 AM, Sai Prakash Ranjan wrote: TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) Any comments on this patch? Thanks, Sai ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
Hi Vincent, On 6/9/2022 2:52 AM, Vincent Knecht wrote: Le jeudi 26 mai 2022 à 09:44 +0530, Sai Prakash Ranjan a écrit : TLB sync timeouts can be due to various reasons such as TBU power down or pending TCU/TBU invalidation/sync and so on. Debugging these often require dumping of some implementation defined registers to know the status of TBU/TCU operations and some of these registers are not accessible in non-secure world such as from kernel and requires SMC calls to read them in the secure world. So, add this debug support to dump implementation defined registers for TLB sync timeout issues. Signed-off-by: Sai Prakash Ranjan --- Changes in v2: * Use scm call consistently so that it works on older chipsets where some of these regs are secure registers. * Add device specific data to get the implementation defined register offsets. --- drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + 3 files changed, 146 insertions(+), 18 deletions(-) Hi Sai, and thanks for this patch ! I've encountered TLB sync timeouts with msm8939 SoC recently. What would be needed to add to this patch so this SoC is supported ? Like, where could one check the values to be used in an equivalent of qcom_smmu_impl0_reg_offset values for this SoC (if any change needed) ? Current values are not found by simply greping in downstream/vendor dtsi/dts files... These are implementation defined registers and some might not be present on older SoCs and sometimes they don't add this support in downstream kernels even if the registers are present. I looked up the IP doc for msm8939 and I could find only TBU_PWR_STATUS register and you can use the same offset for it as given in this patch. Thanks, Sai ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2] iommu/arm-smmu-qcom: Add debug support for TLB sync timeouts
Le jeudi 26 mai 2022 à 09:44 +0530, Sai Prakash Ranjan a écrit : > TLB sync timeouts can be due to various reasons such as TBU power down > or pending TCU/TBU invalidation/sync and so on. Debugging these often > require dumping of some implementation defined registers to know the > status of TBU/TCU operations and some of these registers are not > accessible in non-secure world such as from kernel and requires SMC > calls to read them in the secure world. So, add this debug support > to dump implementation defined registers for TLB sync timeout issues. > > Signed-off-by: Sai Prakash Ranjan > --- > > Changes in v2: > * Use scm call consistently so that it works on older chipsets where > some of these regs are secure registers. > * Add device specific data to get the implementation defined register > offsets. > > --- > drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 161 ++--- > drivers/iommu/arm/arm-smmu/arm-smmu.c | 2 + > drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + > 3 files changed, 146 insertions(+), 18 deletions(-) Hi Sai, and thanks for this patch ! I've encountered TLB sync timeouts with msm8939 SoC recently. What would be needed to add to this patch so this SoC is supported ? Like, where could one check the values to be used in an equivalent of qcom_smmu_impl0_reg_offset values for this SoC (if any change needed) ? Current values are not found by simply greping in downstream/vendor dtsi/dts files... ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu