Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c
On Mon, 20 Jul 2020 at 21:27, Rob Clark wrote: > > On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy wrote: > > > > On 2020-07-20 08:17, Arnd Bergmann wrote: > > > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju > > > wrote: <> > > >> [5.444121] Unable to handle kernel NULL pointer dereference at > > >> virtual address 0018 > > >> [5.456615] ESR = 0x9604 > > >> [5.464471] SET = 0, FnV = 0 > > >> [5.464487] EA = 0, S1PTW = 0 > > >> [5.466521] Data abort info: > > >> [5.469971] ISV = 0, ISS = 0x0004 > > >> [5.472768] CM = 0, WnR = 0 > > >> [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000 > > >> [5.479349] [0018] pgd=, > > >> p4d= > > >> [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP > > >> [5.492448] Modules linked in: crct10dif_ce adv7511(+) > > >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg > > >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng > > >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns > > >> rmtfs_mem fuse > > >> [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 > > >> #1 > > >> [5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC > > >> (DT) > > >> [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--) > > >> [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 > > >> [5.541148] lr : free_io_pgtable_ops+0x28/0x58 <> > > >> [5.628297] Call trace: > > >> [5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8 > > > > > > This means that dev_iommu_fwspec_get() has returned NULL > > > in qcom_iommu_tlb_inv_context(), either because dev->iommu > > > is NULL, or because dev->iommu->fwspec is NULL. > > > > > > qcom_iommu_tlb_inv_context() does not check for a NULL > > > pointer before using the returned object. > > > > > > The bug is either in the lack of error handling, or the fact > > > that it's possible to get into this function for a device > > > that has not been fully set up. > > > > Not quite - the device *was* properly set up, but has already been > > properly torn down again in the removal path by iommu_release_device(). > > The problem is that qcom-iommu kept the device pointer as its TLB cookie > > for the domain, but the domain has a longer lifespan than the validity > > of that device - that's a fundamental design flaw in the driver. > > fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb > cookie".. untested but looks like a straightforward enough change to > switch over to using the domain rather than dev as cookie The proposed patch tested and confirmed the reported problem fixed. ref: https://lore.kernel.org/linux-iommu/ca+g9fytj1rbycphxzrm-qm5ygtdlj1jd8vfzsqqvwi_dnjl...@mail.gmail.com/T/#m36a1fca18098f6c34275d928f9ba9c40c6d7fd63 https://lkft.validation.linaro.org/scheduler/job/1593950#L3392 > > BR, > -R - Naresh ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c
On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy wrote: > > On 2020-07-20 08:17, Arnd Bergmann wrote: > > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju > > wrote: > >> > >> This kernel oops while boot linux mainline kernel on arm64 db410c device. > >> > >> metadata: > >>git branch: master > >>git repo: > >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > >>git commit: f8456690ba8eb18ea4714e68554e242a04f65cff > >>git describe: v5.8-rc5-48-gf8456690ba8e > >>make_kernelversion: 5.8.0-rc5 > >>kernel-config: > >> https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config > > > > Thanks for the report. Adding freedreno folks to Cc, as this may have > > something > > to do with that driver. > > > >> > >> [5.444121] Unable to handle kernel NULL pointer dereference at > >> virtual address 0018 > >> [5.456615] ESR = 0x9604 > >> [5.464471] SET = 0, FnV = 0 > >> [5.464487] EA = 0, S1PTW = 0 > >> [5.466521] Data abort info: > >> [5.469971] ISV = 0, ISS = 0x0004 > >> [5.472768] CM = 0, WnR = 0 > >> [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000 > >> [5.479349] [0018] pgd=, > >> p4d= > >> [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP > >> [5.492448] Modules linked in: crct10dif_ce adv7511(+) > >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg > >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng > >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns > >> rmtfs_mem fuse > >> [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 > >> [5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > >> [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--) > >> [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 > >> [5.541148] lr : free_io_pgtable_ops+0x28/0x58 > >> [5.546350] sp : 80001219b5f0 > >> [5.550689] x29: 80001219b5f0 x28: 0013 > >> [5.554078] x27: 0100 x26: 36add3b8 > >> [5.559459] x25: 8915e910 x24: 3a5458c0 > >> [5.564753] x23: 0003 x22: 36a37058 > >> [5.570049] x21: 36a3a100 x20: 36a3a480 > >> [5.575344] x19: 36a37158 x18: > >> [5.580639] x17: x16: > >> [5.585935] x15: 0004 x14: 0368 > >> [5.591229] x13: x12: 39c61798 > >> [5.596525] x11: 39c616d0 x10: 4000 > >> [5.601820] x9 : x8 : 39c616f8 > >> [5.607114] x7 : x6 : 09f699a0 > >> [5.612410] x5 : 80001219b520 x4 : 36a3a000 > >> [5.617705] x3 : 09f69904 x2 : > >> [5.623001] x1 : 8000107e27e8 x0 : 3a545810 > >> [5.628297] Call trace: > >> [5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8 > > > > This means that dev_iommu_fwspec_get() has returned NULL > > in qcom_iommu_tlb_inv_context(), either because dev->iommu > > is NULL, or because dev->iommu->fwspec is NULL. > > > > qcom_iommu_tlb_inv_context() does not check for a NULL > > pointer before using the returned object. > > > > The bug is either in the lack of error handling, or the fact > > that it's possible to get into this function for a device > > that has not been fully set up. > > Not quite - the device *was* properly set up, but has already been > properly torn down again in the removal path by iommu_release_device(). > The problem is that qcom-iommu kept the device pointer as its TLB cookie > for the domain, but the domain has a longer lifespan than the validity > of that device - that's a fundamental design flaw in the driver. fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb cookie".. untested but looks like a straightforward enough change to switch over to using the domain rather than dev as cookie BR, -R > > Robin. > > >> [5.635764] free_io_pgtable_ops+0x28/0x58 > >> [5.640624] qcom_iommu_domain_free+0x38/0x60 > >> [5.644617] iommu_group_release+0x4c/0x70 > >> [5.649045] kobject_put+0x6c/0x120 > >> [5.653035] kobject_del+0x64/0x90 > >> [5.656421] kobject_put+0xfc/0x120 > >> [5.659893] iommu_group_remove_device+0xdc/0xf0 > >> [5.663281] iommu_release_device+0x44/0x70 > >> [5.668142] iommu_bus_notifier+0xbc/0xd0 > >> [5.672048] notifier_call_chain+0x54/0x98 > >> [5.676214] blocking_notifier_call_chain+0x48/0x70 > >> [5.680209] device_del+0x26c/0x3a0 > >> [5.684981] platform_device_del.part.0+0x1c/0x88 > >> [5.688453] platform_device_unregister+0x24/0x40 > >> [5.693316] of_platform_device_destroy+0xe4/0xf8 > >> [5.698002] device_for_each_child+0x5c/0xa8 > >> [5.702689] of_platform_depopulate+0x3c/0x80 >