Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c

2020-07-20 Thread Naresh Kamboju
On Mon, 20 Jul 2020 at 21:27, Rob Clark  wrote:
>
> On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy  wrote:
> >
> > On 2020-07-20 08:17, Arnd Bergmann wrote:
> > > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju
> > >  wrote:
<>
> > >> [5.444121] Unable to handle kernel NULL pointer dereference at
> > >> virtual address 0018
> > >> [5.456615]   ESR = 0x9604
> > >> [5.464471]   SET = 0, FnV = 0
> > >> [5.464487]   EA = 0, S1PTW = 0
> > >> [5.466521] Data abort info:
> > >> [5.469971]   ISV = 0, ISS = 0x0004
> > >> [5.472768]   CM = 0, WnR = 0
> > >> [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000
> > >> [5.479349] [0018] pgd=, 
> > >> p4d=
> > >> [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP
> > >> [5.492448] Modules linked in: crct10dif_ce adv7511(+)
> > >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg
> > >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng
> > >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns
> > >> rmtfs_mem fuse
> > >> [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 
> > >> #1
> > >> [5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC 
> > >> (DT)
> > >> [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
> > >> [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8
> > >> [5.541148] lr : free_io_pgtable_ops+0x28/0x58
<>
> > >> [5.628297] Call trace:
> > >> [5.633592]  qcom_iommu_tlb_inv_context+0x18/0xa8
> > >
> > > This means that dev_iommu_fwspec_get() has returned NULL
> > > in qcom_iommu_tlb_inv_context(), either because dev->iommu
> > > is NULL, or because dev->iommu->fwspec is NULL.
> > >
> > > qcom_iommu_tlb_inv_context() does not check for a NULL
> > > pointer before using the returned object.
> > >
> > > The bug is either in the lack of error handling, or the fact
> > > that it's possible to get into this function for a device
> > > that has not been fully set up.
> >
> > Not quite - the device *was* properly set up, but has already been
> > properly torn down again in the removal path by iommu_release_device().
> > The problem is that qcom-iommu kept the device pointer as its TLB cookie
> > for the domain, but the domain has a longer lifespan than the validity
> > of that device - that's a fundamental design flaw in the driver.
>
> fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb
> cookie".. untested but looks like a straightforward enough change to
> switch over to using the domain rather than dev as cookie

The proposed patch tested and confirmed the reported problem fixed.

ref:
https://lore.kernel.org/linux-iommu/ca+g9fytj1rbycphxzrm-qm5ygtdlj1jd8vfzsqqvwi_dnjl...@mail.gmail.com/T/#m36a1fca18098f6c34275d928f9ba9c40c6d7fd63
https://lkft.validation.linaro.org/scheduler/job/1593950#L3392


>
> BR,
> -R


- Naresh
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c

2020-07-20 Thread Rob Clark
On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy  wrote:
>
> On 2020-07-20 08:17, Arnd Bergmann wrote:
> > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju
> >  wrote:
> >>
> >> This kernel oops while boot linux mainline kernel on arm64  db410c device.
> >>
> >> metadata:
> >>git branch: master
> >>git repo: 
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> >>git commit: f8456690ba8eb18ea4714e68554e242a04f65cff
> >>git describe: v5.8-rc5-48-gf8456690ba8e
> >>make_kernelversion: 5.8.0-rc5
> >>kernel-config:
> >> https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config
> >
> > Thanks for the report. Adding freedreno folks to Cc, as this may have 
> > something
> > to do with that driver.
> >
> >>
> >> [5.444121] Unable to handle kernel NULL pointer dereference at
> >> virtual address 0018
> >> [5.456615]   ESR = 0x9604
> >> [5.464471]   SET = 0, FnV = 0
> >> [5.464487]   EA = 0, S1PTW = 0
> >> [5.466521] Data abort info:
> >> [5.469971]   ISV = 0, ISS = 0x0004
> >> [5.472768]   CM = 0, WnR = 0
> >> [5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=bacba000
> >> [5.479349] [0018] pgd=, 
> >> p4d=
> >> [5.485820] Internal error: Oops: 9604 [#1] PREEMPT SMP
> >> [5.492448] Modules linked in: crct10dif_ce adv7511(+)
> >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg
> >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng
> >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns
> >> rmtfs_mem fuse
> >> [5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1
> >> [5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> >> [5.529170] pstate: 2005 (nzCv daif -PAN -UAO BTYPE=--)
> >> [5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8
> >> [5.541148] lr : free_io_pgtable_ops+0x28/0x58
> >> [5.546350] sp : 80001219b5f0
> >> [5.550689] x29: 80001219b5f0 x28: 0013
> >> [5.554078] x27: 0100 x26: 36add3b8
> >> [5.559459] x25: 8915e910 x24: 3a5458c0
> >> [5.564753] x23: 0003 x22: 36a37058
> >> [5.570049] x21: 36a3a100 x20: 36a3a480
> >> [5.575344] x19: 36a37158 x18: 
> >> [5.580639] x17:  x16: 
> >> [5.585935] x15: 0004 x14: 0368
> >> [5.591229] x13:  x12: 39c61798
> >> [5.596525] x11: 39c616d0 x10: 4000
> >> [5.601820] x9 :  x8 : 39c616f8
> >> [5.607114] x7 :  x6 : 09f699a0
> >> [5.612410] x5 : 80001219b520 x4 : 36a3a000
> >> [5.617705] x3 : 09f69904 x2 : 
> >> [5.623001] x1 : 8000107e27e8 x0 : 3a545810
> >> [5.628297] Call trace:
> >> [5.633592]  qcom_iommu_tlb_inv_context+0x18/0xa8
> >
> > This means that dev_iommu_fwspec_get() has returned NULL
> > in qcom_iommu_tlb_inv_context(), either because dev->iommu
> > is NULL, or because dev->iommu->fwspec is NULL.
> >
> > qcom_iommu_tlb_inv_context() does not check for a NULL
> > pointer before using the returned object.
> >
> > The bug is either in the lack of error handling, or the fact
> > that it's possible to get into this function for a device
> > that has not been fully set up.
>
> Not quite - the device *was* properly set up, but has already been
> properly torn down again in the removal path by iommu_release_device().
> The problem is that qcom-iommu kept the device pointer as its TLB cookie
> for the domain, but the domain has a longer lifespan than the validity
> of that device - that's a fundamental design flaw in the driver.

fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb
cookie".. untested but looks like a straightforward enough change to
switch over to using the domain rather than dev as cookie

BR,
-R

>
> Robin.
>
> >> [5.635764]  free_io_pgtable_ops+0x28/0x58
> >> [5.640624]  qcom_iommu_domain_free+0x38/0x60
> >> [5.644617]  iommu_group_release+0x4c/0x70
> >> [5.649045]  kobject_put+0x6c/0x120
> >> [5.653035]  kobject_del+0x64/0x90
> >> [5.656421]  kobject_put+0xfc/0x120
> >> [5.659893]  iommu_group_remove_device+0xdc/0xf0
> >> [5.663281]  iommu_release_device+0x44/0x70
> >> [5.668142]  iommu_bus_notifier+0xbc/0xd0
> >> [5.672048]  notifier_call_chain+0x54/0x98
> >> [5.676214]  blocking_notifier_call_chain+0x48/0x70
> >> [5.680209]  device_del+0x26c/0x3a0
> >> [5.684981]  platform_device_del.part.0+0x1c/0x88
> >> [5.688453]  platform_device_unregister+0x24/0x40
> >> [5.693316]  of_platform_device_destroy+0xe4/0xf8
> >> [5.698002]  device_for_each_child+0x5c/0xa8
> >> [5.702689]  of_platform_depopulate+0x3c/0x80
>