Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 11/04/2019 12:01, Robin Murphy wrote:
> [...]
>
> Bah, sorry, I thought I'd replied to this thread already, but
> apparently not :(
>
> I don't have a particularly strong opinion, but at the moment I am
> leaning slightly towards this being a parallel fix for another part
> of 09515ef5ddad, rather than a specific fix of the other fix. Either
> way, you can have a:

I'd say a parallel fix of 09515ef5ddad.

> Reviewed-by: Robin Murphy
>
> Cheers,
> Robin

Thanks, Robin.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 11/04/2019 09:50, John Garry wrote:
> [...]
>
> Hi Greg,
>
> I still think that we should consider merging this patch.

Bah, sorry, I thought I'd replied to this thread already, but
apparently not :(

I don't have a particularly strong opinion, but at the moment I am
leaning slightly towards this being a parallel fix for another part of
09515ef5ddad, rather than a specific fix of the other fix. Either way,
you can have a:

Reviewed-by: Robin Murphy

Thanks,
Robin
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 04/04/2019 12:17, John Garry wrote:
> [...]
>
> So 376991db4b64 required manual backporting to stable, and this patch
> would require the same:
>
> https://www.spinics.net/lists/stable/msg289685.html
>
> Robin, any further comment?

Hi Greg,

I still think that we should consider merging this patch.

Thanks,
John
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 03/04/2019 10:20, John Garry wrote:
> [...]
>
> Hi Greg,
>
> No, I don't think so. I'd say it supplements it. Here I'm trying to
> fix up another path in which we tear down the DMA ops prior to
> releasing the device's resources.
>
> I didn't add a fixes tag as 376991db4b64 didn't have one either. It
> will need to be backported to stable, I figure the same as
> 376991db4b64.

So 376991db4b64 required manual backporting to stable, and this patch
would require the same:

https://www.spinics.net/lists/stable/msg289685.html

Robin, any further comment?

Thanks,
John
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 03/04/2019 09:14, Greg KH wrote:
> [...]
>
> So does this "fix" 376991db4b64? If so, should this be added to the
> patch and also backported to the stable trees?

Hi Greg,

No, I don't think so. I'd say it supplements it. Here I'm trying to
fix up another path in which we tear down the DMA ops prior to
releasing the device's resources.

I didn't add a fixes tag as 376991db4b64 didn't have one either. It
will need to be backported to stable, I figure the same as
376991db4b64.

Thanks,
John
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On Wed, Apr 03, 2019 at 09:02:36AM +0100, John Garry wrote:
> On 28/03/2019 10:08, John Garry wrote:
> > In commit 376991db4b64 ("driver core: Postpone DMA tear-down until
> > after devres release"), we changed the ordering of tearing down the
> > device DMA ops and releasing all the device's resources; this was
> > because the DMA ops should be maintained until we release the
> > device's managed DMA memories.
>
> Hi all,
>
> A friendly reminder on this patch... I didn't see any update.
>
> I thought that it had some importance.
>
> Thanks,
> John
>
> > However, we have seen another crash on an arm64 system when a
> > device driver probe fails:
> >
> > [...]
> >
> > The crash occurs for the same reason.
> >
> > In this case, on the really_probe() failure path, we are still
> > clearing the DMA ops prior to releasing the device's managed
> > memories.
> >
> > This patch fixes this issue by reordering the DMA ops teardown and
> > the call to devres_release_all() on the failure path.
> >
> > Reported-by: Xiang Chen
> > Tested-by: Xiang Chen
> > Signed-off-by: John Garry

So does this "fix" 376991db4b64?  If so, should this be added to the
patch and also backported to the stable trees?

thanks,

greg k-h
Re: [PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
On 28/03/2019 10:08, John Garry wrote:
> In commit 376991db4b64 ("driver core: Postpone DMA tear-down until
> after devres release"), we changed the ordering of tearing down the
> device DMA ops and releasing all the device's resources; this was
> because the DMA ops should be maintained until we release the
> device's managed DMA memories.

Hi all,

A friendly reminder on this patch... I didn't see any update.

I thought that it had some importance.

Thanks,
John

> [...]
[PATCH] driver core: Postpone DMA tear-down until after devres release for probe failure
In commit 376991db4b64 ("driver core: Postpone DMA tear-down until
after devres release"), we changed the ordering of tearing down the
device DMA ops and releasing all the device's resources; this was
because the DMA ops should be maintained until we release the device's
managed DMA memories.

However, we have seen another crash on an arm64 system when a device
driver probe fails:

hisi_sas_v3_hw :74:02.0: Adding to iommu group 2
scsi host1: hisi_sas_v3_hw
BUG: Bad page state in process swapper/0  pfn:313f5
page:7ec4fd40 count:1 mapcount:0 mapping: index:0x0
flags: 0xfffe0001000(reserved)
raw: 0fffe0001000 7ec4fd48 7ec4fd48
raw: 0001
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1000(reserved)
Modules linked in:
CPU: 49 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-43081-g22d97fd-dirty #1433
Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - V1.12.01 01/29/2019
Call trace:
 dump_backtrace+0x0/0x118
 show_stack+0x14/0x1c
 dump_stack+0xa4/0xc8
 bad_page+0xe4/0x13c
 free_pages_check_bad+0x4c/0xc0
 __free_pages_ok+0x30c/0x340
 __free_pages+0x30/0x44
 __dma_direct_free_pages+0x30/0x38
 dma_direct_free+0x24/0x38
 dma_free_attrs+0x9c/0xd8
 dmam_release+0x20/0x28
 release_nodes+0x17c/0x220
 devres_release_all+0x34/0x54
 really_probe+0xc4/0x2c8
 driver_probe_device+0x58/0xfc
 device_driver_attach+0x68/0x70
 __driver_attach+0x94/0xdc
 bus_for_each_dev+0x5c/0xb4
 driver_attach+0x20/0x28
 bus_add_driver+0x14c/0x200
 driver_register+0x6c/0x124
 __pci_register_driver+0x48/0x50
 sas_v3_pci_driver_init+0x20/0x28
 do_one_initcall+0x40/0x25c
 kernel_init_freeable+0x2b8/0x3c0
 kernel_init+0x10/0x100
 ret_from_fork+0x10/0x18
Disabling lock debugging due to kernel taint
BUG: Bad page state in process swapper/0  pfn:313f6
page:7ec4fd80 count:1 mapcount:0 mapping: index:0x0
[   89.322983] flags: 0xfffe0001000(reserved)
raw: 0fffe0001000 7ec4fd88 7ec4fd88
raw: 0001

The crash occurs for the same reason.

In this case, on the really_probe() failure path, we are still
clearing the DMA ops prior to releasing the device's managed memories.

This patch fixes this issue by reordering the DMA ops teardown and the
call to devres_release_all() on the failure path.

Reported-by: Xiang Chen
Tested-by: Xiang Chen
Signed-off-by: John Garry
---

For convenience, here is the 2nd half of really_probe() now:

	atomic_inc(&probe_count);
	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
		 drv->bus->name, __func__, drv->name, dev_name(dev));
	WARN_ON(!list_empty(&dev->devres_head));

re_probe:
	dev->driver = drv;

	/* If using pinctrl, bind pins now before probing */
	ret = pinctrl_bind_pins(dev);
	if (ret)
		goto pinctrl_bind_failed;

	if (dev->bus->dma_configure) {
		ret = dev->bus->dma_configure(dev);
		if (ret)
			goto probe_failed;
	}

	if (driver_sysfs_add(dev)) {
		printk(KERN_ERR "%s: driver_sysfs_add(%s) failed\n",
		       __func__, dev_name(dev));
		goto probe_failed;
	}

	if (dev->pm_domain && dev->pm_domain->activate) {
		ret = dev->pm_domain->activate(dev);
		if (ret)
			goto probe_failed;
	}

	if (dev->bus->probe) {
		ret = dev->bus->probe(dev);
		if (ret)
			goto probe_failed;
	} else if (drv->probe) {
		ret = drv->probe(dev);
		if (ret)
			goto probe_failed;
	}

	if (test_remove) {
		test_remove = false;

		if (dev->bus->remove)
			dev->bus->remove(dev);
		else if (drv->remove)
			drv->remove(dev);

		devres_release_all(dev);
		driver_sysfs_remove(dev);
		dev->driver = NULL;
		dev_set_drvdata(dev, NULL);
		if (dev->pm_domain && dev->pm_domain->dismiss)
			dev->pm_domain->dismiss(dev);
		pm_runtime_reinit(dev);

		goto re_probe;
	}

	pinctrl_init_done(dev);

	if (dev->pm_domain && dev->pm_domain->sync)
		dev->pm_domain->sync(dev);

	driver_bound(dev);
	ret = 1;
	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
		 drv->bus->name, __func__, dev_name(dev), drv->name);
	goto done;

probe_failed:
	if (dev->bus)
		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
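For reference, the reordering the commit message describes would look roughly like the following on really_probe()'s failure path. This is a sketch reconstructed from the description above, not necessarily the exact committed hunk; the surrounding lines (the notifier call, device_links_no_driver()) and the precise placement of arch_teardown_dma_ops() are assumptions:

```diff
 probe_failed:
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
 pinctrl_bind_failed:
 	device_links_no_driver(dev);
-	arch_teardown_dma_ops(dev);
 	devres_release_all(dev);
+	arch_teardown_dma_ops(dev);
 	driver_sysfs_remove(dev);
```

The essential point is simply that devres_release_all() now runs while the DMA ops are still in place, mirroring what 376991db4b64 already did for the remove path.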