Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-26 Thread Robin Murphy

On 25/09/18 21:16, Christoph Hellwig wrote:

Looking at the code I think this commit is simply broken for
architectures not using arch_setup_dma_ops, but instead setting up
the dma ops through arch specific magic.

I'll revert the patch.


Ugh, sorry about missing that too. Ack to a revert - thinking about 
those PPC symptoms, it might actually be that other architectures could 
also get into the same pickle just by unbinding and rebinding a driver 
(e.g. switching to VFIO then back again).


Robin.


Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-25 Thread Michael Ellerman
Christoph Hellwig  writes:

> Looking at the code I think this commit is simply broken for
> architectures not using arch_setup_dma_ops, but instead setting up
> the dma ops through arch specific magic.

I arch_setup_dma_ops() called from of_dma_configure(), but
pci_dma_configure() doesn't call that for this device:

static int pci_dma_configure(struct device *dev)
{
...
if (IS_ENABLED(CONFIG_OF) && bridge->parent &&
bridge->parent->of_node) {
ret = of_dma_configure(dev, bridge->parent->of_node, true);

bridge->parent is NULL.

So I don't think arch_setup_dma_ops() would help here?

> I'll revert the patch.

Thanks.

cheers


Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-25 Thread Christoph Hellwig
Looking at the code I think this commit is simply broken for
architectures not using arch_setup_dma_ops, but instead setting up
the dma ops through arch specific magic.

I'll revert the patch.


Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-24 Thread Michael Ellerman
Guenter Roeck  writes:
> Hi,
>
> On Mon, Aug 27, 2018 at 10:47:11AM +0200, Christoph Hellwig wrote:
>> There is no reason to leave the per-device dma_ops around when
>> deconfiguring a device, so move this code from arm64 into the
>> common code.
>> Signed-off-by: Christoph Hellwig 
>> Reviewed-by: Robin Murphy 
>
> This patch causes various ppc images to crash in -next due to NULL
> DMA ops in dma_alloc_attrs().

I finally remembered where you autobuilder is :)

https://kerneltests.org/builders/qemu-ppc-next/builds/970/steps/qemubuildcommand_1/logs/stdio

Looks like mac99 at least is failing.

Just reproduced it on my setup.

> Looking into the code, the macio driver tries to instantiate on
> the :f0:0d.0 PCI address (the driver maps to all Apple PCI IDs)
> and fails. This results in a call to arch_teardown_dma_ops(). Later,
> the same device pointer is used to instantiate ohci-pci, which
> crashes in probe because the dma_ops pointer has been cleared.
>
> I don't claim to fully understand the code, but to me it looks like
> the pci device is allocated and waiting for a driver to attach to.
> See arch/powerpc/kernel/pci_of_scan.c:of_create_pci_dev().
> If attaching a driver (here: macio) fails, the same device pointer
> is then reused for the next matching driver until a match is found
> and the device is successfully instantiated. Of course this fails
> quite badly if the device pointer has been scrubbed after the first
> failure.
>
> I don't know if this is generic PCI behavior or ppc specific.
> I am copying pci and ppc maintainers for additional input.

I would guess this is some PPC special sauce  O_o

Will have a look.

cheers


Re: [PATCH 4/4] dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops

2018-09-22 Thread Guenter Roeck
Hi,

On Mon, Aug 27, 2018 at 10:47:11AM +0200, Christoph Hellwig wrote:
> There is no reason to leave the per-device dma_ops around when
> deconfiguring a device, so move this code from arm64 into the
> common code.
> Signed-off-by: Christoph Hellwig 
> Reviewed-by: Robin Murphy 

This patch causes various ppc images to crash in -next due to NULL
DMA ops in dma_alloc_attrs().

Looking into the code, the macio driver tries to instantiate on
the :f0:0d.0 PCI address (the driver maps to all Apple PCI IDs)
and fails. This results in a call to arch_teardown_dma_ops(). Later,
the same device pointer is used to instantiate ohci-pci, which
crashes in probe because the dma_ops pointer has been cleared.

I don't claim to fully understand the code, but to me it looks like
the pci device is allocated and waiting for a driver to attach to.
See arch/powerpc/kernel/pci_of_scan.c:of_create_pci_dev().
If attaching a driver (here: macio) fails, the same device pointer
is then reused for the next matching driver until a match is found
and the device is successfully instantiated. Of course this fails
quite badly if the device pointer has been scrubbed after the first
failure.

I don't know if this is generic PCI behavior or ppc specific.
I am copying pci and ppc maintainers for additional input.

Either case, reverting the patch fixes the problem.

Guenter

---
ohci-pci :f0:0d.0: OHCI PCI host controller
ohci-pci :f0:0d.0: new USB bus registered, assigned bus number 1
[ cut here ]
kernel BUG at ./include/linux/dma-mapping.h:516!
Oops: Exception in kernel mode, sig: 5 [#1]
BE PREEMPT SMP NR_CPUS=4 NUMA PowerMac
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
4.19.0-rc4-next-20180921-dirty #1
NIP:  c086b824 LR: c086b5dc CTR: c086dd70
REGS: c0003d30f0b0 TRAP: 0700   Tainted: GW
(4.19.0-rc4-next-20180921-dirty)
MSR:  8002b032   CR: 28008842  XER: 
IRQMASK: 0 
GPR00: c086b5dc c0003d30f330 c1199900 c0003d3ce898 
GPR04: c115b798 c0003d8c3a48   
GPR08:  ff00  0020 
GPR12: 24008442 c1299000 c000f720  
GPR16:     
GPR20:     
GPR24:  c10452e8 c1045420 0080 
GPR28: 001c c1045408 c0003d3ce898 c0003d8c3a30 
NIP [c086b824] .ohci_init+0x564/0x570
LR [c086b5dc] .ohci_init+0x31c/0x570
Call Trace:
[c0003d30f330] [c086b5dc] .ohci_init+0x31c/0x570 (unreliable)
[c0003d30f3c0] [c086de58] .ohci_pci_reset+0xa8/0xb0
[c0003d30f440] [c08335ec] .usb_add_hcd+0x35c/0x9b0
[c0003d30f510] [c084ea90] .usb_hcd_pci_probe+0x320/0x510
[c0003d30f5c0] [c06c7df8] .local_pci_probe+0x68/0x140
[c0003d30f660] [c06c92a4] .pci_device_probe+0x144/0x210
[c0003d30f710] [c074cd48] .really_probe+0x2a8/0x3c0
[c0003d30f7b0] [c074d100] .driver_probe_device+0x80/0x170
[c0003d30f840] [c074d33c] .__driver_attach+0x14c/0x150
[c0003d30f8d0] [c0749c6c] .bus_for_each_dev+0xac/0x100
[c0003d30f970] [c074c334] .driver_attach+0x34/0x50
[c0003d30f9f0] [c074b9b8] .bus_add_driver+0x178/0x2f0
[c0003d30fa90] [c074e560] .driver_register+0x90/0x1a0
[c0003d30fb10] [c06c707c] .__pci_register_driver+0x6c/0x90
[c0003d30fba0] [c0f39f14] .ohci_pci_init+0x90/0xac
[c0003d30fc10] [c000f380] .do_one_initcall+0x70/0x2d0
[c0003d30fce0] [c0edfca4] .kernel_init_freeable+0x3b8/0x4b0
[c0003d30fdb0] [c000f744] .kernel_init+0x24/0x160
[c0003d30fe30] [c000b7a4] .ret_from_kernel_thread+0x58/0x74

---
# bad: [46c163a036b41a29b0ec3c475bf97515d755ff41] Add linux-next specific files 
for 20180921
# good: [7876320f88802b22d4e2daf7eb027dd14175a0f8] Linux 4.19-rc4
git bisect start 'HEAD' 'v4.19-rc4'
# bad: [03b5533c4d89cc558063a98fa4201a5d2b4eb1f7] Merge remote-tracking branch 
'crypto/master'
git bisect bad 03b5533c4d89cc558063a98fa4201a5d2b4eb1f7
# bad: [62c54071a46255d59e26e95528b80bf432796cb4] Merge remote-tracking branch 
'v9fs/9p-next'
git bisect bad 62c54071a46255d59e26e95528b80bf432796cb4
# bad: [1eee72bfcf0977daca74e9f902956adbb4f38847] Merge remote-tracking branch 
'realtek/for-next'
git bisect bad 1eee72bfcf0977daca74e9f902956adbb4f38847
# good: [30d3220045f49c707bbeec1d35423bd60488c433] Merge remote-tracking branch 
'scsi-fixes/fixes'
git bisect good 30d3220045f49c707bbeec1d35423bd60488c433
# bad: [a31a9772a2aa569dc279468da4be555b737e51f8] Merge remote-tracking branch 
'at91/at91-next'
git bisect bad a31a9772a2aa569dc279468da4be555b737e51f8
# bad: [3f52a0ad91857e78a5feed28327eafa11c44412c] Merge