Re: [PATCH] powerpc: eeh: Fix oops when probing in early boot
On 10 May 2010 20:38, Anton Blanchard an...@samba.org wrote:
> If we take an EEH early enough, we oops:
>
> Call Trace:
> [c00010483770] [c0013ee4] .show_stack+0xd8/0x218 (unreliable)
> [c00010483850] [c0658940] .dump_stack+0x28/0x3c
> [c000104838d0] [c0057a68] .eeh_dn_check_failure+0x2b8/0x304
> [c00010483990] [c00259c8] .rtas_read_config+0x120/0x168
> [c00010483a40] [c0025af4] .rtas_pci_read_config+0xe4/0x124
> [c00010483af0] [c037af18] .pci_bus_read_config_word+0xac/0x104
> [c00010483bc0] [c08fec98] .pcibios_allocate_resources+0x7c/0x220
> [c00010483c90] [c08feed8] .pcibios_resource_survey+0x9c/0x418
> [c00010483d80] [c08fea10] .pcibios_init+0xbc/0xf4
> [c00010483e20] [c0009844] .do_one_initcall+0x98/0x1d8
> [c00010483ed0] [c08f0560] .kernel_init+0x228/0x2e8
> [c00010483f90] [c0031a08] .kernel_thread+0x54/0x70
> EEH: Detected PCI bus error on device null
> EEH: This PCI device has failed 1 times in the last hour:
> EEH: location=U78A5.001.WIH8464-P1 driver= pci addr=0001:00:01.0
> EEH: of node=/p...@8002209/u...@1
> EEH: PCI device/vendor: 00351033
> EEH: PCI cmd/status register: 12100146
>
> Unable to handle kernel paging request for data at address 0x0468
> Oops: Kernel access of bad area, sig: 11 [#1]
>
> NIP [c0057610] .rtas_set_slot_reset+0x38/0x10c
> LR [c0058724] .eeh_reset_device+0x5c/0x124
> Call Trace:
> [cbc6bd00] [c005a0e0] .pcibios_remove_pci_devices+0x7c/0xb0 (unreliable)
> [cbc6bd90] [c0058724] .eeh_reset_device+0x5c/0x124
> [cbc6be40] [c00589c0] .handle_eeh_events+0x1d4/0x39c
> [cbc6bf00] [c0059124] .eeh_event_handler+0xf0/0x188
> [cbc6bf90] [c0031a08] .kernel_thread+0x54/0x70
>
> We called rtas_set_slot_reset while scanning the bus and before the pci_dn
> to pcidev mapping has been created. Since we only need the pcidev to work
> out the type of reset and that only gets set after the module for the
> device loads, let's just do a hot reset if the pcidev is NULL.
>
> Signed-off-by: Anton Blanchard an...@samba.org
> ---

Acked-by: Linas Vepstas linasveps...@gmail.com

I'm cc'ing Brian King, he's the one who figured out the proper fix for a
hot-reset/fundamental-reset hardware feature that added this line of code.

The question is -- when the system finishes booting, and the module finally
loads, will the device be found in a usable state and/or will it
automatically reset to a usable state?

--linas

> Index: linux-2.6/arch/powerpc/platforms/pseries/eeh.c
> ===================================================================
> --- linux-2.6.orig/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:10.703453565 +1000
> +++ linux-2.6/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:24.034323030 +1000
> @@ -749,7 +749,7 @@ static void __rtas_set_slot_reset(struct
>  	/* Determine type of EEH reset required by device,
>  	 * default hot reset or fundamental reset
>  	 */
> -	if (dev->needs_freset)
> +	if (dev && dev->needs_freset)
>  		rtas_pci_slot_reset(pdn, 3);
>  	else
>  		rtas_pci_slot_reset(pdn, 1);

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: eeh: Fix oops when probing in early boot
The needs_freset bit went in since the last time I touched all this code,
so I don't think this will affect ipr at least. The way this worked for the
ipr adapters that needed a warm reset was: we would get the hot reset in the
generic EEH code, then the ipr driver would come along after that and issue
a warm reset to get the adapter into a usable state. Now that the
needs_freset feature is there, we could set that in ipr for the adapters we
need a warm reset for and get rid of the useless hot reset.

A quick grep through the code shows that qlogic is the one user of this
feature.

How early is this? I assume this is before driver load time, in which case
even if we could check the flag it wouldn't be set yet...

Thanks,

Brian

On 05/11/2010 01:59 PM, Linas Vepstas wrote:
> On 10 May 2010 20:38, Anton Blanchard an...@samba.org wrote:
>> If we take an EEH early enough, we oops:
>>
>> Call Trace:
>> [c00010483770] [c0013ee4] .show_stack+0xd8/0x218 (unreliable)
>> [c00010483850] [c0658940] .dump_stack+0x28/0x3c
>> [c000104838d0] [c0057a68] .eeh_dn_check_failure+0x2b8/0x304
>> [c00010483990] [c00259c8] .rtas_read_config+0x120/0x168
>> [c00010483a40] [c0025af4] .rtas_pci_read_config+0xe4/0x124
>> [c00010483af0] [c037af18] .pci_bus_read_config_word+0xac/0x104
>> [c00010483bc0] [c08fec98] .pcibios_allocate_resources+0x7c/0x220
>> [c00010483c90] [c08feed8] .pcibios_resource_survey+0x9c/0x418
>> [c00010483d80] [c08fea10] .pcibios_init+0xbc/0xf4
>> [c00010483e20] [c0009844] .do_one_initcall+0x98/0x1d8
>> [c00010483ed0] [c08f0560] .kernel_init+0x228/0x2e8
>> [c00010483f90] [c0031a08] .kernel_thread+0x54/0x70
>> EEH: Detected PCI bus error on device null
>> EEH: This PCI device has failed 1 times in the last hour:
>> EEH: location=U78A5.001.WIH8464-P1 driver= pci addr=0001:00:01.0
>> EEH: of node=/p...@8002209/u...@1
>> EEH: PCI device/vendor: 00351033
>> EEH: PCI cmd/status register: 12100146
>>
>> Unable to handle kernel paging request for data at address 0x0468
>> Oops: Kernel access of bad area, sig: 11 [#1]
>>
>> NIP [c0057610] .rtas_set_slot_reset+0x38/0x10c
>> LR [c0058724] .eeh_reset_device+0x5c/0x124
>> Call Trace:
>> [cbc6bd00] [c005a0e0] .pcibios_remove_pci_devices+0x7c/0xb0 (unreliable)
>> [cbc6bd90] [c0058724] .eeh_reset_device+0x5c/0x124
>> [cbc6be40] [c00589c0] .handle_eeh_events+0x1d4/0x39c
>> [cbc6bf00] [c0059124] .eeh_event_handler+0xf0/0x188
>> [cbc6bf90] [c0031a08] .kernel_thread+0x54/0x70
>>
>> We called rtas_set_slot_reset while scanning the bus and before the pci_dn
>> to pcidev mapping has been created. Since we only need the pcidev to work
>> out the type of reset and that only gets set after the module for the
>> device loads, let's just do a hot reset if the pcidev is NULL.
>>
>> Signed-off-by: Anton Blanchard an...@samba.org
>> ---
>
> Acked-by: Linas Vepstas linasveps...@gmail.com
>
> I'm cc'ing Brian King, he's the one who figured out the proper fix for a
> hot-reset/fundamental-reset hardware feature that added this line of code.
>
> The question is -- when the system finishes booting, and the module finally
> loads, will the device be found in a usable state and/or will it
> automatically reset to a usable state?
>
> --linas
>
>> Index: linux-2.6/arch/powerpc/platforms/pseries/eeh.c
>> ===================================================================
>> --- linux-2.6.orig/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:10.703453565 +1000
>> +++ linux-2.6/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:24.034323030 +1000
>> @@ -749,7 +749,7 @@ static void __rtas_set_slot_reset(struct
>>  	/* Determine type of EEH reset required by device,
>>  	 * default hot reset or fundamental reset
>>  	 */
>> -	if (dev->needs_freset)
>> +	if (dev && dev->needs_freset)
>>  		rtas_pci_slot_reset(pdn, 3);
>>  	else
>>  		rtas_pci_slot_reset(pdn, 1);

-- 
Brian King
Linux on Power Virtualization
IBM Linux Technology Center
(507) 253-8636 | t/l 553-8636
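
To make the timing question above concrete, here is a small userspace sketch of the point that the flag is only set once a driver has bound to the device. It is not kernel code: `mock_pci_dev`, `mock_driver_probe` and `mock_eeh_reset_type` are hypothetical stand-ins, and the values 1 (hot) and 3 (fundamental) are taken from the rtas_pci_slot_reset() calls in the patch. It assumes a driver opts in by setting needs_freset in its probe path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the flag is modelled. */
struct mock_pci_dev {
	unsigned int needs_freset:1;
};

/* Reset types as passed to rtas_pci_slot_reset() in the patch. */
enum { RESET_HOT = 1, RESET_FUNDAMENTAL = 3 };

/* Hypothetical probe routine: a driver whose adapter needs a
 * fundamental reset is assumed to opt in here, at load time. */
void mock_driver_probe(struct mock_pci_dev *dev)
{
	assert(dev != NULL);
	dev->needs_freset = 1;
}

/* Mirrors the patched check: no device, or flag not set, means hot reset. */
int mock_eeh_reset_type(const struct mock_pci_dev *dev)
{
	return (dev && dev->needs_freset) ? RESET_FUNDAMENTAL : RESET_HOT;
}
```

Under these assumptions, an error taken before the driver has probed selects a hot reset whether or not the device pointer exists, because the flag has not been set yet. Only an error taken after probe would select the fundamental reset.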
[PATCH] powerpc: eeh: Fix oops when probing in early boot
If we take an EEH early enough, we oops:

Call Trace:
[c00010483770] [c0013ee4] .show_stack+0xd8/0x218 (unreliable)
[c00010483850] [c0658940] .dump_stack+0x28/0x3c
[c000104838d0] [c0057a68] .eeh_dn_check_failure+0x2b8/0x304
[c00010483990] [c00259c8] .rtas_read_config+0x120/0x168
[c00010483a40] [c0025af4] .rtas_pci_read_config+0xe4/0x124
[c00010483af0] [c037af18] .pci_bus_read_config_word+0xac/0x104
[c00010483bc0] [c08fec98] .pcibios_allocate_resources+0x7c/0x220
[c00010483c90] [c08feed8] .pcibios_resource_survey+0x9c/0x418
[c00010483d80] [c08fea10] .pcibios_init+0xbc/0xf4
[c00010483e20] [c0009844] .do_one_initcall+0x98/0x1d8
[c00010483ed0] [c08f0560] .kernel_init+0x228/0x2e8
[c00010483f90] [c0031a08] .kernel_thread+0x54/0x70
EEH: Detected PCI bus error on device null
EEH: This PCI device has failed 1 times in the last hour:
EEH: location=U78A5.001.WIH8464-P1 driver= pci addr=0001:00:01.0
EEH: of node=/p...@8002209/u...@1
EEH: PCI device/vendor: 00351033
EEH: PCI cmd/status register: 12100146

Unable to handle kernel paging request for data at address 0x0468
Oops: Kernel access of bad area, sig: 11 [#1]

NIP [c0057610] .rtas_set_slot_reset+0x38/0x10c
LR [c0058724] .eeh_reset_device+0x5c/0x124
Call Trace:
[cbc6bd00] [c005a0e0] .pcibios_remove_pci_devices+0x7c/0xb0 (unreliable)
[cbc6bd90] [c0058724] .eeh_reset_device+0x5c/0x124
[cbc6be40] [c00589c0] .handle_eeh_events+0x1d4/0x39c
[cbc6bf00] [c0059124] .eeh_event_handler+0xf0/0x188
[cbc6bf90] [c0031a08] .kernel_thread+0x54/0x70

We called rtas_set_slot_reset while scanning the bus and before the pci_dn
to pcidev mapping has been created. Since we only need the pcidev to work
out the type of reset and that only gets set after the module for the
device loads, let's just do a hot reset if the pcidev is NULL.

Signed-off-by: Anton Blanchard an...@samba.org
---

Index: linux-2.6/arch/powerpc/platforms/pseries/eeh.c
===================================================================
--- linux-2.6.orig/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:10.703453565 +1000
+++ linux-2.6/arch/powerpc/platforms/pseries/eeh.c	2010-05-10 17:25:24.034323030 +1000
@@ -749,7 +749,7 @@ static void __rtas_set_slot_reset(struct
 	/* Determine type of EEH reset required by device,
 	 * default hot reset or fundamental reset
 	 */
-	if (dev->needs_freset)
+	if (dev && dev->needs_freset)
 		rtas_pci_slot_reset(pdn, 3);
 	else
 		rtas_pci_slot_reset(pdn, 1);
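
For readers who want to see the guard in isolation, the following is a minimal userspace sketch of the check the patch adds, not the kernel function itself. The `mock_pci_dn` and `mock_pci_dev` types and `choose_reset_type()` are hypothetical, and it assumes the device pointer is read from a `pcidev` field of the pci_dn, which stays NULL until the pci_dn to pcidev mapping exists. The return values 1 and 3 are the reset types passed to rtas_pci_slot_reset() in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for this illustration are modelled. */
struct mock_pci_dev {
	unsigned int needs_freset:1;	/* device wants a fundamental reset */
};

struct mock_pci_dn {
	struct mock_pci_dev *pcidev;	/* NULL until the mapping is created */
};

/* Reset types as passed to rtas_pci_slot_reset() in the patch. */
enum { RESET_HOT = 1, RESET_FUNDAMENTAL = 3 };

/* Returns the reset type the patched code would request for this slot. */
int choose_reset_type(const struct mock_pci_dn *pdn)
{
	const struct mock_pci_dev *dev;

	assert(pdn != NULL);
	dev = pdn->pcidev;

	/* The dev test must come first: && short-circuits, so
	 * dev->needs_freset is never evaluated when dev is NULL. */
	if (dev && dev->needs_freset)
		return RESET_FUNDAMENTAL;
	return RESET_HOT;
}
```

Without the `dev &&` test, the first case (a pci_dn with no pcidev yet) would dereference a NULL pointer, which is the access the oops above reports at a small offset from zero.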