Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-21 Thread Michael Ellerman
Gaurav Batra  writes:
> You are right. I think, the "reboot" should be replaced with just "boot 
> up". If there are no other comments, or code changes, I can re-word the 
> commit message and submit for review.

Yeah thanks. The change looks fine, just the change log needs a tweak.

It's fine to mention that the bug happens when a system has been
running, a device has been frozen, then the LPAR is rebooted, and *then*
we hit the bug at boot up.

cheers


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-21 Thread Michael Ellerman
Michal Suchánek  writes:
> Hello,
>
> On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
>> Gaurav Batra  writes:
>> > At the time of LPAR reboot, partition firmware provides Open Firmware
>> > property ibm,dma-window for the PE. This property is provided on the PCI
>> > bus the PE is attached to.
>> 
>> AFAICS you're actually describing a bug that happens during boot *up*?
>> 
>> Describing it as "reboot" makes me think you're talking about the
>> shutdown path. I think that will confuse people, me at least :)
>
> there is probably an assumption that it must have been running
> previously for the errors to happen in the first place but given the
> error state persists for a day it may be a very long 'reboot'.

Yeah. Which is good detail, but the actual change is to the boot up path
so I think it's better described that way.

cheers


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Gaurav Batra
You are right. I think, the "reboot" should be replaced with just "boot 
up". If there are no other comments, or code changes, I can re-word the 
commit message and submit for review.


Thanks,

Gaurav

On 4/19/24 6:11 AM, Michal Suchánek wrote:
> Hello,
>
> On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> > Gaurav Batra  writes:
> > > At the time of LPAR reboot, partition firmware provides Open Firmware
> > > property ibm,dma-window for the PE. This property is provided on the PCI
> > > bus the PE is attached to.
> >
> > AFAICS you're actually describing a bug that happens during boot *up*?
> >
> > Describing it as "reboot" makes me think you're talking about the
> > shutdown path. I think that will confuse people, me at least :)
>
> there is probably an assumption that it must have been running
> previously for the errors to happen in the first place but given the
> error state persists for a day it may be a very long 'reboot'.
>
> Thanks
>
> Michal
> >
> > cheers
> >
> > > There are exceptions where the partition firmware might not provide this
> > > property for the PE at the time of LPAR reboot. One such scenario is
> > > where the firmware has frozen the PE due to some error condition. The
> > > PE stays frozen for 24 hours or until the whole system is reinitialized.
> > >
> > > Within this time frame, if the LPAR is rebooted, the frozen PE will be
> > > presented to the LPAR but ibm,dma-window property could be missing.
> > >
> > > Today, under these circumstances, the LPAR oopses with NULL pointer
> > > dereference, when configuring the PCI bus the PE is attached to.
> > >
> > > BUG: Kernel NULL pointer dereference on read at 0x00c8
> > > Faulting instruction address: 0xc01024c0
> > > Oops: Kernel access of bad area, sig: 7 [#1]
> > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > Modules linked in:
> > > Supported: Yes
> > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> > > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> > > NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> > > REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> > > MSR:  82009033   CR: 28000822  XER: 
> > > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> > > ...
> > > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> > > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> > > Call Trace:
> > > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
> > > pcibios_setup_bus_self+0x1c0/0x370
> > > __of_scan_bus+0x2f8/0x330
> > > pcibios_scan_phb+0x280/0x3d0
> > > pcibios_init+0x88/0x12c
> > > do_one_initcall+0x60/0x320
> > > kernel_init_freeable+0x344/0x3e4
> > > kernel_init+0x34/0x1d0
> > > ret_from_kernel_user_thread+0x14/0x1c
> > >
> > > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
> > > Signed-off-by: Gaurav Batra 
> > > ---
> > >  arch/powerpc/platforms/pseries/iommu.c | 8 
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > index e8c4129697b1..e808d5b1fa49 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> > >  	 * parent bus. During reboot, there will be ibm,dma-window property to
> > >  	 * define DMA window. For kdump, there will at least be default window or DDW
> > >  	 * or both.
> > > +	 * There is an exception to the above. In case the PE goes into frozen
> > > +	 * state, firmware may not provide ibm,dma-window property at the time
> > > +	 * of LPAR reboot.
> > >  	 */
> > >  
> > > +	if (!pdn) {
> > > +		pr_debug("  no ibm,dma-window property !\n");
> > > +		return;
> > > +	}
> > > +
> > >  	ppci = PCI_DN(pdn);
> > >  
> > >  	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
> > >
> > > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> > > -- 
> > > 2.39.3 (Apple Git-146)


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Michal Suchánek
Hello,

On Fri, Apr 19, 2024 at 04:12:46PM +1000, Michael Ellerman wrote:
> Gaurav Batra  writes:
> > At the time of LPAR reboot, partition firmware provides Open Firmware
> > property ibm,dma-window for the PE. This property is provided on the PCI
> > bus the PE is attached to.
> 
> AFAICS you're actually describing a bug that happens during boot *up*?
> 
> Describing it as "reboot" makes me think you're talking about the
> shutdown path. I think that will confuse people, me at least :)

there is probably an assumption that it must have been running
previously for the errors to happen in the first place but given the
error state persists for a day it may be a very long 'reboot'.

Thanks

Michal
> 
> cheers
> 
> > There are exceptions where the partition firmware might not provide this
> > property for the PE at the time of LPAR reboot. One such scenario is
> > where the firmware has frozen the PE due to some error condition. The
> > PE stays frozen for 24 hours or until the whole system is reinitialized.
> >
> > Within this time frame, if the LPAR is rebooted, the frozen PE will be
> > presented to the LPAR but ibm,dma-window property could be missing.
> >
> > Today, under these circumstances, the LPAR oopses with NULL pointer
> > dereference, when configuring the PCI bus the PE is attached to.
> >
> > BUG: Kernel NULL pointer dereference on read at 0x00c8
> > Faulting instruction address: 0xc01024c0
> > Oops: Kernel access of bad area, sig: 7 [#1]
> > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > Modules linked in:
> > Supported: Yes
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> > Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> > NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> > REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> > MSR:  82009033   CR: 28000822  XER: 
> > CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> > ...
> > NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> > LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> > Call Trace:
> > pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
> > pcibios_setup_bus_self+0x1c0/0x370
> > __of_scan_bus+0x2f8/0x330
> > pcibios_scan_phb+0x280/0x3d0
> > pcibios_init+0x88/0x12c
> > do_one_initcall+0x60/0x320
> > kernel_init_freeable+0x344/0x3e4
> > kernel_init+0x34/0x1d0
> > ret_from_kernel_user_thread+0x14/0x1c
> >
> > Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
> > Signed-off-by: Gaurav Batra 
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 8 
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index e8c4129697b1..e808d5b1fa49 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> >  	 * parent bus. During reboot, there will be ibm,dma-window property to
> >  	 * define DMA window. For kdump, there will at least be default window or DDW
> >  	 * or both.
> > +	 * There is an exception to the above. In case the PE goes into frozen
> > +	 * state, firmware may not provide ibm,dma-window property at the time
> > +	 * of LPAR reboot.
> >  	 */
> >  
> > +	if (!pdn) {
> > +		pr_debug("  no ibm,dma-window property !\n");
> > +		return;
> > +	}
> > +
> >  	ppci = PCI_DN(pdn);
> >  
> >  	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
> >
> > base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> > -- 
> > 2.39.3 (Apple Git-146)


Re: [PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-19 Thread Michael Ellerman
Gaurav Batra  writes:
> At the time of LPAR reboot, partition firmware provides Open Firmware
> property ibm,dma-window for the PE. This property is provided on the PCI
> bus the PE is attached to.

AFAICS you're actually describing a bug that happens during boot *up*?

Describing it as "reboot" makes me think you're talking about the
shutdown path. I think that will confuse people, me at least :)

cheers

> There are exceptions where the partition firmware might not provide this
> property for the PE at the time of LPAR reboot. One such scenario is
> where the firmware has frozen the PE due to some error condition. The
> PE stays frozen for 24 hours or until the whole system is reinitialized.
>
> Within this time frame, if the LPAR is rebooted, the frozen PE will be
> presented to the LPAR but ibm,dma-window property could be missing.
>
> Today, under these circumstances, the LPAR oopses with NULL pointer
> dereference, when configuring the PCI bus the PE is attached to.
>
> BUG: Kernel NULL pointer dereference on read at 0x00c8
> Faulting instruction address: 0xc01024c0
> Oops: Kernel access of bad area, sig: 7 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> Supported: Yes
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
> Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
> NIP:  c01024c0 LR: c01024b0 CTR: c0102450
> REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
> MSR:  82009033   CR: 28000822  XER: 
> CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
> ...
> NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
> LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
> Call Trace:
>   pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
>   pcibios_setup_bus_self+0x1c0/0x370
>   __of_scan_bus+0x2f8/0x330
>   pcibios_scan_phb+0x280/0x3d0
>   pcibios_init+0x88/0x12c
>   do_one_initcall+0x60/0x320
>   kernel_init_freeable+0x344/0x3e4
>   kernel_init+0x34/0x1d0
>   ret_from_kernel_user_thread+0x14/0x1c
>
> Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
> Signed-off-by: Gaurav Batra 
> ---
>  arch/powerpc/platforms/pseries/iommu.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e8c4129697b1..e808d5b1fa49 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
>  	 * parent bus. During reboot, there will be ibm,dma-window property to
>  	 * define DMA window. For kdump, there will at least be default window or DDW
>  	 * or both.
> +	 * There is an exception to the above. In case the PE goes into frozen
> +	 * state, firmware may not provide ibm,dma-window property at the time
> +	 * of LPAR reboot.
>  	 */
>  
> +	if (!pdn) {
> +		pr_debug("  no ibm,dma-window property !\n");
> +		return;
> +	}
> +
>  	ppci = PCI_DN(pdn);
>  
>  	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",
>
> base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
> -- 
> 2.39.3 (Apple Git-146)


[PATCH] powerpc/pseries/iommu: LPAR panics when rebooted with a frozen PE

2024-04-16 Thread Gaurav Batra
At the time of LPAR reboot, partition firmware provides Open Firmware
property ibm,dma-window for the PE. This property is provided on the PCI
bus the PE is attached to.

There are exceptions where the partition firmware might not provide this
property for the PE at the time of LPAR reboot. One such scenario is
where the firmware has frozen the PE due to some error condition. The
PE stays frozen for 24 hours or until the whole system is reinitialized.

Within this time frame, if the LPAR is rebooted, the frozen PE will be
presented to the LPAR but ibm,dma-window property could be missing.

Today, under these circumstances, the LPAR oopses with NULL pointer
dereference, when configuring the PCI bus the PE is attached to.

BUG: Kernel NULL pointer dereference on read at 0x00c8
Faulting instruction address: 0xc01024c0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.4.0-150600.9-default #1
Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200 0xf06 of:IBM,FW1060.00 (NM1060_023) hv:phyp pSeries
NIP:  c01024c0 LR: c01024b0 CTR: c0102450
REGS: c37db5c0 TRAP: 0300   Not tainted  (6.4.0-150600.9-default)
MSR:  82009033   CR: 28000822  XER: 
CFAR: c010254c DAR: 00c8 DSISR: 0008 IRQMASK: 0
...
NIP [c01024c0] pci_dma_bus_setup_pSeriesLP+0x70/0x2a0
LR [c01024b0] pci_dma_bus_setup_pSeriesLP+0x60/0x2a0
Call Trace:
pci_dma_bus_setup_pSeriesLP+0x60/0x2a0 (unreliable)
pcibios_setup_bus_self+0x1c0/0x370
__of_scan_bus+0x2f8/0x330
pcibios_scan_phb+0x280/0x3d0
pcibios_init+0x88/0x12c
do_one_initcall+0x60/0x320
kernel_init_freeable+0x344/0x3e4
kernel_init+0x34/0x1d0
ret_from_kernel_user_thread+0x14/0x1c

Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Signed-off-by: Gaurav Batra 
---
 arch/powerpc/platforms/pseries/iommu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e8c4129697b1..e808d5b1fa49 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -786,8 +786,16 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 	 * parent bus. During reboot, there will be ibm,dma-window property to
 	 * define DMA window. For kdump, there will at least be default window or DDW
 	 * or both.
+	 * There is an exception to the above. In case the PE goes into frozen
+	 * state, firmware may not provide ibm,dma-window property at the time
+	 * of LPAR reboot.
 	 */
 
+	if (!pdn) {
+		pr_debug("  no ibm,dma-window property !\n");
+		return;
+	}
+
 	ppci = PCI_DN(pdn);
 
 	pr_debug("  parent is %pOF, iommu_table: 0x%p\n",

base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
-- 
2.39.3 (Apple Git-146)