Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2026-03-11 Thread Manivannan Sadhasivam
On Wed, Mar 11, 2026 at 11:59:37AM +0100, Niklas Cassel wrote:
> On Tue, Mar 10, 2026 at 07:07:39PM +0530, Manivannan Sadhasivam wrote:
> > > 
> > > I tested your patch series + your suggested change above, and after a:
> > > 
> > > ## On EP side:
> > > # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start 
> > > && \
> > >   sleep 0.1 && echo 1 > 
> > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > > 
> > > Instead of:
> > > 
> > > # FAILED: 1 / 16 tests passed.
> > > 
> > > I now get:
> > > # FAILED: 7 / 16 tests passed.
> > > 
> > > Test cases 1-7 now passes (the test cases related to BARs),
> > > all other test cases still fail:
> 
> (snip)
> 
> > I guess what is going wrong here is that you might be rebooting the device
> > *after* restoring the config space in 'slot_reset()' callback. So all the
> > previous config space contents would be lost and the device would've started
> > afresh.
> 
> I'm confused.
> 
> # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
>   sleep 0.1 && echo 1 > 
> /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> 
> Will not reboot the device.
> 
> 
> > 
> > I don't think we can restore the config space after rebooting the EP device,
> > because host has no idea about it. It can only restore the content after
> > recovering the device through platform specific means.
> 
> Am I missing something, or why are we talking about rebooting the device?
> 

Sorry, I just went with reboot scenario since I remember you referring to it in
the thread. But I must admit that I didn't read your repro steps where you just
write to the 'start' parameter.

But I did test with toggling the LTSSM alone on the EP and was able to execute
the test successfully on the host. So I'm not entirely sure what is going wrong
on yours.

I need to look into it more in detail, but I do not want to block the series
from getting merged due to this.

- Mani

-- 
மணிவண்ணன் சதாசிவம்



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2026-03-11 Thread Niklas Cassel
On Tue, Mar 10, 2026 at 07:07:39PM +0530, Manivannan Sadhasivam wrote:
> > 
> > I tested your patch series + your suggested change above, and after a:
> > 
> > ## On EP side:
> > # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && 
> > \
> >   sleep 0.1 && echo 1 > 
> > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > 
> > Instead of:
> > 
> > # FAILED: 1 / 16 tests passed.
> > 
> > I now get:
> > # FAILED: 7 / 16 tests passed.
> > 
> > Test cases 1-7 now passes (the test cases related to BARs),
> > all other test cases still fail:

(snip)

> I guess what is going wrong here is that you might be rebooting the device
> *after* restoring the config space in 'slot_reset()' callback. So all the
> previous config space contents would be lost and the device would've started
> afresh.

I'm confused.

# echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
  sleep 0.1 && echo 1 > 
/sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start

Will not reboot the device.


> 
> I don't think we can restore the config space after rebooting the EP device,
> because host has no idea about it. It can only restore the content after
> recovering the device through platform specific means.

Am I missing something, or why are we talking about rebooting the device?



Kind regards,
Niklas



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2026-03-10 Thread Manivannan Sadhasivam
Hi Niklas,

Sorry for replying very late. I lost access to my EP setup, and just got it
back.

On Thu, Sep 04, 2025 at 04:03:39PM +0200, Niklas Cassel wrote:
> Hello Mani,
> 
> On Fri, Aug 29, 2025 at 09:44:08PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Aug 15, 2025 at 11:07:42AM GMT, Niklas Cassel wrote:
> 
> (snip)
> 
> > > > > > ## On EP side:
> > > > > > # echo 0 > 
> > > > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
> > > > > >   sleep 0.1 && echo 1 > 
> > > > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > > > > > 
> > > > > > Basically all tests timeout
> > > > > > # FAILED: 1 / 16 tests passed.
> > > > > > 
> > > > > > Which is the same as before this patch series.
> > > > 
> > > > This is kind of expected since the pci_endpoint_test driver doesn't 
> > > > have the AER
> > > > err_handlers defined.
> > > 
> > > I see.
> > > Would be nice if we could add them then, so that we can verify that this
> > > series is working as intended.
> 
> (snip)
> 
> > Ok, thanks for the logs. I guess what is happening here is that we are not
> > saving/restoring the config space of the devices under the Root Port if 
> > linkdown
> > is happens. TBH, we cannot do that from the PCI core since once linkdown
> > happens, we cannot access any devices underneath the Root Port. But if
> > err_handlers are available for drivers for all devices, they could do 
> > something
> > smart like below:
> > 
> > diff --git a/drivers/misc/pci_endpoint_test.c 
> > b/drivers/misc/pci_endpoint_test.c
> > index c4e5e2c977be..9aabf1fe902e 100644
> > --- a/drivers/misc/pci_endpoint_test.c
> > +++ b/drivers/misc/pci_endpoint_test.c
> > @@ -989,6 +989,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
> >  
> > pci_set_drvdata(pdev, test);
> >  
> > +   pci_save_state(pdev);
> > +
> > id = ida_alloc(&pci_endpoint_test_ida, GFP_KERNEL);
> > if (id < 0) {
> > ret = id;
> > @@ -1140,12 +1142,31 @@ static const struct pci_device_id 
> > pci_endpoint_test_tbl[] = {
> >  };
> >  MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
> >  
> > +static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev 
> > *pdev,
> > +  pci_channel_state_t state)
> > +{
> > +   return PCI_ERS_RESULT_NEED_RESET;
> > +}
> > +
> > +static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
> > +{
> > +   pci_restore_state(pdev);
> > +
> > +   return PCI_ERS_RESULT_RECOVERED;
> > +}
> > +
> > +static const struct pci_error_handlers pci_endpoint_test_err_handler = {
> > +   .error_detected = pci_endpoint_test_error_detected,
> > +   .slot_reset = pci_endpoint_test_slot_reset,
> > +};
> > +
> >  static struct pci_driver pci_endpoint_test_driver = {
> > .name   = DRV_MODULE_NAME,
> > .id_table   = pci_endpoint_test_tbl,
> > .probe  = pci_endpoint_test_probe,
> > .remove = pci_endpoint_test_remove,
> > .sriov_configure = pci_sriov_configure_simple,
> > +   .err_handler= &pci_endpoint_test_err_handler,
> >  };
> >  module_pci_driver(pci_endpoint_test_driver);
> > 
> > This essentially saves the good known config space during probe and 
> > restores it
> > during the slot_reset callback. Ofc, the state would've been overwritten if
> > suspend/resume happens in-between, but the point I'm making is that unless 
> > all
> > device drivers restore their known config space, devices cannot be resumed
> > properly post linkdown recovery.
> > 
> > I can add a patch based on the above diff in next revision if that helps. 
> > Right
> > now, I do not have access to my endpoint test setup. So can't test anything.
> 
> I tested your patch series + your suggested change above, and after a:
> 
> ## On EP side:
> # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
>   sleep 0.1 && echo 1 > 
> /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> 
> Instead of:
> 
> # FAILED: 1 / 16 tests passed.
> 
> I now get:
> # FAILED: 7 / 16 tests passed.
> 
> Test cases 1-7 now passes (the test cases related to BARs),
> all other test cases still fail:
> 
> # /pcitest 
> TAP version 13
> 1..16
> # Starting 16 tests from 9 test cases.
> #  RUN   pci_ep_bar.BAR0.BAR_TEST ...
> #OK  pci_ep_bar.BAR0.BAR_TEST
> ok 1 pci_ep_bar.BAR0.BAR_TEST
> #  RUN   pci_ep_bar.BAR1.BAR_TEST ...
> #OK  pci_ep_bar.BAR1.BAR_TEST
> ok 2 pci_ep_bar.BAR1.BAR_TEST
> #  RUN   pci_ep_bar.BAR2.BAR_TEST ...
> #OK  pci_ep_bar.BAR2.BAR_TEST
> ok 3 pci_ep_bar.BAR2.BAR_TEST
> #  RUN   pci_ep_bar.BAR3.BAR_TEST ...
> #OK  pci_ep_bar.BAR3.BAR_TEST
> ok 4 pci_ep_bar.BAR3.BAR_TEST
> #  RUN   pci_ep_bar.BAR4.BAR_TEST ...
> #  SKIP  BAR is disabled
> #OK  pci_ep_bar.BAR4.BAR_TEST
> ok 5 pci_ep_bar.BAR4.BAR_TEST # SKIP BAR is disa

Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-09-23 Thread David Bremner


I have been testing this series on the 6.17 pre-releases, lightly
patched by the collabora [1] and mnt-reform [2] teams. I have been testing
on bare hardware, on MNT Research's pocket-reform product. I'm afraid I
can only offer CI level feedback, but in case it helps

1) The series now applies cleanly onto collabora's rockchip-devel branch
2) The resulting kernel boots and runs OK.
3) the resulting kernel still fails the "platform" pm_test [3] with
 "rockchip-dw-pcie a40c0.pcie: Phy link never came up"

Of course there could be other reasons for (3), I don't know that much
about it.

I'm happy to test a newer version of the series if/when it exists.

d

[1]: 
https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux.git#rockchip-devel
[2]: https://salsa.debian.org/bremner/collabora-rockchip-3588#reform-patches
[3]: https://www.cs.unb.ca/~bremner/blog/posts/hibernate-pocket-12/



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-09-23 Thread Niklas Cassel
Hello David,

On Tue, Sep 23, 2025 at 10:06:13AM -0300, David Bremner wrote:
> 
> I have been testing this series on the 6.17 pre-releases, lightly
> patched by the collabora [1] and mnt-reform [2] teams. I have been testing
> on bare hardware, on MNT Research's pocket-reform product. I'm afraid I
> can only offer CI level feedback, but in case it helps
> 
> 1) The series now applies cleanly onto collabora's rockchip-devel branch
> 2) The resulting kernel boots and runs OK.
> 3) the resulting kernel still fails the "platform" pm_test [3] with
>  "rockchip-dw-pcie a40c0.pcie: Phy link never came up"
> 
> Of course there could be other reasons for (3), I don't know that much
> about it.

I don't think this driver has support for hibernate in upstream.

Did you try forward porting this patch:
https://lore.kernel.org/linux-pci/[email protected]/

Also, have you tried the downstream driver?
If it is working there, perhaps you could try to isolate the changes
that make it work there.

Otherwise, I would recommend you to reach out to author of the patch
above ask for support.


Kind regards,
Niklas



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-09-04 Thread Niklas Cassel
Hello Mani,

On Fri, Aug 29, 2025 at 09:44:08PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Aug 15, 2025 at 11:07:42AM GMT, Niklas Cassel wrote:

(snip)

> > > > > ## On EP side:
> > > > > # echo 0 > 
> > > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
> > > > >   sleep 0.1 && echo 1 > 
> > > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > > > > 
> > > > > Basically all tests timeout
> > > > > # FAILED: 1 / 16 tests passed.
> > > > > 
> > > > > Which is the same as before this patch series.
> > > 
> > > This is kind of expected since the pci_endpoint_test driver doesn't have 
> > > the AER
> > > err_handlers defined.
> > 
> > I see.
> > Would be nice if we could add them then, so that we can verify that this
> > series is working as intended.

(snip)

> Ok, thanks for the logs. I guess what is happening here is that we are not
> saving/restoring the config space of the devices under the Root Port if 
> linkdown
> is happens. TBH, we cannot do that from the PCI core since once linkdown
> happens, we cannot access any devices underneath the Root Port. But if
> err_handlers are available for drivers for all devices, they could do 
> something
> smart like below:
> 
> diff --git a/drivers/misc/pci_endpoint_test.c 
> b/drivers/misc/pci_endpoint_test.c
> index c4e5e2c977be..9aabf1fe902e 100644
> --- a/drivers/misc/pci_endpoint_test.c
> +++ b/drivers/misc/pci_endpoint_test.c
> @@ -989,6 +989,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
>  
> pci_set_drvdata(pdev, test);
>  
> +   pci_save_state(pdev);
> +
> id = ida_alloc(&pci_endpoint_test_ida, GFP_KERNEL);
> if (id < 0) {
> ret = id;
> @@ -1140,12 +1142,31 @@ static const struct pci_device_id 
> pci_endpoint_test_tbl[] = {
>  };
>  MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
>  
> +static pci_ers_result_t pci_endpoint_test_error_detected(struct pci_dev 
> *pdev,
> +  pci_channel_state_t state)
> +{
> +   return PCI_ERS_RESULT_NEED_RESET;
> +}
> +
> +static pci_ers_result_t pci_endpoint_test_slot_reset(struct pci_dev *pdev)
> +{
> +   pci_restore_state(pdev);
> +
> +   return PCI_ERS_RESULT_RECOVERED;
> +}
> +
> +static const struct pci_error_handlers pci_endpoint_test_err_handler = {
> +   .error_detected = pci_endpoint_test_error_detected,
> +   .slot_reset = pci_endpoint_test_slot_reset,
> +};
> +
>  static struct pci_driver pci_endpoint_test_driver = {
> .name   = DRV_MODULE_NAME,
> .id_table   = pci_endpoint_test_tbl,
> .probe  = pci_endpoint_test_probe,
> .remove = pci_endpoint_test_remove,
> .sriov_configure = pci_sriov_configure_simple,
> +   .err_handler= &pci_endpoint_test_err_handler,
>  };
>  module_pci_driver(pci_endpoint_test_driver);
> 
> This essentially saves the good known config space during probe and restores 
> it
> during the slot_reset callback. Ofc, the state would've been overwritten if
> suspend/resume happens in-between, but the point I'm making is that unless all
> device drivers restore their known config space, devices cannot be resumed
> properly post linkdown recovery.
> 
> I can add a patch based on the above diff in next revision if that helps. 
> Right
> now, I do not have access to my endpoint test setup. So can't test anything.

I tested your patch series + your suggested change above, and after a:

## On EP side:
# echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
  sleep 0.1 && echo 1 > 
/sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start

Instead of:

# FAILED: 1 / 16 tests passed.

I now get:
# FAILED: 7 / 16 tests passed.

Test cases 1-7 now passes (the test cases related to BARs),
all other test cases still fail:

# /pcitest 
TAP version 13
1..16
# Starting 16 tests from 9 test cases.
#  RUN   pci_ep_bar.BAR0.BAR_TEST ...
#OK  pci_ep_bar.BAR0.BAR_TEST
ok 1 pci_ep_bar.BAR0.BAR_TEST
#  RUN   pci_ep_bar.BAR1.BAR_TEST ...
#OK  pci_ep_bar.BAR1.BAR_TEST
ok 2 pci_ep_bar.BAR1.BAR_TEST
#  RUN   pci_ep_bar.BAR2.BAR_TEST ...
#OK  pci_ep_bar.BAR2.BAR_TEST
ok 3 pci_ep_bar.BAR2.BAR_TEST
#  RUN   pci_ep_bar.BAR3.BAR_TEST ...
#OK  pci_ep_bar.BAR3.BAR_TEST
ok 4 pci_ep_bar.BAR3.BAR_TEST
#  RUN   pci_ep_bar.BAR4.BAR_TEST ...
#  SKIP  BAR is disabled
#OK  pci_ep_bar.BAR4.BAR_TEST
ok 5 pci_ep_bar.BAR4.BAR_TEST # SKIP BAR is disabled
#  RUN   pci_ep_bar.BAR5.BAR_TEST ...
#OK  pci_ep_bar.BAR5.BAR_TEST
ok 6 pci_ep_bar.BAR5.BAR_TEST
#  RUN   pci_ep_basic.CONSECUTIVE_BAR_TEST ...
#OK  pci_ep_basic.CONSECUTIVE_BAR_TEST
ok 7 pci_ep_basic.CONSECUTIVE_BAR_TEST
#  RUN   pci_ep_basic.LEGACY_IRQ_TEST ...
# pci_endpoint_test.c:106:LEGACY_IRQ_TEST:Expected 0 (0) == ret (-110)
# pci_endpoint_test.c:106:LEGACY_IRQ_TES

Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-08-29 Thread Manivannan Sadhasivam
On Fri, Aug 15, 2025 at 11:07:42AM GMT, Niklas Cassel wrote:
> Hello Mani,
> 
> Sorry for the delayed reply.
> I just came back from vacation.
> 
> On Thu, Jul 24, 2025 at 11:00:05AM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Jul 18, 2025 at 12:39:50PM GMT, Niklas Cassel wrote:
> > > On Fri, Jul 18, 2025 at 12:28:44PM +0200, Niklas Cassel wrote:
> > > > On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 
> > > > Relay wrote:
> > > > 2) Testing link down reset:
> > > > 
> > > > selftests before link down reset:
> > > > # FAILED: 14 / 16 tests passed.
> > > > 
> > > > ## On EP side:
> > > > # echo 0 > 
> > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
> > > >   sleep 0.1 && echo 1 > 
> > > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > > > 
> > > > 
> > > > [  111.137162] rockchip-dw-pcie a4000.pcie: 
> > > > PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> > > > [  111.137881] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
> > > > [  111.138432] rockchip-dw-pcie a4000.pcie: hot reset or link-down 
> > > > reset
> > > > [  111.139067] pcieport :00:00.0: Recovering Root Port due to Link 
> > > > Down
> > > > [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> > > > error_detected callback)
> > > > [  111.255407] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
> > > > [  111.256019] rockchip-dw-pcie a4000.pcie: Root Port reset 
> > > > completed
> > > > [  111.383401] pcieport :00:00.0: Root Port has been reset
> > > > [  111.384060] pcieport :00:00.0: AER: device recovery failed
> > > > [  111.384582] rockchip-dw-pcie a4000.pcie: 
> > > > PCIE_CLIENT_INTR_STATUS_MISC: 0x3
> > > > [  111.385218] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
> > > > [  111.385771] rockchip-dw-pcie a4000.pcie: Received Link up event. 
> > > > Starting enumeration!
> > > > [  111.390866] pcieport :00:00.0: bridge configuration invalid 
> > > > ([bus 00-00]), reconfiguring
> > > > [  111.391650] pci_bus :01: busn_res: [bus 01-ff] end is updated to 
> > > > 01
> > > > 
> > > > Basically all tests timeout
> > > > # FAILED: 1 / 16 tests passed.
> > > > 
> > > > Which is the same as before this patch series.
> > > 
> > > The above was with CONFIG_PCIEAER=y
> > > 
> > 
> > This is kind of expected since the pci_endpoint_test driver doesn't have 
> > the AER
> > err_handlers defined.
> 
> I see.
> Would be nice if we could add them then, so that we can verify that this
> series is working as intended.
> 
> 
> > 
> > > Wilfred suggested that I tried without this config set.
> > > 
> > > However, doing so, I got the exact same result:
> > > # FAILED: 1 / 16 tests passed.
> > > 
> > 
> > Interesting. Could you please share the dmesg log like above.
> 
> It is looking exactly like the dmesg above
> 
> [   86.820059] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
> 0x4
> [   86.820791] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
> [   86.821344] rockchip-dw-pcie a4000.pcie: hot reset or link-down reset
> [   86.821978] pcieport :00:00.0: Recovering Root Port due to Link Down
> [   87.040551] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
> [   87.041138] rockchip-dw-pcie a4000.pcie: Root Port reset completed
> [   87.168378] pcieport :00:00.0: Root Port has been reset
> [   87.168882] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
> 0x3
> [   87.169519] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
> [   87.272463] rockchip-dw-pcie a4000.pcie: Received Link up event. 
> Starting enumeration!
> [   87.277552] pcieport :00:00.0: bridge configuration invalid ([bus 
> 00-00]), reconfiguring
> [   87.278314] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01
> 
> except that we don't get the:
> > [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> > error_detected callback)
> > [  111.384060] pcieport :00:00.0: AER: device recovery failed
> 

Ok, thanks for the logs. I guess what is happening here is that we are not
saving/restoring the config space of the devices under the Root Port if linkdown
is happens. TBH, we cannot do that from the PCI core since once linkdown
happens, we cannot access any devices underneath the Root Port. But if
err_handlers are available for drivers for all devices, they could do something
smart like below:

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index c4e5e2c977be..9aabf1fe902e 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -989,6 +989,8 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev,
 
pci_set_drvdata(pdev, test);
 
+   pci_save_state(pdev);
+
id = ida_alloc(&pci_endpoint_test_ida, GFP_KERNEL);
if (id < 0) {
ret = id;
@@ -1140,12 +1142,31 @@ static const struct pci_device_id 
pci_endpoint_test_tbl[] = {
 };
 MODULE_DEVICE_TABLE(pc

Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-08-29 Thread Manivannan Sadhasivam
On Thu, Aug 28, 2025 at 01:01:51PM GMT, Brian Norris wrote:
> On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 Relay 
> wrote:
> > Hi,
> > 
> > Currently, in the event of AER/DPC, PCI core will try to reset the slot 
> > (Root
> > Port) and its subordinate devices by invoking bridge control reset and FLR. 
> > But
> > in some cases like AER Fatal error, it might be necessary to reset the Root
> > Ports using the PCI host bridge drivers in a platform specific way (as 
> > indicated
> > by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
> > Otherwise, the PCI link won't be recovered successfully.
> > 
> > So this series adds a new callback 'pci_host_bridge::reset_root_port' for 
> > the
> > host bridge drivers to reset the Root Port when a fatal error happens.
> > 
> > Also, this series allows the host bridge drivers to handle PCI link down 
> > event
> > by resetting the Root Ports and recovering the bus. This is accomplished by 
> > the
> > help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
> > expected to call this API (preferrably from a threaded IRQ handler) with
> > relevant Root Port 'pci_dev' when a link down event is detected for the 
> > port.
> > The API will reuse the pcie_do_recovery() function to recover the link if 
> > AER
> > support is enabled, otherwise it will directly call the reset_root_port()
> > callback of the host bridge driver (if exists).
> > 
> > For reference, I've modified the pcie-qcom driver to call
> > pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
> > LINK_DOWN global_irq event and populated 
> > 'pci_host_bridge::reset_root_port()'
> > callback to reset the Root Port. Since the Qcom PCIe controllers support 
> > only
> > a single Root Port (slot) per controller instance, the API is going to be
> > invoked only once. For multi Root Port controllers, the controller driver is
> > expected to detect the Root Port that received the link down event and call
> > the pci_host_handle_link_down() API with 'pci_dev' of that Root Port.
> > 
> > Testing
> > ---
> > 
> > I've lost access to my test setup now. So Krishna (Cced) will help with 
> > testing
> > on the Qcom platform and Wilfred or Niklas should be able to test it on 
> > Rockchip
> > platform. For the moment, this series is compile tested only.
> 
> For the series:
> 
> Tested-by: Brian Norris 
> 
> I've tested the whole thing on Qualcomm SC7280 Herobrine systems with
> NVMe. After adding a debugfs node to control toggling PERST, I can force
> the link to reset, and see it recover and resume NVMe traffic.
> 
> I've tested the first two on Pixel phones, using a non-upstream
> DWC-based driver that I'm working on getting in better shape. (We've
> previously supported a custom link-error API setup instead.) I'd love to
> see this available upstream.
> 

Thanks, Brian for testing! I didn't get time to look into the report from
Niklas (which is the only blocking thing for this series). I'll try to dig into
it today/tomorrow.

- Mani

-- 
மணிவண்ணன் சதாசிவம்



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-08-28 Thread Brian Norris
On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 Relay 
wrote:
> Hi,
> 
> Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
> Port) and its subordinate devices by invoking bridge control reset and FLR. 
> But
> in some cases like AER Fatal error, it might be necessary to reset the Root
> Ports using the PCI host bridge drivers in a platform specific way (as 
> indicated
> by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
> Otherwise, the PCI link won't be recovered successfully.
> 
> So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
> host bridge drivers to reset the Root Port when a fatal error happens.
> 
> Also, this series allows the host bridge drivers to handle PCI link down event
> by resetting the Root Ports and recovering the bus. This is accomplished by 
> the
> help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
> expected to call this API (preferrably from a threaded IRQ handler) with
> relevant Root Port 'pci_dev' when a link down event is detected for the port.
> The API will reuse the pcie_do_recovery() function to recover the link if AER
> support is enabled, otherwise it will directly call the reset_root_port()
> callback of the host bridge driver (if exists).
> 
> For reference, I've modified the pcie-qcom driver to call
> pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
> LINK_DOWN global_irq event and populated 'pci_host_bridge::reset_root_port()'
> callback to reset the Root Port. Since the Qcom PCIe controllers support only
> a single Root Port (slot) per controller instance, the API is going to be
> invoked only once. For multi Root Port controllers, the controller driver is
> expected to detect the Root Port that received the link down event and call
> the pci_host_handle_link_down() API with 'pci_dev' of that Root Port.
> 
> Testing
> ---
> 
> I've lost access to my test setup now. So Krishna (Cced) will help with 
> testing
> on the Qcom platform and Wilfred or Niklas should be able to test it on 
> Rockchip
> platform. For the moment, this series is compile tested only.

For the series:

Tested-by: Brian Norris 

I've tested the whole thing on Qualcomm SC7280 Herobrine systems with
NVMe. After adding a debugfs node to control toggling PERST, I can force
the link to reset, and see it recover and resume NVMe traffic.

I've tested the first two on Pixel phones, using a non-upstream
DWC-based driver that I'm working on getting in better shape. (We've
previously supported a custom link-error API setup instead.) I'd love to
see this available upstream.

Regards,
Brian



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-08-15 Thread Niklas Cassel
Hello Mani,

Sorry for the delayed reply.
I just came back from vacation.

On Thu, Jul 24, 2025 at 11:00:05AM +0530, Manivannan Sadhasivam wrote:
> On Fri, Jul 18, 2025 at 12:39:50PM GMT, Niklas Cassel wrote:
> > On Fri, Jul 18, 2025 at 12:28:44PM +0200, Niklas Cassel wrote:
> > > On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 
> > > Relay wrote:
> > > 2) Testing link down reset:
> > > 
> > > selftests before link down reset:
> > > # FAILED: 14 / 16 tests passed.
> > > 
> > > ## On EP side:
> > > # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start 
> > > && \
> > >   sleep 0.1 && echo 1 > 
> > > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > > 
> > > 
> > > [  111.137162] rockchip-dw-pcie a4000.pcie: 
> > > PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> > > [  111.137881] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
> > > [  111.138432] rockchip-dw-pcie a4000.pcie: hot reset or link-down 
> > > reset
> > > [  111.139067] pcieport :00:00.0: Recovering Root Port due to Link 
> > > Down
> > > [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> > > error_detected callback)
> > > [  111.255407] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
> > > [  111.256019] rockchip-dw-pcie a4000.pcie: Root Port reset completed
> > > [  111.383401] pcieport :00:00.0: Root Port has been reset
> > > [  111.384060] pcieport :00:00.0: AER: device recovery failed
> > > [  111.384582] rockchip-dw-pcie a4000.pcie: 
> > > PCIE_CLIENT_INTR_STATUS_MISC: 0x3
> > > [  111.385218] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
> > > [  111.385771] rockchip-dw-pcie a4000.pcie: Received Link up event. 
> > > Starting enumeration!
> > > [  111.390866] pcieport :00:00.0: bridge configuration invalid ([bus 
> > > 00-00]), reconfiguring
> > > [  111.391650] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01
> > > 
> > > Basically all tests timeout
> > > # FAILED: 1 / 16 tests passed.
> > > 
> > > Which is the same as before this patch series.
> > 
> > The above was with CONFIG_PCIEAER=y
> > 
> 
> This is kind of expected since the pci_endpoint_test driver doesn't have the 
> AER
> err_handlers defined.

I see.
Would be nice if we could add them then, so that we can verify that this
series is working as intended.


> 
> > Wilfred suggested that I tried without this config set.
> > 
> > However, doing so, I got the exact same result:
> > # FAILED: 1 / 16 tests passed.
> > 
> 
> Interesting. Could you please share the dmesg log like above.

It is looking exactly like the dmesg above

[   86.820059] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
0x4
[   86.820791] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
[   86.821344] rockchip-dw-pcie a4000.pcie: hot reset or link-down reset
[   86.821978] pcieport :00:00.0: Recovering Root Port due to Link Down
[   87.040551] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
[   87.041138] rockchip-dw-pcie a4000.pcie: Root Port reset completed
[   87.168378] pcieport :00:00.0: Root Port has been reset
[   87.168882] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
0x3
[   87.169519] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
[   87.272463] rockchip-dw-pcie a4000.pcie: Received Link up event. 
Starting enumeration!
[   87.277552] pcieport :00:00.0: bridge configuration invalid ([bus 
00-00]), reconfiguring
[   87.278314] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01

except that we don't get the:
> [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> error_detected callback)
> [  111.384060] pcieport :00:00.0: AER: device recovery failed

prints.


Kind regards,
Niklas



RE: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-07-24 Thread Hongxing Zhu
> -Original Message-
> From: Manivannan Sadhasivam via B4 Relay
> 
> Sent: 2025年7月15日 22:21
> To: Bjorn Helgaas ; Mahesh J Salgaonkar
> ; Oliver O'Halloran ; Will
> Deacon ; Lorenzo Pieralisi ; Krzysztof
> Wilczy��ski ; Manivannan Sadhasivam
> ; Rob Herring ; Heiko Stuebner
> ; Philipp Zabel 
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Niklas
> Cassel ; Wilfred Mallawa ;
> Krishna Chaitanya Chundru ;
> [email protected]; Lukas Wunner ; Manivannan Sadhasivam
> ; Manivannan Sadhasivam
> 
> Subject: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a
> platform specific way
>
> [You don't often get email from
> [email protected]. Learn why
> this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Hi,
>
> Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
> Port) and its subordinate devices by invoking bridge control reset and FLR. 
> But in
> some cases like AER Fatal error, it might be necessary to reset the Root Ports
> using the PCI host bridge drivers in a platform specific way (as indicated by 
> the
> TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
> Otherwise, the PCI link won't be recovered successfully.
>
> So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
> host bridge drivers to reset the Root Port when a fatal error happens.
>
> Also, this series allows the host bridge drivers to handle PCI link down 
> event by
> resetting the Root Ports and recovering the bus. This is accomplished by the 
> help
> of the new 'pci_host_handle_link_down()' API. Host bridge drivers are expected
> to call this API (preferrably from a threaded IRQ handler) with relevant Root
> Port 'pci_dev' when a link down event is detected for the port.
> The API will reuse the pcie_do_recovery() function to recover the link if AER
> support is enabled, otherwise it will directly call the reset_root_port() 
> callback
> of the host bridge driver (if exists).
>
> For reference, I've modified the pcie-qcom driver to call
> pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
> LINK_DOWN global_irq event and populated
> 'pci_host_bridge::reset_root_port()'
> callback to reset the Root Port. Since the Qcom PCIe controllers support only 
> a
> single Root Port (slot) per controller instance, the API is going to be 
> invoked only
> once. For multi Root Port controllers, the controller driver is expected to 
> detect
> the Root Port that received the link down event and call the
> pci_host_handle_link_down() API with 'pci_dev' of that Root Port.
>
> Testing
> ---
>
> I've lost access to my test setup now. So Krishna (Cced) will help with 
> testing on
> the Qcom platform and Wilfred or Niklas should be able to test it on Rockchip
> platform. For the moment, this series is compile tested only.
>
> Changes in v6:
> - Incorporated the patch:
> https://lore.kern/
> el.org%2Fall%2F20250524185304.26698-2-manivannan.sadhasivam%40linaro.o
> rg%2F&data=05%7C02%7Chongxing.zhu%40nxp.com%7C33c08e5bb14347e5c3
> cc08ddc3accaf5%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6388
> 81869083440222%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydW
> UsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D
> %7C0%7C%7C%7C&sdata=BsEibf7v8wAQzwt%2BbgozxE0Se8vvF9lb1O%2F0Hw
> 1gG1M%3D&reserved=0
> - Link to v5:
> https://lore.kern/
> el.org%2Fr%2F20250715-pci-port-reset-v5-0-26a5d278db40%40oss.qualcomm.
> com&data=05%7C02%7Chongxing.zhu%40nxp.com%7C33c08e5bb14347e5c3c
> c08ddc3accaf5%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63888
> 1869083460674%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUs
> IlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%
> 7C0%7C%7C%7C&sdata=JVUK2udmAC4GCN6%2Bg%2B7rMVhnQWJXBF972JB2
> GJMrfWc%3D&reserved=0
>
> Changes in v5:
> * Reworked the pci_host_handle_link_down() to accept Root Port instead of
>   resetting all Root Ports in the event of link down.
> * Renamed 'reset_slot' to 'reset_root_port' to avoid confusion as both terms
>   were used interchangibly and the series is intended to reset Root Port only.
> * Added the Rockchip driver change to this series.
> * Dropped the applied patches and review/tested tags due to rework.
> * Rebased on top of v6.16-rc1.
>
> Changes in v4:
> - Handled link down first in the irq handler
> - Updated ICC & OPP bandwidth after link up in reset_slot() callback
> - Link to v3:
> https://lore.kern/
> el.org%2Fr%2F20250417-pcie-reset-slot-v3-0-59a10811c962%40linaro.org&dat
> a=05%7C02%7Chongxing.zhu%40nxp.com%7C33c08e5bb14347e5c3cc08ddc3a
> ccaf5%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6388818690834
> 74708%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLj
> AuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C

Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-07-23 Thread Manivannan Sadhasivam
On Fri, Jul 18, 2025 at 12:39:50PM GMT, Niklas Cassel wrote:
> On Fri, Jul 18, 2025 at 12:28:44PM +0200, Niklas Cassel wrote:
> > On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 
> > Relay wrote:
> > 2) Testing link down reset:
> > 
> > selftests before link down reset:
> > # FAILED: 14 / 16 tests passed.
> > 
> > ## On EP side:
> > # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && 
> > \
> >   sleep 0.1 && echo 1 > 
> > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> > 
> > 
> > [  111.137162] rockchip-dw-pcie a4000.pcie: 
> > PCIE_CLIENT_INTR_STATUS_MISC: 0x4
> > [  111.137881] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
> > [  111.138432] rockchip-dw-pcie a4000.pcie: hot reset or link-down reset
> > [  111.139067] pcieport :00:00.0: Recovering Root Port due to Link Down
> > [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> > error_detected callback)
> > [  111.255407] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
> > [  111.256019] rockchip-dw-pcie a4000.pcie: Root Port reset completed
> > [  111.383401] pcieport :00:00.0: Root Port has been reset
> > [  111.384060] pcieport :00:00.0: AER: device recovery failed
> > [  111.384582] rockchip-dw-pcie a4000.pcie: 
> > PCIE_CLIENT_INTR_STATUS_MISC: 0x3
> > [  111.385218] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
> > [  111.385771] rockchip-dw-pcie a4000.pcie: Received Link up event. 
> > Starting enumeration!
> > [  111.390866] pcieport :00:00.0: bridge configuration invalid ([bus 
> > 00-00]), reconfiguring
> > [  111.391650] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01
> > 
> > Basically all tests timeout
> > # FAILED: 1 / 16 tests passed.
> > 
> > Which is the same as before this patch series.
> 
> The above was with CONFIG_PCIEAER=y
> 

This is kind of expected since the pci_endpoint_test driver doesn't have the AER
err_handlers defined.

> Wilfred suggested that I tried without this config set.
> 
> However, doing so, I got the exact same result:
> # FAILED: 1 / 16 tests passed.
> 

Interesting. Could you please share the dmesg log like above.

- Mani

-- 
மணிவண்ணன் சதாசிவம்



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-07-18 Thread Niklas Cassel
On Fri, Jul 18, 2025 at 12:28:44PM +0200, Niklas Cassel wrote:
> On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 Relay 
> wrote:
> 2) Testing link down reset:
> 
> selftests before link down reset:
> # FAILED: 14 / 16 tests passed.
> 
> ## On EP side:
> # echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
>   sleep 0.1 && echo 1 > 
> /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start
> 
> 
> [  111.137162] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
> 0x4
> [  111.137881] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
> [  111.138432] rockchip-dw-pcie a4000.pcie: hot reset or link-down reset
> [  111.139067] pcieport :00:00.0: Recovering Root Port due to Link Down
> [  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
> error_detected callback)
> [  111.255407] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
> [  111.256019] rockchip-dw-pcie a4000.pcie: Root Port reset completed
> [  111.383401] pcieport :00:00.0: Root Port has been reset
> [  111.384060] pcieport :00:00.0: AER: device recovery failed
> [  111.384582] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
> 0x3
> [  111.385218] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
> [  111.385771] rockchip-dw-pcie a4000.pcie: Received Link up event. 
> Starting enumeration!
> [  111.390866] pcieport :00:00.0: bridge configuration invalid ([bus 
> 00-00]), reconfiguring
> [  111.391650] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01
> 
> Basically all tests timeout
> # FAILED: 1 / 16 tests passed.
> 
> Which is the same as before this patch series.

The above was with CONFIG_PCIEAER=y

Wilfred suggested that I tried without this config set.

However, doing so, I got the exact same result:
# FAILED: 1 / 16 tests passed.


For the record, the test that passes is not actually passing either,
it is the BAR4 test, which is skipped, since BAR4 is reserved on rock5b:
ok 5 pci_ep_bar.BAR4.BAR_TEST # SKIP BAR is disabled


Kind regards,
Niklas



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-07-18 Thread Niklas Cassel
On Tue, Jul 15, 2025 at 07:51:03PM +0530, Manivannan Sadhasivam via B4 Relay 
wrote:
> Testing
> ---
> 
> I've lost access to my test setup now. So Krishna (Cced) will help with 
> testing
> on the Qcom platform and Wilfred or Niklas should be able to test it on 
> Rockchip
> platform. For the moment, this series is compile tested only.


Since this patch series implements two things:

1) Testing sysfs initiated reset:

selftests before sysfs initiated reset:
# FAILED: 14 / 16 tests passed.

# echo 1 > /sys/bus/pci/devices/:01:00.0/reset

[  145.567748] pci-endpoint-test :01:00.0: resetting
[  145.638755] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
0x3
[  145.639472] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
[  145.640063] rockchip-dw-pcie a4000.pcie: Received Link up event. 
Starting enumeration!
[  145.682612] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
[  145.683162] rockchip-dw-pcie a4000.pcie: Root Port reset completed
[  145.810852] pci-endpoint-test :01:00.0: reset done

selftests after sysfs initiated reset:
# FAILED: 14 / 16 tests passed.

(Without this patch series: # FAILED: 7 / 16 tests passed.)

So for this part:
Tested-by: Niklas Cassel 




2) Testing link down reset:

selftests before link down reset:
# FAILED: 14 / 16 tests passed.

## On EP side:
# echo 0 > /sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start && \
  sleep 0.1 && echo 1 > 
/sys/kernel/config/pci_ep/controllers/a4000.pcie-ep/start


[  111.137162] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
0x4
[  111.137881] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x0
[  111.138432] rockchip-dw-pcie a4000.pcie: hot reset or link-down reset
[  111.139067] pcieport :00:00.0: Recovering Root Port due to Link Down
[  111.139686] pci-endpoint-test :01:00.0: AER: can't recover (no 
error_detected callback)
[  111.255407] rockchip-dw-pcie a4000.pcie: PCIe Gen.3 x4 link up
[  111.256019] rockchip-dw-pcie a4000.pcie: Root Port reset completed
[  111.383401] pcieport :00:00.0: Root Port has been reset
[  111.384060] pcieport :00:00.0: AER: device recovery failed
[  111.384582] rockchip-dw-pcie a4000.pcie: PCIE_CLIENT_INTR_STATUS_MISC: 
0x3
[  111.385218] rockchip-dw-pcie a4000.pcie: LTSSM_STATUS: 0x230011
[  111.385771] rockchip-dw-pcie a4000.pcie: Received Link up event. 
Starting enumeration!
[  111.390866] pcieport :00:00.0: bridge configuration invalid ([bus 
00-00]), reconfiguring
[  111.391650] pci_bus :01: busn_res: [bus 01-ff] end is updated to 01

Basically all tests timeout
# FAILED: 1 / 16 tests passed.

Which is the same as before this patch series.

So AFAICT, this part does not seem to work as advertised.

Instead of quickly stopping and starting the link, I also tried to reboot the
EP board, which does the configfs writes and starts the link automatically on
boot, but that had the same result as quickly stopping and starting the link.


Kind regards,
Niklas



Re: [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way

2025-07-17 Thread Krishna Chaitanya Chundru




On 7/15/2025 7:51 PM, Manivannan Sadhasivam via B4 Relay wrote:

Hi,

Currently, in the event of AER/DPC, PCI core will try to reset the slot (Root
Port) and its subordinate devices by invoking bridge control reset and FLR. But
in some cases like AER Fatal error, it might be necessary to reset the Root
Ports using the PCI host bridge drivers in a platform specific way (as indicated
by the TODO in the pcie_do_recovery() function in drivers/pci/pcie/err.c).
Otherwise, the PCI link won't be recovered successfully.

So this series adds a new callback 'pci_host_bridge::reset_root_port' for the
host bridge drivers to reset the Root Port when a fatal error happens.

Also, this series allows the host bridge drivers to handle PCI link down event
by resetting the Root Ports and recovering the bus. This is accomplished by the
help of the new 'pci_host_handle_link_down()' API. Host bridge drivers are
expected to call this API (preferrably from a threaded IRQ handler) with
relevant Root Port 'pci_dev' when a link down event is detected for the port.
The API will reuse the pcie_do_recovery() function to recover the link if AER
support is enabled, otherwise it will directly call the reset_root_port()
callback of the host bridge driver (if exists).

For reference, I've modified the pcie-qcom driver to call
pci_host_handle_link_down() API with Root Port 'pci_dev' after receiving the
LINK_DOWN global_irq event and populated 'pci_host_bridge::reset_root_port()'
callback to reset the Root Port. Since the Qcom PCIe controllers support only
a single Root Port (slot) per controller instance, the API is going to be
invoked only once. For multi Root Port controllers, the controller driver is
expected to detect the Root Port that received the link down event and call
the pci_host_handle_link_down() API with 'pci_dev' of that Root Port.

Testing
---

I've lost access to my test setup now. So Krishna (Cced) will help with testing
on the Qcom platform and Wilfred or Niklas should be able to test it on Rockchip
platform. For the moment, this series is compile tested only.

Tested on QCOM platform rb3gen2.


Changes in v6:
- Incorporated the patch: 
https://lore.kernel.org/all/[email protected]/
- Link to v5: 
https://lore.kernel.org/r/[email protected]

Changes in v5:
* Reworked the pci_host_handle_link_down() to accept Root Port instead of
   resetting all Root Ports in the event of link down.
* Renamed 'reset_slot' to 'reset_root_port' to avoid confusion as both terms
   were used interchangibly and the series is intended to reset Root Port only.
* Added the Rockchip driver change to this series.
* Dropped the applied patches and review/tested tags due to rework.
* Rebased on top of v6.16-rc1.

Changes in v4:
- Handled link down first in the irq handler
- Updated ICC & OPP bandwidth after link up in reset_slot() callback
- Link to v3: 
https://lore.kernel.org/r/[email protected]

Changes in v3:
- Made the pci-host-common driver as a common library for host controller
   drivers
- Moved the reset slot code to pci-host-common library
- Link to v2: 
https://lore.kernel.org/r/[email protected]

Changes in v2:
- Moved calling reset_slot() callback from pcie_do_recovery() to 
pcibios_reset_secondary_bus()
- Link to v1: 
https://lore.kernel.org/r/[email protected]

Signed-off-by: Manivannan Sadhasivam 

Tested-by: Krishna Chaitanya Chundru 

- Krishna Chaitanya.

---
Manivannan Sadhasivam (3):
   PCI/ERR: Add support for resetting the Root Ports in a platform specific 
way
   PCI: host-common: Add link down handling for Root Ports
   PCI: qcom: Add support for resetting the Root Port due to link down event

Wilfred Mallawa (1):
   PCI: dw-rockchip: Add support to reset Root Port upon link down event

  drivers/pci/controller/dwc/Kconfig|   2 +
  drivers/pci/controller/dwc/pcie-dw-rockchip.c |  91 ++-
  drivers/pci/controller/dwc/pcie-qcom.c| 120 --
  drivers/pci/controller/pci-host-common.c  |  33 +++
  drivers/pci/controller/pci-host-common.h  |   1 +
  drivers/pci/pci.c |  21 +
  drivers/pci/pcie/err.c|   6 +-
  include/linux/pci.h   |   1 +
  8 files changed, 260 insertions(+), 15 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250715-pci-port-reset-4d9519570123

Best regards,