Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-06-25 Thread Krishna Kumar


On 5/31/24 7:22 PM, Krishna Kumar wrote:
> On 5/31/24 16:14, Krishna Kumar wrote:
>> On 5/23/24 20:22, Krishna Kumar wrote:
>>> Hi Oliver,
>>>
>>> Thanks for your suggestions. Pls find my response:
>>>
>>> On 5/20/24 20:29, Oliver O'Halloran wrote:
 On Fri, May 17, 2024 at 9:15 PM krishna kumar  
 wrote:
>> Uh, if I'm reading this right it looks like your "slot" C5 is actually
>> the PCIe switch's internal bus which is definitely not hot pluggable.
> It's a hotplug slot. Please see the snippet below:
>
> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
>   SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
> Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
> Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
> Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
> Surprise-
> :~$
 All this is showing is that the switch downstream ports on bus 0004:02
 have a slot capability. I already know that (see what I said
 previously about physical links). The fact the downstream ports have a
 slot capability also has absolutely nothing to do with anything I was
 saying. Look at the lspci output for 0004:01:00.0 which is the
 switch's upstream port. The upstream port device will not have a slot
 capability because it's a bridge into the virtual PCI bus that is
 internal to the switch.
>>> Let me try to understand your suggestion and what needs to be done now:
>>>
>>> lspci -nn snippet in current scenario:
>>>
>>> 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>> 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052]
>>> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>>
>>> lspci -tv snippet in current scenario:
>>>
>>> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]--
>>>  |   |   +-01.0-[08]--+-00.0  Broadcom 
>>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>>  |   |   |    +-00.1  Broadcom 
>>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>>  |   |   |    +-00.2  Broadcom 
>>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>>  |   |   |    \-00.3  Broadcom 
>>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>>  |   |   +-02.0-[09]00.0  Broadcom 
>>> / LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>>>  |   |   \-03.0-[0a]00.0  IBM PCI-E 
>>> IPR SAS Adapter (ASIC)
>>>  |   \-00.1  PMC-Sierra Inc. Device 4052
>>>
>>> C5 bus address:
>>>
>>> [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address
>>> 0004:02:00
>>> [root@ltczzess2-lp1 ~]#
>>>
>>> 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does
>>> have this capability. Below snippet tells about this:
>>>
>>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color 
>>> HotPlug
>>> [root@ltczzess2-lp1 ~]#
>>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color 
>>> HotPlug
>>>     SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
>>> Surprise-
>>> [root@ltczzess2-lp1 ~]#
>>>
>>> In Function -  pnv_php_register_one() is responsible for slot creation from
>>> hotplug capable device node:
>>>
>>> Below is the current code that does check the device node for hot plug
>>> capability and takes the decision
>>>
>>>  /* Check if it's hotpluggable slot */
>>>     ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32);
>>>     if (ret || !prop32){
>>>     return -ENXIO;
>>>     }
>>>
>>> Its obvious that 0004:01:00.0 does not get above criteria fulfilled but
>>> 0004:02:00.0 does, so is the current behavior (Upstream port is not became
>>> C5 slot but downstream port became C5 slot).
>>>
>>> I am summarizing your suggested changes. Please let
>>> me know if I've got it right:
>>>
>>> 1. Do you want me to modify the code so that the C5
>>> device-bdf and bus-address become 0004:01:00/0004:01
>>> instead of 0004:02:00/0004:01?
>>>
>>> 2. When performing a hot-unplug operation on C5,
>>> should all devices from 0004:01 be removed? And
>>> should all devices from 0004:02 also be removed?
>>> I think the answer is yes, but 

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-31 Thread Krishna Kumar


On 5/31/24 16:14, Krishna Kumar wrote:
> On 5/23/24 20:22, Krishna Kumar wrote:
>>
>> Hi Oliver,
>>
>> Thanks for your suggestions. Pls find my response:
>>
>> On 5/20/24 20:29, Oliver O'Halloran wrote:
>>> On Fri, May 17, 2024 at 9:15 PM krishna kumar  
>>> wrote:
> Uh, if I'm reading this right it looks like your "slot" C5 is actually
> the PCIe switch's internal bus which is definitely not hot pluggable.
 It's a hotplug slot. Please see the snippet below:

 :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
   SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
 Surprise-
 :~$

 :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
 Surprise-
 :~$

 :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
 Surprise-
 :~$

 :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
 Surprise-
 :~$
>>> All this is showing is that the switch downstream ports on bus 0004:02
>>> have a slot capability. I already know that (see what I said
>>> previously about physical links). The fact the downstream ports have a
>>> slot capability also has absolutely nothing to do with anything I was
>>> saying. Look at the lspci output for 0004:01:00.0 which is the
>>> switch's upstream port. The upstream port device will not have a slot
>>> capability because it's a bridge into the virtual PCI bus that is
>>> internal to the switch.
>> Let me try to understand your suggestion and what needs to be done now:
>>
>> lspci -nn snippet in current scenario:
>>
>> 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>>
>> lspci -tv snippet in current scenario:
>>
>> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]--
>>  |   |   +-01.0-[08]--+-00.0  Broadcom 
>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>  |   |   |    +-00.1  Broadcom 
>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>  |   |   |    +-00.2  Broadcom 
>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>  |   |   |    \-00.3  Broadcom 
>> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>>  |   |   +-02.0-[09]00.0  Broadcom / 
>> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>>  |   |   \-03.0-[0a]00.0  IBM PCI-E 
>> IPR SAS Adapter (ASIC)
>>  |   \-00.1  PMC-Sierra Inc. Device 4052
>>
>> C5 bus address:
>>
>> [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address
>> 0004:02:00
>> [root@ltczzess2-lp1 ~]#
>>
>> 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does
>> have this capability. Below snippet tells about this:
>>
>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color 
>> HotPlug
>> [root@ltczzess2-lp1 ~]#
>> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color 
>> HotPlug
>>     SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
>> [root@ltczzess2-lp1 ~]#
>>
>> In Function -  pnv_php_register_one() is responsible for slot creation from
>> hotplug capable device node:
>>
>> Below is the current code that does check the device node for hot plug
>> capability and takes the decision
>>
>>  /* Check if it's hotpluggable slot */
>>     ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32);
>>     if (ret || !prop32){
>>     return -ENXIO;
>>     }
>>
>> Its obvious that 0004:01:00.0 does not get above criteria fulfilled but
>> 0004:02:00.0 does, so is the current behavior (Upstream port is not became
>> C5 slot but downstream port became C5 slot).
>>
>> I am summarizing your suggested changes. Please let
>> me know if I've got it right:
>>
>> 1. Do you want me to modify the code so that the C5
>> device-bdf and bus-address become 0004:01:00/0004:01
>> instead of 0004:02:00/0004:01?
>>
>> 2. When performing a hot-unplug operation on C5,
>> should all devices from 0004:01 be removed? And
>> should all devices from 0004:02 also be removed?
>> I think the answer is yes, but please confirm.
>>
>> 3. When performing a hot-plug operation on C5,
>> should all the devices removed earlier from 0004:01
>>  and 0004:02 be re-attached?

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-31 Thread Krishna Kumar


On 5/23/24 20:22, Krishna Kumar wrote:
>
>
> Hi Oliver,
>
> Thanks for your suggestions. Pls find my response:
>
> On 5/20/24 20:29, Oliver O'Halloran wrote:
>> On Fri, May 17, 2024 at 9:15 PM krishna kumar  wrote:
 Uh, if I'm reading this right it looks like your "slot" C5 is actually
 the PCIe switch's internal bus which is definitely not hot pluggable.
>>> It's a hotplug slot. Please see the snippet below:
>>>
>>> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
>>>   SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
>>> Surprise-
>>> :~$
>>>
>>> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
>>>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
>>> Surprise-
>>> :~$
>>>
>>> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
>>>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
>>> Surprise-
>>> :~$
>>>
>>> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
>>>  SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ 
>>> Surprise-
>>> :~$
>> All this is showing is that the switch downstream ports on bus 0004:02
>> have a slot capability. I already know that (see what I said
>> previously about physical links). The fact the downstream ports have a
>> slot capability also has absolutely nothing to do with anything I was
>> saying. Look at the lspci output for 0004:01:00.0 which is the
>> switch's upstream port. The upstream port device will not have a slot
>> capability because it's a bridge into the virtual PCI bus that is
>> internal to the switch.
>
> Let me try to understand your suggestion and what needs to be done now:
>
> lspci -nn snippet in current scenario:
>
> 0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
>
> lspci -tv snippet in current scenario:
>
> +-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]--
>  |   |   +-01.0-[08]--+-00.0  Broadcom 
> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>  |   |   |    +-00.1  Broadcom 
> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>  |   |   |    +-00.2  Broadcom 
> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>  |   |   |    \-00.3  Broadcom 
> Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
>  |   |   +-02.0-[09]00.0  Broadcom / 
> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |   |   \-03.0-[0a]00.0  IBM PCI-E 
> IPR SAS Adapter (ASIC)
>  |   \-00.1  PMC-Sierra Inc. Device 4052
>
> C5 bus address:
>
> [root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> [root@ltczzess2-lp1 ~]#
>
> 0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does
> have this capability. Below snippet tells about this:
>
> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug
> [root@ltczzess2-lp1 ~]#
> [root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
>     SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> [root@ltczzess2-lp1 ~]#
>
> In Function -  pnv_php_register_one() is responsible for slot creation from
> hotplug capable device node:
>
> Below is the current code that does check the device node for hot plug
> capability and takes the decision
>
>  /* Check if it's hotpluggable slot */
>     ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32);
>     if (ret || !prop32){
>     return -ENXIO;
>     }
>
> Its obvious that 0004:01:00.0 does not get above criteria fulfilled but
> 0004:02:00.0 does, so is the current behavior (Upstream port is not became
> C5 slot but downstream port became C5 slot).
>
> I am summarizing your suggested changes. Please let
> me know if I've got it right:
>
> 1. Do you want me to modify the code so that the C5
> device-bdf and bus-address become 0004:01:00/0004:01
> instead of 0004:02:00/0004:01?
>
> 2. When performing a hot-unplug operation on C5,
> should all devices from 0004:01 be removed? And
> should all devices from 0004:02 also be removed?
> I think the answer is yes, but please confirm.
>
> 3. When performing a hot-plug operation on C5,
> should all the devices removed earlier from 0004:01
>  and 0004:02 be re-attached?
>
> 4. Will there be any PCIe topology changes in this workflow?
>
> Once you confirm the above requirements, we can discuss
> how to proceed further.
> I have so

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-30 Thread Krishna Kumar




Hi Oliver,

Thanks for your suggestions. Pls find my response:

On 5/20/24 20:29, Oliver O'Halloran wrote:

On Fri, May 17, 2024 at 9:15 PM krishna kumar  wrote:

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.

It's a hotplug slot. Please see the snippet below:

:~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
  SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
 SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
 SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
 SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

All this is showing is that the switch downstream ports on bus 0004:02
have a slot capability. I already know that (see what I said
previously about physical links). The fact the downstream ports have a
slot capability also has absolutely nothing to do with anything I was
saying. Look at the lspci output for 0004:01:00.0 which is the
switch's upstream port. The upstream port device will not have a slot
capability because it's a bridge into the virtual PCI bus that is
internal to the switch.


Let me try to understand your suggestion and what needs to be done now:

lspci -nn snippet in current scenario:

0004:01:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:01:00.1 Memory controller [0580]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]

lspci -tv snippet in current scenario:

+-[0001:00]---00.0-[01-0a]--+-00.0-[02-0a]--+-00.0-[03-07]--
 |   |   +-01.0-[08]--+-00.0  Broadcom Inc. 
and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
 |   |   |    +-00.1  Broadcom Inc. 
and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
 |   |   |    +-00.2  Broadcom Inc. 
and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
 |   |   |    \-00.3  Broadcom Inc. 
and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
 |   |   +-02.0-[09]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   \-03.0-[0a]00.0  IBM PCI-E IPR 
SAS Adapter (ASIC)
 |   \-00.1  PMC-Sierra Inc. Device 4052

C5 bus address:

[root@ltczzess2-lp1 ~]# cat /sys/bus/pci/slots/C5/address
0004:02:00
[root@ltczzess2-lp1 ~]#

0004:01:00.0 doesn't have hotplug capability but 0004:02:00.0 does
have this capability. Below snippet tells about this:

[root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:01:00.0 | grep --color HotPlug
[root@ltczzess2-lp1 ~]#
[root@ltczzess2-lp1 ~]# sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
    SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
[root@ltczzess2-lp1 ~]#

In Function -  pnv_php_register_one() is responsible for slot creation from
hotplug capable device node:

Below is the current code that does check the device node for hot plug
capability and takes the decision

 /* Check if it's hotpluggable slot */
    ret = of_property_read_u32(dn, "ibm,slot-pluggable", &prop32);
    if (ret || !prop32){
    return -ENXIO;
    }

Its obvious that 0004:01:00.0 does not get above criteria fulfilled but
0004:02:00.0 does, so is the current behavior (Upstream port is not became
C5 slot but downstream port became C5 slot).

I am summarizing your suggested changes. Please let
me know if I've got it right:

1. Do you want me to modify the code so that the C5
device-bdf and bus-address become 0004:01:00/0004:01
instead of 0004:02:00/0004:01?

2. When performing a hot-unplug operation on C5,
should all devices from 0004:01 be removed? And
should all devices from 0004:02 also be removed?
I think the answer is yes, but please confirm.

3. When performing a hot-plug operation on C5,
should all the devices removed earlier from 0004:01
 and 0004:02 be re-attached?

4. Will there be any PCIe topology changes in this workflow?

Once you confirm the above requirements, we can discuss
how to proceed further.
I have some follow up questions from your last mail and I am
putting these questions in below paragraphs as inline statements.
It will confirm me if we should do above things or not.



It seems like your explanation about the missing 0004:01:00.0 may be
correct and could be due to a firmware bug. However, the scope

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-20 Thread Oliver O'Halloran
On Fri, May 17, 2024 at 9:15 PM krishna kumar  wrote:
>
> > Uh, if I'm reading this right it looks like your "slot" C5 is actually
> > the PCIe switch's internal bus which is definitely not hot pluggable.
>
> It's a hotplug slot. Please see the snippet below:
>
> :~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
>  SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$
>
> :~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
> SltCap:AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
> :~$

All this is showing is that the switch downstream ports on bus 0004:02
have a slot capability. I already know that (see what I said
previously about physical links). The fact the downstream ports have a
slot capability also has absolutely nothing to do with anything I was
saying. Look at the lspci output for 0004:01:00.0 which is the
switch's upstream port. The upstream port device will not have a slot
capability because it's a bridge into the virtual PCI bus that is
internal to the switch.

> It seems like your explanation about the missing 0004:01:00.0 may be
> correct and could be due to a firmware bug. However, the scope of this
> patch does not relate to this issue. Additionally, if it starts with
> 0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug
> operations will remain inconsistent. This patch aims to address the
> inconsistent behavior of hot-unplug and hot-plug.
>
> *snip*
>
> > It might be worth adding some logic to pnv_php to verify the PCI
> > bridge upstream of the slot actually has the PCIe slot capability to
> > guard against this problem.
>
> We can have a look at this problem in another patch.

The point of this series is to fix the behaviour of pnv_php, is it
not? Powering off a PCI(e) slot is supposed to render it safe to
remove the card  in that slot. Currently if you "power off" C5, the
kernel is still going to have active references to the switch's
upstream port device (at 0004:01:00.0) and the switch management
function (at 0004:01:00.1). If the kernel has active references to PCI
devices physically located in the slot we supposedly powered off, then
the hotplug driver isn't doing its job. The asymmetry between hot add
and removal that you're trying to fix here is a side effect of the
fact that pnv_php is advertising the wrong thing as a slot. I think
you should stop pnv_php from advertising something as a slot when it's
not actually a slot because that's the root of all your problems.

> We wanted to handle the more generic case and did not want to be confined to
> only one device assumption. We want to fix the current inconsistent behavior
> more generically.

Right, as I said above I don't think handing the more generic case is
actually required if pnv_php is doing its job properly. It doesn't
hurt though.

> Regarding the fix, the fix is obvious:

really?

> We have to traverse
> and find the bridge ports from DT and invoke  pci_scan_slot() on them. This 
> will
> discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 
> on
> the given bus- 0004:02). There is already an existing function, 
> pci_scan_bridge()
> which is doing invocation of pci_scan_slot () for the devices behind the 
> bridge,
> in this case for  SAS device. So eventually, we are doing a scan of all the 
> entities
> behind the slot.

I already read your patch so I'm not sure why you feel the need to
re-describe it in tedious detail.

> Would you like me to combine the non-bridge and bridge cases into one? I can 
> attempt
> to do this. Hopefully, if we incorporate the iterate sibling logic case 
> correctly,
> we may not need to maintain these two separate cases for bridge and 
> non-bridge. I
> will attempt this, and if it works, I will include it in the next patch. 
> Thanks.

Yes, do that.

Also, do not post HTML emails to linux development lists. It breaks
plain text inline quoting which makes your messages annoying to reply
to. Some linux development lists will also silently drop HTML emails.
Please talk to the other LTC engineers about how to set up your mail
client to send plain text emails to avoid these problems in the
future.

Oliver


Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-17 Thread krishna kumar

Hi Oliver,

Thanks for review. Please find my response below -


On 5/17/24 08:12, Oliver O'Halloran wrote:

On Tue, May 14, 2024 at 11:54 PM Krishna Kumar  wrote:

There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled by
doing hot-plug operation on that slot. Only the first port of the bridge
gets enabled and the remaining port/devices remain unplugged. The hot
plug/unplug operation is done by the hotplug driver
(drivers/pci/hotplug/pnv_php.c).

Root Cause Analysis: This behavior is due to missing code for the DPC
switch/bridge.

I don't see anything touching DPC in this series?


I apologize for the confusion. When I mentioned DPC, I was referring to 
the downward bridge port (the sibling ones) that were not enabled after 
the hot plug operation. I didn't mean to refer to DPC events. It seems 
that this caused some confusion, and I will remove this word in my next 
version of the patch. Thank you!





*snip*

Command for reproducing the issue :

For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
For hot plug/enable -echo 1 > /sys/bus/pci/slots/C5/power

where C5 is slot associated with bridge.

Scenario/Tests:
Output of lspci -nn before test is given below. This snippet contains
devices used for testing on Powernv machine.

0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:08:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
0004:09:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)

Output of lspci -tv before test is as follows:

# lspci -tv
  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
  |   |   +-01.0-[08]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
  |   |   +-02.0-[09]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
  |   |   \-03.0-[0a-0e]--
  |   \-00.1  PMC-Sierra Inc. Device 4052

C5(bridge) and C6(End Point) slot address are as below:
# cat /sys/bus/pci/slots/C5/address
0004:02:00
# cat /sys/bus/pci/slots/C6/address
0004:09:00

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.


It's a hotplug slot. Please see the snippet below:

:~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug

 SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-

:~$

:~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug

    SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-

:~$

:~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug

    SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-

:~$

:~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug

    SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-

:~$



I find it helps to look at the PCI topology in terms of where the
physical PCIe links are. Here we've got:

- A link between the PHB (0004:00:00.0) and the switch upstream port
(0004:01:00.0)
- A link from switch downstream port 0 (0004:02:00.0) to nothing
- A link from switch downstream port 1 (0004:02:01.0) to a SAS card
- A link from switch downstream port 2 (0004:02:02.0) to a SAS card
- A link from switch downstream port 2 (0004:02:03.0) to nothing

Note that there's no PCIe link between the switch upstream port
(0004:01:00.0) and the downstream ports on bus 0004:02. The connection
between those is invisible to us because it's custom bus logic
internal to the PCIe switch ASIC. What I think has happened here is
that system firmware has supplied bad PCIe slot information to OPAL
which has resulted in pnv_php advertising a slot in the wrong place.
Assuming this following the usual IBM convention I'd expect the bridge
device for C5 to be the PHB's root port and the bus should be 0004:01.


It seems like your explanation about the missing 0004:01:00.0 may be 
correct and could be due to a firmware bug. However, the scope of this 
patch does not relate to this issue. Additionally, if it starts with 
0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug 
operations will remain inconsistent. This patch aims to address the 
inconsistent behavior of hot-unplug and hot-plug.



It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.


We can have a look at this problem in another patch.




Hot-unplu

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-16 Thread Oliver O'Halloran
On Tue, May 14, 2024 at 11:54 PM Krishna Kumar  wrote:
>
> There is an issue with the hotplug operation when it's done on the
> bridge/switch slot. The bridge-port and devices behind the bridge, which
> become offline by hot-unplug operation, don't get hot-plugged/enabled by
> doing hot-plug operation on that slot. Only the first port of the bridge
> gets enabled and the remaining port/devices remain unplugged. The hot
> plug/unplug operation is done by the hotplug driver
> (drivers/pci/hotplug/pnv_php.c).
>
> Root Cause Analysis: This behavior is due to missing code for the DPC
> switch/bridge.

I don't see anything touching DPC in this series?

> *snip*
>
> Command for reproducing the issue :
>
> For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
> For hot plug/enable -echo 1 > /sys/bus/pci/slots/C5/power
>
> where C5 is slot associated with bridge.
>
> Scenario/Tests:
> Output of lspci -nn before test is given below. This snippet contains
> devices used for testing on Powernv machine.
>
> 0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
> 0004:08:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
> 0004:09:00.0 Serial Attached SCSI controller [0107]:
> Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
>
> Output of lspci -tv before test is as follows:
>
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
>  |   |   +-01.0-[08]00.0  Broadcom / 
> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |   |   +-02.0-[09]00.0  Broadcom / 
> LSI SAS3216 PCI-Express Fusion-MPT SAS-3
>  |   |   \-03.0-[0a-0e]--
>  |   \-00.1  PMC-Sierra Inc. Device 4052
>
> C5(bridge) and C6(End Point) slot address are as below:
> # cat /sys/bus/pci/slots/C5/address
> 0004:02:00
> # cat /sys/bus/pci/slots/C6/address
> 0004:09:00

Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.
I find it helps to look at the PCI topology in terms of where the
physical PCIe links are. Here we've got:

- A link between the PHB (0004:00:00.0) and the switch upstream port
(0004:01:00.0)
- A link from switch downstream port 0 (0004:02:00.0) to nothing
- A link from switch downstream port 1 (0004:02:01.0) to a SAS card
- A link from switch downstream port 2 (0004:02:02.0) to a SAS card
- A link from switch downstream port 2 (0004:02:03.0) to nothing

Note that there's no PCIe link between the switch upstream port
(0004:01:00.0) and the downstream ports on bus 0004:02. The connection
between those is invisible to us because it's custom bus logic
internal to the PCIe switch ASIC. What I think has happened here is
that system firmware has supplied bad PCIe slot information to OPAL
which has resulted in pnv_php advertising a slot in the wrong place.
Assuming this following the usual IBM convention I'd expect the bridge
device for C5 to be the PHB's root port and the bus should be 0004:01.
It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.

> Hot-unplug operation on slot associated with bridge:
> # echo 0 > /sys/bus/pci/slots/C5/power
> # lspci -tv
>  +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
>  |   \-00.1  PMC-Sierra Inc. Device 4052

Yep, "powering off" C5 doesn't remove the upstream port device. This
would create problems if you physically removed the card from C5 since
the kernel would assume the switch device is still present.

> *snip*


> diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
> index 38561d6a2079..bea612759832 100644
> --- a/arch/powerpc/kernel/pci_dn.c
> +++ b/arch/powerpc/kernel/pci_dn.c
> @@ -493,4 +493,36 @@ static void pci_dev_pdn_setup(struct pci_dev *pdev)
> pdn = pci_get_pdn(pdev);
> pdev->dev.archdata.pci_data = pdn;
>  }
> +
> +void pci_traverse_sibling_nodes_and_scan_slot(struct device_node *start, 
> struct pci_bus *bus)
> +{
> +   struct device_node *dn;
> +   int slotno;
> +
> +   u32 class = 0;
> +
> +   if (!of_property_read_u32(start->child, "class-code", &class)) {
> +   /* Call of pci_scan_slot for non-bridge/EP case */
> +   if (!((class >> 8) == PCI_CLASS_BRIDGE_PCI)) {
> +   slotno = PCI_SLOT(PCI_DN(start->child)->devfn);
> +   pci_scan_slot(bus, PCI_DEVFN(slotno, 0));
> +   return;
> +   }
> +   }
> +
> +   /* Iterate all siblings */
> +   for_each_

[PATCH v2 2/2] powerpc: hotplug driver bridge support

2024-05-14 Thread Krishna Kumar
There is an issue with the hotplug operation when it's done on the
bridge/switch slot. The bridge-port and devices behind the bridge, which
become offline by hot-unplug operation, don't get hot-plugged/enabled by
doing hot-plug operation on that slot. Only the first port of the bridge
gets enabled and the remaining port/devices remain unplugged. The hot
plug/unplug operation is done by the hotplug driver
(drivers/pci/hotplug/pnv_php.c).

Root Cause Analysis: This behavior is due to missing code for the DPC
switch/bridge. The existing driver depends on pci_hp_add_devices()
function for device enablement. This function calls pci_scan_slot() on
only one device-node/port of the bridge, not on all the siblings'
device-node/port.

The missing code needs to be added which will find all the sibling
device-nodes/bridge-ports and will run explicit pci_scan_slot() on
those.  A new function has been added for this purpose which gets
invoked from pci_hp_add_devices(). This new function
pci_traverse_sibling_nodes_and_scan_slot() gets all the sibling
bridge-ports by traversal and explicitly invokes pci_scan_slot on them.

Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Christophe Leroy 
Cc: "Aneesh Kumar K.V" 
Cc: Bjorn Helgaas 
Cc: Gaurav Batra 
Cc: Nathan Lynch 
Cc: Brian King 

Signed-off-by: Krishna Kumar 
---
Command for reproducing the issue :

For hot unplug/disable - echo 0 > /sys/bus/pci/slots/C5/power
For hot plug/enable -echo 1 > /sys/bus/pci/slots/C5/power

where C5 is slot associated with bridge.

Scenario/Tests:
Output of lspci -nn before test is given below. This snippet contains
devices used for testing on Powernv machine.

0004:02:00.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:01.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:02.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:02:03.0 PCI bridge [0604]: PMC-Sierra Inc. Device [11f8:4052]
0004:08:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)
0004:09:00.0 Serial Attached SCSI controller [0107]:
Broadcom / LSI SAS3216 PCI-Express Fusion-MPT SAS-3 [1000:00c9] (rev 01)

Output of lspci -tv before test is as follows:

# lspci -tv
 +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
 |   |   +-01.0-[08]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   +-02.0-[09]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   \-03.0-[0a-0e]--
 |   \-00.1  PMC-Sierra Inc. Device 4052

C5(bridge) and C6(End Point) slot address are as below:
# cat /sys/bus/pci/slots/C5/address
0004:02:00
# cat /sys/bus/pci/slots/C6/address
0004:09:00

Hot-unplug operation on slot associated with bridge:
# echo 0 > /sys/bus/pci/slots/C5/power
# lspci -tv
 +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--
 |   \-00.1  PMC-Sierra Inc. Device 4052

>From the above lspci -tv output, it can be observed that hot unplug
operation has removed all the PMC-Sierra bridge ports like:
00.0-[03-07], 01.0-[08], 02.0-[09], 03.0-[0a-0e] and the SAS devices
behind the bridge-port. Without the fix, when the hot plug operation is
done on the same slot, it adds only the first bridge port and doesn't
restore all the bridge-ports and devices that it unplugged earlier.
Below snippet shows this.

Hot-plug operation on the bridge slot without the fix:
# echo 1 > /sys/bus/pci/slots/C5/power
# lspci -tv
 +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
 |   \-00.1  PMC-Sierra Inc. Device 4052

After the fix, it restores all the devices in the same manner how it
unplugged them earlier during the hot unplug operation. The below snippet
shows the same.
Hot-plug operation on bridge slot with the fix:
# echo 1 > /sys/bus/pci/slots/C5/power
# lspci -tv
 +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
 |   |   +-01.0-[08]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   +-02.0-[09]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   \-03.0-[0a-0e]--
 |   \-00.1  PMC-Sierra Inc. Device 4052

Removal of End point device behind bridge are also intact and behaving
correctly.
Hot-unplug operation on Endpoint device C6:
# echo 0 > /sys/bus/pci/slots/C6/power
# lspci -tv
 +-[0004:00]---00.0-[01-0e]--+-00.0-[02-0e]--+-00.0-[03-07]--
 |   |   +-01.0-[08]00.0  Broadcom / 
LSI SAS3216 PCI-Express Fusion-MPT SAS-3
 |   |   +-02.0-[09]--
 |   |   \-03.0-[0a-0e]--
 |   \-00.1  PMC-Sierra Inc. Device 4052

Hot-plug operation on Endpoint device C6:
# echo 1 > /sys/bus/pci/s