Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-18 Thread Maciej W. Rozycki
On Thu, 18 Nov 2021, Maciej W. Rozycki wrote:

> "Use the Data Link Layer Link Active status flag as the primary indicator 
> of successful link speed negotiation, but given that the flag is optional 
> by hardware to implement (the ASM2824 does have it though) [...]"

 NB I did verify the change too by making code ignore the presence of the 
DLLLA bit in the ASM2824, the usual approach in simulating a different 
environment that is not readily available.  Otherwise I would have no 
guarantee that the loop termination conditions are indeed right or what a 
reasonable timeout would be for when there is no DLLLA bit.  If that was 
the case, I'd be offering unverified code (which is sometimes inevitable) 
without mentioning that fact and/or asking for verification, and that 
would be a major fault with any submission.

 I could have mentioned it with my submission that operation without DLLLA 
has been verified, which was clearly an oversight on my side.  OTOH it's 
been routine for me when working with hardware, so it didn't occur to me 
it may not be obvious to someone else (though indeed I've seen submissions 
of varying quality, so things cannot be taken for granted).

  Maciej


Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-18 Thread Maciej W. Rozycki
Hi Stefan,

> > > I would like to hear what Bjorn Helgaas thinks about this issue at all
> > > and how to handle it, without potentially break something else. And
> > > based on the fact that in U-Boot's PCI core code there is no such global
> > > quirk implemented, I really do not know if U-Boot maintainers and
> > > developers want to review and understand all PCI Express aspect of this
> > > code and possible side-effects. This is just what I think about it, I do
> > > not have maintainer hat.
> > 
> >   Please leave the decision up to actual maintainers then.  They can speak
> > for themselves.
> 
> Sure, this is correct. But still, review comments from other developers
> (e.g. non maintainers) are often very helpful. And especially Pali has
> done quite a lot of work in the area of PCIe lately (amongst others) and
> IMHO his thoughts and comments seem quite reasonable - at least to me.

 Having done it myself I know reviews are often more challenging than 
actual problem solving, also from my own experience, and I do appreciate 
Pali's effort and the willingness to help, as I have already expressed.  

> > > And... I have also PCIe HW which does not support DLLA bit and RL bit is
> > > write-only. I'm not sure right now, if this code would not cause issues
> > > for such HW (if to it is connected broken PCIe card which does not like
> > > link retraining), this needs to be tested.
> > 
> >   RL is always write-only and DLLLA is often not supported.  I think both
> > code itself and my description is clear as to the handling of these bits.
> > 
> >   Are you sure you are up to reviewing code in this area?
> 
> Maciej, please don't get me wrong. I value your input here. But this
> sounds a bit offensive IMHO and I would like to keep the discussion here
> on the technical level.

 Apologies to sound a little harsh.

 I've started feeling being side-tracked into non-issues though, as some 
of the matters raised are clear either from the specification or I have 
specifically pointed them out with my submission, e.g.:

"Use the Data Link Layer Link Active status flag as the primary indicator 
of successful link speed negotiation, but given that the flag is optional 
by hardware to implement (the ASM2824 does have it though) [...]"

(previously also the matter with the Supported Link Speeds Vector, etc.), 
so I started feeling like my effort to get all the details given in the 
submission was for nothing.

 Thank you for your input.

  Maciej


Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-17 Thread Stefan Roese

Hi Maciej,

On 11/18/21 01:32, Maciej W. Rozycki wrote:

Hi Pali,


If that has indicated failure, reduce the target speed, request a link
retrain and check again if the link has stabilised.  Repeat until either
successful or the link speeds supported by the downstream port have been
exhausted.


I would like to hear what Bjorn Helgaas thinks about this issue at all
and how to handle it, without potentially break something else. And
based on the fact that in U-Boot's PCI core code there is no such global
quirk implemented, I really do not know if U-Boot maintainers and
developers want to review and understand all PCI Express aspect of this
code and possible side-effects. This is just what I think about it, I do
not have maintainer hat.


  Please leave the decision up to actual maintainers then.  They can speak
for themselves.


Sure, this is correct. But still, review comments from other developers
(e.g. non maintainers) are often very helpful. And especially Pali has
done quite a lot of work in the area of PCIe lately (amongst others) and
IMHO his thoughts and comments seem quite reasonable - at least to me.


I suggested to to first send this fixup to the Linux kernel for proper
review (or to any other similar bigger project with PCI Express support)
and then implement (accepted) solution into U-Boot. As I think this can
speed up inclusion of this fix/workaround in U-Boot. I do not have a
problem to ack/review PCI patches for U-Boot which just re-implements
PCI logic from Linux kernel.


  I think that with a change functionally reviewed elsewhere and merely
carried over there'd be little to nothing to ack/review here.  However
based on my most recent findings reported in the other message I don't
think we're going to have the same workaround as Linux likely will,
because U-Boot and Linux have different objectives each.


+   for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
+speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
+speed--) {
+   if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
+   continue;
+
+   debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
+ PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
+
+   dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
+ exp_lnkctl2 | speed);
+   dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
+ exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
+
+   if (dm_pciauto_exp_link_stable(dev, pcie_off))
+   return;


I was thinking more about it and maybe this code still needs some
improvements. PCIe link consist of two ends and you should not force
speed on one end which is unsupported by other. Because iteration is
going from highest speed to slowest speed, it means that this code can
try to setup also 8GT/s speed on root/downstream port to which is
connected only 5GT/s card. I have 5GT/s PCIe cards which successfully
initialize link when downstream port forces 8GT/s speed, but link is
setup only to 2.5GT/s. So this code for those PCIe cards connected to
unstable ports (when this workaround is needed) degrades performance
from 5GT/s to just 2.5GT/s. But I do not know how to correctly handle it
(one quick idea: always force 2.5GT/s, stabilize link, read capability
from card and retrain again; but I do not know if this is correct).


  Are you sure you know what you are talking about here?

  As a part of link training ends first negotiate link at 2.5GT/s, which is
a speed all devices must support, then exchange information as to speeds
actually supported, and only then possibly choose to switch to another
speed supported by both.  How lowering the target link speed is supposed
to break it?  It's no different to connecting a device the maximum
supported link speed of which is the target link speed chosen.  At worst
they'll stay at 2.5GT/s.


And... I have also PCIe HW which does not support DLLA bit and RL bit is
write-only. I'm not sure right now, if this code would not cause issues
for such HW (if to it is connected broken PCIe card which does not like
link retraining), this needs to be tested.


  RL is always write-only and DLLLA is often not supported.  I think both
code itself and my description is clear as to the handling of these bits.

  Are you sure you are up to reviewing code in this area?


Maciej, please don't get me wrong. I value your input here. But this
sounds a bit offensive IMHO and I would like to keep the discussion here
on the technical level.

Thanks,
Stefan


Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-17 Thread Pali Rohár
On Thursday 18 November 2021 00:03:58 Maciej W. Rozycki wrote:
> At that point the link changes to 5GT/s instantaneously (there's no Link 
> Training reported active, not even momentarily, or Data Link Layer Link 
> Active reported inactive), as shown by the Link Status Register at both 
> ends (and the de-emphasis level does not matter; it works at either value, 
> as reported in the Link Status 2 register, again at both ends).

Here is simplified PCIe LTSSM diagram:
https://www.oreilly.com/library/view/pci-express-system/0321156307/0321156307_ch14lev1sec6.html

Active link is in L0 state and changing link speed is performed via
Recovery state followed by Configuration and back to L0. And there is
important note that LinkUp is reported as 1b also when going into
Configuration state from Recovery state. DLLLA bit in Link Status
register reports 1b when DLCMSM is in DL_Active state. And DL_Active is
changed (to DL_Inactive) only when LinkUp changes to 0b.

So it means that DLLLA bit in Link Status is not changed during
successful link retraining when changing link speed. So your above
observation is correct that DLLLA was not changed. And reason is not so
obvious...

Link Training bit in Link Status register is 1b when link is in
Configuration or Recovery state. It is possible that link speed change
is too fast and you do not observe it (or device return old value,
sampled last time).


Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-17 Thread Maciej W. Rozycki
Hi Pali,

> > If that has indicated failure, reduce the target speed, request a link 
> > retrain and check again if the link has stabilised.  Repeat until either 
> > successful or the link speeds supported by the downstream port have been 
> > exhausted.
> 
> I would like to hear what Bjorn Helgaas thinks about this issue at all
> and how to handle it, without potentially break something else. And
> based on the fact that in U-Boot's PCI core code there is no such global
> quirk implemented, I really do not know if U-Boot maintainers and
> developers want to review and understand all PCI Express aspect of this
> code and possible side-effects. This is just what I think about it, I do
> not have maintainer hat.

 Please leave the decision up to actual maintainers then.  They can speak 
for themselves.

> I suggested to to first send this fixup to the Linux kernel for proper
> review (or to any other similar bigger project with PCI Express support)
> and then implement (accepted) solution into U-Boot. As I think this can
> speed up inclusion of this fix/workaround in U-Boot. I do not have a
> problem to ack/review PCI patches for U-Boot which just re-implements
> PCI logic from Linux kernel.

 I think that with a change functionally reviewed elsewhere and merely 
carried over there'd be little to nothing to ack/review here.  However 
based on my most recent findings reported in the other message I don't 
think we're going to have the same workaround as Linux likely will, 
because U-Boot and Linux have different objectives each.

> > +   for (speed = (exp_lnkcap & PCI_EXP_LNKCAP_SLS) - 1;
> > +speed >= PCI_EXP_LNKCAP_SLS_2_5GB;
> > +speed--) {
> > +   if (exp_lnkcap2 && (exp_lnkcap2 & 1 << speed) == 0)
> > +   continue;
> > +
> > +   debug("PCI Autoconfig: %02x.%02x.%02x: Trying speed: %u\n",
> > + PCI_BUS(bdf), PCI_DEV(bdf), PCI_FUNC(bdf), speed);
> > +
> > +   dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL2,
> > + exp_lnkctl2 | speed);
> > +   dm_pci_write_config16(dev, pcie_off + PCI_EXP_LNKCTL,
> > + exp_lnkctl | PCI_EXP_LNKCTL_RETRLNK);
> > +
> > +   if (dm_pciauto_exp_link_stable(dev, pcie_off))
> > +   return;
> 
> I was thinking more about it and maybe this code still needs some
> improvements. PCIe link consist of two ends and you should not force
> speed on one end which is unsupported by other. Because iteration is
> going from highest speed to slowest speed, it means that this code can
> try to setup also 8GT/s speed on root/downstream port to which is
> connected only 5GT/s card. I have 5GT/s PCIe cards which successfully
> initialize link when downstream port forces 8GT/s speed, but link is
> setup only to 2.5GT/s. So this code for those PCIe cards connected to
> unstable ports (when this workaround is needed) degrades performance
> from 5GT/s to just 2.5GT/s. But I do not know how to correctly handle it
> (one quick idea: always force 2.5GT/s, stabilize link, read capability
> from card and retrain again; but I do not know if this is correct).

 Are you sure you know what you are talking about here?

 As a part of link training ends first negotiate link at 2.5GT/s, which is 
a speed all devices must support, then exchange information as to speeds 
actually supported, and only then possibly choose to switch to another 
speed supported by both.  How lowering the target link speed is supposed 
to break it?  It's no different to connecting a device the maximum 
supported link speed of which is the target link speed chosen.  At worst 
they'll stay at 2.5GT/s.

> And... I have also PCIe HW which does not support DLLA bit and RL bit is
> write-only. I'm not sure right now, if this code would not cause issues
> for such HW (if to it is connected broken PCIe card which does not like
> link retraining), this needs to be tested.

 RL is always write-only and DLLLA is often not supported.  I think both 
code itself and my description is clear as to the handling of these bits.  

 Are you sure you are up to reviewing code in this area?

  Maciej


Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-17 Thread Maciej W. Rozycki
Hi Stefan,

> > Make use of this observation then and attempt to detect the inability to
> > negotiate the link speed automatically, and then handle it by hand.  Use
> > the Data Link Layer Link Active status flag as the primary indicator of
> > successful link speed negotiation, but given that the flag is optional
> > by hardware to implement (the ASM2824 does have it though), resort to
> > checking for the mandatory Link Bandwidth Management Status flag showing
> > that the link speed or width has been changed in an attempt to correct
> > unreliable link operation (the ASM2824 does set it too).
> > 
> > If these checks indicate that link may not operate correctly, then poll
> > the Data Link Layer Link Active status flag along with the Link Training
> > flag for the duration of 200ms to see if the link has stabilised, that
> > is either that the Data Link Layer Link Active status flag has been set
> > or that Link Training has been inactive during at least the second half
> > of the inteval.
> > 
> > If that has indicated failure, reduce the target speed, request a link
> > retrain and check again if the link has stabilised.  Repeat until either
> > successful or the link speeds supported by the downstream port have been
> > exhausted.
> 
> So in such cases, the link speed will be downgraded? I would expect at
> least a big warning in such cases.

 I had mixed feelings about such extra clutter and chose not to include 
it, but perhaps it's worth adding after all, especially with the most 
recent findings, noted below.

> Did you try to change some other configuration options for the link
> establishment? I remember that on some hardware we were able to get
> better "link-up results" by setting the de-emphasis level to -3.5dB
> instead of -6dB (in the link control status register 2), before trying
> to re-estblish the link. Did you also test with "tuning" such
> parameters. There might be other, which I'm missing right now.

 Thank you for the suggestion.  I've never been too familiar with analogue 
electronics engineering, so I didn't consider working at that level.

 So as it has turned out the ASM2824 has the de-emphasis level already set 
to -3.5dB by default (at power-up or reset; the power-up default is 0063, 
and some bits are sticky, so may not change at reset).  Interestingly, the 
bit is defined as HwInit, and as such it is meant to be "read-only after 
intialization", however with the ASM2824 it appears freely writable at any 
time and state reported in other registers and at the other end of the 
link indicates these changes do take effect.  They do not fix the issue 
with link training though; I have tried both settings to no avail.

 However while fiddling with the register I have discovered an interesting 
phenomenon in that the link will actually switch to 5GT/s and then work 
reliably, provided that it is done in two steps: first clamping the target 
link speed to 2.5GT/s and letting link training succeed at it, and only 
then switching the target link speed to 5GT/s (or for that matter 8GT/s).  
At that point the link changes to 5GT/s instantaneously (there's no Link 
Training reported active, not even momentarily, or Data Link Layer Link 
Active reported inactive), as shown by the Link Status Register at both 
ends (and the de-emphasis level does not matter; it works at either value, 
as reported in the Link Status 2 register, again at both ends).

 It makes me suspect that the problem with link negotiation is at the data 
link layer, rather than at the physical layer as I originally thought.  
IOW the two devices disagree at the protocol rather than electrical level, 
and only allowing a higher link speed once the data link layer has gone up 
somehow avoids the incompatibility.

 It works the same regardless of whether I change the target link speed in 
U-Boot (whether by firmware code itself or by poking with commands entered 
at the prompt by hand) or in Linux (with `setpci').  However changing the 
target link speed back to any beyond 2.5GT/s in U-Boot has an unfortunate 
side effect of devices behind the problematic link being only accessible 
until Linux boots.  This is because Linux issues a reset to the PCIe tree, 
which causes the link to be reinitialised with the target link speed set 
beyond 2.5GT/s, and that brings the problem back.  The reset however does 
not cause an issue and lets devices behind the problematic link continue 
working if the target link speed has been set by U-Boot to 2.5GT/s.  This 
is because the Target Link Speed field in the Link Control 2 register is 
sticky, so the clamp continues to be applied.

 So I think my observations above have implications as follows:

1. We don't need to try lower and lower target link speeds as a workaround 
   in U-Boot.  It is enough if we force any link found problematic just to 
   2.5GT/s, as a minimal requirement to make such a link to work and also 
   the speed all PCIe devices must support.

2. We don't want to 

Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-16 Thread Pali Rohár
Hello!

On Tuesday 16 November 2021 11:35:24 Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of a PCIe switch where
> link training never completes and the link continues switching between 
> speeds indefinitely with the data link layer never reaching the active 
> state.
> 
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
> P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
> switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
> falling back to 2.5GT/s.
> 
> However the link continues oscillating between the two speeds, at the 
> rate of 34-35 times per second, with link training reported active ~84% 
> of the time repeatedly, e.g.:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
> Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>   Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
>   Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>   LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
>   TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
>   LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, 
> Selectable De-emphasis: -3.5dB
>Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
>Compliance De-emphasis: -6dB
> [...]
> 
> Forcibly limiting the target link speed to 2.5GT/s with the upstream 
> ASM2824 device makes the two switches communicate correctly however:
> 
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
> Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
>   Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
>   Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
>   LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
>   TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
>   LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- 
> SpeedDis+, Selectable De-emphasis: -3.5dB
>Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
>Compliance De-emphasis: -6dB
> [...]
> 
> and then:
> 
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 
> 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
>   Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
>   Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
>   LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
>   TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
>   LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>Transmit Margin: Normal Operating Range, 
> EnterModifiedCompliance- ComplianceSOS-
>Compliance De-emphasis: -6dB
> [...]
> 
> Make use of this observation then and attempt to detect the inability to 
> negotiate the link speed automatically, and then handle it by hand.  Use 
> the Data Link Layer Link Active status flag as the primary indicator of 
> successful link speed negotiation, but given that the flag is optional 
> by hardware to implement (the ASM2824 does have it though), resort to 
> checking for the mandatory Link Bandwidth Management Status flag showing 
> that the link speed or width has been changed in an attempt to correct 
> unreliable link operation (the ASM2824 does set it too).
> 
> If these checks indicate that link may not operate correctly, then poll 
> the Data Link Layer Link Active status flag along with the Link Training 
> flag for the duration of 200ms to see if the link has stabilised, that 
> is either that the Data Link Layer Link Active status flag has been set 
> or that Link Training has been inactive during at least the second half 
> of the inteval.
> 
> If that has indicated failure, reduce the target speed, request a link 
> retrain and check again if the link has stabilised.  Repeat until either 
> successful or the link speeds supported by the downstream port have been 
> exhausted.

I would like to hear what Bjorn Helgaas thinks about this issue at all
and how to handle it, without potentially break something else. And
based on the fact that in U-Boot's PCI core code there is no such global
quirk implemented, I really do not know if U-Boot maintainers and
developers want to review and understand all PCI Express aspect of this
code and possible side-effects. This is just what I think about it, I do
not have maintainer hat.

I suggested to to first send this fixup to the Linux kernel for proper
review (or to 

Re: [PATCH v2] pci: Work around PCIe link training failures

2021-11-16 Thread Stefan Roese

Hi Maciej,

On 11/16/21 12:35, Maciej W. Rozycki wrote:

Attempt to handle cases with a downstream port of a PCIe switch where
link training never completes and the link continues switching between
speeds indefinitely with the data link layer never reaching the active
state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.

However the link continues oscillating between the two speeds, at the
rate of 34-35 times per second, with link training reported active ~84%
of the time repeatedly, e.g.:


Interesting topic and approach. Please find a few questions / comments
below.


02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, 
Selectable De-emphasis: -3.5dB
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

Forcibly limiting the target link speed to 2.5GT/s with the upstream
ASM2824 device makes the two switches communicate correctly however:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- 
SpeedDis+, Selectable De-emphasis: -3.5dB
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

and then:

05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 
3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

Make use of this observation then and attempt to detect the inability to
negotiate the link speed automatically, and then handle it by hand.  Use
the Data Link Layer Link Active status flag as the primary indicator of
successful link speed negotiation, but given that the flag is optional
by hardware to implement (the ASM2824 does have it though), resort to
checking for the mandatory Link Bandwidth Management Status flag showing
that the link speed or width has been changed in an attempt to correct
unreliable link operation (the ASM2824 does set it too).

If these checks indicate that link may not operate correctly, then poll
the Data Link Layer Link Active status flag along with the Link Training
flag for the duration of 200ms to see if the link has stabilised, that
is either that the Data Link Layer Link Active status flag has been set
or that Link Training has been inactive during at least the second half
of the inteval.

If that has indicated failure, reduce the target speed, request a link
retrain and check again if the link has stabilised.  Repeat until either
successful or the link speeds supported by the downstream port have been
exhausted.


So in such cases, the link speed will be downgraded? I would expect at
least a big warning in such cases.

Did you try to change some other configuration options for the link
establishment? I remember that on some hardware we were able to get
better "link-up results" by setting the de-emphasis level to -3.5dB
instead of -6dB (in the link control status register 2), before trying
to re-estblish the link. Did you also test with "tuning" such
parameters. There might be other, which I'm missing right now.

Thanks,
Stefan


Signed-off-by: Maciej W. Rozycki 
---
Hi,

  Thank you, Pali, for your valuable feedback.  Here's v2 of 

[PATCH v2] pci: Work around PCIe link training failures

2021-11-16 Thread Maciej W. Rozycki
Attempt to handle cases with a downstream port of a PCIe switch where
link training never completes and the link continues switching between 
speeds indefinitely with the data link layer never reaching the active 
state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, 
P/N 41433, wired to a SiFive HiFive Unmatched board.  In this setup the 
switches are supposed to negotiate the link speed of preferably 5.0GT/s, 
falling back to 2.5GT/s.

However the link continues oscillating between the two speeds, at the 
rate of 34-35 times per second, with link training reported active ~84% 
of the time repeatedly, e.g.:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, 
Selectable De-emphasis: -3.5dB
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

Forcibly limiting the target link speed to 2.5GT/s with the upstream 
ASM2824 device makes the two switches communicate correctly however:

02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet 
Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- 
SpeedDis+, Selectable De-emphasis: -3.5dB
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

and then:

05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 
3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range, 
EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
[...]

Make use of this observation then and attempt to detect the inability to 
negotiate the link speed automatically, and then handle it by hand.  Use 
the Data Link Layer Link Active status flag as the primary indicator of 
successful link speed negotiation, but given that the flag is optional 
by hardware to implement (the ASM2824 does have it though), resort to 
checking for the mandatory Link Bandwidth Management Status flag showing 
that the link speed or width has been changed in an attempt to correct 
unreliable link operation (the ASM2824 does set it too).

If these checks indicate that link may not operate correctly, then poll 
the Data Link Layer Link Active status flag along with the Link Training 
flag for the duration of 200ms to see if the link has stabilised, that 
is either that the Data Link Layer Link Active status flag has been set 
or that Link Training has been inactive during at least the second half 
of the inteval.

If that has indicated failure, reduce the target speed, request a link 
retrain and check again if the link has stabilised.  Repeat until either 
successful or the link speeds supported by the downstream port have been 
exhausted.

Signed-off-by: Maciej W. Rozycki 
---
Hi,

 Thank you, Pali, for your valuable feedback.  Here's v2 of the change 
with your suggestions taken into account.  I believe I have addressed all 
your concerns and I haven't heard from anyone else, so I consider this 
version final unless someone speaks out.

 Please apply then.

  Maciej

Changes from v1:

- also handle root complex and PCI/PCI-X to PCIe bridge devices,

- update comments throughout accordingly,

- likewise the change description; mention the rate of oscillations 
  observed with the ASM2824 device.
---
 drivers/pci/pci_auto.c |  157 +
 include/pci.h  |   18 +
 2 files changed, 174 insertions(+), 1 deletion(-)