Re: [PATCH v3 1/2] pci: Add vendor id of Ubiquiti Networks

2018-01-29 Thread Bjorn Helgaas
On Thu, Jan 25, 2018 at 12:13:39PM +0100, Tobias Schramm wrote:
> Signed-off-by: Tobias Schramm <toblemi...@gmail.com>

Acked-by: Bjorn Helgaas <bhelg...@google.com>

I see Kalle has already applied this, but if it's updated for any reason,
please add my ack and change the subject to:

  PCI: Add Ubiquiti Networks vendor ID

i.e., match the style of previous changes.

> ---
>  include/linux/pci_ids.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index ab20dc5db423..35871adfdc86 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -149,6 +149,8 @@
>  #define PCI_VENDOR_ID_DYNALINK   0x0675
>  #define PCI_DEVICE_ID_DYNALINK_IS64PH0x1702
>  
> +#define PCI_VENDOR_ID_UBIQUITI   0x0777
> +
>  #define PCI_VENDOR_ID_BERKOM 0x0871
>  #define PCI_DEVICE_ID_BERKOM_A1T 0xffa1
>  #define PCI_DEVICE_ID_BERKOM_T_CONCEPT   0xffa2
> -- 
> 2.16.0
> 

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges

2021-06-16 Thread Bjorn Helgaas
On Wed, Jun 02, 2021 at 09:03:02PM +0200, Pali Rohár wrote:
> On Wednesday 02 June 2021 10:55:59 Bjorn Helgaas wrote:
> > On Wed, Jun 02, 2021 at 02:08:16PM +0200, Pali Rohár wrote:
> > > On Tuesday 01 June 2021 19:00:36 Bjorn Helgaas wrote:
> > 
> > > > I wonder if this could be restructured as a generic quirk in quirks.c
> > > > that simply set the bridge's TLS to 2.5 GT/s during enumeration.  Or
> > > > would the retrain fail even in that case?
> > > 
> > > If I understand it correctly then PCIe link is already up when kernel
> > > starts enumeration. So setting Bridge TLS to 2.5 GT/s does not change
> > > anything here.
> > > 
> > > Moreover it would have side effect that cards which are already set to
> > > 5+ GT/s would be downgraded to 2.5 GT/s during enumeration and for
> > > increasing speed would be needed another round of "enumeration" to set a
> > > new TLS and retrain link again. As TLS affects link only after link goes
> > > into Recovery state.
> > > 
> > > So this would just complicate card enumeration and settings.
> > 
> > The current quirk complicates the ASPM code.  I'm hoping that if we
> > set the bridge's Target Link Speed during enumeration, the link
> > retrain will "just work" without complicating the ASPM code.
> > 
> > An enumeration quirk wouldn't have to set the bridge's TLS to 2.5
> > GT/s; the quirk would be attached to specific endpoint devices and
> > could set the bridge's TLS to whatever the endpoint supports.
> 
> Now I see what you mean. Yes, I agree this is a good idea and can
> simplify code. Quirk is not related to ASPM code and basically has
> nothing with it, just I put it into aspm.c because this is the only
> place where link retraining was activated.
> 
> But with this proposal there is one issue. Some kernel drivers already
> overwrite PCI_EXP_LNKCTL2_TLS value. So if PCI enumeration code set some
> value into PCI_EXP_LNKCTL2_TLS bits then drivers can change it and once
> ASPM will try to retrain link this may cause this issue.

I guess you mean the amdgpu, radeon, and hfi1 drivers.  They really
shouldn't be mucking with that stuff anyway.  But they do and are
unlikely to change because we don't have any good alternative.

One way around that would be to add some quirk code to
pcie_capability_write_word().  Ugly, but we do have something sort of
similar in pcie_capability_read_word() already.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges

2021-06-01 Thread Bjorn Helgaas
On Tue, Jun 01, 2021 at 11:18:39PM +0200, Pali Rohár wrote:
> On Tuesday 01 June 2021 15:05:49 Bjorn Helgaas wrote:
> > On Wed, May 05, 2021 at 06:33:57PM +0200, Pali Rohár wrote:
> > > Atheros AR9xxx and QCA9xxx chips have behaviour issues not only after a
> > > bus reset, but also after doing retrain link, if PCIe bridge is not in
> > > GEN1 mode (at 2.5 GT/s speed):
> > > 
> > > - QCA9880 and QCA9890 chips throw a Link Down event and completely
> > >   disappear from the bus and their config space is not accessible
> > >   afterwards.
> > > 
> > > - QCA9377 chip throws a Link Down event followed by Link Up event, the
> > >   config space is accessible and PCI device ID is correct. But trying to
> > >   access chip's I/O space causes Uncorrected (Non-Fatal) AER error,
> > >   followed by Synchronous external abort 96000210 and Segmentation fault
> > >   of insmod while loading ath10k_pci.ko module.
> > > 
> > > - AR9390 chip throws a Link Down event followed by Link Up event, config
> > >   space is accessible, but contains nonsense values. PCI device ID is
> > >   0xABCD which indicates HW bug that chip itself was not able to read
> > >   values from internal EEPROM/OTP.
> > > 
> > > - AR9287 chip throws also Link Down and Link Up events, also has
> > >   accessible config space containing correct values. But ath9k driver
> > >   fails to initialize card from this state as it is unable to access HW
> > >   registers. This also indicates that the chip iself is not able to read
> > >   values from internal EEPROM/OTP.
> > > 
> > > These issues related to PCI device ID 0xABCD and to reading internal
> > > EEPROM/OTP were previously discussed at ath9k-devel mailing list in
> > > following thread:
> > > 
> > >   https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> > > 
> > > After experiments we've come up with a solution: it seems that Retrain
> > > link can be called only when using GEN1 PCIe bridge or when PCIe bridge
> > > link speed is forced to 2.5 GT/s. Applying this workaround fixes all
> > > mentioned cards.
> > 
> > I *assume* this means the device was running at > 2.5 GT/s in the
> > first place,
> 
> No. All these Atheros chips are 2.5 GT/s only. It looks like that if
> PCIe Bridge has initial value 5 GT/s (or higher) in PCI_EXP_LNKCAP2
> register and link retraining is activated, something happen which cause
> these Atheros chips to "crash". Looks like that Root Bridge tries to
> change link speed from 2.5 GT/s to 5 GT/s (which is not supported by all
> these Atheros chips).

Oh, perfect.  Then I guess all we need is to restrict these devices to
2.5 GT/s.  And we can just ignore all my rambling about higher speeds
below, so I'll elide them.

> > ...

Except this:

> > This patch implies that the hardware automatically trained to a
> > higher rate after power-on (which I think is what PCIe hardware is
> > *supposed* to do) and something prevents that from succeeding when
> > we retrain, or maybe BIOS did something different than what Linux
> > is doing, or ... something else?

> Tested platforms was also without BIOS and without any other firmware
> which touched PCIe.

The fact that the link came up automatically without any firmware or
software at all is very interesting.  The retrain path is actually
different from a hardware point of view: the power-on path through
LTSSM would normally be Detect, Polling, Configuration, L0; the
retrain path would be L0, Recovery, L0.  So I guess it isn't *too*
surprising that the power-on path could work even if the retrain path
is broken.

I wonder if setting, then clearing, the bridge's Link Disable bit
would work, since that would start again with the LTSSM Detect state,
just like power-on.  But I don't think that would help with this
ASPM/Common Clock issue because I think the link disable would look
like a hot reset to the endpoint, and it would clear the Common Clock
Configuration bit.

So backing up a lng ways, how much value is there in doing this
retrain at all?  AFAICT the only reason we do it is because we think
the Common Clock Configuration is inconsistent, and we tried to fix
something, and we have to retrain the link to get the devices to
update their L0s and L1 exit latencies.  I guess it's the Slock Clock
(PCI_EXP_LNKSTA_SLC) bits that determines all this, right?  Do you
know those?

I wonder if this could be restructured as a generic quirk in quirks.c
that simply set the bridge's TLS to 2.5 GT/s during enumeration.  Or
would the retrain fail even in that case?

> > > +static int pcie_downgrade_

Re: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges

2021-06-02 Thread Bjorn Helgaas
On Wed, Jun 02, 2021 at 02:08:16PM +0200, Pali Rohár wrote:
> On Tuesday 01 June 2021 19:00:36 Bjorn Helgaas wrote:

> > I wonder if this could be restructured as a generic quirk in quirks.c
> > that simply set the bridge's TLS to 2.5 GT/s during enumeration.  Or
> > would the retrain fail even in that case?
> 
> If I understand it correctly then PCIe link is already up when kernel
> starts enumeration. So setting Bridge TLS to 2.5 GT/s does not change
> anything here.
> 
> Moreover it would have side effect that cards which are already set to
> 5+ GT/s would be downgraded to 2.5 GT/s during enumeration and for
> increasing speed would be needed another round of "enumeration" to set a
> new TLS and retrain link again. As TLS affects link only after link goes
> into Recovery state.
> 
> So this would just complicate card enumeration and settings.

The current quirk complicates the ASPM code.  I'm hoping that if we
set the bridge's Target Link Speed during enumeration, the link
retrain will "just work" without complicating the ASPM code.

An enumeration quirk wouldn't have to set the bridge's TLS to 2.5
GT/s; the quirk would be attached to specific endpoint devices and
could set the bridge's TLS to whatever the endpoint supports.

> Moreover here we are dealing with specific OTP/EEPROM bug in Atheros
> chips, which was confirmed that exists. As I wrote in previous email, I
> was told that semi-official workaround is do Warm Reset or Cold Reset
> with turning power off from card. Which on most platforms / boards is
> not possible.

If there's a specific bug with a real root-cause analysis, please cite
it.  The threads mentioned in the current commit log are basically
informed speculation.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges

2021-06-01 Thread Bjorn Helgaas
On Wed, May 05, 2021 at 06:33:57PM +0200, Pali Rohár wrote:
> Atheros AR9xxx and QCA9xxx chips have behaviour issues not only after a
> bus reset, but also after doing retrain link, if PCIe bridge is not in
> GEN1 mode (at 2.5 GT/s speed):
> 
> - QCA9880 and QCA9890 chips throw a Link Down event and completely
>   disappear from the bus and their config space is not accessible
>   afterwards.
> 
> - QCA9377 chip throws a Link Down event followed by Link Up event, the
>   config space is accessible and PCI device ID is correct. But trying to
>   access chip's I/O space causes Uncorrected (Non-Fatal) AER error,
>   followed by Synchronous external abort 96000210 and Segmentation fault
>   of insmod while loading ath10k_pci.ko module.
> 
> - AR9390 chip throws a Link Down event followed by Link Up event, config
>   space is accessible, but contains nonsense values. PCI device ID is
>   0xABCD which indicates HW bug that chip itself was not able to read
>   values from internal EEPROM/OTP.
> 
> - AR9287 chip throws also Link Down and Link Up events, also has
>   accessible config space containing correct values. But ath9k driver
>   fails to initialize card from this state as it is unable to access HW
>   registers. This also indicates that the chip iself is not able to read
>   values from internal EEPROM/OTP.
> 
> These issues related to PCI device ID 0xABCD and to reading internal
> EEPROM/OTP were previously discussed at ath9k-devel mailing list in
> following thread:
> 
>   https://www.mail-archive.com/ath9k-devel@lists.ath9k.org/msg07529.html
> 
> After experiments we've come up with a solution: it seems that Retrain
> link can be called only when using GEN1 PCIe bridge or when PCIe bridge
> link speed is forced to 2.5 GT/s. Applying this workaround fixes all
> mentioned cards.

I *assume* this means the device was running at > 2.5 GT/s in the
first place, and when aspm.c retrains the link to configure the common
clock, we downgrade to 2.5 GT/s, so the device is now slower than it
used to be?

Is that slower speed acceptable?  Is it better to be slower with ASPM,
or faster without ASPM?  Or maybe it could be faster *with* ASPM if we
avoided the common clock config and the retrain?  I think that would
give up some of the benefit of ASPM, but maybe it would still be
worthwhile?

If the device was running at > 2.5 GT/s to begin with, obviously there
is *some* way to get there.  This patch implies that the hardware
automatically trained to a higher rate after power-on (which I think
is what PCIe hardware is *supposed* to do) and something prevents that
from succeeding when we retrain, or maybe BIOS did something different
than what Linux is doing, or ... something else?

Maybe the device can only retrain to 2.5 GT/s or from the current
speed to a higher speed.  This sort of experimentation could probably
be done with setpci.

> This issue was reproduced with more cards:
> - Compex WLE900VX (QCA9880 based / device ID 0x003c)
> - QCNFA435 (QCA9377 based / device ID 0x0042)
> - Compex WLE200NX (AR9287 based / device ID 0x002e)
> - "noname" card (QCA9890 based / device ID 0x003c)
> - Wistron NKR-DNXAH1 (AR9390 based / device ID 0x0030)
> on Armada 385 with pci-mvebu.c driver and also on Armada 3720 with
> pci-aardvark.c driver.
> 
> To workaround this issue, this change introduces a new PCI quirk called
> PCI_DEV_FLAGS_NO_RETRAIN_LINK_WHEN_NOT_GEN1, which is enabled for all
> Atheros chips with PCI_DEV_FLAGS_NO_BUS_RESET quirk, and also for Atheros
> chip AR9287.
> 
> When this quirk is set, kernel disallows triggering PCI_EXP_LNKCTL_RL
> bit in config space of PCIe Bridge in the case when PCIe Bridge is
> capable of higher speed than 2.5 GT/s and this higher speed is already
> allowed. When PCIe Bridge has accessible LNKCTL2 register, we try to
> force target link speed to 2.5 GT/s. After this change it is possible
> to trigger PCI_EXP_LNKCTL_RL bit without issues.

This basically feels like a "it hurts when I do X, so stop doing X"
patch.  We don't really know what's wrong; we've just determined
experimentally how to avoid it.

> Currently only PCIe ASPM kernel code triggers this PCI_EXP_LNKCTL_RL bit,
> so quirk check is added only into pcie/aspm.c file.
> 
> Signed-off-by: Pali Rohár 
> Reported-by: Toke Høiland-Jørgensen 
> Tested-by: Toke Høiland-Jørgensen 
> Tested-by: Marek Behún 
> BugLink: https://lore.kernel.org/linux-pci/87h7l8axqp@toke.dk/
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=84821
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=192441
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209833
> Cc: sta...@vger.kernel.org # c80851f6ce63a ("PCI: Add PCI_EXP_LNKCTL2_TLS* 
> macros")
> 
> ---
> Changes since v1:
> * Move whole quirk code into pcie_downgrade_link_to_gen1() function
> * Reformat to 80 chars per line where possible
> * Add quirk also for cards with AR9287 chip (PCI ID 0x002e)
> * Extend commit message description and add information about 0xABCD
> 
> 

Re: [PATCH v3] PCI: Disallow retraining link for Atheros chips on non-Gen1 PCIe bridges

2021-06-25 Thread Bjorn Helgaas
On Mon, Jun 21, 2021 at 04:28:55PM +0200, Pali Rohár wrote:
> On Wednesday 16 June 2021 16:38:19 Bjorn Helgaas wrote:
> > On Wed, Jun 02, 2021 at 09:03:02PM +0200, Pali Rohár wrote:
> > > On Wednesday 02 June 2021 10:55:59 Bjorn Helgaas wrote:
> > > > On Wed, Jun 02, 2021 at 02:08:16PM +0200, Pali Rohár wrote:
> > > > > On Tuesday 01 June 2021 19:00:36 Bjorn Helgaas wrote:
> > > > 
> > > > > > I wonder if this could be restructured as a generic quirk
> > > > > > in quirks.c that simply set the bridge's TLS to 2.5 GT/s
> > > > > > during enumeration.  Or would the retrain fail even in
> > > > > > that case?
> > > > > 
> > > > > If I understand it correctly then PCIe link is already up
> > > > > when kernel starts enumeration. So setting Bridge TLS to 2.5
> > > > > GT/s does not change anything here.
> > > > > 
> > > > > Moreover it would have side effect that cards which are
> > > > > already set to 5+ GT/s would be downgraded to 2.5 GT/s
> > > > > during enumeration and for increasing speed would be needed
> > > > > another round of "enumeration" to set a new TLS and retrain
> > > > > link again. As TLS affects link only after link goes into
> > > > > Recovery state.
> > > > > 
> > > > > So this would just complicate card enumeration and settings.
> > > > 
> > > > The current quirk complicates the ASPM code.  I'm hoping that
> > > > if we set the bridge's Target Link Speed during enumeration,
> > > > the link retrain will "just work" without complicating the
> > > > ASPM code.
> > > > 
> > > > An enumeration quirk wouldn't have to set the bridge's TLS to
> > > > 2.5 GT/s; the quirk would be attached to specific endpoint
> > > > devices and could set the bridge's TLS to whatever the
> > > > endpoint supports.
> > > 
> > > Now I see what you mean. Yes, I agree this is a good idea and
> > > can simplify code. Quirk is not related to ASPM code and
> > > basically has nothing with it, just I put it into aspm.c because
> > > this is the only place where link retraining was activated.
> > > 
> > > But with this proposal there is one issue. Some kernel drivers
> > > already overwrite PCI_EXP_LNKCTL2_TLS value. So if PCI
> > > enumeration code set some value into PCI_EXP_LNKCTL2_TLS bits
> > > then drivers can change it and once ASPM will try to retrain
> > > link this may cause this issue.
> > 
> > I guess you mean the amdgpu, radeon, and hfi1 drivers.  They
> > really shouldn't be mucking with that stuff anyway.  But they do
> > and are unlikely to change because we don't have any good
> > alternative.
> 
> Yea, these are examples of such drivers... Maybe it is a good idea
> to ask those people why changing PCI_EXP_LNKCTL2_TLS is needed. As
> these drivers are often derived from codebase of shared multisystem
> drivers or from common documentation, it is possible that original
> source has this code as a workaround or common pattern used in other
> operating systems, not related to linux...
> 
> > One way around that would be to add some quirk code to
> > pcie_capability_write_word().  Ugly, but we do have something sort
> > of similar in pcie_capability_read_word() already.
> 
> Bjorn, do you really want such ugly hack in
> pcie_capability_write_word?  It is common code used and called from
> lot of places so it may affect whole system if in future somebody
> changes it again...

I don't know which is uglier, a quirk in pcie_capability_write_word()
or a quirk in aspm.c that has nothing to do with ASPM.  They're both
ugly :)

FWIW, in pcie_capability_write_word() I would envision not a check for
Atheros, but rather something like a "dev->max_target_link_speed" that
could be set by an Atheros quirk.  It does get uglier if we want to
restrict the bridge's link speed via a quirk, then unrestrict it when
the endpoint is unplugged.

I know pcie_downgrade_link_to_gen1() only returns failure for corner
cases that "should not occur," but I don't like the fact that it's
possible to change Common Clock Configuration without doing the
retrain.  That would leave us with incorrect ASPM exit latencies,
which is really hard to debug.

Here's the relevant text in the spec (PCIe r5.0):

  7.5.3.6 Link Capabilities

L0s Exit Latency - This field indicates the L0s exit latency for
the given PCI Express Link. The value reported indicates the
 

Re: [PATCH v2 03/13] PCI/ASPM: Disable ASPM when driver requests it

2023-10-11 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 04:10:53PM +0300, Ilpo Järvinen wrote:
> PCI core/ASPM service driver allows controlling ASPM state through
> pci_disable_link_state() and pci_enable_link_state() API. It was
> decided earlier (see the Link below), to not allow ASPM changes when OS
> does not have control over it but only log a warning about the problem
> (commit 2add0ec14c25 ("PCI/ASPM: Warn when driver asks to disable ASPM,
> but we can't do it")). Similarly, if ASPM is not enabled through
> config, ASPM cannot be disabled.
> 
> A number of drivers have added workarounds to force ASPM off with own
> writes into the Link Control Register (some even with comments
> explaining why PCI core does not disable it under some circumstances).
> According to the comments, some drivers require ASPM to be off for
> reliable operation.
> 
> Having custom ASPM handling in drivers is problematic because the state
> kept in the ASPM service driver is not updated by the changes made
> outside the link state management API.
> 
> As the first step to address this issue, make pci_disable_link_state()
> to unconditionally disable ASPM so the motivation for drivers to come
> up with custom ASPM handling code is eliminated.
> 
> Place the minimal ASPM disable handling into own file as it is too
> complicated to fit into a header as static inline and it has almost no
> overlap with the existing, more complicated ASPM code in
> drivers/pci/pce/aspm.c.
> 
> Make pci_disable_link_state() function comment to comply kerneldoc
> formatting while changing the description.
> 
> Link: 
> https://lore.kernel.org/all/canux_p3f5yhbzx3wgu-j1agpbxb_t9bis2erhvkkfmtdvza...@mail.gmail.com/
> Link: 
> https://lore.kernel.org/all/20230511131441.45704-1-ilpo.jarvi...@linux.intel.com/
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pcie/Makefile   |  1 +
>  drivers/pci/pcie/aspm.c | 33 ++---
>  drivers/pci/pcie/aspm_minimal.c | 66 +
>  include/linux/pci.h |  6 +--
>  4 files changed, 88 insertions(+), 18 deletions(-)
>  create mode 100644 drivers/pci/pcie/aspm_minimal.c
> 
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index 8de4ed5f98f1..ec7f04037b01 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -6,6 +6,7 @@ pcieportdrv-y := portdrv.o rcec.o
>  
>  obj-$(CONFIG_PCIEPORTBUS)+= pcieportdrv.o
>  
> +obj-y+= aspm_minimal.o

Can we put this code in drivers/pci/pci.c instead of creating a new
file for it?  pci.c is kind of a dumping ground and isn't ideal
either, but we do have a few other things there that we *always* want
even though they're related to a separate Kconfig feature, e.g.,
pci_bridge_reconfigure_ltr(), pcie_clear_device_status(),
pcie_clear_root_pme_status().

>  obj-$(CONFIG_PCIEASPM)   += aspm.o

Or maybe it would be better to just put it in aspm.c, drop this
compilation guard, and wrap the rest of the file in #ifdef
CONFIG_PCIEASPM.  Then everything would be in one file, which is a
major boon for code readers.

What do you think?

>  obj-$(CONFIG_PCIEAER)+= aer.o err.o
>  obj-$(CONFIG_PCIEAER_INJECT) += aer_inject.o
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 860bc94974ec..ec6d7a092ac1 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1042,16 +1042,23 @@ static int __pci_disable_link_state(struct pci_dev 
> *pdev, int state, bool sem)
>   return -EINVAL;
>   /*
>* A driver requested that ASPM be disabled on this device, but
> -  * if we don't have permission to manage ASPM (e.g., on ACPI
> +  * if we might not have permission to manage ASPM (e.g., on ACPI
>* systems we have to observe the FADT ACPI_FADT_NO_ASPM bit and
> -  * the _OSC method), we can't honor that request.  Windows has
> -  * a similar mechanism using "PciASPMOptOut", which is also
> -  * ignored in this situation.
> +  * the _OSC method), previously we chose to not honor disable
> +  * request in that case. Windows has a similar mechanism using
> +  * "PciASPMOptOut", which is also ignored in this situation.
> +  *
> +  * Not honoring the requests to disable ASPM, however, led to
> +  * drivers forcing ASPM off on their own. As such changes of ASPM
> +  * state are not tracked by this service driver, the state kept here
> +  * became out of sync.
> +  *
> +  * Therefore, honor ASPM disable requests even when OS does not have
> +  * ASPM control. Plain disable for ASPM is assumed to be slightly
> +  * safer than fully managing it.
>*/
> - if (aspm_disabled) {
> - pci_warn(pdev, "can't disable ASPM; OS doesn't have ASPM 
> control\n");
> - return -EPERM;
> - }
> + if (aspm_disabled)
> + pci_warn(pdev, "OS doesn't have ASPM control, disabling ASPM 
> anyway\n");

I 

Re: [PATCH v2 04/13] PCI/ASPM: Move L0S/L1/sub states mask calculation into a helper

2023-10-11 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 04:10:54PM +0300, Ilpo Järvinen wrote:
> ASPM service driver does the same L0S / L1S / sub states allowed
> calculation in __pci_disable_link_state() and
> pci_set_default_link_state().

Is there a typo or something here?  This patch only adds a call to
__pci_disable_link_state(), not to pci_set_default_link_state().

> Create a helper to calculate the mask for the allowed states.
> 
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pcie/aspm.c | 33 +
>  1 file changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index ec6d7a092ac1..91dc95aca90f 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1034,6 +1034,26 @@ static struct pcie_link_state 
> *pcie_aspm_get_link(struct pci_dev *pdev)
>   return bridge->link_state;
>  }
>  
> +static u8 pci_link_state_mask(int state)
> +{
> + u8 result = 0;
> +
> + if (state & PCIE_LINK_STATE_L0S)
> + result |= ASPM_STATE_L0S;
> + if (state & PCIE_LINK_STATE_L1)
> + result |= ASPM_STATE_L1;
> + if (state & PCIE_LINK_STATE_L1_1)
> + result |= ASPM_STATE_L1_1;
> + if (state & PCIE_LINK_STATE_L1_2)
> + result |= ASPM_STATE_L1_2;
> + if (state & PCIE_LINK_STATE_L1_1_PCIPM)
> + result |= ASPM_STATE_L1_1_PCIPM;
> + if (state & PCIE_LINK_STATE_L1_2_PCIPM)
> + result |= ASPM_STATE_L1_2_PCIPM;
> +
> + return result;
> +}
> +
>  static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool 
> sem)
>  {
>   struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> @@ -1063,18 +1083,7 @@ static int __pci_disable_link_state(struct pci_dev 
> *pdev, int state, bool sem)
>   if (sem)
>   down_read(_bus_sem);
>   mutex_lock(_lock);
> - if (state & PCIE_LINK_STATE_L0S)
> - link->aspm_disable |= ASPM_STATE_L0S;
> - if (state & PCIE_LINK_STATE_L1)
> - link->aspm_disable |= ASPM_STATE_L1;
> - if (state & PCIE_LINK_STATE_L1_1)
> - link->aspm_disable |= ASPM_STATE_L1_1;
> - if (state & PCIE_LINK_STATE_L1_2)
> - link->aspm_disable |= ASPM_STATE_L1_2;
> - if (state & PCIE_LINK_STATE_L1_1_PCIPM)
> - link->aspm_disable |= ASPM_STATE_L1_1_PCIPM;
> - if (state & PCIE_LINK_STATE_L1_2_PCIPM)
> - link->aspm_disable |= ASPM_STATE_L1_2_PCIPM;
> + link->aspm_disable |= pci_link_state_mask(state);
>   pcie_config_aspm_link(link, policy_to_aspm_state(link));
>  
>   if (state & PCIE_LINK_STATE_CLKPM)
> -- 
> 2.30.2
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 03/13] PCI/ASPM: Disable ASPM when driver requests it

2023-10-26 Thread Bjorn Helgaas
On Mon, Oct 16, 2023 at 05:27:37PM +0300, Ilpo Järvinen wrote:
> On Fri, 13 Oct 2023, Bjorn Helgaas wrote:
> > On Thu, Oct 12, 2023 at 01:56:16PM +0300, Ilpo Järvinen wrote:
> > > On Wed, 11 Oct 2023, Bjorn Helgaas wrote:
> > > > On Mon, Sep 18, 2023 at 04:10:53PM +0300, Ilpo Järvinen wrote:
> > > > > PCI core/ASPM service driver allows controlling ASPM state
> > > > > through pci_disable_link_state() and pci_enable_link_state()
> > > > > API. It was decided earlier (see the Link below), to not
> > > > > allow ASPM changes when OS does not have control over it but
> > > > > only log a warning about the problem (commit 2add0ec14c25
> > > > > ("PCI/ASPM: Warn when driver asks to disable ASPM, but we
> > > > > can't do it")). Similarly, if ASPM is not enabled through
> > > > > config, ASPM cannot be disabled.
> > > ...
> > 
> > > > This disables *all* ASPM states, unlike the version when
> > > > CONFIG_PCIEASPM is enabled.  I suppose there's a reason, and
> > > > maybe a comment could elaborate on it?
> > > >
> > > > When CONFIG_PCIEASPM is not enabled, I don't think we actively
> > > > *disable* ASPM in the hardware; we just leave it as-is, so
> > > > firmware might have left it enabled.
> > > 
> > > This whole trickery is intended for drivers that do not want to
> > > have ASPM because the devices are broken with it. So leaving it
> > > as-is is not really an option (as demonstrated by the custom
> > > workarounds).
> > 
> > Right.
> > 
> > > > Conceptually it seems like the LNKCTL updates here should be
> > > > the same whether CONFIG_PCIEASPM is enabled or not (subject to
> > > > the question above).
> > > > 
> > > > When CONFIG_PCIEASPM is enabled, we might need to do more
> > > > stuff, but it seems like the core should be the same.
> > > 
> > > So you think it's safer to partially disable ASPM (as per
> > > driver's request) rather than disable it completely? I got the
> > > impression that the latter might be safer from what Rafael said
> > > earlier but I suppose I might have misinterpreted him since he
> > > didn't exactly say that it might be safer to _completely_
> > > disable it.
> > 
> > My question is whether the state of the device should depend on
> > CONFIG_PCIEASPM.  If the driver does this:
> > 
> >   pci_disable_link_state(PCIE_LINK_STATE_L0S)
> > 
> > do we want to leave L1 enabled when CONFIG_PCIEASPM=y but disable L1
> > when CONFIG_PCIEASPM is unset?
> > 
> > I can see arguments both ways.  My thought was that it would be nice
> > to end up with a single implementation of pci_disable_link_state()
> > with an #ifdef around the CONFIG_PCIEASPM-enabled stuff because it
> > makes the code easier to read.

Responding to myself here, I think we should do the partial disables
because it matches what the drivers did previously by hand, we can
reduce the number of code paths, and the resulting device state will
be the same regardless of CONFIG_PCIEASPM.

> I think there's still one important thing to discuss and none of the
> comments have covered that area so far.
> 
> The drivers that have workaround are not going to turn more
> dangerous than they're already without this change, so we're mostly
> within charted waters there even with what you propose. However, I
> think the bigger catch and potential source of problems, with both
> this v2 and your alternative, are the drivers that do not have the
> workarounds around CONFIG_PCIEASPM=n and/or _OSC permissions. Those
> code paths just call pci_disable_link_state() and do nothing else.
> 
> Do you think it's okay to alter the behavior for those drivers too
> (disable ASPM where it previously was a no-op)?

Yes.  I assume the reason those drivers call pci_disable_link_state()
is because some hardware defect means ASPM doesn't work correctly.

This change means pci_disable_link_state() will disable ASPM even when
the OS doesn't own ASPM or CONFIG_PCIEASPM is unset.  I think those
cases are unusual and probably not well tested, and I suspect that if
we *did* test them, we'd find that ASPM doesn't work with the current
kernel.

So I think this is more likely to *fix* something than to break it.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 03/13] PCI/ASPM: Disable ASPM when driver requests it

2023-10-11 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 04:10:53PM +0300, Ilpo Järvinen wrote:
> PCI core/ASPM service driver allows controlling ASPM state through
> pci_disable_link_state() and pci_enable_link_state() API. It was
> decided earlier (see the Link below), to not allow ASPM changes when OS
> does not have control over it but only log a warning about the problem
> (commit 2add0ec14c25 ("PCI/ASPM: Warn when driver asks to disable ASPM,
> but we can't do it")). Similarly, if ASPM is not enabled through
> config, ASPM cannot be disabled.
> ...

> +#ifndef CONFIG_PCIEASPM
> +/*
> + * Always disable ASPM when requested, even when CONFIG_PCIEASPM is
> + * not build to avoid drivers adding code to do it on their own
> + * which caused issues when core does not know about the out-of-band
> + * ASPM state changes.
> + */
> +int pci_disable_link_state_locked(struct pci_dev *pdev, int state)
> +{
> + struct pci_dev *parent = pdev->bus->self;
> + struct pci_bus *linkbus = pdev->bus;
> + struct pci_dev *child;
> + u16 aspm_enabled, linkctl;
> + int ret;
> +
> + if (!parent)
> + return -ENODEV;

P.S. I think this should look the same to the user (same dmesg log and
same taint, if we do that) as the CONFIG_PCIEASPM=y case.

> + ret = pcie_capability_read_word(parent, PCI_EXP_LNKCTL, );
> + if (ret != PCIBIOS_SUCCESSFUL)
> + return pcibios_err_to_errno(ret);
> + aspm_enabled = linkctl & PCI_EXP_LNKCTL_ASPMC;
> +
> + ret = pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, );
> + if (ret != PCIBIOS_SUCCESSFUL)
> + return pcibios_err_to_errno(ret);
> + aspm_enabled |= linkctl & PCI_EXP_LNKCTL_ASPMC;
> +
> + /* If no states need to be disabled, don't touch LNKCTL */
> + if (state & aspm_enabled)
> + return 0;
> +
> + ret = pcie_capability_clear_word(parent, PCI_EXP_LNKCTL, 
> PCI_EXP_LNKCTL_ASPMC);
> + if (ret != PCIBIOS_SUCCESSFUL)
> + return pcibios_err_to_errno(ret);
> + list_for_each_entry(child, >devices, bus_list)
> + pcie_capability_clear_word(child, PCI_EXP_LNKCTL, 
> PCI_EXP_LNKCTL_ASPMC);

This disables *all* ASPM states, unlike the version when
CONFIG_PCIEASPM is enabled.  I suppose there's a reason, and maybe a
comment could elaborate on it?

When CONFIG_PCIEASPM is not enabled, I don't think we actively
*disable* ASPM in the hardware; we just leave it as-is, so firmware
might have left it enabled.

> +
> + return 0;
> +}

Conceptually it seems like the LNKCTL updates here should be the same
whether CONFIG_PCIEASPM is enabled or not (subject to the question
above).

When CONFIG_PCIEASPM is enabled, we might need to do more stuff, but
it seems like the core should be the same.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 05/13] PCI/ASPM: Add pci_enable_link_state()

2023-10-11 Thread Bjorn Helgaas
On Mon, Sep 18, 2023 at 04:10:55PM +0300, Ilpo Järvinen wrote:
> pci_disable_link_state() lacks a symmetric pair. Some drivers want to
> disable ASPM during certain phases of their operation but then
> re-enable it later on. If pci_disable_link_state() is made for the
> device, there is currently no way to re-enable the states that were
> disabled.

pci_disable_link_state() gives drivers a way to disable specified ASPM
states using a bitmask (PCIE_LINK_STATE_L0S, PCIE_LINK_STATE_L1,
PCIE_LINK_STATE_L1_1, etc), but IIUC the driver can't tell exactly
what changed and can't directly restore the original state, e.g.,

  - PCIE_LINK_STATE_L1 enabled initially
  - driver calls pci_disable_link_state(PCIE_LINK_STATE_L0S)
  - driver calls pci_enable_link_state(PCIE_LINK_STATE_L0S)
  - PCIE_LINK_STATE_L0S and PCIE_LINK_STATE_L1 are enabled now

Now PCIE_LINK_STATE_L0S is enabled even though it was not initially
enabled.  Maybe that's what we want; I dunno.

pci_disable_link_state() currently returns success/failure, but only
r8169 and mt76 even check, and only rtl_init_one() (r8169) has a
non-trivial reason, so it's conceivable that it could return a bitmask
instead.

> Add pci_enable_link_state() to remove ASPM states from the state
> disable mask.
> 
> Signed-off-by: Ilpo Järvinen 
> ---
>  drivers/pci/pcie/aspm.c | 42 +
>  include/linux/pci.h |  2 ++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 91dc95aca90f..f45d18d47c20 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1117,6 +1117,48 @@ int pci_disable_link_state(struct pci_dev *pdev, int 
> state)
>  }
>  EXPORT_SYMBOL(pci_disable_link_state);
>  
> +/**
> + * pci_enable_link_state - Re-enable device's link state
> + * @pdev: PCI device
> + * @state: ASPM link states to re-enable
> + *
> + * Enable device's link state that were previously disable so the link is

"state[s] that were previously disable[d]" alludes to the use case you
have in mind, but I don't think it describes how this function
actually works.  This function just makes it possible to enable the
specified states.  The @state parameter may have nothing to do with
any previously disabled states.

> + * allowed to enter the specific states. Note that if the BIOS didn't grant
> + * ASPM control to the OS, this does nothing because we can't touch the
> + * LNKCTL register.
> + *
> + * Return: 0 or a negative errno.
> + */
> +int pci_enable_link_state(struct pci_dev *pdev, int state)
> +{
> + struct pcie_link_state *link = pcie_aspm_get_link(pdev);
> +
> + if (!link)
> + return -EINVAL;
> + /*
> +  * A driver requested that ASPM be enabled on this device, but
> +  * if we don't have permission to manage ASPM (e.g., on ACPI
> +  * systems we have to observe the FADT ACPI_FADT_NO_ASPM bit and
> +  * the _OSC method), we can't honor that request.
> +  */
> + if (aspm_disabled) {
> + pci_warn(pdev, "can't enable ASPM; OS doesn't have ASPM 
> control\n");
> + return -EPERM;
> + }
> +
> + mutex_lock(_lock);
> + link->aspm_disable &= ~pci_link_state_mask(state);
> + pcie_config_aspm_link(link, policy_to_aspm_state(link));
> +
> + if (state & PCIE_LINK_STATE_CLKPM)
> + link->clkpm_disable = 0;
> + pcie_set_clkpm(link, policy_to_clkpm_state(link));
> + mutex_unlock(_lock);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(pci_enable_link_state);
> +
>  /**
>   * pci_set_default_link_state - Set the default device link state
>   * @pdev: PCI device
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 3c24ca164104..844d09230264 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1776,11 +1776,13 @@ extern bool pcie_ports_native;
>  int pci_disable_link_state(struct pci_dev *pdev, int state);
>  int pci_disable_link_state_locked(struct pci_dev *pdev, int state);
>  #ifdef CONFIG_PCIEASPM
> +int pci_enable_link_state(struct pci_dev *pdev, int state);
>  int pci_set_default_link_state(struct pci_dev *pdev, int state);
>  void pcie_no_aspm(void);
>  bool pcie_aspm_support_enabled(void);
>  bool pcie_aspm_enabled(struct pci_dev *pdev);
>  #else
> +static inline int pci_enable_link_state(struct pci_dev *pdev, int state) { 
> return -EOPNOTSUPP; }
>  static inline int pci_set_default_link_state(struct pci_dev *pdev, int state)
>  { return 0; }
>  static inline void pcie_no_aspm(void) { }
> -- 
> 2.30.2
> 
> 
> -- 
> ath12k mailing list
> ath...@lists.infradead.org
> https://lists.infradead.org/mailman/listinfo/ath12k

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 05/13] PCI/ASPM: Add pci_enable_link_state()

2023-10-13 Thread Bjorn Helgaas
On Thu, Oct 12, 2023 at 03:53:39PM +0300, Ilpo Järvinen wrote:
> On Wed, 11 Oct 2023, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 04:10:55PM +0300, Ilpo Järvinen wrote:
> > > pci_disable_link_state() lacks a symmetric pair. Some drivers want to
> > > disable ASPM during certain phases of their operation but then
> > > re-enable it later on. If pci_disable_link_state() is made for the
> > > device, there is currently no way to re-enable the states that were
> > > disabled.
> > 
> > pci_disable_link_state() gives drivers a way to disable specified ASPM
> > states using a bitmask (PCIE_LINK_STATE_L0S, PCIE_LINK_STATE_L1,
> > PCIE_LINK_STATE_L1_1, etc), but IIUC the driver can't tell exactly
> > what changed and can't directly restore the original state, e.g.,
> > 
> >   - PCIE_LINK_STATE_L1 enabled initially
> >   - driver calls pci_disable_link_state(PCIE_LINK_STATE_L0S)
> >   - driver calls pci_enable_link_state(PCIE_LINK_STATE_L0S)
> >   - PCIE_LINK_STATE_L0S and PCIE_LINK_STATE_L1 are enabled now
> > 
> > Now PCIE_LINK_STATE_L0S is enabled even though it was not initially
> > enabled.  Maybe that's what we want; I dunno.
> > 
> > pci_disable_link_state() currently returns success/failure, but only
> > r8169 and mt76 even check, and only rtl_init_one() (r8169) has a
> > non-trivial reason, so it's conceivable that it could return a bitmask
> > instead.
> 
> It's great that you suggested this since it's actually what also I've been 
> started to think should be done instead of this straightforward approach
> I used in V2. 
> 
> That is, don't have the drivers to get anything directly from LNKCTL
> but they should get everything through the API provided by the 
> disable/enable calls which makes it easy for the driver to pass the same
> value back into the enable call.
> 
> > > Add pci_enable_link_state() to remove ASPM states from the state
> > > disable mask.
> > > 
> > > Signed-off-by: Ilpo Järvinen 
> > > ---
> > >  drivers/pci/pcie/aspm.c | 42 +
> > >  include/linux/pci.h |  2 ++
> > >  2 files changed, 44 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 91dc95aca90f..f45d18d47c20 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -1117,6 +1117,48 @@ int pci_disable_link_state(struct pci_dev *pdev, 
> > > int state)
> > >  }
> > >  EXPORT_SYMBOL(pci_disable_link_state);
> > >  
> > > +/**
> > > + * pci_enable_link_state - Re-enable device's link state
> > > + * @pdev: PCI device
> > > + * @state: ASPM link states to re-enable
> > > + *
> > > + * Enable device's link state that were previously disable so the link is
> > 
> > "state[s] that were previously disable[d]" alludes to the use case you
> > have in mind, but I don't think it describes how this function
> > actually works.  This function just makes it possible to enable the
> > specified states.  The @state parameter may have nothing to do with
> > any previously disabled states.
> 
> Yes, it's what I've been thinking between the lines. But I see your point 
> that this API didn't make it easy/obvious as is.
> 
> Would you want me to enforce it too besides altering the API such that the 
> states are actually returned from disable call? (I don't personally find
> that necessary as long as the API pair itself makes it obvious what the 
> driver is expect to pass there.)

This was just a comment about the doc not matching the function
behavior.

I think we have to support pci_enable_link_state() even if the driver
hasn't previously called pci_disable_link_state(), so drivers have to
be able to specify the pci_enable_link_state() @state from scratch.

Does that answer the enforcement question?  I don't think we can
really enforce anything other than that @state specifies valid ASPM
states.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 03/13] PCI/ASPM: Disable ASPM when driver requests it

2023-10-13 Thread Bjorn Helgaas
On Thu, Oct 12, 2023 at 01:56:16PM +0300, Ilpo Järvinen wrote:
> On Wed, 11 Oct 2023, Bjorn Helgaas wrote:
> > On Mon, Sep 18, 2023 at 04:10:53PM +0300, Ilpo Järvinen wrote:
> > > PCI core/ASPM service driver allows controlling ASPM state through
> > > pci_disable_link_state() and pci_enable_link_state() API. It was
> > > decided earlier (see the Link below), to not allow ASPM changes when OS
> > > does not have control over it but only log a warning about the problem
> > > (commit 2add0ec14c25 ("PCI/ASPM: Warn when driver asks to disable ASPM,
> > > but we can't do it")). Similarly, if ASPM is not enabled through
> > > config, ASPM cannot be disabled.
> ...

> > This disables *all* ASPM states, unlike the version when
> > CONFIG_PCIEASPM is enabled.  I suppose there's a reason, and maybe a
> > comment could elaborate on it?
> >
> > When CONFIG_PCIEASPM is not enabled, I don't think we actively
> > *disable* ASPM in the hardware; we just leave it as-is, so firmware
> > might have left it enabled.
> 
> This whole trickery is intended for drivers that do not want to have ASPM 
> because the devices are broken with it. So leaving it as-is is not really 
> an option (as demonstrated by the custom workarounds).

Right.

> > Conceptually it seems like the LNKCTL updates here should be the same
> > whether CONFIG_PCIEASPM is enabled or not (subject to the question
> > above).
> > 
> > When CONFIG_PCIEASPM is enabled, we might need to do more stuff, but
> > it seems like the core should be the same.
> 
> So you think it's safer to partially disable ASPM (as per driver's 
> request) rather than disable it completely? I got the impression that the 
> latter might be safer from what Rafael said earlier but I suppose I might 
> have misinterpreted him since he didn't exactly say that it might be safer 
> to _completely_ disable it.

My question is whether the state of the device should depend on
CONFIG_PCIEASPM.  If the driver does this:

  pci_disable_link_state(PCIE_LINK_STATE_L0S)

do we want to leave L1 enabled when CONFIG_PCIEASPM=y but disable L1
when CONFIG_PCIEASPM is unset?

I can see arguments both ways.  My thought was that it would be nice
to end up with a single implementation of pci_disable_link_state()
with an #ifdef around the CONFIG_PCIEASPM-enabled stuff because it
makes the code easier to read.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/3] wifi: ath9k: stop loading incompatible DT cal data

2023-01-17 Thread Bjorn Helgaas
On Tue, Jan 17, 2023 at 05:27:46PM +0800, Edward Chow wrote:
> Loading calibration data from an OF device tree node not declared
> compatible with the device (e.g. a PCI device with calibration data
> from corresponding DT node gets replaced, so the newly installed
> device become incompatible with the node) or driver may lead to fatal
> result, e.g. kernel panic.

Please include a link to a bug report and include a few lines of the
oops or panic directly in the commit log so when users see this
problem, they can search for the text and possibly find this fix.

> The driver should check whether the DT node corresponding to the
> device compatible with it, and load calibration data only from
> compatible node.

If you read this commit log carefully, it doesn't actually say what
this patch *does*.  It has some background and this assertion about
what drivers *should* do, but it doesn't say what this patch does.

Suggest structure like this (flesh out with the relevant DT property
names, etc):

  For PCI ath9k devices, load calibration data only if there is a DT
  node corresponding to the device with XXX ...

More details: https://chris.beams.io/posts/git-commit/

> Signed-off-by: Edward Chow 
> ---
>  drivers/net/wireless/ath/ath9k/ath9k.h |  1 +
>  drivers/net/wireless/ath/ath9k/init.c  | 26 ++
>  drivers/net/wireless/ath/ath9k/pci.c   |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h 
> b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 2cc23605c9fc..4f6f0383a5f8 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -35,6 +35,7 @@ struct ath_node;
>  struct ath_vif;
>  
>  extern struct ieee80211_ops ath9k_ops;
> +extern struct pci_driver ath_pci_driver;
>  extern int ath9k_modparam_nohwcrypt;
>  extern int ath9k_led_blink;
>  extern bool is_ath9k_unloaded;
> diff --git a/drivers/net/wireless/ath/ath9k/init.c 
> b/drivers/net/wireless/ath/ath9k/init.c
> index 4f00400c7ffb..f88a48e8456b 100644
> --- a/drivers/net/wireless/ath/ath9k/init.c
> +++ b/drivers/net/wireless/ath/ath9k/init.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -577,6 +578,31 @@ static int ath9k_nvmem_request_eeprom(struct ath_softc 
> *sc)
>   size_t len;
>   int err;
>  
> + /* devm_nvmem_cell_get() will get a cell first from the OF
> +  * DT node representing the given device with nvmem-cell-name
> +  * "calibration", and from the global lookup table as a fallback,
> +  * and an ath9k device could be either a pci one or a platform one.
> +  *
> +  * If the OF DT node is not compatible with the real device, the
> +  * calibration data got from the node should not be applied.
> +  *
> +  * dev_is_pci(sc->dev) && ( no OF node || caldata not from node
> +  * || not compatible ) -> do not use caldata .
> +  *
> +  * !dev_is_pci(sc->dev) -> always use caldata .
> +  */
> + if (dev_is_pci(sc->dev) &&
> + (!sc->dev->of_node ||
> +  !of_property_match_string(sc->dev->of_node,
> +"nvmem-cell-names",
> +"calibration") ||
> +  !of_pci_node_match_driver(sc->dev->of_node,
> +_pci_driver)))
> + /* follow the "just return 0;" convention as
> +  * noted below.
> +  */
> + return 0;
> +
>   cell = devm_nvmem_cell_get(sc->dev, "calibration");
>   if (IS_ERR(cell)) {
>   err = PTR_ERR(cell);
> diff --git a/drivers/net/wireless/ath/ath9k/pci.c 
> b/drivers/net/wireless/ath/ath9k/pci.c
> index a074e23013c5..fcb19761e60d 100644
> --- a/drivers/net/wireless/ath/ath9k/pci.c
> +++ b/drivers/net/wireless/ath/ath9k/pci.c
> @@ -1074,7 +1074,7 @@ static SIMPLE_DEV_PM_OPS(ath9k_pm_ops, ath_pci_suspend, 
> ath_pci_resume);
>  
>  MODULE_DEVICE_TABLE(pci, ath_pci_id_table);
>  
> -static struct pci_driver ath_pci_driver = {
> +struct pci_driver ath_pci_driver = {
>   .name   = "ath9k",
>   .id_table   = ath_pci_id_table,
>   .probe  = ath_pci_probe,
> -- 
> 2.39.0
> 

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 9/9] wifi: ath10k: Use RMW accessors for changing LNKCTL

2023-05-26 Thread Bjorn Helgaas
On Thu, May 25, 2023 at 01:11:51PM +0300, Ilpo Järvinen wrote:
> On Wed, 24 May 2023, Bjorn Helgaas wrote:
> > On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > > policy changes can trigger write to LNKCTL outside of driver's control.
> > > 
> > > Use RMW capability accessors which does proper locking to avoid losing
> > > concurrent updates to the register value. On restore, clear the ASPMC
> > > field properly.
> > > 
> > > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > > Suggested-by: Lukas Wunner 
> > > Signed-off-by: Ilpo Järvinen 
> > > Cc: sta...@vger.kernel.org
> > > ---
> > >  drivers/net/wireless/ath/ath10k/pci.c | 9 +
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
> > > b/drivers/net/wireless/ath/ath10k/pci.c
> > > index a7f44f6335fb..9275a672f90c 100644
> > > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> > >   ath10k_pci_irq_enable(ar);
> > >   ath10k_pci_rx_post(ar);
> > >  
> > > - pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -ar_pci->link_ctl);
> > > + pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +PCI_EXP_LNKCTL_ASPMC,
> > > +ar_pci->link_ctl & 
> > > PCI_EXP_LNKCTL_ASPMC);
> > >  
> > >   return 0;
> > >  }
> > > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k 
> > > *ar,
> > >  
> > >   pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > _pci->link_ctl);
> > > - pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > -ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> > > + pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > +PCI_EXP_LNKCTL_ASPMC);
> > 
> > These ath drivers all have the form:
> > 
> >   1) read LNKCTL
> >   2) save LNKCTL value in ->link_ctl
> >   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
> >  to disable ASPM
> >   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
> > 
> > These patches close the hole between 1) and 3) where other LNKCTL
> > updates could interfere, which is definitely a good thing.
> > 
> > But the hole between 1) and 4) is much bigger and still there.  Any
> > update by the PCI core in that interval would be lost.
> 
> Any update to PCI_EXP_LNKCTL_ASPMC field in that interval is lost yes, the 
> updates to _the other fields_ in LNKCTL are not lost.

Ah, yes, you're right, I missed the masking to PCI_EXP_LNKCTL_ASPMC in
the pcie_capability_clear_word().

> > Straw-man proposal:
> > 
> >   - Change pci_disable_link_state() so it ignores aspm_disabled and
> > always disables ASPM even if platform firmware hasn't granted
> > ownership.  Maybe this should warn and taint the kernel.
> > 
> >   - Change drivers to use pci_disable_link_state() instead of writing
> > LNKCTL directly.
> 
> I fully agree that's the direction we should be moving, yes. However, I'm 
> a bit hesitant to take that leap in one step. These drivers currently not 
> only disable ASPM but also re-enable it (assuming we guessed the intent
> right).
> 
> If I directly implement that proposal, ASPM is not going to be re-enabled 
> when PCI core does not allowing it. Could it cause some power related 
> regression?

IIUC the potential problem only happens with:

  - A platform that enables ASPM but doesn't grant PCIe Capability
ownership to the OS, and

  - A device where we force-disable ASPM, presumably to avoid some
hardware defect.

I'm not sure this case is worth worrying about.  A platform that
enables ASPM without allowing the OS to disable it is taking a risk
because it can't know about these device defects or even about user
preferences.  A device that has an ASPM-related defect may use more
power than necessary.  I think that's to be expected.

> My plan is to make another patch series after these to realize exactly 
> what you're proposing. It would allow better to isolate the problems that 
> related to the lack of ASPM.
> 
> I hope this two step approach is an acceptable way forward? I can of 
> course add those patches on top of these if that would be preferrable.

I think two steps is OK.  It's a little more work for the driver
maintainers to review them, but this step is pretty trivial already
reviewed (except for the GPUs, which are probably the most important :)).

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 9/9] wifi: ath10k: Use RMW accessors for changing LNKCTL

2023-05-26 Thread Bjorn Helgaas
On Fri, May 26, 2023 at 02:48:44PM +0300, Ilpo Järvinen wrote:
> On Thu, 25 May 2023, Ilpo Järvinen wrote:
> > On Wed, 24 May 2023, Bjorn Helgaas wrote:
> > > On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> > > > Don't assume that only the driver would be accessing LNKCTL. ASPM
> > > > policy changes can trigger write to LNKCTL outside of driver's control.
> > > > 
> > > > Use RMW capability accessors which does proper locking to avoid losing
> > > > concurrent updates to the register value. On restore, clear the ASPMC
> > > > field properly.
> > > > 
> > > > Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> > > > Suggested-by: Lukas Wunner 
> > > > Signed-off-by: Ilpo Järvinen 
> > > > Cc: sta...@vger.kernel.org
> > > > ---
> > > >  drivers/net/wireless/ath/ath10k/pci.c | 9 +
> > > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
> > > > b/drivers/net/wireless/ath/ath10k/pci.c
> > > > index a7f44f6335fb..9275a672f90c 100644
> > > > --- a/drivers/net/wireless/ath/ath10k/pci.c
> > > > +++ b/drivers/net/wireless/ath/ath10k/pci.c
> > > > @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
> > > > ath10k_pci_irq_enable(ar);
> > > > ath10k_pci_rx_post(ar);
> > > >  
> > > > -   pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > -  ar_pci->link_ctl);
> > > > +   pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > +  PCI_EXP_LNKCTL_ASPMC,
> > > > +  ar_pci->link_ctl & 
> > > > PCI_EXP_LNKCTL_ASPMC);
> > > >  
> > > > return 0;
> > > >  }
> > > > @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k 
> > > > *ar,
> > > >  
> > > > pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > >   _pci->link_ctl);
> > > > -   pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > -  ar_pci->link_ctl & 
> > > > ~PCI_EXP_LNKCTL_ASPMC);
> > > > +   pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> > > > +  PCI_EXP_LNKCTL_ASPMC);
> > > 
> > > These ath drivers all have the form:
> > > 
> > >   1) read LNKCTL
> > >   2) save LNKCTL value in ->link_ctl
> > >   3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
> > >  to disable ASPM
> > >   4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM
> > > 
> > > These patches close the hole between 1) and 3) where other LNKCTL
> > > updates could interfere, which is definitely a good thing.
> > > 
> > > But the hole between 1) and 4) is much bigger and still there.  Any
> > > update by the PCI core in that interval would be lost.
> > 
> > Any update to PCI_EXP_LNKCTL_ASPMC field in that interval is lost yes, the 
> > updates to _the other fields_ in LNKCTL are not lost.
> > 
> > I know this might result in drivers/pci/pcie/aspm.c disagreeing what
> > the state of the ASPM is (as shown under sysfs) compared with LNKCTL 
> > value but the cause can no longer be due racing RMW. Essentially, 4) is 
> > seen as an override to what core did if it changed ASPMC in between. 
> > Technically, something is still "lost" like you say but for a different 
> > reason than this series is trying to fix.
> > 
> > > Straw-man proposal:
> > > 
> > >   - Change pci_disable_link_state() so it ignores aspm_disabled and
> > > always disables ASPM even if platform firmware hasn't granted
> > > ownership.  Maybe this should warn and taint the kernel.
> > > 
> > >   - Change drivers to use pci_disable_link_state() instead of writing
> > > LNKCTL directly.
> 
> Now that I took a deeper look into what pci_disable_link_state() and 
> pci_enable_link_state() do, I realized they're not really disable/enable 
> pair like I had assumed from their names. Disable adds to ->aspm_disable 
> and flags are never removed from that because enable does not touch 
> aspm_disable at all but has it's own flag variable. This asymmetry looks 
> intentional.

Yes, that's an annoying feature.  There's only one caller of
pci_enable_link_state(), so it may be possible to make this more
symmetric.

> So if ath drivers would do pci_disable_link_state() to realize 1)-3), 
> there is no way to undo it in 4). It looks as if ath drivers would 
> actually want to use pci_enable_link_state() with different state 
> parameters to realize what they want to do in 1)-4).

Yeah, that does sound like a problem.  I don't have any great ideas.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2 9/9] wifi: ath10k: Use RMW accessors for changing LNKCTL

2023-05-24 Thread Bjorn Helgaas
On Wed, May 17, 2023 at 01:52:35PM +0300, Ilpo Järvinen wrote:
> Don't assume that only the driver would be accessing LNKCTL. ASPM
> policy changes can trigger write to LNKCTL outside of driver's control.
> 
> Use RMW capability accessors which does proper locking to avoid losing
> concurrent updates to the register value. On restore, clear the ASPMC
> field properly.
> 
> Fixes: 76d870ed09ab ("ath10k: enable ASPM")
> Suggested-by: Lukas Wunner 
> Signed-off-by: Ilpo Järvinen 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/net/wireless/ath/ath10k/pci.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
> b/drivers/net/wireless/ath/ath10k/pci.c
> index a7f44f6335fb..9275a672f90c 100644
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1963,8 +1963,9 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
>   ath10k_pci_irq_enable(ar);
>   ath10k_pci_rx_post(ar);
>  
> - pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> -ar_pci->link_ctl);
> + pcie_capability_clear_and_set_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> +PCI_EXP_LNKCTL_ASPMC,
> +ar_pci->link_ctl & 
> PCI_EXP_LNKCTL_ASPMC);
>  
>   return 0;
>  }
> @@ -2821,8 +2822,8 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar,
>  
>   pcie_capability_read_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> _pci->link_ctl);
> - pcie_capability_write_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> -ar_pci->link_ctl & ~PCI_EXP_LNKCTL_ASPMC);
> + pcie_capability_clear_word(ar_pci->pdev, PCI_EXP_LNKCTL,
> +PCI_EXP_LNKCTL_ASPMC);

These ath drivers all have the form:

  1) read LNKCTL
  2) save LNKCTL value in ->link_ctl
  3) write LNKCTL with "->link_ctl & ~PCI_EXP_LNKCTL_ASPMC"
 to disable ASPM
  4) write LNKCTL with ->link_ctl, presumably to re-enable ASPM

These patches close the hole between 1) and 3) where other LNKCTL
updates could interfere, which is definitely a good thing.

But the hole between 1) and 4) is much bigger and still there.  Any
update by the PCI core in that interval would be lost.

Straw-man proposal:

  - Change pci_disable_link_state() so it ignores aspm_disabled and
always disables ASPM even if platform firmware hasn't granted
ownership.  Maybe this should warn and taint the kernel.

  - Change drivers to use pci_disable_link_state() instead of writing
LNKCTL directly.

Bjorn

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k