Re: [PATCH v3 5/7] PCI: ACPI: Detect PCIe root ports that are used for tunneling

2023-11-16 Thread Mika Westerberg
Hi Mario,

On Wed, Nov 15, 2023 at 11:08:43AM -0600, Mario Limonciello wrote:
> On 11/15/2023 04:40, Mika Westerberg wrote:
> > Hi Mario,
> > 
> > On Tue, Nov 14, 2023 at 02:07:53PM -0600, Mario Limonciello wrote:
> > > USB4 routers support a feature called "PCIe tunneling". This
> > > allows PCIe traffic to be transmitted over USB4 fabric.
> > > 
> > > PCIe root ports that are used in this fashion can be discovered
> > > by device specific data that specifies the USB4 router they are
> > > connected to. For the PCI core, the specific connection information
> > > doesn't matter, but it's interesting to know that this root port is
> > > used for tunneling traffic. This will allow other decisions to be
> > > made based upon it.
> > > 
> > > Detect the `usb4-host-interface` _DSD and if it's found save it
> > > into a new `is_virtual_link` bit in `struct pci_device`.
> > 
> > While this is fine for the "first" tunneled link, this does not take
> > into account possible other "virtual" links that lead to the endpoint in
> > question. Typically for eGPU it only makes sense to plug it directly to
> > the host but say there is a USB4 hub (with PCIe tunneling capabilities)
> > in the middle. Now the link from the hub to the eGPU that is also
> > "virtual" is not marked as such and the bandwidth calculations may not
> > get what is expected.
> 
> Right; you mentioned the DVSEC available for hubs in this case.  As I don't
> have one of these to validate it works properly I was thinking that should
> be a follow up.
> 
> If you think it should be part of the same series I'll add it, but I'd ask
> if you can please check I did it right on one that reports the DVSEC?

I don't think it should be part of this series. I just checked and DVSEC
is only required for hosts so kind of hardware equivalent for the _DSD
property you are using here. For hubs there is no such luxury
unfortunately.

I think I do have hardware here with the DVSEC in place so if you
decide to add it, I should be able to try it.


Re: [PATCH v3 5/7] PCI: ACPI: Detect PCIe root ports that are used for tunneling

2023-11-15 Thread Mika Westerberg
Hi Mario,

On Tue, Nov 14, 2023 at 02:07:53PM -0600, Mario Limonciello wrote:
> USB4 routers support a feature called "PCIe tunneling". This
> allows PCIe traffic to be transmitted over USB4 fabric.
> 
> PCIe root ports that are used in this fashion can be discovered
> by device specific data that specifies the USB4 router they are
> connected to. For the PCI core, the specific connection information
> doesn't matter, but it's interesting to know that this root port is
> used for tunneling traffic. This will allow other decisions to be
> made based upon it.
> 
> Detect the `usb4-host-interface` _DSD and if it's found save it
> into a new `is_virtual_link` bit in `struct pci_device`.

While this is fine for the "first" tunneled link, this does not take
into account possible other "virtual" links that lead to the endpoint in
question. Typically for eGPU it only makes sense to plug it directly to
the host but say there is a USB4 hub (with PCIe tunneling capabilities)
in the middle. Now the link from the hub to the eGPU that is also
"virtual" is not marked as such and the bandwidth calculations may not
get what is expected.

It should be possible to map the PCIe ports that go over USB4 links
through router port operation "Get PCIe Downstream Entry Mapping" and
for the Thunderbolt 3 there is the DROM entries (I believe Lukas has
patches for this part already) but I guess it is outside of the scope of
this series. Out of curiosity I tried to look in Windows documentation
if there is such interface for GPUs as we have in Linux but could not
find (which does not mean it does not exist, though).


Re: [PATCH v2 8/9] PCI: Exclude PCIe ports used for tunneling in pcie_bandwidth_available()

2023-11-06 Thread Mika Westerberg
On Tue, Nov 07, 2023 at 07:45:26AM +0200, Mika Westerberg wrote:
> Hi,
> 
> On Mon, Nov 06, 2023 at 07:56:52PM +0100, Lukas Wunner wrote:
> > On Mon, Nov 06, 2023 at 12:44:25PM -0600, Mario Limonciello wrote:
> > > Tangentially related; the link speed is currently symmetric but there are
> > > two sysfs files.  Mika left a comment in drivers/thunderbolt/switch.c it 
> > > may
> > > be asymmetric in the future. So we may need to keep that in mind on any
> > > design that builds on top of them.
> > 
> > Aren't asymmetric Thunderbolt speeds just a DisplayPort thing?
> 
> No, they affect the whole fabric. We have the initial code for
> asymmetric switching in v6.7-rc1.
> 
> > > As 'thunderbolt' can be a module or built in, we need to bring code into 
> > > PCI
> > > core so that it works in early boot before it loads.
> > 
> > tb_switch_get_generation() is small enough that it could be moved to the
> > PCI core.  I doubt that we need to make thunderbolt built-in only
> > or move a large amount of code to the PCI core.
> 
> If at all possible I would like to avoid this and littering PCI side
> with non-PCI stuff. There could be other similar "mediums" in the future
> where you can transfer packets of "native" protocols such as PCIe so
> instead of making it Thunderbolt/USB4 specific it should be generic
> enough to support future extensions.
> 
> In case of Thunderbolt/USB4 there is no real way to figure out how much
> bandwidth each PCIe tunnel gets (it is kind of bulk traffic that gets
> what is left from isochronous protocols) so I would not even try that
> and instead use the real PCIe links in pcie_bandwidth_available() and
> skip all the "virtual" ones.

Actually can we call the new function something like pci_link_is_virtual()
instead and make pcie_bandwidth_available() call it? That would be more
future proof IMHO.


Re: [PATCH v2 8/9] PCI: Exclude PCIe ports used for tunneling in pcie_bandwidth_available()

2023-11-06 Thread Mika Westerberg
Hi,

On Mon, Nov 06, 2023 at 07:56:52PM +0100, Lukas Wunner wrote:
> On Mon, Nov 06, 2023 at 12:44:25PM -0600, Mario Limonciello wrote:
> > Tangentially related; the link speed is currently symmetric but there are
> > two sysfs files.  Mika left a comment in drivers/thunderbolt/switch.c it may
> > be asymmetric in the future. So we may need to keep that in mind on any
> > design that builds on top of them.
> 
> Aren't asymmetric Thunderbolt speeds just a DisplayPort thing?

No, they affect the whole fabric. We have the initial code for
asymmetric switching in v6.7-rc1.

> > As 'thunderbolt' can be a module or built in, we need to bring code into PCI
> > core so that it works in early boot before it loads.
> 
> tb_switch_get_generation() is small enough that it could be moved to the
> PCI core.  I doubt that we need to make thunderbolt built-in only
> or move a large amount of code to the PCI core.

If at all possible I would like to avoid this and littering PCI side
with non-PCI stuff. There could be other similar "mediums" in the future
where you can transfer packets of "native" protocols such as PCIe so
instead of making it Thunderbolt/USB4 specific it should be generic
enough to support future extensions.

In case of Thunderbolt/USB4 there is no real way to figure out how much
bandwidth each PCIe tunnel gets (it is kind of bulk traffic that gets
what is left from isochronous protocols) so I would not even try that
and instead use the real PCIe links in pcie_bandwidth_available() and
skip all the "virtual" ones.


Re: [PATCH v2 1/9] drm/nouveau: Switch from pci_is_thunderbolt_attached() to dev_is_removable()

2023-11-06 Thread Mika Westerberg
On Mon, Nov 06, 2023 at 02:25:24PM +0200, Ilpo Järvinen wrote:
> On Fri, 3 Nov 2023, Mario Limonciello wrote:
> 
> > pci_is_thunderbolt_attached() only works for Intel TBT devices. Switch to
> > using dev_is_removable() to be able to detect USB4 devices as well.
> 
> Please extend this with more details. I had to lookup the TBT change to 
> be able to make any guess why you're doing this (and it's still a guess 
> at best).

Also please write it as Thunderbolt not TBT so that it is clear what we
are talking about. Ditto to all patches.


Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread Mika Westerberg
On Thu, Oct 06, 2022 at 10:53:44AM -0600, Jason A. Donenfeld wrote:
>  drivers/thunderbolt/xdomain.c  |  2 +-

Acked-by: Mika Westerberg 


Re: Question regarding using Driver Component Framework (crashing kernel)

2022-03-23 Thread Mika Westerberg
Hi,

Don't you have like a serial port that you can turn on early debugging
to see where it blows up? If not that then at least the USB 3 DbC.

On Tue, Mar 22, 2022 at 12:01:15PM -0700, Won Chung wrote:
> Hi,
> 
> I am not sure if this is the correct mailing list to ask about Driver
> Component Framework, so if I am mistaken, I would appreciate it if
> anyone can correct me to the right direction.
> 
> I have a quick question on using driver component framework and a
> strange crash I am seeing when I use component framework.
> 
> There is then component framework currently being used for typec
> connector, as below
> https://elixir.bootlin.com/linux/latest/source/drivers/usb/typec/port-mapper.c
> We add component for usb port to link it with typec connector, as below
> https://elixir.bootlin.com/linux/latest/source/drivers/usb/core/port.c#L622
> https://elixir.bootlin.com/linux/latest/source/drivers/usb/core/port.c#L568
> 
> We are trying to do something similar for usb4 port, add component to
> link to typec connector.
> I have attached a patch to this email for reference.
> However, this change causes a crash in kernel for some devices.
> The crash is too early in the kernel that it is not giving any logs at all.
> This crash is somehow fixed when we change usb4 config from =y
> (built-in) to =m (loadable module).
> So, I am curious if there are some aspects in the component framework
> that depends on whether the device with a component added is a
> built-in or loadable module.
> 
> If anyone has seen a similar issue or have an idea on what is
> happening, can you share your thoughts on what might be the issue and
> how the crash can be avoided?
> Thank you very much!
> 
> Won




Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-02-28 Thread Mika Westerberg
Hi,

On Mon, Feb 28, 2022 at 10:36:59PM +, Limonciello, Mario wrote:
> [AMD Official Use Only]
> 
> > -Original Message-
> > From: Lukas Wunner 
> > Sent: Monday, February 28, 2022 16:33
> > To: Bjorn Helgaas 
> > Cc: Limonciello, Mario ; Mika Westerberg
> > ; Michael Jamet
> > ; open list:PCI SUBSYSTEM  > p...@vger.kernel.org>; open list:THUNDERBOLT DRIVER  > u...@vger.kernel.org>; Yehezkel Bernat ; open
> > list:DRM DRIVERS ; open list:X86
> > PLATFORM DRIVERS ; Andreas
> > Noever ; open list:RADEON and AMDGPU
> > DRM DRIVERS ; open list:DRM DRIVER FOR
> > NVIDIA GEFORCE/QUADRO GPUS ; Bjorn
> > Helgaas ; Deucher, Alexander
> > 
> > Subject: Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from 
> > PCI
> > core
> > 
> > On Mon, Feb 28, 2022 at 04:13:44PM -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 28, 2022 at 03:33:13PM +, Limonciello, Mario wrote:
> > > > > On Fri, Feb 25, 2022 at 11:42:24AM -0600, Bjorn Helgaas wrote:
> > > > > > That would just leave the "PCI_VSEC_ID_INTEL_TBT implies external-
> > > > > facing"
> > > > > > assumption above.  Not having a Thunderbolt spec, I have no idea
> > how
> > > > > > you deal with that.
> > > > >
> > > > > You can download the spec here:
> > [...]
> > > > > Inside the archive there is also the DVSEC spec with name "USB4 DVSEC
> > > > > Version 1.0.pdf".
> > > >
> > > > The spec has Host_Router_indication (bits 18-19) as meaning external
> > facing.
> > > > I'll respin the patch 3 for using that.
> > >
> > > Thanks, please include the spec citation when you do.  And probably
> > > the URL, because it's not at all obvious how the casual reader would
> > > get from "is_thunderbolt" to a recent add-on to the USB4 spec.
> > 
> > PCI_VSEC_ID_INTEL_TBT is not mentioned at all in the USB4 spec,
> > hence there's no connection between "is_thunderbolt" and the USB4 spec.
> > 
> > It's a proprietary VSEC used by Intel and the only way to recognize
> > pre-USB4 Thunderbolt devices that I know of.  Its ID is also
> > different from the DVSEC IDs given in the above-mentioned spec.
> > 
> > Thanks,
> 
> The USB4 DVSEC spec makes comments about DVSEC_ID of 0x8086 and also
> DVSEC VENDOR_ID of 0x8086.  Is that not also present on the Intel TBT3 
> controllers?
> 
> My interpretation of this (and Mika's comment) was that rather than
> looking at the Intel VSEC we should look at the USB4 DVSEC to detect
> the Intel TBT3 controllers.

For pre-USB4 controllers (TBT 1-3) we need to use the existing method
(or a quirk based on device ID) as they don't have the USB4 DVSEC.


Re: [PATCH v5 3/7] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-02-28 Thread Mika Westerberg
Hi Bjorn,

On Fri, Feb 25, 2022 at 11:42:24AM -0600, Bjorn Helgaas wrote:
> That would just leave the "PCI_VSEC_ID_INTEL_TBT implies external-facing"
> assumption above.  Not having a Thunderbolt spec, I have no idea how
> you deal with that.

You can download the spec here:

https://www.usb.org/sites/default/files/USB4%20Specification%202026.zip

Inside the archive there is also the DVSEC spec with name "USB4 DVSEC
Version 1.0.pdf".


Re: [PATCH v4 00/10] Overhaul `is_thunderbolt`

2022-02-17 Thread Mika Westerberg
Hi Mario,

On Wed, Feb 16, 2022 at 10:50:31AM -0600, Limonciello, Mario wrote:
> On 2/16/2022 08:44, Alex Deucher wrote:
> > On Wed, Feb 16, 2022 at 9:34 AM Mika Westerberg
> >  wrote:
> > > 
> > > Hi all,
> > > 
> > > On Tue, Feb 15, 2022 at 01:07:00PM -0600, Limonciello, Mario wrote:
> > > > On 2/15/2022 01:29, Lukas Wunner wrote:
> > > > > On Mon, Feb 14, 2022 at 06:01:50PM -0600, Mario Limonciello wrote:
> > > > > >drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 +-
> > > > > >drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c  |  2 +-
> > > > > >drivers/gpu/drm/nouveau/nouveau_vga.c   |  4 +-
> > > > > >drivers/gpu/drm/radeon/radeon_device.c  |  4 +-
> > > > > >drivers/gpu/drm/radeon/radeon_kms.c |  2 +-
> > > > > >drivers/pci/hotplug/pciehp_hpc.c|  6 +-
> > > > > >drivers/pci/pci-acpi.c  | 15 -
> > > > > >drivers/pci/pci.c   | 17 +++--
> > > > > >drivers/pci/probe.c | 52 ++-
> > > > > >drivers/pci/quirks.c| 84 
> > > > > > +
> > > > > >drivers/platform/x86/apple-gmux.c   |  2 +-
> > > > > >drivers/thunderbolt/nhi.h   |  2 -
> > > > > >include/linux/pci.h | 25 +---
> > > > > >include/linux/pci_ids.h |  3 +
> > > > > >14 files changed, 173 insertions(+), 47 deletions(-)
> > > > > 
> > > > > That's an awful lot of additional LoC for what is primarily
> > > > > a refactoring job with the intent to simplify things.
> > > > 
> > > > You may recall the first version of this series was just for adding
> > > > USB4 matches to the existing code paths, and that's when it was noted
> > > > that is_thunderbolt is a bit overloaded.
> > > > 
> > > > > 
> > > > > Honestly this looks like an attempt to fix something that
> > > > > isn't broken.  Specifically, the is_thunderbolt bit apparently
> > > > > can't be removed without adding new bits to struct pci_dev.
> > > > > Not sure if that can be called progress. >
> > > > > Thanks,
> > > > > 
> > > > > Lukas
> > > > 
> > > > Within this series there are two new material patches; setting up root 
> > > > ports
> > > > for both integrated and discrete USB4 controllers to behave well with 
> > > > all
> > > > the existing drivers that rely upon a hint of how they're connected to
> > > > configure devices differently.
> > > > 
> > > > If y'all collectively prefer this direction to not refactor 
> > > > is_thunderbolt
> > > > and push into quirks, a simpler version of this series would be to 
> > > > leave all
> > > > the quirks in place, just drop dev->is_thunderbolt, and set
> > > > dev->external_facing on all 3 cases:
> > > > 
> > > > * Intel TBT controller
> > > > * USB4 integrated PCIe tunneling root port/XHCI tunneling root port
> > > > * USB4 disctete PCIe tunneling root port/XHCI tunneling root port
> > > > 
> > > > All the other drivers and symbols can stay the same then.
> > > 
> > > If I understand correctly the original intention of this patch series is
> > > to be able to differentiate whether the device is "permanently"
> > > connected to the motherboard, or it is connected over some hot-pluggable
> > > bus (PCIe, USB, USB4 for example but I'm sure there are other buses that
> > > fit into this picture too). Specifically this is needed for discrete
> > > GPUs because of power management differences or so (please correct me if
> > > I'm mistaken).
> 
> Correct.  It might be possible to drop the patch for the integrated case
> (patch 3) because I do think that by Microsoft having the _DSD for
> "ExternalFacingPort" it's very likely that most implementations will have
> used it for the appropriate PCIe root ports.  If something shows up in the
> wild that this isn't the case it could be revisited.  If it's found
> pre-production presumably the OEM can still fix it and if it's post
> production and there are problems we can dust it off then.

Yeah, that's most likely the case.

> The discrete USB4

Re: [PATCH v4 00/10] Overhaul `is_thunderbolt`

2022-02-16 Thread Mika Westerberg
Hi all,

On Tue, Feb 15, 2022 at 01:07:00PM -0600, Limonciello, Mario wrote:
> On 2/15/2022 01:29, Lukas Wunner wrote:
> > On Mon, Feb 14, 2022 at 06:01:50PM -0600, Mario Limonciello wrote:
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  2 +-
> > >   drivers/gpu/drm/amd/amdgpu/nbio_v2_3.c  |  2 +-
> > >   drivers/gpu/drm/nouveau/nouveau_vga.c   |  4 +-
> > >   drivers/gpu/drm/radeon/radeon_device.c  |  4 +-
> > >   drivers/gpu/drm/radeon/radeon_kms.c |  2 +-
> > >   drivers/pci/hotplug/pciehp_hpc.c|  6 +-
> > >   drivers/pci/pci-acpi.c  | 15 -
> > >   drivers/pci/pci.c   | 17 +++--
> > >   drivers/pci/probe.c | 52 ++-
> > >   drivers/pci/quirks.c| 84 +
> > >   drivers/platform/x86/apple-gmux.c   |  2 +-
> > >   drivers/thunderbolt/nhi.h   |  2 -
> > >   include/linux/pci.h | 25 +---
> > >   include/linux/pci_ids.h |  3 +
> > >   14 files changed, 173 insertions(+), 47 deletions(-)
> > 
> > That's an awful lot of additional LoC for what is primarily
> > a refactoring job with the intent to simplify things.
> 
> You may recall the first version of this series was just for adding
> USB4 matches to the existing code paths, and that's when it was noted
> that is_thunderbolt is a bit overloaded.
> 
> > 
> > Honestly this looks like an attempt to fix something that
> > isn't broken.  Specifically, the is_thunderbolt bit apparently
> > can't be removed without adding new bits to struct pci_dev.
> > Not sure if that can be called progress. >
> > Thanks,
> > 
> > Lukas
> 
> Within this series there are two new material patches; setting up root ports
> for both integrated and discrete USB4 controllers to behave well with all
> the existing drivers that rely upon a hint of how they're connected to
> configure devices differently.
> 
> If y'all collectively prefer this direction to not refactor is_thunderbolt
> and push into quirks, a simpler version of this series would be to leave all
> the quirks in place, just drop dev->is_thunderbolt, and set
> dev->external_facing on all 3 cases:
> 
> * Intel TBT controller
> * USB4 integrated PCIe tunneling root port/XHCI tunneling root port
> * USB4 disctete PCIe tunneling root port/XHCI tunneling root port
> 
> All the other drivers and symbols can stay the same then.

If I understand correctly the original intention of this patch series is
to be able to differentiate whether the device is "permanently"
connected to the motherboard, or it is connected over some hot-pluggable
bus (PCIe, USB, USB4 for example but I'm sure there are other buses that
fit into this picture too). Specifically this is needed for discrete
GPUs because of power management differences or so (please correct me if
I'm mistaken).

If we set the is_thunderbolt debate aside and concentrate on that issue,
I think the way to do this is to check whether the root port the GPU is
connected to has an ACPI power resource (returned from _PR3() method).
IF it is present then most likely the platform has provided all the
necessary wiring to move the GPU into D3cold (and the BIOS knows this).
If it is not present then the device cannot even go into D3cold as there
is not means to power of the device in PCIe spec.

Perhaps we can simply use pci_pr3_present() here as nouveau is already
doing? Granted it is not too elegant solution either but better than
using is_thunderbolt IMHO. Since this seem to be common for many GPUs,
perhaps we can have a helper in DRM core that handles this.

Then going back to is_thunderbolt debate :) I really don't think the
drivers should care whether they are connected over a tunnel or not.
They should work regardless of the underlying transport of the native
protocol. They should also be prepared for the fact that the hardware
can vanish under them at any point (e.g user unplugs the device). For
this reason I don't really like to see is_thunderbolt to be used more
and prefer to get rid if it completely if possible at all. If there is
still need to differentiate whether the device can be hot-removed or
not, I think "removable" in the driver core is the way to go. That is
not dependent on any single transport.


Re: [PATCH v3 05/12] PCI: Detect root port of internal USB4 devices by `usb4-host-interface`

2022-02-14 Thread Mika Westerberg
On Mon, Feb 14, 2022 at 01:11:05PM +0200, Mika Westerberg wrote:
> > > It is used to identify "tunneled" ports (whether PCIe, USB 3.x or
> > > DisplayPort). Tunnels are created by software (in Linux it is the
> > > Thunderbolt driver) and are dynamic in nature. The USB4 links go over
> > > USB Type-C cable which also is something user can plug/unplug freely.
> > > 
> > > I would say it is reasonable expectation that anything behind these
> > > ports can be assumed as "removable".
> > 
> > USB gadgets may be soldered to the mainboard.  Those cannot be
> > unplugged freely.  It is common practice to solder USB Ethernet
> > or USB FTDI serial ports and nothing's preventing a vendor to solder
> > USB4/Thunderbolt gadgets.
> 
> Right, that's why I say it is "reasonable expectation" that anything
> behind these ports can be assumed "removable" :) Of course they don't
> have to be but if we assume that in the driver where this actually
> matters we should be on the safe side, no?

Also the tunnels are not permanent anyway.


Re: [PATCH v3 05/12] PCI: Detect root port of internal USB4 devices by `usb4-host-interface`

2022-02-14 Thread Mika Westerberg
On Mon, Feb 14, 2022 at 09:52:02AM +0100, Lukas Wunner wrote:
> On Mon, Feb 14, 2022 at 09:34:26AM +0200, Mika Westerberg wrote:
> > On Fri, Feb 11, 2022 at 03:45:46PM -0600, Bjorn Helgaas wrote:
> > > My expectation is that "USB" (like "PCI" and "PCIe") tells me
> > > something about how a device is electrically connected and how
> > > software can operate it.  It doesn't really tell me anything about
> > > whether those electrical connections are permanent, made through an
> > > internal slot, or made through an external connector and cable.
> > 
> > It is used to identify "tunneled" ports (whether PCIe, USB 3.x or
> > DisplayPort). Tunnels are created by software (in Linux it is the
> > Thunderbolt driver) and are dynamic in nature. The USB4 links go over
> > USB Type-C cable which also is something user can plug/unplug freely.
> > 
> > I would say it is reasonable expectation that anything behind these
> > ports can be assumed as "removable".
> 
> USB gadgets may be soldered to the mainboard.  Those cannot be
> unplugged freely.  It is common practice to solder USB Ethernet
> or USB FTDI serial ports and nothing's preventing a vendor to solder
> USB4/Thunderbolt gadgets.

Right, that's why I say it is "reasonable expectation" that anything
behind these ports can be assumed "removable" :) Of course they don't
have to be but if we assume that in the driver where this actually
matters we should be on the safe side, no?


Re: [PATCH v3 05/12] PCI: Detect root port of internal USB4 devices by `usb4-host-interface`

2022-02-13 Thread Mika Westerberg
Hi Bjorn,

On Fri, Feb 11, 2022 at 03:45:46PM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 11, 2022 at 01:32:43PM -0600, Mario Limonciello wrote:
> > The root port used for PCIe tunneling should be marked as removable to
> > ensure that the entire chain is marked removable.
> > 
> > This can be done by looking for the device property specified in
> > the ACPI tables `usb4-host-interface`.
> > 
> > Suggested-by: Mika Westerberg 
> > Link: 
> > https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/pci/pci-acpi.c | 10 ++
> >  drivers/pci/pci.h  |  5 +
> >  drivers/pci/probe.c|  1 +
> >  3 files changed, 16 insertions(+)
> > 
> > diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> > index a42dbf448860..6368e5633b1b 100644
> > --- a/drivers/pci/pci-acpi.c
> > +++ b/drivers/pci/pci-acpi.c
> > @@ -1391,6 +1391,16 @@ void pci_acpi_cleanup(struct device *dev, struct 
> > acpi_device *adev)
> > }
> >  }
> >  
> > +bool pci_acpi_is_usb4(struct pci_dev *dev)
> > +{
> > +   struct acpi_device *adev = ACPI_COMPANION(>dev);
> > +
> > +   if (!adev)
> > +   return false;
> > +   return fwnode_property_present(acpi_fwnode_handle(adev),
> > +  "usb4-host-interface");
> 
> Maybe it's obvious to everybody but me that "USB4" means this device
> is removable.  The Microsoft reference above doesn't say anything
> about removability.
> 
> My expectation is that "USB" (like "PCI" and "PCIe") tells me
> something about how a device is electrically connected and how
> software can operate it.  It doesn't really tell me anything about
> whether those electrical connections are permanent, made through an
> internal slot, or made through an external connector and cable.

It is used to identify "tunneled" ports (whether PCIe, USB 3.x or
DisplayPort). Tunnels are created by software (in Linux it is the
Thunderbolt driver) and are dynamic in nature. The USB4 links go over
USB Type-C cable which also is something user can plug/unplug freely.

I would say it is reasonable expectation that anything behind these
ports can be assumed as "removable".


Re: [PATCH v3 04/12] PCI: Drop the `is_thunderbolt` attribute from PCI core

2022-02-13 Thread Mika Westerberg
Hi Mario,

On Sun, Feb 13, 2022 at 11:26:56AM -0600, Limonciello, Mario wrote:
> On 2/13/2022 02:20, Lukas Wunner wrote:
> > On Fri, Feb 11, 2022 at 01:32:42PM -0600, Mario Limonciello wrote:
> > > The `is_thunderbolt` attribute is currently a dumping ground for a
> > > variety of things.
> > 
> > It's not as arbitrary as it may seem.  Quite a bit of thought went into
> > the current design.
> > 
> > 
> > > Instead use the driver core removable attribute to indicate the
> > > detail a device is attached to a thunderbolt or USB4 chain.
> > 
> > You're missing the point that "is_thunderbolt" is set on the *controller*
> > (i.e. its upstream and downstream ports).
> > 
> > The controller itself is *not* removable if it's the host controller.
> > 
> > However a device can be assumed to be removable if it has an ancestor
> > which has the "is_thunderbolt" flag set.
> > 
> 
> Ah right... I wonder if really what this series should be about then is
> setting up the the PCIe endpoints for PCIe tunneling and XHCI tunneling to
> be marked as "external" instead then.  It would mean that existing code will
> apply the removable attribute to everything downstream (and presumably at
> least some of those drivers it will continue to make sense to drop
> "pcie_is_thunderbolt_attached" and instead check dev_is_removable.

Yes, I think this is the right thing to do. Anything connected over
PCIe/USB 3.x tunnel is pretty much "removable" whereas the host
controllers may or may not. Typically they are not.


Re: [PATCH v3 03/12] PCI: Move check for old Apple Thunderbolt controllers into a quirk

2022-02-13 Thread Mika Westerberg
Hi Lukas,

On Sun, Feb 13, 2022 at 10:19:20AM +0100, Lukas Wunner wrote:
> On Fri, Feb 11, 2022 at 01:32:41PM -0600, Mario Limonciello wrote:
> > `pci_bridge_d3_possible` currently checks explicitly for a Thunderbolt
> > controller to indicate that D3 is possible.  As this is used solely
> > for older Apple systems, move it into a quirk that enumerates across
> > all Intel TBT controllers.
> 
> I'm not so sure if it is only needed on Apple systems.
> 
> 
> > @@ -2954,10 +2960,6 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > if (pci_bridge_d3_force)
> > return true;
> >  
> > -   /* Even the oldest 2010 Thunderbolt controller supports D3. */
> > -   if (bridge->is_thunderbolt)
> > -   return true;
> > -
> > /* Platform might know better if the bridge supports D3 */
> > if (platform_pci_bridge_d3(bridge))
> > return true;
> 
> The fact that Thunderbolt PCIe ports support D3 is a property of those
> devices.  It's not a property of the platform or a quirk of a particular
> vendor.

It is in fact platform specific. For instance the non-Apple systems
without "HotPlugSupportInD3" property have not been wired to support
entering/exiting D3 runtime so putting these ports into D3 may actually
lead into problems as we are in non-validated territory.

> Hence in my view the current location of the check (pci_bridge_d3_possible())
> makes sense wheras the location you're moving it to does not.
> 
> 
> > +/* Apple machines as old as 2010 can do D3 with Thunderbolt controllers, 
> > but don't specify
> > + * it in the ACPI tables
> > + */
> 
> Apple started shipping Thunderbolt in 2011.
> Intel brought the first chips to market in 2010.
> 
> The date is meaningful at the code's current location in
> pci_bridge_d3_possible() because a few lines further down
> there's a 2015 BIOS cut-off date.
> 
> Microsoft came up with an ACPI property that BIOS vendors may set
> so that Windows knows it may put a Thunderbolt controller into D3cold.
> I'm not even sure if that property was ever officially adopted by the
> ACPI spec or if it's just a Microsoft-defined "standard".

AFAIK It was invented by Intel Windows folks but Microsoft "documented"
it. We use the same property in ChromeOS (and therefore Linux) too so it
can be thought of as "de facto" way of declaring such port. 


Re: [PATCH v3 03/12] PCI: Move check for old Apple Thunderbolt controllers into a quirk

2022-02-13 Thread Mika Westerberg
Hi,

On Fri, Feb 11, 2022 at 04:06:20PM -0600, Limonciello, Mario wrote:
> On 2/11/2022 15:35, Bjorn Helgaas wrote:
> > On Fri, Feb 11, 2022 at 01:32:41PM -0600, Mario Limonciello wrote:
> > > `pci_bridge_d3_possible` currently checks explicitly for a Thunderbolt
> > > controller to indicate that D3 is possible.  As this is used solely
> > > for older Apple systems, move it into a quirk that enumerates across
> > > all Intel TBT controllers.
> > > 
> > > Suggested-by: Mika Westerberg 
> > > Signed-off-by: Mario Limonciello 
> > > ---
> > >   drivers/pci/pci.c| 12 +-
> > >   drivers/pci/quirks.c | 53 
> > >   2 files changed, 60 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 9ecce435fb3f..5002e214c9a6 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1064,7 +1064,13 @@ static inline bool platform_pci_bridge_d3(struct 
> > > pci_dev *dev)
> > >   if (pci_use_mid_pm())
> > >   return false;
> > > - return acpi_pci_bridge_d3(dev);
> > > + if (acpi_pci_bridge_d3(dev))
> > > + return true;
> > > +
> > > + if (device_property_read_bool(>dev, "HotPlugSupportInD3"))
> > > + return true;
> > 
> > Why do we need this?  acpi_pci_bridge_d3() already looks for
> > "HotPlugSupportInD3".
> 
> The Apple machines don't have ACPI companion devices that specify this
> property.
> 
> I guess this probes a different question; can `device_property_read_bool` be
> used in `acpi_pci_bridge_d3` instead of:
> 
>   if (acpi_dev_get_property(adev, "HotPlugSupportInD3",
>  ACPI_TYPE_INTEGER, ) < 0)
>   return false;
> 
>   return obj->integer.value == 1;
> 
> If so, then yeah this can probably be simplified.

Unfortunately the code in acpi_pci_bridge_d3() expects the device to
have an ACPI_COMPANION() which may not be the case with software nodes.

> > 
> > > + return false;
> > >   }
> > >   /**
> > > @@ -2954,10 +2960,6 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > >   if (pci_bridge_d3_force)
> > >   return true;
> > > - /* Even the oldest 2010 Thunderbolt controller supports D3. */
> > > - if (bridge->is_thunderbolt)
> > > - return true;
> > > -
> > >   /* Platform might know better if the bridge supports D3 
> > > */
> > >   if (platform_pci_bridge_d3(bridge))
> > >   return true;
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6d3c88edde00..aaf098ca7d54 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -3756,6 +3756,59 @@ DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
> > >  quirk_apple_poweroff_thunderbolt);
> > >   #endif
> > > +/* Apple machines as old as 2010 can do D3 with Thunderbolt controllers, 
> > > but don't specify
> > > + * it in the ACPI tables
> > 
> > Wrap to fit in 80 columns like the rest of the file.  Also use the:
> > 
> >/*
> > * comment ...
> > */
> > 
> > style if it's more than one line.
> > 
> > I don't think "as old as 2010" is helpful here -- I assume 2010 is
> > there because there *were* no Thunderbolt controllers before 2010, but
> > the code doesn't check any dates, so we basically assume all Apple
> > machines of any age with the listed controllers can do this.
> 
> The old comment was saying that, which is where I got it from.  Yeah, I'll
> update it.
> 
> > 
> > > + */
> > > +static void quirk_apple_d3_thunderbolt(struct pci_dev *dev)
> > > +{
> > > + struct property_entry properties[] = {
> > > + PROPERTY_ENTRY_BOOL("HotPlugSupportInD3"),
> > > + {},
> > > + };
> > > +
> > > + if (!x86_apple_machine)
> > > + return;
> > 
> > The current code doesn't check x86_apple_machine, so this needs some
> > justification.  How do I know this works the same as before?
> 
> Mika and Lucas were saying the only reason for this codepath was Apple
> machines in the first place, which is where this idea came from.

Yes, t

Re: [PATCH v2 3/9] PCI: drop `is_thunderbolt` attribute

2022-02-13 Thread Mika Westerberg
Hi,

On Sun, Feb 13, 2022 at 09:39:28AM +0100, Lukas Wunner wrote:
> On Fri, Feb 11, 2022 at 12:23:51PM +0200, Mika Westerberg wrote:
> > On Thu, Feb 10, 2022 at 04:43:23PM -0600, Mario Limonciello wrote:
> > > @@ -2955,7 +2955,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> > >   return true;
> > >  
> > >   /* Even the oldest 2010 Thunderbolt controller supports D3. */
> > > - if (bridge->is_thunderbolt)
> > > + if (dev_is_removable(>dev))
> > 
> > For this, I'm not entirely sure this is what we want. The purpose of
> > this check is to enable port power management of Apple systems with
> > Intel Thunderbolt controller and therefore checking for "removable" here
> > is kind of misleading IMHO.
> [...]
> > and then make a quirk in quirks.c that adds the software node property
> > for the Apple systems? Or something along those lines.
> 
> Honestly, that feels wrong to me.
> 
> There are non-Apple products with Thunderbolt controllers,
> e.g. Supermicro X10SAT was a Xeon board with Redwood Ridge
> which was introduced in 2013.  This was way before Microsoft
> came up with the HotPlugSupportInD3 property.  It was also way
> before the 2015 BIOS cut-off date that we use to disable
> power management on older boards.
> 
> Still, we currently whitelist the Thunderbolt ports on that
> board for D3 because we know it works.  What if products like
> this one use their own power management scheme and we'd cause
> a power regression if we needlessly disable D3 for them now?

All the non-Apple Thunderbolt products before "HotPlugSupportInD3" use
ACPI "assisted" hotplug which means all the PM is done in the BIOS.
Essentially it means the controller is only present if there is anything
connected and in that case it is always in D0. Unplugging the device
makes the controller to be hot-removed (ACPI hotplug) too and that's the
only way early Thunderbolt used to save energy.


Re: [PATCH v2 3/9] PCI: drop `is_thunderbolt` attribute

2022-02-11 Thread Mika Westerberg
Hi Mario,

On Thu, Feb 10, 2022 at 04:43:23PM -0600, Mario Limonciello wrote:
> The `is_thunderbolt` attribute is currently a dumping ground for a
> variety of things.
> 
> Instead use the driver core removable attribute to indicate the
> detail a device is attached to a thunderbolt or USB4 chain.
> 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/pci/pci.c |  2 +-
>  drivers/pci/probe.c   | 20 +++-
>  drivers/platform/x86/apple-gmux.c |  2 +-
>  include/linux/pci.h   |  5 ++---
>  4 files changed, 11 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9ecce435fb3f..1264984d5e6d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2955,7 +2955,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>   return true;
>  
>   /* Even the oldest 2010 Thunderbolt controller supports D3. */
> - if (bridge->is_thunderbolt)
> + if (dev_is_removable(>dev))

For this, I'm not entirely sure this is what we want. The purpose of
this check is to enable port power management of Apple systems with
Intel Thunderbolt controller and therefore checking for "removable" here
is kind of misleading IMHO.

I wonder if we could instead remove the check completely here and rely
on the below:

if (platform_pci_bridge_d3(bridge))
return true;

and that would then look like:

static inline bool platform_pci_bridge_d3(struct pci_dev *dev)
{
if (pci_use_mid_pm())
return false;

if (acpi_pci_bridge_d3(dev))
return true;

if (device_property_read_bool(>dev, "HotPlugSupportInD3"))
return true;

return false;
}

and then make a quirk in quirks.c that adds the software node property
for the Apple systems? Or something along those lines.

@Lukas, what do you think?


Re: [PATCH v2 4/9] PCI: mark USB4 devices as removable

2022-02-11 Thread Mika Westerberg
Hi Mario,

On Thu, Feb 10, 2022 at 04:43:24PM -0600, Mario Limonciello wrote:
> USB4 class devices are also removable like Intel Thunderbolt devices.
> 
> Drivers of downstream devices use this information to declare functional
> differences in how the drivers perform by knowing that they are connected
> to an upstream TBT/USB4 port.

This may not be covering the integrated controllers. For discrete, yes
but if it is the PCIe root ports that start the PCIe topology (over the
PCIe tunnels) this does not work.

For integrated we have the "usb4-host-interface" ACPI property that
tells this for each port:

https://docs.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers

and for discrete there is the PCIe DVSEC that can be used (see the USB4
spec archive it includes the "USB4 DVSEC Version 1.0.pdf" that has more
information). I would expect AMD controller (assuming it is discrete)
implements this too.

So I'm proposing that we mark the devices that are below  PCIe ports
(root, downstream) that fall in the above categories as "removable".
This is then not dependent on checking the USB4 controller and how it is
setup in a particular system.


Re: [PATCH v2 1/9] thunderbolt: move definition of PCI_CLASS_SERIAL_USB_USB4

2022-02-11 Thread Mika Westerberg
On Thu, Feb 10, 2022 at 04:43:21PM -0600, Mario Limonciello wrote:
> This PCI class definition of the USB4 device is currently located only in
> the thunderbolt driver.
> 
> It will be needed by a few other drivers for upcoming changes. Move it into
> the common include file.
> 
> Acked-by: Alex Deucher 
> Signed-off-by: Mario Limonciello 

Acked-by: Mika Westerberg 


Re: linux-next: manual merge of the drm tree with the pci tree

2020-12-09 Thread Mika Westerberg
Hi,

On Tue, Dec 08, 2020 at 01:27:54PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the drm tree got a conflict in:
> 
>   drivers/gpu/vga/vga_switcheroo.c
> 
> between commit:
> 
>   99efde6c9bb7 ("PCI/PM: Rename pci_wakeup_bus() to pci_resume_bus()")
> 
> from the pci tree and commit:
> 
>   9572e6693cd7 ("vga_switcheroo: simplify the return expression of 
> vga_switcheroo_runtime_resume")
> 
> from the drm tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks for the fix Stephen! Looks correct to me.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-26 Thread Mika Westerberg
On Thu, Jul 23, 2020 at 10:30:58PM +0200, Karol Herbst wrote:
> On Wed, Jul 22, 2020 at 11:25 AM Mika Westerberg
>  wrote:
> >
> > On Tue, Jul 21, 2020 at 01:37:12PM -0500, Patrick Volkerding wrote:
> > > On 7/21/20 10:27 AM, Mika Westerberg wrote:
> > > > On Tue, Jul 21, 2020 at 11:01:55AM -0400, Lyude Paul wrote:
> > > >> Sure thing. Also, feel free to let me know if you'd like access to one 
> > > >> of the
> > > >> systems we saw breaking with this patch - I'm fairly sure I've got one 
> > > >> of them
> > > >> locally at my apartment and don't mind setting up AMT/KVM/SSH
> > > > Probably no need for remote access (thanks for the offer, though). I
> > > > attached a test patch to the bug report:
> > > >
> > > >   https://bugzilla.kernel.org/show_bug.cgi?id=208597
> > > >
> > > > that tries to work it around (based on the ->pm_cap == 0). I wonder if
> > > > anyone would have time to try it out.
> > >
> > >
> > > Hi Mika,
> > >
> > > I can confirm that this patch applied to 5.4.52 fixes the issue with
> > > hybrid graphics on the Thinkpad X1 Extreme gen2.
> >
> > Great, thanks for testing!
> >
> 
> yeah, works on the P1G2 as well.

Thanks for testing!

Since we have the revert queued for this release cycle, I think I will
send an updated version of "PCI/PM: Assume ports without DLL Link Active
train links in 100 ms" after v5.9-rc1 is released that has this
workaround in place.

(I'm continuing my vacation so will be offline next week).
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-23 Thread Mika Westerberg
On Tue, Jul 21, 2020 at 02:24:19PM -0400, Lyude Paul wrote:
> On Tue, 2020-07-21 at 12:00 -0400, Lyude Paul wrote:
> > On Tue, 2020-07-21 at 18:27 +0300, Mika Westerberg wrote:
> > > On Tue, Jul 21, 2020 at 11:01:55AM -0400, Lyude Paul wrote:
> > > > Sure thing. Also, feel free to let me know if you'd like access to one
> > > > of
> > > > the
> > > > systems we saw breaking with this patch - I'm fairly sure I've got one
> > > > of
> > > > them
> > > > locally at my apartment and don't mind setting up AMT/KVM/SSH
> > > 
> > > Probably no need for remote access (thanks for the offer, though). I
> > > attached a test patch to the bug report:
> > > 
> > >   https://bugzilla.kernel.org/show_bug.cgi?id=208597
> > > 
> > > that tries to work it around (based on the ->pm_cap == 0). I wonder if
> > > anyone would have time to try it out.
> > 
> > Will give it a shot today and let you know the result
> 
> Ahh-actually, I thought the laptop I had locally could reproduce this bug but
> that doesn't appear to be the case whoops. Karol Herbst still has access to a
> machine that can test this though, so they'll likely get to trying the patch
> today or tommorrow

OK sounds good :)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-23 Thread Mika Westerberg
On Tue, Jul 21, 2020 at 01:37:12PM -0500, Patrick Volkerding wrote:
> On 7/21/20 10:27 AM, Mika Westerberg wrote:
> > On Tue, Jul 21, 2020 at 11:01:55AM -0400, Lyude Paul wrote:
> >> Sure thing. Also, feel free to let me know if you'd like access to one of 
> >> the
> >> systems we saw breaking with this patch - I'm fairly sure I've got one of 
> >> them
> >> locally at my apartment and don't mind setting up AMT/KVM/SSH
> > Probably no need for remote access (thanks for the offer, though). I
> > attached a test patch to the bug report:
> >
> >   https://bugzilla.kernel.org/show_bug.cgi?id=208597
> >
> > that tries to work it around (based on the ->pm_cap == 0). I wonder if
> > anyone would have time to try it out.
> 
> 
> Hi Mika,
> 
> I can confirm that this patch applied to 5.4.52 fixes the issue with
> hybrid graphics on the Thinkpad X1 Extreme gen2.

Great, thanks for testing!
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-22 Thread Mika Westerberg
On Tue, Jul 21, 2020 at 11:01:55AM -0400, Lyude Paul wrote:
> Sure thing. Also, feel free to let me know if you'd like access to one of the
> systems we saw breaking with this patch - I'm fairly sure I've got one of them
> locally at my apartment and don't mind setting up AMT/KVM/SSH

Probably no need for remote access (thanks for the offer, though). I
attached a test patch to the bug report:

  https://bugzilla.kernel.org/show_bug.cgi?id=208597

that tries to work it around (based on the ->pm_cap == 0). I wonder if
anyone would have time to try it out.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-21 Thread Mika Westerberg
On Fri, Jul 17, 2020 at 09:52:09PM +0200, Lukas Wunner wrote:
> On Fri, Jul 17, 2020 at 03:04:10PM -0400, Lyude Paul wrote:
> > Isn't it possible to tell whether a PCI device is connected through
> > thunderbolt or not? We could probably get away with just defaulting
> > to 100ms for thunderbolt devices without DLL Link Active specified,
> > and then default to the old delay value for non-thunderbolt devices.
> 
> pci_is_thunderbolt_attached()

That only works with some devices. I think we should try to keep the
fact that some PCIe links may be tunneled over TBT/USB4 transparent to
the PCI core and try to treat them as "standard" PCIe links if possible
at all :)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: nouveau regression with 5.7 caused by "PCI/PM: Assume ports without DLL Link Active train links in 100 ms"

2020-07-21 Thread Mika Westerberg
Hi,

[Sorry for the delay, I was on vacation]

On Fri, Jul 17, 2020 at 01:32:10PM +0200, Karol Herbst wrote:
> Filed at https://bugzilla.kernel.org/show_bug.cgi?id=208597

Thanks for reporting.

I'll check your logs and try to figure if there is something we can do
to make both nouveau and TBT working at the same time.

> oddly enough I wasn't able to reproduce it on my XPS 9560, will ping
> once something breaks.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v6] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2020-03-05 Thread Mika Westerberg
On Thu, Mar 05, 2020 at 05:11:57PM +0100, Karol Herbst wrote:
> On Wed, Mar 4, 2020 at 10:33 AM Mika Westerberg
>  wrote:
> >
> > Hi,
> >
> > On Tue, Mar 03, 2020 at 11:10:52AM +0100, Karol Herbst wrote:
> > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher 
> > > device
> > > states.
> >
> > I think it is good to explain bit more here why this fix is needed.
> >
> 
> is something like this fine?
> 
> Fixes the infamous 'runpm' bug many users are facing on Laptops with Nvidia
> Pascal GPUs by skipping PCI power state changes on the GPU.

I would open up 'runpm' -> runtime PM.

> It's still unknown why this issue exists, but this is a reliable workaround
> and solves a very annoying issue for user having to choose between a
> crashing kernel or higher power consumption of their Laptops.

Also I think it would be good to include a short log snippet how this
manifests and in what kind of scenario it happens.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v6] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2020-03-04 Thread Mika Westerberg
Hi,

On Tue, Mar 03, 2020 at 11:10:52AM +0100, Karol Herbst wrote:
> Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> states.

I think it is good to explain bit more here why this fix is needed.

> v2: convert to pci_dev quirk
> put a proper technical explanation of the issue as a in-code comment
> v3: disable it only for certain combinations of intel and nvidia hardware
> v4: simplify quirk by setting flag on the GPU itself
> v5: restructure quirk to make it easier to add new IDs
> fix whitespace issues
> fix potential NULL pointer access
> update the quirk documentation
> v6: move quirk into nouveau

This information typically goes under the '---' line.

> Signed-off-by: Karol Herbst 
> Cc: Bjorn Helgaas 
> Cc: Lyude Paul 
> Cc: Rafael J. Wysocki 
> Cc: Mika Westerberg 

I have few minor comments but regardless,

Reviewed-by: Mika Westerberg 

> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205623
> ---
>  drivers/gpu/drm/nouveau/nouveau_drm.c | 56 +++
>  drivers/pci/pci.c |  8 
>  include/linux/pci.h   |  1 +
>  3 files changed, 65 insertions(+)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c 
> b/drivers/gpu/drm/nouveau/nouveau_drm.c
> index 2cd83849600f..51d3a7ba7731 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
> @@ -618,6 +618,59 @@ nouveau_drm_device_fini(struct drm_device *dev)
>   kfree(drm);
>  }
>  
> +/*
> + * On some Intel PCIe bridge controllers doing a
> + * D0 -> D3hot -> D3cold -> D0 sequence causes Nvidia GPUs to not reappear.
> + * Skipping the intermediate D3hot step seems to make it work again. Thise is
^
Thise -> This

> + * probably caused by not meeting the expectation the involved AML code has
> + * when the GPU is put into D3hot state before invoking it.
> + *
> + * This leads to various manifestations of this issue:
> + *  - AML code execution to power on the GPU hits an infinite loop (as the
> + *code waits on device memory to change).
> + *  - kernel crashes, as all PCI reads return -1, which most code isn't able
> + *to handle well enough.
> + *
> + * In all cases dmesg will contain at least one line like this:
> + * 'nouveau :01:00.0: Refused to change power state, currently in D3'
> + * followed by a lot of nouveau timeouts.
> + *
> + * In the \_SB.PCI0.PEG0.PG00._OFF code deeper down writes bit 0x80 to the 
> not
> + * documented PCI config space register 0x248 of the Intel PCIe bridge
> + * controller (0x1901) in order to change the state of the PCIe link between
> + * the PCIe port and the GPU. There are alternative code paths using other
> + * registers, which seem to work fine (executed pre Windows 8):
> + *  - 0xbc bit 0x20 (publicly available documentation claims 'reserved')
> + *  - 0xb0 bit 0x10 (link disable)
> + * Changing the conditions inside the firmware by poking into the relevant
> + * addresses does resolve the issue, but it seemed to be ACPI private memory
> + * and not any device accessible memory at all, so there is no portable way 
> of
> + * changing the conditions.
> + * On a XPS 9560 that means bits [0,3] on \CPEX need to be cleared.
> + *
> + * The only systems where this behavior can be seen are hybrid graphics 
> laptops
> + * with a secondary Nvidia Maxwell, Pascal or Turing GPU. Its unclear 
> wheather
 ^^^ 

Its -> It's
wheather -> whether

> + * this issue only occurs in combination with listed Intel PCIe bridge
> + * controllers and the mentioned GPUs or other devices as well.
> + *
> + * documentation on the PCIe bridge controller can be found in the
> + * "7th Generation Intel® Processor Families for H Platforms Datasheet 
> Volume 2"
> + * Section "12 PCI Express* Controller (x16) Registers"
> + */
> +
> +static void quirk_broken_nv_runpm(struct pci_dev *dev)
> +{
> + struct pci_dev *bridge = pci_upstream_bridge(dev);
> +
> + if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
> + return;
> +
> + switch (bridge->device) {
> + case 0x1901:
> + dev->parent_d3cold = 1;

I think it is better to add

break;

here.

> + }
> +}
> +
>  static int nouveau_drm_probe(struct pci_dev *pdev,
>const struct pci_device_id *pent)
>  {
> @@ -699,6 +752,7 @@ s

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-27 Thread Mika Westerberg
On Tue, Nov 26, 2019 at 06:10:36PM -0500, Lyude Paul wrote:
> Hey-this is almost certainly not the right place in this thread to respond,
> but this thread has gotten so deep evolution can't push the subject further to
> the right, heh. So I'll just respond here.

:)

> I've been following this and helping out Karol with testing here and there.
> They had me test Bjorn's PCI branch on the X1 Extreme 2nd generation, which
> has a turing GPU and 8086:1901 PCI bridge.
> 
> I was about to say "the patch fixed things, hooray!" but it seems that after
> trying runtime suspend/resume a couple times things fall apart again:

You mean $subject patch, no?

> [  686.883247] nouveau :01:00.0: DRM: suspending object tree...
> [  752.866484] ACPI Error: Aborting method \_SB.PCI0.PEG0.PEGP.NVPO due to 
> previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> [  752.866508] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous 
> error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)
> [  752.866521] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to 
> previous error (AE_AML_LOOP_TIMEOUT) (20190816/psparse-529)

This is probably the culprit. The same AML code fails to properly turn
on the device.

Is acpidump from this system available somewhere?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Mika Westerberg
On Fri, Nov 22, 2019 at 12:30:20PM +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 22, 2019 at 11:36 AM Mika Westerberg
>  wrote:
> >
> > On Thu, Nov 21, 2019 at 11:39:23PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> > > > > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> > > > >  wrote:
> > > > > >
> > > > > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > > > > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki 
> > > > > > > > > wrote:
> > > > > > > > > > > last week or so I found systems where the GPU was under 
> > > > > > > > > > > the "PCI
> > > > > > > > > > > Express Root Port" (name from lspci) and on those systems 
> > > > > > > > > > > all of that
> > > > > > > > > > > seems to work. So I am wondering if it's indeed just the 
> > > > > > > > > > > 0x1901 one,
> > > > > > > > > > > which also explains Mikas case that Thunderbolt stuff 
> > > > > > > > > > > works as devices
> > > > > > > > > > > never get populated under this particular bridge 
> > > > > > > > > > > controller, but under
> > > > > > > > > > > those "Root Port"s
> > > > > > > > > >
> > > > > > > > > > It always is a PCIe port, but its location within the SoC 
> > > > > > > > > > may matter.
> > > > > > > > >
> > > > > > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are 
> > > > > > > > > called
> > > > > > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think 
> > > > > > > > > the IP is
> > > > > > > > > still the same.
> > > > > > > > >
> > > > > > > > > > Also some custom AML-based power management is involved and 
> > > > > > > > > > that may
> > > > > > > > > > be making specific assumptions on the configuration of the 
> > > > > > > > > > SoC and the
> > > > > > > > > > GPU at the time of its invocation which unfortunately are 
> > > > > > > > > > not known to
> > > > > > > > > > us.
> > > > > > > > > >
> > > > > > > > > > However, it looks like the AML invoked to power down the 
> > > > > > > > > > GPU from
> > > > > > > > > > acpi_pci_set_power_state() gets confused if it is not in 
> > > > > > > > > > PCI D0 at
> > > > > > > > > > that point, so it looks like that AML tries to access 
> > > > > > > > > > device memory on
> > > > > > > > > > the GPU (beyond the PCI config space) or similar which is 
> > > > > > > > > > not
> > > > > > > > > > accessible in PCI power states below D0.
> > > > > > > > >
> > > > > > > > > Or the PCI config space of the GPU when the parent root port 
> > > > > > > > > is in D3hot
> > > > > > > > > (as it is the case here). Also then the GPU config space is 
> > > > > > > > > not
> > > > > > > > > accessible.
> > > > > > > >
> > > > > > > > Why would the parent port be in D3hot at that point?  Wouldn't 
> > > > > > > > that be
> > > > > > > > a suspend ordering violation?
> > > > > > >
> > > > > > > No. We put the GPU into D3hot first,
> > > > >
> > > > >

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Mika Westerberg
On Thu, Nov 21, 2019 at 11:39:23PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 21, 2019 at 8:49 PM Mika Westerberg
>  wrote:
> >
> > On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> > > On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > > > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > > last week or so I found systems where the GPU was under the 
> > > > > > > > > "PCI
> > > > > > > > > Express Root Port" (name from lspci) and on those systems all 
> > > > > > > > > of that
> > > > > > > > > seems to work. So I am wondering if it's indeed just the 
> > > > > > > > > 0x1901 one,
> > > > > > > > > which also explains Mikas case that Thunderbolt stuff works 
> > > > > > > > > as devices
> > > > > > > > > never get populated under this particular bridge controller, 
> > > > > > > > > but under
> > > > > > > > > those "Root Port"s
> > > > > > > >
> > > > > > > > It always is a PCIe port, but its location within the SoC may 
> > > > > > > > matter.
> > > > > > >
> > > > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are 
> > > > > > > called
> > > > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the 
> > > > > > > IP is
> > > > > > > still the same.
> > > > > > >
> > > > > > > > Also some custom AML-based power management is involved and 
> > > > > > > > that may
> > > > > > > > be making specific assumptions on the configuration of the SoC 
> > > > > > > > and the
> > > > > > > > GPU at the time of its invocation which unfortunately are not 
> > > > > > > > known to
> > > > > > > > us.
> > > > > > > >
> > > > > > > > However, it looks like the AML invoked to power down the GPU 
> > > > > > > > from
> > > > > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 
> > > > > > > > at
> > > > > > > > that point, so it looks like that AML tries to access device 
> > > > > > > > memory on
> > > > > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > > > > accessible in PCI power states below D0.
> > > > > > >
> > > > > > > Or the PCI config space of the GPU when the parent root port is 
> > > > > > > in D3hot
> > > > > > > (as it is the case here). Also then the GPU config space is not
> > > > > > > accessible.
> > > > > >
> > > > > > Why would the parent port be in D3hot at that point?  Wouldn't that 
> > > > > > be
> > > > > > a suspend ordering violation?
> > > > >
> > > > > No. We put the GPU into D3hot first,
> > >
> > > OK
> > >
> > > Does this involve any AML, like a _PS3 under the GPU object?
> >
> > I don't see _PS3 (nor _PS0) for that object. If I read it right the GPU
> > itself is not described in ACPI tables at all.
> 
> OK
> 
> > > > > then the root port and then turn
> > > > > off the power resource (which is attached to the root port) resulting
> > > > > the topology entering D3cold.
> > > >
> > > > I don't see that happening in the AML though.
> > >
> > > Which AML do you mean, specifically?  The _OFF method for the root
> > > port's _PR3 power resource or something else?
> >
> > The root port's _OFF method for the power resource returned by its _PR3.
> 
> OK, so without the $subject patch we (1) program the downstream
> component (GPU) into D3hot, then we 

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Thu, Nov 21, 2019 at 04:43:24PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 21, 2019 at 1:52 PM Mika Westerberg
>  wrote:
> >
> > On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> > > On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > > > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > > > Express Root Port" (name from lspci) and on those systems all of 
> > > > > > > that
> > > > > > > seems to work. So I am wondering if it's indeed just the 0x1901 
> > > > > > > one,
> > > > > > > which also explains Mikas case that Thunderbolt stuff works as 
> > > > > > > devices
> > > > > > > never get populated under this particular bridge controller, but 
> > > > > > > under
> > > > > > > those "Root Port"s
> > > > > >
> > > > > > It always is a PCIe port, but its location within the SoC may 
> > > > > > matter.
> > > > >
> > > > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > > > still the same.
> > > > >
> > > > > > Also some custom AML-based power management is involved and that may
> > > > > > be making specific assumptions on the configuration of the SoC and 
> > > > > > the
> > > > > > GPU at the time of its invocation which unfortunately are not known 
> > > > > > to
> > > > > > us.
> > > > > >
> > > > > > However, it looks like the AML invoked to power down the GPU from
> > > > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > > > that point, so it looks like that AML tries to access device memory 
> > > > > > on
> > > > > > the GPU (beyond the PCI config space) or similar which is not
> > > > > > accessible in PCI power states below D0.
> > > > >
> > > > > Or the PCI config space of the GPU when the parent root port is in 
> > > > > D3hot
> > > > > (as it is the case here). Also then the GPU config space is not
> > > > > accessible.
> > > >
> > > > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > > > a suspend ordering violation?
> > >
> > > No. We put the GPU into D3hot first,
> 
> OK
> 
> Does this involve any AML, like a _PS3 under the GPU object?

I don't see _PS3 (nor _PS0) for that object. If I read it right the GPU
itself is not described in ACPI tables at all.

> > > then the root port and then turn
> > > off the power resource (which is attached to the root port) resulting
> > > the topology entering D3cold.
> >
> > I don't see that happening in the AML though.
> 
> Which AML do you mean, specifically?  The _OFF method for the root
> port's _PR3 power resource or something else?

The root port's _OFF method for the power resource returned by its _PR3.

> > Basically the difference is that when Windows 7 or Linux (the _REV==5
> > check) then we directly do link disable whereas in Windows 8+ we invoke
> > LKDS() method that puts the link into L2/L3. None of the fields they
> > access seem to touch the GPU itself.
> 
> So that may be where the problem is.
> 
> Putting the downstream component into PCI D[1-3] is expected to put
> the link into L1, so I'm not sure how that plays with the later
> attempt to put it into L2/L3 Ready.

That should be fine. What I've seen the link goes into L1 when
downstream component is put to D-state (not D0) and then it is put back
to L0 when L2/3 ready is propagated. Eventually it goes into L2 or L3.

> Also, L2/L3 Ready is expected to be transient, so finally power should
> be removed somehow.

There is GPIO for both power and PERST, I think the line here:

  \_SB.SGOV (0x01010004, Zero)

is the one that removes power.

> > LKDS() for the first PEG port looks like this:
> >
> >P0L2 = One
> >Sleep (0x10)
> >Local0 = Zero
> >While (P0L2)
> >{
> > If ((Local0 > 0x04))
> > {
> >   

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Thu, Nov 21, 2019 at 01:46:14PM +0200, Mika Westerberg wrote:
> On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> > On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > > last week or so I found systems where the GPU was under the "PCI
> > > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > > which also explains Mikas case that Thunderbolt stuff works as devices
> > > > > never get populated under this particular bridge controller, but under
> > > > > those "Root Port"s
> > > >
> > > > It always is a PCIe port, but its location within the SoC may matter.
> > >
> > > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > > still the same.
> > >
> > > > Also some custom AML-based power management is involved and that may
> > > > be making specific assumptions on the configuration of the SoC and the
> > > > GPU at the time of its invocation which unfortunately are not known to
> > > > us.
> > > >
> > > > However, it looks like the AML invoked to power down the GPU from
> > > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > > that point, so it looks like that AML tries to access device memory on
> > > > the GPU (beyond the PCI config space) or similar which is not
> > > > accessible in PCI power states below D0.
> > >
> > > Or the PCI config space of the GPU when the parent root port is in D3hot
> > > (as it is the case here). Also then the GPU config space is not
> > > accessible.
> > 
> > Why would the parent port be in D3hot at that point?  Wouldn't that be
> > a suspend ordering violation?
> 
> No. We put the GPU into D3hot first, then the root port and then turn
> off the power resource (which is attached to the root port) resulting
> the topology entering D3cold.

I don't see that happening in the AML though.

Basically the difference is that when Windows 7 or Linux (the _REV==5
check) then we directly do link disable whereas in Windows 8+ we invoke
LKDS() method that puts the link into L2/L3. None of the fields they
access seem to touch the GPU itself.

LKDS() for the first PEG port looks like this:

   P0L2 = One
   Sleep (0x10)
   Local0 = Zero
   While (P0L2)
   {
If ((Local0 > 0x04))
{
Break
}

Sleep (0x10)
Local0++
   }

One thing that comes to mind is that the loop can end even if P0L2 is
not cleared as it does only 5 iterations with 16 ms sleep between. Maybe
Sleep() is implemented differently in Windows? I mean Linux may be
"faster" here and return prematurely and if we leave the port into D0
this does not happen, or something. I'm just throwing out ideas :)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Thu, Nov 21, 2019 at 12:34:22PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 21, 2019 at 12:28 PM Mika Westerberg
>  wrote:
> >
> > On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > > > last week or so I found systems where the GPU was under the "PCI
> > > > Express Root Port" (name from lspci) and on those systems all of that
> > > > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > > > which also explains Mikas case that Thunderbolt stuff works as devices
> > > > never get populated under this particular bridge controller, but under
> > > > those "Root Port"s
> > >
> > > It always is a PCIe port, but its location within the SoC may matter.
> >
> > Exactly. Intel hardware has PCIe ports on CPU side (these are called
> > PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
> > still the same.
> >
> > > Also some custom AML-based power management is involved and that may
> > > be making specific assumptions on the configuration of the SoC and the
> > > GPU at the time of its invocation which unfortunately are not known to
> > > us.
> > >
> > > However, it looks like the AML invoked to power down the GPU from
> > > acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> > > that point, so it looks like that AML tries to access device memory on
> > > the GPU (beyond the PCI config space) or similar which is not
> > > accessible in PCI power states below D0.
> >
> > Or the PCI config space of the GPU when the parent root port is in D3hot
> > (as it is the case here). Also then the GPU config space is not
> > accessible.
> 
> Why would the parent port be in D3hot at that point?  Wouldn't that be
> a suspend ordering violation?

No. We put the GPU into D3hot first, then the root port and then turn
off the power resource (which is attached to the root port) resulting
the topology entering D3cold.

> > I took a look at the HP Omen ACPI tables which has similar problem and
> > there is also check for Windows 7 (but not Linux) so I think one
> > alternative workaround would be to add these devices into
> > acpi_osi_dmi_table[] where .callback is set to dmi_disable_osi_win8 (or
> > pass 'acpi_osi="!Windows 2012"' in the kernel command line).
> 
> I'd like to understand the facts that have been established so far
> before deciding what to do about them. :-)

Yes, I agree :)
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 11:29:33PM +0100, Rafael J. Wysocki wrote:
> > last week or so I found systems where the GPU was under the "PCI
> > Express Root Port" (name from lspci) and on those systems all of that
> > seems to work. So I am wondering if it's indeed just the 0x1901 one,
> > which also explains Mikas case that Thunderbolt stuff works as devices
> > never get populated under this particular bridge controller, but under
> > those "Root Port"s
> 
> It always is a PCIe port, but its location within the SoC may matter.

Exactly. Intel hardware has PCIe ports on CPU side (these are called
PEG, PCI Express Graphics, ports), and the PCH side. I think the IP is
still the same.

> Also some custom AML-based power management is involved and that may
> be making specific assumptions on the configuration of the SoC and the
> GPU at the time of its invocation which unfortunately are not known to
> us.
> 
> However, it looks like the AML invoked to power down the GPU from
> acpi_pci_set_power_state() gets confused if it is not in PCI D0 at
> that point, so it looks like that AML tries to access device memory on
> the GPU (beyond the PCI config space) or similar which is not
> accessible in PCI power states below D0.

Or the PCI config space of the GPU when the parent root port is in D3hot
(as it is the case here). Also then the GPU config space is not
accessible.

I took a look at the HP Omen ACPI tables which has similar problem and
there is also check for Windows 7 (but not Linux) so I think one
alternative workaround would be to add these devices into
acpi_osi_dmi_table[] where .callback is set to dmi_disable_osi_win8 (or
pass 'acpi_osi="!Windows 2012"' in the kernel command line).
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Thu, Nov 21, 2019 at 12:03:52PM +0100, Rafael J. Wysocki wrote:
> On Thu, Nov 21, 2019 at 11:14 AM Mika Westerberg
>  wrote:
> >
> > On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> > > with the branch and patch applied:
> > > https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt
> >
> > Thanks for testing. Too bad it did not help :( I suppose there is no
> > change if you increase the delay to say 1s?
> 
> Well, look at the original patch in this thread.
> 
> What it does is to prevent the device (GPU in this particular case)
> from going into a PCI low-power state before invoking AML to power it
> down (the AML is still invoked after this patch AFAICS), so why would
> that have anything to do with the delays?

Yes, I know what it does :) I was just thinking that maybe it's still
the link that does not come up when we go back to D0 I guess that's not
the case here.

> The only reason would be the AML running too early, but that doesn't
> seem likely.  IMO more likely is that the AML does something which
> cannot be done to a device in a PCI low-power state.

It may very well be the case.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 10:36:31PM +0100, Karol Herbst wrote:
> with the branch and patch applied:
> https://gist.githubusercontent.com/karolherbst/03c4c8141b0fa292d781badfa186479e/raw/5c62640afbc57d6e69ea924c338bd2836e770d02/gistfile1.txt

Thanks for testing. Too bad it did not help :( I suppose there is no
change if you increase the delay to say 1s?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 05:53:07PM +0200, Mika Westerberg wrote:
> On Wed, Nov 20, 2019 at 04:37:14PM +0100, Karol Herbst wrote:
> > On Wed, Nov 20, 2019 at 4:15 PM Mika Westerberg
> >  wrote:
> > >
> > > On Wed, Nov 20, 2019 at 01:11:52PM +0100, Karol Herbst wrote:
> > > > On Wed, Nov 20, 2019 at 1:09 PM Mika Westerberg
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> > > > > > overall, what I really want to know is, _why_ does it work on 
> > > > > > windows?
> > > > >
> > > > > So do I ;-)
> > > > >
> > > > > > Or what are we doing differently on Linux so that it doesn't work? 
> > > > > > If
> > > > > > anybody has any idea on how we could dig into this and figure it out
> > > > > > on this level, this would probably allow us to get closer to the 
> > > > > > root
> > > > > > cause? no?
> > > > >
> > > > > Have you tried to use the acpi_rev_override parameter in your system 
> > > > > and
> > > > > does it have any effect?
> > > > >
> > > > > Also did you try to trace the ACPI _ON/_OFF() methods? I think that
> > > > > should hopefully reveal something.
> > > > >
> > > >
> > > > I think I did in the past and it seemed to have worked, there is just
> > > > one big issue with this: it's a Dell specific workaround afaik, and
> > > > this issue plagues not just Dell, but we've seen it on HP and Lenovo
> > > > laptops as well, and I've heard about users having the same issues on
> > > > Asus and MSI laptops as well.
> > >
> > > Maybe it is not a workaround at all but instead it simply determines
> > > whether the system supports RTD3 or something like that (IIRC Windows 8
> > > started supporting it). Maybe Dell added check for Linux because at that
> > > time Linux did not support it.
> > >
> > 
> > the point is, it's not checking it by default, so by default you still
> > run into the windows 8 codepath.
> 
> Well you can add the quirk to acpi_rev_dmi_table[] so it goes to that
> path by default. There are a bunch of similar entries for Dell machines.
> 
> Of course this does not help the non-Dell users so we would still need
> to figure out the root cause.

I think I asked you to test the PCIe delay patch and it did not help but
I wonder if it helps if we increase the delay. As an experiment could
you try Bjorn's pci/pm branch. The last two commits are for the delay.

If you could pull that branch and apply the following patch on top and
give it a try? Then post the dmesg somewhere so we can see whether it
did the delay at all.

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1f319b1175da..1ad6f1372ed5 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4697,12 +4697,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev 
*dev)
return;
}
 
-   /* Take d3cold_delay requirements into account */
-   delay = pci_bus_max_d3cold_delay(dev->subordinate);
-   if (!delay) {
-   up_read(_bus_sem);
-   return;
-   }
+   delay = 500;
 
child = list_first_entry(>subordinate->devices, struct pci_dev,
 bus_list);
@@ -4715,7 +4710,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev 
*dev)
 * management for them (see pci_bridge_d3_possible()).
 */
if (!pci_is_pcie(dev)) {
-   pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
+   pci_info(dev, "waiting %d ms for secondary bus\n", 1000 + 
delay);
msleep(1000 + delay);
return;
}
@@ -4741,10 +4736,10 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev 
*dev)
return;
 
if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
-   pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
+   pci_info(dev, "waiting %d ms for downstream link\n", delay);
msleep(delay);
} else {
-   pci_dbg(dev, "waiting %d ms for downstream link, after 
activation\n",
+   pci_info(dev, "waiting %d ms for downstream link, after 
activation\n",
delay);
if (!pcie_wait_for_link_delay(dev, true, delay)) {
/* Did not train, no need to wait any further */
@@ -4753,7 +4748,7 @@ void pci_bridge_wait_for_secondary_bus(struct pci_dev 
*dev)
}
 
if (!pci_device_is_present(child)) {
-   pci_dbg(child, "waiting additional %d ms to become 
accessible\n", delay);
+   pci_info(child, "waiting additional %d ms to become 
accessible\n", delay);
msleep(delay);
}
 }
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 04:37:14PM +0100, Karol Herbst wrote:
> On Wed, Nov 20, 2019 at 4:15 PM Mika Westerberg
>  wrote:
> >
> > On Wed, Nov 20, 2019 at 01:11:52PM +0100, Karol Herbst wrote:
> > > On Wed, Nov 20, 2019 at 1:09 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> > > > > overall, what I really want to know is, _why_ does it work on windows?
> > > >
> > > > So do I ;-)
> > > >
> > > > > Or what are we doing differently on Linux so that it doesn't work? If
> > > > > anybody has any idea on how we could dig into this and figure it out
> > > > > on this level, this would probably allow us to get closer to the root
> > > > > cause? no?
> > > >
> > > > Have you tried to use the acpi_rev_override parameter in your system and
> > > > does it have any effect?
> > > >
> > > > Also did you try to trace the ACPI _ON/_OFF() methods? I think that
> > > > should hopefully reveal something.
> > > >
> > >
> > > I think I did in the past and it seemed to have worked, there is just
> > > one big issue with this: it's a Dell specific workaround afaik, and
> > > this issue plagues not just Dell, but we've seen it on HP and Lenovo
> > > laptops as well, and I've heard about users having the same issues on
> > > Asus and MSI laptops as well.
> >
> > Maybe it is not a workaround at all but instead it simply determines
> > whether the system supports RTD3 or something like that (IIRC Windows 8
> > started supporting it). Maybe Dell added check for Linux because at that
> > time Linux did not support it.
> >
> 
> the point is, it's not checking it by default, so by default you still
> run into the windows 8 codepath.

Well you can add the quirk to acpi_rev_dmi_table[] so it goes to that
path by default. There are a bunch of similar entries for Dell machines.

Of course this does not help the non-Dell users so we would still need
to figure out the root cause.

> > In case RTD3 is supported it invokes LKDS() which probably does the L2
> > or L3 entry and this is for some reason does not work the same way in
> > Linux than it does with Windows 8+.
> >
> > I don't remember if this happens only with nouveau or with the
> > proprietary driver as well but looking at the nouveau runtime PM suspend
> > hook (assuming I'm looking at the correct code):
> >
> > static int
> > nouveau_pmops_runtime_suspend(struct device *dev)
> > {
> > struct pci_dev *pdev = to_pci_dev(dev);
> > struct drm_device *drm_dev = pci_get_drvdata(pdev);
> > int ret;
> >
> > if (!nouveau_pmops_runtime()) {
> > pm_runtime_forbid(dev);
> > return -EBUSY;
> > }
> >
> > nouveau_switcheroo_optimus_dsm();
> > ret = nouveau_do_suspend(drm_dev, true);
> > pci_save_state(pdev);
> > pci_disable_device(pdev);
> > pci_ignore_hotplug(pdev);
> > pci_set_power_state(pdev, PCI_D3cold);
> > drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
> > return ret;
> > }
> >
> > Normally PCI drivers leave the PCI bus PM things to PCI core but here
> > the driver does these. So I wonder if it makes any difference if we let
> > the core handle all that:
> >
> > static int
> > nouveau_pmops_runtime_suspend(struct device *dev)
> > {
> > struct pci_dev *pdev = to_pci_dev(dev);
> > struct drm_device *drm_dev = pci_get_drvdata(pdev);
> > int ret;
> >
> > if (!nouveau_pmops_runtime()) {
> > pm_runtime_forbid(dev);
> > return -EBUSY;
> > }
> >
> > nouveau_switcheroo_optimus_dsm();
> > ret = nouveau_do_suspend(drm_dev, true);
> > pci_ignore_hotplug(pdev);
> > drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
> > return ret;
> > }
> >
> > and similar for the nouveau_pmops_runtime_resume().
> >
> 
> yeah, I tried that at some point and it didn't help either. The reason
> we call those from inside Nouveau is to support systems pre _PR where
> nouveau invokes custom _DSM calls on its own. We could potentially
> check for that though.

OK.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 01:11:52PM +0100, Karol Herbst wrote:
> On Wed, Nov 20, 2019 at 1:09 PM Mika Westerberg
>  wrote:
> >
> > On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> > > overall, what I really want to know is, _why_ does it work on windows?
> >
> > So do I ;-)
> >
> > > Or what are we doing differently on Linux so that it doesn't work? If
> > > anybody has any idea on how we could dig into this and figure it out
> > > on this level, this would probably allow us to get closer to the root
> > > cause? no?
> >
> > Have you tried to use the acpi_rev_override parameter in your system and
> > does it have any effect?
> >
> > Also did you try to trace the ACPI _ON/_OFF() methods? I think that
> > should hopefully reveal something.
> >
> 
> I think I did in the past and it seemed to have worked, there is just
> one big issue with this: it's a Dell specific workaround afaik, and
> this issue plagues not just Dell, but we've seen it on HP and Lenovo
> laptops as well, and I've heard about users having the same issues on
> Asus and MSI laptops as well.

Maybe it is not a workaround at all but instead it simply determines
whether the system supports RTD3 or something like that (IIRC Windows 8
started supporting it). Maybe Dell added check for Linux because at that
time Linux did not support it.

In case RTD3 is supported it invokes LKDS() which probably does the L2
or L3 entry and this is for some reason does not work the same way in
Linux than it does with Windows 8+.

I don't remember if this happens only with nouveau or with the
proprietary driver as well but looking at the nouveau runtime PM suspend
hook (assuming I'm looking at the correct code):

static int
nouveau_pmops_runtime_suspend(struct device *dev)
{   
struct pci_dev *pdev = to_pci_dev(dev);
struct drm_device *drm_dev = pci_get_drvdata(pdev);
int ret;

if (!nouveau_pmops_runtime()) {
pm_runtime_forbid(dev);
return -EBUSY;
}

nouveau_switcheroo_optimus_dsm();
ret = nouveau_do_suspend(drm_dev, true);
pci_save_state(pdev);
pci_disable_device(pdev);
pci_ignore_hotplug(pdev);
pci_set_power_state(pdev, PCI_D3cold);
drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
return ret;
}

Normally PCI drivers leave the PCI bus PM things to PCI core but here
the driver does these. So I wonder if it makes any difference if we let
the core handle all that:

static int
nouveau_pmops_runtime_suspend(struct device *dev)
{   
struct pci_dev *pdev = to_pci_dev(dev);
struct drm_device *drm_dev = pci_get_drvdata(pdev);
int ret;

if (!nouveau_pmops_runtime()) {
pm_runtime_forbid(dev);
return -EBUSY;
}

nouveau_switcheroo_optimus_dsm();
ret = nouveau_do_suspend(drm_dev, true);
pci_ignore_hotplug(pdev);
drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
return ret;
}

and similar for the nouveau_pmops_runtime_resume().
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 12:58:00PM +0100, Karol Herbst wrote:
> overall, what I really want to know is, _why_ does it work on windows?

So do I ;-)

> Or what are we doing differently on Linux so that it doesn't work? If
> anybody has any idea on how we could dig into this and figure it out
> on this level, this would probably allow us to get closer to the root
> cause? no?

Have you tried to use the acpi_rev_override parameter in your system and
does it have any effect?

Also did you try to trace the ACPI _ON/_OFF() methods? I think that
should hopefully reveal something.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 01:22:16PM +0200, Mika Westerberg wrote:
> If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 
> 0x05
> {

The OSYS comes from this (in DSDT):

If (_OSI ("Windows 2009"))
{
OSYS = 0x07D9
}

If (_OSI ("Windows 2012"))
{
OSYS = 0x07DC
}

If (_OSI ("Windows 2013"))
{
OSYS = 0x07DD
}

If (_OSI ("Windows 2015"))
{
OSYS = 0x07DF
}

So I guess this particular check tries to identify Windows 7 and older,
and Linux.

> If ((PIOF == Zero))
> {
> P0LD = One
> TCNT = Zero
> While ((TCNT < LDLY))
> {
> If ((P0LT == 0x08))
> {
> Break
> }
> 
> Sleep (0x10)
> TCNT += 0x10
> }
> 
> P0RM = One
> P0AP = 0x03
> }
> ElseIf ((PIOF == One))
> {
> P1LD = One
> TCNT = Zero
> While ((TCNT < LDLY))
> {
> If ((P1LT == 0x08))
> {
> Break
> }
> 
> Sleep (0x10)
> TCNT += 0x10
> }
> 
> P1RM = One
> P1AP = 0x03
> }
> ElseIf ((PIOF == 0x02))
> {
> P2LD = One
> TCNT = Zero
> While ((TCNT < LDLY))
> {
> If ((P2LT == 0x08))
> {
> Break
> }
> 
> Sleep (0x10)
> TCNT += 0x10
> }
> 
> P2RM = One
> P2AP = 0x03
> }
> 
> If ((PBGE != Zero))
> {
> If (SBDL (PIOF))
> {
> MBDL = GMXB (PIOF)
> PDUB (PIOF, MBDL)
> }
> }
> }
> Else
> {
> LKDS (PIOF)
> }
> 
> If ((DerefOf (SCLK [Zero]) != Zero))
> {
> PCRO (0xDC, 0x100C, DerefOf (SCLK [One]))
> Sleep (0x10)
> }
> 
> GPPR (PIOF, Zero)
> If ((OSYS != 0x07D9))
> {
> DIWK (PIOF)
> }
> 
> \_SB.SGOV (0x01010004, Zero)
> Sleep (0x14)
> Return (Zero)
> }
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
On Wed, Nov 20, 2019 at 11:52:22AM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 20, 2019 at 11:18 AM Mika Westerberg
>  wrote:
> >
> > Hi Karol,
> >
> > On Tue, Nov 19, 2019 at 11:26:45PM +0100, Karol Herbst wrote:
> > > On Tue, Nov 19, 2019 at 10:50 PM Bjorn Helgaas  wrote:
> > > >
> > > > [+cc Dave]
> > > >
> > > > On Thu, Oct 17, 2019 at 02:19:01PM +0200, Karol Herbst wrote:
> > > > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher 
> > > > > device
> > > > > states.
> > > > >
> > > > > v2: convert to pci_dev quirk
> > > > > put a proper technical explanation of the issue as a in-code 
> > > > > comment
> > > > > v3: disable it only for certain combinations of intel and nvidia 
> > > > > hardware
> > > > > v4: simplify quirk by setting flag on the GPU itself
> > > >
> > > > I have zero confidence that we understand the real problem, but we do
> > > > need to do something with this.  I'll merge it for v5.5 if we get the
> > > > minor procedural stuff below straightened out.
> > > >
> > >
> > > Thanks, and I agree with your statement, but at this point I think
> > > only Intel can help out digging deeper as I see no way to debug this
> > > further.
> >
> > I don't have anything against this patch, as long as the quirk stays
> > limited to the particular root port leading to the NVIDIA GPU. The
> > reason why I think it should to be limited is that I'm pretty certain
> > the problem is not in the root port itself. I have here a KBL based
> > Thinkpad X1 Carbon 6th gen that can put the TBT controller into D3cold
> > (it is connected to PCH root port) and it wakes up there just fine, so
> > don't want to break that.
> >
> > Now, PCIe devices cannot go into D3cold all by themselves. They always
> > need help from the platform side which is ACPI in this case. This is
> > done by having the device to have _PR3 method that returns one or more
> > power resources that the OS is supposed to turn off when the device is
> > put into D3cold. All of that is implemented as form of ACPI methods that
> > pretty much do the hardware specific things that are outside of PCIe
> > spec to get the device into D3cold. At high level the _OFF() method
> > causes the root port to broadcast PME_Turn_Off message that results the
> > link to enter L2/3 ready, it then asserts PERST, configures WAKE (both
> > can be GPIOs) and finally removes power (if the link goes into L3,
> > otherwise it goes into L2).
> >
> > I think this is where the problem actually lies - the ASL methods that
> > are used to put the device into D3cold and back. We know that in Windows
> > this all works fine so unless Windows quirks the root port the same way
> > there is another reason behind this.
> >
> > In case of Dell XPS 9560 (IIRC that's the machine you have) the
> > corresponding power resource is called \_SB.PCI0.PEG0.PG00 and its
> > _ON/_OFF methods end up calling PGON()/PGOF() accordingly. The methods
> > itself do lots of things and it is hard to follow the dissassembled
> > ASL which does not have any comments but there are couple of things that
> > stand out where we may go into a different path. One of them is this in
> > the PGOF() method:
> >
> >If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05
> >
> > The ((OSYS == 0x07DF) && (_REV == 0x05)) checks specifically for Linux
> > (see [1] and 18d78b64fddc ("ACPI / init: Make it possible to override
> > _REV")) so it might be that Dell people tested this at some point in
> > Linux as well. Added Mario in case he has any ideas.
> >
> > Previously I suggested you to try the ACPI method tracing to see what
> > happens inside PGOF(). Did you have time to try it? It may provide more
> > information about that is happening inside those methods and hopefully
> > point us to the root cause.
> >
> > Also if you haven't tried already passing acpi_rev_override in the
> > command line makes the _REV to return 5 so it should go into the "Linux"
> > path in PGOF().
> 
> Oh, so does it look like we are trying to work around AML that tried
> to work around some problematic behavior in Linux at one point?

Yes, it looks like so if I read the ASL right. The whole method looks
like below (the full acpidump was shared by Karol in v3 thread) and
there is similar check in the _ON (PGON) method:

Meth

Re: [PATCH v4] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-20 Thread Mika Westerberg
Hi Karol,

On Tue, Nov 19, 2019 at 11:26:45PM +0100, Karol Herbst wrote:
> On Tue, Nov 19, 2019 at 10:50 PM Bjorn Helgaas  wrote:
> >
> > [+cc Dave]
> >
> > On Thu, Oct 17, 2019 at 02:19:01PM +0200, Karol Herbst wrote:
> > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher 
> > > device
> > > states.
> > >
> > > v2: convert to pci_dev quirk
> > > put a proper technical explanation of the issue as a in-code comment
> > > v3: disable it only for certain combinations of intel and nvidia hardware
> > > v4: simplify quirk by setting flag on the GPU itself
> >
> > I have zero confidence that we understand the real problem, but we do
> > need to do something with this.  I'll merge it for v5.5 if we get the
> > minor procedural stuff below straightened out.
> >
> 
> Thanks, and I agree with your statement, but at this point I think
> only Intel can help out digging deeper as I see no way to debug this
> further.

I don't have anything against this patch, as long as the quirk stays
limited to the particular root port leading to the NVIDIA GPU. The
reason why I think it should to be limited is that I'm pretty certain
the problem is not in the root port itself. I have here a KBL based
Thinkpad X1 Carbon 6th gen that can put the TBT controller into D3cold
(it is connected to PCH root port) and it wakes up there just fine, so
don't want to break that.

Now, PCIe devices cannot go into D3cold all by themselves. They always
need help from the platform side which is ACPI in this case. This is
done by having the device to have _PR3 method that returns one or more
power resources that the OS is supposed to turn off when the device is
put into D3cold. All of that is implemented as form of ACPI methods that
pretty much do the hardware specific things that are outside of PCIe
spec to get the device into D3cold. At high level the _OFF() method
causes the root port to broadcast PME_Turn_Off message that results the
link to enter L2/3 ready, it then asserts PERST, configures WAKE (both
can be GPIOs) and finally removes power (if the link goes into L3,
otherwise it goes into L2).

I think this is where the problem actually lies - the ASL methods that
are used to put the device into D3cold and back. We know that in Windows
this all works fine so unless Windows quirks the root port the same way
there is another reason behind this.

In case of Dell XPS 9560 (IIRC that's the machine you have) the
corresponding power resource is called \_SB.PCI0.PEG0.PG00 and its
_ON/_OFF methods end up calling PGON()/PGOF() accordingly. The methods
itself do lots of things and it is hard to follow the dissassembled
ASL which does not have any comments but there are couple of things that
stand out where we may go into a different path. One of them is this in
the PGOF() method:

   If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05

The ((OSYS == 0x07DF) && (_REV == 0x05)) checks specifically for Linux
(see [1] and 18d78b64fddc ("ACPI / init: Make it possible to override
_REV")) so it might be that Dell people tested this at some point in
Linux as well. Added Mario in case he has any ideas.

Previously I suggested you to try the ACPI method tracing to see what
happens inside PGOF(). Did you have time to try it? It may provide more
information about that is happening inside those methods and hopefully
point us to the root cause.

Also if you haven't tried already passing acpi_rev_override in the
command line makes the _REV to return 5 so it should go into the "Linux"
path in PGOF().

[1] 
https://www.kernel.org/doc/html/latest/firmware-guide/acpi/osi.html#do-not-use-rev

> > > Signed-off-by: Karol Herbst 
> > > Cc: Bjorn Helgaas 
> > > Cc: Lyude Paul 
> > > Cc: Rafael J. Wysocki 
> > > Cc: Mika Westerberg 
> > > Cc: linux-...@vger.kernel.org
> > > Cc: linux...@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: nouv...@lists.freedesktop.org
> > > ---
> > >  drivers/pci/pci.c|  7 ++
> > >  drivers/pci/quirks.c | 53 
> > >  include/linux/pci.h  |  1 +
> > >  3 files changed, 61 insertions(+)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index b97d9e10c9cc..02e71e0bcdd7 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -850,6 +850,13 @@ static int pci_raw_set_power_state(struct pci_dev 
> > > *dev, pci_power_t state)
> > >  || (state == PCI_D2 && !dev->d2_support))
> > >   return -EIO;
> > >
> > > + /*
> > > +  * check if we have a bad combination of brid

Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-23 Thread Mika Westerberg
On Tue, Oct 22, 2019 at 02:51:53PM +0200, Karol Herbst wrote:
> On Tue, Oct 22, 2019 at 2:45 PM Mika Westerberg
>  wrote:
> >
> > On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> > > I think there is something I totally forgot about:
> > >
> > > When there was never a driver bound to the GPU, and if runtime power
> > > management gets enabled on that device, runtime suspend/resume works
> > > as expected (I am not 100% sure on if that always works, but I will
> > > recheck that).
> >
> > AFAIK, if there is no driver bound to the PCI device it is left to D0
> > regardless of the runtime PM state which could explain why it works in
> > that case (it is never put into D3hot).
> >
> > I looked at the acpidump you sent and there is one thing that may
> > explain the differences between Windows and Linux. Not sure if you were
> > aware of this already, though. The power resource PGOF() method has
> > this:
> >
> >If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05 {
> >   ...
> >}
> >
> 
> I think this is the fallback to some older method of runtime
> suspending the device, and I think it will end up touching different
> registers on the bridge controller which do not show the broken
> behaviour.

I think it actually tries to identify older Windows and then Linux (the
_REV == 0x05 check comes from that). So at least some point Dell people
have experiment this on Linux.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-22 Thread Mika Westerberg
On Tue, Oct 22, 2019 at 11:16:14AM +0200, Karol Herbst wrote:
> I think there is something I totally forgot about:
> 
> When there was never a driver bound to the GPU, and if runtime power
> management gets enabled on that device, runtime suspend/resume works
> as expected (I am not 100% sure on if that always works, but I will
> recheck that).

AFAIK, if there is no driver bound to the PCI device it is left to D0
regardless of the runtime PM state which could explain why it works in
that case (it is never put into D3hot).

I looked at the acpidump you sent and there is one thing that may
explain the differences between Windows and Linux. Not sure if you were
aware of this already, though. The power resource PGOF() method has
this:

   If (((OSYS <= 0x07D9) || ((OSYS == 0x07DF) && (_REV == 0x05 {
  ...
   }  

If I read it right, the later condition tries to detect Linux which
fails nowadays but if you have acpi_rev_override in the command line (or
the machine is listed in acpi_rev_dmi_table) this check passes and does
some magic which is not clear to me. There is similar in PGON() side
which is used to turn the device back on.

You can check what actually happens when _ON()/_OFF() is called by
passing something like below to the kernel command line:

  acpi.trace_debug_layer=0x80 acpi.trace_debug_level=0x10 
acpi.trace_method_name=\_SB.PCI0.PEG0.PG00._ON acpi.trace_state=method

(See also Documentation/firmware-guide/acpi/method-tracing.rst).

Trace goes to system dmesg.


Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
On Mon, Oct 21, 2019 at 04:49:09PM +0200, Karol Herbst wrote:
> On Mon, Oct 21, 2019 at 4:09 PM Mika Westerberg
>  wrote:
> >
> > On Mon, Oct 21, 2019 at 03:54:09PM +0200, Karol Herbst wrote:
> > > > I really would like to provide you more information about such
> > > > workaround but I'm not aware of any ;-) I have not seen any issues like
> > > > this when D3cold is properly implemented in the platform.  That's why
> > > > I'm bit skeptical that this has anything to do with specific Intel PCIe
> > > > ports. More likely it is some power sequence in the _ON/_OFF() methods
> > > > that is run differently on Windows.
> > >
> > > yeah.. maybe. I really don't know what's the actual root cause. I just
> > > know that with this workaround it works perfectly fine on my and some
> > > other systems it was tested on. Do you know who would be best to
> > > approach to get proper documentation about those methods and what are
> > > the actual prerequisites of those methods?
> >
> > Those should be documented in the ACPI spec. Chapter 7 should explain
> > power resources and the device power methods in detail.
> 
> either I looked up the wrong spec or the documentation isn't really
> saying much there.

Well it explains those methods, _PSx, _PRx and _ON()/_OFF(). In case of
PCIe device you also want to check PCIe spec. PCIe 5.0 section 5.8 "PCI
Function Power State Transitions" has a picture about the supported
power state transitions and there we can find that function must be in
D3hot before it can be transitioned into D3cold so if the _OFF() for
example blindly assumes that the device is in D0 when it is called, it
is a bug in the BIOS.

BTW, where can I find acpidump of such system?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
On Mon, Oct 21, 2019 at 03:54:09PM +0200, Karol Herbst wrote:
> > I really would like to provide you more information about such
> > workaround but I'm not aware of any ;-) I have not seen any issues like
> > this when D3cold is properly implemented in the platform.  That's why
> > I'm bit skeptical that this has anything to do with specific Intel PCIe
> > ports. More likely it is some power sequence in the _ON/_OFF() methods
> > that is run differently on Windows.
> 
> yeah.. maybe. I really don't know what's the actual root cause. I just
> know that with this workaround it works perfectly fine on my and some
> other systems it was tested on. Do you know who would be best to
> approach to get proper documentation about those methods and what are
> the actual prerequisites of those methods?

Those should be documented in the ACPI spec. Chapter 7 should explain
power resources and the device power methods in detail.


Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
On Wed, Oct 16, 2019 at 11:48:22PM +0200, Karol Herbst wrote:
> On Wed, Oct 16, 2019 at 11:37 PM Bjorn Helgaas  wrote:
> >
> > [+cc linux-acpi]
> >
> > On Wed, Oct 16, 2019 at 09:18:32PM +0200, Karol Herbst wrote:
> > > but setting the PCI_DEV_FLAGS_NO_D3 flag does prevent using the
> > > platform means of putting the device into D3cold, right? That's
> > > actually what should still happen, just the D3hot step should be
> > > skipped.
> >
> > If I understand correctly, when we put a device in D3cold on an ACPI
> > system, we do something like this:
> >
> >   pci_set_power_state(D3cold)
> > if (PCI_DEV_FLAGS_NO_D3)
> >   return 0   <-- nothing at all if 
> > quirked
> > pci_raw_set_power_state
> >   pci_write_config_word(PCI_PM_CTRL, D3hot)  <-- set to D3hot
> > __pci_complete_power_transition(D3cold)
> >   pci_platform_power_transition(D3cold)
> > platform_pci_set_power_state(D3cold)
> >   acpi_pci_set_power_state(D3cold)
> > acpi_device_set_power(ACPI_STATE_D3_COLD)
> >   ...
> > acpi_evaluate_object("_OFF") <-- set to D3cold
> >
> > I did not understand the connection with platform (ACPI) power
> > management from your patch.  It sounds like you want this entire path
> > except that you want to skip the PCI_PM_CTRL write?
> >
> 
> exactly. I am running with this workaround for a while now and never
> had any fails with it anymore. The GPU gets turned off correctly and I
> see the same power savings, just that the GPU can be powered on again.
> 
> > That seems like something Rafael should weigh in on.  I don't know
> > why we set the device to D3hot with PCI_PM_CTRL before using the ACPI
> > methods, and I don't know what the effect of skipping that is.  It
> > seems a little messy to slice out this tiny piece from the middle, but
> > maybe it makes sense.
> >
> 
> afaik when I was talking with others in the past about it, Windows is
> doing that before using ACPI calls, but maybe they have some similar
> workarounds for certain intel bridges as well? I am sure it affects
> more than the one I am blacklisting here, but I rather want to check
> each device before blacklisting all kabylake and sky lake bridges (as
> those are the ones were this issue can be observed).
> 
> Sadly we had no luck getting any information about such workaround out
> of Nvidia or Intel.

I really would like to provide you more information about such
workaround but I'm not aware of any ;-) I have not seen any issues like
this when D3cold is properly implemented in the platform.  That's why
I'm bit skeptical that this has anything to do with specific Intel PCIe
ports. More likely it is some power sequence in the _ON/_OFF() methods
that is run differently on Windows.


Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
On Mon, Oct 21, 2019 at 03:02:23PM +0200, Karol Herbst wrote:
> > No, just block runtime PM from the device in nouveau driver.
> 
> but that's not what the patch does. It only skips the PCI PM reg
> write, but still let the ACPI method be invoked to put the device into
> D3cold

Oh, indeed it does. I did not realize that.

Which makes me wonder whether ACPI _OFF() expects the device to be in D0
for some reason.


Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
On Mon, Oct 21, 2019 at 02:00:46PM +0200, Karol Herbst wrote:
> On Mon, Oct 21, 2019 at 1:40 PM Mika Westerberg
>  wrote:
> >
> > Hi Karol,
> >
> > Sorry for commenting late, I just came back from vacation.
> >
> > On Wed, Oct 16, 2019 at 04:44:49PM +0200, Karol Herbst wrote:
> > > Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher 
> > > device
> > > states.
> > >
> > > v2: convert to pci_dev quirk
> > > put a proper technical explanation of the issue as a in-code comment
> > > v3: disable it only for certain combinations of intel and nvidia hardware
> > >
> > > Signed-off-by: Karol Herbst 
> > > Cc: Bjorn Helgaas 
> > > Cc: Lyude Paul 
> > > Cc: Rafael J. Wysocki 
> > > Cc: Mika Westerberg 
> > > Cc: linux-...@vger.kernel.org
> > > Cc: linux...@vger.kernel.org
> > > Cc: dri-devel@lists.freedesktop.org
> > > Cc: nouv...@lists.freedesktop.org
> > > ---
> > >  drivers/pci/pci.c| 11 ++
> > >  drivers/pci/quirks.c | 52 
> >
> > I may be missing something but why you can't do this in the nouveau
> > driver itself?
> 
> What do you mean precisely? Move the quirk into nouveau, but keep the
> changes to pci core?

No, just block runtime PM from the device in nouveau driver.


Re: [PATCH v3] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-10-21 Thread Mika Westerberg
Hi Karol,

Sorry for commenting late, I just came back from vacation.

On Wed, Oct 16, 2019 at 04:44:49PM +0200, Karol Herbst wrote:
> Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> states.
> 
> v2: convert to pci_dev quirk
> put a proper technical explanation of the issue as a in-code comment
> v3: disable it only for certain combinations of intel and nvidia hardware
> 
> Signed-off-by: Karol Herbst 
> Cc: Bjorn Helgaas 
> Cc: Lyude Paul 
> Cc: Rafael J. Wysocki 
> Cc: Mika Westerberg 
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> ---
>  drivers/pci/pci.c| 11 ++
>  drivers/pci/quirks.c | 52 

I may be missing something but why you can't do this in the nouveau
driver itself?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges

2019-10-01 Thread Mika Westerberg
On Tue, Oct 01, 2019 at 10:56:39AM +0200, Karol Herbst wrote:
> On Tue, Oct 1, 2019 at 10:47 AM Mika Westerberg
>  wrote:
> >
> > On Mon, Sep 30, 2019 at 06:36:12PM +0200, Karol Herbst wrote:
> > > On Mon, Sep 30, 2019 at 6:30 PM Mika Westerberg
> > >  wrote:
> > > >
> > > > On Mon, Sep 30, 2019 at 06:05:14PM +0200, Karol Herbst wrote:
> > > > > still happens with your patch applied. The machine simply gets shut 
> > > > > down.
> > > > >
> > > > > dmesg can be found here:
> > > > > https://gist.githubusercontent.com/karolherbst/40eb091c7b7b33ef993525de660f1a3b/raw/2380e31f566e93e5ba7c87ef545420965d4c492c/gistfile1.txt
> > > >
> > > > Looking your dmesg:
> > > >
> > > > Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: DCB version 4.1
> > > > Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: MM: using COPY for 
> > > > buffer copies
> > > > Sep 30 17:24:27 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 
> > > > :01:00.0 on minor 1
> > > >
> > > > I would assume it runtime suspends here. Then it wakes up because of PCI
> > > > access from userspace:
> > > >
> > > > Sep 30 17:24:42 kernel: pci_raw_set_power_state: 56 callbacks suppressed
> > > >
> > > > and for some reason it does not get resumed properly. There are also few
> > > > warnings from ACPI that might be relevant:
> > > >
> > > > Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 
> > > > type mismatch - Found [Buffer], ACPI requires [Package] 
> > > > (20190509/nsarguments-59)
> > > > Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: 
> > > > Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] 
> > > > (20190509/nsarguments-59)
> > > >
> > >
> > > afaik this is the case for essentially every laptop out there.
> >
> > OK, so they are harmless?
> >
> 
> yes
> 
> > > > This seems to be Dell XPS 9560 which I think has been around some time
> > > > already so I wonder why we only see issues now. Has it ever worked for
> > > > you or maybe there is a regression that causes it to happen now?
> > >
> > > oh, it's broken since forever, we just tried to get more information
> > > from Nvidia if they know what this is all about, but we got nothing
> > > useful.
> > >
> > > We were also hoping to find a reliable fix or workaround we could have
> > > inside nouveau to fix that as I think nouveau is the only driver
> > > actually hit by this issue, but nothing turned out to be reliable
> > > enough.
> >
> > Can't you just block runtime PM from the nouveau driver until this is
> > understood better? That can be done by calling pm_runtime_forbid() (or
> > not calling pm_runtime_allow() in the driver). Or in case of PCI driver
> > you just don't decrease the reference count when probe() ends.
> >
> 
> the thing is, it does work for a lot of laptops. We could only observe
> this on kaby lake and skylake ones. Even on Cannon Lakes it seems to
> work just fine.

Can't you then limit it to those?

I've experienced that Kabylake root ports can enter and exit in D3cold
just fine because we do that for Thunderbolt for example. But that
always requires help from ACPI. If the system is using non-standard ACPI
methods for example that may require some tricks in the driver side.

> > I think that would be much better than blocking any devices behind
> > Kabylake PCIe root ports from entering D3 (I don't really think the
> > problem is in the root ports itself but there is something we are
> > missing when the NVIDIA GPU is put into D3cold or back from there).
> 
> I highly doubt there is anything wrong with the GPU alone as we have
> too many indications which tell us otherwise.
> 
> Anyway, at this point I don't know where to look further for what's
> actually wrong. And apparently it works on Windows, but I don't know
> why and I have no idea what Windows does on such systems to make it
> work reliably.

By works you mean that Windows is able to put it into D3cold and back?
If that's the case it may be that there is some ACPI magic that the
Windows driver does and we of course are missing in Linux.


Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges

2019-10-01 Thread Mika Westerberg
On Mon, Sep 30, 2019 at 06:36:12PM +0200, Karol Herbst wrote:
> On Mon, Sep 30, 2019 at 6:30 PM Mika Westerberg
>  wrote:
> >
> > On Mon, Sep 30, 2019 at 06:05:14PM +0200, Karol Herbst wrote:
> > > still happens with your patch applied. The machine simply gets shut down.
> > >
> > > dmesg can be found here:
> > > https://gist.githubusercontent.com/karolherbst/40eb091c7b7b33ef993525de660f1a3b/raw/2380e31f566e93e5ba7c87ef545420965d4c492c/gistfile1.txt
> >
> > Looking your dmesg:
> >
> > Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: DCB version 4.1
> > Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: MM: using COPY for 
> > buffer copies
> > Sep 30 17:24:27 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 
> > :01:00.0 on minor 1
> >
> > I would assume it runtime suspends here. Then it wakes up because of PCI
> > access from userspace:
> >
> > Sep 30 17:24:42 kernel: pci_raw_set_power_state: 56 callbacks suppressed
> >
> > and for some reason it does not get resumed properly. There are also few
> > warnings from ACPI that might be relevant:
> >
> > Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type 
> > mismatch - Found [Buffer], ACPI requires [Package] (20190509/nsarguments-59)
> > Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 
> > type mismatch - Found [Buffer], ACPI requires [Package] 
> > (20190509/nsarguments-59)
> >
> 
> afaik this is the case for essentially every laptop out there.

OK, so they are harmless?

> > This seems to be Dell XPS 9560 which I think has been around some time
> > already so I wonder why we only see issues now. Has it ever worked for
> > you or maybe there is a regression that causes it to happen now?
> 
> oh, it's broken since forever, we just tried to get more information
> from Nvidia if they know what this is all about, but we got nothing
> useful.
> 
> We were also hoping to find a reliable fix or workaround we could have
> inside nouveau to fix that as I think nouveau is the only driver
> actually hit by this issue, but nothing turned out to be reliable
> enough.

Can't you just block runtime PM from the nouveau driver until this is
understood better? That can be done by calling pm_runtime_forbid() (or
not calling pm_runtime_allow() in the driver). Or in case of PCI driver
you just don't decrease the reference count when probe() ends.

I think that would be much better than blocking any devices behind
Kabylake PCIe root ports from entering D3 (I don't really think the
problem is in the root ports itself but there is something we are
missing when the NVIDIA GPU is put into D3cold or back from there).


Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges

2019-09-30 Thread Mika Westerberg
On Mon, Sep 30, 2019 at 06:05:14PM +0200, Karol Herbst wrote:
> still happens with your patch applied. The machine simply gets shut down.
> 
> dmesg can be found here:
> https://gist.githubusercontent.com/karolherbst/40eb091c7b7b33ef993525de660f1a3b/raw/2380e31f566e93e5ba7c87ef545420965d4c492c/gistfile1.txt

Looking your dmesg:

Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: DCB version 4.1
Sep 30 17:24:27 kernel: nouveau :01:00.0: DRM: MM: using COPY for buffer 
copies
Sep 30 17:24:27 kernel: [drm] Initialized nouveau 1.3.1 20120801 for 
:01:00.0 on minor 1

I would assume it runtime suspends here. Then it wakes up because of PCI
access from userspace:

Sep 30 17:24:42 kernel: pci_raw_set_power_state: 56 callbacks suppressed
 
and for some reason it does not get resumed properly. There are also few
warnings from ACPI that might be relevant:

Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type 
mismatch - Found [Buffer], ACPI requires [Package] (20190509/nsarguments-59)
Sep 30 17:24:27 kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 
type mismatch - Found [Buffer], ACPI requires [Package] 
(20190509/nsarguments-59)

This seems to be Dell XPS 9560 which I think has been around some time
already so I wonder why we only see issues now. Has it ever worked for
you or maybe there is a regression that causes it to happen now?


Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges

2019-09-30 Thread Mika Westerberg
On Mon, Sep 30, 2019 at 11:15:48AM +0200, Karol Herbst wrote:
> On Mon, Sep 30, 2019 at 10:05 AM Mika Westerberg
>  wrote:
> >
> > Hi Karol,
> >
> > On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > > > What exactly is the serious issue?  I guess it's that the rescan
> > > > doesn't detect the GPU, which means it's not responding to config
> > > > accesses?  Is there any timing component here, e.g., maybe we're
> > > > missing some delay like the ones Mika is adding to the reset paths?
> > >
> > > When I was checking up on some of the PCI registers of the bridge
> > > controller, the slot detection told me that there is no device
> > > recognized anymore. I don't know which register it was anymore, though
> > > I guess one could read it up in the SoC spec document by Intel.
> > >
> > > My guess is, that the bridge controller fails to detect the GPU being
> > > here or actively threw it of the bus or something. But a normal system
> > > suspend/resume cycle brings the GPU back online (doing a rescan via
> > > sysfs gets the device detected again)
> >
> > Can you elaborate a bit what kind of scenario the issue happens (e.g
> > steps how it reproduces)? It was not 100% clear from the changelog. Also
> > what the result when the failure happens?
> >
> 
> yeah, I already have an updated patch in the works which also does the
> rework Bjorn suggested. Had no time yet to test if I didn't mess it
> up.
> 
> I am also thinking of adding a kernel parameter to enable this
> workaround on demand, but not quite sure on that one yet.

Right, I think it would be good to figure out the root cause before
adding any workarounds ;-) It might very well be that we are just
missing something the PCIe spec requires but not implemented in Linux.

> > I see there is a script that does something but unfortunately I'm not
> > fluent in Python so can't extract the steps how the issue can be
> > reproduced ;-)
> >
> > One thing that I'm working on is that Linux PCI subsystem misses certain
> > delays that are needed after D3cold -> D0 transition, otherwise the
> > device and/or link may not be ready before we access it. What you are
> > experiencing sounds similar. I wonder if you could try the following
> > patch and see if it makes any difference?
> >
> > https://patchwork.kernel.org/patch/11106611/
> 
> I think I already tried this path. The problem isn't that the device
> isn't accessible too late, but that it seems that the device
> completely falls off the bus. But I can retest again just to be sure.

Yes, please try it and share full dmesg if/when the failure still happens.


Re: [RFC PATCH] pci: prevent putting pcie devices into lower device states on certain intel bridges

2019-09-30 Thread Mika Westerberg
Hi Karol,

On Fri, Sep 27, 2019 at 11:53:48PM +0200, Karol Herbst wrote:
> > What exactly is the serious issue?  I guess it's that the rescan
> > doesn't detect the GPU, which means it's not responding to config
> > accesses?  Is there any timing component here, e.g., maybe we're
> > missing some delay like the ones Mika is adding to the reset paths?
> 
> When I was checking up on some of the PCI registers of the bridge
> controller, the slot detection told me that there is no device
> recognized anymore. I don't know which register it was anymore, though
> I guess one could read it up in the SoC spec document by Intel.
> 
> My guess is, that the bridge controller fails to detect the GPU being
> here or actively threw it of the bus or something. But a normal system
> suspend/resume cycle brings the GPU back online (doing a rescan via
> sysfs gets the device detected again)

Can you elaborate a bit what kind of scenario the issue happens (e.g
steps how it reproduces)? It was not 100% clear from the changelog. Also
what the result when the failure happens?

I see there is a script that does something but unfortunately I'm not
fluent in Python so can't extract the steps how the issue can be
reproduced ;-)

One thing that I'm working on is that Linux PCI subsystem misses certain
delays that are needed after D3cold -> D0 transition, otherwise the
device and/or link may not be ready before we access it. What you are
experiencing sounds similar. I wonder if you could try the following
patch and see if it makes any difference?

https://patchwork.kernel.org/patch/11106611/


Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-07-01 Thread Mika Westerberg
On Fri, Jun 28, 2019 at 04:53:02PM +0200, Timur Kristóf wrote:
> On Fri, 2019-06-28 at 17:14 +0300, Mika Westerberg wrote:
> > On Fri, Jun 28, 2019 at 03:33:56PM +0200, Timur Kristóf wrote:
> > > I have two more questions:
> > > 
> > > 1. What is the best way to test that the virtual link is indeed
> > > capable
> > > of 40 Gbit / sec? So far I've been unable to figure out how to
> > > measure
> > > its maximum throughput.
> > 
> > I don't think there is any good way to test it but the Thunderbolt
> > gen 3
> > link is pretty much always 40 Gb/s (20 Gb/s x 2) from which the
> > bandwidth is shared dynamically between different tunnels (virtual
> > links).
> 
> That's unfortunate, I would have expected there to be some sort of PCIe
> speed test utility.
> 
> Now that I gave it a try, I can measure ~20 Gbit/sec when I run Gnome
> Wayland on this system (which forces the eGPU to send the framebuffer
> back and forth all the time - for two 4K monitors). But it still
> doesn't give me 40 Gbit/sec.

How do you measure that? Is there a DP stream also? As I said the
bandwidth is dynamically shared between the consumers so you probably do
not get the full bandwidth for PCIe only because it needs to reserve
something for possible DP streams and so on.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-07-01 Thread Mika Westerberg
On Mon, Jul 01, 2019 at 10:46:34AM -0400, Alex Deucher wrote:
> > 2. As far as I understood what Mika said, there isn't really a 2.5 GT/s
> > limitation there, since the virtual link should be running at 40 Gb/s
> > regardless of the reported speed of that device. Would it be possible
> > to run the AMD GPU at 8 GT/s in this case?
> 
> If there is really a faster link here then we need some way to pass
> that information to the drivers.  We rely on the information from the
> upstream bridges and the pcie core helper functions.

I think you may use "pci_dev->is_thunderbolt" in the GPU driver and then
just use whatever the real PCI link speed & width is between the GPU and
the downstream port it connects to.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-06-30 Thread Mika Westerberg
On Fri, Jun 28, 2019 at 12:23:09PM +0200, Timur Kristóf wrote:
> Hi guys,
> 
> I use an AMD RX 570 in a Thunderbolt 3 external GPU box.
> dmesg gives me the following message:
> pci :3a:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 
> link at :04:04.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
> 
> Here is a tree view of the devices as well as the output of lspci -vvv:
> https://pastebin.com/CSsS2akZ
> 
> The critical path of the device tree looks like this:
> 
> 00:1c.4 Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
> 03:00.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 
> 4C 2016] (rev 02)
> 04:04.0 Intel Corporation JHL6540 Thunderbolt 3 Bridge (C step) [Alpine Ridge 
> 4C 2016] (rev 02)
> 3a:00.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> 3b:01.0 Intel Corporation DSL6540 Thunderbolt 3 Bridge [Alpine Ridge 4C 2015]
> 3c:00.0 Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 
> 470/480/570/570X/580/580X/590] (rev ef)
> 
> Here is the weird part:
> 
> Accoding to lspci, all of these devices report in their LnkCap that
> they support 8 GT/s, except the 04:04.0 and 3a:00.0 which say they only
> support 2.5 GT/s. Contradictory to lspci, sysfs on the other hand says
> that both of them are capable of 8 GT/s as well:
> "/sys/bus/pci/devices/:04:04.0/max_link_speed" and
> "/sys/bus/pci/devices/:3a:00.0/max_link_speed" are 8 GT/s.
> It seems that there is a discrepancy between what lspci thinks and what
> the devices are actually capable of.
> 
> Questions:
> 
> 1. Why are there four bridge devices? 04:00.0, 04:01.0 and 04:02.0 look
> superfluous to me and nothing is connected to them. It actually gives
> me the feeling that the TB3 driver creates 4 devices with 2.5 GT/s
> each, instead of one device that can do the full 8 GT/s.

Because it is standard PCIe switch with one upstream port and n
downstream ports.

> 2. Why are some of the bridge devices only capable of 2.5 GT/s
> according to lspci?

You need to talk to lspci maintainer.

> 3. Is it possible to manually set them to 8 GT/s?

No idea.

Are you actually seeing some performance issue because of this or are
you just curious?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-06-30 Thread Mika Westerberg
On Fri, Jun 28, 2019 at 01:08:07PM +0200, Timur Kristóf wrote:
> Hi Mika,
> 
> Thanks for your quick reply.
> 
> > > 1. Why are there four bridge devices? 04:00.0, 04:01.0 and 04:02.0
> > > look
> > > superfluous to me and nothing is connected to them. It actually
> > > gives
> > > me the feeling that the TB3 driver creates 4 devices with 2.5 GT/s
> > > each, instead of one device that can do the full 8 GT/s.
> > 
> > Because it is standard PCIe switch with one upstream port and n
> > downstream ports.
> 
> Sure, though in this case 3 of those downstream ports are not exposed
> by the hardware, so it's a bit surprising to see them there.

They lead to other peripherals on the TBT host router such as the TBT
controller and xHCI. Also there are two downstream ports for extension
from which you eGPU is using one.

> Why I asked about it is because I have a suspicion that maybe the
> bandwidth is allocated equally between the 4 downstream ports, even
> though only one of them is used.
> 
> > 
> > > 2. Why are some of the bridge devices only capable of 2.5 GT/s
> > > according to lspci?
> > 
> > You need to talk to lspci maintainer.
> 
> Sorry if the question was unclear.
> It's not only lspci, the kernel also prints a warning about it.
> 
> Like I said the device really is limited to 2.5 GT/s even though it
> should be able to do 8 GT/s.

There is Thunderbolt link between the host router (your host system) and
the eGPU box. That link is not limited to 2.5 GT/s so even if the slot
claims it is PCI gen1 the actual bandwidth can be much higher because of
the virtual link.

> > > 3. Is it possible to manually set them to 8 GT/s?
> > 
> > No idea.
> > 
> > Are you actually seeing some performance issue because of this or are
> > you just curious?
> 
> Yes, I see a noticable performance hit: some games have very low frame
> rate while neither the CPU nor the GPU are fully utilized.

Is that problem in Linux only or do you see the same issue in Windows as
well?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-06-30 Thread Mika Westerberg
On Fri, Jun 28, 2019 at 03:33:56PM +0200, Timur Kristóf wrote:
> I have two more questions:
> 
> 1. What is the best way to test that the virtual link is indeed capable
> of 40 Gbit / sec? So far I've been unable to figure out how to measure
> its maximum throughput.

I don't think there is any good way to test it but the Thunderbolt gen 3
link is pretty much always 40 Gb/s (20 Gb/s x 2) from which the
bandwidth is shared dynamically between different tunnels (virtual links).

> 2. Why is it that the game can only utilize as much as 2.5 Gbit / sec
> when it gets bottlenecked? The same problem is not present on a desktop
> computer with a "normal" PCIe port.

This is outside of my knowledge, sorry. How that game even knows it can
"utilize" only 2.5 Gbit/s. Does it go over the output of "lspci" as well? :-)

The PCIe links itself should to get you the 8 GT/s x 4 and I'm quite
sure the underlying TBT link works fine as well so my guess is that the
issue lies somewhere else but where, I have no idea.

Maybe the problem is in the game itself?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Why is Thunderbolt 3 limited to 2.5 GT/s on Linux?

2019-06-30 Thread Mika Westerberg
On Fri, Jun 28, 2019 at 02:21:36PM +0200, Timur Kristóf wrote:
> 
> > > Sure, though in this case 3 of those downstream ports are not
> > > exposed
> > > by the hardware, so it's a bit surprising to see them there.
> > 
> > They lead to other peripherals on the TBT host router such as the TBT
> > controller and xHCI. Also there are two downstream ports for
> > extension
> > from which you eGPU is using one.
> 
> If you look at the device tree from my first email, you can see that
> both the GPU and the XHCI uses the same port: 04:04.0 - in fact I can
> even remove the other 3 ports from the system without any consequences.

Well that's the extension PCIe downstream port. The other one is
04:01.0.

Typically 04:00.0 and 04:00.2 are used to connect TBT (05:00.0) and xHCI
(39:00.0) but in your case you don't seem to have USB 3 devices
connected to that so it is not present. If you plug in USB-C device
(non-TBT) you should see the host router xHCI appearing as well.

This is pretty standard topology.

> > > Like I said the device really is limited to 2.5 GT/s even though it
> > > should be able to do 8 GT/s.
> > 
> > There is Thunderbolt link between the host router (your host system)
> > and
> > the eGPU box. That link is not limited to 2.5 GT/s so even if the
> > slot
> > claims it is PCI gen1 the actual bandwidth can be much higher because
> > of
> > the virtual link.
> 
> Not sure I understand correctly, are you saying that TB3 can do 40
> Gbit/sec even though the kernel thinks it can only do 8 Gbit / sec?

Yes the PCIe switch upstream port (3a:00.0) is connected back to the
host router over virtual Thunderbolt 40Gb/s link so the PCIe gen1 speeds
it reports do not really matter here (same goes for the downstream).

The topology looks like bellow if I got it right from the lspci output:

  00:1c.4 (root port) 8 GT/s x 4
^
| real PCIe link
v
  03:00.0 (upstream port) 8 GT/s x 4
  04:04.0 (downstream port) 2.5 GT/s x 4
^
|  virtual link 40 Gb/s
v
  3a:00.0 (upstream port) 2.5 GT/s x 4
  3b:01.0 (downstream port) 8 GT/s x 4
^
| real PCIe link
v
  3c:00.0 (eGPU) 8 GT/s x 4

In other words all the real PCIe links run at full 8 GT/s x 4 which is
what is expected, I think.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Missing Thunderbolt 3 PCI-E atomics support

2019-03-15 Thread Mika Westerberg
On Thu, Mar 14, 2019 at 06:54:21PM +0100, Timur Kristóf wrote:
> On Thu, 2019-03-14 at 19:40 +0200, Mika Westerberg wrote:
> > On Thu, Mar 14, 2019 at 06:26:00PM +0100, Timur Kristóf wrote:
> > > I know atomics is a PCIe feature, but in this case the PCIe goes
> > > through TB3, so I would assume it has something to do with it.
> > 
> > Does it work if you plug the graphics card directly to the PCIe slot?
> 
> There is no PCIe slot in which I could plug the graphics card.
> At least I'm not aware of there being one on this laptop.
> Please correct me if I'm wrong.

No, you are right. I forgot that this is a laptop.

> > 
> > > Here is the output of 'lspci -vv':
> > > https://pastebin.com/Qt5RUFVc
> > 
> > The root port (1c.4) says this:
> > 
> >   DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF
> > Not Supported ARIFwd+
> >AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> > 
> > Not knowing much about AtomicOps but to me this looks like the root
> > port
> > does not support the feature.
> 
> What kind of output should lspci show if the feature were supported?

The AMD card has this:

 DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR+, OBFF 
Not Supported
  AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-

so I would expect something similar on the root port side as
pci_enable_atomic_ops_to_root() fails otherwise with mask of
PCI_EXP_DEVCAP2_ATOMIC_COMP32 | PCI_EXP_DEVCAP2_ATOMIC_COMP64 that the
AMD driver requests.

> As far as I understand the root port is integrated in the CPU, or in
> the chipset maybe? It says it's a Sunrise Point-LP, and I googled it
> but was unable to find a spec sheet.

You can find it here:

  
https://www.intel.com/content/www/us/en/products/docs/processors/core/6th-gen-core-pch-u-y-io-datasheet-vol-2.html

Pages 845-826 show the DEVCAP2 register for the 1c.4 (D28/F4) and it
does not seem to have AtomicOps caps set.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Missing Thunderbolt 3 PCI-E atomics support

2019-03-15 Thread Mika Westerberg
On Wed, Mar 13, 2019 at 07:09:26PM +0100, Timur Kristóf wrote:
> Hi,

Hi,

> I was sent here by Greg KH from the Linux USB mailing list, I hope this
> is the right place to ask.
> 
> PCI-E atomics don't work for me with Thunderbolt 3.
> I see the following message from my Thunderbolt 3 eGPU in dmesg:
> 
> kfd: skipped device 1002:67df, PCI rejects atomics
> 
> Hardware is a Dell XPS 13 9370 (with i7-8550U CPU) connected to a Zotac
> AMP mini, with an AMD RX 570 graphics card. Due to this, I cannot use
> the GPU for OpenCL, because the compute stack requires PCI-E atomics
> support [1].
> 
> What could be the problem? Is this a hardware limitation or a missing
> feature in the Linux TB driver?

I don't think it has anything to do with TBT itself. AtomicOps is a PCIe
feature.

What does 'sudo lspci -vv' show for the ports in question?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: Missing Thunderbolt 3 PCI-E atomics support

2019-03-15 Thread Mika Westerberg
On Thu, Mar 14, 2019 at 06:26:00PM +0100, Timur Kristóf wrote:
> I know atomics is a PCIe feature, but in this case the PCIe goes
> through TB3, so I would assume it has something to do with it.

Does it work if you plug the graphics card directly to the PCIe slot?

> Here is the output of 'lspci -vv':
> https://pastebin.com/Qt5RUFVc

The root port (1c.4) says this:

  DevCap2: Completion Timeout: Range ABC, TimeoutDis+, LTR+, OBFF Not 
Supported ARIFwd+
   AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-

Not knowing much about AtomicOps but to me this looks like the root port
does not support the feature.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4 2/3] ACPI / PMIC: Implement exec_mipi_pmic_seq_element for CHT Whiskey Cove PMIC

2018-12-16 Thread Mika Westerberg
On Fri, Dec 14, 2018 at 11:48:35AM +0100, Hans de Goede wrote:
> > > +#include 
> > 
> > Why is this include needed?
> 
> It is no longer needed in v4, since the parsing of the raw
> MIPI sequence data (which needed this include) has been moved
> to the i915 VBT code now.
> 
> I've dropped this from my local version of the patch.

OK.

> Note sure if you (Mika) are the right person to ask, but in the
> coverletter of v1 I suggested merging all 3 patches through the i915 tree
> since the drivers/acpi/pmic/intel_pmic* files typically do
> not see all that churn.  If I can get an Ack from you or
> Rafael for that then I can push the version with the include
> dropped to drm-next (through drm-intel-next-queued) myself
> once the 3th patch also has been acked.

I guess it is up to Rafael whether my ack is enough but here it is for
both patches,

Acked-by: Mika Westerberg 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 2/3] ACPI / PMIC: Implement exec_mipi_pmic_seq_element for CHT Whiskey Cove PMIC

2018-12-16 Thread Mika Westerberg
On Thu, Dec 13, 2018 at 04:35:32PM +0100, Hans de Goede wrote:
> Implement the exec_mipi_pmic_seq_element callback for the CHT Whiskey Cove
> PMIC.
> 
> On some CHT devices this fixes the LCD panel not lighting up when it was
> not initialized by the GOP, because an external monitor was plugged in and
> the GOP initialized only the external monitor.
> 
> Signed-off-by: Hans de Goede 

One question see below, but regardless

Reviewed-by: Mika Westerberg 

> ---
> Changes in v4:
> -The decoding of the raw data of the PMIC MIPI sequence element is now done
>  in our caller, so drop this and adjust the callback prototype to accept
>  the decoded addresses, value and mask
> 
> Changes in v3:
> -Use hex values for out of range checks
> -Make intel_cht_wc_exec_mipi_pmic_seq_element return errors
> 
> Changes in v2:
> -Interpret data passed to the PMIC MIPI elements according to the docs
>  instead of my own reverse engineered interpretation
> ---
>  drivers/acpi/pmic/intel_pmic_chtwc.c | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/acpi/pmic/intel_pmic_chtwc.c 
> b/drivers/acpi/pmic/intel_pmic_chtwc.c
> index 078b0448f30a..c5037c5c5219 100644
> --- a/drivers/acpi/pmic/intel_pmic_chtwc.c
> +++ b/drivers/acpi/pmic/intel_pmic_chtwc.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Why is this include needed?
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v4 1/3] ACPI / PMIC: Add support for executing PMIC MIPI sequence elements

2018-12-16 Thread Mika Westerberg
On Thu, Dec 13, 2018 at 04:35:31PM +0100, Hans de Goede wrote:
> DSI LCD panels describe an initialization sequence in the Video BIOS
> Tables using so called MIPI sequences. One possible element in these
> sequences is a PMIC specific element of 15 bytes.
> 
> Although this is not really an ACPI opregion, the ACPI opregion code is the
> closest thing we have. We need to have support for these PMIC specific MIPI
> sequence elements somwhere. Since we already instantiate a special platform
> device for Intel PMICs for the ACPI PMIC OpRegion handler to bind to,
> with PMIC specific implementations of the OpRegion, the handling of MIPI
> sequence PMIC elements fits very well in the ACPI PMIC OpRegion code.
> 
> This commit adds a new intel_soc_pmic_exec_mipi_pmic_seq_element()
> function, which is to be backed by a PMIC specific
> exec_mipi_pmic_seq_element callback. This function will be called by the
> i915 code to execture MIPI sequence PMIC elements.
> 
> Signed-off-by: Hans de Goede 

Reviewed-by: Mika Westerberg 
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 2/3] ACPI / PMIC: Implement exec_mipi_pmic_seq_element for CHT Whiskey Cove PMIC

2018-12-10 Thread Mika Westerberg
On Thu, Dec 06, 2018 at 02:47:04PM +0100, Hans de Goede wrote:
> Implement the exec_mipi_pmic_seq_element callback for the CHT Whiskey Cove
> PMIC.
> 
> On some CHT devices this fixes the LCD panel not lighting up when it was
> not initialized by the GOP, because an external monitor was plugged in and
> the GOP initialized only the external monitor.
> 
> Signed-off-by: Hans de Goede 
> ---
> Changes in v2:
> -Interpret data passed to the PMIC MIPI elements according to the docs
>  instead of my own reverse engineered interpretation
> ---
>  drivers/acpi/pmic/intel_pmic_chtwc.c | 24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/acpi/pmic/intel_pmic_chtwc.c 
> b/drivers/acpi/pmic/intel_pmic_chtwc.c
> index 078b0448f30a..049c0cf3b9ed 100644
> --- a/drivers/acpi/pmic/intel_pmic_chtwc.c
> +++ b/drivers/acpi/pmic/intel_pmic_chtwc.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "intel_pmic.h"
>  
>  #define CHT_WC_V1P05A_CTRL   0x6e3b
> @@ -231,6 +232,28 @@ static int intel_cht_wc_pmic_update_power(struct regmap 
> *regmap, int reg,
>   return regmap_update_bits(regmap, reg, bitmask, on ? 1 : 0);
>  }
>  
> +static void intel_cht_wc_exec_mipi_pmic_seq_element(struct regmap *regmap,
> + const u8 *data)
> +{
> + u32 value, mask, reg_address, address;
> + u16 i2c_client_address;
> +
> + /* byte 0 aka PMIC Flag is reserved */
> + i2c_client_address  = get_unaligned_le16(data + 1);
> + reg_address = get_unaligned_le32(data + 3);
> + value   = get_unaligned_le32(data + 7);
> + mask= get_unaligned_le32(data + 11);
> +
> + if (i2c_client_address > 255 || reg_address > 255) {
> + pr_warn("%s warning addresses too big client 0x%x reg 0x%x\n",
> + __func__, i2c_client_address, reg_address);
> + return;

Here also return an error code instead.

> + }
> +
> + address = (i2c_client_address << 8) | reg_address;
> + regmap_update_bits(regmap, address, mask, value);
> +}
> +
>  /*
>   * The thermal table and ops are empty, we do not support the Thermal 
> opregion
>   * (DPTF) due to lacking documentation.
> @@ -238,6 +261,7 @@ static int intel_cht_wc_pmic_update_power(struct regmap 
> *regmap, int reg,
>  static struct intel_pmic_opregion_data intel_cht_wc_pmic_opregion_data = {
>   .get_power  = intel_cht_wc_pmic_get_power,
>   .update_power   = intel_cht_wc_pmic_update_power,
> + .exec_mipi_pmic_seq_element = intel_cht_wc_exec_mipi_pmic_seq_element,
>   .power_table= power_table,
>   .power_table_count  = ARRAY_SIZE(power_table),
>  };
> -- 
> 2.19.2
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v2 1/3] ACPI / PMIC: Add support for executing PMIC MIPI sequence elements

2018-12-10 Thread Mika Westerberg
On Thu, Dec 06, 2018 at 02:47:03PM +0100, Hans de Goede wrote:
> DSI LCD panels describe an initialization sequence in the Video BIOS
> Tables using so called MIPI sequences. One possible element in these
> sequences is a PMIC specific element of 15 bytes.
> 
> Although this is not really an ACPI opregion, the ACPI opregion code is the
> closest thing we have. We need to have support for these PMIC specific MIPI
> sequence elements somwhere. Since we already instantiate a special platform
> device for Intel PMICs for the ACPI PMIC OpRegion handler to bind to,
> with PMIC specific implementations of the OpRegion, the handling of MIPI
> sequence PMIC elements fits very well in the ACPI PMIC OpRegion code.
> 
> This commit adds a new intel_soc_pmic_exec_mipi_pmic_seq_element()
> function, which is to be backed by a PMIC specific
> exec_mipi_pmic_seq_element callback. This function will be called by the
> i915 code to execture MIPI sequence PMIC elements.
> 
> Signed-off-by: Hans de Goede 
> ---
>  drivers/acpi/pmic/intel_pmic.c | 25 +
>  drivers/acpi/pmic/intel_pmic.h |  1 +
>  include/linux/mfd/intel_soc_pmic.h |  2 ++
>  3 files changed, 28 insertions(+)
> 
> diff --git a/drivers/acpi/pmic/intel_pmic.c b/drivers/acpi/pmic/intel_pmic.c
> index ca18e0d23df9..0d96ca08bb79 100644
> --- a/drivers/acpi/pmic/intel_pmic.c
> +++ b/drivers/acpi/pmic/intel_pmic.c
> @@ -15,6 +15,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include "intel_pmic.h"
> @@ -36,6 +37,8 @@ struct intel_pmic_opregion {
>   struct intel_pmic_regs_handler_ctx ctx;
>  };
>  
> +static struct intel_pmic_opregion *intel_pmic_opregion;
> +
>  static int pmic_get_reg_bit(int address, struct pmic_table *table,
>   int count, int *reg, int *bit)
>  {
> @@ -304,6 +307,7 @@ int intel_pmic_install_opregion_handler(struct device 
> *dev, acpi_handle handle,
>   }
>  
>   opregion->data = d;
> + intel_pmic_opregion = opregion;
>   return 0;
>  
>  out_remove_thermal_handler:
> @@ -319,3 +323,24 @@ int intel_pmic_install_opregion_handler(struct device 
> *dev, acpi_handle handle,
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(intel_pmic_install_opregion_handler);
> +

Since this is exported, please add kernel-doc here.

> +void intel_soc_pmic_exec_mipi_pmic_seq_element(const u8 *data)
> +{
> + struct intel_pmic_opregion_data *d;
> +
> + if (!intel_pmic_opregion) {
> + pr_warn("%s: No PMIC registered\n", __func__);
> + return;

Why not return error codes instead?

Other ops in struct intel_pmic_opregion_data seem to do so.

> + }
> +
> + d = intel_pmic_opregion->data;
> + if (!d->exec_mipi_pmic_seq_element) {
> + pr_warn("%s: Not implemented\n", __func__);
> + return;

Ditto.

> + }
> +
> + mutex_lock(_pmic_opregion->lock);
> + d->exec_mipi_pmic_seq_element(intel_pmic_opregion->regmap, data);
> + mutex_unlock(_pmic_opregion->lock);
> +}
> +EXPORT_SYMBOL_GPL(intel_soc_pmic_exec_mipi_pmic_seq_element);
> diff --git a/drivers/acpi/pmic/intel_pmic.h b/drivers/acpi/pmic/intel_pmic.h
> index 095afc96952e..5a7bb33d046a 100644
> --- a/drivers/acpi/pmic/intel_pmic.h
> +++ b/drivers/acpi/pmic/intel_pmic.h
> @@ -15,6 +15,7 @@ struct intel_pmic_opregion_data {
>   int (*update_aux)(struct regmap *r, int reg, int raw_temp);
>   int (*get_policy)(struct regmap *r, int reg, int bit, u64 *value);
>   int (*update_policy)(struct regmap *r, int reg, int bit, int enable);
> + void (*exec_mipi_pmic_seq_element)(struct regmap *r, const u8 *data);
>   struct pmic_table *power_table;
>   int power_table_count;
>   struct pmic_table *thermal_table;
> diff --git a/include/linux/mfd/intel_soc_pmic.h 
> b/include/linux/mfd/intel_soc_pmic.h
> index ed1dfba5e5f9..ce04ad7d4b6c 100644
> --- a/include/linux/mfd/intel_soc_pmic.h
> +++ b/include/linux/mfd/intel_soc_pmic.h
> @@ -26,4 +26,6 @@ struct intel_soc_pmic {
>   struct device *dev;
>  };
>  
> +void intel_soc_pmic_exec_mipi_pmic_seq_element(const u8 *data);
> +
>  #endif   /* __INTEL_SOC_PMIC_H__ */
> -- 
> 2.19.2
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

2018-11-29 Thread Mika Westerberg
On Tue, Nov 27, 2018 at 09:49:44PM -0500, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 11:36:50AM +0200, Mika Westerberg wrote:
> > +linux-acpi
> > 
> > Hi Michael,
> > 
> > On Mon, Nov 26, 2018 at 10:53:26PM -0500, Michael S. Tsirkin wrote:
> > > So a new thinkpad:
> > > 01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro 
> > > P2000 Mobile] (rev a1)
> > > 
> > > Hangs whenever I try to poke at the card. It starts happily enough with
> > > 
> > > [3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type 
> > > mismatch - Found [Buffer], ACPI requires [Package] 
> > > (20181003/nsarguments-66)
> > > [3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type 
> > > mismatch - Found [Buffer], ACPI requires [Package] 
> > > (20181003/nsarguments-66)
> > > [3.971721] pci :01:00.0: optimus capabilities: enabled, status 
> > > dynamic power, hda bios codec supported
> > > [3.971726] VGA switcheroo: detected Optimus DSM method 
> > > \_SB_.PCI0.PEG0.PEGP handle
> > > [3.971727] nouveau: detected PR support, will not use DSM
> 
> BTW this is also a bit strange. It says it will not use DSM - but did
> it maybe use DSM previously? The ACPI Warning seems to indicate that
> it did ...
> 
> And just to complete the picture here's the _DSM from ACPI:
> 
> Method (_DSM, 4, Serialized)  // _DSM: Device-Specific Method
> {

...

> }
> 
> 
> I am not sure what makes it think that Arg3 is a buffer and
> not a package: IIRC Index (decoded C-style here as []) can apply
> to either.
> 
> Am I maybe misunderstanding the warning?

It looks like coming from the nouveau driver (assuming I'm reading it right).

drivers/gpu/drm/nouveau/nouveau_acpi.c::nouveau_optimus_dsm()

union acpi_object argv4 = {
.buffer.type = ACPI_TYPE_BUFFER,
.buffer.length = 4,
.buffer.pointer = args_buff
};

...

obj = acpi_evaluate_dsm_typed(handle, _op_dsm_muid, 0x0100,
  func, , ACPI_TYPE_BUFFER);


It passes ACPI_TYPE_BUFFER but ACPI spec _DSM expects package.

> > > [3.971745] nouveau :01:00.0: enabling device (0006 -> 0007)
> > > [3.971923] nouveau :01:00.0: NVIDIA GP107 (137000a1)
> > > [4.009875] PM: Image not found (code -22)
> > > [4.135752] nouveau :01:00.0: DRM: VRAM: 4096 MiB
> > > [4.135753] nouveau :01:00.0: DRM: GART: 536870912 MiB
> > > [4.135754] nouveau :01:00.0: DRM: BIT table 'A' not found
> > > [4.135755] nouveau :01:00.0: DRM: BIT table 'L' not found
> > > [4.135756] nouveau :01:00.0: DRM: TMDS table version 2.0
> > > [4.135756] nouveau :01:00.0: DRM: DCB version 4.1
> > > [4.135757] nouveau :01:00.0: DRM: DCB outp 00: 02800f76 04600020
> > > [4.135758] nouveau :01:00.0: DRM: DCB outp 01: 02011f62 00020010
> > > [4.135759] nouveau :01:00.0: DRM: DCB outp 02: 01022f46 04600010
> > > [4.135760] nouveau :01:00.0: DRM: DCB outp 03: 01033f56 04600020
> > > [4.135761] nouveau :01:00.0: DRM: DCB conn 00: 00020047
> > > [4.135761] nouveau :01:00.0: DRM: DCB conn 01: 00010161
> > > [4.135762] nouveau :01:00.0: DRM: DCB conn 02: 1246
> > > [4.135763] nouveau :01:00.0: DRM: DCB conn 03: 2346
> > > [4.508355] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > > [4.508355] [drm] Driver supports precise vblank timestamp query.
> > > [4.509812] [drm] Cannot find any crtc or sizes
> > > [4.510144] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 
> > > on minor 2
> > > 
> > > 
> > > Although that type mismatch is a bit worrying. And I'm not sure what
> > > prints PM: Image not found.
> > 
> > That is fine, it just says it cannot find a hibernation image from swap
> > device. I guess you have resume=... in the kernel command line.
> > 
> > > But after a short while it gets pretty busy:
> > > 
> > > 
> > > [   52.917009] No Local Variables are initialized for Method [NVPO]
> > > [   52.917011] No Arguments are initialized for method [NVPO]
> > > [   52.917012] ACPI Error: Method parse/execution failed 
> > > \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> > > [   52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, 
> > > AE_AML_LOOP_TIMEOUT (201810

Re: 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

2018-11-29 Thread Mika Westerberg
On Wed, Nov 28, 2018 at 10:09:22AM -0500, Michael S. Tsirkin wrote:
> Yea all this is weird, in particular I wonder why does everyone
> using dsm insists on saying Arg4
> when they actually mean Arg3. ACPI numbers arguments from 0.
> 
> So it's a bit ugly, and maybe worth fixing but unlikely to be
> an actual issue simply because we end up not using DSM in the end.

I agree.

> Poking at the probing code in nouveau_pr3_present, I started to wonder:
> should I try to hack it to disable d3cold and pr3 and see what
> happens?

I guess it is worth a try. You can do it from sysfs for the graphics
PCI device there is an attribute d3cold_allowed that controls this.

[snip]

> > > 00:14.3 Network controller: Intel Corporation Wireless-AC 9560 [Jefferson 
> > > Peak] (rev 10)
> > > 
> > > so really shouldn't be affected, but go figure. If driver really is 
> > > getting
> > > all-ones from the device, it just might try to poke at a wrong b:d.f by 
> > > mistake
> > > maybe ...
> > 
> > Or it the power resource is shared by wifi as well.
> 
> Is there a way to find out through e.g. sysfs?

It is not shared, I checked from the acpidump you provided. Possibly the
infinite loop in AML when executing NVPO method have some effect on
this.

[snip]

> > No need to send, I can read it from the bugzilla just fine. Can you attach
> > acpidump there as well?
> 
> Done. lspci -x too just in case.

Looking at the dmesg:

[   52.917009] No Local Variables are initialized for Method [NVPO]
[   52.917011] No Arguments are initialized for method [NVPO]
[   52.917012] ACPI Error: Method parse/execution failed 
\_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[   52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, 
AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
[   52.917084] ACPI Error: Method parse/execution failed 
\_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)

So what happens here is that Linux turns off power resource
\_SB.PCI0.PEG0.PG00 by calling its _OFF method (happens when the root
port is runtime suspended). This ends up calling \_SB.PCI0.PGON which
calls \_SB.PCI0.PEG0.PEGP.NVPO.

The last method looks like this:

   Method (NVPO, 0, NotSerialized)
{
While ((\_SB.PCI0.P0LS < 0x03))
{
Sleep (One)
}

So basically it polls P0LS register infinitely if the returned value is
less than 3. I suspect this is the issue and it then makes the other
like wifi to fail to execute its methods.

P0LS comes from this operation region:

OperationRegion (OPG0, SystemMemory, (XBAS + 0x8000), 0x1000)
Field (OPG0, AnyAcc, NoLock, Preserve)
{
...
Offset (0x216),
P0LS,   4,

This is some host bridge register but not sure which because XBAS value
cannot be determined from the acpidump.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: 4.20.0-rc3 nouveau/Quadro P2000 Mobile: runpm causing ACPI errors, lockups

2018-11-28 Thread Mika Westerberg
+linux-acpi

Hi Michael,

On Mon, Nov 26, 2018 at 10:53:26PM -0500, Michael S. Tsirkin wrote:
> So a new thinkpad:
> 01:00.0 VGA compatible controller: NVIDIA Corporation GP107GLM [Quadro P2000 
> Mobile] (rev a1)
> 
> Hangs whenever I try to poke at the card. It starts happily enough with
> 
> [3.971515] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - 
> Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
> [3.971553] ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type 
> mismatch - Found [Buffer], ACPI requires [Package] (20181003/nsarguments-66)
> [3.971721] pci :01:00.0: optimus capabilities: enabled, status 
> dynamic power, hda bios codec supported
> [3.971726] VGA switcheroo: detected Optimus DSM method 
> \_SB_.PCI0.PEG0.PEGP handle
> [3.971727] nouveau: detected PR support, will not use DSM
> [3.971745] nouveau :01:00.0: enabling device (0006 -> 0007)
> [3.971923] nouveau :01:00.0: NVIDIA GP107 (137000a1)
> [4.009875] PM: Image not found (code -22)
> [4.135752] nouveau :01:00.0: DRM: VRAM: 4096 MiB
> [4.135753] nouveau :01:00.0: DRM: GART: 536870912 MiB
> [4.135754] nouveau :01:00.0: DRM: BIT table 'A' not found
> [4.135755] nouveau :01:00.0: DRM: BIT table 'L' not found
> [4.135756] nouveau :01:00.0: DRM: TMDS table version 2.0
> [4.135756] nouveau :01:00.0: DRM: DCB version 4.1
> [4.135757] nouveau :01:00.0: DRM: DCB outp 00: 02800f76 04600020
> [4.135758] nouveau :01:00.0: DRM: DCB outp 01: 02011f62 00020010
> [4.135759] nouveau :01:00.0: DRM: DCB outp 02: 01022f46 04600010
> [4.135760] nouveau :01:00.0: DRM: DCB outp 03: 01033f56 04600020
> [4.135761] nouveau :01:00.0: DRM: DCB conn 00: 00020047
> [4.135761] nouveau :01:00.0: DRM: DCB conn 01: 00010161
> [4.135762] nouveau :01:00.0: DRM: DCB conn 02: 1246
> [4.135763] nouveau :01:00.0: DRM: DCB conn 03: 2346
> [4.508355] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [4.508355] [drm] Driver supports precise vblank timestamp query.
> [4.509812] [drm] Cannot find any crtc or sizes
> [4.510144] [drm] Initialized nouveau 1.3.1 20120801 for :01:00.0 on 
> minor 2
> 
> 
> Although that type mismatch is a bit worrying. And I'm not sure what
> prints PM: Image not found.

That is fine, it just says it cannot find a hibernation image from swap
device. I guess you have resume=... in the kernel command line.

> But after a short while it gets pretty busy:
> 
> 
> [   52.917009] No Local Variables are initialized for Method [NVPO]
> [   52.917011] No Arguments are initialized for method [NVPO]
> [   52.917012] ACPI Error: Method parse/execution failed 
> \_SB.PCI0.PEG0.PEGP.NVPO, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> [   52.917063] ACPI Error: Method parse/execution failed \_SB.PCI0.PGON, 
> AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> [   52.917084] ACPI Error: Method parse/execution failed 
> \_SB.PCI0.PEG0.PG00._ON, AE_AML_LOOP_TIMEOUT (20181003/psparse-516)
> [   52.917108] acpi device:00: Failed to change power state to D0

Here it seems to fail to turn on the power resource for the device.

> [   52.969287] video LNXVIDEO:00: Cannot transition to power state D0 for 
> parent in (unknown)
> [   52.969289] pci_raw_set_power_state: 2 callbacks suppressed
> [   52.969291] nouveau :01:00.0: Refused to change power state, currently 
> in D3
> [   53.029514] video LNXVIDEO:00: Cannot transition to power state D0 for 
> parent in (unknown)
> [   53.041027] nouveau :01:00.0: Refused to change power state, currently 
> in D3
> [   53.041035] video LNXVIDEO:00: Cannot transition to power state D0 for 
> parent in (unknown)
> [   53.053008] nouveau :01:00.0: Refused to change power state, currently 
> in D3
> 
> 
> And then kernel proceeds to throw up errors at random places, e.g.
> 
> [   67.021892] cfg80211: failed to load regulatory.db
> [   67.021895] cfg80211: failed to load regulatory.db
> [   67.021897] cfg80211: failed to load regulatory.db
> [   67.021900] cfg80211: failed to load regulatory.db
> [   67.021927] cfg80211: failed to load regulatory.db
> [   67.021928] cfg80211: failed to load regulatory.db
> [   67.021932] cfg80211: failed to load regulatory.db
> [   67.021934] cfg80211: failed to load regulatory.db
> [   67.024463] cfg80211: failed to load regulatory.db
> [   99.980625] iwlwifi :00:14.3: Error sending STATISTICS_CMD: time out 
> after 2000ms.
> 
> followed by soft lockups and sometimes hard lockups in places
> like attempts to walk skb lists.
> 
> Adding runpm=0 does away with this issue.
> 
> The specific test was with noaccel=1 - it does not seem to change
> things for me.
> 
> I poked at the ACPI method NVPO and yes it does actually
> seem to execute a while loop waiting for some register
> to become 0. Which I guess never happens? Because card
> is in a low power state and so 

[PATCH] drm/nouveau/acpi: fix check for power resources support

2016-10-31 Thread Mika Westerberg
On Sat, Oct 29, 2016 at 01:40:03AM +0200, Peter Wu wrote:
> Check whether the kernel really supports power resources for a device,
> otherwise the power might not be removed when the device is runtime
> suspended (DSM should still work in these cases where PR does not).
> 
> Link: https://bugs.freedesktop.org/show_bug.cgi?id=98398
> Fixes: 692a17dcc292 ("drm/nouveau/acpi: fix lockup with PCIe runtime PM")
> Signed-off-by: Peter Wu 
> ---
> Hi,
> 
> Maybe Cc: stable (4.8+)?
> 
> Compile-tested only. Rick, can you test if this patch fixes the problem for 
> you?
> 
> This check was actually in the original patch, but it was changed:
> https://lists.freedesktop.org/archives/nouveau/2016-May/025125.html
> Re-adding the check as suggested by Mika.
> 
> Kind regards,
> Peter
> ---
>  drivers/gpu/drm/nouveau/nouveau_acpi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
> b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index dc57b62..193573d 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -240,7 +240,8 @@ static bool nouveau_pr3_present(struct pci_dev *pdev)
>   if (!parent_adev)
>   return false;
>  
> - return acpi_has_method(parent_adev->handle, "_PR3");

You may want to add comment here telling why we need to check
flags.power_resources. Or alternatively explain this in the changelog
(along the lines that we fail to find PowerResources because ACPICA does
not handle certain conditional things in the same way than Windows 10).

Other than that, looks good

Reviewed-by: Mika Westerberg 

> + return parent_adev->power.flags.power_resources &&
> + acpi_has_method(parent_adev->handle, "_PR3");
>  }
>  
>  static void nouveau_dsm_pci_probe(struct pci_dev *pdev, acpi_handle 
> *dhandle_out,
> -- 
> 2.10.1


[PATCH] drm/nouveau/acpi: use DSM if bridge does not support D3cold

2016-08-30 Thread Mika Westerberg
On Fri, Aug 26, 2016 at 01:00:54AM +0200, Peter Wu wrote:
> Even if PR3 support is available on the bridge, it will not be used if
> the PCI layer considers it unavailable (i.e. on all laptops from 2013
> and 2014). Ensure that this condition is checked to allow a fallback to
> the Optimus DSM for device poweroff.
> 
> Initially I wanted to call pci_d3cold_enable before checking bridge_d3
> (in case the user changed d3cold_allowed), but that is such an unlikely
> case and likely fragile anyway. The current patch is suggested by Mika
> in http://www.spinics.net/lists/linux-pci/msg52599.html
> 
> Cc: Mika Westerberg 

Reviewed-by: Mika Westerberg 


[PATCH v2 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-07-08 Thread Mika Westerberg
On Fri, Jul 08, 2016 at 01:38:27AM +0200, Peter Wu wrote:
> Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe port
> can be runtime-suspended which disables power resources via ACPI. This
> is incompatible with DSM, resulting in a GPU device which is still in D3
> and locks up the kernel on resume (on a Clevo P651RA, GTX965M).
> 
> Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi
> debugger trace) and stop using the DSM functions for D3cold when power
> resources are available on the parent PCIe port.
> 
> pci_d3cold_disable() is not used because on some machines, the old DSM
> method is broken. On a Lenovo T440p (GT 730M) memory and disk corruption
> would occur, but that is fixed with this patch[2].

Fair enough.

>  [1]: 
> https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold
>  [2]: 
> https://github.com/Bumblebee-Project/bbswitch/issues/78#issuecomment-223549072
> 
>  v2: simply check directly for _PR3. Added affected machines.
> 
> Signed-off-by: Peter Wu 

One nitpick below but otherwise looks reasonable to me.

Reviewed-by: Mika Westerberg 

BTW, thanks for doing this :)

> ---
>  drivers/gpu/drm/nouveau/nouveau_acpi.c | 33 +
>  1 file changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
> b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index ad273ad..38a6445 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv {
>   bool dsm_detected;
>   bool optimus_detected;
>   bool optimus_flags_detected;
> + bool optimus_skip_dsm;
>   acpi_handle dhandle;
>   acpi_handle rom_handle;
>  } nouveau_dsm_priv;
> @@ -212,9 +213,26 @@ static const struct vga_switcheroo_handler 
> nouveau_dsm_handler = {
>   .get_client_id = nouveau_dsm_get_client_id,
>  };
>  
> +/* Firmware supporting Windows 8 or later do not use _DSM to put the device 
> into
> + * D3cold, they instead rely on disabling power resources on the parent. */

You should follow standard block comment style here.

> +static bool nouveau_pr3_present(struct pci_dev *pdev)
> +{
> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> + struct acpi_device *parent_adev;
> +
> + if (!parent_pdev)
> + return false;
> +
> + parent_adev = ACPI_COMPANION(_pdev->dev);
> + if (!parent_adev)
> + return false;
> +
> + return acpi_has_method(parent_adev->handle, "_PR3");
> +}


[PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-06-01 Thread Mika Westerberg
On Tue, May 31, 2016 at 01:02:31PM +0200, Peter Wu wrote:
> On Tue, May 31, 2016 at 11:43:56AM +0300, Mika Westerberg wrote:
> > On Mon, May 30, 2016 at 06:13:51PM +0200, Peter Wu wrote:
> > > Do you have any suggestions for the case where the pcieport driver
> > > refuses to put the bridge in D3 (because the BIOS is too old)? In that
> > > case the nouveau driver needs to fallback to the DSM method (but not
> > > when runtime PM is deliberately disabled by writing control=on).
> > 
> > Do you know what Windows does then? I think we should do the same if
> > possible.
> 
> If the BIOS is too old, then it probably does not have _PR3 objects nor
> calls to _OSI("Windows 2013"). See below.
> 
> > If user has disabled runtime PM from the root port deliberately, there
> > might be good reason to do so. Why we want to fallback to something that
> > could cause problems? I mean _DSM on such systems is probably not that
> > much tested because everybody runs Windows 8+ and using standard ACPI
> > power resources.
> 
> I agree that when runtime PM on the root port is disabled (control=on),
> then there should be no fallback to DSM. For devices without _PR3 it is
> clear that DSM will always be used (if available).
> 
> In other cases (where _PR3 is available) we can distinguish:
>  - pre-Windows 8 machines. I have never seen this combination. Firmware
>writers seems to prefer sticking to reference code which did not use
>power resources before.
>  - Machines targeting Windows 8 or newer. (Note that there exist
>machines with Windows 8 support that do not have _PR3, DSM is used in
>that case.)
> 
> If Windows 7 is running on a Windows 8 machine, PR3 will not be used
> anyway. If the Linux kernel claims support for Windows 8, but does not
> use PR3, then we are probably approaching an untested area. So far
> firmware seems fine with using *only* DSM *or* PR3, but at least my
> laptop gets confused when you use both at the same time.
> 
> The latter happens on pci/pm (8b71f565) without other patches:
> 
>  1. nouveau invokes _DSM and _PS3, device is put in D3cold.
>  2. pcieport driver calls PG00._OFF (PG00 is returned by _PR3).
>  3. Wake up Nvidia device (e.g. by power=on).
>  4. This will trigger PG00._ON (via pcieport) and _PS0 (via nouveau).
>  5. Nvidia card is not really ready (observed via "restoring config
> space at offset ... (was 0x, writing ...)", a soft lockup
> and RCU stall after that requiring a reboot to recover).
> 
> nouveau could be patched not to invoke DSM when PR3 is detected
> (proposal is ready) but will keep the device powered on in these cases:
>  - nouveau is patched, but pci/pm patches are not.
>  - PR3 is supported but due to the cutoff date (2015) it is not used.
>  - Boot option pcie_port_pm=off.
>  - runtime PM is disabled for pcieport (should be fine).

Since using only _DSM has been the only method to power down the card
currently inńLinux (even if the root port has had _PR3), and it has been
working fine, why not stick with that when _DSM is supported?

In other words, something like this:

nouveau_dsm_pci_probe()
{
...
if (retval & (NOUVEAU_DSM_HAS_OPT | NOUVEAU_DSM_HAS_MUX)) {
/*
 * We have custom _DSM method to power down the card so
 * prevent the PCI core from transitioning the
 * card into D3cold.
 */
pci_d3cold_disable(pdev);
}
}

(Not sure about those flags above, though).

Yes, it does not follow Windows 8+ but if it works... ;-)

> There is a wealth of acpidumps on Launchpad bug 752542
> (https://bugs.launchpad.net/bugs/752542). Search for example for
> comments in early 2015 or before, those will likely be machine from 2014
> or before.
> 
> Interesting to see is the _PR3 method of a HP Envy TS 15 (11/20/2014):
> 
> Method (_PR3, 0, NotSerialized) {
> If (\_OSI ("Windows 2013")) {
> Return (Package (0x01) {
> \NVP3
> })
> } Else {
> Return (Package (0x00) {})
> }
> }
> 
> (Note for self: just checking for the _PR3 handle in the nouveau patch
> is apparently not sufficient, it must really be evaluated.)
> 
> Other machines with _PR3:
>  - Dell Inspiron 3543 (11/04/2014), comment 757.
>  - Dell XPS 15 9530 (03/28/2014), comment 711.
>  - Novatech 15.6 NSPIRE Laptop (01/20/2014), comment 695.
>  - Lenovo ThinkPad T440p (10/27/2013), comment 659.
> 
> There were many models from 2013 without _PR3 method but still checking
&

[PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-05-31 Thread Mika Westerberg
On Mon, May 30, 2016 at 06:13:51PM +0200, Peter Wu wrote:
> Do you have any suggestions for the case where the pcieport driver
> refuses to put the bridge in D3 (because the BIOS is too old)? In that
> case the nouveau driver needs to fallback to the DSM method (but not
> when runtime PM is deliberately disabled by writing control=on).

Do you know what Windows does then? I think we should do the same if
possible.

If user has disabled runtime PM from the root port deliberately, there
might be good reason to do so. Why we want to fallback to something that
could cause problems? I mean _DSM on such systems is probably not that
much tested because everybody runs Windows 8+ and using standard ACPI
power resources.


[PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-05-30 Thread Mika Westerberg
On Mon, May 30, 2016 at 02:20:10PM +0200, Peter Wu wrote:
> On Mon, May 30, 2016 at 12:57:09PM +0300, Mika Westerberg wrote:
> > +Rafael
> > 
> > On Fri, May 27, 2016 at 01:10:37PM +0200, Peter Wu wrote:
> > > On Wed, May 25, 2016 at 04:55:35PM +0300, Mika Westerberg wrote:
> > > > On Wed, May 25, 2016 at 12:53:01AM +0200, Peter Wu wrote:
> > > > > Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe 
> > > > > port
> > > > > can be runtime-suspended which disables power resources via ACPI. This
> > > > > is incompatible with DSM, resulting in a GPU device which is still in 
> > > > > D3
> > > > > and locks up the kernel on resume.
> > > > > 
> > > > > Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi
> > > > > debugger trace) and stop using the DSM functions for D3cold when power
> > > > > resources are available on the parent PCIe port.
> > > > > 
> > > > >  [1]: 
> > > > > https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold
> > > > > 
> > > > > Signed-off-by: Peter Wu 
> > > > > ---
> > > > >  drivers/gpu/drm/nouveau/nouveau_acpi.c | 34 
> > > > > ++
> > > > >  1 file changed, 30 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
> > > > > b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > > > index df9f73e..e469df7 100644
> > > > > --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > > > @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv {
> > > > >   bool dsm_detected;
> > > > >   bool optimus_detected;
> > > > >   bool optimus_flags_detected;
> > > > > + bool optimus_skip_dsm;
> > > > >   acpi_handle dhandle;
> > > > >   acpi_handle rom_handle;
> > > > >  } nouveau_dsm_priv;
> > > > > @@ -212,8 +213,26 @@ static const struct vga_switcheroo_handler 
> > > > > nouveau_dsm_handler = {
> > > > >   .get_client_id = nouveau_dsm_get_client_id,
> > > > >  };
> > > > >  
> > > > > +/* Firmware supporting Windows 8 or later do not use _DSM to put the 
> > > > > device into
> > > > > + * D3cold, they instead rely on disabling power resources on the 
> > > > > parent. */
> > > > > +static bool nouveau_pr3_present(struct pci_dev *pdev)
> > > > > +{
> > > > > + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> > > > > + struct acpi_device *ad;
> > > > 
> > > > Nit: please call this adev instead of ad.
> > > 
> > > Will do.
> > > 
> > > > > +
> > > > > + if (!parent_pdev)
> > > > > + return false;
> > > > > +
> > > > > + ad = ACPI_COMPANION(_pdev->dev);
> > > > > + if (!ad)
> > > > > + return false;
> > > > > +
> > > > > + return ad->power.flags.power_resources;
> > > > 
> > > > Is this sufficient to tell if the parent device has _PR3? I thought it
> > > > returns true if it has power resources in general, not necessarily _PR3.
> > > > 
> > > > Otherwise this looks okay to me.
> > > 
> > > It is indeed set whenever there is any _PRx method. I wonder if it is
> > > appropriate to access fields directly like this, perhaps this would be
> > > more accurate (based on device_pm.c):
> > > 
> > > /* Check whether the _PR3 method is available. */
> > > return adev->power.states[ACPI_STATE_D3_COLD].flags.valid;
> > > 
> > > I am also considering adding a check in case the pcieport driver does
> > > not support D3cold via runtime PM, what do you think of this?
> > > 
> > > if (!parent_pdev)
> > > return false;
> > > /* If the PCIe port does not support D3cold via runtime PM, allow a
> > >  * fallback to the Optimus DSM method to put the device in D3cold. */
> > > if (parent_pdev->no_d3cold)
> > > return false;
> > > 
> > > This is needed to

[PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-05-30 Thread Mika Westerberg
+Rafael

On Fri, May 27, 2016 at 01:10:37PM +0200, Peter Wu wrote:
> On Wed, May 25, 2016 at 04:55:35PM +0300, Mika Westerberg wrote:
> > On Wed, May 25, 2016 at 12:53:01AM +0200, Peter Wu wrote:
> > > Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe port
> > > can be runtime-suspended which disables power resources via ACPI. This
> > > is incompatible with DSM, resulting in a GPU device which is still in D3
> > > and locks up the kernel on resume.
> > > 
> > > Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi
> > > debugger trace) and stop using the DSM functions for D3cold when power
> > > resources are available on the parent PCIe port.
> > > 
> > >  [1]: 
> > > https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold
> > > 
> > > Signed-off-by: Peter Wu 
> > > ---
> > >  drivers/gpu/drm/nouveau/nouveau_acpi.c | 34 
> > > ++
> > >  1 file changed, 30 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
> > > b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > index df9f73e..e469df7 100644
> > > --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> > > @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv {
> > >   bool dsm_detected;
> > >   bool optimus_detected;
> > >   bool optimus_flags_detected;
> > > + bool optimus_skip_dsm;
> > >   acpi_handle dhandle;
> > >   acpi_handle rom_handle;
> > >  } nouveau_dsm_priv;
> > > @@ -212,8 +213,26 @@ static const struct vga_switcheroo_handler 
> > > nouveau_dsm_handler = {
> > >   .get_client_id = nouveau_dsm_get_client_id,
> > >  };
> > >  
> > > +/* Firmware supporting Windows 8 or later do not use _DSM to put the 
> > > device into
> > > + * D3cold, they instead rely on disabling power resources on the parent. 
> > > */
> > > +static bool nouveau_pr3_present(struct pci_dev *pdev)
> > > +{
> > > + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> > > + struct acpi_device *ad;
> > 
> > Nit: please call this adev instead of ad.
> 
> Will do.
> 
> > > +
> > > + if (!parent_pdev)
> > > + return false;
> > > +
> > > + ad = ACPI_COMPANION(_pdev->dev);
> > > + if (!ad)
> > > + return false;
> > > +
> > > + return ad->power.flags.power_resources;
> > 
> > Is this sufficient to tell if the parent device has _PR3? I thought it
> > returns true if it has power resources in general, not necessarily _PR3.
> > 
> > Otherwise this looks okay to me.
> 
> It is indeed set whenever there is any _PRx method. I wonder if it is
> appropriate to access fields directly like this, perhaps this would be
> more accurate (based on device_pm.c):
> 
> /* Check whether the _PR3 method is available. */
> return adev->power.states[ACPI_STATE_D3_COLD].flags.valid;
> 
> I am also considering adding a check in case the pcieport driver does
> not support D3cold via runtime PM, what do you think of this?
> 
> if (!parent_pdev)
> return false;
> /* If the PCIe port does not support D3cold via runtime PM, allow a
>  * fallback to the Optimus DSM method to put the device in D3cold. */
> if (parent_pdev->no_d3cold)
> return false;
> 
> This is needed to avoid the regression reported in the cover letter, but
> also allows pre-2015 systems to (still) have the D3cold possibility.

The _DSM method with 0 as index parameter should return a bit field
telling which functions are supported. Sane BIOS disables that
particular function if it detects Windows 8 and newer. Have you checked
if that's the case?

Then you can call _DSM only if it is supported and otherwise expect the
parent device's power resources to turn off power when runtime
suspended.

> Out of curiosity I looked up an pre-2015 laptop (found Acer V5-573G,
> apparently from November 2013, Windows 8.1) and extracted the ACPI
> tables from the BIOS images. BIOS 2.28 (2014/05/13) introduces support
> for power resources on the parent devicea(\_SB.PCI0.PEG0._PR3 and a
> related NVP3 device) when _OSI("Windows 2013") is true. (This is added
> as alternative for the old DSM interface.)
> 
> Maybe 2014 is also an appropriate cutoff date? I wonder if it is
> feasible to detect firmware use of _OSI("Windows 2013") and use that
> instead of the BIOS year.

Using BIOS year works even if there is no ACPI available.

What comes to the cutoff date, I discussed with Rafael and it was
decided that we use the same year Windows 10 was released to be on the
safe side. Reading the links you provided here:

https://msdn.microsoft.com/fi-fi/windows/hardware/drivers/bringup/device-power-management
https://msdn.microsoft.com/en-us/library/windows/hardware/hh967709(v=vs.85).aspx

it seems that from Windows 8 they started transitioning devices into
D3cold during runtime as well.


[PATCH 4/4] drm/nouveau/acpi: fix lockup with PCIe runtime PM

2016-05-25 Thread Mika Westerberg
On Wed, May 25, 2016 at 12:53:01AM +0200, Peter Wu wrote:
> Since "PCI: Add runtime PM support for PCIe ports", the parent PCIe port
> can be runtime-suspended which disables power resources via ACPI. This
> is incompatible with DSM, resulting in a GPU device which is still in D3
> and locks up the kernel on resume.
> 
> Mirror the behavior of Windows 8 and newer[1] (as observed via an AMLi
> debugger trace) and stop using the DSM functions for D3cold when power
> resources are available on the parent PCIe port.
> 
>  [1]: 
> https://msdn.microsoft.com/windows/hardware/drivers/bringup/firmware-requirements-for-d3cold
> 
> Signed-off-by: Peter Wu 
> ---
>  drivers/gpu/drm/nouveau/nouveau_acpi.c | 34 
> ++
>  1 file changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_acpi.c 
> b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> index df9f73e..e469df7 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_acpi.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_acpi.c
> @@ -46,6 +46,7 @@ static struct nouveau_dsm_priv {
>   bool dsm_detected;
>   bool optimus_detected;
>   bool optimus_flags_detected;
> + bool optimus_skip_dsm;
>   acpi_handle dhandle;
>   acpi_handle rom_handle;
>  } nouveau_dsm_priv;
> @@ -212,8 +213,26 @@ static const struct vga_switcheroo_handler 
> nouveau_dsm_handler = {
>   .get_client_id = nouveau_dsm_get_client_id,
>  };
>  
> +/* Firmware supporting Windows 8 or later do not use _DSM to put the device 
> into
> + * D3cold, they instead rely on disabling power resources on the parent. */
> +static bool nouveau_pr3_present(struct pci_dev *pdev)
> +{
> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev);
> + struct acpi_device *ad;

Nit: please call this adev instead of ad.

> +
> + if (!parent_pdev)
> + return false;
> +
> + ad = ACPI_COMPANION(_pdev->dev);
> + if (!ad)
> + return false;
> +
> + return ad->power.flags.power_resources;

Is this sufficient to tell if the parent device has _PR3? I thought it
returns true if it has power resources in general, not necessarily _PR3.

Otherwise this looks okay to me.


[PATCH 1/2] vga_switcheroo: add power support for windows 10 machines.

2016-03-15 Thread Mika Westerberg
On Tue, Mar 15, 2016 at 02:39:58PM +0100, Lukas Wunner wrote:
> Hi Mika,
> 
> On Mon, Mar 14, 2016 at 11:43:35AM +0200, Mika Westerberg wrote:
> > On Mon, Mar 14, 2016 at 12:19:14PM +1000, Dave Airlie wrote:
> > > On 11 March 2016 at 23:45, Rafael J. Wysocki  wrote:
> > > > On Friday, March 11, 2016 12:58:15 PM Mika Westerberg wrote:
> > > >> On Thu, Mar 10, 2016 at 09:57:09PM +0100, Rafael J. Wysocki wrote:
> > > >> > > It doesn't seem to do any runtime PM,
> > > >> > > I do wonder if pcieport should be doing it's own runtime PM 
> > > >> > > handling,
> > > >> > > but that is a
> > > >> > > larger task than I'm thinking to tackle here.
> > > >> >
> > > >> > PCIe ports don't do PM - yet.  Mika has posted a series of patches 
> > > >> > to implement
> > > >> > that, however, that are waiting for comments now:
> > > >> >
> > > >> > https://patchwork.kernel.org/patch/8453311/
> > > >> > https://patchwork.kernel.org/patch/8453381/
> > > >> > https://patchwork.kernel.org/patch/8453391/
> > > >> > https://patchwork.kernel.org/patch/8453411/
> > > >> > https://patchwork.kernel.org/patch/8453371/
> > > >> > https://patchwork.kernel.org/patch/8453351/
> 
> If a pciehp port is runtime suspended and pciehp_poll_mode is enabled,
> the poll timer needs to be disabled and later reenabled on runtime resume.

If we disable the timer then we can't detect when a new device is
connected to the port.

I think in this case it might be better not to enable runtime PM for the
port at all.


[PATCH 1/2] vga_switcheroo: add power support for windows 10 machines.

2016-03-14 Thread Mika Westerberg
On Mon, Mar 14, 2016 at 01:50:41PM +0100, Rafael J. Wysocki wrote:
> On Mon, Mar 14, 2016 at 11:23 AM, Mika Westerberg
>  wrote:
> > On Mon, Mar 14, 2016 at 07:47:39PM +1000, Dave Airlie wrote:
> >> >
> >> >> - if (pcie_port_runtime_suspend_allowed(dev))
> >> >> + if (pcie_port_runtime_suspend_allowed(dev)) {
> >> >> + pm_runtime_allow(>dev);
> >> >
> >> > PCI drivers typically have left this decision up to the userspace. I'm
> >> > wondering whether it is good idea to deviate from that here? Of course
> >> > this allows immediate power savings but could potentially cause problems
> >> > as well.
> >> >
> >>
> >> No distro has ever shipped userspace to do this, I really think this
> >> is a bad design.
> >> We have wasted countless watts of power on this stupid idea that people 
> >> will
> >> run powertop, only a few people in the world run powertop, lots of
> >> people use Linux.
> >
> > That is a fair point.
> >
> > I do not have anything against calling pm_runtime_allow() here. In fact
> > we already do the same in Intel LPSS drivers. I just wanted to bring
> > that up.
> >
> > Rafael, what do you think?
> 
> We can do that to start with.  If there are no problems in the field
> with it, I don't see any problems in principle.
> 
> > If we anyway are going to add cut-off date to enable runtime PM we
> > should expect that the hardware is also capable of doing so (and if not
> > we can always blacklist the exceptions).
> 
> Sounds reasonable.
> 
> >> The kernel should power stuff down not wait for the user to run powertop,
> >> At least for the GPU it's in the area of 8W of power, and I've got the
> >> GPU drivers doing this themselves,
> >>
> >> I could have the GPU driver call runtime allow for it's host bridge I 
> >> suppose,
> >> if we insist on the userspace cares, but I'd prefer not doing so.
> >>
> >> > I think we need to add corresponding call to pm_runtime_forbid() in
> >> > pcie_portdrv_remove().
> >>
> >> Yes most likely.
> >
> > BTW, I can add both calls to the next version of PCIe runtime PM patches
> > if you are OK with that, and all agree this is a good idea.
> 
> That would be fine by me.

OK thanks.

I'll do these changes to the next version of the patch series then.


[PATCH 1/2] vga_switcheroo: add power support for windows 10 machines.

2016-03-14 Thread Mika Westerberg
On Mon, Mar 14, 2016 at 07:47:39PM +1000, Dave Airlie wrote:
> >
> >> - if (pcie_port_runtime_suspend_allowed(dev))
> >> + if (pcie_port_runtime_suspend_allowed(dev)) {
> >> + pm_runtime_allow(>dev);
> >
> > PCI drivers typically have left this decision up to the userspace. I'm
> > wondering whether it is good idea to deviate from that here? Of course
> > this allows immediate power savings but could potentially cause problems
> > as well.
> >
> 
> No distro has ever shipped userspace to do this, I really think this
> is a bad design.
> We have wasted countless watts of power on this stupid idea that people will
> run powertop, only a few people in the world run powertop, lots of
> people use Linux.

That is a fair point.

I do not have anything against calling pm_runtime_allow() here. In fact
we already do the same in Intel LPSS drivers. I just wanted to bring
that up.

Rafael, what do you think?

If we anyway are going to add cut-off date to enable runtime PM we
should expect that the hardware is also capable of doing so (and if not
we can always blacklist the exceptions).

> The kernel should power stuff down not wait for the user to run powertop,
> At least for the GPU it's in the area of 8W of power, and I've got the
> GPU drivers doing this themselves,
> 
> I could have the GPU driver call runtime allow for it's host bridge I suppose,
> if we insist on the userspace cares, but I'd prefer not doing so.
> 
> > I think we need to add corresponding call to pm_runtime_forbid() in
> > pcie_portdrv_remove().
> 
> Yes most likely.

BTW, I can add both calls to the next version of PCIe runtime PM patches
if you are OK with that, and all agree this is a good idea.


[PATCH 1/2] vga_switcheroo: add power support for windows 10 machines.

2016-03-14 Thread Mika Westerberg
On Mon, Mar 14, 2016 at 12:19:14PM +1000, Dave Airlie wrote:
> On 11 March 2016 at 23:45, Rafael J. Wysocki  wrote:
> > On Friday, March 11, 2016 12:58:15 PM Mika Westerberg wrote:
> >> On Thu, Mar 10, 2016 at 09:57:09PM +0100, Rafael J. Wysocki wrote:
> >> > > It doesn't seem to do any runtime PM,
> >> > > I do wonder if pcieport should be doing it's own runtime PM handling,
> >> > > but that is a
> >> > > larger task than I'm thinking to tackle here.
> >> >
> >> > PCIe ports don't do PM - yet.  Mika has posted a series of patches to 
> >> > implement
> >> > that, however, that are waiting for comments now:
> >> >
> >> > https://patchwork.kernel.org/patch/8453311/
> >> > https://patchwork.kernel.org/patch/8453381/
> >> > https://patchwork.kernel.org/patch/8453391/
> >> > https://patchwork.kernel.org/patch/8453411/
> >> > https://patchwork.kernel.org/patch/8453371/
> >> > https://patchwork.kernel.org/patch/8453351/
> >> >
> >> > > Maybe I should be doing
> >> > >
> >> > > pci_set_power_state(pdev->bus->self, PCI_D3cold) ? I'm not really sure.
> >> >
> >> > Using pci_set_power_state() would be more appropriate IMO, but you can 
> >> > get
> >> > to the bridge via dev->parent too, can't you?
> >> >
> >> > In any case, it looks like you and Mika need to talk. :-)
> >>
> >> When the vga_switcheroo device gets runtime suspended (with the above
> >> runtime PM patchs for PCIe root ports) the root port should also be
> >> runtime suspended by the PM core.
> >
> > Right, after your patches have been applied, the additional handling
> > won't be needed.
> >
> > So Dave, maybe you can check if the Mika's patches help?
> 
> Hi Mika,
> 
> I tested your patches with a couple of changes on the Lenovo W541.

Thanks for testing.

> The attached patch contains the two things I needed to get the same
> functionality
> as my patches.
> 
> I'm really not in love with the per-chipset enablement for this,
> really any chipsets
> after a certain year should probably be better, as we'll constantly be
> adding PCI Ids
> for every chipset ever made, and I expect we'll forget some.

Bjorn also had the same comment. I think I'll change it to use cut-off
date instead of list of PCI IDs of nobody objects.

> Dave.

> From eea412076c9d24262ebd4811f766d5379b728045 Mon Sep 17 00:00:00 2001
> From: Dave Airlie 
> Date: Mon, 14 Mar 2016 23:37:43 +1000
> Subject: [PATCH] [RFC] PCI: enable runtime PM on pcieports to let secondary
>  GPUs powerdown.
> 
> With secondary GPUs in Windows 10 laptops we need to use runtime PM
> to power down the PCIe ports after the devices goes to D3Cold.
> 
> This patch adds the PCI ID for the Haswell 16x PCIE slot found
> in the W541 laptops. I should probably try and gather the PCI IDs,
> for broadwell/skylake variants as well if we can't find them.
> 
> This fixes a regression on W541 laptops on top of Mika's PCIE patches
> since we started advertising Windows 2013 ACPI.
> 
> [probably should be two patches - hence RFC]
> 
> Signed-off-by: Dave Airlie 
> ---
>  drivers/pci/pcie/portdrv_pci.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 43dd23e..1405de8 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -278,8 +278,10 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  
>   pci_save_state(dev);
>  
> - if (pcie_port_runtime_suspend_allowed(dev))
> + if (pcie_port_runtime_suspend_allowed(dev)) {
> + pm_runtime_allow(>dev);

PCI drivers typically have left this decision up to the userspace. I'm
wondering whether it is good idea to deviate from that here? Of course
this allows immediate power savings but could potentially cause problems
as well.

I think we need to add corresponding call to pm_runtime_forbid() in
pcie_portdrv_remove().

>   pm_runtime_put_noidle(>dev);
> + }
>  
>   return 0;
>  }
> @@ -436,6 +438,8 @@ static void pcie_portdrv_err_resume(struct pci_dev *dev)
>   * LINUX Device Driver Model
>   */
>  static const struct pci_device_id port_pci_ids[] = {
> + /* Intel Skylake PCIE graphics port */
> + { PCI_VDEVICE(INTEL, 0x0c01), .driver_data = PCIE_PORT_SPT },
>   /* Intel Broxton */
>   { PCI_VDEVICE(INTEL, 0x1ad6), .driver_data = PCIE_PORT_SPT },
>   { PCI_VDEVICE(INTEL, 0x1ad7), .driver_data = PCIE_PORT_SPT },
> -- 
> 2.5.0
> 



[PATCH 1/2] vga_switcheroo: add power support for windows 10 machines.

2016-03-11 Thread Mika Westerberg
On Thu, Mar 10, 2016 at 09:57:09PM +0100, Rafael J. Wysocki wrote:
> > It doesn't seem to do any runtime PM,
> > I do wonder if pcieport should be doing it's own runtime PM handling,
> > but that is a
> > larger task than I'm thinking to tackle here.
> 
> PCIe ports don't do PM - yet.  Mika has posted a series of patches to 
> implement
> that, however, that are waiting for comments now:
> 
> https://patchwork.kernel.org/patch/8453311/
> https://patchwork.kernel.org/patch/8453381/
> https://patchwork.kernel.org/patch/8453391/
> https://patchwork.kernel.org/patch/8453411/
> https://patchwork.kernel.org/patch/8453371/
> https://patchwork.kernel.org/patch/8453351/
> 
> > Maybe I should be doing
> > 
> > pci_set_power_state(pdev->bus->self, PCI_D3cold) ? I'm not really sure.
> 
> Using pci_set_power_state() would be more appropriate IMO, but you can get
> to the bridge via dev->parent too, can't you?
> 
> In any case, it looks like you and Mika need to talk. :-)

When the vga_switcheroo device gets runtime suspended (with the above
runtime PM patchs for PCIe root ports) the root port should also be
runtime suspended by the PM core. I don't think there is a need to call
any pci_set_power_state() in this driver but maybe I'm missing
something.


Re: [PATCH v3 0/4] Fix Win8 backlight issue

2013-09-29 Thread Mika Westerberg
On Tue, Sep 24, 2013 at 05:47:28PM +0800, Aaron Lu wrote:
 v3:
 1 Add a new patch 4/4 to fix some problems in thinkpad-acpi module;
 2 Remove unnecessary function acpi_video_unregister introduced in
   patch 2/3 as pointed out by Jani Nikula.
 
 v2:
 v1 has the subject of Rework ACPI video driver and is posted here:
 http://lkml.org/lkml/2013/9/9/74
 Since the objective is really to fix Win8 backlight issues, I changed
 the subject in this version, sorry about that.
 
 This patchset has three patches, the first introduced a new API named
 backlight_device_registered in backlight layer that can be used for
 backlight interface provider module to check if a specific type backlight
 interface has been registered, see changelog for patch 1/3 for details.
 Then patch 2/3 does the cleanup to sepeate the backlight control and
 event delivery functionality in the ACPI video module and patch 3/3
 solves some Win8 backlight control problems by avoiding register ACPI
 video's backlight interface if:
 1 Kernel cmdline option acpi_backlight=video is not given;
 2 This is a Win8 system;
 3 Native backlight control interface exists.
 
 Technically, patch 2/3 is not required to fix the issue here. So if you
 think it is not necessary, I can remove it from the series.
 
 Aaron Lu (4):
   backlight: introduce backlight_device_registered
   ACPI / video: seperate backlight control and event interface
   ACPI / video: Do not register backlight if win8 and native interface
 exists
   thinkpad-acpi: fix handle locate for video and query of _BCL
 
  drivers/acpi/internal.h  |   5 +-
  drivers/acpi/video.c | 442 
 ---
  drivers/acpi/video_detect.c  |  14 +-
  drivers/platform/x86/thinkpad_acpi.c |  31 ++-
  drivers/video/backlight/backlight.c  |  31 +++
  include/acpi/video.h |   2 +
  include/linux/backlight.h|   4 +
  7 files changed, 324 insertions(+), 205 deletions(-)

Just tested this series on my HP revolve 810 and this fixes the backlight
issue it had :)

Feel free to add my,

Tested-by: Mika Westerberg mika.westerb...@linux.intel.com
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel