Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Alex Deucher
On Mon, Feb 19, 2018 at 9:54 AM, Daniel Vetter  wrote:
> On Mon, Feb 19, 2018 at 03:47:42PM +0100, Lukas Wunner wrote:
>> On Mon, Feb 19, 2018 at 03:05:53PM +0100, Daniel Vetter wrote:
>> > On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote:
>> > > On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote:
>> > > > Well, userspace expects hotplug events, even when we runtime suspend
>> > > > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
>> > > > pretty serious policy decision which is ok in the context of
>> > > > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
>> > > > up if you plug something in, even with all the runtime pm stuff enabled.
>> > > > Same for sata and everything else.
>> > >
>> > > On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into
>> > > the gmux controller, which sends an interrupt on hotplug even if the GPU
>> > > is powered down.
>> > >
>> > > Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.
>> >
>> > Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly
>> > makes sense. I think ideally we'd stop polling in the gmux handler somehow
>> > (maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright
>> > stopping it all). But not when runtime suspending the entire gpu (e.g.
>> > idle system that shuts down the screen and everything, before it decides
>> > a few minutes later to do a full system suspend).
>>
>> nouveau, radeon and amdgpu currently use runtime PM *only* on hybrid
>> graphics laptops.
>>
>> Should the drivers later be extended to also use runtime PM in other
>> scenarios (desktop machines, eGPUs), they can easily detect whether
>> to disable polling on runtime suspend by calling apple_gmux_present()
>> on Macs or the equivalent for Optimus/ATPX.
>
> Ah, then I think the current solution is ok (if not entirely clean imo,
> but that can be fixed up whenever it hurts). Implementing runtime pm for
> other cases is up to the driver authors really (probably more pressing
> when the gpu is on the same SoC).

On our APUs, we support fairly fine grained powergating so this mostly
happens auto-magically in hw; no need for runtime PM.  We haven't
supported native analog encoders in the last 3 or 4 generations of display
hw, so polling is not much of an issue going forward.  On most
integrated platforms (e.g., laptops and all-in-ones), digital hotplug
is handled by the platform (we get an ACPI ATIF notification) so we
can wake the dGPU.

Alex

> -Daniel
>
>>
>> Thanks,
>>
>> Lukas
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Daniel Vetter
On Mon, Feb 19, 2018 at 03:47:42PM +0100, Lukas Wunner wrote:
> On Mon, Feb 19, 2018 at 03:05:53PM +0100, Daniel Vetter wrote:
> > On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote:
> > > On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote:
> > > > Well, userspace expects hotplug events, even when we runtime suspend
> > > > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
> > > > pretty serious policy decision which is ok in the context of
> > > > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
> > > > up if you plug something in, even with all the runtime pm stuff enabled.
> > > > Same for sata and everything else.
> > > 
> > > On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into
> > > the gmux controller, which sends an interrupt on hotplug even if the GPU
> > > is powered down.
> > > 
> > > Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.
> > 
> > Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly
> > makes sense. I think ideally we'd stop polling in the gmux handler somehow
> > (maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright
> > stopping it all). But not when runtime suspending the entire gpu (e.g.
> > idle system that shuts down the screen and everything, before it decides
> > a few minutes later to do a full system suspend).
> 
> nouveau, radeon and amdgpu currently use runtime PM *only* on hybrid
> graphics laptops.
> 
> Should the drivers later be extended to also use runtime PM in other
> scenarios (desktop machines, eGPUs), they can easily detect whether
> to disable polling on runtime suspend by calling apple_gmux_present()
> on Macs or the equivalent for Optimus/ATPX.

Ah, then I think the current solution is ok (if not entirely clean imo,
but that can be fixed up whenever it hurts). Implementing runtime pm for
other cases is up to the driver authors really (probably more pressing
when the gpu is on the same SoC).
-Daniel

> 
> Thanks,
> 
> Lukas



Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Lukas Wunner
On Mon, Feb 19, 2018 at 03:05:53PM +0100, Daniel Vetter wrote:
> On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote:
> > On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote:
> > > Well, userspace expects hotplug events, even when we runtime suspend
> > > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
> > > pretty serious policy decision which is ok in the context of
> > > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
> > > up if you plug something in, even with all the runtime pm stuff enabled.
> > > Same for sata and everything else.
> > 
> > On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into
> > the gmux controller, which sends an interrupt on hotplug even if the GPU
> > is powered down.
> > 
> > Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.
> 
> Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly
> makes sense. I think ideally we'd stop polling in the gmux handler somehow
> (maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright
> stopping it all). But not when runtime suspending the entire gpu (e.g.
> idle system that shuts down the screen and everything, before it decides
> a few minutes later to do a full system suspend).

nouveau, radeon and amdgpu currently use runtime PM *only* on hybrid
graphics laptops.

Should the drivers later be extended to also use runtime PM in other
scenarios (desktop machines, eGPUs), they can easily detect whether
to disable polling on runtime suspend by calling apple_gmux_present()
on Macs or the equivalent for Optimus/ATPX.
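A minimal userspace sketch of the decision described above, assuming the
driver already knows (a) whether any connector still needs polling and
(b) whether the platform signals hotplug while the discrete GPU is powered
down. In the kernel, (b) would come from apple_gmux_present() on Macs or
an Optimus/ATPX-specific check; here both are plain booleans, and
struct platform, enum suspend_poll_policy and choose_policy() are
hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical platform description. In the kernel, apple_gmux would
 * come from apple_gmux_present(); the second field stands in for an
 * Optimus/ATPX-specific check (assumption, no real API modelled). */
struct platform {
	bool apple_gmux;
	bool optimus_or_atpx_hpd;
};

enum suspend_poll_policy {
	POLL_NOT_NEEDED,      /* no connector needs polling at all */
	STOP_POLL_ON_SUSPEND, /* platform reports hotplug while GPU is off */
	KEEP_POLLING,         /* only polling (waking the GPU) sees hotplug */
};

/* True if something other than the GPU raises hotplug events while the
 * GPU is powered down (gmux on MacBook Pros, per the thread). */
static bool platform_hpd_while_off(const struct platform *p)
{
	assert(p != NULL);
	return p->apple_gmux || p->optimus_or_atpx_hpd;
}

/* Decide what runtime suspend should do with the poll worker. */
static enum suspend_poll_policy
choose_policy(const struct platform *p, bool any_connector_polled)
{
	if (!any_connector_polled)
		return POLL_NOT_NEEDED;
	if (platform_hpd_while_off(p))
		return STOP_POLL_ON_SUSPEND;
	return KEEP_POLLING;
}
```

Under this model a desktop card or eGPU without such platform support
ends up in KEEP_POLLING, matching the point quoted above that userspace
still expects hotplug events there.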

Thanks,

Lukas


Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Daniel Vetter
On Mon, Feb 19, 2018 at 12:58:17PM +0100, Lukas Wunner wrote:
> On Mon, Feb 19, 2018 at 12:34:43PM +0100, Daniel Vetter wrote:
> > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > > Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> > > 
> > > DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> > > stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> > > the poll worker invokes the DRM drivers' ->detect callbacks, which call
> > > pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> > > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> > > with the intention of runtime resuming the device afterwards.  The result
> > > is a circular wait between poll worker and autosuspend worker.
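The circular wait described in the cover letter can be modelled as a
two-node wait-for graph. This is a userspace sketch, not kernel code:
enum task, struct wait_graph, in_circular_wait() and deadlocks() are
hypothetical names, and only the comments refer to the real
pm_runtime_get_sync() and cancel_delayed_work_sync() behaviour quoted
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: these names are hypothetical, not kernel API. */
enum task { POLL_WORKER, AUTOSUSPEND_WORKER, NR_TASKS };
#define NOBODY (-1)

/* waits_for[t] is the task that t is blocked on, or NOBODY. */
struct wait_graph {
	int waits_for[NR_TASKS];
};

/* Each task blocks on at most one other, so following the chain from t
 * either reaches a task that is not blocked or comes back to t, which
 * is a circular wait. */
static bool in_circular_wait(const struct wait_graph *g, int t)
{
	assert(g != NULL && t >= 0 && t < NR_TASKS);
	int cur = g->waits_for[t];
	for (int steps = 0; cur != NOBODY && steps < NR_TASKS; steps++) {
		if (cur == t)
			return true;
		cur = g->waits_for[cur];
	}
	return false;
}

static bool deadlocks(bool suspend_in_progress, bool poll_in_detect,
		      bool suspend_cancels_poll)
{
	struct wait_graph g = { { NOBODY, NOBODY } };

	/* ->detect calls pm_runtime_get_sync(); while a runtime suspend is
	 * in progress, that call waits for the suspend to finish so that
	 * it can resume the device afterwards. */
	if (poll_in_detect && suspend_in_progress)
		g.waits_for[POLL_WORKER] = AUTOSUSPEND_WORKER;

	/* ->runtime_suspend calls cancel_delayed_work_sync(), which waits
	 * for a poll worker that is already executing. */
	if (suspend_in_progress && suspend_cancels_poll && poll_in_detect)
		g.waits_for[AUTOSUSPEND_WORKER] = POLL_WORKER;

	return in_circular_wait(&g, POLL_WORKER);
}
```

Passing false as the last argument models not stopping the poll worker
on runtime suspend, which the cover letter says i915, malidp and msm do:
the poll worker may still wait for the suspend to finish, but nothing
waits on it, so there is no cycle (at the cost of resuming the GPU on
every poll).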
> > 
> > Don't shut down the poll worker when runtime suspending, that doesn't
> > work. If you need the poll work, then that means waking up the gpu every
> > few seconds. If you don't need it, then make sure the DRM_CONNECTOR_POLL
> > flags are set correctly (you can update them at runtime, the poll worker
> > will pick that up).
> > 
> > That should fix the deadlock, and it's how we do it in i915 (where igt in
> > CI totally hammers the runtime pm support, and it seems to hold up).
> 
> It would fix the deadlock but it's not an option on dual GPU laptops.
> Power consumption of the discrete GPU is massive (9 W on my machine).
> 
> 
> > > i915, malidp and msm "solved" this issue by not stopping the poll worker
> > > on runtime suspend.  But this results in the GPU bouncing back and forth
> > > between D0 and D3 continuously.  That's a total no-go for GPUs which
> > > runtime suspend to D3cold since every suspend/resume cycle costs a
> > > significant amount of time and energy.  (i915 and malidp do not seem
> > > to acquire a runtime PM ref in the ->detect callbacks, which seems
> > > questionable.  msm however does and would also deadlock if it disabled
> > > the poll worker on runtime suspend.  cc += Archit, Liviu, intel-gfx)
> > 
> > Well, userspace expects hotplug events, even when we runtime suspend
> > stuff. Hence waking shit up with polling. Imo ignoring hotplug events is a
> > pretty serious policy decision which is ok in the context of
> > vga_switcheroo, but not really as an automatic thing. E.g. usb also wakes
> > up if you plug something in, even with all the runtime pm stuff enabled.
> > Same for sata and everything else.
> 
> On the MacBook Pro, the HPD pin of external DP and HDMI ports goes into
> the gmux controller, which sends an interrupt on hotplug even if the GPU
> is powered down.
> 
> Apparently Optimus has a similar functionality, cf. 3a6536c51d5d.

Yeah, for vga_switcheroo and gmux setups shutting down polling explicitly
makes sense. I think ideally we'd stop polling in the gmux handler somehow
(maybe by dropping the relevant DRM_CONNECTOR_POLL flags, or outright
stopping it all). But not when runtime suspending the entire gpu (e.g.
idle system that shuts down the screen and everything, before it decides
a few minutes later to do a full system suspend).

I also think that this approach would lead to cleaner code: having
explicit checks for the (locking) execution context all over the place
tends to result in regrets eventually ime.

> For the rare cases where an external VGA or DVI-A port is present, I guess
> it's reasonable to have the user wake up the machine manually.
> 
> I'm not sure why nouveau polls ports on my laptop, the GK107 only has an
> LVDS and three DP ports, need to investigate.

Yeah, that'd be good to figure out. The probe helpers should shut down the
worker if there's no connector that needs probing. We use that to
enable/disable the poll worker when there's a hotplug storm on the irq
line.
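A simplified userspace model of the behaviour described above: the poll
worker only needs to be (re)scheduled while some connector is flagged for
connect/disconnect polling, and updating the flags at runtime (for
example when falling back from HPD to polling during a hotplug storm)
changes that answer. The flag names mirror the kernel's
DRM_CONNECTOR_POLL_* flags, but struct connector, poll_needed() and the
hpd_storm_*() helpers are hypothetical, and this is not the actual
probe-helper code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag names mirror the kernel's DRM connector poll flags; the bit
 * values here are this model's own. */
#define DRM_CONNECTOR_POLL_HPD        (1 << 0)
#define DRM_CONNECTOR_POLL_CONNECT    (1 << 1)
#define DRM_CONNECTOR_POLL_DISCONNECT (1 << 2)

/* Hypothetical stand-in for struct drm_connector (poll flags only). */
struct connector {
	const char *name;
	unsigned int polled;
};

/* The worker is only needed if some connector must be polled for
 * connect or disconnect; HPD connectors report changes via interrupt. */
static bool poll_needed(const struct connector *conns, size_t n)
{
	assert(conns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (conns[i].polled & (DRM_CONNECTOR_POLL_CONNECT |
				       DRM_CONNECTOR_POLL_DISCONNECT))
			return true;
	}
	return false;
}

/* Hypothetical hotplug-storm handling: stop trusting the HPD interrupt
 * and fall back to polling, then switch back once things calm down.
 * The next poll_needed() check picks up the new flags. */
static void hpd_storm_fallback(struct connector *c)
{
	c->polled = DRM_CONNECTOR_POLL_CONNECT | DRM_CONNECTOR_POLL_DISCONNECT;
}

static void hpd_storm_reenable(struct connector *c)
{
	c->polled = DRM_CONNECTOR_POLL_HPD;
}
```

For an unpolled panel plus HPD-only DP ports (loosely modelled on the
GK107 example above), poll_needed() returns false, so the worker would
not need to run; whether nouveau actually flags its connectors that way
is what Lukas said still needs investigating.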

Once that's fixed we can perhaps also untangle the poll-vs-gmux story.
-Daniel


Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Lukas Wunner
On Mon, Feb 19, 2018 at 12:48:04PM +0100, Daniel Vetter wrote:
> On Thu, Feb 15, 2018 at 06:38:44AM +0100, Lukas Wunner wrote:
> > On Wed, Feb 14, 2018 at 09:58:43AM -0500, Sean Paul wrote:
> > > On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote:
> > > > On 2018-02-14 03:08 PM, Sean Paul wrote:
> > > > > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
> > > > >> Op 14-02-18 om 09:46 schreef Lukas Wunner:
> > > > >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > > >  Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> > > > >>> This series has been reviewed, consent has been expressed by the most
> > > > >>> interested parties, patch [1/5] which touches files outside drivers/gpu
> > > > >>> has been acked and I've just sent out a v2 addressing the only objection
> > > > >>> raised.  My plan is thus to wait another two days for comments and,
> > > > >>> barring further objections, push to drm-misc this weekend.
> > > > >>>
> > > > >>> However I'm struggling with the decision whether to push to next or
> > > > >>> fixes.  The series is marked for stable, however the number of
> > > > >>> affected machines is limited and for an issue that's been present
> > > > >>> for 5 years it probably doesn't matter if it soaks another two months
> > > > >>> in linux-next before it gets backported.  Hence I tend to err on the
> > > > >>> side of caution and push to next, however a case could be made that
> > > > >>> fixes is more appropriate.
> > > > >>>
> > > > >>> I'm lacking experience making such decisions and would be interested
> > > > >>> to learn how you'd handle this.
> > > > >>
> > > > >> I would say fixes, it doesn't look particularly scary. :)
> > > > > 
> > > > > Agreed. If it's good enough for stable, it's good enough for -fixes!
> > > > 
> > > > It's not that simple, is it? Fast-tracking patches (some of which appear
> > > > to be untested) to stable without an immediate cause for urgency seems
> > > > risky to me.
> > > 
> > > /me should be more careful what he says
> > > 
> > > Given where we are in the release cycle, it's barely a fast track.
> > > If these go in -fixes, they'll get in -rc2 and will have plenty of
> > > time to bake. If we were at rc5, it might be a different story.
> > 
> > The patches are marked for stable though, so if they go in through
> > drm-misc-fixes, they may appear in stable kernels before 4.16-final
> > is out.  Greg picks up patches once they're in Linus' tree, though
> > often with a delay of a few days or weeks.  If they go in through
> > drm-misc-next, they're guaranteed not to appear in *any* release
> > before 4.16-final is out.
> > 
> > This allows for differentiation between no-brainer stable fixes that
> > can be sent immediately and scarier, but similarly important stable
> > fixes that should soak for a while.  I'm not sure which category
> > this series belongs to, though it's true what Maarten says, it's
> > not *that* grave a change.
> 
> If you're this concerned about them, then pls do _not_ put cc: stable on
> the patches. Instead get them merged through -fixes (or maybe even -next),
> and once they're sufficiently tested, send a mail to stable@ asking for
> an explicit backport.

I'm not concerned about them, but would have erred on the side of caution.
However consensus seems to have been that they're sufficiently unscary to
push to -fixes.  Do you disagree with that decision, if so, why?  Can we
amend the dim docs to codify guidelines whether to push to -fixes or -next?
I allowed 1 week for comments. Now you're returning from vacation and seem
to be unhappy; was 1 week too short?

Thanks,

Lukas


Re: [Nouveau] [Intel-gfx] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-19 Thread Daniel Vetter
On Thu, Feb 15, 2018 at 06:38:44AM +0100, Lukas Wunner wrote:
> On Wed, Feb 14, 2018 at 09:58:43AM -0500, Sean Paul wrote:
> > On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote:
> > > On 2018-02-14 03:08 PM, Sean Paul wrote:
> > > > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
> > > >> Op 14-02-18 om 09:46 schreef Lukas Wunner:
> > > >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > >  Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> > > >>> This series has been reviewed, consent has been expressed by the most
> > > >>> interested parties, patch [1/5] which touches files outside drivers/gpu
> > > >>> has been acked and I've just sent out a v2 addressing the only objection
> > > >>> raised.  My plan is thus to wait another two days for comments and,
> > > >>> barring further objections, push to drm-misc this weekend.
> > > >>>
> > > >>> However I'm struggling with the decision whether to push to next or
> > > >>> fixes.  The series is marked for stable, however the number of
> > > >>> affected machines is limited and for an issue that's been present
> > > >>> for 5 years it probably doesn't matter if it soaks another two months
> > > >>> in linux-next before it gets backported.  Hence I tend to err on the
> > > >>> side of caution and push to next, however a case could be made that
> > > >>> fixes is more appropriate.
> > > >>>
> > > >>> I'm lacking experience making such decisions and would be interested
> > > >>> to learn how you'd handle this.
> > > >>
> > > >> I would say fixes, it doesn't look particularly scary. :)
> > > > 
> > > > Agreed. If it's good enough for stable, it's good enough for -fixes!
> > > 
> > > It's not that simple, is it? Fast-tracking patches (some of which appear
> > > to be untested) to stable without an immediate cause for urgency seems
> > > risky to me.
> > 
> > /me should be more careful what he says
> > 
> > Given where we are in the release cycle, it's barely a fast track.
> > If these go in -fixes, they'll get in -rc2 and will have plenty of
> > time to bake. If we were at rc5, it might be a different story.
> 
> The patches are marked for stable though, so if they go in through
> drm-misc-fixes, they may appear in stable kernels before 4.16-final
> is out.  Greg picks up patches once they're in Linus' tree, though
> often with a delay of a few days or weeks.  If they go in through
> drm-misc-next, they're guaranteed not to appear in *any* release
> before 4.16-final is out.
> 
> This allows for differentiation between no-brainer stable fixes that
> can be sent immediately and scarier, but similarly important stable
> fixes that should soak for a while.  I'm not sure which category
> this series belongs to, though it's true what Maarten says, it's
> not *that* grave a change.

If you're this concerned about them, then pls do _not_ put cc: stable on
the patches. Instead get them merged through -fixes (or maybe even -next),
and once they're sufficiently tested, send a mail to stable@ asking for
an explicit backport.

Stuff that's marked for stable must be obvious and tested enough for
backporting right away (which doesn't seem to be the case here).
-Daniel