[Nouveau] [Bug 103897] Kernel 4.14 causes high cpu usage, 4.12 was OK

2018-02-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103897

--- Comment #7 from Andrew Randrianasulu  ---
This bug is still around for me with 4.15.0

-- 
You are receiving this mail because:
You are the assignee for the bug.___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Lukas Wunner
On Wed, Feb 14, 2018 at 09:58:43AM -0500, Sean Paul wrote:
> On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote:
> > On 2018-02-14 03:08 PM, Sean Paul wrote:
> > > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
> > >> Op 14-02-18 om 09:46 schreef Lukas Wunner:
> > >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > >>>> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> > >>> This series has been reviewed, consent has been expressed by the most
> > >>> interested parties, patch [1/5] which touches files outside drivers/gpu
> > >>> has been acked and I've just sent out a v2 addressing the only objection
> > >>> raised.  My plan is thus to wait another two days for comments and,
> > >>> barring further objections, push to drm-misc this weekend.
> > >>>
> > >>> However I'm struggling with the decision whether to push to next or
> > >>> fixes.  The series is marked for stable, however the number of
> > >>> affected machines is limited and for an issue that's been present
> > >>> for 5 years it probably doesn't matter if it soaks another two months
> > >>> in linux-next before it gets backported.  Hence I tend to err on the
> > >>> side of caution and push to next, however a case could be made that
> > >>> fixes is more appropriate.
> > >>>
> > >>> I'm lacking experience making such decisions and would be interested
> > >>> to learn how you'd handle this.
> > >>
> > >> I would say fixes, it doesn't look particularly scary. :)
> > > 
> > > Agreed. If it's good enough for stable, it's good enough for -fixes!
> > 
> > It's not that simple, is it? Fast-tracking patches (some of which appear
> > to be untested) to stable without an immediate cause for urgency seems
> > risky to me.
> 
> /me should be more careful what he says
> 
> Given where we are in the release cycle, it's barely a fast track.
> If these go in -fixes, they'll get in -rc2 and will have plenty of
> time to bake. If we were at rc5, it might be a different story.

The patches are marked for stable though, so if they go in through
drm-misc-fixes, they may appear in stable kernels before 4.16-final
is out.  Greg picks up patches once they're in Linus' tree, though
often with a delay of a few days or weeks.  If they go in through
drm-misc-next, they're guaranteed not to appear in *any* release
before 4.16-final is out.

This allows for differentiation between no-brainer stable fixes that
can be sent immediately and scarier, but similarly important stable
fixes that should soak for a while.  I'm not sure which category
this series belongs to, though it's true what Maarten says, it's
not *that* grave a change.

Thanks,

Lukas


Re: [Nouveau] Addressing the problem of noisy GPUs under Nouveau

2018-02-14 Thread Martin Peres
On 07/02/18 05:31, John Hubbard wrote:
> On 01/28/2018 04:05 PM, Martin Peres wrote:
>> On 29/01/18 01:24, Martin Peres wrote:
>>> On 28/11/17 07:32, John Hubbard wrote:
>>>> On 11/23/2017 02:48 PM, Martin Peres wrote:
>>>>> On 23/11/17 10:06, John Hubbard wrote:
>>>>>> On 11/22/2017 05:07 PM, Martin Peres wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>> Thanks for your answer, Andy!
>>>>>>>
>>>>>>> On 22/11/17 04:06, Ilia Mirkin wrote:
>>>>>>>> On Tue, Nov 21, 2017 at 8:29 PM, Andy Ritger wrote:
>>>>>>>> Martin's question was very long, but it boils down to this:
>>>>>>>>
>>>>>>>> How do we compute the correct values to write into the e114/e118 pwm
>>>>>>>> registers based on the VBIOS contents and current state of the board
>>>>>>>> (like temperature).
>>>>>>>
>>>>>>> Unfortunately, it can also be the e11c/e120 couple, or 0x200d8/dc on
>>>>>>> GF119+, or 0x200cd/d0 on Kepler+.
>>>>>>>
>>>>>>> At least, it looks like we know which PWM controller we need to drive, so
>>>>>>> I did not want to muddy the water even more by giving register
>>>>>>> addresses, rather concentrating on the problem at hand: How to compute
>>>>>>> the duty value for the PWM controller.
>>>>>>>
>>>>>>>>
>>>>>>>> We generally do this right, but appear to get it extra-wrong for
>>>>>>>> certain GPUs.
>>>>>>>
>>>>>>> Yes... So far, we are always safe, but users tend to mind when their
>>>>>>> computer sounds like a jumbo jet at take off... Who would have thought? :D
>>>>>>>
>>>>>>> Anyway, looking forward to your answer!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Martin
>>>>>>
> [...]
> 
> Hi Martin,
> 
> I strongly suspect you are seeing a special behavior, which is: on
> some GF108 boards we use only a very limited range of PWM,
> 0.4 to 2.5%, due to the particular type of DC power conversion
> circuit on those boards. However, it could also just be difficulties
> in interpreting the fixed-point variables in the tables. In either
> case, the answer is to explain those formats, so I'll do that now.
> 
> I am attaching the fan cooler table, in HTML format. We have also
> published the BIT (BIOS Information Table) format separately:
> 
> http://download.nvidia.com/open-gpu-doc/BIOS-Information-Table/1/BIOS-Information-Table.html
> 
> I don't think it has any surprises for you in this regard, but you
> can check it to be sure you're looking at the right subtable,
> just in case.
> 
> The interesting parts of that table are:
> 
> PWM Scale Slope (16 bits):
> 
>   Slope to scale effective PWM to actual PWM (1/4096, F4.12, signed).
>   For backwards compatibility, a value of 0.0 (0x0000) is interpreted as 1.0
>   (0x1000).
>   This value is used to scale the effective PWM duty cycle, a conceptual
>   fraction of full speed (0% to 100%), to the actual electrical PWM duty cycle.
>   PWM(actual) = Slope × PWM(effective) + Offset
> 
> PWM Scale Offset (16 bits):
> 
>   Offset to scale effective PWM to actual PWM (1/4096, F4.12, signed).
>   This value is used to scale the effective PWM duty cycle, a conceptual
>   fraction of full speed (0% to 100%), to the actual electrical PWM duty cycle.
>   PWM(actual) = Slope × PWM(effective) + Offset
> 
> 
> However, the calculations are hard to get right, and the table stores
> values in fixed-point format, so I'm showing a few simplified code excerpts
> that use these. The various fixed point macro definitions are found as part of
> our normal driver package, in nvmisc.h and nvtypes.h. Any other definitions
> that you need are included right here (I ran a quick compiler check to be
> sure.)
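
The code excerpts mentioned in the quoted message are not reproduced in this
archive. As a rough illustration only, the formula from the table description,
PWM(actual) = Slope × PWM(effective) + Offset, could be evaluated in F4.12
fixed point as sketched below. This is not the NVIDIA code: the function name
pwm_effective_to_actual(), the F4_12_ONE macro, the choice to pass the
effective duty cycle in F4.12 as well, and the truncating division are all
assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* F4.12 signed fixed point: 12 fractional bits, so 1.0 is 0x1000 (4096). */
#define F4_12_ONE 0x1000

/*
 * Hypothetical helper: scale an effective duty cycle to the actual
 * electrical duty cycle. All three quantities are in F4.12, with
 * "effective" expected in the range 0.0 .. 1.0 (0 .. 0x1000).
 */
static int32_t pwm_effective_to_actual(int16_t slope, int16_t offset,
                                       int32_t effective)
{
    /* Per the table description, a slope of 0.0 is read as 1.0. */
    int32_t s = (slope == 0) ? F4_12_ONE : slope;

    assert(effective >= 0 && effective <= F4_12_ONE);

    /*
     * The product of two F4.12 values has 24 fractional bits; dividing
     * by 4096 brings it back to 12. Division is used instead of a shift
     * so that negative slopes truncate in a well-defined way (assumed
     * rounding mode, the real driver may round differently).
     */
    return (s * effective) / F4_12_ONE + offset;
}
```

With a slope of 0.5 (0x0800) and an offset of 0.0625 (0x0100), a 100%
effective duty cycle maps to 0x0900, i.e. 56.25% actual. The sketch does not
clamp the result; whether and where clamping happens is not stated in the
quoted description.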

Wow John, thanks a lot! Sorry for the delay, I was on vacation when you
posted this, but this definitely is what I was looking for!

Thanks a lot for the code example, I will try to make use of it soon and
come back to you if I still have issues!

Martin


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Meelis Roos
> Actually this was brought up to me already, there's a fix on the mailing list
> for this I reviewed a little while ago from nvidia that we should pull in:
> 
> https://patchwork.freedesktop.org/patch/203205/
> 
> Would you guys mind confirming that this patch fixes your issues?

It works on my amd64, P4 is still compiling.

[1.124987] nouveau :04:05.0: NVIDIA NV05 (20154000)
[1.161464] nouveau :04:05.0: bios: version 03.05.00.10.00
[1.161475] nouveau :04:05.0: bios: DCB table not found
[1.161535] nouveau :04:05.0: bios: DCB table not found
[1.161577] nouveau :04:05.0: bios: DCB table not found
[1.161586] nouveau :04:05.0: bios: DCB table not found
[1.344008] tsc: Refined TSC clocksource calibration: 2200.078 MHz
[1.344024] clocksource: tsc: mask: 0x max_cycles: 0x1fb67c69f81, max_idle_ns: 440795210317 ns
[1.344037] clocksource: Switched to clocksource tsc
[1.408102] nouveau :04:05.0: tmr: unknown input clock freq
[1.409471] nouveau :04:05.0: fb: 32 MiB SDRAM
[1.414459] nouveau :04:05.0: DRM: VRAM: 31 MiB
[1.414467] nouveau :04:05.0: DRM: GART: 128 MiB
[1.414476] nouveau :04:05.0: DRM: BMP version 5.17
[1.414484] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.415629] nouveau :04:05.0: DRM: Adaptor not initialised, running VBIOS init tables.
[1.415829] nouveau :04:05.0: bios: DCB table not found
[1.416125] nouveau :04:05.0: DRM: Saving VGA fonts
[1.477526] nouveau :04:05.0: DRM: No DCB data found in VBIOS
[1.478428] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[1.478438] [drm] Driver supports precise vblank timestamp query.
[1.479618] nouveau :04:05.0: DRM: MM: using M2MF for buffer copies
[1.517930] nouveau :04:05.0: DRM: allocated 1024x768 fb: 0x4000, bo a09f4d1f
[1.519294] nouveau :04:05.0: fb1: nouveaufb frame buffer device
[1.519313] [drm] Initialized nouveau 1.3.1 20120801 for :04:05.0 on minor 1


-- 
Meelis Roos (mr...@linux.ee)


[Nouveau] [Bug 105097] Computer hangs only mouse moves

2018-02-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105097

Dmitry Yakimov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|Other                       |x86-64 (AMD64)
                 OS|All                         |Linux (All)



[Nouveau] [Bug 105097] Computer hangs only mouse moves

2018-02-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105097

--- Comment #1 from Dmitry Yakimov  ---
Created attachment 137361
  --> https://bugs.freedesktop.org/attachment.cgi?id=137361&action=edit
XOrg log



[Nouveau] [Bug 105097] New: Computer hangs only mouse moves

2018-02-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105097

            Bug ID: 105097
           Summary: Computer hangs only mouse moves
           Product: xorg
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Driver/nouveau
          Assignee: nouveau@lists.freedesktop.org
          Reporter: yaru...@gmail.com
        QA Contact: xorg-t...@lists.x.org

Created attachment 137360
  --> https://bugs.freedesktop.org/attachment.cgi?id=137360&action=edit
dmesg log

I have attached the dmesg output and found some messages in the syslog from
just before the freeze:

Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [   73.308412] nouveau :01:00.0: fifo: write fault at 24 engine 00 [GR] client 0f [GPC0/PROP_0] reason 02 [PTE] on channel 2 [003fbf8000 Xorg[1036]]
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [   73.308436] nouveau :01:00.0: fifo: channel 2: killed
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [   73.308440] nouveau :01:00.0: fifo: runlist 0: scheduled for recovery
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [   73.308446] nouveau :01:00.0: fifo: engine 0: scheduled for recovery
Feb 14 22:24:28 dmitry-Aspire-X3470 kernel: [   73.308539] nouveau :01:00.0: Xorg[1036]: channel 2 killed!



Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Lyude Paul
Actually this was brought up to me already, there's a fix on the mailing list
for this I reviewed a little while ago from nvidia that we should pull in:

https://patchwork.freedesktop.org/patch/203205/

Would you guys mind confirming that this patch fixes your issues?

On Wed, 2018-02-14 at 18:41 +0100, Pierre Moreau wrote:
> On 2018-02-14 — 09:36, Ilia Mirkin wrote:
> > On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin  wrote:
> > > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos  wrote:
> > > > > This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in
> > > > > 4.15:
> > > > 
> > > > NV5 in another PC (secondary card in x86-64) made the system crash on
> > > > boot, in nvkm_therm_clkgate_fini.
> > > 
> > > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > > more exactly which thing is dying. If you have a cross-compile/distcc
> > > setup handy, a bisect may be even more useful.
> > 
> > Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> > somehow mis-hooked up for NV5 now. A bisect result would still make
> > the culprit a lot more obvious.
> 
> CC’ing Lyude Paul as she hooked up the clockgating support.
> 
> Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t
> nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
> nvkm_therm_clkgate_oneinit() all check for therm being not NULL, on top of
> their check for the clkgate_* hooks being there? Or instead, maybe have the
> check in nvkm_device_init()?
> 
> Pierre
-- 
Cheers,
Lyude Paul


Re: [Nouveau] [PATCH v2] drm: Allow determining if current task is output poll worker

2018-02-14 Thread Lyude Paul
I think your idea of having the extra kerneldoc as a separate patch to make
this easier to backport should work fine :). Thanks for the good work!

Reviewed-by: Lyude Paul 

On Wed, 2018-02-14 at 08:41 +0100, Lukas Wunner wrote:
> Introduce a helper to determine if the current task is an output poll
> worker.
> 
> This allows us to fix a long-standing deadlock in several DRM drivers
> wherein the ->runtime_suspend callback waits for the output poll worker
> to finish and the worker in turn calls a ->detect callback which waits
> for runtime suspend to finish.  The ->detect callback is invoked from
> multiple call sites and waiting for runtime suspend to finish is the
> correct thing to do except if it's executing in the context of the
> worker.
> 
> v2: Expand kerneldoc to specifically mention deadlock between
> output poll worker and autosuspend worker as use case. (Lyude)
> 
> Cc: Dave Airlie 
> Cc: Ben Skeggs 
> Cc: Alex Deucher 
> Reviewed-by: Lyude Paul 
> Signed-off-by: Lukas Wunner 
> ---
>  drivers/gpu/drm/drm_probe_helper.c | 20 ++++++++++++++++++++
>  include/drm/drm_crtc_helper.h  |  1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_probe_helper.c
> b/drivers/gpu/drm/drm_probe_helper.c
> index 6dc2dde5b672..7a6b2dc08913 100644
> --- a/drivers/gpu/drm/drm_probe_helper.c
> +++ b/drivers/gpu/drm/drm_probe_helper.c
> @@ -654,6 +654,26 @@ static void output_poll_execute(struct work_struct
> *work)
>   schedule_delayed_work(delayed_work,
> DRM_OUTPUT_POLL_PERIOD);
>  }
>  
> +/**
> + * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
> + *
> + * Determine if %current task is an output poll worker.  This can be used
> + * to select distinct code paths for output polling versus other contexts.
> + *
> + * One use case is to avoid a deadlock between the output poll worker and
> + * the autosuspend worker wherein the latter waits for polling to finish
> + * upon calling drm_kms_helper_poll_disable(), while the former waits for
> + * runtime suspend to finish upon calling pm_runtime_get_sync() in a
> + * connector ->detect hook.
> + */
> +bool drm_kms_helper_is_poll_worker(void)
> +{
> + struct work_struct *work = current_work();
> +
> + return work && work->func == output_poll_execute;
> +}
> +EXPORT_SYMBOL(drm_kms_helper_is_poll_worker);
> +
>  /**
>   * drm_kms_helper_poll_disable - disable output polling
>   * @dev: drm_device
> diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h
> index 76e237bd989b..6914633037a5 100644
> --- a/include/drm/drm_crtc_helper.h
> +++ b/include/drm/drm_crtc_helper.h
> @@ -77,5 +77,6 @@ void drm_kms_helper_hotplug_event(struct drm_device *dev);
>  
>  void drm_kms_helper_poll_disable(struct drm_device *dev);
>  void drm_kms_helper_poll_enable(struct drm_device *dev);
> +bool drm_kms_helper_is_poll_worker(void);
>  
>  #endif
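
To make the intended use of the new helper concrete, here is a small
userspace model of it. The body of drm_kms_helper_is_poll_worker() mirrors
the quoted patch; everything else is a stand-in invented for this sketch:
the reduced struct work_struct, the model_current_work variable that
current_work() returns, and model_detect_pm_action(), which only illustrates
the assumed idea that a ->detect hook would skip waiting on runtime PM when
it runs in the poll worker. It is not taken from any driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct work_struct (model only). */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Hypothetical: the work item the "current task" is executing, or NULL. */
static struct work_struct *model_current_work;
static int model_poll_runs;

/* Stand-in for current_work() as introduced by patch 1/5 of the series. */
static struct work_struct *current_work(void)
{
	return model_current_work;
}

/* Stand-in for the output poll worker's work function. */
static void output_poll_execute(struct work_struct *work)
{
	assert(work != NULL);
	model_poll_runs++;
}

/* Hypothetical unrelated work function, for contrast. */
static void model_other_work(struct work_struct *work)
{
	(void)work;
}

/* Same logic as the helper added by the quoted patch. */
static bool drm_kms_helper_is_poll_worker(void)
{
	struct work_struct *work = current_work();

	return work && work->func == output_poll_execute;
}

enum model_pm_action { MODEL_PM_GET_SYNC, MODEL_PM_SKIP_GET_SYNC };

/*
 * Hypothetical decision a ->detect hook could make: only wait for
 * runtime PM when not running in the poll worker, since that wait is
 * one half of the circular wait described in the commit message.
 */
static enum model_pm_action model_detect_pm_action(void)
{
	return drm_kms_helper_is_poll_worker() ? MODEL_PM_SKIP_GET_SYNC
					       : MODEL_PM_GET_SYNC;
}
```

The check is a plain function-pointer comparison, so it costs nothing when
the caller is not a workqueue worker at all (current_work() is NULL).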
-- 
Cheers,
Lyude Paul


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Pierre Moreau
On 2018-02-14 — 09:36, Ilia Mirkin wrote:
> On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin  wrote:
> > On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos  wrote:
> >>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
> >>
> >> NV5 in another PC (secondary card in x86-64) made the system crash on
> >> boot, in nvkm_therm_clkgate_fini.
> >
> > Mind booting with nouveau.debug=trace? That should hopefully tell us
> > more exactly which thing is dying. If you have a cross-compile/distcc
> > setup handy, a bisect may be even more useful.
> 
> Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
> somehow mis-hooked up for NV5 now. A bisect result would still make
> the culprit a lot more obvious.

CC’ing Lyude Paul as she hooked up the clockgating support.

Looking at the code, only NV40+ have a therm engine. Therefore, shouldn’t
nvkm_therm_clkgate_enable(), nvkm_therm_clkgate_fini() and
nvkm_therm_clkgate_oneinit() all check for therm being not NULL, on top of
their check for the clkgate_* hooks being there? Or instead, maybe have the
check in nvkm_device_init()?
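
A minimal model of the guard suggested above, assuming the wrapper should
return early when there is no therm object as well as when the hook is
missing. The model_* types and functions are hypothetical and only mimic the
shape of such a wrapper; they are not the nouveau structures and this is not
the patch referenced elsewhere in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_therm;

/* Hypothetical per-chipset hook table; a hook may be absent (NULL). */
struct model_therm_func {
	void (*clkgate_fini)(struct model_therm *therm, bool suspend);
};

/* Hypothetical therm object; chipsets without one are modelled as NULL. */
struct model_therm {
	const struct model_therm_func *func;
	int fini_calls;
};

/* Example hook: it dereferences therm, which is why callers must guard. */
static void model_hw_clkgate_fini(struct model_therm *therm, bool suspend)
{
	(void)suspend;
	assert(therm != NULL);
	therm->fini_calls++;
}

/*
 * Guarded wrapper: check the object first, then the hook.
 * Returns true only if the hook was actually invoked.
 */
static bool model_therm_clkgate_fini(struct model_therm *therm, bool suspend)
{
	if (!therm || !therm->func || !therm->func->clkgate_fini)
		return false;

	therm->func->clkgate_fini(therm, suspend);
	return true;
}
```

Placing the NULL test in each wrapper keeps the callers unchanged; doing the
check once at the call site, as the alternative in the message proposes,
would trade that for fewer repeated tests.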

Pierre




Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Mike Lothian
On 12 February 2018 at 03:39, Lukas Wunner  wrote:
> On Mon, Feb 12, 2018 at 12:35:51AM +0000, Mike Lothian wrote:
>> I've not been able to reproduce the original problem you're trying to
>> solve on amdgpu, with or without your patch set, and with the above
>> "trigger" too.
>>
>> Is anything else required to trigger it? I started multiple DRI_PRIME
>> glxgears: in parallel, serially waiting the 12 seconds, and serially
>> within the 12 seconds, and I couldn't reproduce it.
>
> The discrete GPU needs to runtime suspend, that's the trigger,
> so no DRI_PRIME executables should be running.  Just let it
> autosuspend after boot.  Do you see "waiting 12 sec" messages
> in dmesg?  If not it's not autosuspending.
>
> Thanks,
>
> Lukas

Hi,

Yes, I'm seeing those messages; I'm just not seeing the hangs.

I've attached the dmesg in case you're interested.

Regards

Mike


dmesg.nohangs
Description: Binary data
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH 1/5] workqueue: Allow retrieval of current task's work struct

2018-02-14 Thread Tejun Heo
Hello,

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> Introduce a helper to retrieve the current task's work struct if it is
> a workqueue worker.
> 
> This allows us to fix a long-standing deadlock in several DRM drivers
> wherein the ->runtime_suspend callback waits for a specific worker to
> finish and that worker in turn calls a function which waits for runtime
> suspend to finish.  That function is invoked from multiple call sites
> and waiting for runtime suspend to finish is the correct thing to do
> except if it's executing in the context of the worker.
> 
> Cc: Tejun Heo 
> Cc: Lai Jiangshan 
> Cc: Dave Airlie 
> Cc: Ben Skeggs 
> Cc: Alex Deucher 
> Signed-off-by: Lukas Wunner 

I wonder whether it's too generic a name but there are other functions
named in a similar fashion and AFAICS current_work isn't used by
anyone in the tree, so it seems okay.

Acked-by: Tejun Heo 

Please feel free to route as you see fit.

Thanks.

-- 
tejun


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Sean Paul
On Wed, Feb 14, 2018 at 03:43:56PM +0100, Michel Dänzer wrote:
> On 2018-02-14 03:08 PM, Sean Paul wrote:
> > On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
> >> Op 14-02-18 om 09:46 schreef Lukas Wunner:
> >>> Dear drm-misc maintainers,
> >>>
> >>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> >>>> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> >>> This series has been reviewed, consent has been expressed by the most
> >>> interested parties, patch [1/5] which touches files outside drivers/gpu
> >>> has been acked and I've just sent out a v2 addressing the only objection
> >>> raised.  My plan is thus to wait another two days for comments and,
> >>> barring further objections, push to drm-misc this weekend.
> >>>
> >>> However I'm struggling with the decision whether to push to next or
> >>> fixes.  The series is marked for stable, however the number of
> >>> affected machines is limited and for an issue that's been present
> >>> for 5 years it probably doesn't matter if it soaks another two months
> >>> in linux-next before it gets backported.  Hence I tend to err on the
> >>> side of caution and push to next, however a case could be made that
> >>> fixes is more appropriate.
> >>>
> >>> I'm lacking experience making such decisions and would be interested
> >>> to learn how you'd handle this.
> >>>
> >>> Thanks,
> >>>
> >>> Lukas
> >>
> >> I would say fixes, it doesn't look particularly scary. :)
> > 
> > Agreed. If it's good enough for stable, it's good enough for -fixes!
> 
> It's not that simple, is it? Fast-tracking patches (some of which appear
> to be untested) to stable without an immediate cause for urgency seems
> risky to me.
> 

/me should be more careful what he says

Given where we are in the release cycle, it's barely a fast track. If these go
in -fixes, they'll get in -rc2 and will have plenty of time to bake. If we were
at rc5, it might be a different story.

Sean

> 
> -- 
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer

-- 
Sean Paul, Software Engineer, Google / Chromium OS


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Michel Dänzer
On 2018-02-14 03:08 PM, Sean Paul wrote:
> On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
>> Op 14-02-18 om 09:46 schreef Lukas Wunner:
>>> Dear drm-misc maintainers,
>>>
>>> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
>>>> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
>>> This series has been reviewed, consent has been expressed by the most
>>> interested parties, patch [1/5] which touches files outside drivers/gpu
>>> has been acked and I've just sent out a v2 addressing the only objection
>>> raised.  My plan is thus to wait another two days for comments and,
>>> barring further objections, push to drm-misc this weekend.
>>>
>>> However I'm struggling with the decision whether to push to next or
>>> fixes.  The series is marked for stable, however the number of
>>> affected machines is limited and for an issue that's been present
>>> for 5 years it probably doesn't matter if it soaks another two months
>>> in linux-next before it gets backported.  Hence I tend to err on the
>>> side of caution and push to next, however a case could be made that
>>> fixes is more appropriate.
>>>
>>> I'm lacking experience making such decisions and would be interested
>>> to learn how you'd handle this.
>>>
>>> Thanks,
>>>
>>> Lukas
>>
>> I would say fixes, it doesn't look particularly scary. :)
> 
> Agreed. If it's good enough for stable, it's good enough for -fixes!

It's not that simple, is it? Fast-tracking patches (some of which appear
to be untested) to stable without an immediate cause for urgency seems
risky to me.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Ilia Mirkin
On Wed, Feb 14, 2018 at 9:35 AM, Ilia Mirkin  wrote:
> On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos  wrote:
>>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>>
>> NV5 in another PC (secondary card in x86-64) made the system crash on
>> boot, in nvkm_therm_clkgate_fini.
>
> Mind booting with nouveau.debug=trace? That should hopefully tell us
> more exactly which thing is dying. If you have a cross-compile/distcc
> setup handy, a bisect may be even more useful.

Erm, sorry, nevermind. You even said it -- nvkm_therm_clkgate_fini is
somehow mis-hooked up for NV5 now. A bisect result would still make
the culprit a lot more obvious.


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Ilia Mirkin
On Wed, Feb 14, 2018 at 9:29 AM, Meelis Roos  wrote:
>> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:
>
> NV5 in another PC (secondary card in x86-64) made the system crash on
> boot, in nvkm_therm_clkgate_fini.

Mind booting with nouveau.debug=trace? That should hopefully tell us
more exactly which thing is dying. If you have a cross-compile/distcc
setup handy, a bisect may be even more useful.

It's funny, I had a NV5 plugged into my desktop for testing, and
*just* took it out (because the box wouldn't even get to BIOS anymore
... although it was unrelated to the NV5, probably just something
mis-seated.)

  -ilia


Re: [Nouveau] 4.16-rc1: UBSAN warning in nouveau/nvkm/subdev/therm/base.c + oops in nvkm_therm_clkgate_fini

2018-02-14 Thread Meelis Roos
> This is 4.16-rc1+todays git on a lowly P4 with NV5, worked fine in 4.15:

NV5 in another PC (secondary card in x86-64) made the system crash on
boot, in nvkm_therm_clkgate_fini.

-- 
Meelis Roos (mr...@linux.ee)


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Sean Paul
On Wed, Feb 14, 2018 at 10:26:35AM +0100, Maarten Lankhorst wrote:
> Op 14-02-18 om 09:46 schreef Lukas Wunner:
> > Dear drm-misc maintainers,
> >
> > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> >> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> > This series has been reviewed, consent has been expressed by the most
> > interested parties, patch [1/5] which touches files outside drivers/gpu
> > has been acked and I've just sent out a v2 addressing the only objection
> > raised.  My plan is thus to wait another two days for comments and,
> > barring further objections, push to drm-misc this weekend.
> >
> > However I'm struggling with the decision whether to push to next or
> > fixes.  The series is marked for stable, however the number of
> > affected machines is limited and for an issue that's been present
> > for 5 years it probably doesn't matter if it soaks another two months
> > in linux-next before it gets backported.  Hence I tend to err on the
> > side of caution and push to next, however a case could be made that
> > fixes is more appropriate.
> >
> > I'm lacking experience making such decisions and would be interested
> > to learn how you'd handle this.
> >
> > Thanks,
> >
> > Lukas
> 
> I would say fixes, it doesn't look particularly scary. :)

Agreed. If it's good enough for stable, it's good enough for -fixes!

Sean

> 
> ~Maarten
> 

-- 
Sean Paul, Software Engineer, Google / Chromium OS


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Lukas Wunner
On Tue, Feb 13, 2018 at 03:46:08PM +0000, Liviu Dudau wrote:
> On Tue, Feb 13, 2018 at 12:52:06PM +0100, Lukas Wunner wrote:
> > On Tue, Feb 13, 2018 at 10:55:06AM +0000, Liviu Dudau wrote:
> > > On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> > > > DRM drivers poll connectors in 10 sec intervals.  The poll worker is
> > > > stopped on ->runtime_suspend with cancel_delayed_work_sync().  However
> > > > the poll worker invokes the DRM drivers' ->detect callbacks, which call
> > > > pm_runtime_get_sync().  If the poll worker starts after runtime suspend
> > > > has begun, pm_runtime_get_sync() will wait for runtime suspend to finish
> > > > with the intention of runtime resuming the device afterwards.  The result
> > > > is a circular wait between poll worker and autosuspend worker.
> > > 
> > > I think I understand the problem you are trying to solve, but I'm
> > > struggling to understand where malidp makes any specific mistakes. First
> > > of all, malidp is only a display engine, so there is no GPU attached to
> > > it, but that is only a small clarification. Second, malidp doesn't use
> > > directly any of the callbacks that you are referring to, it uses the
> > > drm_cma_() API plus the generic drm_() call. So if there are any
> > > issues there (as they might well be) I think they would apply to a lot
> > > more drivers and the fix will involve more than just malidp, i915 and
> > > msm.
[snip]
> > There are no ->detect hooks declared
> > in drivers/gpu/drm/arm/, so it's unclear to me whether you're able to probe
> > during runtime suspend.
> 
> That's because the drivers in drivers/gpu/drm/arm do not have
> connectors, they are only the CRTC part of the driver. Both hdlcd and
> mali-dp use the component framework to locate an encoder in device tree
> that will then provide the connectors.
> 
> > 
> > hdlcd_drv.c and malidp_drv.c both enable output polling.  Output polling
> > is only necessary if you don't get HPD interrupts.
> 
> That's right, hdlcd and mali-dp don't receive HPD interrupts because
> they don't have any. And because we don't know ahead of time which
> encoder/connector will be paired with the driver, we enable polling as a
> safe fallback.
> 

Looking e.g. at inno_hdmi.c (used by rk3036.dtsi), this calls
drm_helper_hpd_irq_event() on receiving an HPD interrupt, and that
function returns immediately if polling is not enabled.  So you *have*
to enable polling to receive HPD events.

You seem to keep the crtc runtime active as long as it's bound to an
encoder.  If you do not ever intend to runtime suspend the crtc while
an encoder is attached, you don't need to keep polling enabled during
runtime suspend (because there's nothing to poll), but it shouldn't
hurt either.

If you would runtime suspend while an encoder is attached, then you
would only runtime resume every 10 sec (upon polling) if the encoder
was a child of the crtc and would support runtime suspend as well.
That's because the PM core wakes the parent by default when a child
runtime resumes.  However in the DT's I've looked at, the encoder
is never a child of the crtc and at least inno_hdmi.c doesn't use
runtime suspend.

So I think you're all green, I can't spot any grave issues here.
Just be aware of the above-mentioned constraints.

Thanks,

Lukas


Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Maarten Lankhorst
Op 14-02-18 om 09:46 schreef Lukas Wunner:
> Dear drm-misc maintainers,
>
> On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
>> Fix a deadlock on hybrid graphics laptops that's been present since 2013:
> This series has been reviewed, consent has been expressed by the most
> interested parties, patch [1/5] which touches files outside drivers/gpu
> has been acked and I've just sent out a v2 addressing the only objection
> raised.  My plan is thus to wait another two days for comments and,
> barring further objections, push to drm-misc this weekend.
>
> However I'm struggling with the decision whether to push to next or
> fixes.  The series is marked for stable, however the number of
> affected machines is limited and for an issue that's been present
> for 5 years it probably doesn't matter if it soaks another two months
> in linux-next before it gets backported.  Hence I tend to err on the
> side of caution and push to next, however a case could be made that
> fixes is more appropriate.
>
> I'm lacking experience making such decisions and would be interested
> to learn how you'd handle this.
>
> Thanks,
>
> Lukas

I would say fixes, it doesn't look particularly scary. :)

~Maarten



Re: [Nouveau] [PATCH 0/5] Fix deadlock on runtime suspend in DRM drivers

2018-02-14 Thread Lukas Wunner
Dear drm-misc maintainers,

On Sun, Feb 11, 2018 at 10:38:28AM +0100, Lukas Wunner wrote:
> Fix a deadlock on hybrid graphics laptops that's been present since 2013:

This series has been reviewed, consent has been expressed by the most
interested parties, patch [1/5] which touches files outside drivers/gpu
has been acked and I've just sent out a v2 addressing the only objection
raised.  My plan is thus to wait another two days for comments and,
barring further objections, push to drm-misc this weekend.

However I'm struggling with the decision whether to push to next or
fixes.  The series is marked for stable, however the number of
affected machines is limited and for an issue that's been present
for 5 years it probably doesn't matter if it soaks another two months
in linux-next before it gets backported.  Hence I tend to err on the
side of caution and push to next, however a case could be made that
fixes is more appropriate.

I'm lacking experience making such decisions and would be interested
to learn how you'd handle this.

Thanks,

Lukas


Re: [Nouveau] [PATCH 2/5] drm: Allow determining if current task is output poll worker

2018-02-14 Thread Lukas Wunner
On Mon, Feb 12, 2018 at 12:46:11PM -0500, Lyude Paul wrote:
> On Sun, 2018-02-11 at 10:38 +0100, Lukas Wunner wrote:
> > Introduce a helper to determine if the current task is an output poll
> > worker.
> > 
> > This allows us to fix a long-standing deadlock in several DRM drivers
> > wherein the ->runtime_suspend callback waits for the output poll worker
> > to finish and the worker in turn calls a ->detect callback which waits
> > for runtime suspend to finish.  The ->detect callback is invoked from
> > multiple call sites and waiting for runtime suspend to finish is the
> > correct thing to do except if it's executing in the context of the
> > worker.
[snip]
> > +/**
> > + * drm_kms_helper_is_poll_worker - is %current task an output poll worker?
> > + *
> > + * Determine if %current task is an output poll worker.  This can be used
> > + * to select distinct code paths for output polling versus other contexts.
> > + */
>
> For this, it would be worth explicitly noting in the comments here that this
> should be called by DRM drivers in order to prevent racing with hotplug
> polling workers, so that new drivers in the future can avoid implementing this
> race condition in their driver.

Good point, I've just sent out a v2 to address your comment.  Let me know
if this isn't what you had in mind.  It may also be worth expanding the
DOC section at the top of drm_probe_helper.c to explain the interaction
between polling and runtime suspend in more detail, but I think this is
better done in a separate patch to keep the present patch small and thus
easily backportable to stable.
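
For readers new to the problem, the deadlock from the commit message
can be modelled in a few lines of user-space C.  This is only a sketch:
model_detect_waits() and model_deadlocks() are hypothetical names, the
is_poll_worker flag stands in for what the new helper would report, and
no real locking or workqueue behaviour is reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Does a ->detect call wait for a runtime suspend that is in progress?
 * With the helper in use, the poll worker skips the wait; every other
 * caller still waits, which remains the correct behaviour for them.
 */
bool model_detect_waits(bool suspend_in_progress, bool is_poll_worker,
			bool use_helper)
{
	if (use_helper && is_poll_worker)
		return false;

	return suspend_in_progress;
}

/*
 * ->runtime_suspend waits for the poll worker to finish.  If the worker's
 * ->detect in turn waits for that suspend, neither side makes progress.
 */
bool model_deadlocks(bool is_poll_worker, bool use_helper)
{
	return is_poll_worker &&
	       model_detect_waits(true, is_poll_worker, use_helper);
}
```

The model deadlocks only for the combination "poll worker, helper not
used", which is the case the series is meant to address.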

Thanks a lot for the review,

Lukas