from:"Mario Kleiner"

Re: [PATCH] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

2018-02-18 Thread Mario Kleiner


On 02/18/2018 09:53 AM, Kai-Heng Feng wrote:

Similar to commit e10aec652f31 ("drm/edid: Add 6 bpc quirk for display
AEO model 0."), the EDID reports "DFP 1.x compliant TMDS" but it support
6bpc instead of 8 bpc.

Hence, use 6 bpc quirk for this panel.

Fixes: 196f954e2509 ("drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink 
capability is unknown"")
BugLink: https://bugs.launchpad.net/bugs/1749420
Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com>
---
  drivers/gpu/drm/drm_edid.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index ddd537914575..d9c8d718e261 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -113,6 +113,9 @@ static const struct edid_quirk {
/* AEO model 0 reports 8 bpc, but is a 6 bpc panel */
{ "AEO", 0, EDID_QUIRK_FORCE_6BPC },
  
+	/* CPT panel of Asus UX303LA reports 8 bpc, but is a 6 bpc panel */

+   { "CPT", 0x17df, EDID_QUIRK_FORCE_6BPC },
+
/* Belinea 10 15 55 */
{ "MAX", 1516, EDID_QUIRK_PREFER_LARGE_60 },
    { "MAX", 0x77e, EDID_QUIRK_PREFER_LARGE_60 },



Reviewed-by: Mario Kleiner <mario.kleiner...@gmail.com>

Re: [PATCH] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA

2018-02-18 Thread Mario Kleiner


On 02/18/2018 09:53 AM, Kai-Heng Feng wrote:

Similar to commit e10aec652f31 ("drm/edid: Add 6 bpc quirk for display
AEO model 0."), the EDID reports "DFP 1.x compliant TMDS" but it support
6bpc instead of 8 bpc.

Hence, use 6 bpc quirk for this panel.

Fixes: 196f954e2509 ("drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink 
capability is unknown"")
BugLink: https://bugs.launchpad.net/bugs/1749420
Signed-off-by: Kai-Heng Feng 
---
  drivers/gpu/drm/drm_edid.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
index ddd537914575..d9c8d718e261 100644
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@ -113,6 +113,9 @@ static const struct edid_quirk {
/* AEO model 0 reports 8 bpc, but is a 6 bpc panel */
{ "AEO", 0, EDID_QUIRK_FORCE_6BPC },
  
+	/* CPT panel of Asus UX303LA reports 8 bpc, but is a 6 bpc panel */

+   { "CPT", 0x17df, EDID_QUIRK_FORCE_6BPC },
+
/* Belinea 10 15 55 */
{ "MAX", 1516, EDID_QUIRK_PREFER_LARGE_60 },
    { "MAX", 0x77e, EDID_QUIRK_PREFER_LARGE_60 },



Reviewed-by: Mario Kleiner

Re: [PATCH 4.4 030/103] drm/amdgpu: Make display watermark calculations more accurate

2017-06-06 Thread Mario Kleiner

On Thu, Jun 1, 2017 at 1:13 PM, Ben Hutchings
<ben.hutchi...@codethink.co.uk> wrote:
> On Tue, 2017-05-23 at 22:08 +0200, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------
>>
>> From: Mario Kleiner <mario.kleiner...@gmail.com>
>>
>> commit d63c277dc672e0c568481af043359420fa9d4736 upstream.
>>
>> Avoid big roundoff errors in scanline/hactive durations for
>> high pixel clocks, especially for >= 500 Mhz, and thereby
>> program more accurate display fifo watermarks.
> [...]
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
>> @@ -1237,14 +1237,14 @@ static void dce_v10_0_program_watermarks
>>  {
>>   struct drm_display_mode *mode = _crtc->base.mode;
>>   struct dce10_wm_params wm_low, wm_high;
>> - u32 pixel_period;
>> + u32 active_time;
>>   u32 line_time = 0;
>>   u32 latency_watermark_a = 0, latency_watermark_b = 0;
>>   u32 tmp, wm_mask, lb_vblank_lead_lines = 0;
>>
>>   if (amdgpu_crtc->base.enabled && num_heads && mode) {
>> - pixel_period = 100 / (u32)mode->clock;
>> - line_time = min((u32)mode->crtc_htotal * pixel_period, 
>> (u32)65535);
>> + active_time = 100UL * (u32)mode->crtc_hdisplay / 
>> (u32)mode->clock;
>> + line_time = min((u32) (100UL * (u32)mode->crtc_htotal / 
>> (u32)mode->clock), (u32)65535);
> [...]
>
> Won't these multiplications overflow if a >4K display is attached to a
> 32-bit machine?
>
> Ben.
>

Yes, indeed > 4k causes a new overflow problem! Thanks for catching this.
I will prepare a follow-up patch on top of this one, to use ...

active_time = (u32) div_u64((u64) mode->crtc_hdisplay * 100,
(u32)mode->clock);
line_time = (u32) div_u64((u64) mode->crtc_htotal * 100, (u32)mode->clock);
line_time = min(line_time, (u32) 65535);

...instead.

Ok?
-mario

> --
> Ben Hutchings
> Software Developer, Codethink Ltd.
>
>

Re: [PATCH 4.4 030/103] drm/amdgpu: Make display watermark calculations more accurate

2017-06-06 Thread Mario Kleiner

On Thu, Jun 1, 2017 at 1:13 PM, Ben Hutchings
 wrote:
> On Tue, 2017-05-23 at 22:08 +0200, Greg Kroah-Hartman wrote:
>> 4.4-stable review patch.  If anyone has any objections, please let me know.
>>
>> ------
>>
>> From: Mario Kleiner 
>>
>> commit d63c277dc672e0c568481af043359420fa9d4736 upstream.
>>
>> Avoid big roundoff errors in scanline/hactive durations for
>> high pixel clocks, especially for >= 500 Mhz, and thereby
>> program more accurate display fifo watermarks.
> [...]
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
>> @@ -1237,14 +1237,14 @@ static void dce_v10_0_program_watermarks
>>  {
>>   struct drm_display_mode *mode = _crtc->base.mode;
>>   struct dce10_wm_params wm_low, wm_high;
>> - u32 pixel_period;
>> + u32 active_time;
>>   u32 line_time = 0;
>>   u32 latency_watermark_a = 0, latency_watermark_b = 0;
>>   u32 tmp, wm_mask, lb_vblank_lead_lines = 0;
>>
>>   if (amdgpu_crtc->base.enabled && num_heads && mode) {
>> - pixel_period = 100 / (u32)mode->clock;
>> - line_time = min((u32)mode->crtc_htotal * pixel_period, 
>> (u32)65535);
>> + active_time = 100UL * (u32)mode->crtc_hdisplay / 
>> (u32)mode->clock;
>> + line_time = min((u32) (100UL * (u32)mode->crtc_htotal / 
>> (u32)mode->clock), (u32)65535);
> [...]
>
> Won't these multiplications overflow if a >4K display is attached to a
> 32-bit machine?
>
> Ben.
>

Yes, indeed > 4k causes a new overflow problem! Thanks for catching this.
I will prepare a follow-up patch on top of this one, to use ...

active_time = (u32) div_u64((u64) mode->crtc_hdisplay * 100,
(u32)mode->clock);
line_time = (u32) div_u64((u64) mode->crtc_htotal * 100, (u32)mode->clock);
line_time = min(line_time, (u32) 65535);

...instead.

Ok?
-mario

> --
> Ben Hutchings
> Software Developer, Codethink Ltd.
>
>

Re: [PATCH 18/28] drm: avoid uninitialized timestamp use in wait_vblank

2016-10-17 Thread Mario Kleiner


On 10/18/2016 12:13 AM, Arnd Bergmann wrote:

gcc warns about the timestamp in drm_wait_vblank being possibly
used without an initialization:

drivers/gpu/drm/drm_irq.c: In function 'drm_crtc_send_vblank_event':
drivers/gpu/drm/drm_irq.c:992:24: error: 'now.tv_usec' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
drivers/gpu/drm/drm_irq.c:1069:17: note: 'now.tv_usec' was declared here
drivers/gpu/drm/drm_irq.c:991:23: error: 'now.tv_sec' may be used uninitialized 
in this function [-Werror=maybe-uninitialized]

This can happen if drm_vblank_count_and_time() returns 0 in its
error path. To sanitize the error case, I'm changing that function
to return a zero timestamp when it fails.

Fixes: e6ae8687a87b ("drm: idiot-proof vblank")
Reviewed-by: David Herrmann <dh.herrm...@gmail.com>
Cc: Rob Clark <robdcl...@gmail.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Signed-off-by: Arnd Bergmann <a...@arndb.de>
---
First submitted in January 2016, second submission in February,
the patch is still required.

 drivers/gpu/drm/drm_irq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index b969a64..48a6167 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -952,8 +952,10 @@ static u32 drm_vblank_count_and_time(struct drm_device 
*dev, unsigned int pipe,
u32 vblank_count;
unsigned int seq;

-   if (WARN_ON(pipe >= dev->num_crtcs))
+   if (WARN_ON(pipe >= dev->num_crtcs)) {
+   *vblanktime = (struct timeval) { 0 };
return 0;
+   }

do {
seq = read_seqbegin(>seqlock);



Looks good to me.

Reviewed-by: Mario Kleiner <mario.kleiner...@gmail.com>

-mario

Re: [PATCH 18/28] drm: avoid uninitialized timestamp use in wait_vblank

2016-10-17 Thread Mario Kleiner


On 10/18/2016 12:13 AM, Arnd Bergmann wrote:

gcc warns about the timestamp in drm_wait_vblank being possibly
used without an initialization:

drivers/gpu/drm/drm_irq.c: In function 'drm_crtc_send_vblank_event':
drivers/gpu/drm/drm_irq.c:992:24: error: 'now.tv_usec' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
drivers/gpu/drm/drm_irq.c:1069:17: note: 'now.tv_usec' was declared here
drivers/gpu/drm/drm_irq.c:991:23: error: 'now.tv_sec' may be used uninitialized 
in this function [-Werror=maybe-uninitialized]

This can happen if drm_vblank_count_and_time() returns 0 in its
error path. To sanitize the error case, I'm changing that function
to return a zero timestamp when it fails.

Fixes: e6ae8687a87b ("drm: idiot-proof vblank")
Reviewed-by: David Herrmann 
Cc: Rob Clark 
Cc: Daniel Vetter 
Signed-off-by: Arnd Bergmann 
---
First submitted in January 2016, second submission in February,
the patch is still required.

 drivers/gpu/drm/drm_irq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index b969a64..48a6167 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -952,8 +952,10 @@ static u32 drm_vblank_count_and_time(struct drm_device 
*dev, unsigned int pipe,
u32 vblank_count;
unsigned int seq;

-   if (WARN_ON(pipe >= dev->num_crtcs))
+   if (WARN_ON(pipe >= dev->num_crtcs)) {
+   *vblanktime = (struct timeval) { 0 };
return 0;
+   }

do {
seq = read_seqbegin(>seqlock);



Looks good to me.

Reviewed-by: Mario Kleiner 

-mario

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-14 Thread Mario Kleiner

Ok, so legacy gamma table updates are completely broken for Intel on 
Linux-4.7-rc7, the final release candidate.


The good news is that applying Lionel's patch

"drm/i915: add missing condition for committing planes on crtc"

from

https://patchwork.freedesktop.org/patch/89111/

fixes it nicely. The patch currently applies cleanly to drm-fixes and 
drm-next and is


Reviewed-and-tested-by: Mario Kleiner <mario.kleiner...@gmail.com>


When we are at it, could somebody please look at that updated series of 
my Displayport color depth fixes ("EDID/DP fixes for proper bpc 
detection of displays.") i sent out a week ago?


Especially pulling patch 2/5 "[PATCH 2/5] drm/i915/dp: Revert 
"drm/i915/dp: fall back to 18 bpp when sink capability is unknown" would 
be important, as that bug introduced a regression for Intel + DP + 
legacy DP converters into stable kernels which is very serious for users 
of scientific/medical display equipment, especially as the failures can 
easily go unnoticed during normal equipment tests, but would introduce 
the equivalent of "silent data corruption" into their measured 
scientific data, which is not a great experience given that collecting 
such data can easily take half a year of work time and ten-thousands of 
euros of wasted research funding.


Patches 3 and 4 contain changes Daniel asked me to do, patch 5 would be 
good to safe-guard against similar issues in the future.


thanks,
-mario

On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,

There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/

I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.

Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com>
Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state, crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
  drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active && needs_vblank_wait(pipe_config))

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-14 Thread Mario Kleiner

Ok, so legacy gamma table updates are completely broken for Intel on 
Linux-4.7-rc7, the final release candidate.


The good news is that applying Lionel's patch

"drm/i915: add missing condition for committing planes on crtc"

from

https://patchwork.freedesktop.org/patch/89111/

fixes it nicely. The patch currently applies cleanly to drm-fixes and 
drm-next and is


Reviewed-and-tested-by: Mario Kleiner 


When we are at it, could somebody please look at that updated series of 
my Displayport color depth fixes ("EDID/DP fixes for proper bpc 
detection of displays.") i sent out a week ago?


Especially pulling patch 2/5 "[PATCH 2/5] drm/i915/dp: Revert 
"drm/i915/dp: fall back to 18 bpp when sink capability is unknown" would 
be important, as that bug introduced a regression for Intel + DP + 
legacy DP converters into stable kernels which is very serious for users 
of scientific/medical display equipment, especially as the failures can 
easily go unnoticed during normal equipment tests, but would introduce 
the equivalent of "silent data corruption" into their measured 
scientific data, which is not a great experience given that collecting 
such data can easily take half a year of work time and ten-thousands of 
euros of wasted research funding.


Patches 3 and 4 contain changes Daniel asked me to do, patch 5 would be 
good to safe-guard against similar issues in the future.


thanks,
-mario

On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,

There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/

I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.

Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner 
Cc: Maarten Lankhorst 
Cc: Lionel Landwerlin 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state, crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
  drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active && needs_vblank_wait(pipe_config))

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner


On 07/12/2016 05:02 PM, Lionel Landwerlin wrote:

On 12/07/16 13:11, Mario Kleiner wrote:

On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,



Hi Lionel,


There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/



Looking at them they should fix the issue, but they seem to be stuck
in review?


I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.



You mean something else has fixed legacy gamma updates, as i can't
find above patches applied on drm-intel-nightly?


This revert on drm-intel-nightly seems to have fixed the problem :

https://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915?id=e42aeef1237b7c969a77b7f726c50f6cb832185f



Ok, with that intel-nightly looks like drm-next for 4.8 and that indeed 
has working lut updates in my testing. My own patch was motivated by the 
way the implementation is done in intel_atomic_commit_tail() from drm-next.






Are those fixes supposed to be already part of 4.7-rc7, the final rc
afaik?


I haven't seen it on 4.7-rc7.



I just checked Linus tree for 4.7-rc7 and there the code in 
intel_display.c didn't receive any updates since 13 days and looks like 
the broken code from rc6 which according to my testing doesn't work.


So i'd assume legacy gamma table updates are broken in Linux 4.7 final 
rc atm. Couldn't test, because for some weird reason 4.7-rc7 doesn't 
even boot on my laptop :( - However i got that via a quick install from 
Ubuntu's mainline ppa so it could be some unrelated problem with their 
ppa builds.


I think either my patch would fix it, but is untested wrt. nuclear 
pageflip, or those two patches you referenced, which apparently didn't 
move forward.


What now?
-mario



thanks,
-mario



Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com>
Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state,
crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active &&
needs_vblank_wait(pipe_config))

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner


On 07/12/2016 05:02 PM, Lionel Landwerlin wrote:

On 12/07/16 13:11, Mario Kleiner wrote:

On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,



Hi Lionel,


There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/



Looking at them they should fix the issue, but they seem to be stuck
in review?


I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.



You mean something else has fixed legacy gamma updates, as i can't
find above patches applied on drm-intel-nightly?


This revert on drm-intel-nightly seems to have fixed the problem :

https://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915?id=e42aeef1237b7c969a77b7f726c50f6cb832185f



Ok, with that intel-nightly looks like drm-next for 4.8 and that indeed 
has working lut updates in my testing. My own patch was motivated by the 
way the implementation is done in intel_atomic_commit_tail() from drm-next.






Are those fixes supposed to be already part of 4.7-rc7, the final rc
afaik?


I haven't seen it on 4.7-rc7.



I just checked Linus tree for 4.7-rc7 and there the code in 
intel_display.c didn't receive any updates since 13 days and looks like 
the broken code from rc6 which according to my testing doesn't work.


So i'd assume legacy gamma table updates are broken in Linux 4.7 final 
rc atm. Couldn't test, because for some weird reason 4.7-rc7 doesn't 
even boot on my laptop :( - However i got that via a quick install from 
Ubuntu's mainline ppa so it could be some unrelated problem with their 
ppa builds.


I think either my patch would fix it, but is untested wrt. nuclear 
pageflip, or those two patches you referenced, which apparently didn't 
move forward.


What now?
-mario



thanks,
-mario



Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner 
Cc: Maarten Lankhorst 
Cc: Lionel Landwerlin 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state,
crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active &&
needs_vblank_wait(pipe_config))

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner


On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,



Hi Lionel,


There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/



Looking at them they should fix the issue, but they seem to be stuck in 
review?



I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.



You mean something else has fixed legacy gamma updates, as i can't find 
above patches applied on drm-intel-nightly?


Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik?

thanks,
-mario



Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com>
Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state, crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
  drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active && needs_vblank_wait(pipe_config))

Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner


On 07/12/2016 12:50 PM, Lionel Landwerlin wrote:

Hi Mario,



Hi Lionel,


There was a couple of patch to fix this issue :

https://patchwork.freedesktop.org/series/5467/
https://patchwork.freedesktop.org/series/5466/



Looking at them they should fix the issue, but they seem to be stuck in 
review?



I tested this late last week on drm-intel-nightly, it seems a series of
revert fixed most of the issues.



You mean something else has fixed legacy gamma updates, as i can't find 
above patches applied on drm-intel-nightly?


Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik?

thanks,
-mario



Cheers,

-
Lionel

On 12/07/16 11:33, Mario Kleiner wrote:

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner 
Cc: Maarten Lankhorst 
Cc: Lionel Landwerlin 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/i915/intel_display.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct
drm_device *dev,
  bool modeset = needs_modeset(crtc->state);
  struct intel_crtc_state *pipe_config =
  to_intel_crtc_state(crtc->state);
-bool update_pipe = !modeset && pipe_config->update_pipe;
  if (modeset && crtc->state->active) {
  update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct
drm_device *dev,
  drm_atomic_get_existing_plane_state(state, crtc->primary))
  intel_fbc_enable(intel_crtc);
-if (crtc->state->active &&
-(crtc->state->planes_changed || update_pipe))
+if (crtc->state->active)
  drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
  if (pipe_config->base.active && needs_vblank_wait(pipe_config))

[PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com>
Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_display.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev,
bool modeset = needs_modeset(crtc->state);
struct intel_crtc_state *pipe_config =
to_intel_crtc_state(crtc->state);
-   bool update_pipe = !modeset && pipe_config->update_pipe;
 
if (modeset && crtc->state->active) {
update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev,
drm_atomic_get_existing_plane_state(state, crtc->primary))
intel_fbc_enable(intel_crtc);
 
-   if (crtc->state->active &&
-   (crtc->state->planes_changed || update_pipe))
+   if (crtc->state->active)
drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
 
if (pipe_config->base.active && needs_vblank_wait(pipe_config))
-- 
2.7.0

[PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6

2016-07-12 Thread Mario Kleiner

Updating legacy gamma tables, e.g., via RandR doesn't work at all
as of Linux 4.7-rc6.

Reason seems to be that the required call to
drm_atomic_helper_commit_planes_on_crtc is skipped in
intel_atomic_commit after userspace set new gamma tables,
because neither crtc->state->planes_changed nor
update_pipe (= pipe_config->update_pipe) are true.

Removing the check for planes_changed || update_pipe fixes
gamma table updates.

The code for Linux 4.8 drm-next has changed a lot in that area
wrt. 4.7, but the new code for 4.8 also removed those checks
and calls drm_atomic_helper_commit_planes_on_crtc unconditionally,
and legacy gamma lut updates work on drm-next, so this seems to be
the right solution.

Tested also shutdown/reboot, suspend/resume, (un-)plugging displays,
mode switches for resolution/refresh rate, display rotation, and
page-flipping/pageflip timing on Intel HD Ironlake to confirm the
fix apparently doesn't break anything under X11.

Signed-off-by: Mario Kleiner 
Cc: Maarten Lankhorst 
Cc: Lionel Landwerlin 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/i915/intel_display.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c 
b/drivers/gpu/drm/i915/intel_display.c
index 04452cf..eb8fb36 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev,
bool modeset = needs_modeset(crtc->state);
struct intel_crtc_state *pipe_config =
to_intel_crtc_state(crtc->state);
-   bool update_pipe = !modeset && pipe_config->update_pipe;
 
if (modeset && crtc->state->active) {
update_scanline_offset(to_intel_crtc(crtc));
@@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev,
drm_atomic_get_existing_plane_state(state, crtc->primary))
intel_fbc_enable(intel_crtc);
 
-   if (crtc->state->active &&
-   (crtc->state->planes_changed || update_pipe))
+   if (crtc->state->active)
drm_atomic_helper_commit_planes_on_crtc(old_crtc_state);
 
if (pipe_config->base.active && needs_vblank_wait(pipe_config))
-- 
2.7.0

Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.

2016-07-09 Thread Mario Kleiner


Hi Eric,

thanks for all the infos and help! Both your patches look good and i 
have successfully tested them on top of with my vblank timestamping patch.


So for both:

Reviewed-and-tested-by: Mario Kleiner <mario.kleiner...@gmail.com>

Will you squash 2/2 into my patch or should i resend my patch with yours 
squashed in?


thanks,
-mario

On 07/08/2016 08:44 PM, Eric Anholt wrote:

Read out the DISPBASE registers to decide on the FIFO size.

Signed-off-by: Eric Anholt <e...@anholt.net>
---

Mario: How about this for a squash into your commit?  Here are the
values I dumped for cob_size:

[2.148314] [drm] Scaler 0 size 5232
[2.162239] [drm] Scaler 2 size 2048
[2.172957] [drm] Scaler 1 size 13456

  drivers/gpu/drm/vc4/vc4_crtc.c | 23 +--
  drivers/gpu/drm/vc4/vc4_regs.h | 18 +-
  2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index baf962bce063..3b7db17c356d 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -55,6 +55,8 @@ struct vc4_crtc {
u8 lut_r[256];
u8 lut_g[256];
u8 lut_b[256];
+   /* Size in pixels of the COB memory allocated to this CRTC. */
+   u32 cob_size;

struct drm_pending_vblank_event *event;
  };
@@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, 
unsigned int crtc_id,
*hpos = 0;

/* This is the offset we need for translating hvs -> pv scanout pos. */
-   /* XXX Find proper formula from hw docs instead of guesstimating? */
-   fifo_lines = 2048 * 7 / mode->crtc_hdisplay;
+   fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay;

if (fifo_lines > 0)
ret |= DRM_SCANOUTPOS_VALID;
@@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device 
*drm,
}
  }

+static void
+vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc)
+{
+   struct drm_device *drm = vc4_crtc->base.dev;
+   struct vc4_dev *vc4 = to_vc4_dev(drm);
+   u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel));
+   /* Top/base are supposed to be 4-pixel aligned, but the
+* Raspberry Pi firmware fills the low bits (which are
+* presumably ignored).
+*/
+   u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3;
+   u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3;
+
+   vc4_crtc->cob_size = top - base + 4;
+}
+
  static int vc4_crtc_bind(struct device *dev, struct device *master, void 
*data)
  {
struct platform_device *pdev = to_platform_device(dev);
@@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device 
*master, void *data)
crtc->cursor = cursor_plane;
}

+   vc4_crtc_get_cob_allocation(vc4_crtc);
+
CRTC_WRITE(PV_INTEN, 0);
CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START);
ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h
index 63cdc28ff7bb..160942a9180e 100644
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -366,7 +366,6 @@
  # define SCALER_DISPBKGND_FILLBIT(24)

  #define SCALER_DISPSTAT00x0048
-#define SCALER_DISPBASE00x004c
  # define SCALER_DISPSTATX_MODE_MASK   VC4_MASK(31, 30)
  # define SCALER_DISPSTATX_MODE_SHIFT  30
  # define SCALER_DISPSTATX_MODE_DISABLED   0
@@ -379,6 +378,20 @@
  # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT   12
  # define SCALER_DISPSTATX_LINE_MASK   VC4_MASK(11, 0)
  # define SCALER_DISPSTATX_LINE_SHIFT  0
+
+#define SCALER_DISPBASE00x004c
+/* Last pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned (and thus 4 pixels less than the
+ * next COB base).
+ */
+# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16)
+# define SCALER_DISPBASEX_TOP_SHIFT16
+/* First pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned.
+ */
+# define SCALER_DISPBASEX_BASE_MASKVC4_MASK(15, 0)
+# define SCALER_DISPBASEX_BASE_SHIFT   0
+
  #define SCALER_DISPCTRL10x0050
  #define SCALER_DISPBKGND1   0x0054
  #define SCALER_DISPBKGNDX(x)  (SCALER_DISPBKGND0 +\
@@ -389,6 +402,9 @@
 (x) * (SCALER_DISPSTAT1 - \
SCALER_DISPSTAT0))
  #define SCALER_DISPBASE10x005c
+#define SCALER_DISPBASEX(x)(SCALER_DISPBASE0 +\
+

Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.

2016-07-09 Thread Mario Kleiner


Hi Eric,

thanks for all the infos and help! Both your patches look good and i 
have successfully tested them on top of with my vblank timestamping patch.


So for both:

Reviewed-and-tested-by: Mario Kleiner 

Will you squash 2/2 into my patch or should i resend my patch with yours 
squashed in?


thanks,
-mario

On 07/08/2016 08:44 PM, Eric Anholt wrote:

Read out the DISPBASE registers to decide on the FIFO size.

Signed-off-by: Eric Anholt 
---

Mario: How about this for a squash into your commit?  Here are the
values I dumped for cob_size:

[2.148314] [drm] Scaler 0 size 5232
[2.162239] [drm] Scaler 2 size 2048
[2.172957] [drm] Scaler 1 size 13456

  drivers/gpu/drm/vc4/vc4_crtc.c | 23 +--
  drivers/gpu/drm/vc4/vc4_regs.h | 18 +-
  2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c
index baf962bce063..3b7db17c356d 100644
--- a/drivers/gpu/drm/vc4/vc4_crtc.c
+++ b/drivers/gpu/drm/vc4/vc4_crtc.c
@@ -55,6 +55,8 @@ struct vc4_crtc {
u8 lut_r[256];
u8 lut_g[256];
u8 lut_b[256];
+   /* Size in pixels of the COB memory allocated to this CRTC. */
+   u32 cob_size;

struct drm_pending_vblank_event *event;
  };
@@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, 
unsigned int crtc_id,
*hpos = 0;

/* This is the offset we need for translating hvs -> pv scanout pos. */
-   /* XXX Find proper formula from hw docs instead of guesstimating? */
-   fifo_lines = 2048 * 7 / mode->crtc_hdisplay;
+   fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay;

if (fifo_lines > 0)
ret |= DRM_SCANOUTPOS_VALID;
@@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device 
*drm,
}
  }

+static void
+vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc)
+{
+   struct drm_device *drm = vc4_crtc->base.dev;
+   struct vc4_dev *vc4 = to_vc4_dev(drm);
+   u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel));
+   /* Top/base are supposed to be 4-pixel aligned, but the
+* Raspberry Pi firmware fills the low bits (which are
+* presumably ignored).
+*/
+   u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3;
+   u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3;
+
+   vc4_crtc->cob_size = top - base + 4;
+}
+
  static int vc4_crtc_bind(struct device *dev, struct device *master, void 
*data)
  {
struct platform_device *pdev = to_platform_device(dev);
@@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device 
*master, void *data)
crtc->cursor = cursor_plane;
}

+   vc4_crtc_get_cob_allocation(vc4_crtc);
+
CRTC_WRITE(PV_INTEN, 0);
CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START);
ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h
index 63cdc28ff7bb..160942a9180e 100644
--- a/drivers/gpu/drm/vc4/vc4_regs.h
+++ b/drivers/gpu/drm/vc4/vc4_regs.h
@@ -366,7 +366,6 @@
  # define SCALER_DISPBKGND_FILLBIT(24)

  #define SCALER_DISPSTAT00x0048
-#define SCALER_DISPBASE00x004c
  # define SCALER_DISPSTATX_MODE_MASK   VC4_MASK(31, 30)
  # define SCALER_DISPSTATX_MODE_SHIFT  30
  # define SCALER_DISPSTATX_MODE_DISABLED   0
@@ -379,6 +378,20 @@
  # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT   12
  # define SCALER_DISPSTATX_LINE_MASK   VC4_MASK(11, 0)
  # define SCALER_DISPSTATX_LINE_SHIFT  0
+
+#define SCALER_DISPBASE00x004c
+/* Last pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned (and thus 4 pixels less than the
+ * next COB base).
+ */
+# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16)
+# define SCALER_DISPBASEX_TOP_SHIFT16
+/* First pixel in the COB (display FIFO memory) allocated to this HVS
+ * channel.  Must be 4-pixel aligned.
+ */
+# define SCALER_DISPBASEX_BASE_MASKVC4_MASK(15, 0)
+# define SCALER_DISPBASEX_BASE_SHIFT   0
+
  #define SCALER_DISPCTRL10x0050
  #define SCALER_DISPBKGND1   0x0054
  #define SCALER_DISPBKGNDX(x)  (SCALER_DISPBKGND0 +\
@@ -389,6 +402,9 @@
 (x) * (SCALER_DISPSTAT1 - \
SCALER_DISPSTAT0))
  #define SCALER_DISPBASE10x005c
+#define SCALER_DISPBASEX(x)(SCALER_DISPBASE0 +\
+(x) * (SCALER_DISPBASE1 - \
+   SCALER_DIS

Re: [PATCH 07/14] drm/nouveau: use drm_crtc_send_vblank_event()

2016-04-25 Thread Mario Kleiner


On 04/14/2016 07:48 PM, Gustavo Padovan wrote:

From: Gustavo Padovan 

Replace the legacy drm_send_vblank_event() with the new helper function.

Signed-off-by: Gustavo Padovan 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 7ce7fa5..973c2d9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -841,10 +841,12 @@ nouveau_finish_page_flip(struct nouveau_channel *chan,

s = list_first_entry(>flip, struct nouveau_page_flip_state, head);


Hi Gustavo


if (s->event) {
+   struct drm_crtc *crtc = drm_crtc_find(dev, s->crtc);
+


I don't think this would work. s->crtc is a nouveau internal display 
pipe index, not a drm mode object id, so i don't think drm_crtc_find() 
will do what you need. Also it takes a mutex, which might_sleep() and i 
think nouveau_finish_page_flip gets called from irq context and holds a 
spin_lock_irqsave, so would end badly.


You'd probably have to extend struct nouveau_page_flip_state to carry 
around a reference to the required drm_crtc, set up in 
nouveau_crtc_page_flip().


-mario


if (drm->device.info.family < NV_DEVICE_INFO_V0_TESLA) {
drm_arm_vblank_event(dev, s->crtc, s->event);
} else {
-   drm_send_vblank_event(dev, s->crtc, s->event);
+   drm_crtc_send_vblank_event(crtc, s->event);

/* Give up ownership of vblank for page-flipped crtc */
drm_vblank_put(dev, s->crtc);

Re: [PATCH 07/14] drm/nouveau: use drm_crtc_send_vblank_event()

2016-04-25 Thread Mario Kleiner


On 04/14/2016 07:48 PM, Gustavo Padovan wrote:

From: Gustavo Padovan 

Replace the legacy drm_send_vblank_event() with the new helper function.

Signed-off-by: Gustavo Padovan 
---
  drivers/gpu/drm/nouveau/nouveau_display.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
b/drivers/gpu/drm/nouveau/nouveau_display.c
index 7ce7fa5..973c2d9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -841,10 +841,12 @@ nouveau_finish_page_flip(struct nouveau_channel *chan,

s = list_first_entry(>flip, struct nouveau_page_flip_state, head);


Hi Gustavo


if (s->event) {
+   struct drm_crtc *crtc = drm_crtc_find(dev, s->crtc);
+


I don't think this would work. s->crtc is a nouveau internal display 
pipe index, not a drm mode object id, so i don't think drm_crtc_find() 
will do what you need. Also it takes a mutex, which might_sleep() and i 
think nouveau_finish_page_flip gets called from irq context and holds a 
spin_lock_irqsave, so would end badly.


You'd probably have to extend struct nouveau_page_flip_state to carry 
around a reference to the required drm_crtc, set up in 
nouveau_crtc_page_flip().


-mario


if (drm->device.info.family < NV_DEVICE_INFO_V0_TESLA) {
drm_arm_vblank_event(dev, s->crtc, s->event);
} else {
-   drm_send_vblank_event(dev, s->crtc, s->event);
+   drm_crtc_send_vblank_event(crtc, s->event);

/* Give up ownership of vblank for page-flipped crtc */
drm_vblank_put(dev, s->crtc);

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 09:32 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:



On 01/25/2016 07:51 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:

Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 07:51 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:

Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner


Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.



Ok, that's

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.



Ok, that's what the comments there say, although i don't see atm. why 
that perceived stall would be a big problem. I checked all callers of 
vblank_disable_and_save(). They are all

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls 
drm_update_vblank_count() unconditionally, even if vblank irqs are 
already off.


So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> 
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes 
final count.


Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> 
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> 
drm_vblank_off -> vblank_disable_and_save -> A pointless 
drm_update_vblank_count() while the hw counter is already reset to zero 
--> Unwanted counter jump.



The problem doesn't happen on a pure modeset to a different video 
resolution/refresh rate, as then we only have one call into 
atombios_crtc_dpms(DPMS_OFF).


I think the fix is to fix vblank_disable_and_save() to only call 
drm_update_vblank_count() if vblank irqs get actually disabled, not on 
no-op calls. I will try that now.


Otherwise kms drivers would have to be careful to never call 
drm_vblank_off multiple times before calling drm_vblank_on, but the help 
text to drm_vblank_on() claims that unbalanced calls to these functions 
are perfectly fine.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls 
drm_update_vblank_count() unconditionally, even if vblank irqs are 
already off.


So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> 
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes 
final count.


Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> 
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> 
drm_vblank_off -> vblank_disable_and_save -> A pointless 
drm_update_vblank_count() while the hw counter is already reset to zero 
--> Unwanted counter jump.



The problem doesn't happen on a pure modeset to a different video 
resolution/refresh rate, as then we only have one call into 
atombios_crtc_dpms(DPMS_OFF).


I think the fix is to fix vblank_disable_and_save() to only call 
drm_update_vblank_count() if vblank irqs get actually disabled, not on 
no-op calls. I will try that now.


Otherwise kms drivers would have to be careful to never call 
drm_vblank_off multiple times before calling drm_vblank_on, but the help 
text to drm_vblank_on() claims that unbalanced calls to these functions 
are perfectly fine.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.



Ok, that's what the comments there say, although i don't see atm. why 
that perceived stall would be a big problem. I checked all callers of 
vblank_disable_and_save(). They are all

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 07:51 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:

Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner


Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_update_vblank_count() if vblank irqs get actually disabled, not on
no-op calls. I will try that now.


It does that on purpose. Otherwise the vblank counter would appear to
have stalled while the interrupt was off.



Ok, that's

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-25 Thread Mario Kleiner




On 01/25/2016 09:32 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote:



On 01/25/2016 07:51 PM, Daniel Vetter wrote:

On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote:

Readding Daniel, which somehow got dropped from the cc.

On 01/25/2016 03:53 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote:



On 01/25/2016 02:23 PM, Ville Syrjälä wrote:

On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote:



On 01/25/2016 05:15 AM, Michel Dänzer wrote:

On 23.01.2016 00:18, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.


At least one of the jumps is expected, because this is around turning
off the CRTC for DPMS off. Don't know yet why there are two jumps back
though.



These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.


Which is of course the idea of Daniel's patch (which is what I'm getting
the above with) or Mario's patch as well, but clearly something's still
wrong. It's certainly possible that it's something in the driver, but
since calling drm_vblank_pre/post_modeset from the same places seems to
work fine (ignoring the regression discussed in this thread)... Do
drm_vblank_on/off require something else to handle this correctly?




I suspect it is because vblank_disable_and_save calls
drm_update_vblank_count() unconditionally, even if vblank irqs are
already off.

So on a manual display disable -> reenable you get something like

At disable:

Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off ->
vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes
final count.

Then the crtc is shut down and its hw counter resets to zero.

At reenable:

Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) ->
atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) ->
drm_vblank_off -> vblank_disable_and_save -> A pointless
drm_update_vblank_count() while the hw counter is already reset to zero
--> Unwanted counter jump.


The problem doesn't happen on a pure modeset to a different video
resolution/refresh rate, as then we only have one call into
atombios_crtc_dpms(DPMS_OFF).

I think the fix is to fix vblank_disable_and_save() to only call
drm_

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-23 Thread Mario Kleiner


On 01/22/2016 07:29 PM, Mario Kleiner wrote:



On 01/22/2016 04:18 PM, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html

, but I can't find that in the archives, so maybe that was just on
IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html

. Basically, I ran into the bug fixed by your patch because the
counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7
p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@
7304.317140 -> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.

These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.



Fwiw, testing the HD-57570 single display with my patch that uses
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show
hardware counter reset to zero as expected, but no jumps of software
vblank counter. So with that vblank_off/on placement it seems to work
nicely here.

-mario



I spoke too early. The jump doesn't happen when i change video modes - 
video resolution / refresh rate etc, despite hw counter reset. But if i 
just disable and then reenable a display, the software counter jumps.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-23 Thread Mario Kleiner


On 01/22/2016 07:29 PM, Mario Kleiner wrote:



On 01/22/2016 04:18 PM, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html

, but I can't find that in the archives, so maybe that was just on
IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html

. Basically, I ran into the bug fixed by your patch because the
counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7
p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3:
current=0, diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@
7304.317140 -> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0:
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.

These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.



Fwiw, testing the HD-57570 single display with my patch that uses
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show
hardware counter reset to zero as expected, but no jumps of software
vblank counter. So with that vblank_off/on placement it seems to work
nicely here.

-mario



I spoke too early. The jump doesn't happen when i change video modes - 
video resolution / refresh rate etc, despite hw counter reset. But if i 
just disable and then reenable a display, the software counter jumps.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-22 Thread Mario Kleiner




On 01/22/2016 04:18 PM, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.

These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.



Fwiw, testing the HD-57570 single display with my patch that uses 
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show 
hardware counter reset to zero as expected, but no jumps of software 
vblank counter. So with that vblank_off/on placement it seems to work 
nicely here.


-mario


dev->max_vblank_count = 0x, which makes the wraparound code in
drm_update_vblank_count a no-op. Maybe you can reproduce it if you
artificially set a lower max_vblank_count in the driver.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-22 Thread Mario Kleiner




On 01/22/2016 04:18 PM, Ville Syrjälä wrote:

On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote:


[ Trimming KDE folks from Cc ]

On 21.01.2016 19:09, Daniel Vetter wrote:

On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote:

On 21.01.2016 16:58, Daniel Vetter wrote:


Can you please point me at the vblank on/off jump bug please?


AFAIR I originally reported it in response to
http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html
, but I can't find that in the archives, so maybe that was just on IRC.
See
http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html
. Basically, I ran into the bug fixed by your patch because the counter
jumped forward on every DPMS off, so it hit the 32-bit boundary after
just a few days.


Ok, so just uncovered the overflow bug.


Not sure what you mean by "just", but to be clear: The drm_vblank_on/off
counter jumping bug (similar to the bug this thread is about), which
exposed the overflow bug, is still alive and kicking in 4.5. It seems
to happen when turning off the CRTC:

[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=0, hw=916 hw_last=916
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 
7304.307354 -> 7304.308006 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=218104694, diff=16776301, hw=1 hw_last=916


Not sure what bug we're talking about here, but here the hw counter
clearly jumps backwards.


[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:radeon_get_vblank_counter_kms] Query failed! stat 3
[drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, 
diff=0, hw=0 hw_last=0
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 
-> 7304.317140 [e 0 us, 0 rep]
[drm:radeon_get_vblank_counter_kms] Query failed! stat 1
[drm:drm_update_vblank_count] updating vblank count on crtc 0: 
current=234880995, diff=16777215, hw=0 hw_last=1


Same here.

These things just don't happen on i915 because drm_vblank_off() and
drm_vblank_on() are always called around the times when the hw counter
might get reset. Or at least that's how it should be.



Fwiw, testing the HD-57570 single display with my patch that uses 
drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show 
hardware counter reset to zero as expected, but no jumps of software 
vblank counter. So with that vblank_off/on placement it seems to work 
nicely here.


-mario


dev->max_vblank_count = 0x, which makes the wraparound code in
drm_update_vblank_count a no-op. Maybe you can reproduce it if you
artificially set a lower max_vblank_count in the driver.


--
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-21 Thread Mario Kleiner


On 01/21/2016 07:38 AM, Michel Dänzer wrote:

On 21.01.2016 14:31, Mario Kleiner wrote:

On 01/21/2016 04:43 AM, Michel Dänzer wrote:

On 21.01.2016 05:32, Mario Kleiner wrote:


So the problem is that AMDs hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing vblank
count by one during each vblank irq, i think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?



The whole logic has changed and the software counter updates are now
driven all the time by the hw counter.



BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...


I spent the last few hours reading through the drm and radeon code and i
think what should probably work is to replace the
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
calls. These are apparently meant for drivers whose hw counters reset
during modeset, [...]


... just like drm_vblank_pre/post_modeset. That those were broken is a
regression which needs to be fixed anyway. I don't think switching to
drm_vblank_on/off is suitable for stable trees.

Looking at Vlastimil's original post again, I'd say the most likely
culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
vblanks were missed").



Yes, i think reverting that one alone would likely fix it by reverting 
to the old vblank update logic.





Once drm_vblank_off is called, drm_vblank_get will no-op and return an
error, so clients can't enable vblank irqs during the modeset - pageflip
ioctl and waitvblank ioctl would fail while a modeset happens -
hopefully userspace handles this correctly everywhere.


We've fixed xf86-video-ati for this.



I'll hack up a patch for demonstration now.


You're a bit late to that party. :)

http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html




Oops. Just sent out my little (so far untested) creations. Yes, they are 
essentially the same as Daniel's patches. The only addition is to also 
fix that other potential small race i describe by slightly moving the 
xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put 
imbalance in radeon_pm if vblank_on/off would be used.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-21 Thread Mario Kleiner


On 01/21/2016 07:38 AM, Michel Dänzer wrote:

On 21.01.2016 14:31, Mario Kleiner wrote:

On 01/21/2016 04:43 AM, Michel Dänzer wrote:

On 21.01.2016 05:32, Mario Kleiner wrote:


So the problem is that AMDs hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing vblank
count by one during each vblank irq, i think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?



The whole logic has changed and the software counter updates are now
driven all the time by the hw counter.



BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...


I spent the last few hours reading through the drm and radeon code and i
think what should probably work is to replace the
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on
calls. These are apparently meant for drivers whose hw counters reset
during modeset, [...]


... just like drm_vblank_pre/post_modeset. That those were broken is a
regression which needs to be fixed anyway. I don't think switching to
drm_vblank_on/off is suitable for stable trees.

Looking at Vlastimil's original post again, I'd say the most likely
culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many
vblanks were missed").



Yes, i think reverting that one alone would likely fix it by reverting 
to the old vblank update logic.





Once drm_vblank_off is called, drm_vblank_get will no-op and return an
error, so clients can't enable vblank irqs during the modeset - pageflip
ioctl and waitvblank ioctl would fail while a modeset happens -
hopefully userspace handles this correctly everywhere.


We've fixed xf86-video-ati for this.



I'll hack up a patch for demonstration now.


You're a bit late to that party. :)

http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html
http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html




Oops. Just sent out my little (so far untested) creations. Yes, they are 
essentially the same as Daniel's patches. The only addition is to also 
fix that other potential small race i describe by slightly moving the 
xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put 
imbalance in radeon_pm if vblank_on/off would be used.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner


On 01/21/2016 04:43 AM, Michel Dänzer wrote:

On 21.01.2016 05:32, Mario Kleiner wrote:


So the problem is that AMDs hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing vblank
count by one during each vblank irq, i think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?



The whole logic has changed and the software counter updates are now 
driven all the time by the hw counter.




BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...




I spent the last few hours reading through the drm and radeon code and i 
think what should probably work is to replace the 
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on 
calls. These are apparently meant for drivers whose hw counters reset 
during modeset, and seem to reinitialize stuff properly and release 
clients queued vblank events to avoid blocking - not tested so far, just 
looked at the code.


Once drm_vblank_off is called, drm_vblank_get will no-op and return an 
error, so clients can't enable vblank irqs during the modeset - pageflip 
ioctl and waitvblank ioctl would fail while a modeset happens - 
hopefully userspace handles this correctly everywhere.


It would also cause radeons power management to not sync its actions to 
vblank if it would get invoked during a modeset, but that seems to be 
handled by a 200 msec timeout and hopefully only cause visual glitches - 
or invisible glitches while the crtc is blanked during modeset?


There could be another tiny race with the new "vblank counter bumping" 
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts 
...") if drm_update_vblank_counter() would be called multiple times in 
quick succession within the "radeon_crtc->lb_vblank_lead_lines" 
scanlines before start of real vblank iff at the same time a modeset 
would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller 
value due to a change in horizontal mode resolution. That needs a 
modeset to happen to a higher horizontal resolution just exactly when 
the scanout is in exactly the right 5 or so scanlines and some client is 
calling drm_vblank_get() to enable vblank irqs at the same time, but it 
would cause the same hang if it happened - not that likely to happen 
often, but still not nice, also Murphy's law... If we could switch to 
drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove 
those race as well by forbidding any vblank irq related activity during 
a modeset.


I'll hack up a patch for demonstration now.

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner


On 01/18/2016 11:49 AM, Vlastimil Babka wrote:

On 01/16/2016 05:24 AM, Mario Kleiner wrote:



On 01/15/2016 01:26 PM, Ville Syrjälä wrote:

On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:


I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
can't trigger a hang after hundreds of runs.

Does this also hang for you?


No, test mode seems to be fine.


I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
should probably give useful info around the time of the hang.


Attached. Captured by having kdm running, switching to console, running
"dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
frozen splashscreen, switch back, terminate dmesg. So somewhere around
the middle there should be where ksplashscreen starts...


Maybe also check XOrg.0.log for (WW) warnings related to flip.


No such warnings there.


thanks,
-mario



Thanks,
Vlastimil






Thanks. So the problem is that AMDs hardware frame counters reset to 
zero during a modeset. The old DRM code dealt with drivers doing that by 
keeping vblank irqs enabled during modesets and incrementing vblank 
count by one during each vblank irq, i think that's what 
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


The new code in drm_update_vblank_count() breaks this. The reset of the 
counter to zero is treated as counter wraparound, so our software vblank 
counter jumps forward by up to 2^24 counts in response (in case of AMD's 
24 bit hw counters), and then the vblank event handling code in 
drm_handle_vblank_events() and other places detects the counter being 
more than 2^23 counts ahead of queued vblank events and as part of its 
own wraparound handling for the 32-Bit software counter doesn't deliver 
these queued events for a long time -> no vblank swap trigger event -> 
no swap -> client hangs waiting for swap completion.


I think i remember seeing the ksplash progress screen occasionally 
blanking half way through login, i guess that's when kwin triggers a 
modeset in parallel to ksplash doing its OpenGL animations. So depending 
on the hw vblank count at the time of login ksplash would or wouldn't 
hang, apparently i got "lucky" with my counts at login.


-mario

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner


On 01/21/2016 04:43 AM, Michel Dänzer wrote:

On 21.01.2016 05:32, Mario Kleiner wrote:


So the problem is that AMDs hardware frame counters reset to
zero during a modeset. The old DRM code dealt with drivers doing that by
keeping vblank irqs enabled during modesets and incrementing vblank
count by one during each vblank irq, i think that's what
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


Right, looks like there's been a regression breaking this. I suspect the
problem is that vblank->last isn't getting updated from
drm_vblank_post_modeset. Not sure which change broke that though, or how
to fix it. Ville?



The whole logic has changed and the software counter updates are now 
driven all the time by the hw counter.




BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which
exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for
vblank counters"). I've been meaning to track that down since then; one
of these days hopefully, but if anybody has any ideas offhand...




I spent the last few hours reading through the drm and radeon code and i 
think what should probably work is to replace the 
drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on 
calls. These are apparently meant for drivers whose hw counters reset 
during modeset, and seem to reinitialize stuff properly and release 
clients queued vblank events to avoid blocking - not tested so far, just 
looked at the code.


Once drm_vblank_off is called, drm_vblank_get will no-op and return an 
error, so clients can't enable vblank irqs during the modeset - pageflip 
ioctl and waitvblank ioctl would fail while a modeset happens - 
hopefully userspace handles this correctly everywhere.


It would also cause radeons power management to not sync its actions to 
vblank if it would get invoked during a modeset, but that seems to be 
handled by a 200 msec timeout and hopefully only cause visual glitches - 
or invisible glitches while the crtc is blanked during modeset?


There could be another tiny race with the new "vblank counter bumping" 
logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts 
...") if drm_update_vblank_counter() would be called multiple times in 
quick succession within the "radeon_crtc->lb_vblank_lead_lines" 
scanlines before start of real vblank iff at the same time a modeset 
would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller 
value due to a change in horizontal mode resolution. That needs a 
modeset to happen to a higher horizontal resolution just exactly when 
the scanout is in exactly the right 5 or so scanlines and some client is 
calling drm_vblank_get() to enable vblank irqs at the same time, but it 
would cause the same hang if it happened - not that likely to happen 
often, but still not nice, also Murphy's law... If we could switch to 
drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove 
those race as well by forbidding any vblank irq related activity during 
a modeset.


I'll hack up a patch for demonstration now.

Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon

2016-01-20 Thread Mario Kleiner


On 01/18/2016 11:49 AM, Vlastimil Babka wrote:

On 01/16/2016 05:24 AM, Mario Kleiner wrote:



On 01/15/2016 01:26 PM, Ville Syrjälä wrote:

On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote:


I'm currently running...

while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done

... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770  and so far i
can't trigger a hang after hundreds of runs.

Does this also hang for you?


No, test mode seems to be fine.


I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank"
should probably give useful info around the time of the hang.


Attached. Captured by having kdm running, switching to console, running
"dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see
frozen splashscreen, switch back, terminate dmesg. So somewhere around
the middle there should be where ksplashscreen starts...


Maybe also check XOrg.0.log for (WW) warnings related to flip.


No such warnings there.


thanks,
-mario



Thanks,
Vlastimil






Thanks. So the problem is that AMDs hardware frame counters reset to 
zero during a modeset. The old DRM code dealt with drivers doing that by 
keeping vblank irqs enabled during modesets and incrementing vblank 
count by one during each vblank irq, i think that's what 
drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for.


The new code in drm_update_vblank_count() breaks this. The reset of the 
counter to zero is treated as counter wraparound, so our software vblank 
counter jumps forward by up to 2^24 counts in response (in case of AMD's 
24 bit hw counters), and then the vblank event handling code in 
drm_handle_vblank_events() and other places detects the counter being 
more than 2^23 counts ahead of queued vblank events and as part of its 
own wraparound handling for the 32-Bit software counter doesn't deliver 
these queued events for a long time -> no vblank swap trigger event -> 
no swap -> client hangs waiting for swap completion.


I think i remember seeing the ksplash progress screen occasionally 
blanking half way through login, i guess that's when kwin triggers a 
modeset in parallel to ksplash doing its OpenGL animations. So depending 
on the hw vblank count at the time of login ksplash would or wouldn't 
hang, apparently i got "lucky" with my counts at login.


-mario

Fwd: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table

2016-01-01 Thread Mario Kleiner


 Forwarded Message 
Subject: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table
Date: Fri, 18 Dec 2015 20:24:06 +0100
From: Mario Kleiner 
To: x...@kernel.org
CC: mi...@redhat.com, h...@zytor.com, mario.kleiner...@gmail.com, 
sta...@vger.kernel.org


Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.

Signed-off-by: Mario Kleiner 
Cc: 
---
 arch/x86/kernel/reboot.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 02693dd..f660d63 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -182,6 +182,14 @@ static struct dmi_system_id __initdata 
reboot_dmi_table[] = {

DMI_MATCH(DMI_PRODUCT_NAME, "iMac9,1"),
},
},
+   {   /* Handle problems with rebooting on the iMac10,1. */
+   .callback = set_pci_reboot,
+   .ident = "Apple iMac10,1",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "iMac10,1"),
+   },
+   },

/* ASRock */
{   /* Handle problems with rebooting on ASRock Q1900DC-ITX */
--
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Fwd: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table

2016-01-01 Thread Mario Kleiner


 Forwarded Message 
Subject: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table
Date: Fri, 18 Dec 2015 20:24:06 +0100
From: Mario Kleiner <mario.kleiner...@gmail.com>
To: x...@kernel.org
CC: mi...@redhat.com, h...@zytor.com, mario.kleiner...@gmail.com, 
sta...@vger.kernel.org


Without the reboot=pci method, the iMac 10,1 simply
hangs after printing "Restarting system" at the point
when it should reboot. This fixes it.

Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com>
Cc: <sta...@vger.kernel.org>
---
 arch/x86/kernel/reboot.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 02693dd..f660d63 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -182,6 +182,14 @@ static struct dmi_system_id __initdata 
reboot_dmi_table[] = {

DMI_MATCH(DMI_PRODUCT_NAME, "iMac9,1"),
},
},
+   {   /* Handle problems with rebooting on the iMac10,1. */
+   .callback = set_pci_reboot,
+   .ident = "Apple iMac10,1",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."),
+   DMI_MATCH(DMI_PRODUCT_NAME, "iMac10,1"),
+   },
+   },

/* ASRock */
{   /* Handle problems with rebooting on ASRock Q1900DC-ITX */
--
1.9.1



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-12 Thread Mario Kleiner

On 08/13/2014 03:50 AM, Michel Dänzer wrote:

On 12.08.2014 00:17, Jerome Glisse wrote:

On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:

On 08/10/2014 08:02 PM, Mario Kleiner wrote:

On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

The other problem is that probably TTM does not reuse pages from the
DMA pool. If i trace the __ttm_dma_alloc_page
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0>
and
__ttm_dma_free_page
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0>
calls for
those single page allocs/frees, then over a 20 second interval of
tracing and switching tabs in firefox, scrolling things around etc. i
find about as many alloc's as i find free's, e.g., 1607 allocs vs.
1648 frees.

This is because historically the pools have been designed to keep only
pages with nonstandard caching attributes since changing page caching
attributes have been very slow but the kernel page allocators have been
reasonably fast.

/Thomas

Ok. A bit more ftraceing showed my hang problem case goes through the
"if (is_cached)" paths, so the pool doesn't recycle anything and i see
it bouncing up and down by 4 pages all the time.

But for the non-cached case, which i don't hit with my problem, could
one of you look at line 954...

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c%23L954=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=e15c51805d429ee6d8960d6b88035e9811a1cdbfbf13168eec2fbb2214b99c60

... and tell me why that unconditional npages = count; assignment
makes sense? It seems to essentially disable all recycling for the dma
pool whenever the pool isn't filled up to/beyond its maximum with free
pages? When the pool is filled up, lots of stuff is recycled, but when
it is already somewhat below capacity, it gets "punished" by not
getting refilled? I'd just like to understand the logic behind that line.

thanks,
-mario

I'll happily forward that question to Konrad who wrote the code (or it
may even stem from the ordinary page pool code which IIRC has Dave
Airlie / Jerome Glisse as authors)

This is effectively bogus code, i now wonder how it came to stay alive.
Attached patch will fix that.

I haven't tested Mario's scenario specifically, but it survived piglit
and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so
some BOs ended up in GTT instead with write-combined CPU mappings) on
radeonsi without any noticeable issues.

Tested-by: Michel Dänzer

I haven't tested the patch yet. For the original bug it won't help
directly, because the super-slow allocations which cause the desktop
stall are tt_cached allocations, so they go through the if (is_cached)
code path which isn't improved by Jerome's patch. is_cached always
releases memory immediately, so the tt_cached pool just bounces up and
down between 4 and 7 pages. So this was an independent issue. The slow
allocations i noticed were mostly caused by exa allocating new gem bo's,
i don't know which path is taken by 3d graphics?

However, the fixed ttm path could indirectly solve the DMA_CMA stalls by
completely killing CMA for its intended purpose. Typical CMA sizes are
probably around < 100 MB (kernel default is 16 MB, Ubuntu config is 64
MB), and the limit for the page pool seems to be more like 50% of all
system RAM? Iow. if the ttm dma pool is allowed to grow that big with
recycled pages, it probably will almost completely monopolize the whole
CMA memory after a short amount of time. ttm won't suffer stalls if it
essentially doesn't interact with CMA anymore after a warmup period, but
actual clients which really need CMA (ie., hardware without
scatter-gather dma etc.) will be starved of what they need as far as my
limited understanding of the CMA goes.

So fwiw probably the fix to ttm will increase the urgency for the CMA
people to come up with a fix/optimization for the allocator. Unless it
doesn't matter if most desktop systems have CMA disabled by default, and
ttm is mostly used by desktop graphics drivers (nouveau, radeon, vmgfx)?
I only stumbled over the problem because the Ubuntu 3.16 mainline
testing kernels are compiled with CMA on.

-mario

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body o

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-12 Thread Mario Kleiner


On 08/11/2014 05:17 PM, Jerome Glisse wrote:

On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:

On 08/10/2014 08:02 PM, Mario Kleiner wrote:

On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

Resent this time without HTML formatting which lkml doesn't like.
Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas
Hellstrom  wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more
than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for
some time to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas

Yes, seems you're both right. I read through the code a bit more and
indeed the TTM DMA pool allocates only one page during each
dma_alloc_coherent() call, so it doesn't need CMA memory. The current
allocators don't check for single page CMA allocations and therefore
try to get it from the CMA area anyway, instead of skipping to the
much cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Ddma_generic_alloc_coherent=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d1852625e2ab2ff07eb34a7f33fc1f55f7f13959912d5a6ce9316d23070ce939>

andintel_alloc_coherent
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Dintel_alloc_coherent=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=82d587e9b6aeced5cf9a7caefa91bf47fba809f3522b7379d22e45a2d5d35ebd>
this
seems easy to do. Looking at the arm arch variants, e.g.,

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c%23L1194=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=4c178257eab9b5d7ca650dedba76cf27abeb49ddc7aebb9433f52b6c8bb3bbac


and

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c%23L44=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=5f62f4cbe8cee1f1dd4cbba656354efe6867bcdc664cf90e9719e2f42a85de08


i'm not sure if it is that easily done, as there aren't any fallbacks
for such a case and the code looks to me as if that's at least
somewhat intentional.

As far as TTM goes, one quick one-line fix to prevent it from using
the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
above methods) would be to clear the __GFP_WAIT
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc>
flag from the
passed gfp_t flags. That would trigger the well working fallback.
So, is

__GFP_WAIT
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc>
needed
for those single page allocations that go through__ttm_dma_alloc_page
<https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0>?


It would be nice to have such a simple, non-intrusive one-line patch
that we still could get into 3.17 and then backported to older stable
kernels to avoid the same desktop hangs there if CMA is enabled. It
would be also nice for actual users of CMA to not use up lots of CMA
space for gpu's which don't need it. I think DMA_CMA was introduced
around 3.12.


I don't think that's a good idea. Omitting __GFP_WAIT would cause
unnecessary memory allocation errors on systems under stress.
I think this should be filed as a DMA subsystem kernel bug / regression
and an appropriate solution should be worked out together with

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-12 Thread Mario Kleiner


On 08/11/2014 05:17 PM, Jerome Glisse wrote:

On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:

On 08/10/2014 08:02 PM, Mario Kleiner wrote:

On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

Resent this time without HTML formatting which lkml doesn't like.
Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas
Hellstromthellst...@vmware.com  wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more
than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for
some time to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas

Yes, seems you're both right. I read through the code a bit more and
indeed the TTM DMA pool allocates only one page during each
dma_alloc_coherent() call, so it doesn't need CMA memory. The current
allocators don't check for single page CMA allocations and therefore
try to get it from the CMA area anyway, instead of skipping to the
much cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Ddma_generic_alloc_coherentk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d1852625e2ab2ff07eb34a7f33fc1f55f7f13959912d5a6ce9316d23070ce939

andintel_alloc_coherent
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Dintel_alloc_coherentk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=82d587e9b6aeced5cf9a7caefa91bf47fba809f3522b7379d22e45a2d5d35ebd
this
seems easy to do. Looking at the arm arch variants, e.g.,

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c%23L1194k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=4c178257eab9b5d7ca650dedba76cf27abeb49ddc7aebb9433f52b6c8bb3bbac


and

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c%23L44k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=5f62f4cbe8cee1f1dd4cbba656354efe6867bcdc664cf90e9719e2f42a85de08


i'm not sure if it is that easily done, as there aren't any fallbacks
for such a case and the code looks to me as if that's at least
somewhat intentional.

As far as TTM goes, one quick one-line fix to prevent it from using
the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
above methods) would be to clear the __GFP_WAIT
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAITk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc
flag from the
passed gfp_t flags. That would trigger the well working fallback.
So, is

__GFP_WAIT
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAITk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc
needed
for those single page allocations that go through__ttm_dma_alloc_page
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0?


It would be nice to have such a simple, non-intrusive one-line patch
that we still could get into 3.17 and then backported to older stable
kernels to avoid the same desktop hangs there if CMA is enabled. It
would be also nice for actual users of CMA to not use up lots of CMA
space for gpu's which don't need it. I think DMA_CMA was introduced
around 3.12.


I don't think that's a good idea. Omitting __GFP_WAIT would cause
unnecessary memory allocation errors on systems under stress.
I think this should be filed as a DMA subsystem kernel bug / regression
and an appropriate solution should be worked out together

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-12 Thread Mario Kleiner

On 08/13/2014 03:50 AM, Michel Dänzer wrote:

On 12.08.2014 00:17, Jerome Glisse wrote:

On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote:

On 08/10/2014 08:02 PM, Mario Kleiner wrote:

On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

The other problem is that probably TTM does not reuse pages from the
DMA pool. If i trace the __ttm_dma_alloc_page
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0
and
__ttm_dma_free_page
https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0
calls for
those single page allocs/frees, then over a 20 second interval of
tracing and switching tabs in firefox, scrolling things around etc. i
find about as many alloc's as i find free's, e.g., 1607 allocs vs.
1648 frees.

/Thomas

Ok. A bit more ftraceing showed my hang problem case goes through the
if (is_cached) paths, so the pool doesn't recycle anything and i see
it bouncing up and down by 4 pages all the time.

But for the non-cached case, which i don't hit with my problem, could
one of you look at line 954...

https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c%23L954k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=e15c51805d429ee6d8960d6b88035e9811a1cdbfbf13168eec2fbb2214b99c60

... and tell me why that unconditional npages = count; assignment
makes sense? It seems to essentially disable all recycling for the dma
pool whenever the pool isn't filled up to/beyond its maximum with free
pages? When the pool is filled up, lots of stuff is recycled, but when
it is already somewhat below capacity, it gets punished by not
getting refilled? I'd just like to understand the logic behind that line.

thanks,
-mario

I'll happily forward that question to Konrad who wrote the code (or it
may even stem from the ordinary page pool code which IIRC has Dave
Airlie / Jerome Glisse as authors)

This is effectively bogus code, i now wonder how it came to stay alive.
Attached patch will fix that.

Tested-by: Michel Dänzer michel.daen...@amd.com

However, the fixed ttm path could indirectly solve the DMA_CMA stalls by
completely killing CMA for its intended purpose. Typical CMA sizes are
probably around 100 MB (kernel default is 16 MB, Ubuntu config is 64
MB), and the limit for the page pool seems to be more like 50% of all
system RAM? Iow. if the ttm dma pool is allowed to grow that big with
recycled pages, it probably will almost completely monopolize the whole
CMA memory after a short amount of time. ttm won't suffer stalls if it
essentially doesn't interact with CMA anymore after a warmup period, but
actual clients which really need CMA (ie., hardware without
scatter-gather dma etc.) will be starved of what they need as far as my
limited understanding of the CMA goes.

-mario

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-10 Thread Mario Kleiner


On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

Resent this time without HTML formatting which lkml doesn't like. Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas
Hellstrom  wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for
some time to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas

Yes, seems you're both right. I read through the code a bit more and
indeed the TTM DMA pool allocates only one page during each
dma_alloc_coherent() call, so it doesn't need CMA memory. The current
allocators don't check for single page CMA allocations and therefore
try to get it from the CMA area anyway, instead of skipping to the
much cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent
<http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>
andintel_alloc_coherent
<http://lxr.free-electrons.com/ident?i=intel_alloc_coherent>  this
seems easy to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks
for such a case and the code looks to me as if that's at least
somewhat intentional.

As far as TTM goes, one quick one-line fix to prevent it from using
the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
above methods) would be to clear the __GFP_WAIT
<http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the
passed gfp_t flags. That would trigger the well working fallback. So, is

__GFP_WAIT  <http://lxr.free-electrons.com/ident?i=__GFP_WAIT>  needed
for those single page allocations that go through__ttm_dma_alloc_page
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>?

It would be nice to have such a simple, non-intrusive one-line patch
that we still could get into 3.17 and then backported to older stable
kernels to avoid the same desktop hangs there if CMA is enabled. It
would be also nice for actual users of CMA to not use up lots of CMA
space for gpu's which don't need it. I think DMA_CMA was introduced
around 3.12.


I don't think that's a good idea. Omitting __GFP_WAIT would cause
unnecessary memory allocation errors on systems under stress.
I think this should be filed as a DMA subsystem kernel bug / regression
and an appropriate solution should be worked out together with the DMA
subsystem maintainers and then backported.


Ok, so it is needed. I'll file a bug report.


The other problem is that probably TTM does not reuse pages from the
DMA pool. If i trace the __ttm_dma_alloc_page
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and
__ttm_dma_free_page
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for
those single page allocs/frees, then over a 20 second interval of
tracing and switching tabs in firefox, scrolling things around etc. i
find about as many alloc's as i find free's, e.g., 1607 allocs vs.
1648 frees.

This is because historically the pools have been designed to keep only
pages with nonstandard caching attributes since changing page caching
attributes have been very slow but the kernel page allocators have been
reasonably fast.

/Thomas


Ok. A bit more ftraceing showed my hang problem case goes through the 
"if (is_cached)" paths, so the pool doesn't recycle anything and i see 
it bouncing up and down by 4 pages all the time.


But for the non-cached case, which i don't hit with my problem, could 
one of you look at line 954...


http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

... and tell me why that unconditional npages = count; assignment makes sense? It seems 
to essentially disable all recycling for the dma pool whenever the pool isn't filled up 
to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is 
recycled, but when it is already somewhat below capacity, it gets "punished" by 
not getting refilled? I'd just like to understand the logic behind that line.

thanks,
-mario



This bit of code fromttm_dma_unpopulate
<http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>()  (line
954 in 3.16) looks suspicious:

http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954


Alloc'

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-10 Thread Mario Kleiner


On 08/10/2014 01:03 PM, Thomas Hellstrom wrote:

On 08/10/2014 05:11 AM, Mario Kleiner wrote:

Resent this time without HTML formatting which lkml doesn't like. Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas
Hellstromthellst...@vmware.com  wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for
some time to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas

Yes, seems you're both right. I read through the code a bit more and
indeed the TTM DMA pool allocates only one page during each
dma_alloc_coherent() call, so it doesn't need CMA memory. The current
allocators don't check for single page CMA allocations and therefore
try to get it from the CMA area anyway, instead of skipping to the
much cheaper fallback.

So the callers of dma_alloc_from_contiguous() could need that little
optimization of skipping it if only one page is requested. For

dma_generic_alloc_coherent
http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent
andintel_alloc_coherent
http://lxr.free-electrons.com/ident?i=intel_alloc_coherent  this
seems easy to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks
for such a case and the code looks to me as if that's at least
somewhat intentional.

As far as TTM goes, one quick one-line fix to prevent it from using
the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the
above methods) would be to clear the __GFP_WAIT
http://lxr.free-electrons.com/ident?i=__GFP_WAIT flag from the
passed gfp_t flags. That would trigger the well working fallback. So, is

__GFP_WAIT  http://lxr.free-electrons.com/ident?i=__GFP_WAIT  needed
for those single page allocations that go through__ttm_dma_alloc_page
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page?

It would be nice to have such a simple, non-intrusive one-line patch
that we still could get into 3.17 and then backported to older stable
kernels to avoid the same desktop hangs there if CMA is enabled. It
would be also nice for actual users of CMA to not use up lots of CMA
space for gpu's which don't need it. I think DMA_CMA was introduced
around 3.12.


I don't think that's a good idea. Omitting __GFP_WAIT would cause
unnecessary memory allocation errors on systems under stress.
I think this should be filed as a DMA subsystem kernel bug / regression
and an appropriate solution should be worked out together with the DMA
subsystem maintainers and then backported.


Ok, so it is needed. I'll file a bug report.


The other problem is that probably TTM does not reuse pages from the
DMA pool. If i trace the __ttm_dma_alloc_page
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page and
__ttm_dma_free_page
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page calls for
those single page allocs/frees, then over a 20 second interval of
tracing and switching tabs in firefox, scrolling things around etc. i
find about as many alloc's as i find free's, e.g., 1607 allocs vs.
1648 frees.

This is because historically the pools have been designed to keep only
pages with nonstandard caching attributes since changing page caching
attributes have been very slow but the kernel page allocators have been
reasonably fast.

/Thomas


Ok. A bit more ftraceing showed my hang problem case goes through the 
if (is_cached) paths, so the pool doesn't recycle anything and i see 
it bouncing up and down by 4 pages all the time.


But for the non-cached case, which i don't hit with my problem, could 
one of you look at line 954...


http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

... and tell me why that unconditional npages = count; assignment makes sense? It seems 
to essentially disable all recycling for the dma pool whenever the pool isn't filled up 
to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is 
recycled, but when it is already somewhat below capacity, it gets punished by 
not getting refilled? I'd just like to understand the logic behind that line.

thanks,
-mario



This bit of code fromttm_dma_unpopulate
http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate()  (line
954 in 3.16) looks suspicious:

http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954


Alloc's from a tt_cached cached pool ( if (is_cached)...) always get
freed

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-09 Thread Mario Kleiner


Resent this time without HTML formatting which lkml doesn't like. Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom  
wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for some time 
to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas


Yes, seems you're both right. I read through the code a bit more and 
indeed the TTM DMA pool allocates only one page during each 
dma_alloc_coherent() call, so it doesn't need CMA memory. The current 
allocators don't check for single page CMA allocations and therefore try 
to get it from the CMA area anyway, instead of skipping to the much 
cheaper fallback.


So the callers of dma_alloc_from_contiguous() could need that little 
optimization of skipping it if only one page is requested. For


dma_generic_alloc_coherent  
<http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent>  
andintel_alloc_coherent  <http://lxr.free-electrons.com/ident?i=intel_alloc_coherent> 
 this seems easy to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks 
for such a case and the code looks to me as if that's at least somewhat 
intentional.


As far as TTM goes, one quick one-line fix to prevent it from using the 
CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above 
methods) would be to clear the __GFP_WAIT 
<http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the passed 
gfp_t flags. That would trigger the well working fallback. So, is


__GFP_WAIT  <http://lxr.free-electrons.com/ident?i=__GFP_WAIT>  needed for those 
single page allocations that go through__ttm_dma_alloc_page  
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>?

It would be nice to have such a simple, non-intrusive one-line patch 
that we still could get into 3.17 and then backported to older stable 
kernels to avoid the same desktop hangs there if CMA is enabled. It 
would be also nice for actual users of CMA to not use up lots of CMA 
space for gpu's which don't need it. I think DMA_CMA was introduced 
around 3.12.



The other problem is that probably TTM does not reuse pages from the DMA 
pool. If i trace the __ttm_dma_alloc_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and 
__ttm_dma_free_page 
<http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for 
those single page allocs/frees, then over a 20 second interval of 
tracing and switching tabs in firefox, scrolling things around etc. i 
find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 
frees.


This bit of code fromttm_dma_unpopulate 
<http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>()  (line 954 
in 3.16) looks suspicious:


http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

Alloc's from a tt_cached cached pool ( if (is_cached)...) always get 
freed and are not given back to the cached pool. But in the uncached 
case, there's logic to make sure the pool doesn't grow forever (line 
955, checking against _manager->options.max_size), but before that check 
in line 954 there's an uncoditional assignment of npages = count; which 
seems to force freeing all pages as well, instead of recycling? Is this 
some debug code left over, or intentional and just me not understanding 
what happens there?


thanks,
-mario



/Thomas


On 08/08/2014 07:42 PM, Mario Kleiner wrote:

Hi all,

there is a rather severe performance problem i accidentally found

when

trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under
Ubuntu 14.04 LTS with nouveau as graphics driver.

I was lazy and just installed the Ubuntu precompiled mainline kernel.
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels
weren't compiled with CMA, so i only observed this on 3.16, but
previous kernels would likely be affected too.

After a few minutes of regular desktop use like switching workspaces,
scrolling text in a terminal window, Firefox with multiple tabs open,
Thunderbird etc. (tested with KDE/Kwin, with/without desktop
composition), i get chunky desktop updates, then multi-second

freezes,

after a few minutes the desktop hangs for over a minute on almost any
GUI action like switching windows etc.

Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-09 Thread Mario Kleiner


Resent this time without HTML formatting which lkml doesn't like. Sorry.

On 08/09/2014 03:58 PM, Thomas Hellstrom wrote:

On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote:

On August 9, 2014 1:39:39 AM EDT, Thomas Hellstromthellst...@vmware.com  
wrote:

Hi.


Hey Thomas!


IIRC I don't think the TTM DMA pool allocates coherent pages more than
one page at a time, and _if that's true_ it's pretty unnecessary for
the
dma subsystem to route those allocations to CMA. Maybe Konrad could
shed
some light over this?

It should allocate in batches and keep them in the TTM DMA pool for some time 
to be reused.

The pages that it gets are in 4kb granularity though.

Then I feel inclined to say this is a DMA subsystem bug. Single page
allocations shouldn't get routed to CMA.

/Thomas


Yes, seems you're both right. I read through the code a bit more and 
indeed the TTM DMA pool allocates only one page during each 
dma_alloc_coherent() call, so it doesn't need CMA memory. The current 
allocators don't check for single page CMA allocations and therefore try 
to get it from the CMA area anyway, instead of skipping to the much 
cheaper fallback.


So the callers of dma_alloc_from_contiguous() could need that little 
optimization of skipping it if only one page is requested. For


dma_generic_alloc_coherent  
http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent  
andintel_alloc_coherent  http://lxr.free-electrons.com/ident?i=intel_alloc_coherent 
 this seems easy to do. Looking at the arm arch variants, e.g.,

http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194

and

http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44

i'm not sure if it is that easily done, as there aren't any fallbacks 
for such a case and the code looks to me as if that's at least somewhat 
intentional.


As far as TTM goes, one quick one-line fix to prevent it from using the 
CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above 
methods) would be to clear the __GFP_WAIT 
http://lxr.free-electrons.com/ident?i=__GFP_WAIT flag from the passed 
gfp_t flags. That would trigger the well working fallback. So, is


__GFP_WAIT  http://lxr.free-electrons.com/ident?i=__GFP_WAIT  needed for those 
single page allocations that go through__ttm_dma_alloc_page  
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page?

It would be nice to have such a simple, non-intrusive one-line patch 
that we still could get into 3.17 and then backported to older stable 
kernels to avoid the same desktop hangs there if CMA is enabled. It 
would be also nice for actual users of CMA to not use up lots of CMA 
space for gpu's which don't need it. I think DMA_CMA was introduced 
around 3.12.



The other problem is that probably TTM does not reuse pages from the DMA 
pool. If i trace the __ttm_dma_alloc_page 
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page and 
__ttm_dma_free_page 
http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page calls for 
those single page allocs/frees, then over a 20 second interval of 
tracing and switching tabs in firefox, scrolling things around etc. i 
find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 
frees.


This bit of code fromttm_dma_unpopulate 
http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate()  (line 954 
in 3.16) looks suspicious:


http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954

Alloc's from a tt_cached cached pool ( if (is_cached)...) always get 
freed and are not given back to the cached pool. But in the uncached 
case, there's logic to make sure the pool doesn't grow forever (line 
955, checking against _manager-options.max_size), but before that check 
in line 954 there's an uncoditional assignment of npages = count; which 
seems to force freeing all pages as well, instead of recycling? Is this 
some debug code left over, or intentional and just me not understanding 
what happens there?


thanks,
-mario



/Thomas


On 08/08/2014 07:42 PM, Mario Kleiner wrote:

Hi all,

there is a rather severe performance problem i accidentally found

when

trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under
Ubuntu 14.04 LTS with nouveau as graphics driver.

I was lazy and just installed the Ubuntu precompiled mainline kernel.
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels
weren't compiled with CMA, so i only observed this on 3.16, but
previous kernels would likely be affected too.

After a few minutes of regular desktop use like switching workspaces,
scrolling text in a terminal window, Firefox with multiple tabs open,
Thunderbird etc. (tested with KDE/Kwin, with/without desktop
composition), i get chunky desktop updates, then multi-second

freezes,

after a few minutes the desktop hangs for over a minute on almost any
GUI action like switching windows etc. -- Unuseable.

ftrace'ing shows the culprit

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-08 Thread Mario Kleiner


Hi all,

there is a rather severe performance problem i accidentally found when 
trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under 
Ubuntu 14.04 LTS with nouveau as graphics driver.


I was lazy and just installed the Ubuntu precompiled mainline kernel. 
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA 
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels 
weren't compiled with CMA, so i only observed this on 3.16, but previous 
kernels would likely be affected too.


After a few minutes of regular desktop use like switching workspaces, 
scrolling text in a terminal window, Firefox with multiple tabs open, 
Thunderbird etc. (tested with KDE/Kwin, with/without desktop 
composition), i get chunky desktop updates, then multi-second freezes, 
after a few minutes the desktop hangs for over a minute on almost any 
GUI action like switching windows etc. --> Unuseable.


ftrace'ing shows the culprit being this callchain (typical good/bad 
example ftrace snippets at the end of this mail):


...ttm dma coherent memory allocations, e.g., from 
__ttm_dma_alloc_page() ... --> dma_alloc_coherent() --> platform 
specific hooks ... -> dma_generic_alloc_coherent() [on x86_64] --> 
dma_alloc_from_contiguous()


dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when 
the machine is booted with kernel boot cmdline parameter "cma=0", so it 
triggers the fast alloc_pages_node() fallback at least on x86_64.


With CMA, this function becomes progressively more slow with every 
minute of desktop use, e.g., runtimes going up from < 0.3 usecs to 
hundreds or thousands of microseconds (before it gives up and 
alloc_pages_node() fallback is used), so this causes the 
multi-second/minute hangs of the desktop.


So it seems ttm memory allocations quickly fragment and/or exhaust the 
CMA memory area, and dma_alloc_from_contiguous() tries very hard to find 
a fitting hole big enough to satisfy allocations with a retry loop (see 
http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339) 
that takes forever.


This is not good, also not for other devices which actually need a 
non-fragmented CMA for DMA, so what to do? I doubt most current gpus 
still need physically contiguous dma memory, maybe with exception of 
some embedded gpus?


My naive approach would be to add a new gfp_t flag a la ___GFP_AVOIDCMA, 
and make callers of dma_alloc_from_contiguous() refrain from doing so if 
they have some fallback for getting memory. And then add that flag to 
ttm's ttm_dma_populate() gfp_flags, e.g., around here: 
http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884


However i'm not familiar enough with memory management, so likely 
greater minds here have much better ideas on how to deal with this?


thanks,
-mario

Typical snippet from an example trace of a badly stalling desktop with 
CMA (alloc_pages_node() fallback may have been missing in this traces 
ftrace_filter settings):


1)   |  ttm_dma_pool_get_pages [ttm]() {
 1)   | ttm_dma_page_pool_fill_locked [ttm]() {
 1)   | ttm_dma_pool_alloc_new_pages [ttm]() {
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1873.071 us | dma_alloc_from_contiguous();
 1) ! 1874.292 us |  }
 1) ! 1875.400 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1868.372 us | dma_alloc_from_contiguous();
 1) ! 1869.586 us |  }
 1) ! 1870.053 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1871.085 us | dma_alloc_from_contiguous();
 1) ! 1872.240 us |  }
 1) ! 1872.669 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1888.934 us | dma_alloc_from_contiguous();
 1) ! 1890.179 us |  }
 1) ! 1890.608 us |}
 1)   0.048 us| ttm_set_pages_caching [ttm]();
 1) ! 7511.000 us |  }
 1) ! 7511.306 us |}
 1) ! 7511.623 us |  }

The good case (with cma=0 kernel cmdline, so dma_alloc_from_contiguous() 
no-ops,)


0)   |  ttm_dma_pool_get_pages [ttm]() {
 0)   | ttm_dma_page_pool_fill_locked [ttm]() {
 0)   | ttm_dma_pool_alloc_new_pages [ttm]() {
 0)   | __ttm_dma_alloc_page [ttm]() {
 0)   | dma_generic_alloc_coherent() {
 0)   0.171 us| dma_alloc_from_contiguous();
 0)   0.849 us| __alloc_pages_nodemask();
 0)   3.029 us|  }
 0)   3.882 us|

CONFIG_DMA_CMA causes ttm performance problems/hangs.

2014-08-08 Thread Mario Kleiner


Hi all,

there is a rather severe performance problem i accidentally found when 
trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under 
Ubuntu 14.04 LTS with nouveau as graphics driver.


I was lazy and just installed the Ubuntu precompiled mainline kernel. 
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA 
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels 
weren't compiled with CMA, so i only observed this on 3.16, but previous 
kernels would likely be affected too.


After a few minutes of regular desktop use like switching workspaces, 
scrolling text in a terminal window, Firefox with multiple tabs open, 
Thunderbird etc. (tested with KDE/Kwin, with/without desktop 
composition), i get chunky desktop updates, then multi-second freezes, 
after a few minutes the desktop hangs for over a minute on almost any 
GUI action like switching windows etc. -- Unuseable.


ftrace'ing shows the culprit being this callchain (typical good/bad 
example ftrace snippets at the end of this mail):


...ttm dma coherent memory allocations, e.g., from 
__ttm_dma_alloc_page() ... -- dma_alloc_coherent() -- platform 
specific hooks ... - dma_generic_alloc_coherent() [on x86_64] -- 
dma_alloc_from_contiguous()


dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when 
the machine is booted with kernel boot cmdline parameter cma=0, so it 
triggers the fast alloc_pages_node() fallback at least on x86_64.


With CMA, this function becomes progressively more slow with every 
minute of desktop use, e.g., runtimes going up from  0.3 usecs to 
hundreds or thousands of microseconds (before it gives up and 
alloc_pages_node() fallback is used), so this causes the 
multi-second/minute hangs of the desktop.


So it seems ttm memory allocations quickly fragment and/or exhaust the 
CMA memory area, and dma_alloc_from_contiguous() tries very hard to find 
a fitting hole big enough to satisfy allocations with a retry loop (see 
http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339) 
that takes forever.


This is not good, also not for other devices which actually need a 
non-fragmented CMA for DMA, so what to do? I doubt most current gpus 
still need physically contiguous dma memory, maybe with exception of 
some embedded gpus?


My naive approach would be to add a new gfp_t flag a la ___GFP_AVOIDCMA, 
and make callers of dma_alloc_from_contiguous() refrain from doing so if 
they have some fallback for getting memory. And then add that flag to 
ttm's ttm_dma_populate() gfp_flags, e.g., around here: 
http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884


However i'm not familiar enough with memory management, so likely 
greater minds here have much better ideas on how to deal with this?


thanks,
-mario

Typical snippet from an example trace of a badly stalling desktop with 
CMA (alloc_pages_node() fallback may have been missing in this traces 
ftrace_filter settings):


1)   |  ttm_dma_pool_get_pages [ttm]() {
 1)   | ttm_dma_page_pool_fill_locked [ttm]() {
 1)   | ttm_dma_pool_alloc_new_pages [ttm]() {
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1873.071 us | dma_alloc_from_contiguous();
 1) ! 1874.292 us |  }
 1) ! 1875.400 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1868.372 us | dma_alloc_from_contiguous();
 1) ! 1869.586 us |  }
 1) ! 1870.053 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1871.085 us | dma_alloc_from_contiguous();
 1) ! 1872.240 us |  }
 1) ! 1872.669 us |}
 1)   | __ttm_dma_alloc_page [ttm]() {
 1)   | dma_generic_alloc_coherent() {
 1) ! 1888.934 us | dma_alloc_from_contiguous();
 1) ! 1890.179 us |  }
 1) ! 1890.608 us |}
 1)   0.048 us| ttm_set_pages_caching [ttm]();
 1) ! 7511.000 us |  }
 1) ! 7511.306 us |}
 1) ! 7511.623 us |  }

The good case (with cma=0 kernel cmdline, so dma_alloc_from_contiguous() 
no-ops,)


0)   |  ttm_dma_pool_get_pages [ttm]() {
 0)   | ttm_dma_page_pool_fill_locked [ttm]() {
 0)   | ttm_dma_pool_alloc_new_pages [ttm]() {
 0)   | __ttm_dma_alloc_page [ttm]() {
 0)   | dma_generic_alloc_coherent() {
 0)   0.171 us| dma_alloc_from_contiguous();
 0)   0.849 us| __alloc_pages_nodemask();
 0)   3.029 us|  }
 0)   3.882 us|

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-10-11 Thread Mario Kleiner


On 10/11/2013 03:30 PM, Sebastian Andrzej Siewior wrote:

On 10/11/2013 02:37 PM, Steven Rostedt wrote:

On Fri, 11 Oct 2013 12:18:00 +0200
Sebastian Andrzej Siewior  wrote:


* Mario Kleiner | 2013-09-26 18:16:47 [+0200]:


Good! I will do that. Thanks for clarifying the irq and constraints
on raw locks in the other thread.


Are there any suggestions for "now"?  preempt_disable_nort() like Luis
suggesed?



The preempt_disable_nort() is rather pointless, because the
preempt_disable() was added specifically for -rt. When PREEMPT_RT is
not enabled, preemption is disabled there already by the previous calls
to spin_lock().


Either way. Then I remove the preempt_enable/disable call. Any
objections?



Good with me. I'm currently working on a replacement.
-mario


-- Steve


Sebastian



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-10-11 Thread Mario Kleiner


On 10/11/2013 03:30 PM, Sebastian Andrzej Siewior wrote:

On 10/11/2013 02:37 PM, Steven Rostedt wrote:

On Fri, 11 Oct 2013 12:18:00 +0200
Sebastian Andrzej Siewior bige...@linutronix.de wrote:


* Mario Kleiner | 2013-09-26 18:16:47 [+0200]:


Good! I will do that. Thanks for clarifying the irq and constraints
on raw locks in the other thread.


Are there any suggestions for now?  preempt_disable_nort() like Luis
suggesed?



The preempt_disable_nort() is rather pointless, because the
preempt_disable() was added specifically for -rt. When PREEMPT_RT is
not enabled, preemption is disabled there already by the previous calls
to spin_lock().


Either way. Then I remove the preempt_enable/disable call. Any
objections?



Good with me. I'm currently working on a replacement.
-mario


-- Steve


Sebastian



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-26 Thread Mario Kleiner


On 25.09.13 09:49, Ville Syrjälä wrote:

On Wed, Sep 25, 2013 at 06:32:10AM +0200, Mario Kleiner wrote:

On 23.09.13 10:38, Ville Syrjälä wrote:

On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote:

On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley  wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley  wrote:


The funny part is, there's a comment there that shows that this was
done even for "PREEMPT_RT". Unfortunately, the call to
"get_scanout_position()" can call functions that use the rt-mutex
"sleeping spin locks" and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean "any other lock" that might be claimed "would also need
to be raw"?  Hopefully not "any other lock" already held?


And by taking a quick audit of the code, I see this:

   spin_lock_irqsave(_priv->uncore.lock, irqflags);

   /* Reset the chip */

   /* GEN6_GDRST is not in the gt power well, no need to check
* for fifo space for the write or forcewake the chip for
* the read
*/
   __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

   /* Spin waiting for the device to ack the reset request */
   ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) &
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson 
Date:   Fri Jul 19 20:36:51 2013 +0100

   drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard 
Date:   Fri Jan 6 11:34:04 2012 -0800

   drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
   holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-26 Thread Mario Kleiner


On 25.09.13 16:13, Steven Rostedt wrote:

On Wed, 25 Sep 2013 06:32:10 +0200
Mario Kleiner  wrote:



But given the new situation, your proposal is great! If we push the
clock readouts into the get_scanoutpos routine, we can make this robust
without causing grief for the rt people and without the need for a new
separate lock for display regs in intel-kms.

E.g., for intel-kms:

i915_get_crtc_scanoutpos(..., ktime_t *stime, ktime_t *etime)
{
...
spin_lock_irqsave(...uncore.lock);
preempt_disable();
*stime = ktime_get();
position = __raw_i915_read32(dev_priv, PIPEDSL(pipe));
*etime = ktime_get();
preempt_enable();
spin_unlock_irqrestore(...uncore.lock)
...
}

With your patchset to reduce the amount of register reads needed in that
function, and given that forcewake handling isn't needed for these
registers, this should make it robust again and wouldn't need new locks.

Unless ktime_get is also a bad thing to do in a preempt disabled section?


ktime_get() works fine in preempt_disable sections, although it may add
some latencies, but you shouldn't need to worry about it.

I like this solution the best too, but if it does go in, I would ask to
send us the patch for adding the preempt_disable() and we can add the
preempt_disable_rt() to it. Why make mainline have a little more
overhead?

-- Steve


Good! I will do that. Thanks for clarifying the irq and constraints on 
raw locks in the other thread.


-mario


___
Intel-gfx mailing list
intel-...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-26 Thread Mario Kleiner


On 25.09.13 16:13, Steven Rostedt wrote:

On Wed, 25 Sep 2013 06:32:10 +0200
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:



But given the new situation, your proposal is great! If we push the
clock readouts into the get_scanoutpos routine, we can make this robust
without causing grief for the rt people and without the need for a new
separate lock for display regs in intel-kms.

E.g., for intel-kms:

i915_get_crtc_scanoutpos(..., ktime_t *stime, ktime_t *etime)
{
...
spin_lock_irqsave(...uncore.lock);
preempt_disable();
*stime = ktime_get();
position = __raw_i915_read32(dev_priv, PIPEDSL(pipe));
*etime = ktime_get();
preempt_enable();
spin_unlock_irqrestore(...uncore.lock)
...
}

With your patchset to reduce the amount of register reads needed in that
function, and given that forcewake handling isn't needed for these
registers, this should make it robust again and wouldn't need new locks.

Unless ktime_get is also a bad thing to do in a preempt disabled section?


ktime_get() works fine in preempt_disable sections, although it may add
some latencies, but you shouldn't need to worry about it.

I like this solution the best too, but if it does go in, I would ask to
send us the patch for adding the preempt_disable() and we can add the
preempt_disable_rt() to it. Why make mainline have a little more
overhead?

-- Steve


Good! I will do that. Thanks for clarifying the irq and constraints on 
raw locks in the other thread.


-mario


___
Intel-gfx mailing list
intel-...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-26 Thread Mario Kleiner


On 25.09.13 09:49, Ville Syrjälä wrote:

On Wed, Sep 25, 2013 at 06:32:10AM +0200, Mario Kleiner wrote:

On 23.09.13 10:38, Ville Syrjälä wrote:

On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote:

On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley pe...@hurleysoftware.com wrote:


The funny part is, there's a comment there that shows that this was
done even for PREEMPT_RT. Unfortunately, the call to
get_scanout_position() can call functions that use the rt-mutex
sleeping spin locks and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean any other lock that might be claimed would also need
to be raw?  Hopefully not any other lock already held?


And by taking a quick audit of the code, I see this:

   spin_lock_irqsave(dev_priv-uncore.lock, irqflags);

   /* Reset the chip */

   /* GEN6_GDRST is not in the gt power well, no need to check
* for fifo space for the write or forcewake the chip for
* the read
*/
   __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

   /* Spin waiting for the device to ack the reset request */
   ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) 
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Fri Jul 19 20:36:51 2013 +0100

   drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard kei...@keithp.com
Date:   Fri Jan 6 11:34:04 2012 -0800

   drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
   holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-25 Thread Mario Kleiner


On 23.09.13 10:38, Ville Syrjälä wrote:

On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote:

On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley  wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley  wrote:


The funny part is, there's a comment there that shows that this was
done even for "PREEMPT_RT". Unfortunately, the call to
"get_scanout_position()" can call functions that use the rt-mutex
"sleeping spin locks" and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean "any other lock" that might be claimed "would also need
to be raw"?  Hopefully not "any other lock" already held?


And by taking a quick audit of the code, I see this:

  spin_lock_irqsave(_priv->uncore.lock, irqflags);

  /* Reset the chip */

  /* GEN6_GDRST is not in the gt power well, no need to check
   * for fifo space for the write or forcewake the chip for
   * the read
   */
  __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

  /* Spin waiting for the device to ack the reset request */
  ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) &
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson 
Date:   Fri Jul 19 20:36:51 2013 +0100

  drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard 
Date:   Fri Jan 6 11:34:04 2012 -0800

  drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
  holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should have mmio_idx_lock spinlock as raw]




Ok, so register access must be serialized, at least within a register
block

Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-25 Thread Mario Kleiner


On 23.09.13 10:38, Ville Syrjälä wrote:

On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote:

On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley pe...@hurleysoftware.com wrote:


The funny part is, there's a comment there that shows that this was
done even for PREEMPT_RT. Unfortunately, the call to
get_scanout_position() can call functions that use the rt-mutex
sleeping spin locks and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean any other lock that might be claimed would also need
to be raw?  Hopefully not any other lock already held?


And by taking a quick audit of the code, I see this:

  spin_lock_irqsave(dev_priv-uncore.lock, irqflags);

  /* Reset the chip */

  /* GEN6_GDRST is not in the gt power well, no need to check
   * for fifo space for the write or forcewake the chip for
   * the read
   */
  __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

  /* Spin waiting for the device to ack the reset request */
  ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) 
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Fri Jul 19 20:36:51 2013 +0100

  drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard kei...@keithp.com
Date:   Fri Jan 6 11:34:04 2012 -0800

  drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
  holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should have mmio_idx_lock spinlock as raw]




Ok, so register access must be serialized, at least within a register

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-20 Thread Mario Kleiner


On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley  wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley  wrote:


The funny part is, there's a comment there that shows that this was
done even for "PREEMPT_RT". Unfortunately, the call to
"get_scanout_position()" can call functions that use the rt-mutex
"sleeping spin locks" and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean "any other lock" that might be claimed "would also need
to be raw"?  Hopefully not "any other lock" already held?


And by taking a quick audit of the code, I see this:

 spin_lock_irqsave(_priv->uncore.lock, irqflags);

 /* Reset the chip */

 /* GEN6_GDRST is not in the gt power well, no need to check
  * for fifo space for the write or forcewake the chip for
  * the read
  */
 __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

 /* Spin waiting for the device to ack the reset request */
 ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) &
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson 
Date:   Fri Jul 19 20:36:51 2013 +0100

 drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard 
Date:   Fri Jan 6 11:34:04 2012 -0800

 drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
 holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should have mmio_idx_lock spinlock as raw]




Ok, so register access must be serialized, at least within a register 
block, no way around it. Thanks for explaining this. That makes me a bit 
depressed. Is this also true for older hw generation

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-20 Thread Mario Kleiner


On 09/17/2013 10:55 PM, Daniel Vetter wrote:

On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote:

On 09/11/2013 03:31 PM, Peter Hurley wrote:


[+cc dri-devel]

On 09/11/2013 11:38 AM, Steven Rostedt wrote:


On Wed, 11 Sep 2013 11:16:43 -0400
Peter Hurley pe...@hurleysoftware.com wrote:


The funny part is, there's a comment there that shows that this was
done even for PREEMPT_RT. Unfortunately, the call to
get_scanout_position() can call functions that use the rt-mutex
sleeping spin locks and it breaks there.

I guess we need to ask the authors of the mainline patch exactly why
that preempt_disable() is needed?



The drm core associates a timestamp with each vertical blank frame #.
Drm drivers can optionally support a 'high resolution' hw timestamp.
The vblank frame #/timestamp tuple is user-space visible.

The i915 drm driver supports a hw timestamp via this drm helper function
which computes the timestamp from the crtc scan position (based on the
pixel clock).

For mainline, the preempt_disable/_enable() isn't actually necessary
because every call tree that leads here already has preemption disabled.

For -RT, the maybe i915 register spinlock (uncore.lock) should be raw?



No, it should not. Note, any other lock that can be held when it is
held would also need to be raw.



By that, you mean any other lock that might be claimed would also need
to be raw?  Hopefully not any other lock already held?


And by taking a quick audit of the code, I see this:

 spin_lock_irqsave(dev_priv-uncore.lock, irqflags);

 /* Reset the chip */

 /* GEN6_GDRST is not in the gt power well, no need to check
  * for fifo space for the write or forcewake the chip for
  * the read
  */
 __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL);

 /* Spin waiting for the device to ack the reset request */
 ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) 
GEN6_GRDOM_FULL) == 0, 500);

That spin is unacceptable in RT with preemption and interrupts disabled.



Yep. That would be bad.

AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included
in the force-wake set, so raw reads of the registers would
probably be acceptable (thus obviating the need for claiming the
uncore.lock).

Except that _ALL_ register access is disabled with the uncore.lock
during a gpu reset. Not sure if that's meant to include crtc registers
or not, or what other synchronization/serialization issues are being
handled/hidden by forcing all register accesses to wait during a gpu
reset.

Hopefully an i915 expert can weigh in here?




Daniel,

Can you shed some light on whether the i915+ crtc registers (specifically
those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter())
read as part of the vblank counter/timestamp handling need to
be prevented during gpu reset?


The depency here in the locking is a recent addition:

commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a
Author: Chris Wilson ch...@chris-wilson.co.uk
Date:   Fri Jul 19 20:36:51 2013 +0100

 drm/i915: Serialize almost all register access

It's a (slightly) oversized hammer to work around a hardware issue -
we could break it down to register blocks, which can be accessed
concurrently, but that tends to be more fragile. But the chip really
dies if you access (even just reads) the same block concurrently :(

We could try break the spinlock protected section a bit in the reset
handler - register access on a hung gpu tends to be ill-defined
anyway.


The implied wait with preemption and interrupts disabled is causing grief
in -RT, but also a 4ms wait inside an irq handler seems like a bad idea.


Oops, the magic code in wait_for which is just there to make the imo
totally misguided kgdb support work papered over the aweful long wait
in atomic context ever since we've added this in

commit b6e45f866465f42b53d803b0c574da0fc508a0e9
Author: Keith Packard kei...@keithp.com
Date:   Fri Jan 6 11:34:04 2012 -0800

 drm/i915: Move reset forcewake processing to gen6_do_reset

Reverting this change should be enough (code moved obviously a bit).

Cheers, Daniel



Regards,
Peter Hurley




What's the real issue here?



That the vblank timestamp needs to be an accurate measurement of a
realtime event. Sleeping/servicing interrupts while reading
the registers necessary to compute the timestamp would be bad too.

(edit: which hopefully Mario Kleiner clarified in his reply)

My point earlier was three-fold:
1. Don't need the preempt_disable() for mainline: all callers are already
 holding interrupt-disabling spinlocks.
2. -RT still needs to prevent scheduling there.
3. the problem is i915-specific.

[update: the radeon driver should also BUG like the i915 driver but
probably
should have mmio_idx_lock spinlock as raw]




Ok, so register access must be serialized, at least within a register 
block, no way around it. Thanks for explaining this. That makes me a bit 
depressed. Is this also true for older hw

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner




On 11.09.13 21:19, Steven Rostedt wrote:

On Wed, 11 Sep 2013 21:07:10 +0200
Mario Kleiner  wrote:




On 11.09.13 20:35, Steven Rostedt wrote:

On Wed, 11 Sep 2013 20:29:07 +0200
Mario Kleiner  wrote:


That said, maybe preempt_disable is no longer the optimal choice there
and there's some better way to achieve good protection against
interruptions of that bit of code? My knowledge here is a bit rusty, and
the intel kms drivers and rt stuff has changed quite a bit.


If you set your code to a higher priority than other tasks (and
interrupts) than it wont be preempted there. Unless of course it blocks
on a lock, but even then, priority inheritance will take place and it
still should be rather quick. (unless the holder of the lock is doing
that strange polling).

-- Steve



Right, on a rt kernel. But that creates the problem of not very computer
savvy users (psychologists and biologists mostly) somehow having to
choose proper priorities for gpu interrupt threads and for the
x-server/wayland/..., and not much protection on a non-rt kernel?


IIUC, the preempt_disable() is only for -rt, the non-rt case already
disables preemption with the spin_locks called before it.



Oh, right! should have thought about that. I'm quite sleepy, so my brain 
is not working very well atm.




preempt_disable() a few years ago looked like a good "plug and play"
default solution, because the ->get_crtc_scanoutpos() function was
supposed to have a very low and bounded execution time. At the time we
wrote the patches for intel/radeon/nouveau, that was the case. Typical
execution time (= preempt off time) was like 1-4 usecs, even on very low
end hardware.

Seems that at least intel's kms driver does a lot of things now, which
can sleep and spin inside that section? I tried to follow the posted
stack trace, but got lost somewhere around the i915_read32 code and
power management stuff...


Note, the sleeps only happen on -rt, and not in mainline.

If one is going to use -rt for real-time work, it requires a bit more
knowledge of the system. The problem with RT in general, is that it's
hard, and anyone telling you they have a generic RT system that
requires no computer savvyness can also be selling you a bridge over
the east river.

-- Steve


;) - I know the problem, i spend a lot of time telling that to users of 
my software, although they then generally want some sort of bridges 
anyway. I'm maintaining one of the most popular open-source toolkits for 
neuro-science, and in my experience at least the field of neuro-science 
research has the problem that a lot of people there need good real-time 
behaviour and a lot of flexibility in their hardware and software 
setups, but very few have the necessary technical background. Given the 
limited money they can spend, there's also not much commercial interest 
or probably viability in providing good technical consulting. The few 
proprietary hardware solutions i know of are either unaffordable by the 
majority, or are bridges over the east river, or quite often both. My 
main motivation for luring my users to Linux and contributing some 
little bits sometimes is the hope that some problems can be solved in a 
better way at the system level than piling software workarounds on top 
of hardware workarounds on top of expensive equipment.


But back to the topic, I think a better argument for the 
preempt_disable() there instead of changing code execution priority is 
that i wouldn't know how to set a static priority properly either. The 
timestamping code is also called from drm code (drmWaitVblank ioctl()) 
and it isn't called from the actual experiment software, where i would 
at least roughly know what i'm doing, and could adjust priorities 
dynamically, but from the X-Server, or maybe in the future Wayland, on 
behalf of the OpenGL client app. For the timestamping to work properly, 
one only would need a raised priority (higher than most interrupt kernel 
threads, except the one of the kms driver) for those few lines of 
timestamping code. I don't think it would be good to run xorg or wayland 
permanently at a higher priority than most irq threads, given that the 
display server does not only serve rt apps and is not designed as a 
realtime application. One only wants a short protection from preemption 
during timestamping.


Sorry, i think i'm rambling here quite a bit and i didn't want to 
sidetrack the thread, just give some explanation why i think the 
preempt_disable() is (/was?) justified.


-mario
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner


On 11.09.13 17:16, Peter Hurley wrote:

On 09/11/2013 09:26 AM, Steven Rostedt wrote:

On Wed, 11 Sep 2013 07:28:09 -0300
"Luis Claudio R. Goncalves"  wrote:


Hello,

I saw two different occurrences of "BUG: sleeping function called from
invalid context" happening on 3.10.10-rt7. The first one happened on
drm_vblank_get() -> i915_get_vblank_timestamp() and was flooding dmesg
after Xorg started. I silenced that with a hackish patch, just for
fun, and
found a second problem, this time on tty_ldisc_reinit().

The logs:

[   23.973991] BUG: sleeping function called from invalid context at
/home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
[   23.973992] in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg
[   23.973993] 2 locks held by Xorg/517:
[   23.973993]  #0:
[   23.973994]  (
[   23.973994] >vbl_lock
[   23.973995] ){..}
[   23.973995] , at:
[   23.974007] [] drm_vblank_get+0x30/0x2b0 [drm]
[   23.974008]  #1:
[   23.974008]  (
[   23.974008] >vblank_time_lock
[   23.974008] ){..}
[   23.974009] , at:
[   23.974013] [] drm_vblank_get+0xb1/0x2b0 [drm]
[   23.974013] Preemption disabled at:
[   23.974021] []
i915_get_vblank_timestamp+0x45/0xa0 [i915]
[   23.974023] CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5
[   23.974024] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
02/05/2013
[   23.974024]  00070008
[   23.974025]  88017a08bae0
[   23.974025]  8164b790
[   23.974025]  88017a08baf8
[   23.974026]  8107e62f
[   23.974026]  880179840040
[   23.974026]  88017a08bb18
[   23.974027]  81651ac4
[   23.974027]  0002
[   23.974027]  88017984
[   23.974028]  88017a08bb48
[   23.974028]  a0084e67
[   23.974028] Call Trace:
[   23.974033]  [] dump_stack+0x19/0x1b
[   23.974035]  [] __might_sleep+0xff/0x170
[   23.974037]  [] rt_spin_lock+0x24/0x60
[   23.974043]  [] i915_read32+0x27/0x170 [i915]
[   23.974048]  [] i915_pipe_enabled+0x31/0x40 [i915]
[   23.974054]  []
i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915]
[   23.974056]  [] ? sched_clock_cpu+0xb5/0x100
[   23.974062]  []
drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm]
[   23.974068]  []
i915_get_vblank_timestamp+0x45/0xa0 [i915]
[   23.974073]  []
drm_get_last_vbltimestamp+0x48/0x70 [drm]
[   23.974078]  [] drm_vblank_get+0x185/0x2b0 [drm]
[   23.974079]  [] ? sched_clock_cpu+0xb5/0x100
[   23.974085]  [] drm_wait_vblank+0x83/0x5d0 [drm]
[   23.974086]  [] ? sched_clock_cpu+0xb5/0x100
[   23.974088]  [] ?
__lock_acquire.isra.31+0x273/0xa70
[   23.974093]  [] drm_ioctl+0x552/0x6a0 [drm]
[   23.974096]  [] ? avc_has_perm_flags+0x126/0x1c0
[   23.974098]  [] ? avc_has_perm_flags+0x28/0x1c0
[   23.974099]  [] ? put_lock_stats.isra.23+0xe/0x40
[   23.974101]  [] do_vfs_ioctl+0x325/0x5b0
[   23.974103]  [] ? file_has_perm+0x8e/0xa0
[   23.974105]  [] SyS_ioctl+0x81/0xa0
[   23.974107]  [] tracesys+0xdd/0xe2


and


[   25.423675] BUG: sleeping function called from invalid context at
/home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
[   25.423676] in_atomic(): 1, irqs_disabled(): 1, pid: 188, name:
plymouthd
[   25.423676] 3 locks held by plymouthd/188:
[   25.423682]  #0:  (>legacy_mutex){..}, at:
[] tty_lock_nested+0x40/0x90
[   25.423686]  #1:  (>ldisc_mutex){..}, at:
[] tty_ldisc_hangup+0x152/0x300
[   25.423688]  #2:  (tty_ldisc_lock){..}, at:
[] tty_ldisc_reinit+0x72/0x130
[   25.423689] Preemption disabled at:[]
tty_ldisc_reinit+0x72/0x130
[   25.423691] CPU: 2 PID: 188 Comm: plymouthd Not tainted
3.10.10-rt7+ #6
[   25.423692] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
02/05/2013
[   25.423694]  005ff000 8801788ada68 8164b790
8801788ada80
[   25.423695]  8107e62f 8801991ce1a0 8801788adaa0
81651ac4
[   25.423696]   ea0005e26680 8801788adaf8
81130984
[   25.423696] Call Trace:
[   25.423700]  [] dump_stack+0x19/0x1b
[   25.423702]  [] __might_sleep+0xff/0x170
[   25.423704]  [] rt_spin_lock+0x24/0x60
[   25.423707]  [] free_hot_cold_page+0xb4/0x3c0
[   25.423710]  []
?unfreeze_partials.isra.42+0x229/0x2b0
[   25.423711]  [] __free_pages+0x47/0x70
[   25.423713]  [] __free_memcg_kmem_pages+0x22/0x50
[   25.423714]  [] __free_slab+0xe8/0x1e0
[   25.423716]  [] free_delayed+0x34/0x50
[   25.423717]  [] __slab_free+0x273/0x36b
[   25.423719]  [] ? get_lock_stats+0x19/0x60
[   25.423721]  [] ? put_lock_stats.isra.23+0xe/0x40
[   25.423722]  [] kfree+0x1c4/0x210
[   25.423724]  [] ? tty_ldisc_reinit+0xa5/0x130
[   25.423725]  [] tty_ldisc_reinit+0xa5/0x130
[   25.423726]  [] tty_ldisc_hangup+0x16f/0x300
[   25.423728]  [] ? get_parent_ip+0xd/0x50
[   25.423731]  [] ? unpin_current_cpu+0x16/0x70
[   25.423732]  [] __tty_hangup+0x346/0x460
[   25.423733]  [] tty_vhangup+0x10/0x20
[   25.423735]  [] pty_close+0x131/0x180
[   25.423736]  [] tty_release+0x11d/0x5f0
[   25.423737]  [] ? get_lock_stats+0x19/0x60
[   25.423747]  [] ?

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner




On 11.09.13 20:35, Steven Rostedt wrote:

On Wed, 11 Sep 2013 20:29:07 +0200
Mario Kleiner  wrote:


That said, maybe preempt_disable is no longer the optimal choice there
and there's some better way to achieve good protection against
interruptions of that bit of code? My knowledge here is a bit rusty, and
the intel kms drivers and rt stuff has changed quite a bit.


If you set your code to a higher priority than other tasks (and
interrupts) than it wont be preempted there. Unless of course it blocks
on a lock, but even then, priority inheritance will take place and it
still should be rather quick. (unless the holder of the lock is doing
that strange polling).

-- Steve



Right, on a rt kernel. But that creates the problem of not very computer 
savvy users (psychologists and biologists mostly) somehow having to 
choose proper priorities for gpu interrupt threads and for the 
x-server/wayland/..., and not much protection on a non-rt kernel?


preempt_disable() a few years ago looked like a good "plug and play" 
default solution, because the ->get_crtc_scanoutpos() function was 
supposed to have a very low and bounded execution time. At the time we 
wrote the patches for intel/radeon/nouveau, that was the case. Typical 
execution time (= preempt off time) was like 1-4 usecs, even on very low 
end hardware.


Seems that at least intel's kms driver does a lot of things now, which 
can sleep and spin inside that section? I tried to follow the posted 
stack trace, but got lost somewhere around the i915_read32 code and 
power management stuff...


-mario
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner




On 11.09.13 20:35, Steven Rostedt wrote:

On Wed, 11 Sep 2013 20:29:07 +0200
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:


That said, maybe preempt_disable is no longer the optimal choice there
and there's some better way to achieve good protection against
interruptions of that bit of code? My knowledge here is a bit rusty, and
the intel kms drivers and rt stuff has changed quite a bit.


If you set your code to a higher priority than other tasks (and
interrupts) than it wont be preempted there. Unless of course it blocks
on a lock, but even then, priority inheritance will take place and it
still should be rather quick. (unless the holder of the lock is doing
that strange polling).

-- Steve



Right, on a rt kernel. But that creates the problem of not very computer 
savvy users (psychologists and biologists mostly) somehow having to 
choose proper priorities for gpu interrupt threads and for the 
x-server/wayland/..., and not much protection on a non-rt kernel?


preempt_disable() a few years ago looked like a good plug and play 
default solution, because the -get_crtc_scanoutpos() function was 
supposed to have a very low and bounded execution time. At the time we 
wrote the patches for intel/radeon/nouveau, that was the case. Typical 
execution time (= preempt off time) was like 1-4 usecs, even on very low 
end hardware.


Seems that at least intel's kms driver does a lot of things now, which 
can sleep and spin inside that section? I tried to follow the posted 
stack trace, but got lost somewhere around the i915_read32 code and 
power management stuff...


-mario
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner


On 11.09.13 17:16, Peter Hurley wrote:

On 09/11/2013 09:26 AM, Steven Rostedt wrote:

On Wed, 11 Sep 2013 07:28:09 -0300
Luis Claudio R. Goncalves lclau...@uudg.org wrote:


Hello,

I saw two different occurrences of BUG: sleeping function called from
invalid context happening on 3.10.10-rt7. The first one happened on
drm_vblank_get() - i915_get_vblank_timestamp() and was flooding dmesg
after Xorg started. I silenced that with a hackish patch, just for
fun, and
found a second problem, this time on tty_ldisc_reinit().

The logs:

[   23.973991] BUG: sleeping function called from invalid context at
/home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
[   23.973992] in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg
[   23.973993] 2 locks held by Xorg/517:
[   23.973993]  #0:
[   23.973994]  (
[   23.973994] dev-vbl_lock
[   23.973995] ){..}
[   23.973995] , at:
[   23.974007] [a0024c60] drm_vblank_get+0x30/0x2b0 [drm]
[   23.974008]  #1:
[   23.974008]  (
[   23.974008] dev-vblank_time_lock
[   23.974008] ){..}
[   23.974009] , at:
[   23.974013] [a0024ce1] drm_vblank_get+0xb1/0x2b0 [drm]
[   23.974013] Preemption disabled at:
[   23.974021] [a008bc95]
i915_get_vblank_timestamp+0x45/0xa0 [i915]
[   23.974023] CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5
[   23.974024] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
02/05/2013
[   23.974024]  00070008
[   23.974025]  88017a08bae0
[   23.974025]  8164b790
[   23.974025]  88017a08baf8
[   23.974026]  8107e62f
[   23.974026]  880179840040
[   23.974026]  88017a08bb18
[   23.974027]  81651ac4
[   23.974027]  0002
[   23.974027]  88017984
[   23.974028]  88017a08bb48
[   23.974028]  a0084e67
[   23.974028] Call Trace:
[   23.974033]  [8164b790] dump_stack+0x19/0x1b
[   23.974035]  [8107e62f] __might_sleep+0xff/0x170
[   23.974037]  [81651ac4] rt_spin_lock+0x24/0x60
[   23.974043]  [a0084e67] i915_read32+0x27/0x170 [i915]
[   23.974048]  [a008a591] i915_pipe_enabled+0x31/0x40 [i915]
[   23.974054]  [a008a6be]
i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915]
[   23.974056]  [81087025] ? sched_clock_cpu+0xb5/0x100
[   23.974062]  [a00245d4]
drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm]
[   23.974068]  [a008bc95]
i915_get_vblank_timestamp+0x45/0xa0 [i915]
[   23.974073]  [a0024998]
drm_get_last_vbltimestamp+0x48/0x70 [drm]
[   23.974078]  [a0024db5] drm_vblank_get+0x185/0x2b0 [drm]
[   23.974079]  [81087025] ? sched_clock_cpu+0xb5/0x100
[   23.974085]  [a0025d03] drm_wait_vblank+0x83/0x5d0 [drm]
[   23.974086]  [81087025] ? sched_clock_cpu+0xb5/0x100
[   23.974088]  [810af143] ?
__lock_acquire.isra.31+0x273/0xa70
[   23.974093]  [a00212a2] drm_ioctl+0x552/0x6a0 [drm]
[   23.974096]  [8128d886] ? avc_has_perm_flags+0x126/0x1c0
[   23.974098]  [8128d788] ? avc_has_perm_flags+0x28/0x1c0
[   23.974099]  [810ad49e] ? put_lock_stats.isra.23+0xe/0x40
[   23.974101]  [811a0095] do_vfs_ioctl+0x325/0x5b0
[   23.974103]  [8128fc3e] ? file_has_perm+0x8e/0xa0
[   23.974105]  [811a03a1] SyS_ioctl+0x81/0xa0
[   23.974107]  [8165a342] tracesys+0xdd/0xe2


and


[   25.423675] BUG: sleeping function called from invalid context at
/home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659
[   25.423676] in_atomic(): 1, irqs_disabled(): 1, pid: 188, name:
plymouthd
[   25.423676] 3 locks held by plymouthd/188:
[   25.423682]  #0:  (tty-legacy_mutex){..}, at:
[816528d0] tty_lock_nested+0x40/0x90
[   25.423686]  #1:  (tty-ldisc_mutex){..}, at:
[8139b482] tty_ldisc_hangup+0x152/0x300
[   25.423688]  #2:  (tty_ldisc_lock){..}, at:
[8139a9c2] tty_ldisc_reinit+0x72/0x130
[   25.423689] Preemption disabled at:[8139a9c2]
tty_ldisc_reinit+0x72/0x130
[   25.423691] CPU: 2 PID: 188 Comm: plymouthd Not tainted
3.10.10-rt7+ #6
[   25.423692] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15
02/05/2013
[   25.423694]  005ff000 8801788ada68 8164b790
8801788ada80
[   25.423695]  8107e62f 8801991ce1a0 8801788adaa0
81651ac4
[   25.423696]   ea0005e26680 8801788adaf8
81130984
[   25.423696] Call Trace:
[   25.423700]  [8164b790] dump_stack+0x19/0x1b
[   25.423702]  [8107e62f] __might_sleep+0xff/0x170
[   25.423704]  [81651ac4] rt_spin_lock+0x24/0x60
[   25.423707]  [81130984] free_hot_cold_page+0xb4/0x3c0
[   25.423710]  [81178209]
?unfreeze_partials.isra.42+0x229/0x2b0
[   25.423711]  [81130dc7] __free_pages+0x47/0x70
[   25.423713]  [81130fb2] __free_memcg_kmem_pages+0x22/0x50
[   25.423714]  [81177528] __free_slab+0xe8/0x1e0
[   25.423716]  [81177654] free_delayed+0x34/0x50
[

Re: BUG: sleeping function called from invalid context on 3.10.10-rt7

2013-09-11 Thread Mario Kleiner




On 11.09.13 21:19, Steven Rostedt wrote:

On Wed, 11 Sep 2013 21:07:10 +0200
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:




On 11.09.13 20:35, Steven Rostedt wrote:

On Wed, 11 Sep 2013 20:29:07 +0200
Mario Kleiner mario.klei...@tuebingen.mpg.de wrote:


That said, maybe preempt_disable is no longer the optimal choice there
and there's some better way to achieve good protection against
interruptions of that bit of code? My knowledge here is a bit rusty, and
the intel kms drivers and rt stuff has changed quite a bit.


If you set your code to a higher priority than other tasks (and
interrupts) than it wont be preempted there. Unless of course it blocks
on a lock, but even then, priority inheritance will take place and it
still should be rather quick. (unless the holder of the lock is doing
that strange polling).

-- Steve



Right, on a rt kernel. But that creates the problem of not very computer
savvy users (psychologists and biologists mostly) somehow having to
choose proper priorities for gpu interrupt threads and for the
x-server/wayland/..., and not much protection on a non-rt kernel?


IIUC, the preempt_disable() is only for -rt, the non-rt case already
disables preemption with the spin_locks called before it.



Oh, right! should have thought about that. I'm quite sleepy, so my brain 
is not working very well atm.




preempt_disable() a few years ago looked like a good plug and play
default solution, because the -get_crtc_scanoutpos() function was
supposed to have a very low and bounded execution time. At the time we
wrote the patches for intel/radeon/nouveau, that was the case. Typical
execution time (= preempt off time) was like 1-4 usecs, even on very low
end hardware.

Seems that at least intel's kms driver does a lot of things now, which
can sleep and spin inside that section? I tried to follow the posted
stack trace, but got lost somewhere around the i915_read32 code and
power management stuff...


Note, the sleeps only happen on -rt, and not in mainline.

If one is going to use -rt for real-time work, it requires a bit more
knowledge of the system. The problem with RT in general, is that it's
hard, and anyone telling you they have a generic RT system that
requires no computer savvyness can also be selling you a bridge over
the east river.

-- Steve


;) - I know the problem, i spend a lot of time telling that to users of 
my software, although they then generally want some sort of bridges 
anyway. I'm maintaining one of the most popular open-source toolkits for 
neuro-science, and in my experience at least the field of neuro-science 
research has the problem that a lot of people there need good real-time 
behaviour and a lot of flexibility in their hardware and software 
setups, but very few have the necessary technical background. Given the 
limited money they can spend, there's also not much commercial interest 
or probably viability in providing good technical consulting. The few 
proprietary hardware solutions i know of are either unaffordable by the 
majority, or are bridges over the east river, or quite often both. My 
main motivation for luring my users to Linux and contributing some 
little bits sometimes is the hope that some problems can be solved in a 
better way at the system level than piling software workarounds on top 
of hardware workarounds on top of expensive equipment.


But back to the topic, I think a better argument for the 
preempt_disable() there instead of changing code execution priority is 
that i wouldn't know how to set a static priority properly either. The 
timestamping code is also called from drm code (drmWaitVblank ioctl()) 
and it isn't called from the actual experiment software, where i would 
at least roughly know what i'm doing, and could adjust priorities 
dynamically, but from the X-Server, or maybe in the future Wayland, on 
behalf of the OpenGL client app. For the timestamping to work properly, 
one only would need a raised priority (higher than most interrupt kernel 
threads, except the one of the kms driver) for those few lines of 
timestamping code. I don't think it would be good to run xorg or wayland 
permanently at a higher priority than most irq threads, given that the 
display server does not only serve rt apps and is not designed as a 
realtime application. One only wants a short protection from preemption 
during timestamping.


Sorry, i think i'm rambling here quite a bit and i didn't want to 
sidetrack the thread, just give some explanation why i think the 
preempt_disable() is (/was?) justified.


-mario
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

66 matches

Mail list logo