Re: [PATCH] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
On 02/18/2018 09:53 AM, Kai-Heng Feng wrote: Similar to commit e10aec652f31 ("drm/edid: Add 6 bpc quirk for display AEO model 0."), the EDID reports "DFP 1.x compliant TMDS" but it support 6bpc instead of 8 bpc. Hence, use 6 bpc quirk for this panel. Fixes: 196f954e2509 ("drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown"") BugLink: https://bugs.launchpad.net/bugs/1749420 Signed-off-by: Kai-Heng Feng <kai.heng.f...@canonical.com> --- drivers/gpu/drm/drm_edid.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index ddd537914575..d9c8d718e261 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -113,6 +113,9 @@ static const struct edid_quirk { /* AEO model 0 reports 8 bpc, but is a 6 bpc panel */ { "AEO", 0, EDID_QUIRK_FORCE_6BPC }, + /* CPT panel of Asus UX303LA reports 8 bpc, but is a 6 bpc panel */ + { "CPT", 0x17df, EDID_QUIRK_FORCE_6BPC }, + /* Belinea 10 15 55 */ { "MAX", 1516, EDID_QUIRK_PREFER_LARGE_60 }, { "MAX", 0x77e, EDID_QUIRK_PREFER_LARGE_60 }, Reviewed-by: Mario Kleiner <mario.kleiner...@gmail.com>
Re: [PATCH] drm/edid: Add 6 bpc quirk for CPT panel in Asus UX303LA
On 02/18/2018 09:53 AM, Kai-Heng Feng wrote: Similar to commit e10aec652f31 ("drm/edid: Add 6 bpc quirk for display AEO model 0."), the EDID reports "DFP 1.x compliant TMDS" but it support 6bpc instead of 8 bpc. Hence, use 6 bpc quirk for this panel. Fixes: 196f954e2509 ("drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown"") BugLink: https://bugs.launchpad.net/bugs/1749420 Signed-off-by: Kai-Heng Feng --- drivers/gpu/drm/drm_edid.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index ddd537914575..d9c8d718e261 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -113,6 +113,9 @@ static const struct edid_quirk { /* AEO model 0 reports 8 bpc, but is a 6 bpc panel */ { "AEO", 0, EDID_QUIRK_FORCE_6BPC }, + /* CPT panel of Asus UX303LA reports 8 bpc, but is a 6 bpc panel */ + { "CPT", 0x17df, EDID_QUIRK_FORCE_6BPC }, + /* Belinea 10 15 55 */ { "MAX", 1516, EDID_QUIRK_PREFER_LARGE_60 }, { "MAX", 0x77e, EDID_QUIRK_PREFER_LARGE_60 }, Reviewed-by: Mario Kleiner
Re: [PATCH 4.4 030/103] drm/amdgpu: Make display watermark calculations more accurate
On Thu, Jun 1, 2017 at 1:13 PM, Ben Hutchings <ben.hutchi...@codethink.co.uk> wrote: > On Tue, 2017-05-23 at 22:08 +0200, Greg Kroah-Hartman wrote: >> 4.4-stable review patch. If anyone has any objections, please let me know. >> >> ------ >> >> From: Mario Kleiner <mario.kleiner...@gmail.com> >> >> commit d63c277dc672e0c568481af043359420fa9d4736 upstream. >> >> Avoid big roundoff errors in scanline/hactive durations for >> high pixel clocks, especially for >= 500 Mhz, and thereby >> program more accurate display fifo watermarks. > [...] >> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c >> @@ -1237,14 +1237,14 @@ static void dce_v10_0_program_watermarks >> { >> struct drm_display_mode *mode = _crtc->base.mode; >> struct dce10_wm_params wm_low, wm_high; >> - u32 pixel_period; >> + u32 active_time; >> u32 line_time = 0; >> u32 latency_watermark_a = 0, latency_watermark_b = 0; >> u32 tmp, wm_mask, lb_vblank_lead_lines = 0; >> >> if (amdgpu_crtc->base.enabled && num_heads && mode) { >> - pixel_period = 100 / (u32)mode->clock; >> - line_time = min((u32)mode->crtc_htotal * pixel_period, >> (u32)65535); >> + active_time = 100UL * (u32)mode->crtc_hdisplay / >> (u32)mode->clock; >> + line_time = min((u32) (100UL * (u32)mode->crtc_htotal / >> (u32)mode->clock), (u32)65535); > [...] > > Won't these multiplications overflow if a >4K display is attached to a > 32-bit machine? > > Ben. > Yes, indeed > 4k causes a new overflow problem! Thanks for catching this. I will prepare a follow-up patch on top of this one, to use ... active_time = (u32) div_u64((u64) mode->crtc_hdisplay * 100, (u32)mode->clock); line_time = (u32) div_u64((u64) mode->crtc_htotal * 100, (u32)mode->clock); line_time = min(line_time, (u32) 65535); ...instead. Ok? -mario > -- > Ben Hutchings > Software Developer, Codethink Ltd. > >
Re: [PATCH 4.4 030/103] drm/amdgpu: Make display watermark calculations more accurate
On Thu, Jun 1, 2017 at 1:13 PM, Ben Hutchings wrote: > On Tue, 2017-05-23 at 22:08 +0200, Greg Kroah-Hartman wrote: >> 4.4-stable review patch. If anyone has any objections, please let me know. >> >> ------ >> >> From: Mario Kleiner >> >> commit d63c277dc672e0c568481af043359420fa9d4736 upstream. >> >> Avoid big roundoff errors in scanline/hactive durations for >> high pixel clocks, especially for >= 500 Mhz, and thereby >> program more accurate display fifo watermarks. > [...] >> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c >> @@ -1237,14 +1237,14 @@ static void dce_v10_0_program_watermarks >> { >> struct drm_display_mode *mode = _crtc->base.mode; >> struct dce10_wm_params wm_low, wm_high; >> - u32 pixel_period; >> + u32 active_time; >> u32 line_time = 0; >> u32 latency_watermark_a = 0, latency_watermark_b = 0; >> u32 tmp, wm_mask, lb_vblank_lead_lines = 0; >> >> if (amdgpu_crtc->base.enabled && num_heads && mode) { >> - pixel_period = 100 / (u32)mode->clock; >> - line_time = min((u32)mode->crtc_htotal * pixel_period, >> (u32)65535); >> + active_time = 100UL * (u32)mode->crtc_hdisplay / >> (u32)mode->clock; >> + line_time = min((u32) (100UL * (u32)mode->crtc_htotal / >> (u32)mode->clock), (u32)65535); > [...] > > Won't these multiplications overflow if a >4K display is attached to a > 32-bit machine? > > Ben. > Yes, indeed > 4k causes a new overflow problem! Thanks for catching this. I will prepare a follow-up patch on top of this one, to use ... active_time = (u32) div_u64((u64) mode->crtc_hdisplay * 100, (u32)mode->clock); line_time = (u32) div_u64((u64) mode->crtc_htotal * 100, (u32)mode->clock); line_time = min(line_time, (u32) 65535); ...instead. Ok? -mario > -- > Ben Hutchings > Software Developer, Codethink Ltd. > >
Re: [PATCH 18/28] drm: avoid uninitialized timestamp use in wait_vblank
On 10/18/2016 12:13 AM, Arnd Bergmann wrote: gcc warns about the timestamp in drm_wait_vblank being possibly used without an initialization: drivers/gpu/drm/drm_irq.c: In function 'drm_crtc_send_vblank_event': drivers/gpu/drm/drm_irq.c:992:24: error: 'now.tv_usec' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/gpu/drm/drm_irq.c:1069:17: note: 'now.tv_usec' was declared here drivers/gpu/drm/drm_irq.c:991:23: error: 'now.tv_sec' may be used uninitialized in this function [-Werror=maybe-uninitialized] This can happen if drm_vblank_count_and_time() returns 0 in its error path. To sanitize the error case, I'm changing that function to return a zero timestamp when it fails. Fixes: e6ae8687a87b ("drm: idiot-proof vblank") Reviewed-by: David Herrmann <dh.herrm...@gmail.com> Cc: Rob Clark <robdcl...@gmail.com> Cc: Daniel Vetter <daniel.vet...@ffwll.ch> Signed-off-by: Arnd Bergmann <a...@arndb.de> --- First submitted in January 2016, second submission in February, the patch is still required. drivers/gpu/drm/drm_irq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index b969a64..48a6167 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -952,8 +952,10 @@ static u32 drm_vblank_count_and_time(struct drm_device *dev, unsigned int pipe, u32 vblank_count; unsigned int seq; - if (WARN_ON(pipe >= dev->num_crtcs)) + if (WARN_ON(pipe >= dev->num_crtcs)) { + *vblanktime = (struct timeval) { 0 }; return 0; + } do { seq = read_seqbegin(>seqlock); Looks good to me. Reviewed-by: Mario Kleiner <mario.kleiner...@gmail.com> -mario
Re: [PATCH 18/28] drm: avoid uninitialized timestamp use in wait_vblank
On 10/18/2016 12:13 AM, Arnd Bergmann wrote: gcc warns about the timestamp in drm_wait_vblank being possibly used without an initialization: drivers/gpu/drm/drm_irq.c: In function 'drm_crtc_send_vblank_event': drivers/gpu/drm/drm_irq.c:992:24: error: 'now.tv_usec' may be used uninitialized in this function [-Werror=maybe-uninitialized] drivers/gpu/drm/drm_irq.c:1069:17: note: 'now.tv_usec' was declared here drivers/gpu/drm/drm_irq.c:991:23: error: 'now.tv_sec' may be used uninitialized in this function [-Werror=maybe-uninitialized] This can happen if drm_vblank_count_and_time() returns 0 in its error path. To sanitize the error case, I'm changing that function to return a zero timestamp when it fails. Fixes: e6ae8687a87b ("drm: idiot-proof vblank") Reviewed-by: David Herrmann Cc: Rob Clark Cc: Daniel Vetter Signed-off-by: Arnd Bergmann --- First submitted in January 2016, second submission in February, the patch is still required. drivers/gpu/drm/drm_irq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c index b969a64..48a6167 100644 --- a/drivers/gpu/drm/drm_irq.c +++ b/drivers/gpu/drm/drm_irq.c @@ -952,8 +952,10 @@ static u32 drm_vblank_count_and_time(struct drm_device *dev, unsigned int pipe, u32 vblank_count; unsigned int seq; - if (WARN_ON(pipe >= dev->num_crtcs)) + if (WARN_ON(pipe >= dev->num_crtcs)) { + *vblanktime = (struct timeval) { 0 }; return 0; + } do { seq = read_seqbegin(>seqlock); Looks good to me. Reviewed-by: Mario Kleiner -mario
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
Ok, so legacy gamma table updates are completely broken for Intel on Linux-4.7-rc7, the final release candidate. The good news is that applying Lionel's patch "drm/i915: add missing condition for committing planes on crtc" from https://patchwork.freedesktop.org/patch/89111/ fixes it nicely. The patch currently applies cleanly to drm-fixes and drm-next and is Reviewed-and-tested-by: Mario Kleiner <mario.kleiner...@gmail.com> When we are at it, could somebody please look at that updated series of my Displayport color depth fixes ("EDID/DP fixes for proper bpc detection of displays.") i sent out a week ago? Especially pulling patch 2/5 "[PATCH 2/5] drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown" would be important, as that bug introduced a regression for Intel + DP + legacy DP converters into stable kernels which is very serious for users of scientific/medical display equipment, especially as the failures can easily go unnoticed during normal equipment tests, but would introduce the equivalent of "silent data corruption" into their measured scientific data, which is not a great experience given that collecting such data can easily take half a year of work time and ten-thousands of euros of wasted research funding. Patches 3 and 4 contain changes Daniel asked me to do, patch 5 would be good to safe-guard against similar issues in the future. thanks, -mario On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com> Cc: Daniel Vetter <daniel.vet...@ffwll.ch> --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
Ok, so legacy gamma table updates are completely broken for Intel on Linux-4.7-rc7, the final release candidate. The good news is that applying Lionel's patch "drm/i915: add missing condition for committing planes on crtc" from https://patchwork.freedesktop.org/patch/89111/ fixes it nicely. The patch currently applies cleanly to drm-fixes and drm-next and is Reviewed-and-tested-by: Mario Kleiner When we are at it, could somebody please look at that updated series of my Displayport color depth fixes ("EDID/DP fixes for proper bpc detection of displays.") i sent out a week ago? Especially pulling patch 2/5 "[PATCH 2/5] drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown" would be important, as that bug introduced a regression for Intel + DP + legacy DP converters into stable kernels which is very serious for users of scientific/medical display equipment, especially as the failures can easily go unnoticed during normal equipment tests, but would introduce the equivalent of "silent data corruption" into their measured scientific data, which is not a great experience given that collecting such data can easily take half a year of work time and ten-thousands of euros of wasted research funding. Patches 3 and 4 contain changes Daniel asked me to do, patch 5 would be good to safe-guard against similar issues in the future. thanks, -mario On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner Cc: Maarten Lankhorst Cc: Lionel Landwerlin Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
On 07/12/2016 05:02 PM, Lionel Landwerlin wrote: On 12/07/16 13:11, Mario Kleiner wrote: On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, Hi Lionel, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ Looking at them they should fix the issue, but they seem to be stuck in review? I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. You mean something else has fixed legacy gamma updates, as i can't find above patches applied on drm-intel-nightly? This revert on drm-intel-nightly seems to have fixed the problem : https://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915?id=e42aeef1237b7c969a77b7f726c50f6cb832185f Ok, with that intel-nightly looks like drm-next for 4.8 and that indeed has working lut updates in my testing. My own patch was motivated by the way the implementation is done in intel_atomic_commit_tail() from drm-next. Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik? I haven't seen it on 4.7-rc7. I just checked Linus tree for 4.7-rc7 and there the code in intel_display.c didn't receive any updates since 13 days and looks like the broken code from rc6 which according to my testing doesn't work. So i'd assume legacy gamma table updates are broken in Linux 4.7 final rc atm. Couldn't test, because for some weird reason 4.7-rc7 doesn't even boot on my laptop :( - However i got that via a quick install from Ubuntu's mainline ppa so it could be some unrelated problem with their ppa builds. I think either my patch would fix it, but is untested wrt. nuclear pageflip, or those two patches you referenced, which apparently didn't move forward. What now? -mario thanks, -mario Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com> Cc: Daniel Vetter <daniel.vet...@ffwll.ch> --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
On 07/12/2016 05:02 PM, Lionel Landwerlin wrote: On 12/07/16 13:11, Mario Kleiner wrote: On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, Hi Lionel, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ Looking at them they should fix the issue, but they seem to be stuck in review? I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. You mean something else has fixed legacy gamma updates, as i can't find above patches applied on drm-intel-nightly? This revert on drm-intel-nightly seems to have fixed the problem : https://cgit.freedesktop.org/drm-intel/commit/drivers/gpu/drm/i915?id=e42aeef1237b7c969a77b7f726c50f6cb832185f Ok, with that intel-nightly looks like drm-next for 4.8 and that indeed has working lut updates in my testing. My own patch was motivated by the way the implementation is done in intel_atomic_commit_tail() from drm-next. Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik? I haven't seen it on 4.7-rc7. I just checked Linus tree for 4.7-rc7 and there the code in intel_display.c didn't receive any updates since 13 days and looks like the broken code from rc6 which according to my testing doesn't work. So i'd assume legacy gamma table updates are broken in Linux 4.7 final rc atm. Couldn't test, because for some weird reason 4.7-rc7 doesn't even boot on my laptop :( - However i got that via a quick install from Ubuntu's mainline ppa so it could be some unrelated problem with their ppa builds. I think either my patch would fix it, but is untested wrt. nuclear pageflip, or those two patches you referenced, which apparently didn't move forward. What now? -mario thanks, -mario Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner Cc: Maarten Lankhorst Cc: Lionel Landwerlin Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, Hi Lionel, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ Looking at them they should fix the issue, but they seem to be stuck in review? I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. You mean something else has fixed legacy gamma updates, as i can't find above patches applied on drm-intel-nightly? Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik? thanks, -mario Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com> Cc: Daniel Vetter <daniel.vet...@ffwll.ch> --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
Re: [PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
On 07/12/2016 12:50 PM, Lionel Landwerlin wrote: Hi Mario, Hi Lionel, There was a couple of patch to fix this issue : https://patchwork.freedesktop.org/series/5467/ https://patchwork.freedesktop.org/series/5466/ Looking at them they should fix the issue, but they seem to be stuck in review? I tested this late last week on drm-intel-nightly, it seems a series of revert fixed most of the issues. You mean something else has fixed legacy gamma updates, as i can't find above patches applied on drm-intel-nightly? Are those fixes supposed to be already part of 4.7-rc7, the final rc afaik? thanks, -mario Cheers, - Lionel On 12/07/16 11:33, Mario Kleiner wrote: Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner Cc: Maarten Lankhorst Cc: Lionel Landwerlin Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); -bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); -if (crtc->state->active && -(crtc->state->planes_changed || update_pipe)) +if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config))
[PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> Cc: Lionel Landwerlin <lionel.g.landwer...@intel.com> Cc: Daniel Vetter <daniel.vet...@ffwll.ch> --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); - bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); - if (crtc->state->active && - (crtc->state->planes_changed || update_pipe)) + if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config)) -- 2.7.0
[PATCH] drm/i915: Fix legacy gamma lut updates in Linux 4.7-rc6
Updating legacy gamma tables, e.g., via RandR doesn't work at all as of Linux 4.7-rc6. Reason seems to be that the required call to drm_atomic_helper_commit_planes_on_crtc is skipped in intel_atomic_commit after userspace set new gamma tables, because neither crtc->state->planes_changed nor update_pipe (= pipe_config->update_pipe) are true. Removing the check for planes_changed || update_pipe fixes gamma table updates. The code for Linux 4.8 drm-next has changed a lot in that area wrt. 4.7, but the new code for 4.8 also removed those checks and calls drm_atomic_helper_commit_planes_on_crtc unconditionally, and legacy gamma lut updates work on drm-next, so this seems to be the right solution. Tested also shutdown/reboot, suspend/resume, (un-)plugging displays, mode switches for resolution/refresh rate, display rotation, and page-flipping/pageflip timing on Intel HD Ironlake to confirm the fix apparently doesn't break anything under X11. Signed-off-by: Mario Kleiner Cc: Maarten Lankhorst Cc: Lionel Landwerlin Cc: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 04452cf..eb8fb36 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -13685,7 +13685,6 @@ static int intel_atomic_commit(struct drm_device *dev, bool modeset = needs_modeset(crtc->state); struct intel_crtc_state *pipe_config = to_intel_crtc_state(crtc->state); - bool update_pipe = !modeset && pipe_config->update_pipe; if (modeset && crtc->state->active) { update_scanline_offset(to_intel_crtc(crtc)); @@ -13699,8 +13698,7 @@ static int intel_atomic_commit(struct drm_device *dev, drm_atomic_get_existing_plane_state(state, crtc->primary)) intel_fbc_enable(intel_crtc); - if (crtc->state->active && - (crtc->state->planes_changed || update_pipe)) + if (crtc->state->active) drm_atomic_helper_commit_planes_on_crtc(old_crtc_state); if (pipe_config->base.active && needs_vblank_wait(pipe_config)) -- 2.7.0
Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.
Hi Eric, thanks for all the infos and help! Both your patches look good and i have successfully tested them on top of with my vblank timestamping patch. So for both: Reviewed-and-tested-by: Mario Kleiner <mario.kleiner...@gmail.com> Will you squash 2/2 into my patch or should i resend my patch with yours squashed in? thanks, -mario On 07/08/2016 08:44 PM, Eric Anholt wrote: Read out the DISPBASE registers to decide on the FIFO size. Signed-off-by: Eric Anholt <e...@anholt.net> --- Mario: How about this for a squash into your commit? Here are the values I dumped for cob_size: [2.148314] [drm] Scaler 0 size 5232 [2.162239] [drm] Scaler 2 size 2048 [2.172957] [drm] Scaler 1 size 13456 drivers/gpu/drm/vc4/vc4_crtc.c | 23 +-- drivers/gpu/drm/vc4/vc4_regs.h | 18 +- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c index baf962bce063..3b7db17c356d 100644 --- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -55,6 +55,8 @@ struct vc4_crtc { u8 lut_r[256]; u8 lut_g[256]; u8 lut_b[256]; + /* Size in pixels of the COB memory allocated to this CRTC. */ + u32 cob_size; struct drm_pending_vblank_event *event; }; @@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id, *hpos = 0; /* This is the offset we need for translating hvs -> pv scanout pos. */ - /* XXX Find proper formula from hw docs instead of guesstimating? */ - fifo_lines = 2048 * 7 / mode->crtc_hdisplay; + fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay; if (fifo_lines > 0) ret |= DRM_SCANOUTPOS_VALID; @@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device *drm, } } +static void +vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc) +{ + struct drm_device *drm = vc4_crtc->base.dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel)); + /* Top/base are supposed to be 4-pixel aligned, but the +* Raspberry Pi firmware fills the low bits (which are +* presumably ignored). +*/ + u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3; + u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3; + + vc4_crtc->cob_size = top - base + 4; +} + static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) { struct platform_device *pdev = to_platform_device(dev); @@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) crtc->cursor = cursor_plane; } + vc4_crtc_get_cob_allocation(vc4_crtc); + CRTC_WRITE(PV_INTEN, 0); CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START); ret = devm_request_irq(dev, platform_get_irq(pdev, 0), diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h index 63cdc28ff7bb..160942a9180e 100644 --- a/drivers/gpu/drm/vc4/vc4_regs.h +++ b/drivers/gpu/drm/vc4/vc4_regs.h @@ -366,7 +366,6 @@ # define SCALER_DISPBKGND_FILLBIT(24) #define SCALER_DISPSTAT00x0048 -#define SCALER_DISPBASE00x004c # define SCALER_DISPSTATX_MODE_MASK VC4_MASK(31, 30) # define SCALER_DISPSTATX_MODE_SHIFT 30 # define SCALER_DISPSTATX_MODE_DISABLED 0 @@ -379,6 +378,20 @@ # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT 12 # define SCALER_DISPSTATX_LINE_MASK VC4_MASK(11, 0) # define SCALER_DISPSTATX_LINE_SHIFT 0 + +#define SCALER_DISPBASE00x004c +/* Last pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned (and thus 4 pixels less than the + * next COB base). + */ +# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16) +# define SCALER_DISPBASEX_TOP_SHIFT16 +/* First pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned. + */ +# define SCALER_DISPBASEX_BASE_MASKVC4_MASK(15, 0) +# define SCALER_DISPBASEX_BASE_SHIFT 0 + #define SCALER_DISPCTRL10x0050 #define SCALER_DISPBKGND1 0x0054 #define SCALER_DISPBKGNDX(x) (SCALER_DISPBKGND0 +\ @@ -389,6 +402,9 @@ (x) * (SCALER_DISPSTAT1 - \ SCALER_DISPSTAT0)) #define SCALER_DISPBASE10x005c +#define SCALER_DISPBASEX(x)(SCALER_DISPBASE0 +\ +
Re: [PATCH 2/2] drm/vc4: Squash commit for Mario's precise vblank timestamping.
Hi Eric, thanks for all the infos and help! Both your patches look good and i have successfully tested them on top of with my vblank timestamping patch. So for both: Reviewed-and-tested-by: Mario Kleiner Will you squash 2/2 into my patch or should i resend my patch with yours squashed in? thanks, -mario On 07/08/2016 08:44 PM, Eric Anholt wrote: Read out the DISPBASE registers to decide on the FIFO size. Signed-off-by: Eric Anholt --- Mario: How about this for a squash into your commit? Here are the values I dumped for cob_size: [2.148314] [drm] Scaler 0 size 5232 [2.162239] [drm] Scaler 2 size 2048 [2.172957] [drm] Scaler 1 size 13456 drivers/gpu/drm/vc4/vc4_crtc.c | 23 +-- drivers/gpu/drm/vc4/vc4_regs.h | 18 +- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vc4/vc4_crtc.c b/drivers/gpu/drm/vc4/vc4_crtc.c index baf962bce063..3b7db17c356d 100644 --- a/drivers/gpu/drm/vc4/vc4_crtc.c +++ b/drivers/gpu/drm/vc4/vc4_crtc.c @@ -55,6 +55,8 @@ struct vc4_crtc { u8 lut_r[256]; u8 lut_g[256]; u8 lut_b[256]; + /* Size in pixels of the COB memory allocated to this CRTC. */ + u32 cob_size; struct drm_pending_vblank_event *event; }; @@ -195,8 +197,7 @@ int vc4_crtc_get_scanoutpos(struct drm_device *dev, unsigned int crtc_id, *hpos = 0; /* This is the offset we need for translating hvs -> pv scanout pos. */ - /* XXX Find proper formula from hw docs instead of guesstimating? */ - fifo_lines = 2048 * 7 / mode->crtc_hdisplay; + fifo_lines = vc4_crtc->cob_size / mode->crtc_hdisplay; if (fifo_lines > 0) ret |= DRM_SCANOUTPOS_VALID; @@ -873,6 +874,22 @@ static void vc4_set_crtc_possible_masks(struct drm_device *drm, } } +static void +vc4_crtc_get_cob_allocation(struct vc4_crtc *vc4_crtc) +{ + struct drm_device *drm = vc4_crtc->base.dev; + struct vc4_dev *vc4 = to_vc4_dev(drm); + u32 dispbase = HVS_READ(SCALER_DISPBASEX(vc4_crtc->channel)); + /* Top/base are supposed to be 4-pixel aligned, but the +* Raspberry Pi firmware fills the low bits (which are +* presumably ignored). +*/ + u32 top = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_TOP) & ~3; + u32 base = VC4_GET_FIELD(dispbase, SCALER_DISPBASEX_BASE) & ~3; + + vc4_crtc->cob_size = top - base + 4; +} + static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) { struct platform_device *pdev = to_platform_device(dev); @@ -949,6 +966,8 @@ static int vc4_crtc_bind(struct device *dev, struct device *master, void *data) crtc->cursor = cursor_plane; } + vc4_crtc_get_cob_allocation(vc4_crtc); + CRTC_WRITE(PV_INTEN, 0); CRTC_WRITE(PV_INTSTAT, PV_INT_VFP_START); ret = devm_request_irq(dev, platform_get_irq(pdev, 0), diff --git a/drivers/gpu/drm/vc4/vc4_regs.h b/drivers/gpu/drm/vc4/vc4_regs.h index 63cdc28ff7bb..160942a9180e 100644 --- a/drivers/gpu/drm/vc4/vc4_regs.h +++ b/drivers/gpu/drm/vc4/vc4_regs.h @@ -366,7 +366,6 @@ # define SCALER_DISPBKGND_FILLBIT(24) #define SCALER_DISPSTAT00x0048 -#define SCALER_DISPBASE00x004c # define SCALER_DISPSTATX_MODE_MASK VC4_MASK(31, 30) # define SCALER_DISPSTATX_MODE_SHIFT 30 # define SCALER_DISPSTATX_MODE_DISABLED 0 @@ -379,6 +378,20 @@ # define SCALER_DISPSTATX_FRAME_COUNT_SHIFT 12 # define SCALER_DISPSTATX_LINE_MASK VC4_MASK(11, 0) # define SCALER_DISPSTATX_LINE_SHIFT 0 + +#define SCALER_DISPBASE00x004c +/* Last pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned (and thus 4 pixels less than the + * next COB base). + */ +# define SCALER_DISPBASEX_TOP_MASK VC4_MASK(31, 16) +# define SCALER_DISPBASEX_TOP_SHIFT16 +/* First pixel in the COB (display FIFO memory) allocated to this HVS + * channel. Must be 4-pixel aligned. + */ +# define SCALER_DISPBASEX_BASE_MASKVC4_MASK(15, 0) +# define SCALER_DISPBASEX_BASE_SHIFT 0 + #define SCALER_DISPCTRL10x0050 #define SCALER_DISPBKGND1 0x0054 #define SCALER_DISPBKGNDX(x) (SCALER_DISPBKGND0 +\ @@ -389,6 +402,9 @@ (x) * (SCALER_DISPSTAT1 - \ SCALER_DISPSTAT0)) #define SCALER_DISPBASE10x005c +#define SCALER_DISPBASEX(x)(SCALER_DISPBASE0 +\ +(x) * (SCALER_DISPBASE1 - \ + SCALER_DIS
Re: [PATCH 07/14] drm/nouveau: use drm_crtc_send_vblank_event()
On 04/14/2016 07:48 PM, Gustavo Padovan wrote: From: Gustavo PadovanReplace the legacy drm_send_vblank_event() with the new helper function. Signed-off-by: Gustavo Padovan --- drivers/gpu/drm/nouveau/nouveau_display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 7ce7fa5..973c2d9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -841,10 +841,12 @@ nouveau_finish_page_flip(struct nouveau_channel *chan, s = list_first_entry(>flip, struct nouveau_page_flip_state, head); Hi Gustavo if (s->event) { + struct drm_crtc *crtc = drm_crtc_find(dev, s->crtc); + I don't think this would work. s->crtc is a nouveau internal display pipe index, not a drm mode object id, so i don't think drm_crtc_find() will do what you need. Also it takes a mutex, which might_sleep() and i think nouveau_finish_page_flip gets called from irq context and holds a spin_lock_irqsave, so would end badly. You'd probably have to extend struct nouveau_page_flip_state to carry around a reference to the required drm_crtc, set up in nouveau_crtc_page_flip(). -mario if (drm->device.info.family < NV_DEVICE_INFO_V0_TESLA) { drm_arm_vblank_event(dev, s->crtc, s->event); } else { - drm_send_vblank_event(dev, s->crtc, s->event); + drm_crtc_send_vblank_event(crtc, s->event); /* Give up ownership of vblank for page-flipped crtc */ drm_vblank_put(dev, s->crtc);
Re: [PATCH 07/14] drm/nouveau: use drm_crtc_send_vblank_event()
On 04/14/2016 07:48 PM, Gustavo Padovan wrote: From: Gustavo Padovan Replace the legacy drm_send_vblank_event() with the new helper function. Signed-off-by: Gustavo Padovan --- drivers/gpu/drm/nouveau/nouveau_display.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index 7ce7fa5..973c2d9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -841,10 +841,12 @@ nouveau_finish_page_flip(struct nouveau_channel *chan, s = list_first_entry(>flip, struct nouveau_page_flip_state, head); Hi Gustavo if (s->event) { + struct drm_crtc *crtc = drm_crtc_find(dev, s->crtc); + I don't think this would work. s->crtc is a nouveau internal display pipe index, not a drm mode object id, so i don't think drm_crtc_find() will do what you need. Also it takes a mutex, which might_sleep() and i think nouveau_finish_page_flip gets called from irq context and holds a spin_lock_irqsave, so would end badly. You'd probably have to extend struct nouveau_page_flip_state to carry around a reference to the required drm_crtc, set up in nouveau_crtc_page_flip(). -mario if (drm->device.info.family < NV_DEVICE_INFO_V0_TESLA) { drm_arm_vblank_event(dev, s->crtc, s->event); } else { - drm_send_vblank_event(dev, s->crtc, s->event); + drm_crtc_send_vblank_event(crtc, s->event); /* Give up ownership of vblank for page-flipped crtc */ drm_vblank_put(dev, s->crtc);
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 09:32 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote: On 01/25/2016 07:51 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote: Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 07:51 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote: Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off. Ok, that's
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off. Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. Otherwise kms drivers would have to be careful to never call drm_vblank_off multiple times before calling drm_vblank_on, but the help text to drm_vblank_on() claims that unbalanced calls to these functions are perfectly fine. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off. Ok, that's what the comments there say, although i don't see atm. why that perceived stall would be a big problem. I checked all callers of vblank_disable_and_save(). They are all
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 07:51 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote: Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_update_vblank_count() if vblank irqs get actually disabled, not on no-op calls. I will try that now. It does that on purpose. Otherwise the vblank counter would appear to have stalled while the interrupt was off. Ok, that's
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/25/2016 09:32 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 08:30:14PM +0100, Mario Kleiner wrote: On 01/25/2016 07:51 PM, Daniel Vetter wrote: On Mon, Jan 25, 2016 at 05:38:30PM +0100, Mario Kleiner wrote: Readding Daniel, which somehow got dropped from the cc. On 01/25/2016 03:53 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:44:53PM +0100, Mario Kleiner wrote: On 01/25/2016 02:23 PM, Ville Syrjälä wrote: On Mon, Jan 25, 2016 at 02:16:45PM +0100, Mario Kleiner wrote: On 01/25/2016 05:15 AM, Michel Dänzer wrote: On 23.01.2016 00:18, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. At least one of the jumps is expected, because this is around turning off the CRTC for DPMS off. Don't know yet why there are two jumps back though. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Which is of course the idea of Daniel's patch (which is what I'm getting the above with) or Mario's patch as well, but clearly something's still wrong. It's certainly possible that it's something in the driver, but since calling drm_vblank_pre/post_modeset from the same places seems to work fine (ignoring the regression discussed in this thread)... Do drm_vblank_on/off require something else to handle this correctly? I suspect it is because vblank_disable_and_save calls drm_update_vblank_count() unconditionally, even if vblank irqs are already off. So on a manual display disable -> reenable you get something like At disable: Call to dpms-off --> atombios_crtc_dpms(DPMS_OFF) --> drm_vblank_off -> vblank_disable_and_save -> irqs off, drm_update_vblank_count() computes final count. Then the crtc is shut down and its hw counter resets to zero. At reenable: Modesetting -> drm_crtc_helper_set_mode -> crtc_funcs->prepare(crtc) -> atombios_crtc_prepare() -> atombios_crtc_dpms(DPMS_OFF) -> drm_vblank_off -> vblank_disable_and_save -> A pointless drm_update_vblank_count() while the hw counter is already reset to zero --> Unwanted counter jump. The problem doesn't happen on a pure modeset to a different video resolution/refresh rate, as then we only have one call into atombios_crtc_dpms(DPMS_OFF). I think the fix is to fix vblank_disable_and_save() to only call drm_
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/22/2016 07:29 PM, Mario Kleiner wrote: On 01/22/2016 04:18 PM, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here. -mario I spoke too early. The jump doesn't happen when i change video modes - video resolution / refresh rate etc, despite hw counter reset. But if i just disable and then reenable a display, the software counter jumps. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/22/2016 07:29 PM, Mario Kleiner wrote: On 01/22/2016 04:18 PM, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here. -mario I spoke too early. The jump doesn't happen when i change video modes - video resolution / refresh rate etc, despite hw counter reset. But if i just disable and then reenable a display, the software counter jumps. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/22/2016 04:18 PM, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here. -mario dev->max_vblank_count = 0x, which makes the wraparound code in drm_update_vblank_count a no-op. Maybe you can reproduce it if you artificially set a lower max_vblank_count in the driver. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/22/2016 04:18 PM, Ville Syrjälä wrote: On Fri, Jan 22, 2016 at 12:06:00PM +0900, Michel Dänzer wrote: [ Trimming KDE folks from Cc ] On 21.01.2016 19:09, Daniel Vetter wrote: On Thu, Jan 21, 2016 at 05:36:46PM +0900, Michel Dänzer wrote: On 21.01.2016 16:58, Daniel Vetter wrote: Can you please point me at the vblank on/off jump bug please? AFAIR I originally reported it in response to http://lists.freedesktop.org/archives/dri-devel/2015-August/087841.html , but I can't find that in the archives, so maybe that was just on IRC. See http://lists.freedesktop.org/archives/dri-devel/2016-January/099122.html . Basically, I ran into the bug fixed by your patch because the counter jumped forward on every DPMS off, so it hit the 32-bit boundary after just a few days. Ok, so just uncovered the overflow bug. Not sure what you mean by "just", but to be clear: The drm_vblank_on/off counter jumping bug (similar to the bug this thread is about), which exposed the overflow bug, is still alive and kicking in 4.5. It seems to happen when turning off the CRTC: [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=0, hw=916 hw_last=916 [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x7 p(2199,-45)@ 7304.307354 -> 7304.308006 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] crtc 0: dist from vblank start 3 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=218104694, diff=16776301, hw=1 hw_last=916 Not sure what bug we're talking about here, but here the hw counter clearly jumps backwards. [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 1: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 2: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:radeon_get_vblank_counter_kms] Query failed! stat 3 [drm:drm_update_vblank_count] updating vblank count on crtc 3: current=0, diff=0, hw=0 hw_last=0 [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_calc_vbltimestamp_from_scanoutpos] crtc 0 : v 0x1 p(0,0)@ 7304.317140 -> 7304.317140 [e 0 us, 0 rep] [drm:radeon_get_vblank_counter_kms] Query failed! stat 1 [drm:drm_update_vblank_count] updating vblank count on crtc 0: current=234880995, diff=16777215, hw=0 hw_last=1 Same here. These things just don't happen on i915 because drm_vblank_off() and drm_vblank_on() are always called around the times when the hw counter might get reset. Or at least that's how it should be. Fwiw, testing the HD-57570 single display with my patch that uses drm_vblank_off/on() in the DPMS OFF/ON path of radeon-kms does show hardware counter reset to zero as expected, but no jumps of software vblank counter. So with that vblank_off/on placement it seems to work nicely here. -mario dev->max_vblank_count = 0x, which makes the wraparound code in drm_update_vblank_count a no-op. Maybe you can reproduce it if you artificially set a lower max_vblank_count in the driver. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/21/2016 07:38 AM, Michel Dänzer wrote: On 21.01.2016 14:31, Mario Kleiner wrote: On 01/21/2016 04:43 AM, Michel Dänzer wrote: On 21.01.2016 05:32, Mario Kleiner wrote: So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville? The whole logic has changed and the software counter updates are now driven all the time by the hw counter. BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand... I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, [...] ... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees. Looking at Vlastimil's original post again, I'd say the most likely culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many vblanks were missed"). Yes, i think reverting that one alone would likely fix it by reverting to the old vblank update logic. Once drm_vblank_off is called, drm_vblank_get will no-op and return an error, so clients can't enable vblank irqs during the modeset - pageflip ioctl and waitvblank ioctl would fail while a modeset happens - hopefully userspace handles this correctly everywhere. We've fixed xf86-video-ati for this. I'll hack up a patch for demonstration now. You're a bit late to that party. :) http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html Oops. Just sent out my little (so far untested) creations. Yes, they are essentially the same as Daniel's patches. The only addition is to also fix that other potential small race i describe by slightly moving the xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put imbalance in radeon_pm if vblank_on/off would be used. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/21/2016 07:38 AM, Michel Dänzer wrote: On 21.01.2016 14:31, Mario Kleiner wrote: On 01/21/2016 04:43 AM, Michel Dänzer wrote: On 21.01.2016 05:32, Mario Kleiner wrote: So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville? The whole logic has changed and the software counter updates are now driven all the time by the hw counter. BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand... I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, [...] ... just like drm_vblank_pre/post_modeset. That those were broken is a regression which needs to be fixed anyway. I don't think switching to drm_vblank_on/off is suitable for stable trees. Looking at Vlastimil's original post again, I'd say the most likely culprit is 4dfd6486 ("drm: Use vblank timestamps to guesstimate how many vblanks were missed"). Yes, i think reverting that one alone would likely fix it by reverting to the old vblank update logic. Once drm_vblank_off is called, drm_vblank_get will no-op and return an error, so clients can't enable vblank irqs during the modeset - pageflip ioctl and waitvblank ioctl would fail while a modeset happens - hopefully userspace handles this correctly everywhere. We've fixed xf86-video-ati for this. I'll hack up a patch for demonstration now. You're a bit late to that party. :) http://lists.freedesktop.org/archives/dri-devel/2015-May/083614.html http://lists.freedesktop.org/archives/dri-devel/2015-July/086451.html Oops. Just sent out my little (so far untested) creations. Yes, they are essentially the same as Daniel's patches. The only addition is to also fix that other potential small race i describe by slightly moving the xxx_pm_compute_clocks() calls around. And a fix for drm_vblank_get/put imbalance in radeon_pm if vblank_on/off would be used. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/21/2016 04:43 AM, Michel Dänzer wrote: On 21.01.2016 05:32, Mario Kleiner wrote: So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville? The whole logic has changed and the software counter updates are now driven all the time by the hw counter. BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand... I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, and seem to reinitialize stuff properly and release clients queued vblank events to avoid blocking - not tested so far, just looked at the code. Once drm_vblank_off is called, drm_vblank_get will no-op and return an error, so clients can't enable vblank irqs during the modeset - pageflip ioctl and waitvblank ioctl would fail while a modeset happens - hopefully userspace handles this correctly everywhere. It would also cause radeons power management to not sync its actions to vblank if it would get invoked during a modeset, but that seems to be handled by a 200 msec timeout and hopefully only cause visual glitches - or invisible glitches while the crtc is blanked during modeset? There could be another tiny race with the new "vblank counter bumping" logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts ...") if drm_update_vblank_counter() would be called multiple times in quick succession within the "radeon_crtc->lb_vblank_lead_lines" scanlines before start of real vblank iff at the same time a modeset would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller value due to a change in horizontal mode resolution. That needs a modeset to happen to a higher horizontal resolution just exactly when the scanout is in exactly the right 5 or so scanlines and some client is calling drm_vblank_get() to enable vblank irqs at the same time, but it would cause the same hang if it happened - not that likely to happen often, but still not nice, also Murphy's law... If we could switch to drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove those race as well by forbidding any vblank irq related activity during a modeset. I'll hack up a patch for demonstration now.
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/18/2016 11:49 AM, Vlastimil Babka wrote: On 01/16/2016 05:24 AM, Mario Kleiner wrote: On 01/15/2016 01:26 PM, Ville Syrjälä wrote: On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote: I'm currently running... while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far i can't trigger a hang after hundreds of runs. Does this also hang for you? No, test mode seems to be fine. I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank" should probably give useful info around the time of the hang. Attached. Captured by having kdm running, switching to console, running "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see frozen splashscreen, switch back, terminate dmesg. So somewhere around the middle there should be where ksplashscreen starts... Maybe also check XOrg.0.log for (WW) warnings related to flip. No such warnings there. thanks, -mario Thanks, Vlastimil Thanks. So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. The new code in drm_update_vblank_count() breaks this. The reset of the counter to zero is treated as counter wraparound, so our software vblank counter jumps forward by up to 2^24 counts in response (in case of AMD's 24 bit hw counters), and then the vblank event handling code in drm_handle_vblank_events() and other places detects the counter being more than 2^23 counts ahead of queued vblank events and as part of its own wraparound handling for the 32-Bit software counter doesn't deliver these queued events for a long time -> no vblank swap trigger event -> no swap -> client hangs waiting for swap completion. I think i remember seeing the ksplash progress screen occasionally blanking half way through login, i guess that's when kwin triggers a modeset in parallel to ksplash doing its OpenGL animations. So depending on the hw vblank count at the time of login ksplash would or wouldn't hang, apparently i got "lucky" with my counts at login. -mario
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/21/2016 04:43 AM, Michel Dänzer wrote: On 21.01.2016 05:32, Mario Kleiner wrote: So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. Right, looks like there's been a regression breaking this. I suspect the problem is that vblank->last isn't getting updated from drm_vblank_post_modeset. Not sure which change broke that though, or how to fix it. Ville? The whole logic has changed and the software counter updates are now driven all the time by the hw counter. BTW, I'm seeing a similar issue with drm_vblank_on/off as well, which exposed the bug fixed by 209e4dbc ("drm/vblank: Use u32 consistently for vblank counters"). I've been meaning to track that down since then; one of these days hopefully, but if anybody has any ideas offhand... I spent the last few hours reading through the drm and radeon code and i think what should probably work is to replace the drm_vblank_pre/post_modeset calls in radeon/amdgpu by drm_vblank_off/on calls. These are apparently meant for drivers whose hw counters reset during modeset, and seem to reinitialize stuff properly and release clients queued vblank events to avoid blocking - not tested so far, just looked at the code. Once drm_vblank_off is called, drm_vblank_get will no-op and return an error, so clients can't enable vblank irqs during the modeset - pageflip ioctl and waitvblank ioctl would fail while a modeset happens - hopefully userspace handles this correctly everywhere. It would also cause radeons power management to not sync its actions to vblank if it would get invoked during a modeset, but that seems to be handled by a 200 msec timeout and hopefully only cause visual glitches - or invisible glitches while the crtc is blanked during modeset? There could be another tiny race with the new "vblank counter bumping" logic from commit 5b5561b ("drm/radeon: Fixup hw vblank counters/ts ...") if drm_update_vblank_counter() would be called multiple times in quick succession within the "radeon_crtc->lb_vblank_lead_lines" scanlines before start of real vblank iff at the same time a modeset would happen and set radeon_crtc->lb_vblank_lead_lines to a smaller value due to a change in horizontal mode resolution. That needs a modeset to happen to a higher horizontal resolution just exactly when the scanout is in exactly the right 5 or so scanlines and some client is calling drm_vblank_get() to enable vblank irqs at the same time, but it would cause the same hang if it happened - not that likely to happen often, but still not nice, also Murphy's law... If we could switch to drm_vblank_off/on instead of drm_vblank_pre/post_modeset we could remove those race as well by forbidding any vblank irq related activity during a modeset. I'll hack up a patch for demonstration now.
Re: linux-4.4 bisected: kwin5 stuck on kde5 loading screen with radeon
On 01/18/2016 11:49 AM, Vlastimil Babka wrote: On 01/16/2016 05:24 AM, Mario Kleiner wrote: On 01/15/2016 01:26 PM, Ville Syrjälä wrote: On Fri, Jan 15, 2016 at 11:34:08AM +0100, Vlastimil Babka wrote: I'm currently running... while xinit /usr/bin/ksplashqml --test -- :1 ; do echo yay; done ... in an endless loop on Linux 4.4 SMP PREEMPT on HD-5770 and so far i can't trigger a hang after hundreds of runs. Does this also hang for you? No, test mode seems to be fine. I think a drm.debug=0x21 setting and grep'ping the syslog for "vblank" should probably give useful info around the time of the hang. Attached. Captured by having kdm running, switching to console, running "dmesg -C ; dmesg -w > /tmp/dmesg", switch to kdm, enter password, see frozen splashscreen, switch back, terminate dmesg. So somewhere around the middle there should be where ksplashscreen starts... Maybe also check XOrg.0.log for (WW) warnings related to flip. No such warnings there. thanks, -mario Thanks, Vlastimil Thanks. So the problem is that AMDs hardware frame counters reset to zero during a modeset. The old DRM code dealt with drivers doing that by keeping vblank irqs enabled during modesets and incrementing vblank count by one during each vblank irq, i think that's what drm_vblank_pre_modeset() and drm_vblank_post_modeset() were meant for. The new code in drm_update_vblank_count() breaks this. The reset of the counter to zero is treated as counter wraparound, so our software vblank counter jumps forward by up to 2^24 counts in response (in case of AMD's 24 bit hw counters), and then the vblank event handling code in drm_handle_vblank_events() and other places detects the counter being more than 2^23 counts ahead of queued vblank events and as part of its own wraparound handling for the 32-Bit software counter doesn't deliver these queued events for a long time -> no vblank swap trigger event -> no swap -> client hangs waiting for swap completion. I think i remember seeing the ksplash progress screen occasionally blanking half way through login, i guess that's when kwin triggers a modeset in parallel to ksplash doing its OpenGL animations. So depending on the hw vblank count at the time of login ksplash would or wouldn't hang, apparently i got "lucky" with my counts at login. -mario
Fwd: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table
Forwarded Message Subject: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table Date: Fri, 18 Dec 2015 20:24:06 +0100 From: Mario Kleiner To: x...@kernel.org CC: mi...@redhat.com, h...@zytor.com, mario.kleiner...@gmail.com, sta...@vger.kernel.org Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner Cc: --- arch/x86/kernel/reboot.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 02693dd..f660d63 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -182,6 +182,14 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "iMac9,1"), }, }, + { /* Handle problems with rebooting on the iMac10,1. */ + .callback = set_pci_reboot, + .ident = "Apple iMac10,1", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "iMac10,1"), + }, + }, /* ASRock */ { /* Handle problems with rebooting on ASRock Q1900DC-ITX */ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Fwd: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table
Forwarded Message Subject: [PATCH] x86: Add iMac10,1 to pci_reboot_dmi_table Date: Fri, 18 Dec 2015 20:24:06 +0100 From: Mario Kleiner <mario.kleiner...@gmail.com> To: x...@kernel.org CC: mi...@redhat.com, h...@zytor.com, mario.kleiner...@gmail.com, sta...@vger.kernel.org Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner <mario.kleiner...@gmail.com> Cc: <sta...@vger.kernel.org> --- arch/x86/kernel/reboot.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c index 02693dd..f660d63 100644 --- a/arch/x86/kernel/reboot.c +++ b/arch/x86/kernel/reboot.c @@ -182,6 +182,14 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = { DMI_MATCH(DMI_PRODUCT_NAME, "iMac9,1"), }, }, + { /* Handle problems with rebooting on the iMac10,1. */ + .callback = set_pci_reboot, + .ident = "Apple iMac10,1", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Apple Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "iMac10,1"), + }, + }, /* ASRock */ { /* Handle problems with rebooting on ASRock Q1900DC-ITX */ -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/13/2014 03:50 AM, Michel Dänzer wrote: On 12.08.2014 00:17, Jerome Glisse wrote: On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote: On 08/10/2014 08:02 PM, Mario Kleiner wrote: On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0> and __ttm_dma_free_page <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0> calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This is because historically the pools have been designed to keep only pages with nonstandard caching attributes since changing page caching attributes have been very slow but the kernel page allocators have been reasonably fast. /Thomas Ok. A bit more ftraceing showed my hang problem case goes through the "if (is_cached)" paths, so the pool doesn't recycle anything and i see it bouncing up and down by 4 pages all the time. But for the non-cached case, which i don't hit with my problem, could one of you look at line 954... https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c%23L954=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=e15c51805d429ee6d8960d6b88035e9811a1cdbfbf13168eec2fbb2214b99c60 ... and tell me why that unconditional npages = count; assignment makes sense? It seems to essentially disable all recycling for the dma pool whenever the pool isn't filled up to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is recycled, but when it is already somewhat below capacity, it gets "punished" by not getting refilled? I'd just like to understand the logic behind that line. thanks, -mario I'll happily forward that question to Konrad who wrote the code (or it may even stem from the ordinary page pool code which IIRC has Dave Airlie / Jerome Glisse as authors) This is effectively bogus code, i now wonder how it came to stay alive. Attached patch will fix that. I haven't tested Mario's scenario specifically, but it survived piglit and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so some BOs ended up in GTT instead with write-combined CPU mappings) on radeonsi without any noticeable issues. Tested-by: Michel Dänzer I haven't tested the patch yet. For the original bug it won't help directly, because the super-slow allocations which cause the desktop stall are tt_cached allocations, so they go through the if (is_cached) code path which isn't improved by Jerome's patch. is_cached always releases memory immediately, so the tt_cached pool just bounces up and down between 4 and 7 pages. So this was an independent issue. The slow allocations i noticed were mostly caused by exa allocating new gem bo's, i don't know which path is taken by 3d graphics? However, the fixed ttm path could indirectly solve the DMA_CMA stalls by completely killing CMA for its intended purpose. Typical CMA sizes are probably around < 100 MB (kernel default is 16 MB, Ubuntu config is 64 MB), and the limit for the page pool seems to be more like 50% of all system RAM? Iow. if the ttm dma pool is allowed to grow that big with recycled pages, it probably will almost completely monopolize the whole CMA memory after a short amount of time. ttm won't suffer stalls if it essentially doesn't interact with CMA anymore after a warmup period, but actual clients which really need CMA (ie., hardware without scatter-gather dma etc.) will be starved of what they need as far as my limited understanding of the CMA goes. So fwiw probably the fix to ttm will increase the urgency for the CMA people to come up with a fix/optimization for the allocator. Unless it doesn't matter if most desktop systems have CMA disabled by default, and ttm is mostly used by desktop graphics drivers (nouveau, radeon, vmgfx)? I only stumbled over the problem because the Ubuntu 3.16 mainline testing kernels are compiled with CMA on. -mario -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body o
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/11/2014 05:17 PM, Jerome Glisse wrote: On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote: On 08/10/2014 08:02 PM, Mario Kleiner wrote: On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Ddma_generic_alloc_coherent=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d1852625e2ab2ff07eb34a7f33fc1f55f7f13959912d5a6ce9316d23070ce939> andintel_alloc_coherent <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Dintel_alloc_coherent=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=82d587e9b6aeced5cf9a7caefa91bf47fba809f3522b7379d22e45a2d5d35ebd> this seems easy to do. Looking at the arm arch variants, e.g., https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c%23L1194=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=4c178257eab9b5d7ca650dedba76cf27abeb49ddc7aebb9433f52b6c8bb3bbac and https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c%23L44=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=5f62f4cbe8cee1f1dd4cbba656354efe6867bcdc664cf90e9719e2f42a85de08 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc> flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAIT=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc> needed for those single page allocations that go through__ttm_dma_alloc_page <https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_page=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0A=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0>? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. I don't think that's a good idea. Omitting __GFP_WAIT would cause unnecessary memory allocation errors on systems under stress. I think this should be filed as a DMA subsystem kernel bug / regression and an appropriate solution should be worked out together with
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/11/2014 05:17 PM, Jerome Glisse wrote: On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote: On 08/10/2014 08:02 PM, Mario Kleiner wrote: On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstromthellst...@vmware.com wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Ddma_generic_alloc_coherentk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d1852625e2ab2ff07eb34a7f33fc1f55f7f13959912d5a6ce9316d23070ce939 andintel_alloc_coherent https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3Dintel_alloc_coherentk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=82d587e9b6aeced5cf9a7caefa91bf47fba809f3522b7379d22e45a2d5d35ebd this seems easy to do. Looking at the arm arch variants, e.g., https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c%23L1194k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=4c178257eab9b5d7ca650dedba76cf27abeb49ddc7aebb9433f52b6c8bb3bbac and https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c%23L44k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=5f62f4cbe8cee1f1dd4cbba656354efe6867bcdc664cf90e9719e2f42a85de08 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAITk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__GFP_WAITk=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=d56d076770d3416264be6c9ea2829ac0d6951203696fa3ad04144f13307577bc needed for those single page allocations that go through__ttm_dma_alloc_page https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. I don't think that's a good idea. Omitting __GFP_WAIT would cause unnecessary memory allocation errors on systems under stress. I think this should be filed as a DMA subsystem kernel bug / regression and an appropriate solution should be worked out together
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/13/2014 03:50 AM, Michel Dänzer wrote: On 12.08.2014 00:17, Jerome Glisse wrote: On Mon, Aug 11, 2014 at 12:11:21PM +0200, Thomas Hellstrom wrote: On 08/10/2014 08:02 PM, Mario Kleiner wrote: On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0 and __ttm_dma_free_page https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/ident?i%3D__ttm_dma_alloc_pagek=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=7898522bba274e4dcc332735fbcf0c96e48918f60c2ee8e9a3e9c73ab3487bd0 calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This is because historically the pools have been designed to keep only pages with nonstandard caching attributes since changing page caching attributes have been very slow but the kernel page allocators have been reasonably fast. /Thomas Ok. A bit more ftraceing showed my hang problem case goes through the if (is_cached) paths, so the pool doesn't recycle anything and i see it bouncing up and down by 4 pages all the time. But for the non-cached case, which i don't hit with my problem, could one of you look at line 954... https://urldefense.proofpoint.com/v1/url?u=http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c%23L954k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0Ar=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0Am=QQSN6uVpEiw6RuWLAfK%2FKWBFV5HspJUfDh4Y2mUz%2FH4%3D%0As=e15c51805d429ee6d8960d6b88035e9811a1cdbfbf13168eec2fbb2214b99c60 ... and tell me why that unconditional npages = count; assignment makes sense? It seems to essentially disable all recycling for the dma pool whenever the pool isn't filled up to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is recycled, but when it is already somewhat below capacity, it gets punished by not getting refilled? I'd just like to understand the logic behind that line. thanks, -mario I'll happily forward that question to Konrad who wrote the code (or it may even stem from the ordinary page pool code which IIRC has Dave Airlie / Jerome Glisse as authors) This is effectively bogus code, i now wonder how it came to stay alive. Attached patch will fix that. I haven't tested Mario's scenario specifically, but it survived piglit and the UE4 Effects Cave Demo (for which 1GB of VRAM isn't enough, so some BOs ended up in GTT instead with write-combined CPU mappings) on radeonsi without any noticeable issues. Tested-by: Michel Dänzer michel.daen...@amd.com I haven't tested the patch yet. For the original bug it won't help directly, because the super-slow allocations which cause the desktop stall are tt_cached allocations, so they go through the if (is_cached) code path which isn't improved by Jerome's patch. is_cached always releases memory immediately, so the tt_cached pool just bounces up and down between 4 and 7 pages. So this was an independent issue. The slow allocations i noticed were mostly caused by exa allocating new gem bo's, i don't know which path is taken by 3d graphics? However, the fixed ttm path could indirectly solve the DMA_CMA stalls by completely killing CMA for its intended purpose. Typical CMA sizes are probably around 100 MB (kernel default is 16 MB, Ubuntu config is 64 MB), and the limit for the page pool seems to be more like 50% of all system RAM? Iow. if the ttm dma pool is allowed to grow that big with recycled pages, it probably will almost completely monopolize the whole CMA memory after a short amount of time. ttm won't suffer stalls if it essentially doesn't interact with CMA anymore after a warmup period, but actual clients which really need CMA (ie., hardware without scatter-gather dma etc.) will be starved of what they need as far as my limited understanding of the CMA goes. So fwiw probably the fix to ttm will increase the urgency for the CMA people to come up with a fix/optimization for the allocator. Unless it doesn't matter if most desktop systems have CMA disabled by default, and ttm is mostly used by desktop graphics drivers (nouveau, radeon, vmgfx)? I only stumbled over the problem because the Ubuntu 3.16 mainline testing kernels are compiled with CMA on. -mario -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent <http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent> andintel_alloc_coherent <http://lxr.free-electrons.com/ident?i=intel_alloc_coherent> this seems easy to do. Looking at the arm arch variants, e.g., http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194 and http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT <http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT <http://lxr.free-electrons.com/ident?i=__GFP_WAIT> needed for those single page allocations that go through__ttm_dma_alloc_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. I don't think that's a good idea. Omitting __GFP_WAIT would cause unnecessary memory allocation errors on systems under stress. I think this should be filed as a DMA subsystem kernel bug / regression and an appropriate solution should be worked out together with the DMA subsystem maintainers and then backported. Ok, so it is needed. I'll file a bug report. The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and __ttm_dma_free_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This is because historically the pools have been designed to keep only pages with nonstandard caching attributes since changing page caching attributes have been very slow but the kernel page allocators have been reasonably fast. /Thomas Ok. A bit more ftraceing showed my hang problem case goes through the "if (is_cached)" paths, so the pool doesn't recycle anything and i see it bouncing up and down by 4 pages all the time. But for the non-cached case, which i don't hit with my problem, could one of you look at line 954... http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 ... and tell me why that unconditional npages = count; assignment makes sense? It seems to essentially disable all recycling for the dma pool whenever the pool isn't filled up to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is recycled, but when it is already somewhat below capacity, it gets "punished" by not getting refilled? I'd just like to understand the logic behind that line. thanks, -mario This bit of code fromttm_dma_unpopulate <http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>() (line 954 in 3.16) looks suspicious: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 Alloc'
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
On 08/10/2014 01:03 PM, Thomas Hellstrom wrote: On 08/10/2014 05:11 AM, Mario Kleiner wrote: Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstromthellst...@vmware.com wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent andintel_alloc_coherent http://lxr.free-electrons.com/ident?i=intel_alloc_coherent this seems easy to do. Looking at the arm arch variants, e.g., http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194 and http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT http://lxr.free-electrons.com/ident?i=__GFP_WAIT flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT http://lxr.free-electrons.com/ident?i=__GFP_WAIT needed for those single page allocations that go through__ttm_dma_alloc_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. I don't think that's a good idea. Omitting __GFP_WAIT would cause unnecessary memory allocation errors on systems under stress. I think this should be filed as a DMA subsystem kernel bug / regression and an appropriate solution should be worked out together with the DMA subsystem maintainers and then backported. Ok, so it is needed. I'll file a bug report. The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page and __ttm_dma_free_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This is because historically the pools have been designed to keep only pages with nonstandard caching attributes since changing page caching attributes have been very slow but the kernel page allocators have been reasonably fast. /Thomas Ok. A bit more ftraceing showed my hang problem case goes through the if (is_cached) paths, so the pool doesn't recycle anything and i see it bouncing up and down by 4 pages all the time. But for the non-cached case, which i don't hit with my problem, could one of you look at line 954... http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 ... and tell me why that unconditional npages = count; assignment makes sense? It seems to essentially disable all recycling for the dma pool whenever the pool isn't filled up to/beyond its maximum with free pages? When the pool is filled up, lots of stuff is recycled, but when it is already somewhat below capacity, it gets punished by not getting refilled? I'd just like to understand the logic behind that line. thanks, -mario This bit of code fromttm_dma_unpopulate http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate() (line 954 in 3.16) looks suspicious: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 Alloc's from a tt_cached cached pool ( if (is_cached)...) always get freed
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstrom wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent <http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent> andintel_alloc_coherent <http://lxr.free-electrons.com/ident?i=intel_alloc_coherent> this seems easy to do. Looking at the arm arch variants, e.g., http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194 and http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT <http://lxr.free-electrons.com/ident?i=__GFP_WAIT> flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT <http://lxr.free-electrons.com/ident?i=__GFP_WAIT> needed for those single page allocations that go through__ttm_dma_alloc_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page>? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> and __ttm_dma_free_page <http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page> calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This bit of code fromttm_dma_unpopulate <http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate>() (line 954 in 3.16) looks suspicious: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 Alloc's from a tt_cached cached pool ( if (is_cached)...) always get freed and are not given back to the cached pool. But in the uncached case, there's logic to make sure the pool doesn't grow forever (line 955, checking against _manager->options.max_size), but before that check in line 954 there's an uncoditional assignment of npages = count; which seems to force freeing all pages as well, instead of recycling? Is this some debug code left over, or intentional and just me not understanding what happens there? thanks, -mario /Thomas On 08/08/2014 07:42 PM, Mario Kleiner wrote: Hi all, there is a rather severe performance problem i accidentally found when trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under Ubuntu 14.04 LTS with nouveau as graphics driver. I was lazy and just installed the Ubuntu precompiled mainline kernel. That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels weren't compiled with CMA, so i only observed this on 3.16, but previous kernels would likely be affected too. After a few minutes of regular desktop use like switching workspaces, scrolling text in a terminal window, Firefox with multiple tabs open, Thunderbird etc. (tested with KDE/Kwin, with/without desktop composition), i get chunky desktop updates, then multi-second freezes, after a few minutes the desktop hangs for over a minute on almost any GUI action like switching windows etc.
Re: CONFIG_DMA_CMA causes ttm performance problems/hangs.
Resent this time without HTML formatting which lkml doesn't like. Sorry. On 08/09/2014 03:58 PM, Thomas Hellstrom wrote: On 08/09/2014 03:33 PM, Konrad Rzeszutek Wilk wrote: On August 9, 2014 1:39:39 AM EDT, Thomas Hellstromthellst...@vmware.com wrote: Hi. Hey Thomas! IIRC I don't think the TTM DMA pool allocates coherent pages more than one page at a time, and _if that's true_ it's pretty unnecessary for the dma subsystem to route those allocations to CMA. Maybe Konrad could shed some light over this? It should allocate in batches and keep them in the TTM DMA pool for some time to be reused. The pages that it gets are in 4kb granularity though. Then I feel inclined to say this is a DMA subsystem bug. Single page allocations shouldn't get routed to CMA. /Thomas Yes, seems you're both right. I read through the code a bit more and indeed the TTM DMA pool allocates only one page during each dma_alloc_coherent() call, so it doesn't need CMA memory. The current allocators don't check for single page CMA allocations and therefore try to get it from the CMA area anyway, instead of skipping to the much cheaper fallback. So the callers of dma_alloc_from_contiguous() could need that little optimization of skipping it if only one page is requested. For dma_generic_alloc_coherent http://lxr.free-electrons.com/ident?i=dma_generic_alloc_coherent andintel_alloc_coherent http://lxr.free-electrons.com/ident?i=intel_alloc_coherent this seems easy to do. Looking at the arm arch variants, e.g., http://lxr.free-electrons.com/source/arch/arm/mm/dma-mapping.c#L1194 and http://lxr.free-electrons.com/source/arch/arm64/mm/dma-mapping.c#L44 i'm not sure if it is that easily done, as there aren't any fallbacks for such a case and the code looks to me as if that's at least somewhat intentional. As far as TTM goes, one quick one-line fix to prevent it from using the CMA at least on SWIOTLB, NOMMU and Intel IOMMU (when using the above methods) would be to clear the __GFP_WAIT http://lxr.free-electrons.com/ident?i=__GFP_WAIT flag from the passed gfp_t flags. That would trigger the well working fallback. So, is __GFP_WAIT http://lxr.free-electrons.com/ident?i=__GFP_WAIT needed for those single page allocations that go through__ttm_dma_alloc_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page? It would be nice to have such a simple, non-intrusive one-line patch that we still could get into 3.17 and then backported to older stable kernels to avoid the same desktop hangs there if CMA is enabled. It would be also nice for actual users of CMA to not use up lots of CMA space for gpu's which don't need it. I think DMA_CMA was introduced around 3.12. The other problem is that probably TTM does not reuse pages from the DMA pool. If i trace the __ttm_dma_alloc_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page and __ttm_dma_free_page http://lxr.free-electrons.com/ident?i=__ttm_dma_alloc_page calls for those single page allocs/frees, then over a 20 second interval of tracing and switching tabs in firefox, scrolling things around etc. i find about as many alloc's as i find free's, e.g., 1607 allocs vs. 1648 frees. This bit of code fromttm_dma_unpopulate http://lxr.free-electrons.com/ident?i=ttm_dma_unpopulate() (line 954 in 3.16) looks suspicious: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L954 Alloc's from a tt_cached cached pool ( if (is_cached)...) always get freed and are not given back to the cached pool. But in the uncached case, there's logic to make sure the pool doesn't grow forever (line 955, checking against _manager-options.max_size), but before that check in line 954 there's an uncoditional assignment of npages = count; which seems to force freeing all pages as well, instead of recycling? Is this some debug code left over, or intentional and just me not understanding what happens there? thanks, -mario /Thomas On 08/08/2014 07:42 PM, Mario Kleiner wrote: Hi all, there is a rather severe performance problem i accidentally found when trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under Ubuntu 14.04 LTS with nouveau as graphics driver. I was lazy and just installed the Ubuntu precompiled mainline kernel. That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels weren't compiled with CMA, so i only observed this on 3.16, but previous kernels would likely be affected too. After a few minutes of regular desktop use like switching workspaces, scrolling text in a terminal window, Firefox with multiple tabs open, Thunderbird etc. (tested with KDE/Kwin, with/without desktop composition), i get chunky desktop updates, then multi-second freezes, after a few minutes the desktop hangs for over a minute on almost any GUI action like switching windows etc. -- Unuseable. ftrace'ing shows the culprit
CONFIG_DMA_CMA causes ttm performance problems/hangs.
Hi all, there is a rather severe performance problem i accidentally found when trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under Ubuntu 14.04 LTS with nouveau as graphics driver. I was lazy and just installed the Ubuntu precompiled mainline kernel. That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels weren't compiled with CMA, so i only observed this on 3.16, but previous kernels would likely be affected too. After a few minutes of regular desktop use like switching workspaces, scrolling text in a terminal window, Firefox with multiple tabs open, Thunderbird etc. (tested with KDE/Kwin, with/without desktop composition), i get chunky desktop updates, then multi-second freezes, after a few minutes the desktop hangs for over a minute on almost any GUI action like switching windows etc. --> Unuseable. ftrace'ing shows the culprit being this callchain (typical good/bad example ftrace snippets at the end of this mail): ...ttm dma coherent memory allocations, e.g., from __ttm_dma_alloc_page() ... --> dma_alloc_coherent() --> platform specific hooks ... -> dma_generic_alloc_coherent() [on x86_64] --> dma_alloc_from_contiguous() dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when the machine is booted with kernel boot cmdline parameter "cma=0", so it triggers the fast alloc_pages_node() fallback at least on x86_64. With CMA, this function becomes progressively more slow with every minute of desktop use, e.g., runtimes going up from < 0.3 usecs to hundreds or thousands of microseconds (before it gives up and alloc_pages_node() fallback is used), so this causes the multi-second/minute hangs of the desktop. So it seems ttm memory allocations quickly fragment and/or exhaust the CMA memory area, and dma_alloc_from_contiguous() tries very hard to find a fitting hole big enough to satisfy allocations with a retry loop (see http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339) that takes forever. This is not good, also not for other devices which actually need a non-fragmented CMA for DMA, so what to do? I doubt most current gpus still need physically contiguous dma memory, maybe with exception of some embedded gpus? My naive approach would be to add a new gfp_t flag a la ___GFP_AVOIDCMA, and make callers of dma_alloc_from_contiguous() refrain from doing so if they have some fallback for getting memory. And then add that flag to ttm's ttm_dma_populate() gfp_flags, e.g., around here: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884 However i'm not familiar enough with memory management, so likely greater minds here have much better ideas on how to deal with this? thanks, -mario Typical snippet from an example trace of a badly stalling desktop with CMA (alloc_pages_node() fallback may have been missing in this traces ftrace_filter settings): 1) | ttm_dma_pool_get_pages [ttm]() { 1) | ttm_dma_page_pool_fill_locked [ttm]() { 1) | ttm_dma_pool_alloc_new_pages [ttm]() { 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1873.071 us | dma_alloc_from_contiguous(); 1) ! 1874.292 us | } 1) ! 1875.400 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1868.372 us | dma_alloc_from_contiguous(); 1) ! 1869.586 us | } 1) ! 1870.053 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1871.085 us | dma_alloc_from_contiguous(); 1) ! 1872.240 us | } 1) ! 1872.669 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1888.934 us | dma_alloc_from_contiguous(); 1) ! 1890.179 us | } 1) ! 1890.608 us |} 1) 0.048 us| ttm_set_pages_caching [ttm](); 1) ! 7511.000 us | } 1) ! 7511.306 us |} 1) ! 7511.623 us | } The good case (with cma=0 kernel cmdline, so dma_alloc_from_contiguous() no-ops,) 0) | ttm_dma_pool_get_pages [ttm]() { 0) | ttm_dma_page_pool_fill_locked [ttm]() { 0) | ttm_dma_pool_alloc_new_pages [ttm]() { 0) | __ttm_dma_alloc_page [ttm]() { 0) | dma_generic_alloc_coherent() { 0) 0.171 us| dma_alloc_from_contiguous(); 0) 0.849 us| __alloc_pages_nodemask(); 0) 3.029 us| } 0) 3.882 us|
CONFIG_DMA_CMA causes ttm performance problems/hangs.
Hi all, there is a rather severe performance problem i accidentally found when trying to give Linux 3.16.0 a final test on a x86_64 MacBookPro under Ubuntu 14.04 LTS with nouveau as graphics driver. I was lazy and just installed the Ubuntu precompiled mainline kernel. That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA (contiguous memory allocator) size of 64 MB. Older Ubuntu kernels weren't compiled with CMA, so i only observed this on 3.16, but previous kernels would likely be affected too. After a few minutes of regular desktop use like switching workspaces, scrolling text in a terminal window, Firefox with multiple tabs open, Thunderbird etc. (tested with KDE/Kwin, with/without desktop composition), i get chunky desktop updates, then multi-second freezes, after a few minutes the desktop hangs for over a minute on almost any GUI action like switching windows etc. -- Unuseable. ftrace'ing shows the culprit being this callchain (typical good/bad example ftrace snippets at the end of this mail): ...ttm dma coherent memory allocations, e.g., from __ttm_dma_alloc_page() ... -- dma_alloc_coherent() -- platform specific hooks ... - dma_generic_alloc_coherent() [on x86_64] -- dma_alloc_from_contiguous() dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when the machine is booted with kernel boot cmdline parameter cma=0, so it triggers the fast alloc_pages_node() fallback at least on x86_64. With CMA, this function becomes progressively more slow with every minute of desktop use, e.g., runtimes going up from 0.3 usecs to hundreds or thousands of microseconds (before it gives up and alloc_pages_node() fallback is used), so this causes the multi-second/minute hangs of the desktop. So it seems ttm memory allocations quickly fragment and/or exhaust the CMA memory area, and dma_alloc_from_contiguous() tries very hard to find a fitting hole big enough to satisfy allocations with a retry loop (see http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339) that takes forever. This is not good, also not for other devices which actually need a non-fragmented CMA for DMA, so what to do? I doubt most current gpus still need physically contiguous dma memory, maybe with exception of some embedded gpus? My naive approach would be to add a new gfp_t flag a la ___GFP_AVOIDCMA, and make callers of dma_alloc_from_contiguous() refrain from doing so if they have some fallback for getting memory. And then add that flag to ttm's ttm_dma_populate() gfp_flags, e.g., around here: http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884 However i'm not familiar enough with memory management, so likely greater minds here have much better ideas on how to deal with this? thanks, -mario Typical snippet from an example trace of a badly stalling desktop with CMA (alloc_pages_node() fallback may have been missing in this traces ftrace_filter settings): 1) | ttm_dma_pool_get_pages [ttm]() { 1) | ttm_dma_page_pool_fill_locked [ttm]() { 1) | ttm_dma_pool_alloc_new_pages [ttm]() { 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1873.071 us | dma_alloc_from_contiguous(); 1) ! 1874.292 us | } 1) ! 1875.400 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1868.372 us | dma_alloc_from_contiguous(); 1) ! 1869.586 us | } 1) ! 1870.053 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1871.085 us | dma_alloc_from_contiguous(); 1) ! 1872.240 us | } 1) ! 1872.669 us |} 1) | __ttm_dma_alloc_page [ttm]() { 1) | dma_generic_alloc_coherent() { 1) ! 1888.934 us | dma_alloc_from_contiguous(); 1) ! 1890.179 us | } 1) ! 1890.608 us |} 1) 0.048 us| ttm_set_pages_caching [ttm](); 1) ! 7511.000 us | } 1) ! 7511.306 us |} 1) ! 7511.623 us | } The good case (with cma=0 kernel cmdline, so dma_alloc_from_contiguous() no-ops,) 0) | ttm_dma_pool_get_pages [ttm]() { 0) | ttm_dma_page_pool_fill_locked [ttm]() { 0) | ttm_dma_pool_alloc_new_pages [ttm]() { 0) | __ttm_dma_alloc_page [ttm]() { 0) | dma_generic_alloc_coherent() { 0) 0.171 us| dma_alloc_from_contiguous(); 0) 0.849 us| __alloc_pages_nodemask(); 0) 3.029 us| } 0) 3.882 us|
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 10/11/2013 03:30 PM, Sebastian Andrzej Siewior wrote: On 10/11/2013 02:37 PM, Steven Rostedt wrote: On Fri, 11 Oct 2013 12:18:00 +0200 Sebastian Andrzej Siewior wrote: * Mario Kleiner | 2013-09-26 18:16:47 [+0200]: Good! I will do that. Thanks for clarifying the irq and constraints on raw locks in the other thread. Are there any suggestions for "now"? preempt_disable_nort() like Luis suggesed? The preempt_disable_nort() is rather pointless, because the preempt_disable() was added specifically for -rt. When PREEMPT_RT is not enabled, preemption is disabled there already by the previous calls to spin_lock(). Either way. Then I remove the preempt_enable/disable call. Any objections? Good with me. I'm currently working on a replacement. -mario -- Steve Sebastian -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 10/11/2013 03:30 PM, Sebastian Andrzej Siewior wrote: On 10/11/2013 02:37 PM, Steven Rostedt wrote: On Fri, 11 Oct 2013 12:18:00 +0200 Sebastian Andrzej Siewior bige...@linutronix.de wrote: * Mario Kleiner | 2013-09-26 18:16:47 [+0200]: Good! I will do that. Thanks for clarifying the irq and constraints on raw locks in the other thread. Are there any suggestions for now? preempt_disable_nort() like Luis suggesed? The preempt_disable_nort() is rather pointless, because the preempt_disable() was added specifically for -rt. When PREEMPT_RT is not enabled, preemption is disabled there already by the previous calls to spin_lock(). Either way. Then I remove the preempt_enable/disable call. Any objections? Good with me. I'm currently working on a replacement. -mario -- Steve Sebastian -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 25.09.13 09:49, Ville Syrjälä wrote: On Wed, Sep 25, 2013 at 06:32:10AM +0200, Mario Kleiner wrote: On 23.09.13 10:38, Ville Syrjälä wrote: On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote: On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote: The funny part is, there's a comment there that shows that this was done even for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call functions that use the rt-mutex "sleeping spin locks" and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean "any other lock" that might be claimed "would also need to be raw"? Hopefully not "any other lock" already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(_priv->uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 25.09.13 16:13, Steven Rostedt wrote: On Wed, 25 Sep 2013 06:32:10 +0200 Mario Kleiner wrote: But given the new situation, your proposal is great! If we push the clock readouts into the get_scanoutpos routine, we can make this robust without causing grief for the rt people and without the need for a new separate lock for display regs in intel-kms. E.g., for intel-kms: i915_get_crtc_scanoutpos(..., ktime_t *stime, ktime_t *etime) { ... spin_lock_irqsave(...uncore.lock); preempt_disable(); *stime = ktime_get(); position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)); *etime = ktime_get(); preempt_enable(); spin_unlock_irqrestore(...uncore.lock) ... } With your patchset to reduce the amount of register reads needed in that function, and given that forcewake handling isn't needed for these registers, this should make it robust again and wouldn't need new locks. Unless ktime_get is also a bad thing to do in a preempt disabled section? ktime_get() works fine in preempt_disable sections, although it may add some latencies, but you shouldn't need to worry about it. I like this solution the best too, but if it does go in, I would ask to send us the patch for adding the preempt_disable() and we can add the preempt_disable_rt() to it. Why make mainline have a little more overhead? -- Steve Good! I will do that. Thanks for clarifying the irq and constraints on raw locks in the other thread. -mario ___ Intel-gfx mailing list intel-...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 25.09.13 16:13, Steven Rostedt wrote: On Wed, 25 Sep 2013 06:32:10 +0200 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote: But given the new situation, your proposal is great! If we push the clock readouts into the get_scanoutpos routine, we can make this robust without causing grief for the rt people and without the need for a new separate lock for display regs in intel-kms. E.g., for intel-kms: i915_get_crtc_scanoutpos(..., ktime_t *stime, ktime_t *etime) { ... spin_lock_irqsave(...uncore.lock); preempt_disable(); *stime = ktime_get(); position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)); *etime = ktime_get(); preempt_enable(); spin_unlock_irqrestore(...uncore.lock) ... } With your patchset to reduce the amount of register reads needed in that function, and given that forcewake handling isn't needed for these registers, this should make it robust again and wouldn't need new locks. Unless ktime_get is also a bad thing to do in a preempt disabled section? ktime_get() works fine in preempt_disable sections, although it may add some latencies, but you shouldn't need to worry about it. I like this solution the best too, but if it does go in, I would ask to send us the patch for adding the preempt_disable() and we can add the preempt_disable_rt() to it. Why make mainline have a little more overhead? -- Steve Good! I will do that. Thanks for clarifying the irq and constraints on raw locks in the other thread. -mario ___ Intel-gfx mailing list intel-...@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 25.09.13 09:49, Ville Syrjälä wrote: On Wed, Sep 25, 2013 at 06:32:10AM +0200, Mario Kleiner wrote: On 23.09.13 10:38, Ville Syrjälä wrote: On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote: On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley pe...@hurleysoftware.com wrote: The funny part is, there's a comment there that shows that this was done even for PREEMPT_RT. Unfortunately, the call to get_scanout_position() can call functions that use the rt-mutex sleeping spin locks and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean any other lock that might be claimed would also need to be raw? Hopefully not any other lock already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(dev_priv-uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson ch...@chris-wilson.co.uk Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard kei...@keithp.com Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 23.09.13 10:38, Ville Syrjälä wrote: On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote: On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote: The funny part is, there's a comment there that shows that this was done even for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call functions that use the rt-mutex "sleeping spin locks" and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean "any other lock" that might be claimed "would also need to be raw"? Hopefully not "any other lock" already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(_priv->uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should have mmio_idx_lock spinlock as raw] Ok, so register access must be serialized, at least within a register block
Re: [Intel-gfx] BUG: sleeping function called from invalid context on 3.10.10-rt7
On 23.09.13 10:38, Ville Syrjälä wrote: On Sat, Sep 21, 2013 at 12:07:36AM +0200, Mario Kleiner wrote: On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley pe...@hurleysoftware.com wrote: The funny part is, there's a comment there that shows that this was done even for PREEMPT_RT. Unfortunately, the call to get_scanout_position() can call functions that use the rt-mutex sleeping spin locks and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean any other lock that might be claimed would also need to be raw? Hopefully not any other lock already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(dev_priv-uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson ch...@chris-wilson.co.uk Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard kei...@keithp.com Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should have mmio_idx_lock spinlock as raw] Ok, so register access must be serialized, at least within a register
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley wrote: The funny part is, there's a comment there that shows that this was done even for "PREEMPT_RT". Unfortunately, the call to "get_scanout_position()" can call functions that use the rt-mutex "sleeping spin locks" and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean "any other lock" that might be claimed "would also need to be raw"? Hopefully not "any other lock" already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(_priv->uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) & GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should have mmio_idx_lock spinlock as raw] Ok, so register access must be serialized, at least within a register block, no way around it. Thanks for explaining this. That makes me a bit depressed. Is this also true for older hw generation
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 09/17/2013 10:55 PM, Daniel Vetter wrote: On Tue, Sep 17, 2013 at 9:50 PM, Peter Hurley pe...@hurleysoftware.com wrote: On 09/11/2013 03:31 PM, Peter Hurley wrote: [+cc dri-devel] On 09/11/2013 11:38 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 11:16:43 -0400 Peter Hurley pe...@hurleysoftware.com wrote: The funny part is, there's a comment there that shows that this was done even for PREEMPT_RT. Unfortunately, the call to get_scanout_position() can call functions that use the rt-mutex sleeping spin locks and it breaks there. I guess we need to ask the authors of the mainline patch exactly why that preempt_disable() is needed? The drm core associates a timestamp with each vertical blank frame #. Drm drivers can optionally support a 'high resolution' hw timestamp. The vblank frame #/timestamp tuple is user-space visible. The i915 drm driver supports a hw timestamp via this drm helper function which computes the timestamp from the crtc scan position (based on the pixel clock). For mainline, the preempt_disable/_enable() isn't actually necessary because every call tree that leads here already has preemption disabled. For -RT, the maybe i915 register spinlock (uncore.lock) should be raw? No, it should not. Note, any other lock that can be held when it is held would also need to be raw. By that, you mean any other lock that might be claimed would also need to be raw? Hopefully not any other lock already held? And by taking a quick audit of the code, I see this: spin_lock_irqsave(dev_priv-uncore.lock, irqflags); /* Reset the chip */ /* GEN6_GDRST is not in the gt power well, no need to check * for fifo space for the write or forcewake the chip for * the read */ __raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_FULL); /* Spin waiting for the device to ack the reset request */ ret = wait_for((__raw_i915_read32(dev_priv, GEN6_GDRST) GEN6_GRDOM_FULL) == 0, 500); That spin is unacceptable in RT with preemption and interrupts disabled. Yep. That would be bad. AFAICT the registers read in i915_get_crtc_scanoutpos() aren't included in the force-wake set, so raw reads of the registers would probably be acceptable (thus obviating the need for claiming the uncore.lock). Except that _ALL_ register access is disabled with the uncore.lock during a gpu reset. Not sure if that's meant to include crtc registers or not, or what other synchronization/serialization issues are being handled/hidden by forcing all register accesses to wait during a gpu reset. Hopefully an i915 expert can weigh in here? Daniel, Can you shed some light on whether the i915+ crtc registers (specifically those in i915_get_crtc_scanoutpos() and i915_/gm45_get_vblank_counter()) read as part of the vblank counter/timestamp handling need to be prevented during gpu reset? The depency here in the locking is a recent addition: commit a7cd1b8fea2f341b626b255d9898a5ca5fabbf0a Author: Chris Wilson ch...@chris-wilson.co.uk Date: Fri Jul 19 20:36:51 2013 +0100 drm/i915: Serialize almost all register access It's a (slightly) oversized hammer to work around a hardware issue - we could break it down to register blocks, which can be accessed concurrently, but that tends to be more fragile. But the chip really dies if you access (even just reads) the same block concurrently :( We could try break the spinlock protected section a bit in the reset handler - register access on a hung gpu tends to be ill-defined anyway. The implied wait with preemption and interrupts disabled is causing grief in -RT, but also a 4ms wait inside an irq handler seems like a bad idea. Oops, the magic code in wait_for which is just there to make the imo totally misguided kgdb support work papered over the aweful long wait in atomic context ever since we've added this in commit b6e45f866465f42b53d803b0c574da0fc508a0e9 Author: Keith Packard kei...@keithp.com Date: Fri Jan 6 11:34:04 2012 -0800 drm/i915: Move reset forcewake processing to gen6_do_reset Reverting this change should be enough (code moved obviously a bit). Cheers, Daniel Regards, Peter Hurley What's the real issue here? That the vblank timestamp needs to be an accurate measurement of a realtime event. Sleeping/servicing interrupts while reading the registers necessary to compute the timestamp would be bad too. (edit: which hopefully Mario Kleiner clarified in his reply) My point earlier was three-fold: 1. Don't need the preempt_disable() for mainline: all callers are already holding interrupt-disabling spinlocks. 2. -RT still needs to prevent scheduling there. 3. the problem is i915-specific. [update: the radeon driver should also BUG like the i915 driver but probably should have mmio_idx_lock spinlock as raw] Ok, so register access must be serialized, at least within a register block, no way around it. Thanks for explaining this. That makes me a bit depressed. Is this also true for older hw
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 21:19, Steven Rostedt wrote: On Wed, 11 Sep 2013 21:07:10 +0200 Mario Kleiner wrote: On 11.09.13 20:35, Steven Rostedt wrote: On Wed, 11 Sep 2013 20:29:07 +0200 Mario Kleiner wrote: That said, maybe preempt_disable is no longer the optimal choice there and there's some better way to achieve good protection against interruptions of that bit of code? My knowledge here is a bit rusty, and the intel kms drivers and rt stuff has changed quite a bit. If you set your code to a higher priority than other tasks (and interrupts) than it wont be preempted there. Unless of course it blocks on a lock, but even then, priority inheritance will take place and it still should be rather quick. (unless the holder of the lock is doing that strange polling). -- Steve Right, on a rt kernel. But that creates the problem of not very computer savvy users (psychologists and biologists mostly) somehow having to choose proper priorities for gpu interrupt threads and for the x-server/wayland/..., and not much protection on a non-rt kernel? IIUC, the preempt_disable() is only for -rt, the non-rt case already disables preemption with the spin_locks called before it. Oh, right! should have thought about that. I'm quite sleepy, so my brain is not working very well atm. preempt_disable() a few years ago looked like a good "plug and play" default solution, because the ->get_crtc_scanoutpos() function was supposed to have a very low and bounded execution time. At the time we wrote the patches for intel/radeon/nouveau, that was the case. Typical execution time (= preempt off time) was like 1-4 usecs, even on very low end hardware. Seems that at least intel's kms driver does a lot of things now, which can sleep and spin inside that section? I tried to follow the posted stack trace, but got lost somewhere around the i915_read32 code and power management stuff... Note, the sleeps only happen on -rt, and not in mainline. If one is going to use -rt for real-time work, it requires a bit more knowledge of the system. The problem with RT in general, is that it's hard, and anyone telling you they have a generic RT system that requires no computer savvyness can also be selling you a bridge over the east river. -- Steve ;) - I know the problem, i spend a lot of time telling that to users of my software, although they then generally want some sort of bridges anyway. I'm maintaining one of the most popular open-source toolkits for neuro-science, and in my experience at least the field of neuro-science research has the problem that a lot of people there need good real-time behaviour and a lot of flexibility in their hardware and software setups, but very few have the necessary technical background. Given the limited money they can spend, there's also not much commercial interest or probably viability in providing good technical consulting. The few proprietary hardware solutions i know of are either unaffordable by the majority, or are bridges over the east river, or quite often both. My main motivation for luring my users to Linux and contributing some little bits sometimes is the hope that some problems can be solved in a better way at the system level than piling software workarounds on top of hardware workarounds on top of expensive equipment. But back to the topic, I think a better argument for the preempt_disable() there instead of changing code execution priority is that i wouldn't know how to set a static priority properly either. The timestamping code is also called from drm code (drmWaitVblank ioctl()) and it isn't called from the actual experiment software, where i would at least roughly know what i'm doing, and could adjust priorities dynamically, but from the X-Server, or maybe in the future Wayland, on behalf of the OpenGL client app. For the timestamping to work properly, one only would need a raised priority (higher than most interrupt kernel threads, except the one of the kms driver) for those few lines of timestamping code. I don't think it would be good to run xorg or wayland permanently at a higher priority than most irq threads, given that the display server does not only serve rt apps and is not designed as a realtime application. One only wants a short protection from preemption during timestamping. Sorry, i think i'm rambling here quite a bit and i didn't want to sidetrack the thread, just give some explanation why i think the preempt_disable() is (/was?) justified. -mario -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 17:16, Peter Hurley wrote: On 09/11/2013 09:26 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 07:28:09 -0300 "Luis Claudio R. Goncalves" wrote: Hello, I saw two different occurrences of "BUG: sleeping function called from invalid context" happening on 3.10.10-rt7. The first one happened on drm_vblank_get() -> i915_get_vblank_timestamp() and was flooding dmesg after Xorg started. I silenced that with a hackish patch, just for fun, and found a second problem, this time on tty_ldisc_reinit(). The logs: [ 23.973991] BUG: sleeping function called from invalid context at /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659 [ 23.973992] in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg [ 23.973993] 2 locks held by Xorg/517: [ 23.973993] #0: [ 23.973994] ( [ 23.973994] >vbl_lock [ 23.973995] ){..} [ 23.973995] , at: [ 23.974007] [] drm_vblank_get+0x30/0x2b0 [drm] [ 23.974008] #1: [ 23.974008] ( [ 23.974008] >vblank_time_lock [ 23.974008] ){..} [ 23.974009] , at: [ 23.974013] [] drm_vblank_get+0xb1/0x2b0 [drm] [ 23.974013] Preemption disabled at: [ 23.974021] [] i915_get_vblank_timestamp+0x45/0xa0 [i915] [ 23.974023] CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5 [ 23.974024] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15 02/05/2013 [ 23.974024] 00070008 [ 23.974025] 88017a08bae0 [ 23.974025] 8164b790 [ 23.974025] 88017a08baf8 [ 23.974026] 8107e62f [ 23.974026] 880179840040 [ 23.974026] 88017a08bb18 [ 23.974027] 81651ac4 [ 23.974027] 0002 [ 23.974027] 88017984 [ 23.974028] 88017a08bb48 [ 23.974028] a0084e67 [ 23.974028] Call Trace: [ 23.974033] [] dump_stack+0x19/0x1b [ 23.974035] [] __might_sleep+0xff/0x170 [ 23.974037] [] rt_spin_lock+0x24/0x60 [ 23.974043] [] i915_read32+0x27/0x170 [i915] [ 23.974048] [] i915_pipe_enabled+0x31/0x40 [i915] [ 23.974054] [] i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915] [ 23.974056] [] ? sched_clock_cpu+0xb5/0x100 [ 23.974062] [] drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm] [ 23.974068] [] i915_get_vblank_timestamp+0x45/0xa0 [i915] [ 23.974073] [] drm_get_last_vbltimestamp+0x48/0x70 [drm] [ 23.974078] [] drm_vblank_get+0x185/0x2b0 [drm] [ 23.974079] [] ? sched_clock_cpu+0xb5/0x100 [ 23.974085] [] drm_wait_vblank+0x83/0x5d0 [drm] [ 23.974086] [] ? sched_clock_cpu+0xb5/0x100 [ 23.974088] [] ? __lock_acquire.isra.31+0x273/0xa70 [ 23.974093] [] drm_ioctl+0x552/0x6a0 [drm] [ 23.974096] [] ? avc_has_perm_flags+0x126/0x1c0 [ 23.974098] [] ? avc_has_perm_flags+0x28/0x1c0 [ 23.974099] [] ? put_lock_stats.isra.23+0xe/0x40 [ 23.974101] [] do_vfs_ioctl+0x325/0x5b0 [ 23.974103] [] ? file_has_perm+0x8e/0xa0 [ 23.974105] [] SyS_ioctl+0x81/0xa0 [ 23.974107] [] tracesys+0xdd/0xe2 and [ 25.423675] BUG: sleeping function called from invalid context at /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659 [ 25.423676] in_atomic(): 1, irqs_disabled(): 1, pid: 188, name: plymouthd [ 25.423676] 3 locks held by plymouthd/188: [ 25.423682] #0: (>legacy_mutex){..}, at: [] tty_lock_nested+0x40/0x90 [ 25.423686] #1: (>ldisc_mutex){..}, at: [] tty_ldisc_hangup+0x152/0x300 [ 25.423688] #2: (tty_ldisc_lock){..}, at: [] tty_ldisc_reinit+0x72/0x130 [ 25.423689] Preemption disabled at:[] tty_ldisc_reinit+0x72/0x130 [ 25.423691] CPU: 2 PID: 188 Comm: plymouthd Not tainted 3.10.10-rt7+ #6 [ 25.423692] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15 02/05/2013 [ 25.423694] 005ff000 8801788ada68 8164b790 8801788ada80 [ 25.423695] 8107e62f 8801991ce1a0 8801788adaa0 81651ac4 [ 25.423696] ea0005e26680 8801788adaf8 81130984 [ 25.423696] Call Trace: [ 25.423700] [] dump_stack+0x19/0x1b [ 25.423702] [] __might_sleep+0xff/0x170 [ 25.423704] [] rt_spin_lock+0x24/0x60 [ 25.423707] [] free_hot_cold_page+0xb4/0x3c0 [ 25.423710] [] ?unfreeze_partials.isra.42+0x229/0x2b0 [ 25.423711] [] __free_pages+0x47/0x70 [ 25.423713] [] __free_memcg_kmem_pages+0x22/0x50 [ 25.423714] [] __free_slab+0xe8/0x1e0 [ 25.423716] [] free_delayed+0x34/0x50 [ 25.423717] [] __slab_free+0x273/0x36b [ 25.423719] [] ? get_lock_stats+0x19/0x60 [ 25.423721] [] ? put_lock_stats.isra.23+0xe/0x40 [ 25.423722] [] kfree+0x1c4/0x210 [ 25.423724] [] ? tty_ldisc_reinit+0xa5/0x130 [ 25.423725] [] tty_ldisc_reinit+0xa5/0x130 [ 25.423726] [] tty_ldisc_hangup+0x16f/0x300 [ 25.423728] [] ? get_parent_ip+0xd/0x50 [ 25.423731] [] ? unpin_current_cpu+0x16/0x70 [ 25.423732] [] __tty_hangup+0x346/0x460 [ 25.423733] [] tty_vhangup+0x10/0x20 [ 25.423735] [] pty_close+0x131/0x180 [ 25.423736] [] tty_release+0x11d/0x5f0 [ 25.423737] [] ? get_lock_stats+0x19/0x60 [ 25.423747] [] ?
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 20:35, Steven Rostedt wrote: On Wed, 11 Sep 2013 20:29:07 +0200 Mario Kleiner wrote: That said, maybe preempt_disable is no longer the optimal choice there and there's some better way to achieve good protection against interruptions of that bit of code? My knowledge here is a bit rusty, and the intel kms drivers and rt stuff has changed quite a bit. If you set your code to a higher priority than other tasks (and interrupts) than it wont be preempted there. Unless of course it blocks on a lock, but even then, priority inheritance will take place and it still should be rather quick. (unless the holder of the lock is doing that strange polling). -- Steve Right, on a rt kernel. But that creates the problem of not very computer savvy users (psychologists and biologists mostly) somehow having to choose proper priorities for gpu interrupt threads and for the x-server/wayland/..., and not much protection on a non-rt kernel? preempt_disable() a few years ago looked like a good "plug and play" default solution, because the ->get_crtc_scanoutpos() function was supposed to have a very low and bounded execution time. At the time we wrote the patches for intel/radeon/nouveau, that was the case. Typical execution time (= preempt off time) was like 1-4 usecs, even on very low end hardware. Seems that at least intel's kms driver does a lot of things now, which can sleep and spin inside that section? I tried to follow the posted stack trace, but got lost somewhere around the i915_read32 code and power management stuff... -mario -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 20:35, Steven Rostedt wrote: On Wed, 11 Sep 2013 20:29:07 +0200 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote: That said, maybe preempt_disable is no longer the optimal choice there and there's some better way to achieve good protection against interruptions of that bit of code? My knowledge here is a bit rusty, and the intel kms drivers and rt stuff has changed quite a bit. If you set your code to a higher priority than other tasks (and interrupts) than it wont be preempted there. Unless of course it blocks on a lock, but even then, priority inheritance will take place and it still should be rather quick. (unless the holder of the lock is doing that strange polling). -- Steve Right, on a rt kernel. But that creates the problem of not very computer savvy users (psychologists and biologists mostly) somehow having to choose proper priorities for gpu interrupt threads and for the x-server/wayland/..., and not much protection on a non-rt kernel? preempt_disable() a few years ago looked like a good plug and play default solution, because the -get_crtc_scanoutpos() function was supposed to have a very low and bounded execution time. At the time we wrote the patches for intel/radeon/nouveau, that was the case. Typical execution time (= preempt off time) was like 1-4 usecs, even on very low end hardware. Seems that at least intel's kms driver does a lot of things now, which can sleep and spin inside that section? I tried to follow the posted stack trace, but got lost somewhere around the i915_read32 code and power management stuff... -mario -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 17:16, Peter Hurley wrote: On 09/11/2013 09:26 AM, Steven Rostedt wrote: On Wed, 11 Sep 2013 07:28:09 -0300 Luis Claudio R. Goncalves lclau...@uudg.org wrote: Hello, I saw two different occurrences of BUG: sleeping function called from invalid context happening on 3.10.10-rt7. The first one happened on drm_vblank_get() - i915_get_vblank_timestamp() and was flooding dmesg after Xorg started. I silenced that with a hackish patch, just for fun, and found a second problem, this time on tty_ldisc_reinit(). The logs: [ 23.973991] BUG: sleeping function called from invalid context at /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659 [ 23.973992] in_atomic(): 1, irqs_disabled(): 0, pid: 517, name: Xorg [ 23.973993] 2 locks held by Xorg/517: [ 23.973993] #0: [ 23.973994] ( [ 23.973994] dev-vbl_lock [ 23.973995] ){..} [ 23.973995] , at: [ 23.974007] [a0024c60] drm_vblank_get+0x30/0x2b0 [drm] [ 23.974008] #1: [ 23.974008] ( [ 23.974008] dev-vblank_time_lock [ 23.974008] ){..} [ 23.974009] , at: [ 23.974013] [a0024ce1] drm_vblank_get+0xb1/0x2b0 [drm] [ 23.974013] Preemption disabled at: [ 23.974021] [a008bc95] i915_get_vblank_timestamp+0x45/0xa0 [i915] [ 23.974023] CPU: 3 PID: 517 Comm: Xorg Not tainted 3.10.10-rt7+ #5 [ 23.974024] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15 02/05/2013 [ 23.974024] 00070008 [ 23.974025] 88017a08bae0 [ 23.974025] 8164b790 [ 23.974025] 88017a08baf8 [ 23.974026] 8107e62f [ 23.974026] 880179840040 [ 23.974026] 88017a08bb18 [ 23.974027] 81651ac4 [ 23.974027] 0002 [ 23.974027] 88017984 [ 23.974028] 88017a08bb48 [ 23.974028] a0084e67 [ 23.974028] Call Trace: [ 23.974033] [8164b790] dump_stack+0x19/0x1b [ 23.974035] [8107e62f] __might_sleep+0xff/0x170 [ 23.974037] [81651ac4] rt_spin_lock+0x24/0x60 [ 23.974043] [a0084e67] i915_read32+0x27/0x170 [i915] [ 23.974048] [a008a591] i915_pipe_enabled+0x31/0x40 [i915] [ 23.974054] [a008a6be] i915_get_crtc_scanoutpos+0x3e/0x1b0 [i915] [ 23.974056] [81087025] ? sched_clock_cpu+0xb5/0x100 [ 23.974062] [a00245d4] drm_calc_vbltimestamp_from_scanoutpos+0xf4/0x430 [drm] [ 23.974068] [a008bc95] i915_get_vblank_timestamp+0x45/0xa0 [i915] [ 23.974073] [a0024998] drm_get_last_vbltimestamp+0x48/0x70 [drm] [ 23.974078] [a0024db5] drm_vblank_get+0x185/0x2b0 [drm] [ 23.974079] [81087025] ? sched_clock_cpu+0xb5/0x100 [ 23.974085] [a0025d03] drm_wait_vblank+0x83/0x5d0 [drm] [ 23.974086] [81087025] ? sched_clock_cpu+0xb5/0x100 [ 23.974088] [810af143] ? __lock_acquire.isra.31+0x273/0xa70 [ 23.974093] [a00212a2] drm_ioctl+0x552/0x6a0 [drm] [ 23.974096] [8128d886] ? avc_has_perm_flags+0x126/0x1c0 [ 23.974098] [8128d788] ? avc_has_perm_flags+0x28/0x1c0 [ 23.974099] [810ad49e] ? put_lock_stats.isra.23+0xe/0x40 [ 23.974101] [811a0095] do_vfs_ioctl+0x325/0x5b0 [ 23.974103] [8128fc3e] ? file_has_perm+0x8e/0xa0 [ 23.974105] [811a03a1] SyS_ioctl+0x81/0xa0 [ 23.974107] [8165a342] tracesys+0xdd/0xe2 and [ 25.423675] BUG: sleeping function called from invalid context at /home/lclaudio/SANDBOX/mrg-rt-v2-devel/kernel/rtmutex.c:659 [ 25.423676] in_atomic(): 1, irqs_disabled(): 1, pid: 188, name: plymouthd [ 25.423676] 3 locks held by plymouthd/188: [ 25.423682] #0: (tty-legacy_mutex){..}, at: [816528d0] tty_lock_nested+0x40/0x90 [ 25.423686] #1: (tty-ldisc_mutex){..}, at: [8139b482] tty_ldisc_hangup+0x152/0x300 [ 25.423688] #2: (tty_ldisc_lock){..}, at: [8139a9c2] tty_ldisc_reinit+0x72/0x130 [ 25.423689] Preemption disabled at:[8139a9c2] tty_ldisc_reinit+0x72/0x130 [ 25.423691] CPU: 2 PID: 188 Comm: plymouthd Not tainted 3.10.10-rt7+ #6 [ 25.423692] Hardware name: Hewlett-Packard p7-1512/2ADA, BIOS 8.15 02/05/2013 [ 25.423694] 005ff000 8801788ada68 8164b790 8801788ada80 [ 25.423695] 8107e62f 8801991ce1a0 8801788adaa0 81651ac4 [ 25.423696] ea0005e26680 8801788adaf8 81130984 [ 25.423696] Call Trace: [ 25.423700] [8164b790] dump_stack+0x19/0x1b [ 25.423702] [8107e62f] __might_sleep+0xff/0x170 [ 25.423704] [81651ac4] rt_spin_lock+0x24/0x60 [ 25.423707] [81130984] free_hot_cold_page+0xb4/0x3c0 [ 25.423710] [81178209] ?unfreeze_partials.isra.42+0x229/0x2b0 [ 25.423711] [81130dc7] __free_pages+0x47/0x70 [ 25.423713] [81130fb2] __free_memcg_kmem_pages+0x22/0x50 [ 25.423714] [81177528] __free_slab+0xe8/0x1e0 [ 25.423716] [81177654] free_delayed+0x34/0x50 [
Re: BUG: sleeping function called from invalid context on 3.10.10-rt7
On 11.09.13 21:19, Steven Rostedt wrote: On Wed, 11 Sep 2013 21:07:10 +0200 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote: On 11.09.13 20:35, Steven Rostedt wrote: On Wed, 11 Sep 2013 20:29:07 +0200 Mario Kleiner mario.klei...@tuebingen.mpg.de wrote: That said, maybe preempt_disable is no longer the optimal choice there and there's some better way to achieve good protection against interruptions of that bit of code? My knowledge here is a bit rusty, and the intel kms drivers and rt stuff has changed quite a bit. If you set your code to a higher priority than other tasks (and interrupts) than it wont be preempted there. Unless of course it blocks on a lock, but even then, priority inheritance will take place and it still should be rather quick. (unless the holder of the lock is doing that strange polling). -- Steve Right, on a rt kernel. But that creates the problem of not very computer savvy users (psychologists and biologists mostly) somehow having to choose proper priorities for gpu interrupt threads and for the x-server/wayland/..., and not much protection on a non-rt kernel? IIUC, the preempt_disable() is only for -rt, the non-rt case already disables preemption with the spin_locks called before it. Oh, right! should have thought about that. I'm quite sleepy, so my brain is not working very well atm. preempt_disable() a few years ago looked like a good plug and play default solution, because the -get_crtc_scanoutpos() function was supposed to have a very low and bounded execution time. At the time we wrote the patches for intel/radeon/nouveau, that was the case. Typical execution time (= preempt off time) was like 1-4 usecs, even on very low end hardware. Seems that at least intel's kms driver does a lot of things now, which can sleep and spin inside that section? I tried to follow the posted stack trace, but got lost somewhere around the i915_read32 code and power management stuff... Note, the sleeps only happen on -rt, and not in mainline. If one is going to use -rt for real-time work, it requires a bit more knowledge of the system. The problem with RT in general, is that it's hard, and anyone telling you they have a generic RT system that requires no computer savvyness can also be selling you a bridge over the east river. -- Steve ;) - I know the problem, i spend a lot of time telling that to users of my software, although they then generally want some sort of bridges anyway. I'm maintaining one of the most popular open-source toolkits for neuro-science, and in my experience at least the field of neuro-science research has the problem that a lot of people there need good real-time behaviour and a lot of flexibility in their hardware and software setups, but very few have the necessary technical background. Given the limited money they can spend, there's also not much commercial interest or probably viability in providing good technical consulting. The few proprietary hardware solutions i know of are either unaffordable by the majority, or are bridges over the east river, or quite often both. My main motivation for luring my users to Linux and contributing some little bits sometimes is the hope that some problems can be solved in a better way at the system level than piling software workarounds on top of hardware workarounds on top of expensive equipment. But back to the topic, I think a better argument for the preempt_disable() there instead of changing code execution priority is that i wouldn't know how to set a static priority properly either. The timestamping code is also called from drm code (drmWaitVblank ioctl()) and it isn't called from the actual experiment software, where i would at least roughly know what i'm doing, and could adjust priorities dynamically, but from the X-Server, or maybe in the future Wayland, on behalf of the OpenGL client app. For the timestamping to work properly, one only would need a raised priority (higher than most interrupt kernel threads, except the one of the kms driver) for those few lines of timestamping code. I don't think it would be good to run xorg or wayland permanently at a higher priority than most irq threads, given that the display server does not only serve rt apps and is not designed as a realtime application. One only wants a short protection from preemption during timestamping. Sorry, i think i'm rambling here quite a bit and i didn't want to sidetrack the thread, just give some explanation why i think the preempt_disable() is (/was?) justified. -mario -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/