This is a note to let you know that I've just added the patch titled
drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus
to the 3.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
drm-i915-fix-spurious-eio-sigbus-on-wedged-gpus.patch
and it can be found in the queue-3.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <[email protected]> know about it.
>From 7abb690a0e095717420ba78dcab4309abbbec78a Mon Sep 17 00:00:00 2001
From: Daniel Vetter <[email protected]>
Date: Fri, 24 May 2013 21:29:32 +0200
Subject: drm/i915: Fix spurious -EIO/SIGBUS on wedged gpus
From: Daniel Vetter <[email protected]>
commit 7abb690a0e095717420ba78dcab4309abbbec78a upstream.
Chris Wilson noticed that since
commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9]
Author: Daniel Vetter <[email protected]>
Date: Thu Nov 15 17:17:22 2012 +0100
drm/i915: clear up wedged transitions
X can again get -EIO when it does not expect it. And even worse score
a SIGBUS when accessing gtt mmaps. The established ABI is that we
_only_ return an -EIO from execbuf - all other ioctls should just
work. And since the reset code moves all bos out of gpu domains and
clears out all the last_seqno/ring tracking there really shouldn't be
any reason for non-execbuf code to ever touch the hw and see an -EIO.
After some extensive discussions we've noticed that these spurios -EIO
are caused by i915_gem_wait_for_error:
http://www.mail-archive.com/[email protected]/msg20540.html
That is easy to fix by returning 0 instead of -EIO, since grabbing the
dev->struct_mutex does not yet mean that we actually want to touch the
hw. And so there is no reason at all to fail with -EIO.
But that's not the entire since, since often (at least it's easily
googleable) dmesg indicates that the reset fails and we declare the
gpu wedged. Then, quite a bit later X wakes up with the "Timed out
waiting for the gpu reset to complete" DRM_ERROR message in
wait_for_errror and brings down the desktop with an -EIO/SIGBUS.
So clearly we're missing a wakeup somewhere, since the gpu reset just
doesn't take 10 seconds to complete. And indeed we're do handle the
terminally wedged state wrong.
Fix this all up.
References: https://bugs.freedesktop.org/show_bug.cgi?id=63921
References: https://bugs.freedesktop.org/show_bug.cgi?id=64073
Cc: Chris Wilson <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Damien Lespiau <[email protected]>
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -91,14 +91,11 @@ i915_gem_wait_for_error(struct i915_gpu_
{
int ret;
-#define EXIT_COND (!i915_reset_in_progress(error))
+#define EXIT_COND (!i915_reset_in_progress(error) || \
+ i915_terminally_wedged(error))
if (EXIT_COND)
return 0;
- /* GPU is already declared terminally dead, give up. */
- if (i915_terminally_wedged(error))
- return -EIO;
-
/*
* Only wait 10 seconds for the gpu reset to complete to avoid hanging
* userspace. If it takes that long something really bad is going on and
Patches currently in stable-queue which might be from [email protected] are
queue-3.9/drm-i915-sdvo-use-intel_sdvo-ddc-instead-of-intel_sdvo-i2c-for-ddc.patch
queue-3.9/drm-i915-fix-spurious-eio-sigbus-on-wedged-gpus.patch
queue-3.9/drm-i915-no-lvds-quirk-for-hp-t5740.patch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html