On Wed, Mar 03, 2010 at 05:52:56PM +0100, Geir Ove Myhr wrote:
> >> With the patch from Chris Wilson, it should be sufficient to capture
> >> only the file i915_error_state, but I guess we have to get the timing
> >> right. The udev rule is only triggered when the kernel notices that
> >> the GPU is hung, right? At that time the GPU is reset and this is
> >> probably also the time that i915_error_state shows up. So I'm
> >> wondering if we currently end up with recording a GPU dump of a
> >> reinitialized GPU, which is not very useful. Maybe this would have
> >> been obvious to me if I knew how to read the output of
> >> intel_gpu_dump...
>
> I have read up a bit on intel_gpu_dump. Apparently, there was some
> rationale for doing it the way it's currently done. I found this in
> xserver-xorg-video-intel_2:2.9.0-1ubuntu2_2:2.9.0-1ubuntu4.diff.gz:
>
> --- xserver-xorg-video-intel-2.9.0.orig/debian/xserver-xorg-video-intel.udev
> +++ xserver-xorg-video-intel-2.9.0/debian/xserver-xorg-video-intel.udev
> @@ -0,0 +1,10 @@
> +# do not edit this file, it will be overwritten on update
> +
> +# Jesse Barnes on [email protected]:
> +# You'll get three events, one when the error is detected, one before the
> +# reset and one after. Each has a different environment variable set; the
> +# initial error has ERROR=1, the pre-reset event has RESET=1 and the
> +# post-reset event has ERROR=0.
> +
> +
> +DRIVER=="i915", ACTION=="change", ENV{ERROR}==1,
> PROGRAM="/usr/share/apport/apport-gpu-error-intel.py"
>
> So the event is indeed triggered before the reset happens. At that
> point intel_gpu_dump should give a useful dump and i915_error_state
> will contain nothing useful yet. At some point, at least when the
> capture-error-state patch is in the Ubuntu kernel, we should trigger
> at ERROR=0 and capture i915_error_state (which can be decoded with
> intel_error_decode from intel-gpu-tools in newest git).

Hrm, that sounds rather involved. I'm not sure how we could arrange to
have apport file a bug with info from two separate invocations. Can you
think of any way to collect it in one call?

> > Jesse, here are a few examples of the dumps we're collecting now. Mind
> > double-checking that these are actually useful dumps?
>
> I'm not an expert, but it looks like they have some potentially useful
> information.
>
> > https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/529702
> This one is maybe not so useful. The ringbuffer isn't shown.
>
> > https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/529410
> The ringbuffer is all zero (MI_NOOP), but PGTBL_ER: 0x00000010
> indicates that the hardware has detected an error. According to the
> i965 PRM [1] it is "Invalid GTT Entry during Display A Fetch".
>
> > https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/528795
> PGTBL_ER: 0x00000029
> This one also has something in the Page Table Error register. Here,
> bits 5, 3 and 0 are set. On i965 5 and 3 are reserved, but 0 means
> "Invalid GTT Entry during Fetch on behalf of the Host".

Great! I'm impressed you know how to decipher these - would you mind
writing a paragraph or so on the freeze troubleshooting page in the wiki
about how to get this info? (And I wonder if there's some way we could
automatically suss out the error in the apport script itself...?)

> It will be interesting once we get the first bug reports from
> xserver-xorg-video-intel 2.9.1-1ubuntu8, since that also should have
> hardware information attached :-)

Definitely.

> [1]: http://intellinuxgraphics.org/VOL_1_graphics_core.pdf

-- 
Ubuntu-x mailing list
[email protected]
Modify settings or unsubscribe at:
https://lists.ubuntu.com/mailman/listinfo/ubuntu-x
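
On the idea of decoding PGTBL_ER automatically in the apport script:
here is a rough sketch of what that might look like. It assumes the
dump text contains a line of the form "PGTBL_ER: 0x..." as in the
examples above, and it only knows the two i965 bit meanings quoted in
this thread; every other bit is reported as reserved or not listed
rather than guessed. The function and table names are made up for
illustration and are not part of apport or intel-gpu-tools.

```python
import re

# Only the i965 bit meanings quoted in this thread; a real table would
# have to be filled in from the PRM for each chipset generation.
PGTBL_ER_BITS_I965 = {
    0: "Invalid GTT Entry during Fetch on behalf of the Host",
    4: "Invalid GTT Entry during Display A Fetch",
}

UNKNOWN_BIT = "reserved or not listed for i965"


def find_pgtbl_er(dump_text):
    """Return the PGTBL_ER value found in dump_text, or None if absent."""
    match = re.search(r"PGTBL_ER:\s*0x([0-9a-fA-F]+)", dump_text)
    return int(match.group(1), 16) if match else None


def decode_pgtbl_er(value):
    """Return (bit, description) pairs for every bit set in value."""
    findings = []
    for bit in range(32):
        if value & (1 << bit):
            findings.append((bit, PGTBL_ER_BITS_I965.get(bit, UNKNOWN_BIT)))
    return findings


if __name__ == "__main__":
    value = find_pgtbl_er("PGTBL_ER: 0x00000029")
    for bit, meaning in decode_pgtbl_er(value):
        print("bit %d: %s" % (bit, meaning))
```

For 0x00000029 this lists bits 0, 3 and 5, which matches the manual
reading above; the script could attach that list to the bug report
alongside the raw dump.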
