Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Mon, 14 Jan 2013 10:49:08 +0100, Nikola Pajkovsky wrote:
> Daniel Vetter writes:
>
>> On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky wrote:
>>> Daniel Vetter writes:
>>>
>>>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky wrote:
>>>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from
>>>>> "Track unbound pages")
>>>>
>>>> Could be a different bug, can you please attach the error_state
>>>> somewhere?
>>>
>>> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
>>> hopefully I will bring some commit.
>>
>> Different bug, on a quick look this could be a dupe of
>> https://bugzilla.kernel.org/show_bug.cgi?id=52311
>
> ok
>
>> Chris should know the details.
>
> thanks, bisection leads me to commit d7d4eed ("drm/i915: Allow
> DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers"). It's not
> possible to simply revert/test commit and I have no idea how i915 works.
>
> Chris any ideas?

Userspace is failing to prepare the GPU to execute a WAIT_FOR_EVENT
command, which it can only try if the kernel allows execution of
privileged batch buffers. Option "SwapbuffersWait" "false" in xorg.conf
will prevent the ddx from issuing the hanging command sequence. It is not
clear yet what the missing ingredient is; I suspect the ddx needs to be
more careful about not setting conditions that can never be met.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
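For anyone wanting to try the workaround Chris describes, here is a minimal
sketch of what the xorg.conf entry could look like, rendered by a small
Python helper. Only the Option line comes from the message above. The
Identifier string, the "intel" Driver name and the function name
render_device_section are assumptions made for illustration, and an
existing Device section on your system may need the option merged in
rather than a new section added.

```python
def render_device_section(identifier="Intel Graphics", driver="intel"):
    """Return an xorg.conf Device section that turns SwapbuffersWait off.

    identifier and driver are illustrative defaults (assumptions), not
    values taken from the thread.
    """
    lines = [
        'Section "Device"',
        f'    Identifier "{identifier}"',
        f'    Driver     "{driver}"',
        # The one line that comes from the thread itself:
        '    Option     "SwapbuffersWait" "false"',
        'EndSection',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the section so it can be reviewed before being copied into
    # xorg.conf by hand.
    print(render_device_section(), end="")
```

The helper only prints text; it does not touch any file, so the result can
be checked against the existing X configuration before anything changes.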
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
Daniel Vetter writes:

> On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky wrote:
>> Daniel Vetter writes:
>>
>>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky wrote:
>>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from
>>>> "Track unbound pages")
>>>
>>> Could be a different bug, can you please attach the error_state
>>> somewhere?
>>
>> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
>> hopefully I will bring some commit.
>
> Different bug, on a quick look this could be a dupe of
> https://bugzilla.kernel.org/show_bug.cgi?id=52311

ok

> Chris should know the details.

thanks, bisection leads me to commit d7d4eed ("drm/i915: Allow
DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers"). It's not
possible to simply revert/test commit and I have no idea how i915 works.

Chris any ideas?

--
Nikola
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Mon, Jan 14, 2013 at 7:58 AM, Nikola Pajkovsky wrote:
> Daniel Vetter writes:
>
>> On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky wrote:
>>> bug still kicking even w/ (drm/i915: Revert shrinker changes from
>>> "Track unbound pages")
>>
>> Could be a different bug, can you please attach the error_state
>> somewhere?
>
> yep, i915_error_state is attached. btw, I'm going to bisect kernel, so
> hopefully I will bring some commit.

Different bug, on a quick look this could be a dupe of
https://bugzilla.kernel.org/show_bug.cgi?id=52311

Chris should know the details.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Fri, Jan 11, 2013 at 6:26 PM, Nikola Pajkovsky wrote:
> bug still kicking even w/ (drm/i915: Revert shrinker changes from
> "Track unbound pages")

Could be a different bug, can you please attach the error_state somewhere?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
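Since the error state tends to be large (it was sent compressed elsewhere
in this thread), a short sketch of saving it as a gzip file may be useful
before attaching it. This is only an illustration: the function name
save_error_state and the default path are assumptions. The kernel message
prints the location as /debug/dri/0/i915_error_state, while debugfs is
usually mounted at /sys/kernel/debug, and reading it normally needs root.

```python
import gzip
import shutil
from pathlib import Path

# Assumption: debugfs is mounted at the usual /sys/kernel/debug location.
DEFAULT_ERROR_STATE = "/sys/kernel/debug/dri/0/i915_error_state"


def save_error_state(dest, source=DEFAULT_ERROR_STATE):
    """Copy the i915 error state into a gzip-compressed file.

    Returns the destination as a Path. Raises OSError if the source
    cannot be read (for example when not running as root).
    """
    dest = Path(dest)
    with open(source, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    return dest


if __name__ == "__main__":
    print(save_error_state("i915_error_state.gz"))
```

The source path is a parameter so the same helper works if debugfs is
mounted somewhere else on a given machine.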
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
Dave Kleikamp writes:

> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>>
>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>>
>> Thinkpad T410
>>
>> Shaggy
>
> Daniel's patch:
>
> drm/i915: Revert shrinker changes from "Track unbound pages"
>
> fixes the problem for me.

bug still kicking even w/ (drm/i915: Revert shrinker changes from "Track
unbound pages")

$ glxgears
[  429.656459] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[  429.656463] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[  429.665762] [drm:kick_ring] *ERROR* Kicking stuck wait on render ring

--
Nikola
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Thu, Jan 10, 2013 at 11:07 AM, Chris Wilson wrote:
> On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH wrote:
>> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
>>> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>>>>
>>>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>>>>
>>>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>>>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>>>>
>>>> Thinkpad T410
>>>>
>>>> Shaggy
>>>
>>> Daniel's patch:
>>>
>>> drm/i915: Revert shrinker changes from "Track unbound pages"
>>>
>>> fixes the problem for me.
>>
>> After an afternoon of multiple kernel builds and other stressful things,
>> it looks like it fixes it for me as well. Chris, this will be going to
>> Linus soon, right?
>
> Daniel will send it on. I hope before he does so, he will clarify the
> changelog to note that it is just papering over the issue. If the
> conjecture is right, it will not prevent that path from triggering the
> hang, nor does it prevent other eviction paths from potentially causing
> the same issue.

In this case, since the issue was papered over in every kernel up until
3.7, I think repapering is the answer for now.

I have a novel idea: maybe someone could spend some time working out what
is broken in private on a test box instead of making everyone who runs 3.7
and 3.8 on ILK deal with it.

I of course know this won't happen and I'll be reverting patches from you
guys that cause Ironlake flakiness forever.

Dave.
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, 9 Jan 2013 16:40:25 -0800, Greg KH wrote:
> On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
>> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>>>
>>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>>>
>>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>>>
>>> Thinkpad T410
>>>
>>> Shaggy
>>
>> Daniel's patch:
>>
>> drm/i915: Revert shrinker changes from "Track unbound pages"
>>
>> fixes the problem for me.
>
> After an afternoon of multiple kernel builds and other stressful things,
> it looks like it fixes it for me as well. Chris, this will be going to
> Linus soon, right?

Daniel will send it on. I hope before he does so, he will clarify the
changelog to note that it is just papering over the issue. If the
conjecture is right, it will not prevent that path from triggering the
hang, nor does it prevent other eviction paths from potentially causing
the same issue.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>>
>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>>
>> Thinkpad T410
>>
>> Shaggy
>
> Daniel's patch:
>
> drm/i915: Revert shrinker changes from "Track unbound pages"
>
> fixes the problem for me.

After an afternoon of multiple kernel builds and other stressful things,
it looks like it fixes it for me as well. Chris, this will be going to
Linus soon, right?

thanks,

greg k-h
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, Jan 09, 2013 at 02:12:04PM -0600, Dave Kleikamp wrote:
> On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>>
>> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>>
>> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
>> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>>
>> Thinkpad T410
>>
>> Shaggy
>
> Daniel's patch:
>
> drm/i915: Revert shrinker changes from "Track unbound pages"
>
> fixes the problem for me.

Thanks for the hint, I'll go try that right now...

greg k-h
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On 01/09/2013 01:44 PM, Dave Kleikamp wrote:
>
> I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.
>
> 00:02.0 VGA compatible controller [0300]: Intel Corporation Core
> Processor Integrated Graphics Controller [8086:0046] (rev 02)
>
> Thinkpad T410
>
> Shaggy

Daniel's patch:

drm/i915: Revert shrinker changes from "Track unbound pages"

fixes the problem for me.

Thanks,
Shaggy
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On 01/09/2013 01:28 AM, Lijo Antony wrote:
> On 01/09/2013 09:31 AM, Dave Airlie wrote:
>> On Wed, Jan 9, 2013 at 2:25 PM, Greg KH wrote:
>>> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>>>>
>>>>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>>>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>>>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
>>>>>
>>>>> I just hit this again. And, as the kernel was asking for it, attached
>>>>> is the i915_error_state file, compressed due to the size of it.
>>>>
>>>> Welcome to the sinkhole that is
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>>>
>>>> 3 months and ticking, Intel guys are all running away from it saying
>>>> they can't reproduce, everyone else on the planet seems to reproduce
>>>> quite easily.
>>>>
>>>> It's generally considered a bug in the relocation/shrinker/no idea
>>>> category.
>>>
>>> Ugh, what a mess.
>>>
>>>> Assuming you have an Ironlake machine, which I'm going to guess you do.
>>>
>>> I don't know, it's an old i5 machine that has never had any video
>>> problems for many years now. How do I tell?
>>
>> lspci -nn probably an 8086:0046 device.
>>
>> Old i5 probably means original i5 which means ironlake.
>
> I have also seen this a couple of times on 3.7 and 3.8-rc1.
> Most of the time I was watching youtube video in chrome. Nothing
> crashed though (I am not running gnome shell). System recovered after a
> few seconds.
>
> I didn't see this on 3.8-rc2 yet, probably because I haven't watched any
> video.

I can easily reproduce it running glxgears on 3.8-rc1 or 3.8-rc2.

00:02.0 VGA compatible controller [0300]: Intel Corporation Core
Processor Integrated Graphics Controller [8086:0046] (rev 02)

Thinkpad T410

Shaggy

>
> -lijo
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On 01/09/2013 09:31 AM, Dave Airlie wrote:
> On Wed, Jan 9, 2013 at 2:25 PM, Greg KH wrote:
>> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>>>> Hi all,
>>>>>
>>>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>>>
>>>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
>>>>
>>>> I just hit this again. And, as the kernel was asking for it, attached
>>>> is the i915_error_state file, compressed due to the size of it.
>>>
>>> Welcome to the sinkhole that is
>>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>>
>>> 3 months and ticking, Intel guys are all running away from it saying
>>> they can't reproduce, everyone else on the planet seems to reproduce
>>> quite easily.
>>>
>>> It's generally considered a bug in the relocation/shrinker/no idea
>>> category.
>>
>> Ugh, what a mess.
>>
>>> Assuming you have an Ironlake machine, which I'm going to guess you do.
>>
>> I don't know, it's an old i5 machine that has never had any video
>> problems for many years now. How do I tell?
>
> lspci -nn probably an 8086:0046 device.
>
> Old i5 probably means original i5 which means ironlake.

I have also seen this a couple of times on 3.7 and 3.8-rc1.
Most of the time I was watching youtube video in chrome. Nothing
crashed though (I am not running gnome shell). System recovered after a
few seconds.

I didn't see this on 3.8-rc2 yet, probably because I haven't watched any
video.

-lijo
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, Jan 9, 2013 at 2:25 PM, Greg KH wrote:
> On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>>> Hi all,
>>>>
>>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>>
>>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
>>>
>>> I just hit this again. And, as the kernel was asking for it, attached
>>> is the i915_error_state file, compressed due to the size of it.
>>
>> Welcome to the sinkhole that is
>> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>>
>> 3 months and ticking, Intel guys are all running away from it saying
>> they can't reproduce, everyone else on the planet seems to reproduce
>> quite easily.
>>
>> It's generally considered a bug in the relocation/shrinker/no idea
>> category.
>
> Ugh, what a mess.
>
>> Assuming you have an Ironlake machine, which I'm going to guess you do.
>
> I don't know, it's an old i5 machine that has never had any video
> problems for many years now. How do I tell?

lspci -nn probably an 8086:0046 device.

Old i5 probably means original i5 which means ironlake.

Dave.
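The check Dave suggests can be scripted. Below is a rough sketch that looks
through `lspci -nn` output for the [8086:0046] ID mentioned above. The
function name find_ironlake and the IRONLAKE_IDS set are made up for this
example, and the set holds only the one ID named in this thread, so a
machine with a different ID is not ruled out by an empty result.

```python
import re

# Matches the trailing [vendor:device] pair that `lspci -nn` prints.
# The class code, e.g. [0300], has no colon and is therefore ignored.
PCI_ID = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)

# Assumption: only the ID quoted in the thread is listed here.
IRONLAKE_IDS = {("8086", "0046")}


def find_ironlake(lspci_output):
    """Return the lspci lines whose PCI ID is in IRONLAKE_IDS."""
    hits = []
    for line in lspci_output.splitlines():
        for vendor, device in PCI_ID.findall(line):
            if (vendor.lower(), device.lower()) in IRONLAKE_IDS:
                hits.append(line.strip())
                break
    return hits


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    ).stdout
    for match in find_ironlake(output):
        print(match)
```

Parsing is kept separate from running lspci so the matching logic can be
tried on pasted output from another machine.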
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
>>> Hi all,
>>>
>>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>>
>>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
>>
>> I just hit this again. And, as the kernel was asking for it, attached
>> is the i915_error_state file, compressed due to the size of it.
>
> Welcome to the sinkhole that is
> https://bugs.freedesktop.org/show_bug.cgi?id=55984
>
> 3 months and ticking, Intel guys are all running away from it saying
> they can't reproduce, everyone else on the planet seems to reproduce
> quite easily.
>
> It's generally considered a bug in the relocation/shrinker/no idea
> category.

Ugh, what a mess.

> Assuming you have an Ironlake machine, which I'm going to guess you do.

I don't know, it's an old i5 machine that has never had any video
problems for many years now. How do I tell?

thanks,

greg k-h
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
>> Hi all,
>>
>> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
>>
>> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
>> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
>> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
>> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
>
> I just hit this again.  And, as the kernel was asking for it, attached
> is the i915_error_state file, compressed due to the size of it.
>
Welcome to the sink hole that is
https://bugs.freedesktop.org/show_bug.cgi?id=55984

3 months and ticking, Intel guys are all running away from it saying
they can't reproduce, everyone else on the planet seems to reproduce
quite easily.

It's generally considered a bug in the relocation/shrinker/no idea category.

Assuming you have an Ironlake machine, which I'm going to guess you do.

Dave.
i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
Hi all,

I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:

[11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
[11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]

When it happens, gnome-shell dies a horrible death and it requires a
reboot in order to get xorg working properly again (probably because
gnome-shell is hosed.)  The machine does still work to do other things
from a text console (I'm writing this on the machine after the last time
this happened.)

It seems to happen when doing a "stressful" thing on the machine (i.e.
multiple kernel builds at the same time).

I also seem to be able to hit this on 3.7.1, but not as regularly, and
not at all on 3.6.y.

Any hints or ideas of what to try out?

thanks,

greg k-h
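For anyone comparing their own logs against this report, here is a rough sketch that pulls the `[drm:...] *ERROR*` lines out of a kernel log and measures the gap between consecutive hangcheck reports. The function names and the regular expression are my own for illustration, written only against the log lines quoted in this thread; nothing here reflects how the driver itself decides that the GPU is "hanging too fast".

```python
import re

# Matches lines like:
# [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
LOG_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[drm:(\w+)\]\s+\*ERROR\*\s+(.*)")


def drm_errors(log_text):
    """Return (timestamp, function, message) for each [drm:...] *ERROR* line."""
    found = []
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            found.append(
                (float(match.group(1)), match.group(2), match.group(3).strip())
            )
    return found


def seconds_between_hangs(errors):
    """Return the gaps, in seconds, between consecutive hangcheck reports."""
    times = [t for t, func, _ in errors if func == "i915_hangcheck_hung"]
    return [round(later - earlier, 6) for earlier, later in zip(times, times[1:])]


# The kernel messages quoted in the report above.
SAMPLE = """\
[11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
[11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
[11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
[11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
"""

errors = drm_errors(SAMPLE)
for timestamp, func, message in errors:
    print(timestamp, func, message)
print(seconds_between_hangs(errors))
```

In the quoted log the two hangcheck reports are roughly two seconds apart, immediately before the reset failure.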
Re: i915_hangcheck_hung problem with 3.8-rc2+ (Linus's latest tree)
On 01/09/2013 09:31 AM, Dave Airlie wrote:
> On Wed, Jan 9, 2013 at 2:25 PM, Greg KH gre...@linuxfoundation.org wrote:
> > On Wed, Jan 09, 2013 at 01:42:39PM +1000, Dave Airlie wrote:
> >> >> Hi all,
> >> >>
> >> >> I've hit this 3 times today on Linus's latest 3.8-rc2+ tree:
> >> >>
> >> >> [11868.414648] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> >> >> [11868.414655] [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
> >> >> [11870.408342] [drm:i915_hangcheck_hung] *ERROR* Hangcheck timer elapsed... GPU hung
> >> >> [11870.408412] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring wedged!
> >> >> [11870.408414] [drm:i915_reset] *ERROR* Failed to reset chip.
> >> >> [11883.083225] gnome-shell[19396]: segfault at 218 ip 7feef5f32333 sp 7c1dc930 error 4 in i965_dri.so[7feef5ecb000+d]
> >> >
> >> > I just hit this again.  And, as the kernel was asking for it, attached
> >> > is the i915_error_state file, compressed due to the size of it.
> >> >
> >> Welcome to the sink hole that is
> >> https://bugs.freedesktop.org/show_bug.cgi?id=55984
> >>
> >> 3 months and ticking, Intel guys are all running away from it saying
> >> they can't reproduce, everyone else on the planet seems to reproduce
> >> quite easily.
> >>
> >> It's generally considered a bug in the relocation/shrinker/no idea category.
> >
> > Ugh, what a mess.
> >
> >> Assuming you have an Ironlake machine, which I'm going to guess you do.
> >
> > I don't know, it's an old i5 machine that has never had any video
> > problems for many years now.  How do I tell?
>
> lspci -nn will probably show an 8086:0046 device.
>
> Old i5 probably means original i5, which means Ironlake.

I have also seen this a couple of times on 3.7 and 3.8-rc1. Most of the
time I was watching a YouTube video in Chrome. Nothing crashed, though
(I am not running gnome-shell). The system recovered after a few seconds.

I haven't seen this on 3.8-rc2 yet, probably because I haven't watched
any video.

-lijo