Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-08-07 Thread Daniel Vetter
On Thu, Aug 07, 2014 at 02:47:21PM +0200, Jiri Kosina wrote:
> On Fri, 11 Jul 2014, Pavel Machek wrote:
> 
> > > > > Ok, so I have set up machines for ktest / autobisect, and found out
> > > > > that 3.16-rc1 no longer has that problem. Oh well, bisect would not be
> > > > > fun, anyway...
> > > > 
> > > > I am still seeing the problem with 3.16-rc2.
> > > 
> > > I'm confused now. Is the bisect result
> > > 
> > > commit 773875bfb6737982903c42d1ee88cf60af80089c
> > > Author: Daniel Vetter 
> > > Date:   Mon Jan 27 10:00:30 2014 +0100
> > > 
> > > drm/i915: Don't set the 8to6 dither flag when not scaling
> > > 
> > > now the culprit or not? Or do we have 2 different bugs at hand here?
> > 
> > Three different issues, it seems. Two ring initialization problems,
> > one went away in 3.16 (for me), second did not (suspend for jikos),
> > third -- trivial issue with 8to6 dither.
> 
> The patch below seems to finally cure the problem at my system; I've just 
> attached it to freedesktop bugzilla, but sending it to this thread as well 
> to hopefully get as much testing coverage by affected people as possible.
> 
> I am going on with testing whether it really completely fixes the problem 
> or just made it less likely.

Picked up for -fixes, thanks for the patch.
-Daniel
> 
> 
> 
> 
> 
> From: Jiri Kosina 
> Subject: [PATCH] drm/i915: read HEAD register back in init_ring_common() to enforce ordering
> 
> Without this, ring initialization fails reliably during resume with
> 
>   [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ff8804 tail  start 000e4000
> 
> Signed-off-by: Jiri Kosina 
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279488a..7add7ee 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -517,6 +517,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	else
>  		ring_setup_phys_status_page(ring);
>  
> +	/* Enforce ordering by reading HEAD register back */
> +	I915_READ_HEAD(ring);
> +
>  	/* Initialize the ring. This must happen _after_ we've cleared the ring
>  	 * registers with the above sequence (the readback of the HEAD registers
>  	 * also enforces ordering), otherwise the hw might lose the new ring
> 
> -- 
> Jiri Kosina
> SUSE Labs
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-08-07 Thread Jiri Kosina
On Thu, 7 Aug 2014, Jiri Kosina wrote:

> The patch below seems to finally cure the problem at my system; I've just 
> attached it to freedesktop bugzilla, but sending it to this thread as well 
> to hopefully get as much testing coverage by affected people as possible.
> 
> I am going on with testing whether it really completely fixes the problem 
> or just made it less likely.

Okay, after 31 suspend-resume cycles, the problem appeared again (while 
without the patch, it triggers with 100% reliability). So it's not a 
complete fix, it just makes the problem much less visible.

Going back to bugzilla discussion.

> 
> From: Jiri Kosina 
> Subject: [PATCH] drm/i915: read HEAD register back in init_ring_common() to enforce ordering
> 
> Without this, ring initialization fails reliably during resume with
> 
>   [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ff8804 tail  start 000e4000
> 
> Signed-off-by: Jiri Kosina 
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 279488a..7add7ee 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -517,6 +517,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	else
>  		ring_setup_phys_status_page(ring);
>  
> +	/* Enforce ordering by reading HEAD register back */
> +	I915_READ_HEAD(ring);
> +
>  	/* Initialize the ring. This must happen _after_ we've cleared the ring
>  	 * registers with the above sequence (the readback of the HEAD registers
>  	 * also enforces ordering), otherwise the hw might lose the new ring

-- 
Jiri Kosina
SUSE Labs


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-08-07 Thread Jiri Kosina
On Fri, 11 Jul 2014, Pavel Machek wrote:

> > > > Ok, so I have set up machines for ktest / autobisect, and found out
> > > > that 3.16-rc1 no longer has that problem. Oh well, bisect would not be
> > > > fun, anyway...
> > > 
> > > I am still seeing the problem with 3.16-rc2.
> > 
> > I'm confused now. Is the bisect result
> > 
> > commit 773875bfb6737982903c42d1ee88cf60af80089c
> > Author: Daniel Vetter 
> > Date:   Mon Jan 27 10:00:30 2014 +0100
> > 
> > drm/i915: Don't set the 8to6 dither flag when not scaling
> > 
> > now the culprit or not? Or do we have 2 different bugs at hand here?
> 
> Three different issues, it seems. Two ring initialization problems,
> one went away in 3.16 (for me), second did not (suspend for jikos),
> third -- trivial issue with 8to6 dither.

The patch below seems to finally cure the problem on my system; I've just
attached it to the freedesktop bugzilla, but I am sending it to this thread
as well, to hopefully get as much testing coverage from affected people as
possible.

I will keep testing to see whether it really fixes the problem completely
or just makes it less likely.

From: Jiri Kosina 
Subject: [PATCH] drm/i915: read HEAD register back in init_ring_common() to enforce ordering

Without this, ring initialization fails reliably during resume with

  [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head ff8804 tail  start 000e4000

Signed-off-by: Jiri Kosina 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..7add7ee 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -517,6 +517,9 @@ static int init_ring_common(struct intel_engine_cs *ring)
 	else
 		ring_setup_phys_status_page(ring);
 
+	/* Enforce ordering by reading HEAD register back */
+	I915_READ_HEAD(ring);
+
 	/* Initialize the ring. This must happen _after_ we've cleared the ring
 	 * registers with the above sequence (the readback of the HEAD registers
 	 * also enforces ordering), otherwise the hw might lose the new ring
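
For readers unfamiliar with the idiom: the added I915_READ_HEAD() acts as
a posting read. Reading a register back from the device is a common way to
make sure earlier register writes have reached the hardware before later
ones are issued. What follows is only a user-space sketch of that ordering,
not the driver code. struct fake_ring_regs, reg_write(), reg_read() and
ring_reset_then_program() are hypothetical names, and plain memory stands in
for MMIO, so it shows the sequence of accesses rather than real bus
behaviour.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ring's memory-mapped registers. */
struct fake_ring_regs {
	volatile uint32_t head;
	volatile uint32_t tail;
	volatile uint32_t start;
	volatile uint32_t ctl;
};

/* Minimal accessors; a real driver would go through its MMIO helpers. */
static inline void reg_write(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

static inline uint32_t reg_read(volatile uint32_t *reg)
{
	return *reg;
}

/*
 * Clear the ring registers, read HEAD back, and only then program the
 * new ring. Returns the HEAD value seen by the readback.
 */
uint32_t ring_reset_then_program(struct fake_ring_regs *r,
				 uint32_t start, uint32_t ctl)
{
	uint32_t head;

	/* Step 1: stop the ring and clear its pointers. */
	reg_write(&r->ctl, 0);
	reg_write(&r->head, 0);
	reg_write(&r->tail, 0);

	/* Step 2: the readback sits between the clears and the new setup. */
	head = reg_read(&r->head);

	/* Step 3: program the new ring only after the readback. */
	reg_write(&r->start, start);
	reg_write(&r->ctl, ctl);

	return head;
}
```

In this sketch a non-zero return value would mean the cleared HEAD value is
not what the readback observed, which is the kind of condition the ring
initialization error message reports.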

-- 
Jiri Kosina
SUSE Labs


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-07-11 Thread Jiri Kosina
On Fri, 11 Jul 2014, Pavel Machek wrote:

> > > > Ok, so I have set up machines for ktest / autobisect, and found out
> > > > that 3.16-rc1 no longer has that problem. Oh well, bisect would not be
> > > > fun, anyway...
> > > 
> > > I am still seeing the problem with 3.16-rc2.
> > 
> > I'm confused now. Is the bisect result
> > 
> > commit 773875bfb6737982903c42d1ee88cf60af80089c
> > Author: Daniel Vetter 
> > Date:   Mon Jan 27 10:00:30 2014 +0100
> > 
> > drm/i915: Don't set the 8to6 dither flag when not scaling
> > 
> > now the culprit or not? Or do we have 2 different bugs at hand here?
> 
> Three different issues, it seems. Two ring initialization problems,
> one went away in 3.16 (for me), second did not (suspend for jikos),
> third -- trivial issue with 8to6 dither.

That's a correct assessment.

The ring initialization failure I reported is still there.

-- 
Jiri Kosina
SUSE Labs


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-07-11 Thread Pavel Machek
On Mon 2014-07-07 10:39:08, Daniel Vetter wrote:
> On Fri, Jun 27, 2014 at 03:37:16PM +0200, Jiri Kosina wrote:
> > On Thu, 26 Jun 2014, Pavel Machek wrote:
> > 
> > > Ok, so I have set up machines for ktest / autobisect, and found out that 
> > > 3.16-rc1 no longer has that problem. Oh well, bisect would not be fun, 
> > > anyway...
> > 
> > I am still seeing the problem with 3.16-rc2.
> 
> I'm confused now. Is the bisect result
> 
> commit 773875bfb6737982903c42d1ee88cf60af80089c
> Author: Daniel Vetter 
> Date:   Mon Jan 27 10:00:30 2014 +0100
> 
> drm/i915: Don't set the 8to6 dither flag when not scaling
> 
> now the culprit or not? Or do we have 2 different bugs at hand here?

Three different issues, it seems. Two ring initialization problems,
one went away in 3.16 (for me), second did not (suspend for jikos),
third -- trivial issue with 8to6 dither.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-07-07 Thread Daniel Vetter
On Fri, Jun 27, 2014 at 03:37:16PM +0200, Jiri Kosina wrote:
> On Thu, 26 Jun 2014, Pavel Machek wrote:
> 
> > Ok, so I have set up machines for ktest / autobisect, and found out that 
> > 3.16-rc1 no longer has that problem. Oh well, bisect would not be fun, 
> > anyway...
> 
> I am still seeing the problem with 3.16-rc2.

I'm confused now. Is the bisect result

commit 773875bfb6737982903c42d1ee88cf60af80089c
Author: Daniel Vetter 
Date:   Mon Jan 27 10:00:30 2014 +0100

drm/i915: Don't set the 8to6 dither flag when not scaling

now the culprit or not? Or do we have 2 different bugs at hand here?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-27 Thread Jiri Kosina
On Thu, 26 Jun 2014, Pavel Machek wrote:

> Ok, so I have set up machines for ktest / autobisect, and found out that 
> 3.16-rc1 no longer has that problem. Oh well, bisect would not be fun, 
> anyway...

I am still seeing the problem with 3.16-rc2.

-- 
Jiri Kosina
SUSE Labs


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-25 Thread Pavel Machek
On Mon 2014-06-09 13:03:31, Jiri Kosina wrote:
> On Mon, 9 Jun 2014, Pavel Machek wrote:
> 
> > > > Strange. It seems 3.15 with the patch reverted only boots in 30% or so
> > > > cases... And I've seen resume failure, too, so maybe I was just lucky
> > > > that it worked for a while.
> > > 
> > > git bisect really likes 25f397a429dfa43f22c278d0119a60 - you're about
> > > the 5th report or so that claims this is the culprit but it's
> > > something else. The above code is definitely not used in i915 so bogus
> > > bisect result.
> > 
> > Note I did not do the bisect, I only attempted revert and test.
> > 
> > And did three boots of successful s2ram.. only to find out that it
> > does not really fix s2ram, I was just lucky :-(.
> > 
> > Unfortunately, this means my s2ram problem will be tricky/impossible
> > to bisect :-(.
> 
> Welcome to the situation I have been in for past several months.

Ok, so I have set up machines for ktest / autobisect, and found out that
3.16-rc1 no longer has that problem. Oh well, bisect would not be fun,
anyway...


Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-09 Thread Jiri Kosina
On Mon, 9 Jun 2014, Pavel Machek wrote:

> > > Strange. It seems 3.15 with the patch reverted only boots in 30% or so
> > > cases... And I've seen resume failure, too, so maybe I was just lucky
> > > that it worked for a while.
> > 
> > git bisect really likes 25f397a429dfa43f22c278d0119a60 - you're about
> > the 5th report or so that claims this is the culprit but it's
> > something else. The above code is definitely not used in i915 so bogus
> > bisect result.
> 
> Note I did not do the bisect, I only attempted revert and test.
> 
> And did three boots of successful s2ram.. only to find out that it
> does not really fix s2ram, I was just lucky :-(.
> 
> Unfortunately, this means my s2ram problem will be tricky/impossible
> to bisect :-(.

Welcome to the situation I have been in for the past several months.

-- 
Jiri Kosina
SUSE Labs


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-09 Thread Pavel Machek
On Mon 2014-06-09 11:25:20, Daniel Vetter wrote:
> On Sun, Jun 8, 2014 at 1:11 AM, Pavel Machek  wrote:
> > Strange. It seems 3.15 with the patch reverted only boots in 30% or so
> > cases... And I've seen resume failure, too, so maybe I was just lucky
> > that it worked for a while.
> 
> git bisect really likes 25f397a429dfa43f22c278d0119a60 - you're about
> the 5th report or so that claims this is the culprit but it's
> something else. The above code is definitely not used in i915 so bogus
> bisect result.

Note I did not do the bisect, I only attempted revert and test.

And did three boots of successful s2ram.. only to find out that it
does not really fix s2ram, I was just lucky :-(.

Unfortunately, this means my s2ram problem will be tricky/impossible
to bisect :-(.

Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-09 Thread Daniel Vetter
On Sun, Jun 8, 2014 at 1:11 AM, Pavel Machek  wrote:
> Strange. It seems 3.15 with the patch reverted only boots in 30% or so
> cases... And I've seen resume failure, too, so maybe I was just lucky
> that it worked for a while.

git bisect really likes 25f397a429dfa43f22c278d0119a60 - you're about
the 5th report or so that claims this is the culprit but it's
something else. The above code is definitely not used in i915 so bogus
bisect result.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-07 Thread Pavel Machek
On Sat 2014-06-07 14:06:14, Pavel Machek wrote:
> On Thu 2014-05-15 17:31:54, Daniel Vetter wrote:
> > On Thu, May 15, 2014 at 5:29 PM, Jiri Kosina  wrote:
> > >> > Note that X does work somehow after resume (I can't switch virtual
> > >> > desktops and dialog is stuck on screen, but it is not complete
> > >> > failure). I can do ctrl-alt-f1 and get to useful prompt.
> > >>
> > >> Oops. You were right. It seems it is duplicate after all.
> > >>
> > >> [drm:init_ring_common] *ERROR* render ring initialization failed ctl 
> > >> 0001f001 head 1020 tail  start 3000
> > >
> > > Pavel, thanks a lot for testing.
> > >
> > > Adding Daniel and Chris to CC -- we have another incarnation of the bug
> > > that is being chased in 76554.
> > 
> > Someone succeeding at a bisect would be awesome ... Note that the only
> > key here is the ring init failure in dmesg.
> 
> Oh and... the machine has problems coming up after reboot (never seen
> those before 3.15). Sometimes, boot will hang with blank screen, and hard
> powerdown is needed...

Strange. It seems 3.15 with the patch reverted only boots in 30% or so
of cases... And I've seen resume failure, too, so maybe I was just lucky
that it worked for a while.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-07 Thread Pavel Machek
On Thu 2014-05-15 17:31:54, Daniel Vetter wrote:
> On Thu, May 15, 2014 at 5:29 PM, Jiri Kosina  wrote:
> >> > Note that X does work somehow after resume (I can't switch virtual
> >> > desktops and dialog is stuck on screen, but it is not complete
> >> > failure). I can do ctrl-alt-f1 and get to useful prompt.
> >>
> >> Oops. You were right. It seems it is duplicate after all.
> >>
> >> [drm:init_ring_common] *ERROR* render ring initialization failed ctl 
> >> 0001f001 head 1020 tail  start 3000
> >
> > Pavel, thanks a lot for testing.
> >
> > Adding Daniel and Chris to CC -- we have another incarnation of the bug
> > that is being chased in 76554.
> 
> Someone succeeding at a bisect would be awesome ... Note that the only
> key here is the ring init failure in dmesg.

Oh and... the machine has problems coming up after reboot (never seen
those before 3.15). Sometimes, boot will hang with blank screen, and hard
powerdown is needed...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-06-07 Thread Pavel Machek
On Thu 2014-05-15 17:31:54, Daniel Vetter wrote:
> On Thu, May 15, 2014 at 5:29 PM, Jiri Kosina  wrote:
> >> > Note that X does work somehow after resume (I can't switch virtual
> >> > desktops and dialog is stuck on screen, but it is not complete
> >> > failure). I can do ctrl-alt-f1 and get to useful prompt.
> >>
> >> Oops. You were right. It seems it is duplicate after all.
> >>
> >> [drm:init_ring_common] *ERROR* render ring initialization failed ctl 
> >> 0001f001 head 1020 tail  start 3000
> >
> > Pavel, thanks a lot for testing.
> >
> > Adding Daniel and Chris to CC -- we have another incarnation of the bug
> > that is being chased in 76554.
> 
> Someone succeeding at a bisect would be awesome ... Note that the only
> key here is the ring init failure in dmesg.

Hmm. I was slowly getting ready for doing the bisect, but it seems
someone did it for me.

Date: Wed, 28 May 2014 18:25:21 +0100
From: Ken Moffat 
To: Daniel Vetter 
Cc: Dave Airlie , linux-ker...@vger.kernel.org
Subject: Resume from suspend broken in 3.15. (bisected)
User-Agent: Mutt/1.5.23 (2014-03-12)

I manually reverted 25f397a429dfa43f22c278d0119a60a343aa568f and it
seems I'm back at working suspend/resume in i915.

The 3.15 regression seemed to be "if suspend works once, it seems to
work until reboot"; otherwise it failed in about 70% of cases. I did two
reboots so far with 25f... reverted, and it seems to behave ok.
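
A rough way to judge how much two clean reboots prove: if each boot fails
independently with probability p, the chance of seeing n clean boots purely
by luck is (1 - p)^n. The sketch below assumes independence between boots
and takes p = 0.7 from the estimate above. The function names are made up
for illustration.

```c
#include <assert.h>

/*
 * Probability that n boots in a row look fine even though each boot
 * fails with probability p_fail (boots assumed independent).
 */
double lucky_streak_probability(double p_fail, unsigned int n)
{
	double prob = 1.0;
	unsigned int i;

	assert(p_fail >= 0.0 && p_fail <= 1.0);

	for (i = 0; i < n; i++)
		prob *= 1.0 - p_fail;

	return prob;
}

/*
 * Smallest number of consecutive clean boots for which the chance of
 * getting them by luck alone drops to alpha or below.
 */
unsigned int clean_boots_needed(double p_fail, double alpha)
{
	double prob = 1.0;
	unsigned int n = 0;

	assert(p_fail > 0.0 && p_fail <= 1.0);
	assert(alpha > 0.0 && alpha < 1.0);

	while (prob > alpha) {
		prob *= 1.0 - p_fail;
		n++;
	}

	return n;
}
```

With p = 0.7, lucky_streak_probability(0.7, 2) is about 0.09, so two clean
boots still leave roughly a 9% chance of a false "fixed" verdict, and
clean_boots_needed(0.7, 0.01) gives 4 boots to push that chance below 1%.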

Best regards,  
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Intel-gfx] 3.15-rc: regression in suspend

2014-05-15 Thread Daniel Vetter
On Thu, May 15, 2014 at 5:29 PM, Jiri Kosina  wrote:
>> > Note that X does work somehow after resume (I can't switch virtual
>> > desktops and dialog is stuck on screen, but it is not complete
>> > failure). I can do ctrl-alt-f1 and get to useful prompt.
>>
>> Oops. You were right. It seems it is duplicate after all.
>>
>> [drm:init_ring_common] *ERROR* render ring initialization failed ctl 
>> 0001f001 head 1020 tail  start 3000
>
> Pavel, thanks a lot for testing.
>
> Adding Daniel and Chris to CC -- we have another incarnation of the bug
> that is being chased in 76554.

Someone succeeding at a bisect would be awesome ... Note that the only
key here is the ring init failure in dmesg.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch