[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #124 from Alexandre Demers  ---
(In reply to comment #123)
> I just got 1 lockup. The 1st in 10 days.

Do you have exactly the same symptoms as before? On my side, everything is
still fine. The only problem I've encountered is related to X.

I've been running and testing everything with kernel 3.16 which has some other
fixes related to Cayman, so maybe you encountered a different issue than the
one from the current bug.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 



[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #123 from Marc  ---
I just got 1 lockup. The 1st in 10 days.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

Alexandre Demers  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #122 from Alexandre Demers  ---
Tested with kernel 3.16-rc4 (which includes the patch as commit
b0880e87c1fd038b84498944f52e52c3e86ebe59) and it is still working flawlessly.
Closing this bug. I'll reopen it if needed.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-06 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #121 from Alexandre Demers  ---
I just saw the latest RC kernel and the patch was included. Thanks!




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-06 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #120 from Alexandre Demers  ---
Alex, I think we can close this bug as soon as the "drm/radeon/dpm: fix vddci
setup typo on cayman" patch lands in the kernel tree: I have had no problems
over the last couple of days, since I applied the patch. The same goes for
Martin.

Any chance of seeing it included in kernel 3.16 (against which I'm testing the
patch)?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-06 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #119 from Martin Andersson  ---
I have been running 3.15 plus the latest patch for a couple of days now. So
far not a single lockup.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-04 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #118 from Alexandre Demers  ---
Well, everything still seems to be fine. I suspended, resumed, ran some games
(L4D2, FEZ, The Witcher 2, SS3), browsed the web, watched some movies, and it
is still running.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #117 from Marc  ---
Same here, no freeze with this patch for 24h, so it looks like this bug might
be fixed. I'll confirm again at the end of the day.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #116 from Alexandre Demers  ---
Up until now, no problem encountered. The problem fixed by the patch also
points in the same direction that some of my previous tests and suppositions
were heading. I think this might really be our culprit.

Which means we should also be in a position to re-enable Spread Spectrum if
everything continues to run smoothly.

I'll give you some updates tomorrow.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #115 from Alexandre Demers  ---
Testing as of now with some games and videos. Temperature is around 79 Celsius,
power level changing as expected.

If everything works as we want, maybe we'll be able to look at bug 69721 (to
reach full speed capacity for cards not sticking to the reference board).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-01 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #114 from Marc  ---
I am trying the patch on kernel 3.15.0 (I modified line 1318). I'll give
feedback in 24h.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-01 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #113 from Alexandre Demers  ---
Thanks Alex, I saw this patch yesterday and added it to my list of things to
test. I'll test it as soon as possible, but I still need to fix my kernel
build setup. I had to reinstall my whole system a couple of weeks ago. I should
be able to test it in the next couple of days.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-07-01 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #112 from Alex Deucher  ---
Created attachment 102082
  --> https://bugs.freedesktop.org/attachment.cgi?id=102082&action=edit
possible fix

Does this patch fix the issues?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-04-21 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #111 from Alexandre Demers  ---
To update this bug a bit: it is still experienced with kernel 3.15-rc1.

Also, something on the display must change to trigger it (scrolling a window,
clicking a link, opening a new window, watching a video, the display going
into standby). It will not freeze if nothing on the display changes.

I'm convinced that spread spectrum is not linked to the bug. Disabling it just
helps stabilize things, but it is not the root cause of the problem.

By any chance, do you know what was modified in the new ucodes for Bonaire and
above?

It should be noted there seems to be no correlation between the power level and
the occurrence of the bug. What I mean is, as long as DPM is enabled, it can
happen at any time at the lowest power level; the same goes when the power
level is at its highest. While I don't hear the profile change (the fan usually
spins faster or slower when it does), I'm pretty sure it has to do with the
memory controller, and I'm thinking more and more that it comes from the ucode.
Could it be that the memory controller doesn't wait until it has completed its
change to a new state before changing it again? That would explain why it can
happen even when doing light work (a very short rise in power level followed by
a return to the previous state wouldn't be heard, since the fan wouldn't have
time to speed up).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-04-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #110 from Alex Deucher  ---
(In reply to comment #109)
> Can we expect something similar to what was proposed in bug 75992 as a
> possible fix (new ucode)? Just praying for a new proposition to test over
> here...

I already checked.  There's no new mc ucode for cayman.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-04-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #109 from Alexandre Demers  ---
Can we expect something similar to what was proposed in bug 75992 as a possible
fix (new ucode)? Just praying for a new proposition to test over here...




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-03-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #108 from Alexandre Demers  ---
Sorry for not giving any news during the kernel 3.14 cycle so far: I'm
struggling with another bug unrelated to the GPU and I haven't had much time to
finish bisecting it. Once that is done and fixed, I'll give some new input.

On a different topic, I hit a GPU hang/reset situation this week with a 3.13.6
kernel while dpm was enabled, which is... extremely rare. Usually, it either
hangs completely at some point or works for some time without problems. I was
able to take a screenshot. I'll look in the logs to see if I can get something
out of them. However, I was wondering if I should post it here or open a new
bug. Any suggestions?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-03-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #107 from Alex Deucher  ---
(In reply to comment #106)
> Alex Deucher, should we try
> https://bugzilla.kernel.org/attachment.cgi?id=128321 as proposed in other
> bugs?

No, that patch only applies to evergreen and BTC parts.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-03-07 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #106 from Alexandre Demers  ---
Alex Deucher, should we try
https://bugzilla.kernel.org/attachment.cgi?id=128321 as proposed in other bugs?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-02-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #105 from Alexandre Demers  ---
(In reply to comment #104)
> (In reply to comment #103)
> > I don't remember if we've tried this recently, but does disabling power
> > containment help?
> > 
> > diff --git a/drivers/gpu/drm/radeon/ni_dpm.c
> > b/drivers/gpu/drm/radeon/ni_dpm.c
> > index 22c3391..19b7c68 100644
> > --- a/drivers/gpu/drm/radeon/ni_dpm.c
> > +++ b/drivers/gpu/drm/radeon/ni_dpm.c
> > @@ -4250,7 +4250,7 @@ int ni_dpm_init(struct radeon_device *rdev)
> >  		break;
> >  	}
> >  
> > -	if (ni_pi->cac_weights->enable_power_containment_by_default) {
> > +	if (0/*ni_pi->cac_weights->enable_power_containment_by_default*/) {
> >  		ni_pi->enable_power_containment = true;
> >  		ni_pi->enable_cac = true;
> >  		ni_pi->enable_sq_ramping = true;
> 
> I don't remember playing with it lately, so I'll try it either later today
> (tonight) or tomorrow. I have to complete a report first for a personal
> project.

I've been testing it since Friday night (desktop and games). I had a single
lockup, but it was while playing a video. Since I suspect this may be related
to other issues that I've seen patches for, I'll continue testing it and I'll
report in a couple of days if things seem stable (or more stable).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-30 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #104 from Alexandre Demers  ---
(In reply to comment #103)
> I don't remember if we've tried this recently, but does disabling power
> containment help?
> 
> diff --git a/drivers/gpu/drm/radeon/ni_dpm.c
> b/drivers/gpu/drm/radeon/ni_dpm.c
> index 22c3391..19b7c68 100644
> --- a/drivers/gpu/drm/radeon/ni_dpm.c
> +++ b/drivers/gpu/drm/radeon/ni_dpm.c
> @@ -4250,7 +4250,7 @@ int ni_dpm_init(struct radeon_device *rdev)
>  		break;
>  	}
>  
> -	if (ni_pi->cac_weights->enable_power_containment_by_default) {
> +	if (0/*ni_pi->cac_weights->enable_power_containment_by_default*/) {
>  		ni_pi->enable_power_containment = true;
>  		ni_pi->enable_cac = true;
>  		ni_pi->enable_sq_ramping = true;

I don't remember playing with it lately, so I'll try it either later today
(tonight) or tomorrow. I have to complete a report first for a personal
project.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-30 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #103 from Alex Deucher  ---
I don't remember if we've tried this recently, but does disabling power
containment help?

diff --git a/drivers/gpu/drm/radeon/ni_dpm.c b/drivers/gpu/drm/radeon/ni_dpm.c
index 22c3391..19b7c68 100644
--- a/drivers/gpu/drm/radeon/ni_dpm.c
+++ b/drivers/gpu/drm/radeon/ni_dpm.c
@@ -4250,7 +4250,7 @@ int ni_dpm_init(struct radeon_device *rdev)
 		break;
 	}
 
-	if (ni_pi->cac_weights->enable_power_containment_by_default) {
+	if (0/*ni_pi->cac_weights->enable_power_containment_by_default*/) {
 		ni_pi->enable_power_containment = true;
 		ni_pi->enable_cac = true;
 		ni_pi->enable_sq_ramping = true;




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-23 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #102 from Alexandre Demers  ---
Went ahead with drm-next 3.14 and it still hangs from time to time (not better,
not worse)




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-20 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #101 from Marc  ---
It does greatly help for me as well: it lasted 5h.

But setting the option to glamor isn't so good for me because there were
display artifacts in some cases. For example, I would open vinagre and the
header bar (with the Remote / View / ... menus and the Connect /
Disconnect / ... buttons) wouldn't be drawn; instead, I would see whatever was
on the screen before I opened that window. Same with gnome-control-center.

I just noticed a package update in my distro today:
glamor-egl-0.5.1.r258-1-x86_64

I might give it another shot. I'll also update my kernel to 3.13, as I am on
3.12.7 at the moment.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-18 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #100 from Alexandre Demers  ---
I also experienced a hang last night when quitting a game with my so-called
"more stable" setup. It does seem to help greatly on my side, though.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-18 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #99 from marc at ttux.net ---
I tried radeon.dpm=1 and glamor. It took 5h, but I hit the same issue: my
screen turned white and the PC rebooted.
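
When comparing runs like this one, it can help to confirm that the dpm setting
really reached the module. A minimal sketch, assuming the parameter is exposed
read-only under /sys/module/radeon/parameters/dpm on the running kernel (the
path is passed in by the caller, and `read_int_param` is a hypothetical helper
name, not part of any existing tool):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Reads a single integer from a sysfs-style text file.
 * Returns 0 and stores the value in *out on success, -1 on any failure.
 */
int read_int_param(const char *path, int *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;          /* file missing or not readable */

    ok = (fscanf(f, "%d", out) == 1);
    fclose(f);
    return ok ? 0 : -1;     /* -1 if the content was not an integer */
}
```

Calling `read_int_param("/sys/module/radeon/parameters/dpm", &value)` from a
small program and logging the result alongside each test run would rule out the
case where the boot parameter was silently dropped.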




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-18 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #98 from Alexandre Demers  ---
I went back to EXA and my display froze in under 10 minutes. So it is beginning
to look like a trend to me regarding EXA vs. Glamor. Still, I'm continuing to
test (I've just activated Glamor once more).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-17 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #97 from Alexandre Demers  ---
(In reply to comment #96)
> (In reply to comment #95)
> > Second night without any problem (Video, Glamor and games were all tested).
> > Now, I'll try to find what seems to be part of the solution by reverting one
> > change at a time until the system begins to hang again.
> 
> It may be some sort of synchronization issue between the EXA acceleration
> code in ddx and the 3D acceleration code in mesa.  When you use glamor, all
> acceleration uses the code in mesa.

Indeed. I'll begin by reverting this change first then.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-17 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #96 from Alex Deucher  ---
(In reply to comment #95)
> Second night without any problem (Video, Glamor and games were all tested).
> Now, I'll try to find what seems to be part of the solution by reverting one
> change at a time until the system begins to hang again.

It may be some sort of synchronization issue between the EXA acceleration code
in ddx and the 3D acceleration code in mesa.  When you use glamor, all
acceleration uses the code in mesa.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-17 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #95 from Alexandre Demers  ---
Second night without any problem (Video, Glamor and games were all tested).
Now, I'll try to find what seems to be part of the solution by reverting one
change at a time until the system begins to hang again.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-16 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #94 from Alexandre Demers  ---
Funnily enough, on my side, I cold booted with this option last night and was
able to play the whole time, go online and run some movies without any problem.

However, I must mention some modifications I made recently:
Updated LLVM to 3.4 (this prevents some application crashes I was experiencing)
Switched to GLAMOR (fixed a rendering issue I had when not using it with
applications using LLVM 3.4)

However, I've been lucky before from time to time and had no crashes, so it
might just be that. But I suspect cold booting versus hot booting may be part
of the problem with HyperZ disabled. More tests to come tonight.

-- 
You are receiving this mail because:
You are the assignee for the bug.
-- next part --
An HTML attachment was scrubbed...
URL: 



[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-16 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #93 from marc at ttux.net ---
I thought R600_DEBUG=nohyperz worked out for me, but it just delayed the crash:
instead of a couple of minutes, it took almost 1h.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-15 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #92 from Alexandre Demers  ---
(In reply to comment #91)
> (In reply to comment #88)
> > Forget that... locked as always.
> > 
> > I'll test 3.14 soon, either from Alex's drm-next or when we get
> > 3.14-rc1...
> 
> Have you tried disabling hyperz globally rather than just for a specific
> app?  E.g., set env var R600_DEBUG=nohyperz in /etc/environment or however
> your distro handles global env vars.

That's what I did. I'll test it again, just in case.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-15 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #91 from Alex Deucher  ---
(In reply to comment #88)
> Forget that... locked as always.
> 
> I'll test 3.14 soon, either from Alex's drm-next or when we get
> 3.14-rc1...

Have you tried disabling hyperz globally rather than just for a specific app? 
E.g., set env var R600_DEBUG=nohyperz in /etc/environment or however your
distro handles global env vars.
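
Since a global setting only helps if every GL client actually inherits it, a
small check can confirm what a given process sees. The sketch below looks for
`nohyperz` as a whole token in the value of R600_DEBUG. It assumes the value is
a comma-separated list of flags; the helper names `debug_flag_set` and
`hyperz_disabled_in_environment` are hypothetical and are not Mesa functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns true if `flag` appears as a complete comma-separated token in
 * `value`. A NULL or empty `value` never matches.
 */
bool debug_flag_set(const char *value, const char *flag)
{
    size_t flag_len;

    assert(flag != NULL);
    flag_len = strlen(flag);
    if (value == NULL || flag_len == 0)
        return false;

    while (*value != '\0') {
        size_t tok_len = strcspn(value, ",");   /* length of current token */

        if (tok_len == flag_len && strncmp(value, flag, flag_len) == 0)
            return true;

        value += tok_len;
        if (*value == ',')
            value++;                            /* skip the separator */
    }
    return false;
}

/* Checks the environment of the calling process. */
bool hyperz_disabled_in_environment(void)
{
    return debug_flag_set(getenv("R600_DEBUG"), "nohyperz");
}
```

Running such a check from the same session that launches the game or
compositor distinguishes "the variable was never set for this process" from
"HyperZ was disabled and the lockup still happened".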




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-15 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #90 from Alexandre Demers  ---
(In reply to comment #89)
> I am on 3.12.7 and the R600_DEBUG=nohyperz seems to work fine for me.
> Without it, it crashes after just a couple of minutes (machine rebooting
> automatically). I have a HD6950. So thank you for the tip.

Are you seeing this problem with HyperZ since the introduction of dpm or is it
something new (a regression)? Have you tried a 3.13-rcX kernel to see if this
is still happening?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-15 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #89 from marc at ttux.net ---
I am on 3.12.7 and the R600_DEBUG=nohyperz seems to work fine for me. Without
it, it crashes after just a couple of minutes (machine rebooting automatically).
I have a HD6950. So thank you for the tip.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-12 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #88 from Alexandre Demers  ---
Forget that... locked as always.

I'll test 3.14 soon, either from Alex's drm-next or when we get 3.14-rc1...




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-12 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #87 from Martin Andersson  ---
(In reply to comment #86)
> Martin, could you try something? I may have been lucky until now, but I've
> been running with kernel 3.13-rc7 with HyperZ disabled and for now it has
> been stable (inspired by another bug report, bug 73088).

I'm running 3.13-rc7 and it has been stable for me so far, but I have only
played FTL, since I got bored with Serious Sam 3.

I will try to find a benchmark that triggers the hang and then I can see if
disabling HyperZ helps.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2014-01-12 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #86 from Alexandre Demers  ---
Martin, could you try something? I may have been lucky until now, but I've been
running with kernel 3.13-rc7 with HyperZ disabled and for now it has been
stable (inspired by another bug report, bug 73088).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-22 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #85 from Alexandre Demers  ---
(In reply to comment #84)
> (In reply to comment #83)
> > Alex Deucher, would there be any interest in testing "[PATCH 00/18] Rework
> > PM init order" and the following ones?
> 
> Yeah, probably worth a shot.  For convenience, I've pushed the patches to a
> branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=dpm-reorder

Nope, not working...




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-20 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #84 from Alex Deucher  ---
(In reply to comment #83)
> Alex Deucher, would there be any interest in testing "[PATCH 00/18] Rework
> PM init order" and the following ones?

Yeah, probably worth a shot.  For convenience, I've pushed the patches to a
branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=dpm-reorder




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-20 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #83 from Alexandre Demers  ---
Alex Deucher, would there be any interest in testing "[PATCH 00/18] Rework PM
init order" and the following ones?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-16 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #82 from Martin Andersson  ---
I can now also report that disabling spread spectrum isn't a complete fix for
me either. It is much better but I still get hangs.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-16 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #81 from Alexandre Demers  ---
I think I've figured out what's going on (now running rc4 with ss disabled
patch + a little modification). I'll confirm it and I'll be back.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #80 from Alexandre Demers  ---
(In reply to comment #79)
> Maybe we don't have the same issue then (or our cards have different
> sensitivity to this issue) because I see a huge improvement with spread
> spectrum disabled. I saw that you had got hangs while playing serious sam 3,
> so I bought it and tried with spread spectrum enabled and it hung within 5
> minutes, but when I disabled spread spectrum I played for 2 hours without
> problems.
> 
> I will continue testing it and report back if I encounter any problems.

That's why I suggested pushing it as a workaround for now, since it seems to
help both of us to different degrees. It's just not a total cure on my side.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #79 from Martin Andersson  ---
Maybe we don't have the same issue then (or our cards have different
sensitivity to this issue) because I see a huge improvement with spread
spectrum disabled. I saw that you had hangs while playing Serious Sam 3, so
I bought it and tried with spread spectrum enabled and it hung within 5
minutes, but when I disabled spread spectrum I played for 2 hours without
problems.

I will continue testing it and report back if I encounter any problems.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #78 from Alexandre Demers  ---
(In reply to comment #77)
> (In reply to comment #76)
> >
> > Since my last cold boot, I tried with unlimited clocks (reverted previous
> > patches) with hangs almost at start. I tried again with stock 3.13-rc3 with
> > spread spectrum disabled, and I also hit hangs when testing (without having
> > to push the video card with heavy tests). So, for me, disabling spread
> > spectrum may sometimes help, but it is not a real solution to this bug.
> > 
> > Sorry for bringing bad news.
> 
> Make sure you try on a cold boot.

I had already done it and, from what I could see until now, it makes no
difference whether it is a cold or a warm boot. Just in case, after your
comment, I did it again and it still ended up hanging (3.13-rc3 with all spread
spectrum disabled). In the past my system would occasionally, for some reason,
not hang once in many boots; maybe that's what happened.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #77 from Alex Deucher  ---
(In reply to comment #76)
>
> Since my last cold boot, I tried with unlimited clocks (reverted previous
> patches) with hangs almost at start. I tried again with stock 3.13-rc3 with
> spread spectrum disabled, and I also hit hangs when testing (without having
> to push the video card with heavy tests). So, for me, disabling spread
> spectrum may sometimes help, but it is not a real solution to this bug.
> 
> Sorry for bringing bad news.

Make sure you try on a cold boot.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-13 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #76 from Alexandre Demers  ---
(In reply to comment #75)
> Created attachment 90667 [details]
> possible fix
> 
> Sounds like both are causing problems.

Since my last cold boot, I tried with unlimited clocks (reverted previous
patches) with hangs almost at start. I tried again with stock 3.13-rc3 with
spread spectrum disabled, and I also hit hangs when testing (without having to
push the video card with heavy tests). So, for me, disabling spread spectrum
may sometimes help, but it is not a real solution to this bug.

Sorry for bringing bad news.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-12 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

Alex Deucher  changed:

           What    |Removed                     |Added

  Attachment #90542|0                           |1
        is obsolete|                            |

--- Comment #75 from Alex Deucher  ---
Created attachment 90667
  --> https://bugs.freedesktop.org/attachment.cgi?id=90667&action=edit
possible fix

Sounds like both are causing problems.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-12 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #74 from Alexandre Demers  ---
On my side, disabling spread spectrum but keeping the patches limiting the gpu
and mem clocks to reference board ones seems to be OK. I've been running
phoronix-test-suite tests for the last day without problem. Still, while it
seems stable, it may not correct everything; only time will tell.

I'll have to test again with patches from bug 68235 reverted because I suspect
I had selected the wrong kernel when I last posted my results about it.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #73 from Alexandre Demers  ---
(In reply to comment #72)
> I have now run another 12+ hour test without problems, where I disabled both
> sclk_ss and mclk_ss and disabled the clock limiting code (even though I
> didn't see any difference to the clock speeds).
> 
> So at least for me it seems very stable with both sclk_ss and mclk_ss
> disabled. I will continue using it to see if I will run into any problems.

Good to know. I'll also be confirming on my side soon (maybe tomorrow).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #72 from Martin Andersson  ---
I have now run another 12+ hour test without problems, where I disabled both
sclk_ss and mclk_ss and disabled the clock limiting code (even though I didn't
see any difference to the clock speeds).

So at least for me it seems very stable with both sclk_ss and mclk_ss disabled.
I will continue using it to see if I will run into any problems.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #71 from Alexandre Demers  ---
(In reply to comment #70)
> (In reply to comment #67)
> > I may have been lucky until now, but disabling the whole ss seems to be
> > better. I'll continue testing.
> 
> It took me longer, but I hanged the GPU again running Phoronix test suite.
> Nothing in the journal about the latest hang. I'm now testing with mclk_ss
> disabled on stock 3.13-rc3 (no patch reverted).

Oops, I may have to revisit what I just said. I'll be in touch soon.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-11 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #70 from Alexandre Demers  ---
(In reply to comment #67)
> I may have been lucky until now, but disabling the whole ss seems to be
> better. I'll continue testing.

It took me longer, but I hung the GPU again running the Phoronix Test Suite.
Nothing in the journal about the latest hang. I'm now testing with mclk_ss
disabled on stock 3.13-rc3 (no patch reverted).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #69 from Martin Andersson  ---
I have now completed a 12+ hour long test run with dynamic_ss disabled (mclk_ss
was also disabled) without any problem. So it seems that disabling mclk_ss makes
it a little more stable, but disabling dynamic_ss makes it much more stable. I
have not had any lockups with dynamic_ss disabled, but I haven't tested it that
much. Only this long session and one 6+ hour long session before that.

The next thing I'm gonna test is dynamic_ss disabled and the patches from
https://bugs.freedesktop.org/show_bug.cgi?id=68235 reverted.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #68 from Martin Andersson  ---
It seems I also spoke too soon, because I tried again with mclk disabled but
this time with the test method where I forced the low and high power levels by
using /sys/class/drm/card0/device/power_dpm_force_performance_level and this
time I got a lockup.

So either it was just chance that I didn't hit the lockup with mclk disabled
the first time, or my second test method has a higher chance of triggering
the lockup, or both.

I'm currently running a test with dynamic_ss disabled.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #67 from Alexandre Demers  ---
I may have been lucky until now, but disabling the whole ss seems to be better.
I'll continue testing.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #66 from Alexandre Demers  ---
(In reply to comment #65)
> (In reply to comment #63)
> > Created attachment 90542 [details] [review] [review]
> > possible fix
> > 
> > Thanks for tracking this down.  The attached patch should fix the issue. 
> > With this fixed, it may be worth checking to see if you can reliably use the
> > tweaked clocks on certain oem boards (basically disable the ni code added to
> > fix bug 69723).
> 
> I reverted patches from bug 68235 as suggested, applied the proposed patch
> to disable mclk spread spectrum and I'm now running this new kernel. First
> observation: without patches from bug 68235 prior to disabling mclk ss, I
> was unable to load a session without it hanging in less than a minute. It's
> now rock solid running GPU clock at 830MHz and memory clock at 1300MHz. I'll
> run a couple of other tests, but I think we got it!

Sadly, I pushed the card a bit harder with Serious Sam 3 and it hung again.
So I'll try two different things:
- keep spread spectrum disabled only for mclk but reapplying both patches from
bug 68235;
- disable all spread spectrum without reapplying the other two patches.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #65 from Alexandre Demers  ---
(In reply to comment #63)
> Created attachment 90542 [details] [review]
> possible fix
> 
> Thanks for tracking this down.  The attached patch should fix the issue. 
> With this fixed, it may be worth checking to see if you can reliably use the
> tweaked clocks on certain oem boards (basically disable the ni code added to
> fix bug 69723).

I reverted patches from bug 68235 as suggested, applied the proposed patch to
disable mclk spread spectrum and I'm now running this new kernel. First
observation: without patches from bug 68235 prior to disabling mclk ss, I was
unable to load a session without it hanging in less than a minute. It's now
rock solid running GPU clock at 830MHz and memory clock at 1300MHz. I'll run a
couple of other tests, but I think we got it!




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-10 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #64 from Alexandre Demers  ---
(In reply to comment #63)
> Created attachment 90542 [details] [review]
> possible fix
> 
> Thanks for tracking this down.  The attached patch should fix the issue. 
> With this fixed, it may be worth checking to see if you can reliably use the
> tweaked clocks on certain oem boards (basically disable the ni code added to
> fix bug 69723).

You certainly meant bug 68235. I'll test it in a couple of minutes.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #63 from Alex Deucher  ---
Created attachment 90542
  --> https://bugs.freedesktop.org/attachment.cgi?id=90542&action=edit
possible fix

Thanks for tracking this down.  The attached patch should fix the issue.  With
this fixed, it may be worth checking to see if you can reliably use the tweaked
clocks on certain oem boards (basically disable the ni code added to fix bug
69723).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #62 from Martin Andersson  ---
(In reply to comment #61)
> (In reply to comment #60)
> > Instead of triggering the power level switches by running GpuTest in bursts,
> > I put this in a bash script:
> > 
> > for i in {1..3600}
> > do
> >    echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level
> >    sleep 1
> >    echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> >    sleep 5
> > done
> > 
> > and just let GpuTest, and piglit, run continuously. I found that this also
> > triggers the lockups within minutes.
> > 
> > Then I ran the script by itself, no GpuTest or piglit, and left it running
> > while I was at work. When I came home the machine was still running, so it
> > ran for six hours without any lockup. So it seems the power level switching
> > alone is not sufficient to trigger the lockups, it also needs a load of some
> > sort.
> 
> Do you mean while spread spectrum is still enabled?

Yes




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #61 from Alexandre Demers  ---
(In reply to comment #60)
> Instead of triggering the power level switches by running GpuTest in bursts,
> I put this in a bash script:
> 
> for i in {1..3600}
> do
>    echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level
>    sleep 1
>    echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
>    sleep 5
> done
> 
> and just let GpuTest, and piglit, run continuously. I found that this also
> triggers the lockups within minutes.
> 
> Then I ran the script by itself, no GpuTest or piglit, and left it running
> while I was at work. When I came home the machine was still running, so it
> ran for six hours without any lockup. So it seems the power level switching
> alone is not sufficient to trigger the lockups, it also needs a load of some
> sort.

Do you mean while spread spectrum is still enabled?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #60 from Martin Andersson  ---
Instead of triggering the power level switches by running GpuTest in bursts, I
put this in a bash script:

for i in {1..3600}
do
   echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level
   sleep 1
   echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
   sleep 5
done

and just let GpuTest, and piglit, run continuously. I found that this also
triggers the lockups within minutes.

Then I ran the script by itself, no GpuTest or piglit, and left it running
while I was at work. When I came home the machine was still running, so it ran
for six hours without any lockup. So it seems the power level switching alone
is not sufficient to trigger the lockups, it also needs a load of some sort.
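
A minimal sketch of the toggling loop above, wrapped in a function so the cycle
count and sleep times can be varied between runs. The names toggle_dpm_levels
and DPM_LEVEL_FILE are illustrative and not from the original script, and
writing "auto" at the end to hand control back to the driver is an added
assumption (the original loop simply stops after the last "high"). It has to
run as root.

```bash
#!/usr/bin/env bash
# Sketch only: DPM_LEVEL_FILE and toggle_dpm_levels are hypothetical names.
# The sysfs path is the one used in the script above; override it for a dry run.
DPM_LEVEL_FILE="${DPM_LEVEL_FILE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"

# toggle_dpm_levels CYCLES LOW_SECS HIGH_SECS
# Forces "low" for LOW_SECS, then "high" for HIGH_SECS, CYCLES times.
toggle_dpm_levels() {
    local cycles="$1" low_secs="$2" high_secs="$3"
    local i
    for ((i = 0; i < cycles; i++)); do
        echo low > "$DPM_LEVEL_FILE" || return 1   # stop if the file is not writable
        sleep "$low_secs"
        echo high > "$DPM_LEVEL_FILE" || return 1
        sleep "$high_secs"
    done
    # Assumption: return to automatic level selection once the run is over.
    echo auto > "$DPM_LEVEL_FILE"
}
```

Calling `toggle_dpm_levels 3600 1 5` should reproduce the timing of the loop
above; as noted, a GPU load has to run alongside it for the lockup to show up.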




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #59 from Alexandre Demers  ---
This morning, I briefly tested the kernel with only mclk_ss disabled and it
seems to work correctly. Martin may have put his finger on where the problem
is.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #58 from Alexandre Demers  ---
Disabling dynamic_ss seems also to do the trick over here.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-09 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #57 from Alexandre Demers  ---
(In reply to comment #54)
> However if I set pi->dynamic_ss to false the lockups disappear, it also
> works with dynamic_ss set to true and pi->mclk_ss set to false.
>
So this seems to point to a spread spectrum mischief. I don't know if
dynamic_ss automatically applies to mclk but it seems to, since disabling
spread spectrum only for mclk solves your problem. We could suspect that at a
given frequency, we have a problem restoring the original message / clock (the
higher we get, the harder it is) until at some point it becomes unreliable.

I should be able to test it later tonight to confirm if this fixes the bug on
my side too.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-08 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #56 from Martin Andersson  ---
(In reply to comment #55)
> I'll try to reproduce your observations in the next couple of days, but I'm
> pretty sure we are experiencing the same problem. What is your video card
> model?

Sapphire Radeon HD 6950




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-08 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #55 from Alexandre Demers  ---
I'll try to reproduce your observations in the next couple of days, but I'm
pretty sure we are experiencing the same problem. What is your video card
model?

About the performance level switch, you could either force level 2 to use the
same values as level 1 OR you could catch when the performance level switches
to 2 and force it to 1.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-08 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #54 from Martin Andersson  ---
I have a 6950 and I'm seeing the exact same things as Alexandre: random hangs
that completely lock up the machine. I can't ssh into it and nothing is printed
to the logs; the only thing that works is a power cycle. If I disable dpm the
machine is stable.

I have run with dpm since 3.11 and I have had the occasional lockup, maybe one
every two weeks. But I started playing some more games recently and noticed
that the lockups became much more frequent. So I decided to investigate.

The method I use to trigger a lockup is to run GpuTest in a loop, with a
10-second sleep after each run. I do this to trigger power level switches. The
arguments to GpuTest are /test=plot3d /benchmark /benchmark_duration_ms=1
/no_scorebox. At the same time I run piglit quick.tests in a loop; I later
found out that the piglit tests are not essential to get lockups, but I kept
running them for consistency. 20 of these tests have resulted in a lockup; of
these, the longest lasted 80 minutes and the shortest 3 minutes, with an
average of 23 minutes. The tests that didn't cause lockups had dpm either
completely disabled or only certain features disabled; those features are
described below. If I run GpuTest constantly, without the sleep and with a
longer benchmark duration, I don't get any lockups (I have done several long
runs, the longest being over six hours).
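
A rough sketch of the burst method described above: run a short benchmark,
sleep, and repeat, so the card keeps switching power levels. run_bursts is a
hypothetical helper name, and the ./GpuTest path in the usage line is an
assumption about where the binary lives; only the arguments come from the
description above.

```bash
#!/usr/bin/env bash
# Sketch only: run_bursts is an illustrative name, not part of GpuTest or piglit.

# run_bursts RUNS PAUSE_SECS COMMAND [ARGS...]
# Runs COMMAND RUNS times, sleeping PAUSE_SECS after each run so the GPU
# can drop back to a lower power level between bursts.
run_bursts() {
    local runs="$1" pause_secs="$2"
    shift 2
    local i
    for ((i = 0; i < runs; i++)); do
        "$@" || true          # keep looping even if one burst exits non-zero
        sleep "$pause_secs"
    done
}
```

For example: `run_bursts 1000 10 ./GpuTest /test=plot3d /benchmark
/benchmark_duration_ms=1 /no_scorebox`, with the run count chosen freely.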

I also tried to find a good commit. I started with
7ad8d0687bb5030c3328bc7229a3183ce179ab25 (drm/radeon/dpm: re-enable state
transitions for Cayman) + the gcc fixes, but I get lockups on that commit as
well. I checked out 3.13-rc2 and started disabling features in ni_dpm_init. I
disabled the following things without any improvement. I reenabled each feature
after I had tested it and cold booted the machine.

eg_pi->smu_uvd_hs
pi->mvdd_control
eg_pi->vddci_control
pi->gfx_clock_gating
pi->mg_clock_gating
pi->mgcgtssm
pi->dynamic_pcie_gen2
pi->thermal_protection
pi->display_gap
pi->dcodt
pi->ulps
eg_pi->abm
eg_pi->mcls
eg_pi->light_sleep
eg_pi->memory_transition
ni_pi->cac_weights->enable_power_containment_by_default
ni_pi->use_power_boost_limit
pi->sclk_ss

eg_pi->pcie_performance_request was already false, so I didn't test it.

I noticed that pi->mvdd_control wasn't set, is that normal?

I don't get any lockups with pi->voltage_control disabled, but I also don't get
any power level switches.

If I set eg_pi->dynamic_ac_timing to false my machine locks up somewhere in the
boot process; I haven't looked into that any deeper.

However if I set pi->dynamic_ss to false the lockups disappear, it also works
with dynamic_ss set to true and pi->mclk_ss set to false.

So it seems, at least for me, it has something to do with mclk together with
power level switches. I'm not sure what to test next, but one thing might be to
try to remove the performance power level 2, so that it could only switch
between 0 and 1. But I haven't figured out how to accomplish that yet.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-12-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #53 from Alexandre Demers  ---
Possible circular locking dependency and a DEADLOCK. Using latest 3.13.0-rc2
(with some added printk(), so no difference from rc1) and having set
performance level to low, I've just found the following in my journal from last
night. The system never crashed or anything, but this may be a clue pointing
at... none other than pm.mclk...


Dec 02 01:01:05 Xander kernel:
Dec 02 01:01:05 Xander kernel: ==
Dec 02 01:01:05 Xander kernel: [ INFO: possible circular locking dependency detected ]
Dec 02 01:01:05 Xander kernel: 3.13.0-rc2-VANILLA-dirty #170 Tainted: G C
Dec 02 01:01:05 Xander kernel: ---
Dec 02 01:01:05 Xander kernel: chromium/3786 is trying to acquire lock:
Dec 02 01:01:05 Xander kernel:  (reservation_ww_class_mutex){+.+.+.}, at: [] ttm_bo_wait_unreserved+0x39/0x70 [ttm]
Dec 02 01:01:05 Xander kernel: 
   but task is already holding lock:
Dec 02 01:01:05 Xander kernel:  (>wu_mutex){+.+...}, at: [] ttm_bo_wait_unreserved+0x1d/0x70 [ttm]
Dec 02 01:01:05 Xander kernel: 
   which lock already depends on the new lock.
Dec 02 01:01:05 Xander kernel: 
   the existing dependency chain (in reverse order) is:
Dec 02 01:01:05 Xander kernel: 
   -> #4 (>wu_mutex){+.+...}:
Dec 02 01:01:05 Xander kernel:  [] lock_acquire+0x72/0xa0
Dec 02 01:01:05 Xander kernel:  [] mutex_lock_interruptible_nested+0x58/0x550
Dec 02 01:01:05 Xander kernel:  [] ttm_bo_wait_unreserved+0x1d/0x70 [ttm]
Dec 02 01:01:05 Xander kernel:  [] ttm_bo_vm_fault+0x389/0x470 [ttm]
Dec 02 01:01:05 Xander kernel:  [] radeon_ttm_fault+0x47/0x60 [radeon]
Dec 02 01:01:05 Xander kernel:  [] __do_fault+0x6c/0x4c0
Dec 02 01:01:05 Xander kernel:  [] handle_mm_fault+0x2e6/0xc90
Dec 02 01:01:05 Xander kernel:  [] __do_page_fault+0x165/0x560
Dec 02 01:01:05 Xander kernel:  [] do_page_fault+0x9/0x10
Dec 02 01:01:05 Xander kernel:  [] page_fault+0x28/0x30
Dec 02 01:01:05 Xander kernel: 
   -> #3 (>pm.mclk_lock){++}:
Dec 02 01:01:05 Xander kernel:  [] lock_acquire+0x72/0xa0
Dec 02 01:01:05 Xander kernel:  [] down_write+0x31/0x60
Dec 02 01:01:05 Xander kernel:  [] radeon_pm_compute_clocks+0x2ee/0x790 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_pm_init+0x7c1/0x960 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_modeset_init+0x40f/0x9a0 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_driver_load_kms+0xe0/0x210 [radeon]
Dec 02 01:01:05 Xander kernel:  [] drm_dev_register+0x9f/0x1d0
Dec 02 01:01:05 Xander kernel:  [] drm_get_pci_dev+0x8d/0x140
Dec 02 01:01:05 Xander kernel:  [] radeon_pci_probe+0x9f/0xd0 [radeon]
Dec 02 01:01:05 Xander kernel:  [] local_pci_probe+0x40/0xa0
Dec 02 01:01:05 Xander kernel:  [] work_for_cpu_fn+0xf/0x20
Dec 02 01:01:05 Xander kernel:  [] process_one_work+0x1cb/0x490
Dec 02 01:01:05 Xander kernel:  [] worker_thread+0x258/0x3a0
Dec 02 01:01:05 Xander kernel:  [] kthread+0xf7/0x110
Dec 02 01:01:05 Xander kernel:  [] ret_from_fork+0x7c/0xb0
Dec 02 01:01:05 Xander kernel: 
   -> #2 (>struct_mutex){+.+.+.}:
Dec 02 01:01:05 Xander kernel:  [] lock_acquire+0x72/0xa0
Dec 02 01:01:05 Xander kernel:  [] mutex_lock_nested+0x4b/0x490
Dec 02 01:01:05 Xander kernel:  [] radeon_pm_compute_clocks+0x2e6/0x790 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_pm_init+0x7c1/0x960 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_modeset_init+0x40f/0x9a0 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_driver_load_kms+0xe0/0x210 [radeon]
Dec 02 01:01:05 Xander kernel:  [] drm_dev_register+0x9f/0x1d0
Dec 02 01:01:05 Xander kernel:  [] drm_get_pci_dev+0x8d/0x140
Dec 02 01:01:05 Xander kernel:  [] radeon_pci_probe+0x9f/0xd0 [radeon]
Dec 02 01:01:05 Xander kernel:  [] local_pci_probe+0x40/0xa0
Dec 02 01:01:05 Xander kernel:  [] work_for_cpu_fn+0xf/0x20
Dec 02 01:01:05 Xander kernel:  [] process_one_work+0x1cb/0x490
Dec 02 01:01:05 Xander kernel:  [] worker_thread+0x258/0x3a0
Dec 02 01:01:05 Xander kernel:  [] kthread+0xf7/0x110
Dec 02 01:01:05 Xander kernel:  [] ret_from_fork+0x7c/0xb0
Dec 02 01:01:05 Xander kernel: 
   -> #1 (>pm.mutex){+.+.+.}:
Dec 02 01:01:05 Xander kernel:  [] lock_acquire+0x72/0xa0
Dec 02 01:01:05 Xander kernel:  [] mutex_lock_nested+0x4b/0x490
Dec 02 01:01:05 Xander kernel:  [] radeon_dpm_enable_uvd+0x79/0xc0 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_uvd_note_usage+0xef/0x110 [radeon]
Dec 02 01:01:05 Xander kernel:  [] radeon_cs_ioctl+0x8f0/0x9f0 

[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-29 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #52 from Alexandre Demers  ---
I disabled R600_LLVM and ran another piglit at high power level. It crashed
anyway. But I got a different message in my journal:

Nov 29 14:02:48 Xander kernel: glx-create-cont[14825]: segfault at 17c ip 7f474ede915e sp 7fff555e8ff0 error 6 in r600_dri.so[7f474e81d000+80e000]
Nov 29 14:02:48 Xander systemd-coredump[14834]: Process 14825 (glx-create-cont) dumped core.
Nov 29 14:09:06 Xander kernel: traps: shader_runner[20951] trap int3 ip:7fd47a14782f sp:7fffeb3ab520 error:0
Nov 29 14:09:06 Xander systemd-coredump[20968]: Process 20951 (shader_runner) dumped core.
Nov 29 14:09:21 Xander kernel: traps: shader_runner[22965] trap int3 ip:7f9e6348982f sp:7fff739db970 error:0
Nov 29 14:09:21 Xander systemd-coredump[22983]: Process 22965 (shader_runner) dumped core.
Nov 29 14:11:09 Xander dbus-daemon[3115]: dbus[3115]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Nov 29 14:11:09 Xander dbus[3115]: [system] Activating via systemd: service name='org.freedesktop.hostname1' unit='dbus-org.freedesktop.hostname1.service'
Nov 29 14:11:09 Xander systemd[1]: Starting Hostname Service...
Nov 29 14:11:09 Xander dbus-daemon[3115]: dbus[3115]: [system] Successfully activated service 'org.freedesktop.hostname1'
Nov 29 14:11:09 Xander dbus[3115]: [system] Successfully activated service 'org.freedesktop.hostname1'
Nov 29 14:11:09 Xander systemd[1]: Started Hostname Service.
Nov 29 14:12:19 Xander kernel: radeon_gem_object_create:62 alloc size 1365Mb bigger than 256Mb limit
Nov 29 14:12:19 Xander kernel: radeon_gem_object_create:62 alloc size 1365Mb bigger than 256Mb limit
Nov 29 14:12:41 Xander kernel: radeon_gem_object_create:62 alloc size 1024Mb bigger than 256Mb limit
Nov 29 14:12:41 Xander kernel: radeon_gem_object_create:62 alloc size 1024Mb bigger than 256Mb limit
Nov 29 14:13:47 Xander systemd[1]: Starting Cleanup of Temporary Directories...
-- Reboot --

Still dumps and a GEM object allocation problem. Anything useful?
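
To pull only the GPU-related lines out of a journal like the one above, a small
filter along these lines may help. filter_gpu_lines is a hypothetical helper
name, and the pattern list is only a guess based on the messages seen in this
report; it is not an exhaustive list of what the radeon driver can log.

```bash
#!/usr/bin/env bash
# Sketch only: filter_gpu_lines and its pattern list are illustrative assumptions.

# Reads log lines on stdin and prints those that mention the radeon/r600 driver,
# a segfault or trap, a core dump, or a lockdep circular-locking report.
filter_gpu_lines() {
    # "|| true" so that an empty result is not treated as an error
    grep -E 'radeon|r600|segfault|traps:|dumped core|circular locking' || true
}
```

After a hang and reboot, `journalctl -b -1 | filter_gpu_lines` would show the
matching lines from the previous boot.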




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-29 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #51 from Alexandre Demers  ---
(In reply to comment #50)
> (In reply to comment #49)
> > (In reply to comment #48)
> > > The display went black a couple of time and according to dmesg/journal, 
> > > it was
> > > related to texelFetch segfaulting in llvm [...]
> > 
> > 'The display went black a couple of time' sounds like GPU resets after
> > lockups, which are unlikely to be directly related to segfaulting piglit
> > tests FWIW.
> 
> Maybe, but I have no other indication. Also, I'm pretty that, when in auto
> power level, when I run my piglit tests, it locks at there.
> 
> Another thing I noticed: if I run the quick piglit tests at a low power
> level, the total number of passed tests is higher than when I run it at an
> higher speed. The failing tests explaining this difference are mostly
> texelFetch related. I'll send the piglit results from both runs once I'll
> have set back mclk to its default value.

Oops, "... I'm pretty that..." -> "... I'm pretty sure that..."




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-29 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #50 from Alexandre Demers  ---
(In reply to comment #49)
> (In reply to comment #48)
> > The display went black a couple of time and according to dmesg/journal,
> > it was related to texelFetch segfaulting in llvm [...]
> 
> 'The display went black a couple of time' sounds like GPU resets after
> lockups, which are unlikely to be directly related to segfaulting piglit
> tests FWIW.

Maybe, but I have no other indication. Also, I'm pretty that, when in auto
power level, when I run my piglit tests, it locks at there.

Another thing I noticed: if I run the quick piglit tests at a low power level,
the total number of passed tests is higher than when I run it at an higher
speed. The failing tests explaining this difference are mostly texelFetch
related. I'll send the piglit results from both runs once I'll have set back
mclk to its default value.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-29 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #49 from Michel Dänzer  ---
(In reply to comment #48)
> The display went black a couple of time and according to dmesg/journal, it was
> related to texelFetch segfaulting in llvm [...]

'The display went black a couple of time' sounds like GPU resets after lockups,
which are unlikely to be directly related to segfaulting piglit tests FWIW.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-29 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #48 from Alexandre Demers  ---
Went to 122500, still no total hang. Ran quick.test (piglit) while playing a
movie. The display went black a couple of time and according to dmesg/journal,
it was related to texelFetch segfaulting in llvm (another bug I've reported).
So, for now, we can say it is still stable at this frequency. I'll push it a
bit more.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #47 from Alexandre Demers  ---
Latest news: tried to force VDDC to 1100 (some 6950 cards are using 1.1V
instead of 1.06V, mostly factory overclocked ones), tried to force VDDCI to
1150 for all power levels (crashed even quicker), tried to force VDDCI to 1100
for high power level just in case (hanged as usual after mostly the same
delay).

So I turned my attention to mclk and downclocked it to 120000 (instead of
125000). Until now, the auto power level runs fine. I'll tweak it until I can
find out at which value it begins to hang... and why it hangs mostly on auto
power level and almost never on high or low (even at high, which is mostly as
if dpm was disabled)... and never under Windows.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #46 from Alexandre Demers  ---
(In reply to comment #45)
> The windows driver should operate pretty much the same as the linux driver
> as far as I know.  I'm not really familiar with how gpu-z reads back the
> clocks and voltages and that may have something to do with the differences. 
> Not all of the aspects of the level transition happen at the same time.
> 
> You could try setting the vddc to 1060 and changing the mclk of level 1 (mid
> level) to 650Mhz.  Perhaps the jump from 150 to 1300 is too big and the pll
> is not able to lock properly.

I tried it, but I had the same result: a hang. However, I'd be curious to see
the result on the power consumption. I'll keep that for another time.

I'll try to follow Vddci under Windows with something like Afterburner.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #45 from Alex Deucher  ---
The windows driver should operate pretty much the same as the linux driver as
far as I know.  I'm not really familiar with how gpu-z reads back the clocks
and voltages and that may have something to do with the differences.  Not all
of the aspects of the level transition happen at the same time.

You could try setting the vddc to 1060 and changing the mclk of level 1 (mid
level) to 650Mhz.  Perhaps the jump from 150 to 1300 is too big and the pll is
not able to lock properly.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #44 from Alexandre Demers  ---
Created attachment 89885
  --> https://bugs.freedesktop.org/attachment.cgi?id=89885&action=edit
A second GPU-Z log, this time pushing a bit more

In this new log, I pushed the video card a bit more. We can see there is an
sclk 500MHz with a mclk of 650MHz (VDDC @ 1.063V), another intermediate power
level using a sclk 500MHz with a mclk of 1300MHz (VDDC @ 1.063V) and finally a
last power level using a sclk 830MHz with a mclk of 1300MHz (also using a VDDC
@ 1.063V).

So, we are missing power level under Linux AND we are not using the same VDDC
above the lowest power level. This fact could also explain why we had to limit
sclk and mclk (800MHz and 1250MHz) on cards that are not using stock speed
(overclocked cards, like mine) if we were not using a high enough VDDC, isn't
it?
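
For illustration only, the comparison above can be written down as a small Python sketch. The Windows values are the three GPU-Z readings quoted in this comment, the Linux values are the levels of the "performance" power state dumped in comment #39 (clocks in MHz, VDDC in mV). The names LINUX_LEVELS, WINDOWS_LEVELS and compare_levels are made up for this example, and the Windows idle level is left out because no VDDC was reported for it.

```python
# Hypothetical tables keyed by (sclk MHz, mclk MHz), value is VDDC in mV.
# Linux: levels of the "performance" power state from comment #39.
LINUX_LEVELS = {
    (250, 150): 900,
    (500, 1300): 1000,
    (830, 1300): 1060,
}
# Windows: the three GPU-Z readings described in this comment.
WINDOWS_LEVELS = {
    (500, 650): 1063,
    (500, 1300): 1063,
    (830, 1300): 1063,
}


def compare_levels(linux, windows):
    """Return (clock pairs seen on Windows but not Linux, VDDC deltas in mV)."""
    missing = sorted(set(windows) - set(linux))
    shared = sorted(set(linux) & set(windows))
    vddc_delta = {pair: windows[pair] - linux[pair] for pair in shared}
    return missing, vddc_delta


missing, vddc_delta = compare_levels(LINUX_LEVELS, WINDOWS_LEVELS)
print("missing under Linux:", missing)
print("VDDC delta (Windows - Linux, mV):", vddc_delta)
```

The 500/650 MHz pair shows up as missing under Linux, and the 500/1300 MHz level is where the VDDC gap is largest. The few mV of difference at full speed may just be how GPU-Z reads the voltage back.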




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #43 from Alexandre Demers  ---
Created attachment 89884
  --> https://bugs.freedesktop.org/attachment.cgi?id=89884&action=edit
GPU-Z Cayman log playing a Youtube video

I used GPU-Z under Windows to monitor mclk, sclk, VDDC and temperature.
Temperature was pretty much the same as the one I had under Linux.

However, power levels are different.

First, under Windows, anytime the memory goes above the default mclk speed
(150MHz), VDDC is set to 1.063V even if the GPU is not running at full speed.
Under Linux, VDDC is set to 1.060V only if both mclk and sclk are running at
full speed. Otherwise, VDDC is kept at 1.000V. Why this difference?

Next, there is an intermediate power level where sclk runs at 500MHz and mclk
runs at 650MHz with a VDDC at 1.063V. Even when running in 1080p, it never went
above that speed. Under Linux, the intermediate power level (power level 1)
uses a sclk at 500MHz (same) and mclk at 1300MHz (twice as fast, the same as
the maximum power level 2) with only a VDDC of 1.000V.

There was no indication about the VDDCI.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #42 from Alexandre Demers  ---
(In reply to comment #41)
> (In reply to comment #40)
> > Do you think I could try to force to vddc=1060 in ni_apply_state_adjust()
> > instead of 1000 when mclk=130000 to see if the system becomes stable?
> 
> The driver will currently limit the mclk to 125000 since that's the max
> level in the vddc/mclk dep table.  You could try it.  The driver will use
> 1060 when the sclk is 80000.

Well, it isn't that... I'll look at mclk again if I limit it a bit lower
(120000 maybe). I ran a video on Youtube (no UVD) and it hung again. The only
way I don't have this problem is by forcing the power level (sorry, I was
calling it performance level earlier). Doing so is OK almost every time while
using the auto mode freezes the card.

I also had a look under Windows. I played the same video and I used GPU-Z to
get a log. I'm attaching it right away to compare what I get under Linux VS
Windows.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #41 from Alex Deucher  ---
(In reply to comment #40)
> Do you think I could try to force to vddc=1060 in ni_apply_state_adjust()
> instead of 1000 when mclk=130000 to see if the system becomes stable?

The driver will currently limit the mclk to 125000 since that's the max level
in the vddc/mclk dep table.  You could try it.  The driver will use 1060 when
the sclk is 80000.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-27 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #40 from Alexandre Demers  ---
(In reply to comment #39)
> > - Third one: it was previously said mclk is tied to vddci AND vddc. Wouldn't
> > there be a chance we could encounter a problem here if vddc=1000 and not
> > 1060 when running at full speed?
> 
> Sure.  That's why we have the ni_apply_state_adjust_rules() to make sure the
> power state is valid based on the current requirements.

Thank you for all the explanations.

Do you think I could try to force to vddc=1060 in ni_apply_state_adjust()
instead of 1000 when mclk=130000 to see if the system becomes stable?




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-26 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #39 from Alex Deucher  ---
(In reply to comment #38)
> A couple of observations. I've forced power state to performance and
> performance state to high. Here is the result:
> [root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
> uvd    vclk: 0 dclk: 0
> power level 2    sclk: 83000 mclk: 130000 vddc: 1060 vddci: 1150
> 
> Now, keeping this configuration, I launched a video relying on UVD. The
> result downclocks the core clock from 830 (probably limited to 800 as we
> know) to 725.
> [root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
> uvd    vclk: 54000 dclk: 40000
> power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150
> 
> However, if I don't force this power and performance states combination
> (letting it as balanced and auto or performance and auto), I have the
> following:
> [root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
> uvd    vclk: 54000 dclk: 40000
> power level 0    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
> [root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
> uvd    vclk: 54000 dclk: 40000
> power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150
> 
> As you can see, it will adapt to the needed performance state.
> 

Not exactly.  Here are the power states defined for your system:

 == power state 0 ==
  ui class: none
  internal class: boot
  caps:
  uvd    vclk: 0 dclk: 0
    power level 0    sclk: 25000 mclk: 15000 vddc: 1000 vddci: 1000
    power level 1    sclk: 25000 mclk: 15000 vddc: 1000 vddci: 1000
    power level 2    sclk: 25000 mclk: 15000 vddc: 1000 vddci: 1000
  status: c r b
 == power state 1 ==
  ui class: performance
  internal class: none
  caps:
  uvd    vclk: 0 dclk: 0
    power level 0    sclk: 25000 mclk: 15000 vddc: 900 vddci: 950
    power level 1    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
    power level 2    sclk: 83000 mclk: 130000 vddc: 1060 vddci: 1150
  status:
 == power state 2 ==
  ui class: none
  internal class: uvd
  caps: video
  uvd    vclk: 54000 dclk: 40000
    power level 0    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
    power level 1    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
    power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150
  status:
 == power state 3 ==
  ui class: none
  internal class: uvd_mvc
  caps: video
  uvd    vclk: 70000 dclk: 56000
    power level 0    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
    power level 1    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
    power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150
  status:

When you select performance, battery, or balanced state on your system, power
state 1 is used.  When you activate UVD, the driver selects power state 2. 
When UVD is not in use, the previously active power state (1 in this case) is
selected again.  The driver only selects the power state.  The power levels
(0-2) within the power state are either selected automatically by the hw based
on GPU load (auto) or forced to power level 0 or 2 (low or high) if you force
the performance level.  When you force the performance level to high, it will
apply to all power states that you select (both power state 1 and 2) which is
why you see the UVD state using 500 vs. 725 Mhz.
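
As a rough illustration of the split described above (the driver picks the power state, the hardware or a forced setting picks the level inside it), here is a small Python model. It is only a sketch and not driver code: the names Level, POWER_STATES, select_state and candidate_levels are invented for this example, and the numbers are copied from power states 1 and 2 in the dump above (clocks in 10 kHz units, so 83000 is 830 MHz; voltages in mV).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    sclk: int   # engine clock in 10 kHz units (83000 = 830 MHz)
    mclk: int   # memory clock in 10 kHz units
    vddc: int   # mV
    vddci: int  # mV


# Power levels 0-2 of power state 1 ("performance") and 2 ("uvd").
POWER_STATES = {
    "performance": (
        Level(25000, 15000, 900, 950),
        Level(50000, 130000, 1000, 1150),
        Level(83000, 130000, 1060, 1150),
    ),
    "uvd": (
        Level(50000, 130000, 1000, 1150),
        Level(50000, 130000, 1000, 1150),
        Level(72500, 130000, 1060, 1150),
    ),
}


def select_state(uvd_active):
    """Driver side of the model: only the power state is chosen here."""
    return "uvd" if uvd_active else "performance"


def candidate_levels(state, forced="auto"):
    """Levels the GPU may run at: all three in auto, one when forced."""
    levels = POWER_STATES[state]
    if forced == "auto":
        return levels          # hardware switches between them based on load
    if forced == "low":
        return (levels[0],)
    if forced == "high":
        return (levels[2],)
    raise ValueError(f"unknown performance level: {forced!r}")


# Forcing "high" carries over when UVD kicks in: 830 MHz becomes 725 MHz.
print(candidate_levels(select_state(False), "high")[0].sclk)
print(candidate_levels(select_state(True), "high")[0].sclk)
```

In this model the forced setting is kept separate from the state selection, which matches the observation that a forced "high" level still lands on level 2 of the UVD state once video playback starts.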

> - So, first question: is it expected to see a lowered sclk when UVD is
> active?

Yes since the driver selects a different power state (one tailored to the
requirements of the UVD block for smooth video playback).

> - Second one: when the performance is changed automatically (auto), could we
> be triggering a performance state change too quickly?

The driver doesn't trigger performance level changes, the hw does.  The driver
just selects the overall power state.  

> - Third one: it was previously said mclk is tied to vddci AND vddc. Wouldn't
> there be a chance we could encounter a problem here if vddc=1000 and not
> 1060 when running at full speed?

Sure.  That's why we have the ni_apply_state_adjust_rules() to make sure the
power state is valid based on the current requirements.

> - Last one: is there a way to monitor the GPU temperature and/or the GPU fan
> speed? (even at full speed when highly solicited, the fan is not running as
> fast as when dpm=0. I'm wondering if I'm not overheating from time to time).

You can see the temperature in sysfs.  There should be a entry under
/sys/class/hwmon/ for radeon.
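
A minimal Python sketch of reading that value, under a few assumptions: the hwmon device reports its name as "radeon", the reading is exposed as temp1_input in millidegrees Celsius (the usual hwmon convention), and the attributes live either directly in hwmonN/ or in hwmonN/device/ depending on the kernel version. The function name read_radeon_temp_c is made up for this example.

```python
from pathlib import Path


def read_radeon_temp_c(hwmon_root="/sys/class/hwmon"):
    """Return the radeon temperature in degrees Celsius, or None if not found."""
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        # Assumption: look in both hwmonN/ and hwmonN/device/, since the
        # attribute location has varied between kernel versions.
        for base in (hwmon, hwmon / "device"):
            name_file = base / "name"
            temp_file = base / "temp1_input"
            if not (name_file.is_file() and temp_file.is_file()):
                continue
            if name_file.read_text().strip() == "radeon":
                # hwmon temperatures are reported in millidegrees Celsius.
                return int(temp_file.read_text().strip()) / 1000.0
    return None
```

Calling read_radeon_temp_c() in a loop while a video plays would show whether the card gets hotter with dpm=1 than with dpm=0. Fan speed is not covered by this sketch.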




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-26 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #38 from Alexandre Demers  ---
A couple of observations. I've forced power state to performance and
performance state to high. Here is the result:
[root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 0 dclk: 0
power level 2    sclk: 83000 mclk: 130000 vddc: 1060 vddci: 1150

Now, keeping this configuration, I launched a video relying on UVD. The result
downclocks the core clock from 830 (probably limited to 800 as we know) to 725.
[root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 54000 dclk: 40000
power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150

However, if I don't force this power and performance states combination
(letting it as balanced and auto or performance and auto), I have the
following:
[root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 54000 dclk: 40000
power level 0    sclk: 50000 mclk: 130000 vddc: 1000 vddci: 1150
[root@Xander device]# more /sys/kernel/debug/dri/64/radeon_pm_info
uvd    vclk: 54000 dclk: 40000
power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150

As you can see, it will adapt to the needed performance state.
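
For anyone who wants to log these readings over time instead of running more by hand, here is a hedged Python sketch that parses the two lines printed by radeon_pm_info. The regular expressions are written against the output pasted in this comment and tolerate missing spaces between fields. The function name parse_pm_info and the returned keys are made up for this example. Clocks are in 10 kHz units, so dividing by 100 gives MHz.

```python
import re

# Patterns modelled on the radeon_pm_info output quoted in this comment.
UVD_RE = re.compile(r"uvd\s*vclk: (?P<vclk>\d+) dclk: (?P<dclk>\d+)")
LEVEL_RE = re.compile(
    r"power level (?P<level>\d+)\s*sclk: (?P<sclk>\d+) mclk: (?P<mclk>\d+) "
    r"vddc: (?P<vddc>\d+) vddci: (?P<vddci>\d+)"
)


def parse_pm_info(text):
    """Parse radeon_pm_info text into a dict of ints plus clocks in MHz."""
    info = {}
    uvd = UVD_RE.search(text)
    if uvd:
        info.update({key: int(value) for key, value in uvd.groupdict().items()})
    level = LEVEL_RE.search(text)
    if level:
        info.update({key: int(value) for key, value in level.groupdict().items()})
        # Clocks are reported in 10 kHz units: 83000 -> 830 MHz.
        info["sclk_mhz"] = info["sclk"] / 100
        info["mclk_mhz"] = info["mclk"] / 100
    return info


sample = (
    "uvd    vclk: 54000 dclk: 40000\n"
    "power level 2    sclk: 72500 mclk: 130000 vddc: 1060 vddci: 1150\n"
)
print(parse_pm_info(sample))
```

Reading the debugfs file needs root, so in practice the text would come from something like open("/sys/kernel/debug/dri/64/radeon_pm_info").read() run as root, with the path taken from the commands above.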

- So, first question: is it expected to see a lowered sclk when UVD is active?
- Second one: when the performance is changed automatically (auto), could we be
triggering a performance state change too quickly?
- Third one: it was previously said mclk is tied to vddci AND vddc. Wouldn't
there be a chance we could encounter a problem here if vddc=1000 and not 1060
when running at full speed?
- Last one: is there a way to monitor the GPU temperature and/or the GPU fan
speed? (even at full speed when highly solicited, the fan is not running as
fast as when dpm=0. I'm wondering if I'm not overheating from time to time).




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-25 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #37 from Alexandre Demers  ---
(In reply to comment #36)
> (In reply to comment #35)
> > To be noted: I was using a balanced power state this time, not a performance
> > power state. To be investigated.
> 
> On most cards there are only performance states.  Selecting balanced also
> selects performance.  You are probably using the same state in both cases.

This is also what it seems. However, it may be completely unrelated, but I only
had hangs when using the balanced setting.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-25 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

--- Comment #36 from Alex Deucher  ---
(In reply to comment #35)
> To be noted: I was using a balanced power state this time, not a performance
> power state. To be investigated.

On most cards there are only performance states.  Selecting balanced also
selects performance.  You are probably using the same state in both cases.




[Bug 69723] GPU lockups with kernel 3.11.0 / 3.12-rc1 when dpm=1 on r600g (Cayman)

2013-11-25 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=69723

Alexandre Demers  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GPU lockups with kernel     |GPU lockups with kernel
                   |3.11.0 / 3.12-rc1 (with bug |3.11.0 / 3.12-rc1 when
                   |68235's patches applied)    |dpm=1 on r600g (Cayman)
                   |when dpm=1 on r600g         |
                   |(Cayman)                    |
