[Bug 91790] TONGA hang in amdgpu_ring_lock

2019-11-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=91790

Martin Peres  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |MOVED
             Status|NEW                         |RESOLVED

--- Comment #16 from Martin Peres  ---
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been
closed from further activity.

You can subscribe and participate further through the new bug through this link
to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/57.

-- 
You are receiving this mail because:
You are the assignee for the bug.
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #15 from Andy Furniss  ---
(In reply to Andy Furniss from comment #13)
> (In reply to Mathias Tillman from comment #11)
> > (In reply to Alex Deucher from comment #10)
> > > Created attachment 118056
> > > possible fix
> > > 
> > > I think this patch should fix it.
> > 
> > No luck here I'm afraid - I'm having a hard time reproducing it during
> > normal desktop usage (with or without the patch), but it did lockup while
> > running Unigine Valley.
> 
> I see drm-next-4.3 is now ahead again, haven't tested that yet.
> 
> With patch + drm-next-4.3-wip, I haven't yet managed to lock valley - but
> I've only had time to do a couple of runs (45 min then 90 min) from a clean
> boot. Maybe later when I've been up a while doing other things I'll try
> harder.
> 
> Patch doesn't apply with git apply - did it by hand.

I managed to lock it; it seems that doing "something" between runs changes
things, or the first runs are lucky.

FWIW I tried running Unreal 4.5 ElementalDemo after my long runs and I got a
signal 7.

After I later locked/hung valley I rebooted and tried elemental again from a
clean boot and it ran OK, but after quitting, it now gives signal 7 again if I
try to start it.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #14 from Mathias Tillman  ---
Created attachment 118060
  --> https://bugs.freedesktop.org/attachment.cgi?id=118060&action=edit
Output of amdgpu_regs and amdgpu_fence_info

I have attached the output of amdgpu_regs and amdgpu_fence_info. "Hang" was
captured right after the hang happened; "Normal" was captured right after a
reboot following the hang (for comparison).




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #13 from Andy Furniss  ---
(In reply to Mathias Tillman from comment #11)
> (In reply to Alex Deucher from comment #10)
> > Created attachment 118056
> > possible fix
> > 
> > I think this patch should fix it.
> 
> No luck here I'm afraid - I'm having a hard time reproducing it during
> normal desktop usage (with or without the patch), but it did lockup while
> running Unigine Valley.

I see drm-next-4.3 is now ahead again, haven't tested that yet.

With patch + drm-next-4.3-wip, I haven't yet managed to lock valley - but I've
only had time to do a couple of runs (45 min then 90 min) from a clean boot.
Maybe later when I've been up a while doing other things I'll try harder.

Patch doesn't apply with git apply - did it by hand.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-03 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #12 from Christian König  ---
(In reply to Mathias Tillman from comment #11)
> No luck here I'm afraid - I'm having a hard time reproducing it during
> normal desktop usage (with or without the patch), but it did lockup while
> running Unigine Valley.

Assuming you can still access the box over the network after the lockup then
please provide the output of the following as root:

cat /sys/kernel/debug/dri/0/amdgpu_fence_info
hexdump -s 0x14fc -n 4 /sys/kernel/debug/dri/0/amdgpu_regs
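
A minimal sketch of how the two commands above could be wrapped so that their
output lands in one file that can be attached to the bug. The function name
collect_amdgpu_debug, its two optional arguments and the default output file
name are illustrative assumptions; only the two debugfs paths and the hexdump
invocation come from the request above.

```shell
#!/bin/sh
# Sketch only: collect_amdgpu_debug and its arguments are hypothetical names.
#   $1 = debugfs directory of the card (default: /sys/kernel/debug/dri/0)
#   $2 = output file                   (default: amdgpu-hang.txt)
collect_amdgpu_debug() {
    dir=${1:-/sys/kernel/debug/dri/0}
    out=${2:-amdgpu-hang.txt}
    {
        echo "== amdgpu_fence_info =="
        cat "$dir/amdgpu_fence_info"
        echo "== amdgpu_regs, 4 bytes at offset 0x14fc =="
        # Same invocation as above: skip to 0x14fc, dump 4 bytes.
        hexdump -s 0x14fc -n 4 "$dir/amdgpu_regs"
    } > "$out" 2>&1
}
```

Sourcing this file and running collect_amdgpu_debug as root (for example over
ssh after the lockup) would write amdgpu-hang.txt into the current directory.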




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #11 from Mathias Tillman  ---
(In reply to Alex Deucher from comment #10)
> Created attachment 118056
> possible fix
> 
> I think this patch should fix it.

No luck here I'm afraid - I'm having a hard time reproducing it during normal
desktop usage (with or without the patch), but it did lockup while running
Unigine Valley.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #10 from Alex Deucher  ---
Created attachment 118056
  --> https://bugs.freedesktop.org/attachment.cgi?id=118056&action=edit
possible fix

I think this patch should fix it.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-02 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #9 from Mathias Tillman  ---
(In reply to Andy Furniss from comment #8)
> (In reply to Mathias Tillman from comment #7)
> > Andy: Could you try compiling the latest kernel from drm-next-4.3-wip? I've
> > been running it all day without a single lock up, before it used to lock up
> > several times a day. Just wanted someone to confirm if it is in fact
> > working, or if it's just me.
> 
> I can imagine that it's far better for desktop locks - I moved onto it when
> it got updated.
> 
> Initially testing with Unigine Valley I thought it was going to be good - I
> got further than ever before (about 4x through all the scenes having not got
> through once previously), but it did lock.

That's a shame. I'll try and see if I can find out what has caused the lockups
to stop for me, maybe that could help in finding out what's still causing them
for you.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-01 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #8 from Andy Furniss  ---
(In reply to Mathias Tillman from comment #7)
> Andy: Could you try compiling the latest kernel from drm-next-4.3-wip? I've
> been running it all day without a single lock up, before it used to lock up
> several times a day. Just wanted someone to confirm if it is in fact
> working, or if it's just me.

I can imagine that it's far better for desktop locks - I moved onto it when it
got updated.

Initially testing with Unigine Valley I thought it was going to be good - I got
further than ever before (about 4x through all the scenes having not got
through once previously), but it did lock.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-09-01 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #7 from Mathias Tillman  ---
Andy: Could you try compiling the latest kernel from drm-next-4.3-wip? I've
been running it all day without a single lock up, before it used to lock up
several times a day. Just wanted someone to confirm if it is in fact working,
or if it's just me.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #6 from Christian König  ---
No, the currently released catalyst doesn't use anything from the amdgpu module
yet.

It's clearly not a hardware problem, but invalid render commands can cause the
hardware to lock up.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #5 from Mathias Tillman  ---
(In reply to Christian König from comment #3)
> That could just be a symptom of a hardware hang which isn't detected for
> some reason.
> 
> Please take a look at amdgpu_fence_info as well to see if there are any
> outstanding submissions.

If it's a hardware hang, wouldn't it also happen when using catalyst? It
doesn't happen there, so it should at least be possible to work around (if it
is a hardware problem).
I will continue investigating why this happens, but it does seem to me like
this, #91278, and #91676 all are caused by the same thing, but with different
log output depending on if you use drm-next-4.3 or drm-next-4.2.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #4 from Andy Furniss  ---
(In reply to Christian König from comment #3)
> That could just be a symptom of a hardware hang which isn't detected for
> some reason.

There's this - drm/amdgpu: disable GPU reset by default

http://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.3&id=a895c222e7ab5f50ec10e209cd4548ecd5dd9443




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #3 from Christian König  ---
That could just be a symptom of a hardware hang which isn't detected for some
reason.

Please take a look at amdgpu_fence_info as well to see if there are any
outstanding submissions.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #2 from Mathias Tillman  ---
Created attachment 117967
  --> https://bugs.freedesktop.org/attachment.cgi?id=117967&action=edit
dmesg with added debug output

I've done some more testing, and it turns out that it never reaches
amdgpu_ring_unlock_commit in certain cases, and that's what causes it to hang,
since the mutex is never unlocked.
I added some debug output to the code, gfx/sdma0 is ring->name, 0/9 is
ring->idx and the address is the address of the ring struct.
As you can see in the log, it calls amdgpu_ring_lock on ring 9 with name sdma0,
and then afterwards it calls it again on ring 0 with name gfx, without calling
amdgpu_ring_unlock_commit.
I will add some more debug output in hopes of finding why exactly it's never
unlocked, and if it is fixable. I should mention that these random lockups do
not happen while using the proprietary catalyst driver, so it must be something
in the amdgpu driver.
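
The pattern described above (a second amdgpu_ring_lock call with no unlock in
between) could be searched for in a log with something like the sketch below.
It assumes, hypothetically, that every debug line contains the name of the
function that printed it; the real format of the attached dmesg may differ,
and find_unbalanced_ring_locks is an illustrative name, not an existing tool.

```shell
#!/bin/sh
# Sketch only: flag an amdgpu_ring_lock line that appears while an earlier
# amdgpu_ring_lock has not yet been followed by any amdgpu_ring_unlock* line.
# Assumption: each log line contains the name of the function that printed it.
find_unbalanced_ring_locks() {
    awk '
        # Any unlock variant releases the lock seen so far.
        /amdgpu_ring_unlock/ { held = ""; next }
        /amdgpu_ring_lock/ {
            if (held != "")
                printf "line %d: lock taken while line %d still holds it\n", NR, held
            held = NR   # remember where the lock was taken
        }
    ' "$@"
}
```

It reads the files given as arguments, or standard input when none are given,
so `dmesg | find_unbalanced_ring_locks` would be one way to use it.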




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

--- Comment #1 from Andy Furniss  ---
Created attachment 117963
  --> https://bugs.freedesktop.org/attachment.cgi?id=117963&action=edit
mplayer X hung task

I got a similar trace yesterday on current agd5f drm-next-4.3 while trying to
kill uvd with mplayer by repeatedly starting it.

I am slightly hopeful this is a different issue from uvd as it starts with X
and I got way more starts than I recently have - 360 to get this trace after a
couple of OK 250 runs.

I haven't locked in normal use, but then my desktop setup is simple = fluxbox.




[Bug 91790] TONGA hang in amdgpu_ring_lock

2015-08-28 Thread bugzilla-dae...@freedesktop.org
https://bugs.freedesktop.org/show_bug.cgi?id=91790

    Bug ID: 91790
   Summary: TONGA hang in amdgpu_ring_lock
   Product: DRI
   Version: XOrg git
  Hardware: Other
        OS: All
    Status: NEW
  Severity: normal
  Priority: medium
 Component: DRM/AMDgpu
  Assignee: dri-devel at lists.freedesktop.org
  Reporter: master.homer at gmail.com

Created attachment 117962
  --> https://bugs.freedesktop.org/attachment.cgi?id=117962&action=edit
dmesg of hang

I've been getting random hangs in amdgpu_ring_lock. This causes X to hang,
meaning I can't use the computer at all. I can sometimes switch to a tty, but
this doesn't always work either.

I'm running Ubuntu 15.04 with mesa and libdrm from the oibaf ppa, with a
self-compiled xf86-video-amdgpu and a self-compiled kernel from agd5f,
drm-next-4.3-wip (9066b0c318589f47b754a3def4fe8ec4688dc21a).

I haven't been able to predict when the hang will happen: sometimes I can use
it for several hours before it hangs, other times it happens just a few minutes
after booting.
