[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2018-03-15 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

Shih-Yuan Lee  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #46 from Shih-Yuan Lee  ---
The Linux kernel used in comment 45 was 4.15.0-10.11 from Ubuntu 18.04.
With the later version 4.15.0-12.13, I can no longer reproduce this issue on
Ubuntu 18.04.
4.15.0-12.13 contains the following commit.

commit 239b5f64e12b1f09f506c164dff0374924782979
Author: Alex Deucher 
Date:   Tue Nov 21 12:09:38 2017 -0500

drm/radeon: Add dpm quirk for Jet PRO (v2)

Fixes stability issues.

v2: clamp sclk to 600 Mhz

Bug: https://bugs.freedesktop.org/show_bug.cgi?id=103370
Acked-by: Christian König 
Signed-off-by: Alex Deucher 
Cc: sta...@vger.kernel.org

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index ee3e742..97a0a63 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -2984,6 +2984,11 @@ static void si_apply_state_adjust_rules(struct radeon_device *rdev,
 		    (rdev->pdev->device == 0x6667)) {
 			max_sclk = 75000;
 		}
+		if ((rdev->pdev->revision == 0xC3) ||
+		    (rdev->pdev->device == 0x6665)) {
+			max_sclk = 60000;
+			max_mclk = 80000;
+		}
 	} else if (rdev->family == CHIP_OLAND) {
 		if ((rdev->pdev->revision == 0xC7) ||
 		    (rdev->pdev->revision == 0x80) ||
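
The limits in this hunk appear to be expressed in units of 10 kHz: the commit message's 600 MHz clamp corresponds to a value of 60000, on the same scale as the existing 75000 (750 MHz) limit. A minimal shell sketch of that arithmetic follows. The function names clk_to_mhz and clamp_clk are made up for illustration and are not part of the driver.

```shell
# Convert a clock value given in 10 kHz units to MHz (integer division).
clk_to_mhz() {
    echo $(( $1 / 100 ))
}

# Clamp VALUE ($1) to at most LIMIT ($2); both must use the same units.
clamp_clk() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

For example, `clk_to_mhz "$(clamp_clk 75000 60000)"` shows what a 750 MHz request becomes once the quirk limit is applied.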

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2018-03-14 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #45 from Shih-Yuan Lee  ---
I can still reproduce this issue on Ubuntu 18.04 by running
`seq 100 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n2; done`.
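
A hedged variant of this reproduction loop that stops at the first failed or stuck run is sketched below. It assumes GNU coreutils `timeout` is installed. The function name repro_loop and the 30-second limit in the usage line are illustrative choices, not something taken from this report. If the lockup takes down the whole system, no timeout will help.

```shell
# Run a command repeatedly and stop at the first failure or timeout.
# Usage: repro_loop COUNT TIMEOUT_SECONDS COMMAND [ARGS...]
repro_loop() {
    count=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "Loop $i"
        # timeout(1) returns a non-zero status if the command fails
        # or is still running after $limit seconds.
        if ! timeout "$limit" "$@"; then
            echo "failed or timed out at loop $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count runs passed"
}
```

Usage, wrapping the pipeline in `sh -c` so that `head` still ends each glxgears run:
`repro_loop 100 30 sh -c 'DRI_PRIME=1 glxgears -info | head -n 2'`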



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2018-01-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #44 from Shih-Yuan Lee  ---
I tried max_sclk = 5 and max_mclk = 6 on Ubuntu-4.4.0-112.135, but I
can still reproduce the GPU lock up issue.
The first run of `seq 100 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n 3; done`
passed, but a second run of the same command failed.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2018-01-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #43 from Shih-Yuan Lee  ---
I can still reproduce the issue after setting max_sclk to 60000 and max_mclk
to 80000.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-24 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #42 from Michel Dänzer  ---
(In reply to Robert Liu from comment #41)
> Another issue found is when removing the adapter, the system goes to
> suspend.

That's not directly related to graphics drivers.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-23 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #41 from Robert Liu  ---
So far, with max_sclk set to 60000 and max_mclk set to 80000, the system has
passed a 24-hour burn-in test (vblank_mode=0 DRI_PRIME=1 glmark2 --run-forever).

Another issue found is when removing the adapter, the system goes to suspend.
After I wake it up, it continues running the benchmark.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #40 from Shih-Yuan Lee  ---
Created attachment 135662
  --> https://bugs.freedesktop.org/attachment.cgi?id=135662&action=edit
dmesg

(In reply to Alex Deucher from comment #38)
> Created attachment 135647 [details] [review]
> workaround for radeon
> 
> workarounds for radeon and amdgpu to fix the issue.

I applied this patch on top of the Ubuntu-4.4.0-101.124 Linux kernel. It seemed
to fix the issue at first, but a problem showed up later on.

$ seq 20 | while read i; do echo Loop $i; DRI_PRIME=1 glxgears -info|head -n 5; done
Loop 1
radeon: Failed to allocate virtual address for buffer:
radeon:size  : 65536 bytes
radeon:alignment : 4096 bytes
radeon:domains   : 4
radeon:va: 0x0080
radeon: Failed to deallocate virtual address for buffer:
radeon:size  : 65536 bytes
radeon:va: 0x80
radeon: Failed to allocate virtual address for buffer:
radeon:size  : 65536 bytes
radeon:alignment : 4096 bytes
radeon:domains   : 4
radeon:va: 0x0080
radeon: Failed to deallocate virtual address for buffer:
radeon:size  : 65536 bytes
radeon:va: 0x80
radeonsi: Failed to create a context.
Loop 2
...



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #39 from Alex Deucher  ---
Created attachment 135648
  --> https://bugs.freedesktop.org/attachment.cgi?id=135648&action=edit
workaround for amdgpu



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #38 from Alex Deucher  ---
Created attachment 135647
  --> https://bugs.freedesktop.org/attachment.cgi?id=135647&action=edit
workaround for radeon

workarounds for radeon and amdgpu to fix the issue.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #37 from Robert Liu  ---
(In reply to Robert Liu from comment #36)
> BTW, with 4.13.0-16-generic, I changed max_sclk in drm/radeon/si_dpm.c
> (as we did with Ubuntu kernel 4.4.0-101-generic) from 75000 to 65000, but
> still hit the hang.
With max_sclk restricted to 65000 and max_mclk to 80000, neither radeon nor
amdgpu shows the issue.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-21 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #36 from Robert Liu  ---
(In reply to Alex Deucher from comment #33)
> I think Sonny fixed this.  It was due to using the wrong firmware.
> [1.827060] [drm] initializing kernel modesetting (HAINAN 0x1002:0x6665
> 0x1028:0x0844 0xC3).  This chip should be using radeon/banks_k_2_smc.bin smc
> firmware.  Is that available on the test system and kernel?
The firmware radeon/banks_k_2_smc.bin is on the test system.
With Ubuntu kernel 4.4.0-101-generic, I am not sure whether the radeon driver is
using this firmware.
With Ubuntu kernel 4.13.0-16-generic, I tried both the amdgpu and radeon drivers,
but the system hung with each. As soon as the system hangs, amdgpu_pm_info shows
'invalid dpm profile 15'.
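
For anyone repeating this check, a small sketch for dumping the dpm state files is shown below. It assumes debugfs is mounted at /sys/kernel/debug and that the driver exposes amdgpu_pm_info or radeon_pm_info under a dri/<N>/ directory. Reading these files normally requires root. The function name show_pm_info is made up for illustration.

```shell
# Print every dpm state file found under a debugfs DRI directory.
# $1 is the DRI root; it defaults to the usual debugfs location.
# Returns 0 if at least one file was printed, 1 otherwise.
show_pm_info() {
    dri_root=${1:-/sys/kernel/debug/dri}
    found=1
    for f in "$dri_root"/*/amdgpu_pm_info "$dri_root"/*/radeon_pm_info; do
        # Skip unmatched glob patterns and unreadable files.
        [ -r "$f" ] || continue
        echo "== $f"
        cat "$f"
        found=0
    done
    return "$found"
}
```

Running `sudo sh -c '. ./pm_info.sh; show_pm_info'` right after a hang (where pm_info.sh is a hypothetical file holding the function) would capture the state reported above, provided the system is still responsive.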

(In reply to Alex Deucher from comment #34)
> The following commits are relevant:
> abb2e3c1ce64c8bba678973800c34ea1dc97c42c
> 6458bd4dfd9414cba5804eb9907fe2a824278c34
> ef736d394e85b1bf1fd65ba5e5257b85f6c82325
> 4e6e98b1e48c9474aed7ce03025ec319b941e26e
These commits should already be included in Ubuntu kernel 4.13.0-16-generic.

(In reply to Alex Deucher from comment #35)
> Does reverting a628392cf03e0eef21b345afbb192cbade041741 fix the issue?
Removing this commit does not fix the issue.


BTW, with 4.13.0-16-generic, I changed max_sclk in drm/radeon/si_dpm.c (as
we did with Ubuntu kernel 4.4.0-101-generic) from 75000 to 65000, but still hit
the hang.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #35 from Alex Deucher  ---
Does reverting a628392cf03e0eef21b345afbb192cbade041741 fix the issue?



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #34 from Alex Deucher  ---
The following commits are relevant:
abb2e3c1ce64c8bba678973800c34ea1dc97c42c
6458bd4dfd9414cba5804eb9907fe2a824278c34
ef736d394e85b1bf1fd65ba5e5257b85f6c82325
4e6e98b1e48c9474aed7ce03025ec319b941e26e



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #33 from Alex Deucher  ---
(In reply to Michel Dänzer from comment #27)
> Thanks for bisecting, but I don't think that commit can be directly
> responsible for a GPU hang. Before that commit, the DRI3 code in Mesa would
> only use one back buffer for glxgears, which means that the GPU could only
> start rendering a new frame after the previous one had finished presenting.
> Maybe that somehow prevented the hang.

That commit "fixed" a performance regression at the time because it ended up
causing enough of a delay that the clocks didn't ramp up.  So it probably
exposed a kernel dpm issue.  Without it, the clocks never ramped up enough to
cause an issue.  With it, they did.


(In reply to Timo Aaltonen from comment #32)
> forwarding a comment from an engineer:
> 
> "While reading the source code of the radeon module, I found a bug [1]
> related to dpm and clocks. So I decided to do some experiments.
> I tried different max_sclk and max_mclk values to see whether the issue goes away.
> 1. max_sclk: 7, max_mclk: 75000 --> have the same issue
> 2. max_sclk: 5, max_mclk: 6 --> pass multi-run test (more than 50
> runs)
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=76490
> "

I think Sonny fixed this.  It was due to using the wrong firmware.
[1.827060] [drm] initializing kernel modesetting (HAINAN 0x1002:0x6665
0x1028:0x0844 0xC3).  This chip should be using radeon/banks_k_2_smc.bin smc
firmware.  Is that available on the test system and kernel?
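
A minimal sketch for answering the "is it available" part on a test system is shown below. It assumes firmware is installed uncompressed under /lib/firmware. The function name has_firmware is made up for illustration. It only checks that the file exists, not which firmware the driver actually loaded.

```shell
# Report whether a firmware file is present under a firmware directory.
# $1 is the path relative to the firmware directory,
# $2 is the firmware directory (defaults to /lib/firmware).
has_firmware() {
    fw_name=$1
    fw_dir=${2:-/lib/firmware}
    if [ -e "$fw_dir/$fw_name" ]; then
        return 0
    fi
    return 1
}
```

Usage: `has_firmware radeon/banks_k_2_smc.bin && echo present`. Separately, `modinfo -F firmware radeon | grep banks_k_2` shows whether the installed radeon module declares that file.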



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-19 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #32 from Timo Aaltonen  ---
forwarding a comment from an engineer:

"While reading the source code of the radeon module, I found a bug [1]
related to dpm and clocks. So I decided to do some experiments.
I tried different max_sclk and max_mclk values to see whether the issue goes away.
1. max_sclk: 7, max_mclk: 75000 --> have the same issue
2. max_sclk: 5, max_mclk: 6 --> pass multi-run test (more than 50 runs)

[1] https://bugs.freedesktop.org/show_bug.cgi?id=76490
"



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #31 from Michel Dänzer  ---
With vblank_mode=0, the only thing that can prevent tearing is luck.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #30 from Shih-Yuan Lee  ---
Tearing does not happen on battery power; it only happens when the machine is
plugged into AC power.
Is this behavior also expected?



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

--- Comment #29 from Michel Dänzer  ---
Tearing is expected with vblank_mode=0.



[Bug 103370] `vblank_mode=0 DRI_PRIME=1 glxgears` will introduce GPU lock up on Intel Graphics [8086:5917] + AMD Graphics [1002:6665] (rev c3)

2017-11-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=103370

Shih-Yuan Lee  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|`DRI_PRIME=1 glxgears       |`vblank_mode=0 DRI_PRIME=1
                   |-info` halts the system     |glxgears` will introduce
                   |with Intel Graphics         |GPU lock up on Intel
                   |[8086:5917] + AMD Graphics  |Graphics [8086:5917] + AMD
                   |[1002:6665] (rev c3)        |Graphics [1002:6665] (rev
                   |                            |c3)
