[Bug 204609] amdgpu: powerplay failed send message

2020-08-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

--- Comment #10 from w...@hllmnn.de ---
Created attachment 290911
  --> https://bugzilla.kernel.org/attachment.cgi?id=290911&action=edit
journalctl output

Adding journalctl output

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[Bug 204609] amdgpu: powerplay failed send message

2020-08-15 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

w...@hllmnn.de changed:

   What|Removed |Added

 CC||w...@hllmnn.de

--- Comment #9 from w...@hllmnn.de ---
I can reproduce this bug with kernel 5.7.0 from Debian bullseye (5.7.0-2-amd64)
when using a dual-monitor setup. If I boot with only one monitor connected and
connect the second one later, everything seems to work fine. Both monitors are
connected via DisplayPort; connecting one of them via HDMI instead does not
change anything.

The significant errors in dmesg read:

Aug 15 17:15:03 dino kernel: failed send message: TransferTableSmu2Dram (18) param: 0x0006 response 0xffc2
Aug 15 17:15:03 dino kernel: Failed to export SMU metrics table!
Aug 15 17:15:05 dino kernel: Msg issuing pre-check failed and SMU may be not in the right state!
Aug 15 17:15:05 dino kernel: [drm:amdgpu_dpm_enable_uvd [amdgpu]] *ERROR* Dpm enable uvd failed, ret = -62.
Aug 15 17:15:06 dino kernel: amdgpu :09:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc0 (-110).
Aug 15 17:15:07 dino kernel: amdgpu :09:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on vcn_enc1 (-110).
Aug 15 17:15:08 dino kernel: Msg issuing pre-check failed and SMU may be not in the right state!
Aug 15 17:15:08 dino kernel: Failed to export SMU metrics table!
Aug 15 17:15:11 dino kernel: Msg issuing pre-check failed and SMU may be not in the right state!
Aug 15 17:15:11 dino kernel: [drm:jpeg_v2_0_set_powergating_state [amdgpu]] *ERROR* Dpm enable jpeg failed, ret = -62.
Aug 15 17:15:11 dino kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
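The negative return codes in these messages are ordinary Linux errno values. A minimal sketch for decoding them, assuming the generic Linux numbering from `asm-generic/errno-base.h` and `asm-generic/errno.h` (hard-coded so the result does not depend on the host OS); `decode_kernel_ret` is a hypothetical helper, not part of any tool:

```python
# Hypothetical helper: map negative kernel return codes seen in the log to
# their errno names. Numbers follow the generic Linux errno numbering
# (asm-generic/errno-base.h and asm-generic/errno.h); other OSes differ.
LINUX_ERRNO = {
    5: ("EIO", "I/O error"),
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
    110: ("ETIMEDOUT", "Connection timed out"),
}


def decode_kernel_ret(ret: int) -> str:
    """Return 'NAME (description)' for a negative kernel return code."""
    if ret >= 0:
        return "not an error"
    name, desc = LINUX_ERRNO.get(-ret, ("unknown", "not in this table"))
    return f"{name} ({desc})"


for ret in (-62, -110):
    print(ret, decode_kernel_ret(ret))
```

On a Linux host, Python's `errno.errorcode` and `os.strerror()` give the same names for these numbers; the table is hard-coded only to keep the sketch portable.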

I am using the firmware files from this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=49e9ea898f870ae09b91ccd3dd1c45d520fcb0c3
(commit date: 2020-08-07)


[Bug 204609] amdgpu: powerplay failed send message

2020-07-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

--- Comment #8 from Mikhail Tuchkov (tuchkov.mikh...@gmail.com) ---
Created attachment 290075
  --> https://bugzilla.kernel.org/attachment.cgi?id=290075&action=edit
dmesg


[Bug 204609] amdgpu: powerplay failed send message

2020-07-03 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Mikhail Tuchkov (tuchkov.mikh...@gmail.com) changed:

   What|Removed |Added

 CC||tuchkov.mikh...@gmail.com

--- Comment #7 from Mikhail Tuchkov (tuchkov.mikh...@gmail.com) ---
Same hardware (Sapphire RX 5700 and X570) and similar issues with the GPU.
Attaching dmesg output.


[Bug 204609] amdgpu: powerplay failed send message

2020-04-10 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Janpieter Sollie (janpieter.sol...@edpnet.be) changed:

   What|Removed |Added

 CC||janpieter.sol...@edpnet.be

--- Comment #6 from Janpieter Sollie (janpieter.sol...@edpnet.be) ---
I hope this is the right place to add some comments:
Using 5.5.14, when the GPU runs at full speed, it seems to overheat after some
time.
I'd expect powerplay to handle this critical temperature and throttle the GPU.
Instead, the GPU overheats and a system reset is necessary.
The log is flooded with:
Apr 11 06:45:54 linuxserver kernel: amdgpu :05:00.0: GPU reset begin!
Apr 11 06:45:54 linuxserver kernel: [drm] Timeout wait for RLC serdes 0,0
Apr 11 06:45:56 linuxserver last message repeated 4 times
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 133 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 5e ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 145 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 146 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 148 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 145 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 146 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 16a ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 186 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 54 ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  last message was failed ret is 65535
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay] 
Apr 11 06:45:57 linuxserver kernel:  failed to send message 13d ret is 65535 
Apr 11 06:45:57 linuxserver kernel: amdgpu: [powerplay]
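When the log is flooded like this, it can help to tally which message IDs failed. A minimal sketch, assuming the two-line `failed to send message <hex id> ret is <n>` format shown above; `count_failed_messages` is a hypothetical helper, and the IDs are only counted, not mapped to SMU message names:

```python
import re
from collections import Counter

# Assumes the format above, e.g.
# "kernel:  failed to send message 5e ret is 65535" (message ID in hex).
FAILED_RE = re.compile(r"failed to send message ([0-9a-fA-F]+) ret is \d+")


def count_failed_messages(log_text: str) -> Counter:
    """Count how often each (hex) message ID failed to send."""
    counts = Counter()
    for match in FAILED_RE.finditer(log_text):
        counts[match.group(1).lower()] += 1
    return counts


sample = """\
Apr 11 06:45:57 linuxserver kernel:  failed to send message 145 ret is 65535
Apr 11 06:45:57 linuxserver kernel:  failed to send message 146 ret is 65535
Apr 11 06:45:57 linuxserver kernel:  failed to send message 145 ret is 65535
"""
print(count_failed_messages(sample).most_common())  # [('145', 2), ('146', 1)]
```

The input could be, for example, the output of `journalctl -k`. Note that 65535 is simply 0xffff; the sketch does not try to interpret it.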

Then after some time, the kernel realizes it did something wrong, but it seems
too late:

[  509.939497] amdgpu: [powerplay] Failed to force to switch arbf0!
[  509.939497] amdgpu: [powerplay] [disable_dpm_tasks] Failed to disable DPM!
[  509.939513] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block  failed -22
[  510.203234] amdgpu :05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[  510.203254] [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  510.313006] AMD-Vi: Completion-Wait loop timed out
[  510.730195] cp is busy, skip halt cp
[  510.744990] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=05:00.0 address=0x5f38cdcf0]
[  510.993797] rlc is busy, skip halt rlc
[  511.312957] AMD-Vi: Completion-Wait loop timed out
[  511.522695] AMD-Vi: Completion-Wait loop timed out
[  511.742668] AMD-Vi: Completion-Wait loop timed out
[  511.746867] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=05:00.0 address=0x5f38cdd20]
[  512.748750] iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=05:00

[Bug 204609] amdgpu: powerplay failed send message

2019-12-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Błażej Szczygieł (spa...@wp.pl) changed:

   What|Removed |Added

 CC||spa...@wp.pl

--- Comment #5 from Błażej Szczygieł (spa...@wp.pl) ---
Created attachment 286329
  --> https://bugzilla.kernel.org/attachment.cgi?id=286329&action=edit
kernel5.4 powerplay add mutex

Also the same as: https://gitlab.freedesktop.org/drm/amd/issues/900

In kernel 5.4.3, run `watch -n 0.1 cat
/sys/class/drm/card0/device/hwmon/hwmon*/fan1_input` several times in parallel
and you'll get a lot of warnings and finally a system freeze.
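The same parallel-read pattern can be reproduced from a single script instead of several `watch` terminals. A minimal sketch, assuming the hwmon path from the comment above (the `hwmon*` wildcard is resolved with `glob`); `hammer_sensor` is a hypothetical helper, and on an affected kernel it may freeze the machine, so it is only meant for a test system:

```python
import glob
import threading
import time


def hammer_sensor(path: str, readers: int = 4, interval: float = 0.1,
                  duration: float = 5.0) -> list:
    """Read `path` from several threads at once, like parallel `watch` runs.

    Hypothetical helper. Returns every OSError the readers hit (for example
    EIO, which `sensors` reports as "I/O error").
    """
    errors = []
    errors_lock = threading.Lock()  # protects the `errors` list only
    deadline = time.monotonic() + duration

    def reader():
        while time.monotonic() < deadline:
            try:
                with open(path) as f:
                    f.read()
            except OSError as exc:
                with errors_lock:
                    errors.append(exc)
            time.sleep(interval)

    threads = [threading.Thread(target=reader) for _ in range(readers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


if __name__ == "__main__":
    # Assumed location, taken from the watch command above.
    paths = glob.glob("/sys/class/drm/card0/device/hwmon/hwmon*/fan1_input")
    if paths:
        print(hammer_sensor(paths[0]))
```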

In kernel 5.5-rc2, powerplay warnings and eventually a system freeze can occur
when sensors are read and the screen mode is changed (e.g. a screen resolution
change).

In my opinion, many powerplay issues are caused by missing mutexes between
kernel and user space accesses. I prepared a very simple kernel 5.4 patch which
lets me use the fan monitor, but there are probably still some missing mutexes.
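The missing-mutex argument can be illustrated with a toy model of a shared message/response register: if two callers interleave between writing a message and reading its response, one of them reads the other's answer. This is a hypothetical Python simulation of the general idea only, not the attached patch and not the driver's SMU code; `ToyMailbox` and `count_mismatches` are invented for illustration:

```python
import threading
import time


class ToyMailbox:
    """Hypothetical model of a shared message/response register pair.

    The "firmware" just echoes the last message written, so a caller that
    reads back a different value has observed a race with another caller.
    """

    def __init__(self, use_mutex: bool):
        self.use_mutex = use_mutex
        self.mutex = threading.Lock()
        self.register = None

    def _send_unlocked(self, msg: int) -> int:
        self.register = msg   # write the message
        time.sleep(0)         # yield, inviting another thread to interleave
        return self.register  # read the "response"

    def send(self, msg: int) -> int:
        if self.use_mutex:
            # Write and read-back form one critical section.
            with self.mutex:
                return self._send_unlocked(msg)
        return self._send_unlocked(msg)


def count_mismatches(use_mutex: bool, threads: int = 4, rounds: int = 2000) -> int:
    """Return how many responses did not match the message that was sent."""
    box = ToyMailbox(use_mutex)
    mismatches = 0
    count_lock = threading.Lock()

    def worker(my_id: int):
        nonlocal mismatches
        for _ in range(rounds):
            if box.send(my_id) != my_id:
                with count_lock:
                    mismatches += 1

    workers = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return mismatches


print("without mutex:", count_mismatches(False))
print("with mutex:   ", count_mismatches(True))
```

With the mutex the mismatch count is always zero; without it the count varies from run to run (and may occasionally be zero), which is typical of the intermittent failures described in this thread.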


[Bug 204609] amdgpu: powerplay failed send message

2019-12-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Jon (j...@moozaad.co.uk) changed:

   What|Removed |Added

 CC||j...@moozaad.co.uk

--- Comment #4 from Jon (j...@moozaad.co.uk) ---
Likely the same as this... 

https://gitlab.freedesktop.org/drm/amd/issues/929

Multiple DisplayPort outputs. If you also have flickering (not sure if it's all
related), there's a DPM override you can try:

`echo "low" > /sys/class/drm/card0/device/power_dpm_force_performance_level`
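A minimal sketch of the same override with validation and read-back, assuming the level names listed in the kernel's amdgpu documentation for `power_dpm_force_performance_level` (newer kernels may accept more); writing requires root, and `auto` restores the default behaviour. `set_dpm_level` is a hypothetical helper:

```python
from pathlib import Path

# Level names as listed in the amdgpu documentation (assumed; newer kernels
# may accept additional values). "auto" is the default.
DPM_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk", "profile_min_mclk", "profile_peak",
}


def set_dpm_level(level: str,
                  path: str = "/sys/class/drm/card0/device/"
                              "power_dpm_force_performance_level") -> str:
    """Write `level` to the sysfs node and return the value read back."""
    if level not in DPM_LEVELS:
        raise ValueError(f"unknown DPM level: {level!r}")
    node = Path(path)
    node.write_text(level + "\n")  # same effect as: echo "low" > ...
    return node.read_text().strip()
```

Calling `set_dpm_level("auto")` undoes the override without a reboot.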


[Bug 204609] amdgpu: powerplay failed send message

2019-11-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

--- Comment #3 from Guido Winkelmann (guido-kern-b...@unknownsite.de) ---
Created attachment 285867
  --> https://bugzilla.kernel.org/attachment.cgi?id=285867&action=edit
dmesg output from another system


[Bug 204609] amdgpu: powerplay failed send message

2019-11-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Guido Winkelmann (guido-kern-b...@unknownsite.de) changed:

   What|Removed |Added

 CC||guido-kern-bugs@unknownsite.de

--- Comment #2 from Guido Winkelmann (guido-kern-b...@unknownsite.de) ---
I have very similar problems with kernel 5.4-rc7. In my case it's a Sapphire
5700XT, and I am using Gentoo instead of Ubuntu; however, the problems start
well before any userspace code is loaded.

In my case, the problems cause long delays during system boot: once before init
is loaded and once while X is loading. As a very rough estimate, I think the
delays add up to about 2 minutes. Once X is loaded, I no longer see any delays.

All sensors on the board are completely non-functional. Here is a sample output
from "sensors":

==
amdgpu-pci-0a00
Adapter: PCI adapter
vddgfx:   +0.75 V  
ERROR: Can't get value of subfeature fan1_input: I/O error
fan1: N/A  (min =0 RPM, max = 3200 RPM)
ERROR: Can't get value of subfeature temp1_input: I/O error
edge: N/A  (crit = +118.0°C, hyst = -273.1°C)
   (emerg = +99.0°C)
ERROR: Can't get value of subfeature temp2_input: I/O error
junction: N/A  (crit = +99.0°C, hyst = -273.1°C)
   (emerg = +99.0°C)
ERROR: Can't get value of subfeature temp3_input: I/O error
mem:  N/A  (crit = +99.0°C, hyst = -273.1°C)
   (emerg = +99.0°C)
ERROR: Can't get value of subfeature power1_average: I/O error
power1:   N/A  (cap = 195.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tdie: +31.5°C  (high = +70.0°C)
Tctl: +41.5°C
==
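To see which attributes fail without going through `sensors`, the hwmon files can be read directly. A minimal sketch, assuming the standard hwmon sysfs units (temperatures in millidegrees Celsius, voltages in millivolts, fan speeds in RPM, power in microwatts) and the card0 hwmon path used elsewhere in this thread; `read_hwmon_inputs` is a hypothetical helper:

```python
import glob
import os


def read_hwmon_inputs(hwmon_dir: str) -> dict:
    """Map each *_input / *_average attribute to its raw value or error.

    Hypothetical helper. Values are raw integers in hwmon units; unreadable
    attributes map to the OSError's message (e.g. "Input/output error").
    """
    paths = []
    for pattern in ("*_input", "*_average"):
        paths.extend(glob.glob(os.path.join(hwmon_dir, pattern)))
    results = {}
    for path in sorted(paths):
        name = os.path.basename(path)
        try:
            with open(path) as f:
                results[name] = int(f.read().strip())
        except OSError as exc:
            results[name] = exc.strerror
    return results


if __name__ == "__main__":
    # Assumed location of the amdgpu hwmon directory for card0.
    for d in glob.glob("/sys/class/drm/card0/device/hwmon/hwmon*"):
        for attr, value in read_hwmon_inputs(d).items():
            print(attr, value)
```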

I did not experience any of these problems with 5.3.10.

I have three monitors connected to this board, all of them via DisplayPort.

I am attaching another dmesg log.


[Bug 204609] amdgpu: powerplay failed send message

2019-10-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204609

Witold Baryluk (bary...@smp.if.uj.edu.pl) changed:

   What|Removed |Added

 CC||bary...@smp.if.uj.edu.pl

--- Comment #1 from Witold Baryluk (bary...@smp.if.uj.edu.pl) ---
Do you use two displays by any chance?
