Public bug reported:

Previously reported as a kernel bug
(https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe-5.8/+bug/1910562);
now that I have found the root cause, I am reporting it here as well.

After updating via apt dist-upgrade from kernel 5.4.0-59 to kernel
5.8.0-34 the fan on my machine started switching on (for an instant) and
off every 10 seconds even when idle with CPU at 48/50°C.

Switching back to the previous kernel temporarily solves the problem,
i.e. the fans stay off during light desktop work.

The new behavior is really annoying and I guess not healthy for the
fans.

I'm on the latest Dell BIOS, with every other package up to date.

So, after testing several different kernels and live distros to pinpoint
this bug, I finally found the problem: it's an interaction between
lm-sensors and the amdgpu driver with kernel > 5.4.0.

I found out by chance because I noticed the problem happened only after
logging in with a graphical session.

This is what is happening:
- a GNOME extension that monitors sensors/temps calls the 'sensors' utility
from the lm-sensors package every 10 seconds
- sensors 'hangs' for a couple of seconds while poking something related to
the amdgpu driver
- the amdgpu driver prints warnings/errors on the VT console and in dmesg
- the fans start spinning for one second
- sensors then continues normally, displaying the readouts from the other
sensors
This is the output of 'sensors', taken in a non-graphical console
(Ctrl+Alt+F3) with kernel 5.8.0-41:

[UNRELATED OUTPUT]

amdgpu-pci-0100
Adapter: PCI adapter
[ 112.780951] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* 
dce110_edp_wait_for_hpd_ready: wait timed out!
[ 113.380939] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* 
dce110_edp_wait_for_hpd_ready: wait timed out!
vddgfx: 1.05 V
edge: +44.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.12 W (cap = 35.00 W)

[UNRELATED OUTPUT]

This is the complete kernel log from amdgpu when this happens:

[ 111.572873] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 112.780951] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* 
dce110_edp_wait_for_hpd_ready: wait timed out!
[ 113.380939] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* 
dce110_edp_wait_for_hpd_ready: wait timed out!
[ 113.411556] [drm] UVD and UVD ENC initialized successfully.
[ 113.521534] [drm] VCE initialized successfully.
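
The timestamps in this excerpt can be used to estimate how long the amdgpu activity lasts. A small Python sketch that parses the lines above (with the mail-wrapped lines rejoined) and reports the elapsed time and the number of *ERROR* entries; the function names are only illustrative:

```python
import re

# Matches the "[ seconds.micros] message" prefix used in the kernel log.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def summarize(entries):
    """Return (span_seconds, error_count) for a parsed log excerpt."""
    span = entries[-1][0] - entries[0][0]
    errors = sum(1 for _, message in entries if "*ERROR*" in message)
    return span, errors


LOG = """\
[ 111.572873] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 112.780951] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 113.380939] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 113.411556] [drm] UVD and UVD ENC initialized successfully.
[ 113.521534] [drm] VCE initialized successfully.
"""

span, errors = summarize(parse_dmesg(LOG))
print(f"{span:.2f} s, {errors} errors")  # prints: 1.95 s, 2 errors
```

The roughly two-second span matches the "couple of seconds" during which sensors appears to hang.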

It seems that lm-sensors poking the amdgpu thermal sensor is triggering
some sort of reset and/or causing the thermal infrastructure to spin up
the fans.
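
One way to test this hypothesis without lm-sensors in the loop would be to read the amdgpu hwmon attribute straight from sysfs and time the read. The Python sketch below assumes the standard hwmon layout (/sys/class/hwmon/hwmonN/name plus temp1_input in millidegrees Celsius), that the driver registers under the name "amdgpu", and that the 'edge' reading corresponds to temp1_input; I have not verified these on the affected machine, and the helper names are hypothetical:

```python
import time
from pathlib import Path


def find_hwmon(chip_name, root="/sys/class/hwmon"):
    """Return the hwmon directories whose 'name' file equals chip_name."""
    matches = []
    for entry in sorted(Path(root).glob("hwmon*")):
        name_file = entry / "name"
        if name_file.is_file() and name_file.read_text().strip() == chip_name:
            matches.append(entry)
    return matches


def timed_read(path):
    """Read one sysfs attribute; return (value_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        value = Path(path).read_text().strip()
    except OSError:
        value = None  # the attribute is missing or the driver refused the read
    return value, time.monotonic() - start


def probe(chip_name="amdgpu", attribute="temp1_input", root="/sys/class/hwmon"):
    """Time a single read of `attribute` on every hwmon device named chip_name."""
    results = []
    for hwmon in find_hwmon(chip_name, root):
        value, elapsed = timed_read(hwmon / attribute)
        results.append((str(hwmon), value, elapsed))
    return results
```

If a plain probe() call is slow and produces the same dmesg errors and fan burst, the trigger is the sysfs read itself rather than anything specific to the 'sensors' utility.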

Note that this does not happen with kernel 5.4, where sensors reports
this:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: N/A
edge: N/A (crit = +94.0°C, hyst = -273.1°C)
power1: N/A (cap = 35.00 W)

[UNRELATED OUTPUT]

Note the missing data about amdgpu and no console kernel warning
messages.
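
The difference between the two outputs can be checked mechanically, which may help when testing other kernels. A rough Python sketch that parses an amdgpu block as printed by 'sensors' and reports whether any reading is live (not N/A); the sample text is copied from the outputs above with the kernel messages removed, and the function names are illustrative only:

```python
import re

# "label: value (optional limits)" -> capture the label and the value before "(".
READING_RE = re.compile(r"^(\w+):\s+([^(]*)")


def parse_chip_block(text):
    """Return {label: value} for each reading line in one chip block."""
    readings = {}
    for line in text.splitlines():
        match = READING_RE.match(line)
        if match and match.group(1) != "Adapter":
            readings[match.group(1)] = match.group(2).strip()
    return readings


def has_live_readings(readings):
    """True if at least one reading is an actual value rather than N/A."""
    return any(value != "N/A" for value in readings.values())


KERNEL_58 = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: 1.05 V
edge: +44.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.12 W (cap = 35.00 W)
"""

KERNEL_54 = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: N/A
edge: N/A (crit = +94.0°C, hyst = -273.1°C)
power1: N/A (cap = 35.00 W)
"""

print(has_live_readings(parse_chip_block(KERNEL_58)))  # True
print(has_live_readings(parse_chip_block(KERNEL_54)))  # False
```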

Disabling the GNOME sensor-monitoring extension works around the problem
for now, but there is definitely something going on here.

Please feel free to ask me for anything I can do/test to help solve this
problem.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: lm-sensors 1:3.6.0-2ubuntu1
ProcVersionSignature: Ubuntu 5.8.0-41.46~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-41-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.16
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Wed Feb  3 14:22:37 2021
InstallationDate: Installed on 2020-05-06 (273 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
SourcePackage: lm-sensors
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: lm-sensors (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 amdgpu apport-bug focal lm-sensors thermal

https://bugs.launchpad.net/bugs/1914414

Title:
  amdgpu driver interaction with lm-sensor causes fans to spin up in
  kernel 5.8
