Re: AMD EPYC throttled to 400 mhz

2022-01-21 Thread Andrei POPESCU
On Mon, 17 Jan 22, 22:08:43, Alexander V. Makartsev wrote:
> On 17.01.2022 18:40, Simon Kainz wrote:
> > 
> > I did not set/change governor/driver settings, this is a stock debian
> > kernel.
> Is the server platform running the latest BIOS and firmware?
> That is the first thing I'd try if I were in your place.
> I always flash the latest available firmware as a pre-sale procedure, or during
> server installation.

Installing amd64-microcode (from non-free) might be a good idea as well.
 
Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser




Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread Simon Kainz


On 17.01.22 at 22:53, Mike Kupfer wrote:
> Simon Kainz wrote:
> 
>> #Governor:
>> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>> schedutil
> 
> Maybe try a different governor?  I had a different problem (CPU running
> too hot) after upgrading to Bullseye, and the problem went away after I
> switched to the ondemand governor.

Thanks for the tip. I have now changed the governor to "performance",
as my hosts are all compute nodes in an HPC system, so there is no real
reason for CPU throttling anyway.
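
For reference, the governor can be switched on every core through sysfs. The following is only a minimal sketch, assuming root privileges and the standard cpufreq sysfs layout; SYSFS_CPU and the function names are hypothetical helpers introduced here, not part of any tool:

```shell
#!/bin/sh
# Sketch: write one governor to every CPU's cpufreq policy.
# SYSFS_CPU is a hypothetical override so the functions can be tried
# against a scratch directory; by default the real sysfs tree is used.
set_governor() {
    governor=$1
    base=${SYSFS_CPU:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue              # glob did not match anything
        echo "$governor" > "$f" || return 1  # needs root on a real system
    done
}

# Count how many CPUs currently use each governor.
show_governors() {
    base=${SYSFS_CPU:-/sys/devices/system/cpu}
    cat "$base"/cpu[0-9]*/cpufreq/scaling_governor | sort | uniq -c
}
```

Calling `set_governor performance` followed by `show_governors` should then list a single governor for all CPUs. A value written this way does not survive a reboot; `cpupower frequency-set -g performance` is an alternative way to make the same change.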

Maybe this helps, we'll see.
Regards,

Simon


> 
> mike
> 



Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread Mike Kupfer
Simon Kainz wrote:

> #Governor:
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> schedutil

Maybe try a different governor?  I had a different problem (CPU running
too hot) after upgrading to Bullseye, and the problem went away after I
switched to the ondemand governor.

mike



Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread didier gaumet
Hello Simon,

Disclaimer: I am nowhere near knowledgeable regarding CPU frequency
scaling.

Perhaps you will find tips in the official Lenovo documentation on
fine-tuning power saving under Linux for ThinkSystem:
https://lenovopress.com/lp0826.pdf
It seems to require setting up the UEFI accordingly in order to use
acpi-cpufreq.

There is also a recent amd-pstate driver, similar to intel-pstate, which
should provide the best results:
https://www.phoronix.com/scan.php?page=news_item&px=AMD-PSTATE-2021




Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread Alexander V. Makartsev

On 17.01.2022 18:40, Simon Kainz wrote:
> On 17.01.22 at 11:36, Alexander V. Makartsev wrote:
>> On 17.01.2022 14:41, Simon Kainz wrote:
>>> Hello,
>>>
>>> we are experiencing spontaneous CPU speed throttling.
>>>
>>> The system is a Lenovo ThinkSystem SR645 with 2
>>> AMD EPYC 7452 32-Core Processors, running
>>>
>>> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
>>> GNU/Linux
>>>
>>> After some time (hours, days, even weeks) the system suddenly gets
>>> throttled to 400 MHz (see below).
>>>
>>> The HW vendor replies that Debian is not on the "supported OS" list, so we
>>> are currently fighting this on our own.
>>>
>>> Does anyone else experience the same or a similar issue? It seems to me like
>>> some kind of thermal throttling, but the kernel does not log throttling
>>> events. Maybe there is some Debian-specific kernel setting that influences
>>> CPU throttling.
>>
>> Are you sure it is not due to a "power save" feature for a system under
>> low load?
>
> Good point, but no, because the system is under heavy load all the time,
> not idling.
> After throttling down to 400 MHz, the system also stays at this speed. Only
> a system reboot mitigates the issue.
>
>> Which CPU driver and governor are currently in use?
>> https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html
>
> #CPU driver:
>
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
> acpi-cpufreq
>
> #Governor:
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> schedutil
>
> I did not set or change the governor/driver settings; this is a stock Debian
> kernel.

Is the server platform running the latest BIOS and firmware?
That is the first thing I'd try if I were in your place.
I always flash the latest available firmware as a pre-sale procedure, or
during server installation.


I've also found this bug report¹. It could be the same issue with the
scaling driver, which was fixed in kernel 5.11.

Debian stable runs version 5.10.84, so test the system with a newer kernel.


¹ https://bugzilla.kernel.org/show_bug.cgi?id=211305
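
To see at a glance whether a node already runs a kernel at or above the version named above, something like the following could be used. It is only a sketch: version_at_least is a hypothetical helper name, and it relies on the version ordering of GNU `sort -V`:

```shell
#!/bin/sh
# Sketch: succeed when version $1 is at least version $2.
# Relies on GNU sort -V (version sort); the smaller of the two versions
# ends up on the first line of the sorted output.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example use on a live system (not executed here):
#   if version_at_least "$(uname -r)" 5.11; then
#       echo "kernel is 5.11 or newer"
#   else
#       echo "kernel is older than 5.11"
#   fi
```

On Debian stable a newer kernel is usually available from backports, for example with `apt install -t bullseye-backports linux-image-amd64`, assuming bullseye-backports is already enabled in the APT sources.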

--
With kindest regards, Alexander.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀https://www.debian.org
⠈⠳⣄


Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread Simon Kainz



On 17.01.22 at 11:36, Alexander V. Makartsev wrote:
> On 17.01.2022 14:41, Simon Kainz wrote:
>> Hello,
>>
>> we are experiencing spontaneous CPU speed throttling.
>>
>> The system is a Lenovo ThinkSystem SR645 with 2
>> AMD EPYC 7452 32-Core Processors, running
>>
>> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
>> GNU/Linux
>>
>> After some time (hours, days, even weeks) the system suddenly gets
>> throttled to 400 MHz (see below).
>>
>> The HW vendor replies that Debian is not on the "supported OS" list, so we
>> are currently fighting this on our own.
>>
>> Does anyone else experience the same or a similar issue? It seems to me like
>> some kind of thermal throttling, but the kernel does not log throttling
>> events. Maybe there is some Debian-specific kernel setting that influences
>> CPU throttling.
>>
> Are you sure it is not due to a "power save" feature for a system under
> low load?

Good point, but no, because the system is under heavy load all the time,
not idling.
After throttling down to 400 MHz, the system also stays at this speed. Only
a system reboot mitigates the issue.

> Which CPU driver and governor are currently in use?
> https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html

#CPU driver:

root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

#Governor:
root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil

I did not set or change the governor/driver settings; this is a stock Debian
kernel.
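
The two `cat` commands above only look at cpu0. To collect the same information for every core in one go, a small loop over sysfs might look like this. It is a sketch only: cpufreq_report and SYSFS_CPU are hypothetical names, and the frequency values are printed as sysfs reports them, in kHz:

```shell
#!/bin/sh
# Sketch: print driver, governor and frequency limits for every CPU.
# SYSFS_CPU is a hypothetical override for trying this on a scratch tree.
cpufreq_report() {
    base=${SYSFS_CPU:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue     # glob did not match: no cpufreq support
        cpu=${d%/cpufreq}           # strip the trailing /cpufreq
        cpu=${cpu##*/}              # keep only the cpuN component
        printf '%s driver=%s governor=%s cur=%s min=%s max=%s\n' \
            "$cpu" \
            "$(cat "$d/scaling_driver")" \
            "$(cat "$d/scaling_governor")" \
            "$(cat "$d/scaling_cur_freq")" \
            "$(cat "$d/scaling_min_freq")" \
            "$(cat "$d/scaling_max_freq")"
    done
}
```

Running `cpufreq_report` before and after a throttling event would show whether the policy limits themselves change or only the current frequency does.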
> 
> Is the CPU temperature OK?
> Since this is a server platform, it could be due to incorrect installation
> of fans/radiators/air ducts and shields/etc.
> Check it with "sensors".

Yes, good point, but CPU, temperatures and fans are all OK. The BMC, IPMI
and the management interface all show no issues whatsoever.

Regards,

Simon



Re: AMD EPYC throttled to 400 mhz

2022-01-17 Thread Alexander V. Makartsev

On 17.01.2022 14:41, Simon Kainz wrote:
> Hello,
>
> we are experiencing spontaneous CPU speed throttling.
>
> The system is a Lenovo ThinkSystem SR645 with 2
> AMD EPYC 7452 32-Core Processors, running
>
> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
> GNU/Linux
>
> After some time (hours, days, even weeks) the system suddenly gets
> throttled to 400 MHz (see below).
>
> The HW vendor replies that Debian is not on the "supported OS" list, so we
> are currently fighting this on our own.
>
> Does anyone else experience the same or a similar issue? It seems to me like
> some kind of thermal throttling, but the kernel does not log throttling
> events. Maybe there is some Debian-specific kernel setting that influences
> CPU throttling.

Are you sure it is not due to a "power save" feature for a system under
low load?

Which CPU driver and governor are currently in use?
https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html

Is the CPU temperature OK?
Since this is a server platform, it could be due to incorrect installation
of fans/radiators/air ducts and shields/etc.

Check it with "sensors".
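
If lm-sensors is not installed, the same readings can usually be taken straight from the hwmon sysfs tree. The following is a sketch only: hwmon_temps and HWMON_ROOT are hypothetical names, and which chips show up (for example k10temp for AMD CPUs) depends on the drivers loaded on the machine:

```shell
#!/bin/sh
# Sketch: list every hwmon temperature input in whole degrees Celsius.
# The kernel exposes temp*_input values in millidegrees Celsius.
# HWMON_ROOT is a hypothetical override for trying this on a scratch tree.
hwmon_temps() {
    base=${HWMON_ROOT:-/sys/class/hwmon}
    for f in "$base"/hwmon*/temp*_input; do
        [ -e "$f" ] || continue     # glob did not match: no sensors found
        chip=$(cat "${f%/*}/name" 2>/dev/null || echo unknown)
        milli=$(cat "$f")
        echo "$chip ${f##*/} $((milli / 1000))"
    done
}
```

Each output line has the form "chip input degrees", so `hwmon_temps | sort -k3 -n | tail -n 5` would show the five hottest sensors.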

--
With kindest regards, Alexander.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄



AMD EPYC throttled to 400 mhz

2022-01-17 Thread Simon Kainz
Hello,

we are experiencing spontaneous CPU speed throttling.

The system is a Lenovo ThinkSystem SR645 with 2
AMD EPYC 7452 32-Core Processors, running

Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
GNU/Linux

After some time (hours, days, even weeks) the system suddenly gets
throttled to 400 MHz (see below).

The HW vendor replies that Debian is not on the "supported OS" list, so we
are currently fighting this on our own.

Does anyone else experience the same or a similar issue? It seems to me like
some kind of thermal throttling, but the kernel does not log throttling
events. Maybe there is some Debian-specific kernel setting that influences
CPU throttling.



root@node3:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7452 32-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         399.256
CPU max MHz:                     3364.3550
CPU min MHz:                     1500.
BogoMIPS:                        4691.25
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-31
NUMA node1 CPU(s):               32-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization
                                 and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic
                                 sep mtrr pge mca cmov pat pse36 clflush mmx
                                 fxsr sse sse2 ht syscall nx mmxext fxsr_opt
                                 pdpe1gb rdtscp lm constant_tsc rep_good nopl
                                 nonstop_tsc cpuid extd_apicid aperfmperf pni
                                 pclmulqdq monitor ssse3 fma cx16 sse4_1
                                 sse4_2 movbe popcnt aes xsave avx f16c rdrand
                                 lahf_lm cmp_legacy svm extapic cr8_legacy abm
                                 sse4a misalignsse 3dnowprefetch osvw ibs
                                 skinit wdt tce topoext perfctr_core
                                 perfctr_nb bpext perfctr_llc mwaitx cpb
                                 cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb
                                 stibp vmmcall fsgsbase bmi1 avx2 smep bmi2
                                 cqm rdt_a rdseed adx smap clflushopt clwb
                                 sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc
                                 cqm_occup_llc cqm_mbm_total cqm_mbm_local
                                 clzero irperf xsaveerptr rdpru wbnoinvd
                                 amd_ppin arat npt lbrv svm_lock nrip_save
                                 tsc_scale vmcb_clean flushbyasid
                                 decodeassists pausefilter pfthreshold avic
                                 v_vmsave_vmload vgif umip rdpid
                                 overflow_recov succor smca




root@node3:~# cpupower monitor
  | Mperf
 PKG|CORE| CPU| C0   | Cx   | Freq
   0|   0|   0| 99.52|  0.48|   399
   0|   1|   1| 99.52|  0.48|   399
   0|   2|   2| 99.52|  0.48|   399
   0|   3|   3| 99.52|  0.48|   399
   0|   4|   4| 99.51|  0.49|   399
   0|   5|   5| 99.51|  0.49|   399
   0|   6|   6| 99.50|  0.50|   399
   0|   7|   7| 99.50|  0.50|   399
   0|   8|   8| 99.49|  0.51|   399
   0|   9|   9| 99.49|  0.51|   399
   0|  10|  10| 99.48|  0.52|   399
   0|  11|  11| 99.48|  0.52|   399
   0|  12|  12| 99.47|  0.53|   399
   0|  13|  13| 99.46|  0.54|   399
   0|  14|  14| 99.46|  0.54|   399
   0|  15|  15| 99.45|  0.55|   399
   0|  16|  16| 99.45|  0.55|   399
   0|  17|  17| 99.45|  0.55|   399
   0|  18|  18| 99.44|  0.56|   399
.
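
The lscpu output above reports a current frequency (399 MHz) below the advertised minimum (1500 MHz). Since the kernel does not log the event, a small check for exactly that condition could help to timestamp when it starts. This is a sketch under assumptions: below_min_cpus and SYSFS_CPU are hypothetical names, and on a mostly idle machine momentary low readings might also show up:

```shell
#!/bin/sh
# Sketch: list CPUs whose current frequency is below the hardware minimum.
# Both sysfs files report kHz. SYSFS_CPU is a hypothetical override for
# trying this on a scratch tree instead of the real /sys.
below_min_cpus() {
    base=${SYSFS_CPU:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -d "$d" ] || continue     # glob did not match: no cpufreq support
        cur=$(cat "$d/scaling_cur_freq")
        min=$(cat "$d/cpuinfo_min_freq")
        if [ "$cur" -lt "$min" ]; then
            cpu=${d%/cpufreq}
            echo "${cpu##*/} cur=${cur}kHz min=${min}kHz"
        fi
    done
}

# Possible use as a simple watcher that writes to syslog (not executed here):
#   while sleep 60; do
#       below_min_cpus | logger -t cpufreq-watch
#   done
```

The syslog timestamps from such a loop could then be compared with BMC or IPMI event logs from the same moment.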


Thanks,

Simon