Re: AMD EPYC throttled to 400 mhz
On Mon, 17 Jan 22, 22:08:43, Alexander V. Makartsev wrote:
> On 17.01.2022 18:40, Simon Kainz wrote:
> >
> > I did not set/change governor/driver settings, this is a stock Debian
> > kernel.
> Is the server platform running the latest BIOS and firmware?
> Things I'd try first if I were in your place.
> I always flash the latest firmware available as a pre-sale procedure, or during
> server installation.

Installing amd64-microcode (from non-free) might be a good idea as well.

Kind regards,
Andrei
-- 
http://wiki.debian.org/FAQsFromDebianUser
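To tell whether a microcode update actually took effect, it may help to note the revision the kernel reports before and after installing the package and rebooting. The following is only a sketch: the function name microcode_revision is made up for illustration, and it assumes the x86 /proc/cpuinfo layout with a "microcode" field.

```shell
# Print the microcode revision reported for the first CPU.
# $1: path to a cpuinfo-style file (assumed default: /proc/cpuinfo).
# The function name is hypothetical, not part of any package.
microcode_revision() {
    # Lines look like "microcode<TAB>: 0x...", so split on ": ".
    awk -F': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Usage (the install needs the non-free component in sources.list):
#   microcode_revision            # note the value
#   apt install amd64-microcode   # as root, then reboot
#   microcode_revision            # compare with the earlier value
```

If the value does not change, the firmware may already ship the same or a newer revision.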
Re: AMD EPYC throttled to 400 mhz
On 17.01.22 at 22:53, Mike Kupfer wrote:
> Simon Kainz wrote:
>
>> #Governor:
>> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
>> schedutil
>
> Maybe try a different governor?  I had a different problem (CPU running
> too hot) after upgrading to Bullseye, and the problem went away after I
> switched to the ondemand governor.

Thanks for the tip. I have now changed the governor to "performance", since my hosts are all compute nodes in an HPC system, so there is no real reason for CPU throttling anyway. Maybe this helps, we'll see.

Regards,
Simon

>
> mike
>
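For anyone wanting to make the same change on all cores at once, here is a minimal sketch that writes the governor through sysfs. The function name set_governor and its optional second argument are assumptions for illustration; the sysfs path is the one already shown in this thread.

```shell
# Set the cpufreq governor on every CPU via sysfs.
# $1: governor name, e.g. "performance"
# $2: sysfs CPU directory (assumed default: /sys/devices/system/cpu)
# The function name is hypothetical.
set_governor() {
    gov=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue          # glob did not match: no cpufreq here
        printf '%s\n' "$gov" > "$f" || return 1
    done
}

# Usage (as root):
#   set_governor performance
```

"cpupower frequency-set -g performance" should have the same effect. Either way the setting does not survive a reboot, so it would have to be reapplied at boot.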
Re: AMD EPYC throttled to 400 mhz
Simon Kainz wrote:

> #Governor:
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> schedutil

Maybe try a different governor?  I had a different problem (CPU running
too hot) after upgrading to Bullseye, and the problem went away after I
switched to the ondemand governor.

mike
Re: AMD EPYC throttled to 400 mhz
Hello Simon,

Disclaimer: I am nowhere near knowledgeable about CPU frequency scaling.

Perhaps you will find tips in the official Lenovo document on fine-tuning power saving under Linux for ThinkSystem:
https://lenovopress.com/lp0826.pdf

It seems to require setting up the UEFI accordingly in order to use acpi-cpufreq.

There is also a recent amd-pstate driver, similar to intel-pstate, which should provide the best results:
https://www.phoronix.com/scan.php?page=news_item&px=AMD-PSTATE-2021
Re: AMD EPYC throttled to 400 mhz
On 17.01.2022 18:40, Simon Kainz wrote:
> On 17.01.22 at 11:36, Alexander V. Makartsev wrote:
>> On 17.01.2022 14:41, Simon Kainz wrote:
>>> Hello,
>>>
>>> we are experiencing spontaneous CPU speed throttling.
>>>
>>> System is a Lenovo ThinkSystem SR645 with 2
>>> AMD EPYC 7452 32-Core Processor, running
>>>
>>> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
>>> GNU/Linux
>>>
>>> After some time (hours, days, even weeks) the system suddenly gets
>>> throttled to 400 MHz (see below)
>>>
>>> The HW vendor replies with "Debian is not on the supported OS list", so we
>>> are currently fighting on our own.
>>>
>>> Does someone else experience the same/similar issue? It seems to me like
>>> some kind of thermal throttling, but the kernel does not log throttling
>>> events. Maybe some Debian-specific kernel setting that influences CPU
>>> throttling..
>>
>> Are you sure it is not due to a "power save" feature for a system under
>> low load?
>
> Good point, but no, because the system is under heavy load all the time,
> not idling. After throttling down to 400 MHz, the system also stays at this
> speed. Only a system reboot mitigates the issue.
>
>> What CPU driver and governor are currently in use?
>> https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html
>
> #CPU driver:
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
> acpi-cpufreq
>
> #Governor:
> root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> schedutil
>
> I did not set/change governor/driver settings, this is a stock Debian
> kernel.

Is the server platform running the latest BIOS and firmware?
Things I'd try first if I were in your place.
I always flash the latest firmware available as a pre-sale procedure, or during server installation.

I've also found this bug report¹. It could be the same issue with the scaling driver, which was fixed in kernel 5.11.
Debian stable runs version 5.10.84, so test the system with a newer kernel.

¹ https://bugzilla.kernel.org/show_bug.cgi?id=211305

-- 
With kindest regards, Alexander.
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄
Re: AMD EPYC throttled to 400 mhz
On 17.01.22 at 11:36, Alexander V. Makartsev wrote:
> On 17.01.2022 14:41, Simon Kainz wrote:
>> Hello,
>>
>> we are experiencing spontaneous CPU speed throttling.
>>
>> System is a Lenovo ThinkSystem SR645 with 2
>> AMD EPYC 7452 32-Core Processor, running
>>
>> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
>> GNU/Linux
>>
>> After some time (hours, days, even weeks) the system suddenly gets
>> throttled to 400 MHz (see below)
>>
>> The HW vendor replies with "Debian is not on the supported OS list", so we
>> are currently fighting on our own.
>>
>> Does someone else experience the same/similar issue? It seems to me like
>> some kind of thermal throttling, but the kernel does not log throttling
>> events. Maybe some Debian-specific kernel setting that influences CPU
>> throttling..
>>
> Are you sure it is not due to a "power save" feature for a system under
> low load?

Good point, but no, because the system is under heavy load all the time, not idling. After throttling down to 400 MHz, the system also stays at this speed. Only a system reboot mitigates the issue.

> What CPU driver and governor are currently in use?
> https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html

#CPU driver:
root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
acpi-cpufreq

#Governor:
root@node3:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
schedutil

I did not set/change governor/driver settings, this is a stock Debian kernel.

>
> Is the CPU temperature ok?
> Since this is a server platform, it could be due to wrong installation
> of fans/radiators/air ducts and shields/etc.
> Check it with "sensors".

Yes, good point, but CPU/temp/fans are all ok. BMC, IPMI and management interface all show no issues whatsoever.

Regards,
Simon
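The checks above only look at cpu0. On a 64-core box it may be worth confirming that every core reports the same driver and governor. A minimal sketch, assuming the standard cpufreq sysfs layout; the function name cpufreq_summary and its optional argument are hypothetical.

```shell
# Count how many CPUs report each scaling driver and governor.
# $1: sysfs CPU directory (assumed default: /sys/devices/system/cpu)
# The function name is hypothetical.
cpufreq_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for attr in scaling_driver scaling_governor; do
        printf '%s:\n' "$attr"
        # One value per CPU; uniq -c collapses identical values into a count.
        cat "$cpu_root"/cpu[0-9]*/cpufreq/"$attr" 2>/dev/null | sort | uniq -c
    done
}

# Usage:
#   cpufreq_summary
```

More than one line under either heading would mean the cores are not configured uniformly.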
Re: AMD EPYC throttled to 400 mhz
On 17.01.2022 14:41, Simon Kainz wrote:
> Hello,
>
> we are experiencing spontaneous CPU speed throttling.
>
> System is a Lenovo ThinkSystem SR645 with 2
> AMD EPYC 7452 32-Core Processor, running
>
> Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
> GNU/Linux
>
> After some time (hours, days, even weeks) the system suddenly gets
> throttled to 400 MHz (see below)
>
> The HW vendor replies with "Debian is not on the supported OS list", so we
> are currently fighting on our own.
>
> Does someone else experience the same/similar issue? It seems to me like
> some kind of thermal throttling, but the kernel does not log throttling
> events. Maybe some Debian-specific kernel setting that influences CPU
> throttling..

Are you sure it is not due to a "power save" feature for a system under low load?

What CPU driver and governor are currently in use?
https://www.kernel.org/doc/html/latest/admin-guide/pm/working-state.html

Is the CPU temperature ok?
Since this is a server platform, it could be due to wrong installation of fans/radiators/air ducts and shields/etc.
Check it with "sensors".

-- 
With kindest regards, Alexander.

⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄
AMD EPYC throttled to 400 mhz
Hello,

we are experiencing spontaneous CPU speed throttling.

System is a Lenovo ThinkSystem SR645 with 2
AMD EPYC 7452 32-Core Processor, running

Linux node3 5.10.0-10-amd64 #1 SMP Debian 5.10.84-1 (2021-12-08) x86_64
GNU/Linux

After some time (hours, days, even weeks) the system suddenly gets
throttled to 400 MHz (see below)

The HW vendor replies with "Debian is not on the supported OS list", so we
are currently fighting on our own.

Does someone else experience the same/similar issue? It seems to me like
some kind of thermal throttling, but the kernel does not log throttling
events. Maybe some Debian-specific kernel setting that influences CPU
throttling..

root@node3:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7452 32-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         399.256
CPU max MHz:                     3364.3550
CPU min MHz:                     1500.
BogoMIPS:                        4691.25
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-31
NUMA node1 CPU(s):               32-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

root@node3:~# cpupower monitor
    | Mperf
PKG|CORE| CPU| C0   | Cx   | Freq
  0|   0|   0| 99.52|  0.48|   399
  0|   1|   1| 99.52|  0.48|   399
  0|   2|   2| 99.52|  0.48|   399
  0|   3|   3| 99.52|  0.48|   399
  0|   4|   4| 99.51|  0.49|   399
  0|   5|   5| 99.51|  0.49|   399
  0|   6|   6| 99.50|  0.50|   399
  0|   7|   7| 99.50|  0.50|   399
  0|   8|   8| 99.49|  0.51|   399
  0|   9|   9| 99.49|  0.51|   399
  0|  10|  10| 99.48|  0.52|   399
  0|  11|  11| 99.48|  0.52|   399
  0|  12|  12| 99.47|  0.53|   399
  0|  13|  13| 99.46|  0.54|   399
  0|  14|  14| 99.46|  0.54|   399
  0|  15|  15| 99.45|  0.55|   399
  0|  16|  16| 99.45|  0.55|   399
  0|  17|  17| 99.45|  0.55|   399
  0|  18|  18| 99.44|  0.56|   399
.

Thanks,
Simon
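Since the throttled state (about 399 MHz) sits well below the advertised minimum (1500 MHz), it can in principle be detected automatically by comparing each core's current frequency with its hardware minimum. The following is only a sketch: the function name below_min_cpus is made up, and it assumes that scaling_cur_freq and cpuinfo_min_freq (both in kHz) exist under cpufreq and that scaling_cur_freq reflects the real clock on this platform.

```shell
# List CPUs whose current frequency is below the advertised hardware minimum.
# $1: sysfs CPU directory (assumed default: /sys/devices/system/cpu)
# The function name is hypothetical. Frequencies in sysfs are in kHz.
below_min_cpus() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for d in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -r "$d/scaling_cur_freq" ] && [ -r "$d/cpuinfo_min_freq" ] || continue
        cur=$(cat "$d/scaling_cur_freq")
        min=$(cat "$d/cpuinfo_min_freq")
        if [ "$cur" -lt "$min" ]; then
            cpu=${d%/cpufreq}            # strip the trailing /cpufreq
            printf '%s %s kHz (min %s kHz)\n' "${cpu##*/}" "$cur" "$min"
        fi
    done
}

# Usage:
#   below_min_cpus
```

Run periodically (for example from cron) with the output appended to a log, this could help pin down when the drop happens and correlate it with BMC or kernel events.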