Private bug reported:
VRHOT (Voltage Regulator Hot) is a hardware signal asserted by the
Voltage Regulator Module (VRM) to indicate that it is operating beyond
safe thermal limits. When VRHOT is triggered, it signals the processor
to take immediate corrective action, such as reducing frequency,
voltage, or load, to prevent damage to the VRM and ensure system
stability.
VRHOT events are critical for protecting the power delivery subsystem,
especially in high-performance and high-density server platforms where
power and thermal constraints are tightly coupled. These events can be
triggered by sustained high current draw, inadequate cooling, or
transient workload spikes.
In modern systems, VRHOT handling is typically managed at the hardware
and firmware levels, with mechanisms such as Dynamic Voltage and
Frequency Scaling (DVFS) and power capping. However, visibility and
coordination at the OS level are important for observability, logging,
and workload-aware mitigation.
In the Linux kernel, thermal and power management frameworks (e.g.,
hwmon, thermal subsystem, cpufreq, powercap) provide partial visibility,
but VRHOT-specific events are not always explicitly exposed or
standardized across platforms. Enhancing OS support would improve
monitoring and enable better coordination with workload management.
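To illustrate the partial visibility available today, here is a minimal
Python sketch that lists the temperature alarm attributes that hwmon
drivers may expose under /sys/class/hwmon. It rests on several
assumptions: the tempN_max_alarm and tempN_crit_alarm names come from the
generic hwmon sysfs interface, whether any hwmon device corresponds to
the VRM is platform- and driver-specific, and these alarms are only a
proxy, not a dedicated VRHOT indicator. The function name
find_temp_alarms is hypothetical.

```python
import glob
import os


def find_temp_alarms(hwmon_root="/sys/class/hwmon"):
    """Return {path: bool} for each temp*_max_alarm / temp*_crit_alarm file.

    Assumption: a value of "1" means the alarm is active, as described by
    the generic hwmon sysfs interface. Which device (if any) belongs to the
    voltage regulator is not determined here.
    """
    alarms = {}
    for suffix in ("max_alarm", "crit_alarm"):
        pattern = os.path.join(hwmon_root, "hwmon*", f"temp*_{suffix}")
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    alarms[path] = f.read().strip() == "1"
            except OSError:
                # Attribute vanished or is unreadable; skip it.
                continue
    return alarms


if __name__ == "__main__":
    for path, active in find_temp_alarms().items():
        print(f"{path}: {'ALARM' if active else 'ok'}")
```

On a system with no matching attributes the function returns an empty
dict, which is itself an example of the visibility gap described above.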
Feature Request:
Requested OS capabilities:
Enable detection and reporting of VRHOT events in the OS.
Integrate VRHOT signals with thermal and power management subsystems.
Expose VRHOT status and telemetry via sysfs/hwmon interfaces.
Log VRHOT assertions and deassertions for diagnostics and analysis.
Enable policy-based responses (e.g., CPU throttling, workload migration).
Support firmware-to-OS handoff of VRHOT thresholds and configuration.
Correlate VRHOT events with CPU, memory, and workload activity.
Provide tools for monitoring and validating VRHOT behavior.
Support integration with data center power/thermal management frameworks.
Document VRHOT behavior, thresholds, and recommended mitigation strategies.
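
The request to log VRHOT assertions and deassertions amounts to edge
detection on a sampled status signal. The Python sketch below shows one
possible shape for that logic. It is illustrative only: the VrhotEvent
type, the detect_transitions function, and the (timestamp, state) sample
format are all hypothetical, and the source of the samples (for example
a polled sysfs attribute) is left open because no standard VRHOT
interface is assumed to exist.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class VrhotEvent:
    """One state change of the (hypothetical) VRHOT status signal."""
    timestamp: float
    asserted: bool  # True = assertion, False = deassertion


def detect_transitions(
    samples: Iterable[Tuple[float, bool]], initial: bool = False
) -> Iterator[VrhotEvent]:
    """Yield an event each time the sampled state differs from the last one.

    `samples` is an iterable of (timestamp, asserted) pairs in time order.
    `initial` is the state assumed before the first sample.
    """
    previous = initial
    for timestamp, state in samples:
        if state != previous:
            yield VrhotEvent(timestamp, state)
            previous = state


if __name__ == "__main__":
    samples = [(0.0, False), (1.0, True), (2.0, True), (3.0, False)]
    for event in detect_transitions(samples):
        kind = "asserted" if event.asserted else "deasserted"
        print(f"t={event.timestamp}: VRHOT {kind}")
```

Reporting only transitions, rather than every sample, keeps the log
small during a sustained VRHOT condition while still recording when it
began and ended. A polled design like this can miss assertions shorter
than the sampling interval, which is one reason an interrupt- or
firmware-driven notification path would be preferable.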
Business Justification:
Protects power delivery components from thermal damage.
Improves system stability under high load conditions.
Enhances observability of power and thermal events.
Enables proactive workload and power management.
Reduces risk of unexpected shutdowns or hardware failures.
Aligns OS capabilities with modern platform power management features.
** Affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Information type changed from Public to Private
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2146677
Title:
Request for Power Management Support – VRHOT Handling