On 3/19/26 12:16 PM, Christoph Heiss wrote:
Two comments inline.
Other than that, please consider it:
Reviewed-by: Christoph Heiss <[email protected]>
On Thu Mar 5, 2026 at 10:16 AM CET, Dominik Csapak wrote:
[..]
diff --git a/pve-rs/Cargo.toml b/pve-rs/Cargo.toml
index 45389b5..3b6c2fc 100644
--- a/pve-rs/Cargo.toml
+++ b/pve-rs/Cargo.toml
@@ -20,6 +20,7 @@ hex = "0.4"
http = "1"
libc = "0.2"
nix = "0.29"
+nvml-wrapper = "0.12"
Missing the respective entry in d/control.
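Presumably something like the following in Build-Depends (the exact package name and feature suffix depend on how the crate gets packaged by debcargo, so treat this as a guess):

librust-nvml-wrapper-0.12+default-dev,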
[..]
diff --git a/pve-rs/src/bindings/nvml.rs b/pve-rs/src/bindings/nvml.rs
new file mode 100644
index 0000000..0f4c81e
--- /dev/null
+++ b/pve-rs/src/bindings/nvml.rs
@@ -0,0 +1,91 @@
+//! Provides access to the state of NVIDIA (v)GPU devices connected to the system.
+
+#[perlmod::package(name = "PVE::RS::NVML", lib = "pve_rs")]
+pub mod pve_rs_nvml {
+    //! The `PVE::RS::NVML` package.
+    //!
+    //! Provides high level helpers to get info from the system with NVML.
+
+    use anyhow::Result;
+    use nvml_wrapper::Nvml;
+    use perlmod::Value;
+
+    /// Retrieves a list of *creatable* vGPU types for the specified GPU by bus id.
+    ///
+    /// The [`bus_id`] is of format "\<domain\>:\<bus\>:\<device\>.\<function\>",
+    /// e.g. "0000:01:01.0".
+    ///
+    /// # See also
+    ///
+    /// [`nvmlDeviceGetCreatableVgpus`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu_1ge86fff933c262740f7a374973c4747b6>
+    /// [`nvmlDeviceGetHandleByPciBusId_v2`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gea7484bb9eac412c28e8a73842254c05>
+    /// [`struct nvmlPciInfo_t`]: <https://docs.nvidia.com/deploy/nvml-api/structnvmlPciInfo__t.html#structnvmlPciInfo__t_1a4d54ad9b596d7cab96ecc34613adbe4>
+    #[export]
+    fn creatable_vgpu_types_for_dev(bus_id: &str) -> Result<Vec<Value>> {
+        let nvml = Nvml::init()?;
Looking at this, I was wondering how expensive that call is, considering
this path is triggered from the API. Same for
supported_vgpu_types_for_dev() below.
Did some quick & simple benchmarking - on average, `Nvml::init()` took
~32ms, with quite some variance; ~26ms at best, up to a worst case
of >150ms.
IMO nothing worth blocking the series on, as this falls into premature
optimization territory and can be fixed in the future, if needed.
Holding an instance in memory might also be problematic on driver
upgrades? I.e. we'd keep an old version of the library loaded, and thus
have a mismatched API.
The above results were measured with only one GPU though, so it could
potentially be worse on multi-GPU systems.
What we could do is cache the results from this, either here or in
perl (I think it's easier to do on the perl side). That way the cost
only has to be paid once, and the amount of data should only be in the
KBs. I think this should work, because the available models/devices
can't change while the server is up?
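Roughly, a minimal sketch of what result caching on the rust side could look like. All names below are hypothetical; `VgpuTypeInfo` just stands in for whatever plain data build_vgpu_type_list() produces, so that the cache holds neither NVML handles nor perl values:

use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical plain representation of one vGPU type entry.
#[derive(Clone)]
struct VgpuTypeInfo {
    id: u32,
    name: String,
}

// Process-wide cache keyed by PCI bus id; assumes, as said above, that
// the results can't change while the server is up.
static TYPE_CACHE: OnceLock<Mutex<HashMap<String, Vec<VgpuTypeInfo>>>> =
    OnceLock::new();

// Returns the cached entry for `bus_id`, or computes it via `fetch` and
// stores it. `fetch` would wrap the existing Nvml::init() + query code,
// so the ~30ms initialization cost is only paid once per device.
fn cached_types(
    bus_id: &str,
    fetch: impl FnOnce() -> anyhow::Result<Vec<VgpuTypeInfo>>,
) -> anyhow::Result<Vec<VgpuTypeInfo>> {
    let cache = TYPE_CACHE.get_or_init(Default::default);
    // The lock is deliberately held across `fetch`, so concurrent callers
    // don't initialize NVML twice (at the cost of serializing first-time
    // lookups).
    let mut map = cache.lock().unwrap();
    if let Some(types) = map.get(bus_id) {
        return Ok(types.clone());
    }
    let types = fetch()?;
    map.insert(bus_id.to_owned(), types.clone());
    Ok(types)
}

(On the perl side the equivalent would just be memoizing the call result per bus id.)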
+        let device = nvml.device_by_pci_bus_id(bus_id)?;
+
+        build_vgpu_type_list(device.vgpu_creatable_types()?)
+    }
+
+    /// Retrieves a list of *supported* vGPU types for the specified GPU by bus id.
+    ///
+    /// The [`bus_id`] is of format "\<domain\>:\<bus\>:\<device\>.\<function\>",
+    /// e.g. "0000:01:01.0".
+    ///
+    /// # See also
+    ///
+    /// [`nvmlDeviceGetSupportedVgpus`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlVgpu.html#group__nvmlVgpu_1ge084b87e80350165859500ebec714274>
+    /// [`nvmlDeviceGetHandleByPciBusId_v2`]: <https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1gea7484bb9eac412c28e8a73842254c05>
+    /// [`struct nvmlPciInfo_t`]: <https://docs.nvidia.com/deploy/nvml-api/structnvmlPciInfo__t.html#structnvmlPciInfo__t_1a4d54ad9b596d7cab96ecc34613adbe4>
+    #[export]
+    fn supported_vgpu_types_for_dev(bus_id: &str) -> Result<Vec<Value>> {
+        let nvml = Nvml::init()?;
+        let device = nvml.device_by_pci_bus_id(bus_id)?;
+
+        build_vgpu_type_list(device.vgpu_supported_types()?)
+    }