On Tue Jul 29, 2025 at 11:44 AM CEST, Lukas Wagner wrote: > Hey Aaron, > > some notes inline. > > The memory leaks I mentioned should be fixed before this goes in (since > this is in the long-running pmxcfs process), everything else could also > happen as a followup. I'm not super familiar with this code, so there > might be things that I've missed. > > On Sat Jul 26, 2025 at 3:05 AM CEST, Aaron Lauterer wrote: >> With PVE9 now we have additional fields in the metrics that are >> collected and distributed in the cluster. The new fields/columns are >> added at the end of the existing ones. This makes it possible for PVE8 >> installations to still use them by cutting the new additional data. >> >> To make it more future proof, the format of the keys for each metrics >> are now changed: >> >> Old pre PVE9: pve{version}-{type}/{id} >> Now with PVE9: pve-{type}-{version}/{id} >> >> This way we have an easier time to handle new versions in the future as >> we initially only need to check for `pve-{type}-`. If we know the >> version, we can handle it accordingly; e.g. pad if older format with >> missing data. If we don't know the version, it must be a newer one and >> we cut the data stream at the length we need for the current version. >> >> This means of course that to avoid a breaking change, we can only add >> new columns if needed, but not remove any! But waiting for a breaking >> change until the next major release is a worthy trade-off if it allows >> us to expand the format in between if needed. >> >> The 'rrd_skip_data' function got a new parameter defining the sepataring >> character. This then makes it possible to use it also to determine which >> part of the key string is the version/type and which one is the actual >> resource identifier. >> >> We add several new columns to nodes and VMs (guest) RRDs. See futher >> down for details. Additionally we change the RRA definitions on how we >> aggregate the data to match how we do it for the Proxmox Backup Server >> [0]. >> >> The migration of an existing installation is handled by a dedicated >> tool. Only once that has happened, will we store data in the new >> format. >> This leaves us with a few cases to handle: >> >> data recv → old new >> ↓ rrd files >> >> -------------|---------------------------|------------------------------------- >> none | check if directories exists: >> | neither old or new -> new >> | new -> new >> | old only -> old >> --------------|---------------------------|------------------------------------- >> only old | use old file as is | cut new columns and use old file >> --------------|---------------------------|------------------------------------- >> new present | pad data to match new fmt | use new file as is and pass data >> >> To handle the padding we use a buffer. Cutting can be handled as we >> already do it in the stable/bookworm (PVE8) branch by introducing a null >> terminator in the original string at the end of the expected columns. >> >> We add the following new columns: >> >> Nodes: >> * memfree >> * arcsize >> * pressures: >> * cpu some >> * io some >> * io full >> * mem some >> * mem full >> >> VMs: >> * memhost (memory consumption of all processes in the guests cgroup, host >> view) >> * pressures: >> * cpu some >> * cpu full >> * io some >> * io full >> * mem some >> * mem full >> >> [0] >> https://git.proxmox.com/?p=proxmox-backup.git;a=blob;f=src/server/metric_collection/rrd.rs;h=ed39cc94ee056924b7adbc21b84c0209478bcf42;hb=dc324716a688a67d700fa133725740ac5d3795ce#l76 >> >> Signed-off-by: Aaron Lauterer <a.laute...@proxmox.com> >> --- >> src/pmxcfs/status.c | 261 +++++++++++++++++++++++++++++++++++++++----- >> 1 file changed, 236 insertions(+), 25 deletions(-) >> >> diff --git a/src/pmxcfs/status.c b/src/pmxcfs/status.c >> index e6b578b..eaef12c 100644 >> --- a/src/pmxcfs/status.c >> +++ b/src/pmxcfs/status.c >> @@ -1096,6 +1096,9 @@ kventry_hash_set(GHashTable *kvhash, const char *key, >> gconstpointer data, size_t >> return TRUE; >> } >> >> +// We create the RRD files with a 60 second stepsize, therefore, RRA >> timesteps >> +// are alwys per 60 seconds. These 60 seconds are usually showing up in >> other >> +// code paths where we interact with RRD data! >> static const char *rrd_def_node[] = { >> "DS:loadavg:GAUGE:120:0:U", >> "DS:maxcpu:GAUGE:120:0:U", >> @@ -1124,6 +1127,39 @@ static const char *rrd_def_node[] = { >> NULL, >> }; >> >> +static const char *rrd_def_node_pve9_0[] = { >> + "DS:loadavg:GAUGE:120:0:U", >> + "DS:maxcpu:GAUGE:120:0:U", >> + "DS:cpu:GAUGE:120:0:U", >> + "DS:iowait:GAUGE:120:0:U", >> + "DS:memtotal:GAUGE:120:0:U", >> + "DS:memused:GAUGE:120:0:U", >> + "DS:swaptotal:GAUGE:120:0:U", >> + "DS:swapused:GAUGE:120:0:U", >> + "DS:roottotal:GAUGE:120:0:U", >> + "DS:rootused:GAUGE:120:0:U", >> + "DS:netin:DERIVE:120:0:U", >> + "DS:netout:DERIVE:120:0:U", >> + "DS:memfree:GAUGE:120:0:U", >> + "DS:arcsize:GAUGE:120:0:U", >> + "DS:pressurecpusome:GAUGE:120:0:U", >> + "DS:pressureiosome:GAUGE:120:0:U", >> + "DS:pressureiofull:GAUGE:120:0:U", >> + "DS:pressurememorysome:GAUGE:120:0:U", >> + "DS:pressurememoryfull:GAUGE:120:0:U", >> + >> + "RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years >> + >> + "RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years >> + NULL, >> +}; >> + >> static const char *rrd_def_vm[] = { >> "DS:maxcpu:GAUGE:120:0:U", >> "DS:cpu:GAUGE:120:0:U", >> @@ -1149,6 +1185,36 @@ static const char *rrd_def_vm[] = { >> "RRA:MAX:0.5:10080:70", // 7 day max - ony year >> NULL, >> }; >> +static const char *rrd_def_vm_pve9_0[] = { >> + "DS:maxcpu:GAUGE:120:0:U", >> + "DS:cpu:GAUGE:120:0:U", >> + "DS:maxmem:GAUGE:120:0:U", >> + "DS:mem:GAUGE:120:0:U", >> + "DS:maxdisk:GAUGE:120:0:U", >> + "DS:disk:GAUGE:120:0:U", >> + "DS:netin:DERIVE:120:0:U", >> + "DS:netout:DERIVE:120:0:U", >> + "DS:diskread:DERIVE:120:0:U", >> + "DS:diskwrite:DERIVE:120:0:U", >> + "DS:memhost:GAUGE:120:0:U", >> + "DS:pressurecpusome:GAUGE:120:0:U", >> + "DS:pressurecpufull:GAUGE:120:0:U", >> + "DS:pressureiosome:GAUGE:120:0:U", >> + "DS:pressureiofull:GAUGE:120:0:U", >> + "DS:pressurememorysome:GAUGE:120:0:U", >> + "DS:pressurememoryfull:GAUGE:120:0:U", >> + >> + "RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years >> + >> + "RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years >> + NULL, >> +}; >> >> static const char *rrd_def_storage[] = { >> "DS:total:GAUGE:120:0:U", >> @@ -1168,8 +1234,30 @@ static const char *rrd_def_storage[] = { >> NULL, >> }; >> >> +static const char *rrd_def_storage_pve9_0[] = { >> + "DS:total:GAUGE:120:0:U", >> + "DS:used:GAUGE:120:0:U", >> + >> + "RRA:AVERAGE:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:AVERAGE:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:AVERAGE:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:AVERAGE:0.5:10080:570", // 1 week * 570 => ~10 years >> + >> + "RRA:MAX:0.5:1:1440", // 1 min * 1440 => 1 day >> + "RRA:MAX:0.5:30:1440", // 30 min * 1440 => 30 day >> + "RRA:MAX:0.5:360:1440", // 6 hours * 1440 => 360 day ~1 year >> + "RRA:MAX:0.5:10080:570", // 1 week * 570 => ~10 years >> + NULL, >> +}; > > Not important right now, but the migration tool and this > should probably use the same source of the schema definition (e.g. via > putting this into a header file which is then shared in some way > (submodule?)) > >> + >> #define RRDDIR "/var/lib/rrdcached/db" >> >> +// A 4k buffer should be plenty to temporarily store RRD data. 64 bit >> integers are 20 chars long, >> +// plus the separator char: (4096-1)/21~195 columns This buffer is only >> used in the >> +// `update_rrd_data` function. It is safe to use as the calling sites get >> the global mutex: >> +// rrd_update_data -> rrdentry_hash_set -> cfs_status_set / and >> cfs_kvstore_node_set >> +static char rrd_format_update_buffer[4096]; > > Maybe I'm missing something, but since this is only used in the > update_rrd_data function, is there any reason why this statically > allocated and not just malloc'd as needed? > >> + >> static void create_rrd_file(const char *filename, int argcount, const char >> *rrddef[]) { >> /* start at day boundary */ >> time_t ctime; >> @@ -1229,6 +1317,8 @@ static void update_rrd_data(const char *key, >> gconstpointer data, size_t len) { >> >> int skip = 0; // columns to skip at beginning. They contain >> non-archivable data, like uptime, >> // status, is guest a template and such. >> + int padding = 0; // how many columns need to be added with "U" if we >> get an old format that is >> + // missing columns at the end. >> int keep_columns = 0; // how many columns do we want to keep (after >> initial skip) in case we get >> // more columns than needed from a newer format >> >> @@ -1243,20 +1333,60 @@ static void update_rrd_data(const char *key, >> gconstpointer data, size_t len) { >> goto keyerror; >> } >> >> - skip = 2; // first two columns are live data that isn't archived >> + filename = g_strdup_printf(RRDDIR "/pve-node-9.0/%s", node); >> + char *filename_pve2 = g_strdup_printf(RRDDIR "/pve2-node/%s", node); >> >> - if (strncmp(key, "pve-node-", 9) == 0) { > > Btw, you can safely use `strcmp` if the thing you are comparing to is a > string literal, since these are guaranteed to be NULL-terminated. I > would even argue its better that using `strncmp`, since there is no > chance of miscounting the length of the string literal (which is also > hard to spot in code review). > > Applies to the other instances where you do the same as well. > > Of course, if *both* arguments are a pointer with content that might not > be NULL-terminated, you are still required to use `strnlen` of avoid > buffer overruns. >
Please disregard this comment. While what I've said is true for checking for a full string match, here you are actually checking for a prefix, so strnlen is the correct choice here. I guess we could use strlen to avoid the error potential of passing the length explicitly. A modern compiler will optimize it anyway and compute the length at compile time. strncmp(key, "pve-node-", strlen("pve-node-")) To avoid the repetition this could then also be some constant. But that's all not that important for now and can still be done later on. _______________________________________________ pve-devel mailing list pve-devel@lists.proxmox.com https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel