To minimize system noise, slurmd is typically completely asleep while applications are running, but a ping can be sent to it periodically (see SlurmdTimeout in slurm.conf for controlling its frequency). That seems like a good place to collect and report CPU load.
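
For what it's worth, the collection step itself could look roughly like the sketch below. This is not code from the SLURM tree: the helper names parse_loadavg_line() and get_cpu_load_x100() are made up for illustration, and scaling the 1-minute average by 100 so it fits an integer field is an assumption on my part. getloadavg(3) is the standard glibc/BSD call; the parser handles the /proc/loadavg format for systems where reading the file directly is preferred.

```c
#define _DEFAULT_SOURCE		/* exposes getloadavg() on glibc */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the first three fields of a
 * /proc/loadavg-style line, e.g. "0.42 0.36 0.30 1/123 4567".
 * Fills avg[] with the 1, 5 and 15 minute load averages.
 * Returns 0 on success, -1 on bad input.
 */
static int parse_loadavg_line(const char *line, double avg[3])
{
	if (line == NULL || avg == NULL)
		return -1;
	if (sscanf(line, "%lf %lf %lf", &avg[0], &avg[1], &avg[2]) != 3)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: fetch the 1 minute load average via getloadavg(3)
 * and store it scaled by 100 (assumed encoding, so 0.42 becomes 42).
 * Returns 0 on success, -1 if the load average is unavailable.
 */
static int get_cpu_load_x100(uint32_t *cpu_load)
{
	double avg[3];

	if (cpu_load == NULL)
		return -1;
	if (getloadavg(avg, 3) < 1)
		return -1;
	*cpu_load = (uint32_t) (avg[0] * 100.0 + 0.5);	/* round to nearest */
	return 0;
}
```

A function along these lines could be called wherever slurmd answers the ping, with the result stored next to the other node information before it is sent back to slurmctld.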

Quoting Michael Gutteridge <[email protected]>:

> I started down that path... got pretty close and realized that this
> may not work as I'd wanted.  I'd wanted something that would
> periodically check and update the node record (say every 5+ minutes or
> so) with the current load average.  However, "slurmd" only wakes up
> and processes when it receives a message from slurmctld.  AFAICT, it'd
> only reevaluate cpu load (i.e. run through the "get*" functions) when
> sent a "reconfigure" message.
>
> Or have I got that wrong?
>
> Thanks
>
> M
>
> On Tue, Aug 7, 2012 at 10:01 AM, Moe Jette <[email protected]> wrote:
>>
>> SLURM is designed to allocate and use resource (e.g. CPUs and memory)
>> rather than monitor CPU load and use that as a basis for scheduling.
>> Although that feature has been requested in the past, there are no
>> immediate plans to work on it.
>>
>> Right now the load average is not collected, although that would be
>> simple to add. The USE_CPU_SPEED logic is designed to capture the
>> CPU's performance in a cluster with different CPUs on different nodes.
>> I'd recommend adding a new set of fields for the node load and store
>> the information in the same structure used for other node information
>> (slurmd_config) then add a new function to
>> src/slurmd/slurmd/get_mach_stat.c to collect the info. If you do work
>> on this, please send your work to the mailing list and we can
>> incorporate it into the main code base.
>>
>> Quoting Michael Gutteridge <[email protected]>:
>>
>>>
>>> Hi
>>>
>>> Does slurm report system load information (c.f. /proc/loadavg,
>>> uptime(1))?  I'd like to report that up to Moab, but I can't seem to
>>> suss out where slurm has that information.
>>>
>>> It doesn't look like it reports CPULOAD via wiki2, so that's going to
>>> need modification.  I can't locate anything in slurmd's source that
>>> suggests slurmd gathers that data.  If it's something I'll have to
>>> write, would using something a'la the ifdef'd "USE_CPU_SPEED" work?
>>> (that's in src/slurmd/slurmd/get_mach_stat.c)
>>>
>>> Thanks
>>>
>>> Michael
>>
>
> --
> Hey! Somebody punched the foley guy!
>    - Crow, MST3K ep. 508
