In order to minimize system noise, slurmd is typically completely
asleep while applications are running, but a ping is sent to it
periodically (see SlurmdTimeout in slurm.conf for controlling its
frequency). That seems like a good place to collect and report CPU load.
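
As a rough illustration of the kind of collection function discussed in
this thread, here is a minimal sketch that reads the 1-minute load average
from /proc/loadavg on Linux. The function names (parse_loadavg,
get_cpu_load) and the "load times 100" integer encoding are assumptions
made for this example; they are not taken from the SLURM source. Hooking
the call into the ping handler is not shown.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration; not part of the SLURM source. */

/*
 * Parse the first field (1-minute load average) of a /proc/loadavg-style
 * line and store it in *cpu_load as load * 100, rounded to nearest.
 * Returns 0 on success, -1 on bad input.
 */
int parse_loadavg(const char *line, uint32_t *cpu_load)
{
    double one_min;

    if (line == NULL || cpu_load == NULL)
        return -1;
    if (sscanf(line, "%lf", &one_min) != 1)
        return -1;
    /* Reject negative, NaN, and absurdly large values. */
    if (!(one_min >= 0.0 && one_min <= 1.0e6))
        return -1;
    *cpu_load = (uint32_t)(one_min * 100.0 + 0.5);
    return 0;
}

/*
 * Read the current 1-minute load average from /proc/loadavg (Linux only).
 * Returns 0 on success, -1 if the file cannot be read or parsed.
 */
int get_cpu_load(uint32_t *cpu_load)
{
    char line[128];
    FILE *fp = fopen("/proc/loadavg", "r");
    int rc = -1;

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof(line), fp) != NULL)
        rc = parse_loadavg(line, cpu_load);
    fclose(fp);
    return rc;
}
```

A caller would invoke get_cpu_load() each time the ping arrives and copy
the result into whatever node-information field is added for it. Keeping
the parsing separate from the file read makes the logic easy to test
without depending on the state of the machine.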

Quoting Michael Gutteridge <[email protected]>:

>
> I started down that path... got pretty close and realized that this
> may not work as I'd wanted.  I'd wanted something that would
> periodically check and update the node record (say every 5+ minutes or
> so) with the current load average.  However, "slurmd" only wakes up
> and processes when it receives a message from slurmctld.  AFAICT, it'd
> only reevaluate cpu load (i.e. run through the "get*" functions) when
> sent a "reconfigure" message.
>
> Or have I got that wrong?
>
> Thanks
>
> M
>
> On Tue, Aug 7, 2012 at 10:01 AM, Moe Jette <[email protected]> wrote:
>>
>> SLURM is designed to allocate and use resources (e.g. CPUs and memory)
>> rather than monitor CPU load and use that as a basis for scheduling.
>> Although that feature has been requested in the past, there are no
>> immediate plans to work on it.
>>
>> Right now the load average is not collected, although that would be
>> simple to add. The USE_CPU_SPEED logic is designed to capture the
>> CPU's performance in a cluster with different CPUs on different nodes.
>> I'd recommend adding a new set of fields for the node load, storing
>> the information in the same structure used for other node information
>> (slurmd_config), and then adding a new function to
>> src/slurmd/slurmd/get_mach_stat.c to collect the info. If you do work
>> on this, please send your work to the mailing list and we can
>> incorporate it into the main code base.
>>
>> Quoting Michael Gutteridge <[email protected]>:
>>
>>>
>>> Hi
>>>
>>> Does slurm report system load information (c.f. /proc/loadavg,
>>> uptime(1))?  I'd like to report that up to Moab, but I can't seem to
>>> suss out where slurm has that information.
>>>
>>> It doesn't look like it reports CPULOAD via wiki2, so that's going to
>>> need modification.  I can't locate anything in slurmd's source that
>>> suggests slurmd gathers that data.  If it's something I'll have to
>>> write, would using something à la the ifdef'd "USE_CPU_SPEED" work?
>>> (that's in src/slurmd/slurmd/get_mach_stat.c)
>>>
>>> Thanks
>>>
>>> Michael
>>
>
>
>
> --
> Hey! Somebody punched the foley guy!
>    - Crow, MST3K ep. 508
