Hello again!

I wrote today:
>    We found that a simple command such as 'srun hostname' takes a
>random amount of time to execute, from 1 second to tens of seconds.
>After a lot of research, the cause of that strange behavior was found:
>the power save process runs in its own thread and is not always in sync
>with the scheduler.  Sure, if our power management fully powered off
>nodes, that delay would go unnoticed (and that is probably why it was
>not noticed before), but our power save function just manipulates CPU
>frequency and the like, so a 20-second delay looks strange enough.  As
>soon as the cause was found, the source was patched, and the patch is
>attached.

    I may be slightly wrong here: slurmctld should already be locked while
the scheduler is doing its work, so the changes in src/slurmctld/node_mgr.c
and node_scheduler.c may not be necessary, and I may have overdone it.
I'll test without them and report back.

    Andriy.
