Hi,

I'm glad to announce an initial version of Slurm Simulator.

I have been working on this for the last few months, and I now have a
version that is stable and deterministic enough.

The point is to simulate a long trace of jobs, so no real jobs are
executed. Instead, a simulation manager knows in advance how long each
job will last.
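To illustrate the idea of replaying a trace without running anything, here
is a minimal sketch in Python. It is not the simulator's code: the Job
fields, the simulate() function and the strict FIFO policy are all
assumptions made for the example. The only point is that each job's
duration comes from the trace, so the manager can complete jobs by
advancing a simulated clock.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    submit: int    # simulated second at which the job is submitted
    nodes: int     # number of nodes requested
    duration: int  # run time in simulated seconds, known from the trace


def simulate(trace, total_nodes):
    """Replay a trace with a strict FIFO policy (an assumption of this sketch).

    Returns {job_id: (start, end)} in simulated seconds. Nothing is executed;
    jobs finish when the simulated clock reaches their known end time.
    """
    for job in trace:
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.job_id} can never fit on the cluster")

    pending = sorted(trace, key=lambda j: (j.submit, j.job_id))
    running = []  # list of (end_time, job)
    free = total_nodes
    result = {}
    now = 0

    while pending or running:
        # Complete every job whose known duration has elapsed.
        still_running = []
        for end, job in running:
            if end <= now:
                free += job.nodes
            else:
                still_running.append((end, job))
        running = still_running

        # Start queued jobs in order; the head of the queue blocks the rest.
        while pending and pending[0].submit <= now and pending[0].nodes <= free:
            job = pending.pop(0)
            free -= job.nodes
            end = now + job.duration
            running.append((end, job))
            result[job.job_id] = (now, end)

        now += 1  # one simulation cycle = one simulated second

    return result
```

For example, simulate([Job(1, 0, 2, 10), Job(2, 0, 2, 5), Job(3, 1, 1, 3)], 3)
starts job 1 at second 0 and holds jobs 2 and 3 until second 10, when the
nodes are released.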

The main goal is to offer this simulation mode without modifying the
main Slurm code. This has been accomplished with Slurm 2.1.9, and I
think it could be migrated to newer Slurm versions easily as long as
the main Slurm design does not change. The simulation is based on
LD_PRELOAD, so time-related and thread-related functions are
intercepted and wrappers are executed in their place for simulation
purposes.
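The interception idea can be shown with a small Python analogue. This is
not LD_PRELOAD itself (that works at the dynamic linker level, with C
wrappers) and it is not taken from the patch; SimClock and simulated_time
are hypothetical names. It only shows the principle: the code under test
keeps calling the usual time function, but gets the simulated clock.

```python
import time
from contextlib import contextmanager


class SimClock:
    """Simulated clock, advanced explicitly by the simulation manager."""

    def __init__(self, start=0.0):
        self.now = float(start)

    def advance(self, seconds=1.0):
        self.now += seconds


@contextmanager
def simulated_time(clock):
    """Temporarily replace time.time() with a wrapper reading the clock."""
    real_time = time.time
    time.time = lambda: clock.now  # wrapper standing in for the real call
    try:
        yield clock
    finally:
        time.time = real_time      # always restore the real function
```

Inside "with simulated_time(SimClock(1000)):" every call to time.time()
returns the simulated value, and the real function is restored on exit.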

Slurm is multithreaded, multiprocess software, so taking control of
its execution is not simple at all. The main Slurm threads are executed
in sequential order to achieve determinism. Other threads, related to
job submission, job dispatch or job completion, run on their own,
although the number of them created at the same time is limited. What
the simulation manager takes care of is that threads "belonging" to a
simulation cycle are executed within that cycle. Under simulation,
several Slurm functionalities are not needed; others have been modified
for simplicity.
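As a rough sketch of what "sequential order" means here, the Python
example below lets a manager hand control to one thread at a time, in a
fixed order, once per cycle. It is an assumption-laden illustration, not
the simulator's mechanism: run_cycles and the thread names passed to it
are hypothetical, and the real work of each thread is reduced to a log
entry.

```python
import threading


def run_cycles(names, cycles):
    """Run one thread per name; each takes exactly one turn per cycle, in order."""
    log = []
    turn = [threading.Semaphore(0) for _ in names]  # one "go" signal per thread
    done = threading.Semaphore(0)                   # "turn finished" signal

    def worker(idx, name):
        for cycle in range(cycles):
            turn[idx].acquire()        # wait until the manager hands over control
            log.append((cycle, name))  # stand-in for the thread's real work
            done.release()             # hand control back to the manager

    threads = [
        threading.Thread(target=worker, args=(i, n)) for i, n in enumerate(names)
    ]
    for t in threads:
        t.start()

    for _ in range(cycles):            # the manager drives every cycle
        for idx in range(len(names)):
            turn[idx].release()        # let exactly one thread run
            done.acquire()             # wait for it before waking the next

    for t in threads:
        t.join()
    return log
```

Because only one worker is ever runnable, the log order is the same on
every run, which is the property needed for reproducible simulations.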

Simulation performance is good enough. Keeping such tight control over
Slurm threads is not a problem, since it can all be done in
milliseconds. The main problem is that normal Slurm scheduling under
high load makes a simulation cycle (1 simulated second) last several
real seconds. For a cluster with 3000 nodes / 12000 cores, executing
1000 jobs ranging from 30 seconds to an hour, with multifactor
priority, and using an 8-core Intel Xeon to run the simulation, it
takes ~1000 seconds to simulate 40000 seconds of cluster time. So a
full-day simulation would take ~35 minutes.
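The full-day estimate follows directly from the two figures above. A
short Python check, using only those numbers (the function names are made
up for the example):

```python
def speedup(simulated_seconds, wall_seconds):
    """How many simulated seconds pass per real second."""
    return simulated_seconds / wall_seconds


def wall_time_for(simulated_seconds, factor):
    """Real seconds needed to simulate the given span at that speedup."""
    return simulated_seconds / factor


factor = speedup(40_000, 1_000)                   # 40x faster than real time
day_seconds = wall_time_for(24 * 3600, factor)    # 2160 real seconds
day_minutes = day_seconds / 60
print(day_minutes)  # prints 36.0
```

That gives about 36 minutes, in line with the rough ~35 minute figure,
given that both inputs are approximate.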

A patch for slurm-2.1.9, with a HOWTO and some configuration files, can
be obtained from http://www.bsc.es/plantillaA.php?cat_id=705

I look forward to getting some feedback about the usefulness of this work.

Alejandro Lucero

