Michel,
One more thing: SLURM version 2.3 is only getting bug fixes. Version
2.4 will be released around May 2012, so that would be the target.
Quoting Michel Bourget <[email protected]>:
Hi all,
It's about time I report to this mailing list what "SGI did to SLURM".
Short story:
FYI, we are releasing (and supporting) the "SGI SLURM" product on SGI
platforms this November.
It's based on version 2.2.7. For the user, it simply introduces the
"sgimpi" MPI plugin.
Long story:
SGI MPI integration was not trivial, since we use the native SGI MPI
launcher (array services) underneath slurmstepd. We have introduced the
notion of "strack", which allows jobs launched outside SLURM's scope to
be tracked process-wise (proctrack) and accounting-wise
(job_acct_gather). This in turn introduces a "sentinel" thread in
slurmstepd, responsible for adding the additional pgids that were not
launched under the slurmstepd umbrella. Those additional pgids are
communicated by strack through a simple mailbox file mechanism
(slurm.sentinel.<job>.<step>). Essentially, in addition to the native
slurmstepd child monitoring, we are adding hooks to monitor out-of-band
pgids via the newly introduced strack/sentinel mechanism.
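
To make the mailbox idea concrete, here is a minimal sketch of what one polling pass by a sentinel-like reader could look like. It is not taken from the SGI patches: the file format (one decimal pgid per line, appended by the writer) and the helper names mailbox_name and read_new_pgids are assumptions made purely for illustration. Only the slurm.sentinel.<job>.<step> naming comes from the description above.

```python
def mailbox_name(job_id, step_id):
    # Naming pattern from the description above: slurm.sentinel.<job>.<step>
    return f"slurm.sentinel.{job_id}.{step_id}"


def read_new_pgids(path, offset, known):
    """One polling pass over a mailbox file (hypothetical helper).

    ASSUMPTION: the writer appends one decimal pgid per line.
    Returns (new_pgids, new_offset). Only complete lines are consumed,
    so a partially written last line is left for the next pass.
    """
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        # Mailbox not created yet: nothing to track so far.
        return [], offset

    end = data.rfind(b"\n")
    if end < 0:
        return [], offset
    complete = data[:end + 1]

    new = []
    for line in complete.splitlines():
        token = line.strip()
        if token.isdigit():
            pgid = int(token.decode("ascii"))
            if pgid not in known:
                known.add(pgid)   # remember it so it is reported only once
                new.append(pgid)
    return new, offset + len(complete)
```

A caller would keep offset and known between passes and hand each newly seen pgid to whatever does the process tracking and accounting.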
The resulting source patches to accomplish this integration are not,
in our opinion, ready for a proposal
on this mailing list yet for the following reasons:
- we would need to rebase on 2.3 and/or 2.4. Can someone confirm?
- the source patches are quite large:
  initd.sysconfig.patch  :  3 files changed,   37 insertions(+), 16 deletions(-)
  sentinel.patch         : 50 files changed, 3334 insertions(+), 28 deletions(-)
  sgimpi.patch           : 18 files changed, 1089 insertions(+),  5 deletions(-)
  slurm.modulefile.patch :  1 file changed,    28 insertions(+)
We need some guidance on a process acceptable to the SLURM community
for submitting the above patches. I presume a documented approach
(details, do's and don'ts, rationale, ...) is probably required.
Note that the source RPM is, of course, shipped on the SGI SLURM ISO.
Please let me know if you'd like to look at it.
We hope to integrate the above into the stock SLURM release in
the following year.
- we believe a safe soak time (customers reporting bugs to us, etc.)
is necessary.
- the initial SGI release supports ALTIX ICE clusters. We don't support
large SSI machines yet (a UV with 1024 cores, for example) because
they would need additional optimizations. In particular,
proctrack/job_acct_gather need to relieve the pressure of reading
every /proc/<all_pids>/stat. Why? Because, on an idle 512-CPU machine,
we have:
#nproc=8867 #kthreads=8813 kthreads/nproc= 99.39%
In other words, it is useless to scan the kthreads over and over for
every user job and every step.
I am working on a separate solution (GPL) that scans those kthreads
once and for all and shares the result, hence relieving the pressure.
That separate solution would then be integrated into SLURM as an
optional feature:
- dlopen the optional library
- if there: use it
- else : continue as before.
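
The three steps above can be sketched as follows. This is only an illustration in Python, where ctypes.CDLL goes through dlopen on POSIX systems. The library name libkthread_cache.so and the helpers load_optional and pick_scan_mode are hypothetical, since the separate solution has no published name or API here.

```python
import ctypes

# HYPOTHETICAL library name, used only for illustration.
HYPOTHETICAL_LIB = "libkthread_cache.so"


def load_optional(libname):
    """Try to load the optional library; return None if it is absent."""
    try:
        return ctypes.CDLL(libname)
    except OSError:
        # Library missing or not loadable: not an error, just fall back.
        return None


def pick_scan_mode(libname=HYPOTHETICAL_LIB):
    """Return ("shared-cache", handle) if the library loaded,
    else ("full-scan", None) to continue as before."""
    lib = load_optional(libname)
    if lib is not None:
        return "shared-cache", lib
    return "full-scan", None
```

The point of the design is that a missing library changes nothing for existing installations: the caller simply keeps using the full /proc scan.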
In addition, the SGI MPI plugin would require some adjustments for
SSI machines.
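
To illustrate the kthread point made above, here is a rough sketch of identifying kernel threads once and then skipping them in later scans. It is not the separate GPL solution mentioned in this message, only an assumption about how such a filter might work. It relies on the PF_KTHREAD bit (0x00200000) in the flags field of /proc/<pid>/stat; the function names are made up for the example, and a real implementation would have to refresh the cached set because pids can be reused.

```python
import os

PF_KTHREAD = 0x00200000  # kernel-thread bit in the task flags field


def is_kthread_stat(stat_line):
    """Decide from one /proc/<pid>/stat line whether the task is a kthread.

    The comm field may contain spaces and parentheses, so split on the
    last ')' and count fields from there: state, ppid, pgrp, session,
    tty_nr, tpgid, flags.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    return bool(int(rest[6]) & PF_KTHREAD)


def scan_kthreads(proc="/proc"):
    """Scan once and return the set of pids that are kernel threads."""
    kthreads = set()
    for name in os.listdir(proc):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        if is_kthread_stat(line):
            kthreads.add(int(name))
    return kthreads


def user_pids(proc="/proc", kthreads=frozenset()):
    """List pids worth examining per job/step, skipping cached kthreads."""
    return sorted(int(n) for n in os.listdir(proc)
                  if n.isdigit() and int(n) not in kthreads)
```

With the numbers quoted above, such a filter would leave 8867 - 8813 = 54 processes to examine on each pass instead of 8867.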
Cheers
--
-----------------------------------------------------------
Michel Bourget - SGI - Linux Software Engineering
"Past BIOS POST, everything else is extra" (travis)
-----------------------------------------------------------