I have not looked at the patch, but changing 72 files is very
surprising to me. Adding support for Cray and IBM BlueGene systems
each involved changes to about 25 files with the vast majority of
changes in new plugins. I'd expect an SGI port to follow a similar
pattern with a small number of new plugins and minor changes elsewhere.
Moe Jette
SchedMD LLC
Quoting Andy Riebs:
Hi Michel,
Some of the things that you should consider as you approach
submitting your changes to SLURM:
* SLURM already has the PMI interface (and I see that someone is
working on PMI2); do you require more support than the PMI
interface, or SPANK plugins, could provide? It might be helpful to
identify specific hooks that you need -- others on the list may be
able to identify existing mechanisms.
* Are you introducing new functionality that might be of more
general use? This may relate to the previous question.
* You mentioned a concern with high CPU counts. The BlueGene code
offers an excellent example of "the SLURM way" to handle those
problems.
* Are your changes implemented so that they will have little or no
impact on those who choose not to use them? (This should also be
viewed from the point of view of maintaining the code.)
Changing 72 files is a huge change. Clearly I speak only on behalf
of myself, but the SLURM community can be of best help if we
understand the pieces of the puzzle, and have a chance to ensure
that the changes that you require will also meet the needs of the
rest of the community.
Best regards,
Andy
On 11/24/2011 01:31 PM, Michel Bourget wrote:
Hi all,
It's about time I report to this mailing list what "SGI did to SLURM".
Short story:
FYI, we are releasing (and supporting) the "SGI SLURM" product on SGI
platforms this November.
It is based on version 2.2.7. For the user, it simply introduces the
"sgimpi" MPI plugin.
Long story:
SGI MPI integration was not trivial, since we use the native SGI MPI
launcher (array services) underneath slurmstepd. We have introduced
the notion of "strack", which allows jobs launched outside SLURM's
scope to be tracked process-wise (proctrack) and accounting-wise
(job_acct_gather). This in turn introduces a "sentinel" thread in
slurmstepd, responsible for adding pgids that were not launched
under the slurmstepd umbrella. Those additional pgids are
communicated by strack using a simple mailbox file mechanism
(slurm.sentinel.<job>.<step>). Essentially, in addition to the native
slurmstepd child monitoring, we are adding hooks to monitor
out-of-band pgids via the newly introduced strack/sentinel mechanism.
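As a rough sketch of how a mailbox file of this kind could work: the actual file format and location used by strack are not described in this message, so the one-pgid-per-line layout, the spool directory, and all function names below are assumptions made purely for illustration.

```python
import os
import tempfile


def mailbox_path(spool_dir, job_id, step_id):
    # File name follows the slurm.sentinel.<job>.<step> pattern from the text;
    # spool_dir is a hypothetical location.
    return os.path.join(spool_dir, f"slurm.sentinel.{job_id}.{step_id}")


def post_pgid(path, pgid):
    # Writer side (the role strack plays): append one pgid per line.
    # O_APPEND makes each write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, f"{pgid}\n".encode("ascii"))
    finally:
        os.close(fd)


def read_new_pgids(path, offset):
    # Reader side (the role of the sentinel thread): return the pgids
    # appended since `offset`, plus the offset to resume from next time.
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
    except FileNotFoundError:
        return [], offset
    # Consume complete lines only; a partial trailing line waits for later.
    end = data.rfind(b"\n") + 1
    pgids = [int(tok) for tok in data[:end].decode("ascii").split()]
    return pgids, offset + end


if __name__ == "__main__":
    path = mailbox_path(tempfile.mkdtemp(), 42, 0)
    post_pgid(path, 1234)
    print(read_new_pgids(path, 0))
```

The reader keeps a byte offset so that each poll only picks up pgids posted since the previous one; a real sentinel thread would then hand those pgids to the proctrack and job_acct_gather code.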
The resulting source patches are not, in our opinion, ready to be
proposed on this mailing list yet, for the following reasons:
- we would need to rebase on 2.3 and/or 2.4. Can someone confirm?
- the source patches are quite large.
initd.sysconfig.patch  :  3 files changed,   37 insertions(+), 16 deletions(-)
sentinel.patch         : 50 files changed, 3334 insertions(+), 28 deletions(-)
sgimpi.patch           : 18 files changed, 1089 insertions(+),  5 deletions(-)
slurm.modulefile.patch :  1 file changed,    28 insertions(+)
We need some guidance on a process, acceptable to the SLURM community,
for submitting the above patches. I presume a documented approach
(details, dos and don'ts, rationale, ...) is probably required.
Note that the source RPM is, of course, shipped on the SGI SLURM ISO.
Please let me know if you'd like to look at it.
We hope to integrate the above into the stock SLURM release in
the following year.
- we believe a safe soak time (customer-reported bugs, etc.) is
necessary.
- the initial SGI release supports ALTIX ICE clusters. We don't support
large SSI systems yet (a 1024-core UV, for example) because such big
machines would require additional optimizations. In particular,
proctrack/job_acct_gather need to relieve the pressure of reading
the entire /proc/<all_pids>/stat. Why? Because, on an idle 512-CPU
machine, we have:
#nproc=8867 #kthreads=8813 kthreads/nproc= 99.39%
In other words, it is wasteful to rescan the kthreads over and over
for every user job and every step.
I am working on a separate solution (GPL) that scans those kthreads
once and for all and shares the result, hence relieving the pressure.
That separate solution would then be integrated into SLURM as an
optional feature:
- dlopen the optional library
- if there: use it
- else : continue as before.
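The three steps above amount to a common optional-dependency pattern. A minimal sketch follows, using Python's ctypes (which calls dlopen() on POSIX systems) rather than the C that SLURM itself is written in. The library name and the two scanner labels are made up for illustration and do not come from the SGI patches.

```python
import ctypes


def load_optional_library(name):
    # ctypes.CDLL wraps dlopen() on POSIX; a missing or unloadable
    # library raises OSError, which we treat as "not installed".
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


def pick_scanner(lib_name="libkthread_cache_example.so"):
    # lib_name is a hypothetical name, used only for this sketch.
    lib = load_optional_library(lib_name)
    if lib is not None:
        # "if there: use it"
        return "optional library"
    # "else: continue as before"
    return "built-in scan"


if __name__ == "__main__":
    print(pick_scanner())
```

The point of the design is that sites without the optional library see no change in behaviour, which matches the concern raised earlier in the thread about changes having little or no impact on those who do not use them.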
In addition, the SGI MPI plugin would require some adjustments for
SSI machines.
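To illustrate the kthread overhead described above, here is a small sketch that classifies /proc/<pid>/stat entries as kernel threads by testing the PF_KTHREAD bit in the flags field (field 9, per proc(5) and the kernel's linux/sched.h). This is not taken from the SGI patches; the function names are hypothetical, and the approach is only one plausible way to skip kthreads.

```python
import os

PF_KTHREAD = 0x00200000  # kernel-thread bit in the task flags (linux/sched.h)


def is_kernel_thread(stat_line):
    # comm (field 2) is parenthesised and may itself contain spaces or ')',
    # so split on the last ')' before tokenising the remaining fields.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); flags is field 9, i.e. rest[6].
    return bool(int(rest[6]) & PF_KTHREAD)


def read_proc_stat_lines(proc="/proc"):
    # Yield the contents of every /proc/<pid>/stat that can be read.
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                yield f.read()
        except OSError:
            continue  # the process exited between listdir() and open()


def kthread_share(stat_lines):
    # Return (number of kernel threads, total number of entries).
    lines = list(stat_lines)
    kthreads = sum(1 for line in lines if is_kernel_thread(line))
    return kthreads, len(lines)


if __name__ == "__main__" and os.path.isdir("/proc/1"):
    k, n = kthread_share(read_proc_stat_lines())
    print(f"#nproc={n} #kthreads={k} kthreads/nproc={100.0 * k / n:.2f}%")
```

A scan-once cache would run this classification a single time and let every job step skip the pids already known to be kernel threads, instead of re-reading their stat files on each pass.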
Cheers