Hi all,
We have several clusters here, each running either 14.11.6 or 15.08.6. We
want users to be able to submit jobs to multiple clusters, but not
necessarily to "all" of them, and without having to type a list of clusters.
The main problems with "all" are that users get warnings about an
invalid account on some clusters, and that submitting a job from a 15.08.6
sbatch binary to a 14.11.6 slurmctld causes the slurmctld to crash
(until the state/hash.?/job.* files are deleted).
I thought of writing a spank plugin, but I can only affect the clusters
(through the SLURM_CLUSTERS environment variable) in spank_init, where
the users' options are not available yet; by spank_init_post_opt it is
already too late to set the clusters.
The best option I've found is to write a wrapper around sbatch (like
mslurm), but such wrappers sometimes become complicated when handling
escaped characters (spaces and apostrophes).
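For what it's worth, most of the escaping trouble comes from rebuilding a command string and letting a shell re-parse it. A wrapper that never goes through a shell, and instead hands the original argv list straight to sbatch via exec, passes spaces and apostrophes through untouched. Below is a minimal sketch of that idea, assuming a hypothetical site policy that maps the -A/--account value to a list of clusters; the wrapper name, the account names and the cluster names (clusterA, clusterB, clusterC) are all made up for illustration. Only sbatch's -M/--clusters option is real.

```python
#!/usr/bin/env python3
"""Hypothetical sbatch wrapper sketch: picks clusters, never re-quotes args."""
import os
import sys

# Hypothetical site policy: which clusters each account may submit to.
ACCOUNT_CLUSTERS = {
    "physics": ["clusterA", "clusterB"],
    "bio": ["clusterC"],
}
DEFAULT_CLUSTERS = ["clusterA"]  # hypothetical fallback


def pick_clusters(argv):
    """Return the clusters for the account given via -A/--account, else default.

    Simplification: scans every argument, including ones that follow the
    batch script name and really belong to the script.
    """
    account = None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("-A", "--account") and i + 1 < len(argv):
            account = argv[i + 1]
            i += 2
            continue
        if arg.startswith("--account="):
            account = arg.split("=", 1)[1]
        elif arg.startswith("-A") and len(arg) > 2:
            account = arg[2:]
        i += 1
    return ACCOUNT_CLUSTERS.get(account, DEFAULT_CLUSTERS)


def build_command(argv):
    """Build the sbatch argv; user arguments are passed through as-is."""
    # If the user already chose clusters explicitly, respect that.
    if any(a == "--clusters" or a.startswith(("--clusters=", "-M")) for a in argv):
        return ["sbatch"] + argv
    return ["sbatch", "--clusters=" + ",".join(pick_clusters(argv))] + argv


# Sketch simplification: only exec when arguments are given, so a bare
# invocation (sbatch reading the script from stdin) is not handled here.
if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = build_command(sys.argv[1:])
    # execvp replaces this process; no shell is involved, so no quoting issues.
    os.execvp(cmd[0], cmd)
```

The cluster table is also the natural place to keep a given account away from clusters whose slurmctld version does not match the client, though whether that is enough in practice depends on the site. Setting os.environ["SLURM_CLUSTERS"] before the exec would be an equivalent alternative to prepending --clusters.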
Does anyone know of other options?
And, somewhat related: what is the current status of inter-cluster
options? The last reference I found was from 2015-10-07, commit 0f6bf40:
    Remove SICP job option
    This was intended as a step toward managing jobs across mutliple
    clusters, but we will be pursuing a very different design.
Thanks in advance,
Yair.