As usual, I'm trying to run millions of jobs using SLURM, where each job
is a single invocation of a program with different input parameters. I
wrote a small script to do that (feedback appreciated).
We already know scheduling is slow at these queue sizes. But submission
itself is even more painful (taking hours and slowing down the scheduler).
I've been abusing job arrays to group simultaneous invocations of the
same program, and to speed up submission times.
I wrote a script to generalize job array usage so that any list of
commands can be used (as long as all the commands can share the same
scheduling policy).
It takes a command list (one per line) on stdin, and submits a single
job array allocation that uses a simple invocation trampoline.
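To make the trampoline idea concrete, here is a minimal sketch of what each
array task could do. It is not the code from the tarball: the function name
run_nth_command and the JOBFILE variable are made up for illustration, and it
assumes the command list was saved to a file visible from the compute nodes.

```shell
#!/bin/sh
# Hypothetical trampoline: run the command stored on line N of a job file.
# run_nth_command and JOBFILE are illustrative names, not part of sarrayscript.
run_nth_command() {
    jobfile=$1
    index=$2
    # Print exactly line $index (1-based), then quit so sed stops reading.
    cmd=$(sed -n "${index}{p;q;}" "$jobfile")
    if [ -z "$cmd" ]; then
        echo "no command at line $index of $jobfile" >&2
        return 1
    fi
    # Run through a shell so redirections inside the line are honoured.
    sh -c "$cmd"
}

# Inside an array task, SLURM sets SLURM_ARRAY_TASK_ID to the task's index.
# JOBFILE is assumed to be exported by whatever submitted the array.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && [ -n "${JOBFILE:-}" ]; then
    run_nth_command "$JOBFILE" "$SLURM_ARRAY_TASK_ID"
fi
```

With N commands in the file, a script like this would be submitted once with
something like "sbatch --array=1-N", so that task i runs line i.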
I use it as follows (stupid example):
  for input in in/*; do
    output="out/$(basename "$input")"
    echo "./process-me < $input > $output"
  done | sarrayscript [any sbatch arguments]
or more commonly:
  for i in $(seq 1 100); do
    for k in $(seq 1 10 5000); do
      echo "./cmd file $i $k"
    done
  done | sarrayscript
I'd like some feedback:
http://www.thregr.org/~wavexx/tmp/sarrayscript.tar.gz
In particular:
- I'm using the home directory to share the job file. I cannot use
sbcast with a randomly generated file name and just hope it won't
collide: with millions of jobs, collisions happen frequently.
- I don't like the idea of sharing a file at all. I could literally use
something like: ssh submissionhost fetch-job-at-index $SLURM_ARRAY_JOB_ID
$SLURM_ARRAY_TASK_ID to retrieve the command instead and avoid cleanup
issues (but that assumes ssh will log in without prompting, which is not
true for me).
- I use a dependent job to perform the cleanup, but I could have used a
trigger with --fini instead (this depends on whether you can use strigger
as a normal user). The cleanup script can be used with both.
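
For reference, the dependent-job variant of the cleanup can be sketched roughly
as below. This is an illustration under assumptions, not the actual script:
submit_with_cleanup, trampoline.sh and cleanup.sh are placeholder names, and it
assumes sbatch supports --parsable and --dependency=afterany.

```shell
# Hypothetical helper: submit the array, then a cleanup job that becomes
# eligible once every task of the array has ended (successfully or not).
# trampoline.sh and cleanup.sh are placeholder script names.
submit_with_cleanup() {
    jobfile=$1
    shift
    # One array task per line of the job file.
    ntasks=$(grep -c '' "$jobfile")
    # --parsable makes sbatch print only the job id (plus ";cluster" if any).
    jobid=$(sbatch --parsable --array="1-$ntasks" "$@" trampoline.sh "$jobfile") || return 1
    jobid=${jobid%%;*}
    # afterany: start once the listed job has terminated, whatever its status.
    sbatch --dependency="afterany:$jobid" cleanup.sh "$jobfile"
}
```

Any extra arguments are passed through to the first sbatch call, e.g.
"submit_with_cleanup jobs.txt --time=10".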