On 7/28/22 14:56, Rob Sargent wrote:
On Jul 28, 2022, at 1:10 AM, Christian Meesters <meest...@uni-mainz.de> wrote:
Hi,
not quite. Under SLURM the job step starter (SLURM lingo) is "srun". You do not ssh from job host
to job host; rather, you use "parallel" as a semaphore to avoid oversubscription of job steps
launched with "srun". I summarized this approach here:
https://mogonwiki.zdv.uni-mainz.de/dokuwiki/start:working_on_mogon:workflow_organization:node_local_scheduling#running_on_several_hosts
(uh-oh - I need to clean up that site, many sections there are outdated, but this
one should still be OK)
One advantage: you can safely utilize the resources of both (or more) hosts -
the master host and all secondaries. How many resources you require depends on
your application and the work it does. Be sure to consider I/O (e.g. stage in
files to avoid random I/O from too many concurrent applications, etc.), if this
is an issue for your application.
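For illustration, a minimal batch-script sketch of the pattern Christian describes - GNU parallel as a semaphore, srun launching each task as a job step on the allocated nodes. The resource numbers, the input glob, and ./my_app are placeholders, and the exact flags may differ from the wiki recipe linked above:

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --cpus-per-task=1

# Run at most $SLURM_NTASKS job steps at once; parallel queues the rest.
# "srun --exclusive -N1 -n1" confines each step to its own task slot, so
# job steps are not oversubscribed (newer Slurm versions may want --exact
# instead of --exclusive for this per-step behaviour).
parallel --jobs "$SLURM_NTASKS" \
    srun --exclusive -N1 -n1 ./my_app {} ::: inputs/*.dat
```

Because srun places the steps, the tasks spread across all allocated nodes without any ssh or manual intervention.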
Cheers
Christian
Christian,
My use of GNU parallel does not include ssh. Rather, I simply fill the Slurm
node with --jobs=ncores.
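A sketch of that single-node pattern, assuming a batch job rather than an interactive session; the core count, input glob, and ./my_app are placeholders:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32

# SLURM_CPUS_PER_TASK reflects the allocation above; fall back to nproc
# when testing outside a job. parallel keeps that many tasks running on
# the one node - no ssh involved.
CORES=${SLURM_CPUS_PER_TASK:-$(nproc)}
parallel --jobs "$CORES" ./my_app {} ::: inputs/*.dat
```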
That would require an interactive job with
ncores_per_node/threads_per_application ssh connections, and you have to
trigger the script manually. My solution is to use parallel in a
SLURM-job context and avoid that manual synchronization step, while also
allowing a potential multi-node job with SMP applications. It's your
choice, of course.
Ole,
Is your suggestion that I should ssh back to my account and run the job?
Pretty sure 2FA will get in the way.
Thanks to you both,
rjs