Hi all,

Over the weekend I submitted many thousands of jobs through Slurm. These
jobs are submitted using a script that edits a file with the simulation
parameters and calls sbatch on it. All of these simulations belong to the
same "group": due to their Monte Carlo nature, I need to run many of them
to gather good statistics for the problem.
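To be concrete, my submission loop looks roughly like this (a simplified sketch; the file names, parameter names, and `run_simulation.sh` wrapper are placeholders, not my actual script):

```shell
#!/bin/sh
# Sketch of the submission loop: for each run, rewrite the parameter
# file and submit it with sbatch. Names here are illustrative only.
for seed in 1 2 3; do
    # write the simulation parameters for this run
    cat > params.in <<EOF
seed = ${seed}
steps = 100000
EOF
    # submit the job; guarded so the sketch also runs where sbatch is absent
    if command -v sbatch >/dev/null 2>&1; then
        sbatch run_simulation.sh params.in
    else
        echo "would submit: sbatch run_simulation.sh params.in (seed=${seed})"
    fi
done
```

In reality the loop runs over thousands of parameter sets, which is how the queue fills up with one group at a time.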

My issue is that I need to submit many such groups of simulations, each
requiring thousands of runs, but I don't want a single group to monopolise
the whole cluster until all of its runs are done. What I want is for
simulations from different "groups" to run alternately, so that even before
any group has finished I can slowly start building up statistics for each
group and see whether something is emerging or whether I need to cancel,
change parameters, etc.

Also, sharing the cluster would be easier: I could submit my thousands of
jobs, and another user could still run something before they are all done.

Would something like that be possible? I hope I was not too confusing. I
don't use any external scheduler for now (SchedulerType=sched/backfill); is
Slurm able to achieve something like that? If not, what would you suggest?

Thanks a lot for your answers!

Regards,

Nicolas
