L,
I discovered earlier this week (after embarking on a crusade to squash
nonexistent bugs) that sbatch will not act on any flag that comes after the
filename of the script to be submitted. Everything after the filename is
handed to the script as its own arguments instead.
For example, I re-purposed a lua script that just spits out the time limit
of the job and exits with an error:
> /etc/slurm/job_submit.lua:
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     slurm.log_user("time_limit: %s", job_desc.time_limit)
>     return slurm.ERROR
> end
> function slurm_job_modify(job_desc, part_list, submit_uid)
> end
>
Here's what I get when I run it in various ways (positional argument job.sh
bolded):
# sbatch *job.sh*
sbatch: error: time_limit: 4294967294
sbatch: error: Batch job submission failed: Unspecified error
# sbatch --time=0-07:00:00 *job.sh*
sbatch: error: time_limit: 420
sbatch: error: Batch job submission failed: Unspecified error
# sbatch *job.sh* --time=0-07:00:00
sbatch: error: time_limit: 4294967294
sbatch: error: Batch job submission failed: Unspecified error
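The 420 in the second run is just 7 hours expressed in minutes, which is the unit the lua script prints. As a quick sanity check, here is a rough sketch of that conversion. to_minutes is a hypothetical helper of my own, not anything shipped with Slurm, and it assumes the full D-HH:MM:SS form with two-digit hours and minutes; seconds are dropped.

```shell
# Hypothetical helper (not part of Slurm): convert a D-HH:MM:SS time string
# to whole minutes. Assumes two-digit HH and MM and a days field without a
# leading zero; seconds are ignored in this sketch.
to_minutes() (
    spec=$1
    days=${spec%%-*}        # text before the first "-"
    rest=${spec#*-}         # HH:MM:SS
    hours=${rest%%:*}
    rest=${rest#*:}         # MM:SS
    mins=${rest%%:*}
    # Strip one leading zero so "07" or "08" is not read as an octal number.
    echo $(( $days * 1440 + ${hours#0} * 60 + ${mins#0} ))
)

to_minutes 0-07:00:00
```

The body is wrapped in parentheses so the working variables stay inside a subshell and do not leak into the calling shell.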
4294967294 = 2^32 - 2 (0xfffffffe) is the value Slurm reports when no time
limit has been set on the job, which means that on my third run of this
script, the --time argument has no effect at all! This is bad, especially
for people who are used to the command line, where almost every program uses
an argument-parsing library like getopt and behaves predictably for both the
programmer and the user.
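One way to catch this before submitting is a small pre-flight check on the argument list. The following is only a sketch: check_sbatch_args is a hypothetical name of my own, not a Slurm command, and it assumes every option is written as a single word such as --time=0-07:00:00. An option with a separate value (for example "-t 420") would be mistaken for the script name.

```shell
# Hypothetical pre-flight check (not part of Slurm). Returns non-zero if an
# option-looking word appears after the first non-option word, which is
# taken to be the script name.
# Assumption: options are single words of the form --name=value.
check_sbatch_args() {
    script=""
    for arg in "$@"; do
        case $arg in
            -*)
                if [ -n "$script" ]; then
                    echo "warning: '$arg' comes after '$script', so sbatch will not treat it as an option" >&2
                    return 1
                fi
                ;;
            *)
                # Remember only the first non-option word.
                [ -n "$script" ] || script=$arg
                ;;
        esac
    done
    return 0
}
```

Called as "check_sbatch_args job.sh --time=0-07:00:00 || echo misplaced", it prints the warning and then "misplaced"; with the flag in front of job.sh it stays silent and returns 0.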
Nathan
On 29 June 2017 at 21:32, Lachlan Musicman <[email protected]> wrote:
> We have a 40min default time on our main partition.
>
> We are finding that researchers that use
>
> #SBATCH --time=0-07:00:00
>
> are still having their jobs terminated at 40 minutes.
>
> Using slurm 17.2.04 on Centos 7.3
>
>
> Has anyone else experienced this?
>
>
> Cheers
> L.