Hi,
I'm trying Slurm 14.03.0, and in particular the new feature which allows the
job_submit.lua to print an error message to the shell of our customers upon job
submission.
From what I understood from the code, one can print to the user's shell by
calling the function "log_user" with a string as an argument. This seems to
work for the first job submission, but after that Slurm crashes.
Here's a snippet of my code where you can see how I call the log_user function:
################## job_submit.lua ###################################
function slurm_job_submit ( job_desc, part_list, submit_uid )
    setmetatable (job_desc, job_req_meta)
    local part_rec = _build_part_table (part_list)
    -- *** YOUR LOGIC GOES BELOW ***
    -- print (job_desc.num_tasks, job_desc.min_nodes, job_desc.max_nodes,
    --        job_desc.partition)
    -- Call function check_bq which checks the billing quota. If the quota is
    -- exceeded return with the status 2050.
    if(not check_bq(submit_uid, job_desc.group_id)) then
        log_user("Job aborted because you project is over quota")
        return 2050
    end
    return 0
end
function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    setmetatable (job_desc, job_req_meta)
    setmetatable (job_rec, job_rec_meta)
    -- *** YOUR LOGIC GOES BELOW ***
    return 0
end
#####################################################################
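To help separate the Lua logic from the plugin's C side, the rejection path of slurm_job_submit above can be exercised outside slurmctld. The following is only a minimal sketch under stated assumptions: log_user and check_bq are replaced by stand-ins (the real log_user is provided by Slurm, and the real check_bq is site-specific and not shown here), the "group 100 is over quota" rule is made up for illustration, and the setmetatable and _build_part_table calls are omitted because they need the Slurm environment.

```lua
-- Stand-in for the log_user function that Slurm provides to job_submit.lua.
-- Here it only records the messages so they can be inspected afterwards.
local messages = {}
function log_user(msg)
    messages[#messages + 1] = msg
end

-- Hypothetical quota check; the real check_bq is site-specific.
-- For illustration, group 100 is treated as being over quota.
function check_bq(uid, gid)
    return gid ~= 100
end

-- Same decision logic as slurm_job_submit above, without the
-- Slurm-specific metatable and partition-table setup.
function slurm_job_submit(job_desc, part_list, submit_uid)
    if not check_bq(submit_uid, job_desc.group_id) then
        log_user("Job aborted because you project is over quota")
        return 2050
    end
    return 0
end

-- Submit twice in a row, mirroring the two consecutive srun attempts.
local rc1 = slurm_job_submit({ group_id = 100 }, {}, 1000)
local rc2 = slurm_job_submit({ group_id = 100 }, {}, 1000)
print(rc1, rc2, #messages)
```

If both calls return 2050 and two messages are recorded, the Lua side behaves the same on repeated submissions, which would suggest looking at how the plugin handles the message on the C side rather than at the script.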
This is what happens at the user shell:
#####################################################################
$ srun -N1 hostname
srun: error: Job aborted because you project is over quota
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
$ srun -N1 hostname
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
srun: error: Unable to allocate resources: Zero Bytes were transmitted or received
$ srun -N1 hostname
** stuck ... **
#####################################################################
This is what is logged in the logs:
#################### /var/log/slurm/Slurmctld.log ###################
[2014-03-11T15:47:18.754] _slurm_rpc_allocate_resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
[2014-03-11T15:47:37.505] error: () Error: from job_submit_lua.c:207: (): Assertion (p[0] == 0x42) failed
#####################################################################
Could you help me solve this issue?
Thanks in advance,
Marco Passerini