You'll need this too:
https://github.com/SchedMD/slurm/commit/e25249684af250eb65e424eaf12ff4755c0d0af1.patch
Quoting Tommi T <tommi_...@yahoo.com>:
Hi,
I pulled your fix, but it seems that it does not help:
[2014-03-11T16:54:40.587] _slurm_rpc_allocate_resources: Job
violates accounting/QOS policy (job submit limit, user's size and/or
time limits)
[2014-03-11T16:54:41.982] _slurm_rpc_allocate_resources: Job
violates accounting/QOS policy (job submit limit, user's size and/or
time limits)
[2014-03-11T16:54:41.982] error: () Error: from
slurm_protocol_defs.c:434: (): Assertion (p[0] == 0x42) failed
Best Regards,
Tommi Tervo
CSC
On Tuesday, March 11, 2014 4:35 PM, Moe Jette <je...@schedmd.com> wrote:
I don't have time to test this right now, but I believe the commit below
will fix the problem by initializing a variable to NULL.
https://github.com/SchedMD/slurm/commit/e3363b95b0cedd4972c8c7b8dc87a1750f6bc3dd
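For context, the logged assertion (p[0] == 0x42) reads like a magic-byte check on an allocation header: the free routine verifies that the pointer it was handed really came from the matching allocator. A pointer that was never set to NULL and still holds stale data would fail that check, while a NULL pointer is skipped safely. Below is a minimal sketch in C of that idea. The names sketch_alloc, sketch_free, SKETCH_MAGIC and SKETCH_HDR are hypothetical and this is not Slurm's actual allocator code; only the value 0x42 is taken from the log above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAGIC 0x42                 /* value seen in the logged assertion */
#define SKETCH_HDR   sizeof(max_align_t)  /* header size that keeps user data aligned */

/* Allocate size zeroed bytes, preceded by a header whose first byte is the magic. */
void *sketch_alloc(size_t size)
{
    unsigned char *p = calloc(1, SKETCH_HDR + size);
    if (p == NULL)
        return NULL;
    p[0] = SKETCH_MAGIC;
    return p + SKETCH_HDR;
}

/*
 * Free *item and reset it to NULL.
 * Returns 0 on success, or -1 if the magic byte is missing. A checked
 * allocator would typically assert and abort at that point; returning an
 * error code keeps this sketch easy to exercise.
 */
int sketch_free(void **item)
{
    assert(item != NULL);
    if (*item == NULL)
        return 0;                         /* NULL is a safe no-op */
    unsigned char *p = (unsigned char *)*item - SKETCH_HDR;
    if (p[0] != SKETCH_MAGIC)
        return -1;                        /* not from sketch_alloc, or already freed */
    p[0] = 0;                             /* clear the magic before releasing */
    free(p);
    *item = NULL;
    return 0;
}
```

With this layout, a pointer initialized to NULL can be passed to sketch_free() on every cleanup path without harm, whereas any other value that did not come from sketch_alloc() is rejected by the magic check.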
Quoting Marco Passerini <marco.passer...@csc.fi>:
Hi,
I'm trying Slurm 14.03.0, and in particular the new feature which
allows job_submit.lua to print an error message to the shell of
our customers upon job submission.
From what I understood from the code, one can print to the user's shell
by calling the function "log_user" with a string as an argument.
This seems to work at the first job submission, but after that Slurm
crashes.
Here's a snippet of my code where you can see how I call the
log_user function:
################## job_submit.lua ###################################
function slurm_job_submit ( job_desc, part_list, submit_uid )
    setmetatable (job_desc, job_req_meta)
    local part_rec = _build_part_table (part_list)
    -- *** YOUR LOGIC GOES BELOW ***
    -- print (job_desc.num_tasks, job_desc.min_nodes,
    --        job_desc.max_nodes, job_desc.partition)
    -- Call function check_bq which checks the billing quota. If the quota is
    -- exceeded return with the status 2050.
    if (not check_bq(submit_uid, job_desc.group_id)) then
        log_user("Job aborted because you project is over quota")
        return 2050
    end
    return 0
end

function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    setmetatable (job_desc, job_req_meta)
    setmetatable (job_rec, job_rec_meta)
    -- *** YOUR LOGIC GOES BELOW ***
    return 0
end
#####################################################################
This is what happens at the user shell:
#####################################################################
$ srun -N1 hostname
srun: error: Job aborted because you project is over quota
srun: error: Unable to allocate resources: Job violates
accounting/QOS policy (job submit limit, user's size and/or time
limits)
$ srun -N1 hostname
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received
srun: error: Unable to allocate resources: Zero Bytes were
transmitted or received
$ srun -N1 hostname
** stuck ... **
#####################################################################
This is what is logged in the logs:
#################### /var/log/slurm/Slurmctld.log ###################
[2014-03-11T15:47:18.754] _slurm_rpc_allocate_resources: Job
violates accounting/QOS policy (job submit limit, user's size and/or
time limits)
[2014-03-11T15:47:37.505] error: () Error: from
job_submit_lua.c:207: (): Assertion (p[0] == 0x42) failed
#####################################################################
Could you help me solve the issue?
Thanks in advance,
Marco Passerini