Hi, 

I'm trying Slurm 14.03.0, in particular the new feature that allows
job_submit.lua to print an error message to our customers' shell upon job
submission.
From what I understood from the code, one can print to the user's shell by
calling the function "log_user" with a string as an argument. This works for
the first job submission, but after that Slurm crashes.

Here's a snippet of my code where you can see how I call the log_user function: 

################## job_submit.lua ################################### 
function slurm_job_submit ( job_desc, part_list, submit_uid )
    setmetatable (job_desc, job_req_meta)
    local part_rec = _build_part_table (part_list)

    -- *** YOUR LOGIC GOES BELOW ***
    -- print (job_desc.num_tasks, job_desc.min_nodes, job_desc.max_nodes, job_desc.partition)

    -- Call function check_bq, which checks the billing quota. If the quota is
    -- exceeded, return with the status 2050.
    if not check_bq(submit_uid, job_desc.group_id) then
        log_user("Job aborted because you project is over quota")
        return 2050
    end
    return 0
end

function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    setmetatable (job_desc, job_req_meta)
    setmetatable (job_rec, job_rec_meta)

    -- *** YOUR LOGIC GOES BELOW ***

    return 0
end
##################################################################### 


This is what happens at the user shell: 

##################################################################### 
$ srun -N1 hostname 
srun: error: Job aborted because you project is over quota 
srun: error: Unable to allocate resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)

$ srun -N1 hostname 
srun: error: slurm_receive_msg: Zero Bytes were transmitted or received 
srun: error: Unable to allocate resources: Zero Bytes were transmitted or received

$ srun -N1 hostname 
** stuck ... ** 
##################################################################### 


This is what appears in the slurmctld log:

#################### /var/log/slurm/Slurmctld.log ################### 
[2014-03-11T15:47:18.754] _slurm_rpc_allocate_resources: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
[2014-03-11T15:47:37.505] error: () Error: from job_submit_lua.c:207: (): Assertion (p[0] == 0x42) failed
##################################################################### 


Could you help me solve this issue?

Thanks in advance, 

Marco Passerini 
