Nathan,
I have very much appreciated the job_submit.lua plugin for helping
educate users on what is an acceptable job. It is one of my favorite
features of SLURM and has been invaluable in assisting students in
submitting valid job requirements.
If a user requests an absurd amount of memory, gives some other
unreasonable sbatch or srun parameter, or leaves out a required
parameter, I like to tell the user what they did wrong. For example, I
require all users to specify a QoS when they submit a job.
====== BEGIN EXAMPLE job_submit.lua ======
function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end

function slurm_job_submit(job_desc, part_list, submit_uid)
   --[[ Start with an error count of 0 ]]--
   local asc_error = 0
   local asc_error_verbose = ""

   --[[ Pretend if statement ]]--
   asc_error = asc_error + 1
   asc_error_verbose = string.format(
      "%s\nERROR: Job requested something we don't like.\n",
      asc_error_verbose)
   --[[ End Pretend if statement ]]--

   --[[ Pretend if statement ]]--
   asc_error = asc_error + 1
   asc_error_verbose = string.format("%s\nERROR: More bad stuff.\n",
      asc_error_verbose)
   --[[ End Pretend if statement ]]--

   --[[ Reject the job and show the user every collected error ]]--
   if asc_error > 0 then
      slurm.log_user("\n%s", asc_error_verbose)
      return slurm.ERROR
   end

   --[[ Want to return slurm.SUCCESS if the entire script runs to end ]]--
   return slurm.SUCCESS
end
====== END EXAMPLE job_submit.lua =======
This is the method that I worked out: it collects all of the errors in
asc_error_verbose and prints them at the end, just before returning
slurm.ERROR, so the user sees every problem at once instead of one per
attempt. If you use the file above as is, it will reject every job
with the errors above. This would be a great way to check that
job_submit.lua is working on your system. If you have any current jobs
though, it will kill them all... so use this on a development
environment for testing.
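The collection pattern described above can also be sketched with a
table of messages that is joined once at the end. This is only a
minimal sketch: asc_collect_errors and asc_checks are hypothetical
names, not part of the Slurm Lua API, and the two checks (a required
account and a required partition) are illustrative stand-ins for the
"pretend if statements", not a recommended policy.

```lua
-- Hypothetical helper (not part of the Slurm API): run every check and
-- gather the messages in a table instead of growing one long string.
function asc_collect_errors(job_desc, checks)
   local errors = {}
   for _, check in ipairs(checks) do
      local msg = check(job_desc)
      if msg ~= nil then
         errors[#errors + 1] = "ERROR: " .. msg
      end
   end
   -- Return the error count and all messages joined with newlines.
   return #errors, table.concat(errors, "\n")
end

-- Illustrative checks only. Each returns nil when the job is acceptable
-- and a message string when it is not.
asc_checks = {
   function(job_desc)
      if job_desc.account == nil then
         return "Job must request an account using the --account= flag."
      end
   end,
   function(job_desc)
      if job_desc.partition == nil then
         return "Job must request a partition using the --partition= flag."
      end
   end,
}

function slurm_job_submit(job_desc, part_list, submit_uid)
   local asc_error, asc_error_verbose =
      asc_collect_errors(job_desc, asc_checks)
   if asc_error > 0 then
      slurm.log_user("\n%s", asc_error_verbose)
      return slurm.ERROR
   end
   return slurm.SUCCESS
end
```

With this layout, adding a new rule means appending one function to
asc_checks, and the reject-and-report logic stays in one place.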
My example for making a user specify a QoS:
local asc_qos = job_desc.qos
if asc_qos == nil then
   asc_error = asc_error + 1
   asc_error_verbose = string.format(
      "%s\nJob must request a QoS using the --qos= flag.\n",
      asc_error_verbose)
   asc_qos = "invalid"
end
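One way to try a check like this without touching a running slurmctld
is to exercise the function under a plain Lua interpreter with a
stand-in for the slurm table. The sketch below assumes only the three
members it uses (SUCCESS, ERROR, log_user). The stub, the
asc_last_user_msg variable, and the plain tables used as job
descriptors are all made up for illustration; the real job_desc is an
object supplied by slurmctld.

```lua
-- Stub of the `slurm` table so the logic can run outside slurmctld.
-- Only the members used below are provided; the values are stand-ins.
asc_last_user_msg = nil
slurm = {
   SUCCESS = 0,
   ERROR = -1,
   log_user = function(fmt, ...)
      -- Capture the message instead of sending it to a user.
      asc_last_user_msg = string.format(fmt, ...)
   end,
}

function slurm_job_submit(job_desc, part_list, submit_uid)
   local asc_error = 0
   local asc_error_verbose = ""

   -- The QoS requirement from the example above.
   if job_desc.qos == nil then
      asc_error = asc_error + 1
      asc_error_verbose = string.format(
         "%s\nJob must request a QoS using the --qos= flag.\n",
         asc_error_verbose)
   end

   if asc_error > 0 then
      slurm.log_user("\n%s", asc_error_verbose)
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

-- Fake submissions: plain tables standing in for the job descriptor.
assert(slurm_job_submit({}, nil, 1000) == slurm.ERROR)
assert(asc_last_user_msg:find("--qos=", 1, true) ~= nil)
assert(slurm_job_submit({ qos = "normal" }, nil, 1000) == slurm.SUCCESS)
print("all checks passed")
```

Keep the stub in a separate test file rather than in the deployed
job_submit.lua, since assigning to slurm there would replace the table
that slurmctld provides.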
I'd be more than happy to share my job_submit.lua if anyone is
interested. I only ask that you share yours back.
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Tue, 2017-06-27 at 14:30 -0600, Nathan Vance wrote:
> Darby,
>
> The missing "JobSubmitPlugins=lua" line in slurm.conf was indeed the
> issue. When compiling slurm I only got the "yes lua" line without the
> flags, but that seems to be just a difference between OSes.
>
> Now that I have debugging feedback I should be good to go!
>
> Thanks,
> Nathan
>
> On 27 June 2017 at 16:13, Vicker, Darby (JSC-EG311)
> <darby.vicker-1@nasa.gov> wrote:
> > We recently started using a lua job submit plugin as well. You
> > have to have the lua-devel package installed when you compile
> > slurm. It looks like you do (we use RHEL, where the package name
> > is lua-devel), but confirm that you see something like this in
> > config.log:
> >
> > configure:24784: result: yes lua
> > pkg_cv_lua_LIBS='-llua -lm -ldl '
> > lua_CFLAGS=' -DLUA_COMPAT_ALL'
> > lua_LIBS='-llua -lm -ldl '
> >
> > Do you have this in your slurm.conf?
> >
> > JobSubmitPlugins=lua
> >
> > I'm guessing not given you don't see anything in the logs. Before I
> > got all the errors worked out, I would see errors like this in
> > slurmctld_log:
> >
> > error: Couldn't find the specified plugin name for job_submit/lua
> > looking at all files
> > error: cannot find job_submit plugin for job_submit/lua
> > error: cannot create job_submit context for job_submit/lua
> > failed to initialize job_submit plugin
> >
> >
> > After getting everything working, you should see this:
> >
> > job_submit.lua: initialized
> >
> > As well as any other slurm.log_info messages you put in your lua
> > script.
> >
> >
> > From: Nathan Vance <[email protected]>
> > Reply-To: slurm-dev <[email protected]>
> > Date: Tuesday, June 27, 2017 at 12:15 PM
> > To: slurm-dev <[email protected]>
> > Subject: [slurm-dev] Job Submit Lua Plugin
> >
> > Hello all!
> >
> > I've been working on getting off the ground with Lua plugins. The
> > goal is to implement Torque's routing queues for SLURM, but so far
> > I have been unable to get SLURM to even call my plugin.
> >
> > What I have tried:
> > 1) Copied contrib/lua/job_submit.lua to /etc/slurm/ (the same
> > directory as slurm.conf)
> > 2) Restarted slurmctld and verified that no functionality was
> > broken
> > 3) Added slurm.log_info("I got here") to several points in the
> > script. After restarting slurmctld and submitting a job, grep "I
> > got here" -R /var/log found no results.
> > 4) In case there was a problem with the log file, I added
> > os.execute("touch /home/myUser/slurm_job_submitted") to the top of
> > the slurm_job_submit method. Restarting slurmctld and submitting a
> > job still produced no evidence that my plugin was called.
> > 5) In case there were permission issues, I made job_submit.lua
> > executable. Nothing. Even grep "job_submit" -R /var/log (in case
> > there was an error calling the script) comes up dry.
> >
> > Relevant information:
> > OS: Ubuntu 16.04
> > Lua: lua5.2 and liblua5.2-dev (I can use Lua interactively)
> > SLURM version: 17.02.5, compiled from source (after installing Lua)
> > using ./configure --prefix=/usr --sysconfdir=/etc/slurm
> >
> > Any guidance to get me up and running would be greatly appreciated!
> >
> > Thanks,
> > Nathan
>
>