Try this:
[root@dmc197 ~]# sbatch ls_test
sbatch: error:
Job requested Min Nodes: 4294967294.
sbatch: error: Batch job submission failed: Unspecified error
[root@dmc197 ~]# sbatch -N 2 ls_test
sbatch: error:
Job requested Min Nodes: 2.
sbatch: error: Batch job submission failed: Unspecified error
[root@dmc197 ~]# cat /etc/slurm/job_submit.lua
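function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Report the node count exactly as slurmctld passes it in
    local test_min_nodes = job_desc.min_nodes
    local error_verbose = string.format("Job requested Min Nodes: %s.\n",
                                        test_min_nodes)
    slurm.log_user("\n%s", error_verbose)
    -- Reject every job so the message is always shown (testing only)
    return slurm.ERROR
end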
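
A note on the first number: 4294967294 is 0xfffffffe, which slurm.h
defines as NO_VAL, so it most likely means "no node count was given"
rather than a bad pointer. That fits the second run, where -N 2 comes
through as 2. Below is a minimal sketch of treating that value as
"unset". The stub table at the top is only an assumption so the logic
can be tried with a plain Lua interpreter outside slurmctld, and
describe_min_nodes is a hypothetical helper name. I believe the plugin
exposes the constant as slurm.NO_VAL; if your build does not, compare
against the literal instead.

```lua
-- Sketch only. The stub stands in for the `slurm` table that slurmctld
-- normally provides; it is skipped when the real one already exists.
slurm = slurm or {
    SUCCESS = 0,
    ERROR = -1,
    NO_VAL = 4294967294,  -- 0xfffffffe, NO_VAL in slurm.h
    log_user = function(fmt, ...) print(string.format(fmt, ...)) end,
}

-- Hypothetical helper: describe min_nodes, treating NO_VAL as "not set".
function describe_min_nodes(job_desc)
    local n = job_desc.min_nodes
    if n == nil or n == slurm.NO_VAL then
        return "Job did not request a node count."
    end
    return string.format("Job requested Min Nodes: %d.", n)
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- Tell the submitting user what was seen, then accept the job
    slurm.log_user("%s", describe_min_nodes(job_desc))
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
```

With this in place, a plain "sbatch ls_test" should report that no node
count was requested, and "sbatch -N 2 ls_test" should report 2.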
--
Nicholas McCollum
HPC Systems Administrator
Alabama Supercomputer Authority
On Wed, 2017-06-28 at 13:51 -0600, Nathan Vance wrote:
> Correction (copy/pasted wrong thing): It was the
> "JobSubmitPlugins=lua" line in slurm.conf, not "job_submit.lua:
> initialized", that did the trick.
>
> At least, I thought that was the end of the story. Now I'm getting
> odd errors with reading job_desc and part_list that behave, in my
> estimate, like lua's receiving a bad pointer to the underlying c data
> structure.
>
> On ubuntu, the unedited job_submit.lua provided with the sample code
> runs without crashing, though it does not respect the
> --partition="foo" flag in sbatch as the source code suggests it should.
> When edited to include slurm.log_info("bar"), the script crashes
> with:
> /etc/slurm/job_submit.lua:38: attempt to compare number with nil
> The fact that behaviour changes based on the presence of unrelated
> code makes me think that this is a pointer issue, but I don't know
> enough about the compilation of lua to bytecode to diagnose it.
>
> On centos, with or without the log command, it crashes at the same
> point as on ubuntu.
>
> On both:
> When I comment out the example code so that it doesn't crash, then
> try to print out values in job_desc, I get some really odd results.
> For example, job_desc.min_nodes is 4294967294 (on both systems),
> regardless of what I set with sbatch job.sh --nodes=X. At first I
> thought that slurm gave my lua script a bad pointer to something that
> had already been garbage collected, but then I discovered that if I
> hard code something in lua such as job_desc.min_nodes=X, then slurm
> assigns X nodes to the job. So perhaps slurm respects what lua
> populates job_desc with, but slurm initially fills it with arbitrary
> values?
>
> Here's the lua script I used for the above experiments:
> ======== BEGIN job_submit.lua ========
> function slurm_job_submit(job_desc, part_list, submit_uid)
>     slurm.log_info(job_desc.min_nodes)
>     job_desc.min_nodes=5
>     return slurm.SUCCESS
> end
>
> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>     return slurm.SUCCESS
> end
>
> slurm.log_info("initialized")
> return slurm.SUCCESS
> ======== END job_submit.lua ========
>
> As an aside, it looks like job_desc uses job_descriptor under the
> hood:
> https://github.com/SchedMD/slurm/blob/master/slurm/slurm.h.in#L1373-L1553
> As I wasn't positive, I experimented first using job_desc.qos, which
> Nicholas indicated should be supported, but while it exhibited
> similar behaviour to min_nodes, it didn't fail quite as
> spectacularly.
> I couldn't figure out what structure backs part_list. The
> documentation at https://slurm.schedmd.com/job_submit_plugins.html
> isn't clear when all it says is that it's a "List of pointer to
> partitions which this user is authorized to use." [sic]
>
> I'm still using slurm 17.02.5. On ubuntu I'm using lua5.2, and on
> centos it's lua5.1. In both cases, lua (both the interpreter and the
> dev libraries) were installed from the repositories, and slurm was
> built from source.
>
> It seems like I filled an email with a whole lot of complaints and no
> real questions. So, is this a configuration error on my end? Should I
> suck it up and write my plugin in c, even though I don't need full
> access to slurmctld? Should I switch to using slurm-wlm? Should I
> open a bug report?
>
> Thanks,
> Nathan
>
> On 27 June 2017 at 17:07, Nicholas McCollum <[email protected]>
> wrote:
> > Nathan,
> >
> > I have very much appreciated the job_submit.lua plugin for helping
> > educate users on what is an acceptable job. It is one of my favorite
> > features of SLURM and has been invaluable in assisting students in
> > submitting valid job requirements.
> >
> > If a user specifies some absurd amount of memory, or some other
> > sbatch or srun parameter, or does not choose a parameter, I like to
> > notify the user what they have done wrong. For example, I require
> > all users to specify a QoS when they submit a job.
> >
> > ====== BEGIN EXAMPLE job_submit.lua ======
> >
> > function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
> >     return slurm.SUCCESS
> > end
> >
> > function slurm_job_submit(job_desc, part_list, submit_uid)
> >
> >     --[[ Start with an error count of 0 ]]--
> >     local asc_error = 0
> >     local asc_error_verbose = ""
> >
> >     --[[ Pretend if statement ]]--
> >     asc_error = asc_error + 1
> >     asc_error_verbose = string.format(
> >         "%s\nERROR: Job requested something we don't like.\n",
> >         asc_error_verbose)
> >     --[[ End Pretend if statement ]]--
> >
> >     --[[ Pretend if statement ]]--
> >     asc_error = asc_error + 1
> >     asc_error_verbose = string.format("%s\nERROR: More bad stuff.\n",
> >         asc_error_verbose)
> >     --[[ End Pretend if statement ]]--
> >
> >     if asc_error > 0 then
> >         slurm.log_user("\n%s", asc_error_verbose)
> >         return slurm.ERROR
> >     end
> >
> >     --[[ Want to return slurm.SUCCESS if the entire script runs to end ]]--
> >     return slurm.SUCCESS
> > end
> >
> > ====== END EXAMPLE job_submit.lua =======
> >
> > This is the method that I worked out: it collects all of the errors
> > in asc_error_verbose and dumps them out at the end with return
> > slurm.ERROR. If you use the file above as it stands, it will reject
> > every job with the errors above. This would be a great way to check
> > that job_submit.lua is working on your system. If you have any
> > current jobs though, it will kill them all... so use this on a
> > development environment for testing.
> >
> > My example for making a user specify a QoS:
> >
> > local asc_qos = job_desc.qos
> > if asc_qos == nil then
> >     asc_error = asc_error + 1
> >     asc_error_verbose = string.format(
> >         "%s\nJob must request a QoS using the --qos= flag.\n",
> >         asc_error_verbose)
> >     asc_qos = "invalid"
> > end
> >
> >
> > I'd be more than happy to share my job_submit.lua if anyone is
> > interested. I only ask that you share yours back.
> >
> > --
> > Nicholas McCollum
> > HPC Systems Administrator
> > Alabama Supercomputer Authority
> >
> > On Tue, 2017-06-27 at 14:30 -0600, Nathan Vance wrote:
> > > Darby,
> > >
> > > The "job_submit.lua: initialized" line in slurm.conf was indeed the
> > > issue. When compiling slurm I only got the "yes lua" line without
> > > the flags, but that seems to be just a difference in OSes.
> > >
> > > Now that I have debugging feedback I should be good to go!
> > >
> > > Thanks,
> > > Nathan
> > >
> > > On 27 June 2017 at 16:13, Vicker, Darby (JSC-EG311)
> > > <darby.vicker-1@nasa.gov> wrote:
> > > > We recently started using a lua job submit plugin as well. You
> > > > have to have the lua-devel package installed when you compile
> > > > slurm. It looks like you do (we use RHEL, where the package name
> > > > is lua-devel), but confirm that you see something like these in
> > > > config.log:
> > > >
> > > > configure:24784: result: yes lua
> > > > pkg_cv_lua_LIBS='-llua -lm -ldl '
> > > > lua_CFLAGS=' -DLUA_COMPAT_ALL'
> > > > lua_LIBS='-llua -lm -ldl '
> > > >
> > > > Do you have this in your slurm.conf?
> > > >
> > > > JobSubmitPlugins=lua
> > > >
> > > > I'm guessing not given you don't see anything in the logs.
> > > > Before I got all the errors worked out, I would see errors like
> > > > this in slurmctld_log:
> > > >
> > > > error: Couldn't find the specified plugin name for job_submit/lua looking at all files
> > > > error: cannot find job_submit plugin for job_submit/lua
> > > > error: cannot create job_submit context for job_submit/lua
> > > > failed to initialize job_submit plugin
> > > >
> > > >
> > > > After getting everything working, you should see this:
> > > >
> > > > job_submit.lua: initialized
> > > >
> > > > As well as any other slurm.log_info messages you put in your lua
> > > > script.
> > > >
> > > >
> > > > From: Nathan Vance <[email protected]>
> > > > Reply-To: slurm-dev <[email protected]>
> > > > Date: Tuesday, June 27, 2017 at 12:15 PM
> > > > To: slurm-dev <[email protected]>
> > > > Subject: [slurm-dev] Job Submit Lua Plugin
> > > >
> > > > Hello all!
> > > >
> > > > I've been working on getting off the ground with Lua plugins.
> > > > The goal is to implement Torque's routing queues for SLURM, but
> > > > so far I have been unable to get SLURM to even call my plugin.
> > > >
> > > > What I have tried:
> > > > 1) Copied contrib/lua/job_submit.lua to /etc/slurm/ (the same
> > > > directory as slurm.conf)
> > > > 2) Restarted slurmctld and verified that no functionality was
> > > > broken
> > > > 3) Added slurm.log_info("I got here") to several points in the
> > > > script. After restarting slurmctld and submitting a job,
> > > > grep "I got here" -R /var/log found no results.
> > > > 4) In case there was a problem with the log file, I added
> > > > os.execute("touch /home/myUser/slurm_job_submitted") to the top
> > > > of the slurm_job_submit method. Restarting slurmctld and
> > > > submitting a job still produced no evidence that my plugin was
> > > > called.
> > > > 5) In case there were permission issues, I made job_submit.lua
> > > > executable. Nothing. Even grep "job_submit" -R /var/log (in case
> > > > there was an error calling the script) comes up dry.
> > > >
> > > > Relevant information:
> > > > OS: Ubuntu 16.04
> > > > Lua: lua5.2 and liblua5.2-dev (I can use Lua interactively)
> > > > SLURM version: 17.02.5, compiled from source (after installing
> > > > Lua) using ./configure --prefix=/usr --sysconfdir=/etc/slurm
> > > >
> > > > Any guidance to get me up and running would be greatly
> > > > appreciated!
> > > >
> > > > Thanks,
> > > > Nathan
> > >
> > >
>
>