On Mon, 2 May 2011 09:18:53 -0700, "[email protected]" <[email protected]>
wrote:
> Thanks for that pointer. Not to beat a dead horse, but my problem is
> that --exclusive works in two modes. In immediate mode (for lack of a
> better term), I rarely want --exclusive, but when initiating multiple
> jobs from within an existing allocation, I want to force it.
Yeah, I think the two different uses for --exclusive have caused
a lot of confusion over time.
One thing you could try is to check for SLURM_JOBID set in srun's
environment. If it is set, this srun is running under an existing
allocation, and if it is not set then srun is running in
allocate-and-run mode.
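For a user who controls their own batch script, the same check can be
sketched in plain shell before any job steps run (a toy illustration of
the detection logic only; the plugin approach below is what enforces it
site-wide without user cooperation):

```shell
#!/bin/sh
# Sketch: detect "running under an existing allocation" vs
# "allocate-and-run" mode by checking SLURM_JOBID, as described above.
# In a real batch script this would run before the job steps.
detect_mode() {
    if [ -n "${SLURM_JOBID:-}" ]; then
        export SLURM_EXCLUSIVE=1
        echo "existing allocation: forcing SLURM_EXCLUSIVE=1"
    else
        echo "allocate-and-run mode: leaving SLURM_EXCLUSIVE unset"
    fi
}
detect_mode
```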
In order to check the environment from each instance of srun, you
will probably have to use a spank plugin or other plugin that runs
in the context of the process. For example, a spank/lua plugin
might look like:
function slurm_spank_init (spank)
    if (spank.remote) then return end
    local posix = require 'posix'
    if posix.getenv ("SLURM_JOBID") then
        posix.setenv ("SLURM_EXCLUSIVE", 1, 1)
    end
end
This sets SLURM_EXCLUSIVE=1 in srun before srun processes its
environment, so it works. The one gotcha is that if the user
doesn't explicitly set --ntasks, this produces an error:

    srun: error: --ntasks must be set with --exclusive
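One hypothetical way around that gotcha (a sketch only, not anything
shipped with Slurm) would be a site wrapper around srun that supplies
an explicit task count from SLURM_NTASKS, which sbatch sets in the job
environment when -n is given. Here "echo" stands in for actually
exec'ing srun:

```shell
#!/bin/sh
# Hypothetical wrapper sketch: if SLURM_EXCLUSIVE is in effect and
# SLURM_NTASKS is available from the surrounding allocation, pass an
# explicit -n so srun does not fail with
# "--ntasks must be set with --exclusive".
build_cmd() {
    cmd="srun"
    if [ -n "${SLURM_EXCLUSIVE:-}" ] && [ -n "${SLURM_NTASKS:-}" ]; then
        cmd="$cmd -n $SLURM_NTASKS"
    fi
    echo "$cmd $*"
}
build_cmd foo
```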
Maybe another idea would be to extend the job_submit plugin API
to include job step filtering as well?
mark
> Jeff
>
> On Sat, Apr 30, 2011 at 8:37 AM, <[email protected]> wrote:
> > Setting SLURM_EXCLUSIVE environment variable will make this the
> > default behavior for srun commands. You could just set this for
> > this particular user if desired.
> >
> > Moe
> >
> >
> > Quoting [email protected]:
> >
> >> Ideally I'd like a way of setting this as the default for (s)batch as
> >> it provides an avenue for users to overload the nodes thinking that
> >> they're running faster (don't get me started...). I can insist the
> >> user runs their job steps serially but it's hard to enforce without a
> >> lot of effort.
> >>
> >> Jeff
> >>
> >> On Fri, Apr 29, 2011 at 2:05 PM, Auble, Danny <[email protected]> wrote:
> >>>
> >>> Hey Jeff, Thanks for the accolades ;).
> >>>
> >>> Have you tried the srun '--exclusive' option? That should keep things
> >>> separate.
> >>>
> >>> Let us know if that doesn't work,
> >>> Danny
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected]
> >>>> [mailto:[email protected]] On Behalf Of
> >>>> [email protected]
> >>>> Sent: Friday, April 29, 2011 2:01 PM
> >>>> To: [email protected]
> >>>> Subject: [slurm-dev] Stupid Sbatch Question
> >>>>
> >>>> I have a user (in the same sense that I have an ingrown toenail) who
> >>>> runs a job like this...
> >>>>
> >>>> She starts with 'sbatch -p whatever -n 145 scriptname'
> >>>> The script contains something like:
> >>>> srun foo &
> >>>> srun bar
> >>>>
> >>>> What's happening is that each core is doubly allocated, e.g. 16 procs
> >>>> running on an 8-core node. Why isn't the second srun constrained by
> >>>> the resources consumed by the first?
> >>>>
> >>>> This is running under (the very excellent) Slurm 2.2.3.
> >>>>
> >>>> Jeff Katcher
> >>>> FHCRC Cluster Monkey
> >>>
> >>>
> >>
> >>
> >
> >
>