Hello,

Why not enable this functionality by setting DefaultTime=0 in slurm.conf? That
would let us set it on a per-partition basis rather than through a job submit
plugin (unless I'm missing something obvious here).
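
To illustrate what I mean, here is a rough sketch of how it might look in
slurm.conf. The partition and node names are made up, and treating
DefaultTime=0 as "reject jobs submitted without a time limit" is only the
behavior I am suggesting, not something Slurm does today:

```
# Hypothetical example: partition and node names are placeholders.
# Proposed semantics (NOT current Slurm behavior): DefaultTime=0 would mean
# "no default; reject jobs that do not request a time limit".
PartitionName=debug Nodes=node[01-04] DefaultTime=0 MaxTime=1:00:00

# Other partitions could keep an ordinary low default instead.
PartitionName=batch Nodes=node[05-64] DefaultTime=30:00 MaxTime=7-00:00:00
```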

Also, setting DefaultTime=0 currently (on 2.5.6 at least) gives the following
message:
# srun -N2 hostname
srun: error: Unable to create job step: Job/step already completing or completed


I suppose that is how it should behave, but it seems rather illogical to be
able to set this value at all.

-- 
Nikita Burtsev


On Friday, June 28, 2013 at 7:25 PM, Daniel M. Weeks wrote:

> 
> Hi Ryan,
> 
> Thanks. We had considered this approach but went in a different
> direction for a couple of reasons:
> 
> We have a good number of users that script job submissions and may blast
> out up to several hundred jobs. A user might not realize their jobs are
> getting cut off until many of them have run, which is a waste of resources.
> 
> Also, we have many users that are relatively new to HPC/Slurm and work
> from guides or tutorials that don't explain things very well. The
> distinct error message at job submission rather than a related error
> after a "failure" (from the user's perspective) keeps a lot of support
> emails out of my inbox. Of course I'd like them to learn to use Slurm
> better but they usually want to focus on their own research first.
> 
> - Dan
> 
> On 06/28/2013 11:00 AM, Ryan Cox wrote:
> > An alternative that we use is to choose very low defaults for people:
> > PartitionName=Default DefaultTime=30:00 #plus other options ........
> > DefMemPerCPU=512
> > 
> > The disadvantage to this approach is that it doesn't give an obvious
> > error message at submit time. However, it's not hard to figure out what
> > happened when they hit the time limit or the error output says they went
> > over their memory limit.
> > 
> > Ryan
> > 
> > On 06/28/2013 08:29 AM, Daniel M. Weeks wrote:
> > > At CCNI, we use backfill scheduling on all our systems. However, we have
> > > found that users typically do not specify a time limit for their job so
> > > the scheduler assumes the maximum from QoS/user limits/partition
> > > limits/etc. This really hurts backfilling since the scheduler remains
> > > ignorant of short jobs.
> > > 
> > > Attached is a small patch I wrote containing a job submit plugin and a
> > > new error message. The plugin rejects a job submission when it is
> > > missing a time limit and will provide the user with a clear and distinct
> > > error.
> > > 
> > > I've just re-tested and the patch applies and builds cleanly on the
> > > slurm-2.5, slurm-2.6, and master branches.
> > > 
> > > Please let me know if you find this useful, run across problems, or have
> > > suggestions/improvements. Thanks.
> > > 
> > 
> > 
> > -- 
> > Ryan Cox
> > Operations Director
> > Fulton Supercomputing Lab
> > Brigham Young University
> > 
> 
> 
> 
> -- 
> Daniel M. Weeks
> Systems Programmer
> Computational Center for Nanotechnology Innovations
> Rensselaer Polytechnic Institute
> Troy, NY 12180
> 518-276-4458
> 
> 

