Hello,

Why not enable this functionality by setting DefaultTime=0 in slurm.conf, which would let us set it on a per-partition basis rather than through a job submit plugin? (Unless I'm missing something obvious here.)
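For example, something along these lines in slurm.conf (the partition names, node lists, and limits here are made up, purely to illustrate the per-partition knob; the idea is that a partition with DefaultTime=0 would effectively refuse jobs submitted without --time):

    # Hypothetical excerpt: each partition carries its own DefaultTime, so a
    # job submitted without --time inherits that partition's default.
    PartitionName=debug  Nodes=node[01-04]  DefaultTime=0      MaxTime=1:00:00
    PartitionName=batch  Nodes=node[05-64]  DefaultTime=30:00  MaxTime=24:00:00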
Also, currently setting DefaultTime=0 (on 2.5.6 at least) gives the following message:

    # srun -N2 hostname
    srun: error: Unable to create job step: Job/step already completing or completed

I suppose that is the way it is meant to work, but it seems rather illogical to be able to set this at all.

-- 
Nikita Burtsev
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, June 28, 2013 at 7:25 PM, Daniel M. Weeks wrote:

> Hi Ryan,
>
> Thanks. We had considered this approach but went in a different
> direction for a couple of reasons:
>
> We have a good number of users who script job submissions and may blast
> out up to several hundred jobs. A user might not realize their jobs are
> being cut off until many of them have run, and that is a waste of
> resources.
>
> Also, we have many users who are relatively new to HPC/Slurm and work
> from guides or tutorials that don't explain things very well. The
> distinct error message at job submission, rather than a related error
> after a "failure" (from the user's perspective), keeps a lot of support
> emails out of my inbox. Of course I'd like them to learn to use Slurm
> better, but they usually want to focus on their own research first.
>
> - Dan
>
> On 06/28/2013 11:00 AM, Ryan Cox wrote:
> > An alternative that we use is to choose very low defaults for people:
> > PartitionName=Default DefaultTime=30:00 # plus other options ...
> > DefMemPerCPU=512
> >
> > The disadvantage of this approach is that it doesn't give an obvious
> > error message at submit time. However, it's not hard to figure out what
> > happened when they hit the time limit or the error output says they went
> > over their memory limit.
> >
> > Ryan
> >
> > On 06/28/2013 08:29 AM, Daniel M. Weeks wrote:
> > > At CCNI, we use backfill scheduling on all our systems. However, we have
> > > found that users typically do not specify a time limit for their jobs, so
> > > the scheduler assumes the maximum from QoS/user limits/partition
> > > limits/etc. This really hurts backfilling, since the scheduler remains
> > > ignorant of short jobs.
> > >
> > > Attached is a small patch I wrote containing a job submit plugin and a
> > > new error message. The plugin rejects a job submission when it is
> > > missing a time limit and provides the user with a clear and distinct
> > > error.
> > >
> > > I've just re-tested, and the patch applies and builds cleanly on the
> > > slurm-2.5, slurm-2.6, and master branches.
> > >
> > > Please let me know if you find this useful, run across problems, or have
> > > suggestions/improvements. Thanks.
> >
> > --
> > Ryan Cox
> > Operations Director
> > Fulton Supercomputing Lab
> > Brigham Young University
>
> --
> Daniel M. Weeks
> Systems Programmer
> Computational Center for Nanotechnology Innovations
> Rensselaer Polytechnic Institute
> Troy, NY 12180
> 518-276-4458
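(For readers of the archive who don't have the attachment: the behaviour Dan describes could also be approximated with a job_submit/lua script along the following lines. This is only a rough sketch of the idea, not his C plugin, and it assumes that a job submitted without --time reaches the Lua plugin with time_limit unset, i.e. equal to slurm.NO_VAL.)

    -- job_submit.lua -- illustrative sketch only, not the attached patch.
    -- Reject any submission that does not carry an explicit time limit.
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- Assumption: an unset --time shows up here as slurm.NO_VAL.
        if job_desc.time_limit == nil or job_desc.time_limit == slurm.NO_VAL then
            slurm.log_user("No time limit specified; please resubmit with --time")
            return slurm.ERROR
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end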
