[slurm-dev] Re: poor utilization, jobs not being scheduled
On 31.10.2016 at 00:47, Vlad Firoiu wrote:
> What do you mean the SchedulerType is not explicit? I see
> `SchedulerType=sched/backfill`. (I don't know too much about slurm so I
> am probably misunderstanding something.)

Vlad, you are right: SchedulerType _is_ set. I meant SelectType.
See my other mail, where I have since corrected the typo and added a few uncurated ideas.

Cheers,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
[slurm-dev] Re: poor utilization, jobs not being scheduled
On 31.10.2016 at 00:23, Benjamin Redling wrote:
> Are you aware that as long as SchedulerType

Sorry, typo. I meant *SelectType*.
(The rest of what I write below is just unfiltered noise from my brain while skimming the conf.)

> is not set to anything explicitly, select/linear is the default?

> On 30 October 2016 at 19:00:28 CET, Vlad Firoiu wrote:
[...]
> ## SelectType=select/linear

This seems bad for utilization.

> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityUsageResetPeriod=MONTHLY

I'm undecided about that...

> # [...]
> PriorityWeightQOS = 10
> PriorityWeightFairShare = 1000

Clearly dominant QOS/Fairshare: I hope accounting is set up correctly in the "portion" of your slurm.conf you didn't provide?

> PriorityWeightAge = 10

Does this even matter compared to QOS and Fairshare?

> PriorityWeightJobSize = 1

Any reason for not using PriorityFavorSmall (with or without SMALL_RELATIVE_TO_TIME) and taking the job size into account?
I think that would improve utilization in combination with a SelectType that takes consumable resources (CR) into account. The 10-minute job you mentioned would then not starve behind the big ones.

> PriorityWeightPartition = 1
> PriorityMaxAge=1-0

This way the age_factor maxes out after one day, which is fast compared to the default of a week. But age doesn't really matter compared to QOS and Fairshare... what's the idea behind that?

> The particular job in question has 0 priority.

My fault: I should have asked for "sprio -w", which shows the individual weights, in the first place.

Regards,
Benjamin
--
FSU Jena | JULIELab.de/Staff/Benjamin+Redling.html
vox: +49 3641 9 44323 | fax: +49 3641 9 44321
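To put the weights discussed above in proportion, here is a rough sketch in Python. It assumes the simplified form of the multifactor calculation (a sum of weight times factor, with every factor between 0.0 and 1.0) and leaves out terms such as TRES and nice. The weights are the ones from the quoted slurm.conf; the factor values of the example job are invented purely for illustration.

```python
# Weights copied from the slurm.conf excerpt quoted in this thread.
WEIGHTS = {"qos": 10, "fairshare": 1000, "age": 10, "jobsize": 1, "partition": 1}


def age_factor(days_queued, max_age_days=1.0):
    """Age factor grows linearly and is capped at 1.0 once PriorityMaxAge is reached.

    max_age_days=1.0 mirrors PriorityMaxAge=1-0 from the quoted config.
    """
    return min(days_queued / max_age_days, 1.0)


def weighted_priority(factors):
    """Simplified weighted sum (assumption: TRES, nice, etc. are ignored)."""
    return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)


# Hypothetical factor values for one job; none of these come from the thread.
job = {
    "qos": 0.0,
    "fairshare": 0.5,
    "age": age_factor(3.0),  # queued for three days, so already capped at 1.0
    "jobsize": 0.5,
    "partition": 1.0,
}

# Largest possible contribution of each factor (factor == 1.0).
for name, weight in WEIGHTS.items():
    print(f"{name:10s} max contribution: {weight}")

print("example priority:", weighted_priority(job))
```

With these weights the age term can never add more than 10 points, while fairshare alone can add up to 1000, which is the imbalance the questions above are pointing at.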
[slurm-dev] Re: Can slurm work on one node?
I think it should. Can you send through your slurm.conf?

Also, the logs usually say explicitly why slurmctld/slurmd don't start, and the best way to check whether slurm is running is with systemd:

systemctl status slurmctld
systemctl status slurmd

cheers
L.

--
The most dangerous phrase in the language is, "We've always done it this way."
- Grace Hopper

On 31 October 2016 at 10:34, Peixin Qiao wrote:

> I installed slurm-16.05.6 on ubuntu 16.04 on one node.
>
> When I started slurmctld and slurmd, it does not start.
>
> I input sinfo, the output is: slurm_load_partitions: Unable to contact
> slurm controller (connect failure)
> I input: ps -ef | grep slurm, there is no output.
>
> Best Regards,
> Peixin
[slurm-dev] Re: poor utilization, jobs not being scheduled
What do you mean the SchedulerType is not explicit? I see `SchedulerType=sched/backfill`. (I don't know too much about slurm so I am probably misunderstanding something.)

On Sun, Oct 30, 2016 at 7:25 PM, Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:

> Are you aware that as long as SchedulerType is not set to anything
> explicitly, select/linear is the default? You won't get good utilization
> with that. (Tuning the rest seems secondary to me. Only worth a look if you
> confirm that you want to reserve whole nodes.)
>
> Regards, Benjamin
>
> On 30 October 2016 at 19:00:28 CET, Vlad Firoiu wrote:
>>
>> Here's a portion of slurm.conf:
>>
>> # SCHEDULING
>> SchedulerType=sched/backfill
>> SchedulerParameters=bf_continue,bf_window=10080,bf_resolution=600,bf_max_job_test=1,bf_interval=30,bf_max_job_user=50
>> #SchedulerAuth=
>> #SchedulerPort=
>> #SchedulerRootFilter=
>> ## SelectType=select/linear
>> FastSchedule=0
>> PriorityType=priority/multifactor
>> PriorityDecayHalfLife=7-0
>> PriorityUsageResetPeriod=MONTHLY
>> #PriorityDecayHalfLife=14-0
>> #PriorityUsageResetPeriod=14-0
>> #PriorityWeightFairshare=10
>> #PriorityWeightAge=1000
>> #PriorityWeightPartition=1
>> #PriorityWeightJobSize=1000
>> #PriorityMaxAge=1-0
>> #
>> PriorityWeightQOS = 10
>> PriorityWeightFairShare = 1000
>> PriorityWeightAge = 10
>> PriorityWeightJobSize = 1
>> PriorityWeightPartition = 1
>> PriorityMaxAge=1-0
>>
>> The particular job in question has 0 priority.
>>
>> On Sat, Oct 29, 2016 at 7:50 PM, Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:
>>
>>> Are you allowed and able to post the slurm.conf? What does sprio -o %Q
>>> -j say about that job? BR, Benjamin
>>>
>>> On 29 October 2016 at 20:56:18 CEST, Vlad Firoiu <vlad...@gmail.com> wrote:
>>>>
>>>> I'm trying to figure out why utilization is low on our university
>>>> cluster. It appears that many cores are available, but a minimal-resource
>>>> 10-minute job has been waiting in the queue for days. There happen to be
>>>> some big high-priority jobs at the front of the queue, and I've noticed
>>>> that these are being constantly scheduled and unscheduled. Is this
>>>> expected behavior? Might it be causing slurm to never reach lower-priority
>>>> jobs and consider them for scheduling/backfill?
>>>
>>> --
>>> This message was sent from my Android phone with K-9 Mail.
>
> --
> This message was sent from my Android phone with K-9 Mail.
[slurm-dev] Can slurm work on one node?
I installed slurm-16.05.6 on Ubuntu 16.04 on one node.

When I try to start slurmctld and slurmd, they do not start.

When I run sinfo, the output is:
slurm_load_partitions: Unable to contact slurm controller (connect failure)

When I run ps -ef | grep slurm, there is no output.

Best Regards,
Peixin
[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available
On 29/10/16 00:58, Peixin Qiao wrote:

> Will the slurm version 16.05.6 support ubuntu 16.04?

If you build it from source I suspect any moderately recent version will work there.

If you are asking about the Ubuntu packaged version, then that's a question for Canonical, not SchedMD. :-)

All the best,
Chris
--
Christopher Samuel, Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
[slurm-dev] Re: poor utilization, jobs not being scheduled
Here's a portion of slurm.conf:

# SCHEDULING
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_window=10080,bf_resolution=600,bf_max_job_test=1,bf_interval=30,bf_max_job_user=50
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
## SelectType=select/linear
FastSchedule=0
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityUsageResetPeriod=MONTHLY
#PriorityDecayHalfLife=14-0
#PriorityUsageResetPeriod=14-0
#PriorityWeightFairshare=10
#PriorityWeightAge=1000
#PriorityWeightPartition=1
#PriorityWeightJobSize=1000
#PriorityMaxAge=1-0
#
PriorityWeightQOS = 10
PriorityWeightFairShare = 1000
PriorityWeightAge = 10
PriorityWeightJobSize = 1
PriorityWeightPartition = 1
PriorityMaxAge=1-0

The particular job in question has 0 priority.

On Sat, Oct 29, 2016 at 7:50 PM, Benjamin Redling <benjamin.ra...@uni-jena.de> wrote:

> Are you allowed and able to post the slurm.conf? What does sprio -o %Q -j
> say about that job? BR, Benjamin
>
> On 29 October 2016 at 20:56:18 CEST, Vlad Firoiu wrote:
>>
>> I'm trying to figure out why utilization is low on our university
>> cluster. It appears that many cores are available, but a minimal resource
>> 10 minute job has been waiting in queue for days. There happen to be some
>> big high priority jobs at the front of the queue, and I've noticed that
>> these are being constantly scheduled and unscheduled. Is this expected
>> behavior? Might it be causing slurm to never reach lower priority jobs and
>> consider them for scheduling/backfill?
>
> --
> This message was sent from my Android phone with K-9 Mail.
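The SchedulerParameters line above packs several backfill settings into one comma-separated string. As a reading aid only, the following Python sketch splits that exact line into individual options. The function name is made up for this example; the only outside knowledge it relies on is that bf_window is given in minutes, so 10080 corresponds to seven days.

```python
# The line is copied verbatim from the slurm.conf excerpt above.
LINE = (
    "SchedulerParameters=bf_continue,bf_window=10080,bf_resolution=600,"
    "bf_max_job_test=1,bf_interval=30,bf_max_job_user=50"
)


def parse_scheduler_parameters(line):
    """Split a 'SchedulerParameters=a,b=1,...' line into a dict.

    Hypothetical helper for illustration: bare options become True,
    numeric values become int, anything else stays a string.
    """
    _, _, value = line.partition("=")
    params = {}
    for item in value.split(","):
        key, sep, val = item.strip().partition("=")
        if not key:
            continue
        if not sep:
            params[key] = True
        elif val.isdigit():
            params[key] = int(val)
        else:
            params[key] = val
    return params


params = parse_scheduler_parameters(LINE)
for key, val in params.items():
    print(f"{key} = {val}")

# bf_window is expressed in minutes; 1440 minutes make one day.
print("bf_window in days:", params["bf_window"] / 1440)
```

Listing the options one per line makes it easier to compare each value against the defaults in the slurm.conf man page.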
[slurm-dev] Re: slurm network address problem ?
> From Ole Holm Nielsen:
> You might want to check out my Wiki-page for setting up Slurm on CentOS 7.2:
> https://wiki.fysik.dtu.dk/niflheim/SLURM .
> Perhaps you'll solve the problem using this information?

Thank you very much for your reply! I have read your good page again. In my opinion, you should add a few lines to it (for your example) about adding the corresponding SlurmctldLogFile and SlurmdLogFile statements to slurm.conf.

To repeat: in my previous message I wrote about 2 slurm ERRORS. Slurm now works on my PC, and in particular it no longer reports any NETWORK ADDRESS errors. I made no changes related to network addresses, but slurm now works OK. Reading your site also gave me new ideas about what might be interesting to do.

Yours
Mikhail

> On 10/27/2016 04:14 PM, Mikhail Kuzminsky wrote:
>> I worked w/PBS and SGE; now I'm beginner w/slurm, and installed
>> slurm-16.05.5 on my home PC/x86-64 under CentOS 7.2 1511 (desktop
>> installation).
>>
>> 1) Munge isn't necessary for my one PC w/slurm. But
>> (after standard rpmbuild) directory /usr/lib64/slurm don't have
>> auth_none.so plugin,
>> and AuthType=auth/none in slurm.conf can't work, I use AuthType=auth/munge.
>>
>> 2) Both slurmd and slurmctld work on "myhome1" node.
>> But "scontrol show nodes" informs that:
>> ...
>> Reason=NO NETWORK ADDRESS FOUND [slurm@2016-10-16T10:49:05]
>>
>> Therefore any my srun's say
>> srun: Required node not available (down, drained or reserved)
>> srun: job NN queued and waiting for resources
>>
>> I tried 3 variants of NodeName in slurm.conf : w/o NodeAddr and w/use
>> 192.168.1.10 or even 127.0.0.1
>> My current string in slurm.conf is:
>> NodeName=myhome1 Sockets=1 CoresPerSocket=2 ThreadsPerCore=1 NodeAddr=192.168.1.10 State=UNKNOWN
>>
>> Statical IP w/192.168.0.10 local address is used on my PC.
>> /etc/hosts is:
>> 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
>> ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
>> 192.168.1.10 myhome1.ru myhome1
>>
>> /etc/sysconfig/network-scripts/ifcfg-enp8s0 contains:
>> TYPE="Ethernet"
>> BOOTPROTO="static"
>> NAME="enp8s0"
>> DEVICE="enp8s0"
>> IPADDR="192.168.1.10"
>> GATEWAY="192.168.1.1"
>> DEFROUTE="yes"
>> ONBOOT="yes"
>> PREFIX="24"
>> etc
>>
>> External connection w/global Internet is realized via the same "enp8s0"
>> and home router at 192.168.1.1
>>
>> What should I do to have normal address for myhome1 for work w/slurm ?

Mikhail Kuzminsky
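For setups like the one quoted above, one quick sanity check is whether the node name resolves locally to the address given as NodeAddr. The Python sketch below does only that and is not a diagnosis of the error in this thread, whose cause was never identified. The function names are invented for the example, and the host name and address in the usage note are simply the ones from the quoted slurm.conf and /etc/hosts.

```python
import socket
import sys


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses the local resolver gives for name.

    Returns an empty list if the name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def matches_configured(name, configured_addr):
    """True if configured_addr is among the addresses name resolves to."""
    return configured_addr in resolve_ipv4(name)


if __name__ == "__main__":
    # Usage: python check_addr.py <NodeName> <NodeAddr>
    if len(sys.argv) == 3:
        node, addr = sys.argv[1], sys.argv[2]
        print(node, "resolves to", resolve_ipv4(node))
        print("matches NodeAddr:", matches_configured(node, addr))
```

Saved under a name such as check_addr.py (an arbitrary choice), it could be run as `python check_addr.py myhome1 192.168.1.10` on the node in question.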