[slurm-dev] Slurm Routing Algorithm
Hi Slurm,

What is the specific job submission partition method Slurm uses by default? Which routing algorithms does Slurm support?

Thanks,
Peixin
[slurm-dev] Re: I am confused about slurm user
Thanks for your help. I installed Slurm and started it successfully. Now I need to read the scheduling code and think about how to plug in plan-based scheduling. Have a good night.

Best Regards,
Peixin

On Tue, Nov 15, 2016 at 1:57 AM, Alexandre Strube <su...@surak.eti.br> wrote:
> If you are installing from Ubuntu packages, it creates the user for you.
>
> If you are installing from source, you must create a user with useradd or
> adduser.
>
> Then you chown the files to this user. Remove the colon, like:
>
> chown slurm /var/spool/slurmctld /var/log/slurm
>
> []s
> Alexandre Strube
>
> On 14 Nov 2016, at 23:44, Peixin Qiao <pq...@hawk.iit.edu> wrote:
>
> Hi experts,
>
> When I install and start Slurm on Ubuntu following
> http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:
>
> NOTE: The *SlurmUser* must exist prior to starting Slurm and must exist
> on all nodes of the cluster.
> NOTE: The parent directories for Slurm's log files, process ID files,
> state save directories, etc. are not created by Slurm. They must be created
> and made writable by *SlurmUser* as needed prior to starting Slurm daemons.
>
> How do I create the SlurmUser before creating those directories for Slurm's
> log files?
>
> When I create those directories as in https://wiki.fysik.dtu.dk/niflheim/SLURM:
>
> mkdir /var/spool/slurmctld /var/log/slurm
> chown slurm: /var/spool/slurmctld /var/log/slurm
> chmod 755 /var/spool/slurmctld /var/log/slurm
>
> the command line shows:
>
> chown: invalid spec: "slurm:"
>
> Could any expert explain the meaning of step 7?
>
> Best Regards,
> Peixin
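The sequence Alexandre describes — create the user first, then create and chown the directories — can be sketched as below. This is a minimal sketch using the paths from this thread; the PREFIX variable exists only so it can be tried without root, and on a real node you would drop it and run the commented commands as root.

```shell
# Minimal sketch of the steps above. PREFIX lets the sketch run without
# root; on a real node set PREFIX to empty and run as root.
PREFIX="${PREFIX:-/tmp/slurm-user-demo}"

# 1) On a real system, create the system user first (needs root):
#      useradd --system --no-create-home --shell /usr/sbin/nologin slurm

# 2) Create the directories the daemons expect:
mkdir -p "$PREFIX/var/spool/slurmctld" "$PREFIX/var/log/slurm"

# 3) Give SlurmUser ownership -- note: no colon after the user name:
#      chown slurm /var/spool/slurmctld /var/log/slurm
chmod 755 "$PREFIX/var/spool/slurmctld" "$PREFIX/var/log/slurm"
ls -ld "$PREFIX/var/spool/slurmctld" "$PREFIX/var/log/slurm"
```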
[slurm-dev] I am confused about slurm user
Hi experts,

When I install and start Slurm on Ubuntu following http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:

NOTE: The *SlurmUser* must exist prior to starting Slurm and must exist on all nodes of the cluster.
NOTE: The parent directories for Slurm's log files, process ID files, state save directories, etc. are not created by Slurm. They must be created and made writable by *SlurmUser* as needed prior to starting Slurm daemons.

How do I create the SlurmUser before creating those directories for Slurm's log files?

When I create those directories as in https://wiki.fysik.dtu.dk/niflheim/SLURM:

mkdir /var/spool/slurmctld /var/log/slurm
chown slurm: /var/spool/slurmctld /var/log/slurm
chmod 755 /var/spool/slurmctld /var/log/slurm

the command line shows:

chown: invalid spec: "slurm:"

Could any expert explain the meaning of step 7?

Best Regards,
Peixin
[slurm-dev]
Hi,

When I write slurm.conf, I am confused about the following note:

NOTE: The parent directories for Slurm's log files, process ID files, state save directories, etc. are not created by Slurm. They must be created and made writable by *SlurmUser* as needed prior to starting Slurm daemons.

How do I create these files and directories, in detail?

Best Regards,
Peixin
[slurm-dev] start munge again after boot?
Hi,

I installed munge and restarted my computer; then munge stopped working, and restarting munge didn't help. It says:

munged: Error: Failed to check pidfile dir "/var/run/munge": cannot canonicalize "/var/run/munge": No such file or directory

Then I reconfigured munge and started it again, and it works. Does this mean that /var/run/munge is a temporary directory and I need to reconfigure, install, and start munge every time I restart my computer?

This is munge working on my computer; is it correct?

ps -ef | grep munged
root      3371  1643  0 14:02 ?        00:00:00 munged
peixin    3377 31934  0 14:02 pts/1    00:00:00 grep --color=auto munged

Best Regards,
Peixin
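The error above is consistent with /var/run being a tmpfs that is emptied at every boot: the munge installation survives, only its runtime directory is gone, so a full reconfigure/reinstall should not be needed. A sketch of recreating just that directory follows; the path comes from the error message, the RUNDIR indirection only makes the sketch runnable without root, and the tmpfiles.d line is an assumption about a systemd-based Ubuntu.

```shell
# /var/run (i.e. /run) is a tmpfs on modern Ubuntu, so /var/run/munge
# disappears at boot. Recreating it is enough -- no reinstall needed.
RUNDIR="${RUNDIR:-/tmp/munge-run-demo}/var/run/munge"
mkdir -p "$RUNDIR"
chmod 755 "$RUNDIR"
ls -ld "$RUNDIR"

# On the real system (as root), the equivalent would be:
#   mkdir -p /var/run/munge && chown munge:munge /var/run/munge && munged
# and to automate it at every boot via systemd-tmpfiles:
#   echo 'd /run/munge 0755 munge munge -' > /etc/tmpfiles.d/munge.conf
```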
[slurm-dev] Re: Can slurm work on one node?
Hi Ray,

Which Slurm version did you use on Ubuntu? 15.08.7?

Best Regards,
Peixin

On Mon, Oct 31, 2016 at 10:27 AM, Raymond Wan <rwan.w...@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 31, 2016 at 11:17 PM, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> > When I input ps -ef | grep munged, the result is as follows
> >
> > root      5312  1168  0 Oct30 ?        00:00:01 munged
> > root      5358  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
> > root      5390  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
> > peixin   15221 15207  0 10:12 pts/2    00:00:00 grep --color=auto munged
>
> I'm not much of an expert, but it seems odd that you have two
> instances of munged...
>
> As for your question in your very first post, slurm definitely works
> on a single node. That's how I usually use slurm (not everyone is
> lucky enough to have a cluster; some of us have just a single
> computer but would like a queuing system).
>
> One problem I've had in the past was with the log directories and
> /var/run directories not being created. That *might* explain the
> "Reason: No such file or directory" error messages. Double-check that
> all of the necessary directories exist and are writable by the slurm
> user.
>
> One suggestion if you are still stuck is to consider using the older
> version of slurm that comes with Ubuntu 16.04.1. It is 15.08, but it
> works. You might want to move to installing from source once you at
> least get that working. (Just a suggestion -- I can understand if
> you're starting from scratch, it would be appealing to use the latest
> version.)
>
> Ray
[slurm-dev] Re: Can slurm work on one node?
When I input ps -ef | grep munged, the result is as follows:

root      5312  1168  0 Oct30 ?        00:00:01 munged
root      5358  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
root      5390  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
peixin   15221 15207  0 10:12 pts/2    00:00:00 grep --color=auto munged

Best Regards,
Peixin

On Mon, Oct 31, 2016 at 10:00 AM, andrealphus <andrealp...@gmail.com> wrote:
>
> did you install munge?
>
> On Mon, Oct 31, 2016 at 7:11 AM, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> > Hi Lachlan,
> >
> > My slurm.conf is as follows:
> >
> > # slurm.conf file generated by configurator easy.html.
> > # Put this file on all nodes of your cluster.
> > # See the slurm.conf man page for more information.
> > #
> > ClusterName=peixin-Inspiron-660s
> > ControlMachine=peixin-Inspiron-660s
> > #ControlAddr=
> > #
> > AuthType=auth/none
> > CacheGroups=0
> > CryptoType=crypto/openssl
> > #MailProg=/bin/mail
> > MpiDefault=none
> > #MpiParams=ports=#-#
> > ProctrackType=proctrack/pgid
> > ReturnToService=0
> > SlurmctldPidFile=/var/run/slurmctld.pid
> > SlurmctldPort=6817
> > SlurmdPidFile=/var/run/slurmd.pid
> > SlurmdPort=6818
> > SlurmdSpoolDir=/var/spool/slurmd
> > SlurmUser=slurm
> > #SlurmdUser=root
> > StateSaveLocation=/var/spool
> > SwitchType=switch/none
> > TaskPlugin=task/none
> > #
> > # TIMERS
> > InactiveLimit=0
> > KillWait=30
> > MinJobAge=300
> > SlurmctldTimeout=300
> > SlurmdTimeout=300
> > Waittime=0
> > #
> > # SCHEDULING
> > FastSchedule=1
> > SchedulerType=sched/backfill
> > SchedulerPort=7321
> > SelectType=select/linear
> > #
> > # LOGGING AND ACCOUNTING
> > AccountingStorageType=accounting_storage/none
> > ClusterName=cluster
> > JobCompType=jobcomp/none
> > JobCredentialPrivateKey=/usr/local/etc/slurm.key
> > JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
> > #JobAcctGatherFrequency=30
> > JobAcctGatherType=jobacct_gather/peixin-Inspiron-660s
> > SlurmctldDebug=3
> > #SlurmctldLogFile=
> > SlurmdDebug=3
> > #SlurmdLogFile=
> > #
> > # COMPUTE NODES
> > NodeName=peixin-Inspiron-660s CPUs=4 RealMemory=5837 Sockets=4
> > PartitionName=debug Nodes=peixin-Inspiron-660s Default=YES
> >
> > The result of systemctl status slurmctld:
> > slurmctld.service
> >    Loaded: not-found (Reason: No such file or directory)
> >    Active: inactive (dead)
> > The result of systemctl status slurmd:
> > slurmd.service
> >    Loaded: not-found (Reason: No such file or directory)
> >    Active: inactive (dead)
> >
> > Best Regards,
> > Peixin
> >
> > On Sun, Oct 30, 2016 at 6:51 PM, Lachlan Musicman <data...@gmail.com> wrote:
> >>
> >> I think it should. Can you send through your slurm.conf?
> >>
> >> Also, the logs usually explicitly say why slurmctld/slurmd don't start,
> >> and the best way to judge if slurm is running is with systemd:
> >>
> >> systemctl status slurmctld
> >> systemctl status slurmd
> >>
> >> cheers
> >> L.
> >>
> >> --
> >> The most dangerous phrase in the language is, "We've always done it this
> >> way."
> >>
> >> - Grace Hopper
> >>
> >> On 31 October 2016 at 10:34, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> >>>
> >>> I installed slurm-16.05.6 on Ubuntu 16.04 on one node.
> >>>
> >>> When I started slurmctld and slurmd, they did not start.
> >>>
> >>> When I input sinfo, the output is: slurm_load_partitions: Unable to
> >>> contact slurm controller (connect failure)
> >>> When I input ps -ef | grep slurm, there is no output.
> >>>
> >>> Best Regards,
> >>> Peixin
[slurm-dev] Re: Can slurm work on one node?
Hi Lachlan,

My slurm.conf is as follows:

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=peixin-Inspiron-660s
ControlMachine=peixin-Inspiron-660s
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=0
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey=/usr/local/etc/slurm.key
JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/peixin-Inspiron-660s
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
# COMPUTE NODES
NodeName=peixin-Inspiron-660s CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=peixin-Inspiron-660s Default=YES

The result of systemctl status slurmctld:
slurmctld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

The result of systemctl status slurmd:
slurmd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Best Regards,
Peixin

On Sun, Oct 30, 2016 at 6:51 PM, Lachlan Musicman <data...@gmail.com> wrote:
> I think it should. Can you send through your slurm.conf?
>
> Also, the logs usually explicitly say why slurmctld/slurmd don't start,
> and the best way to judge if slurm is running is with systemd:
>
> systemctl status slurmctld
> systemctl status slurmd
>
> cheers
> L.
>
> --
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On 31 October 2016 at 10:34, Peixin Qiao <pq...@hawk.iit.edu> wrote:
>
>> I installed slurm-16.05.6 on Ubuntu 16.04 on one node.
>>
>> When I started slurmctld and slurmd, they did not start.
>>
>> When I input sinfo, the output is: slurm_load_partitions: Unable to contact
>> slurm controller (connect failure)
>> When I input ps -ef | grep slurm, there is no output.
>>
>> Best Regards,
>> Peixin
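A note on the systemctl output in this thread: "Loaded: not-found" means systemd has no slurmctld/slurmd unit files at all, which is normal for a source build that never installed them. Until unit files exist, the daemons can be located and run in the foreground with verbose logging to see exactly why they exit. A small sketch (the -D and -vvv flags are standard slurmctld/slurmd options; the helper function is just illustration):

```shell
# Report where a daemon binary lives, or a hint if it is not installed.
check_daemon() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: $(command -v "$1")"
  else
    echo "$1: not on PATH (check the --prefix used at configure time)"
  fi
}

check_daemon slurmctld
check_daemon slurmd

# Once found, run each in the foreground with verbose logging to see the
# real failure instead of a silent exit:
#   slurmctld -D -vvv
#   slurmd -D -vvv
```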
[slurm-dev] Can slurm work on one node?
I installed slurm-16.05.6 on Ubuntu 16.04 on one node.

When I start slurmctld and slurmd, they do not start.

When I input sinfo, the output is: slurm_load_partitions: Unable to contact slurm controller (connect failure)
When I input ps -ef | grep slurm, there is no output.

Best Regards,
Peixin
[slurm-dev] munge running error
Hi slurm,

I installed munge on Ubuntu 16.04 as follows:

tar xf munge-0.5.12.tar.xz
cd munge-0.5.12
./configure
make
make install
cd /etc/init.d
sudo munged

Then I get the error:

munged: Error: Failed to check keyfile "/usr/local/etc/munge/munge.key": No such file or directory

Best Regards,
Peixin
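The missing piece after `make install` is the secret key itself: configure defaulted the sysconfdir to /usr/local/etc, which is where the error points. A sketch of creating a 1 KiB random key follows; the KEYDIR indirection is only so the sketch can run without root, and on the real system the key would go in /usr/local/etc/munge, owned by the user munged runs as, before starting munged again.

```shell
# munged refuses to start without a secret key; create a 1 KiB random one.
# KEYDIR defaults to a scratch path so this runs without root; the real
# path from the error above is /usr/local/etc/munge.
KEYDIR="${KEYDIR:-/tmp/munge-key-demo}"
mkdir -p "$KEYDIR"
dd if=/dev/urandom of="$KEYDIR/munge.key" bs=1024 count=1 2>/dev/null
chmod 600 "$KEYDIR/munge.key"
ls -l "$KEYDIR/munge.key"

# On the real host (as root): chown the key to munged's user, copy the same
# key to every node of the cluster, then start the daemon again: munged
```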
[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available
Hi Danny,

Will Slurm version 16.05.6 support Ubuntu 16.04? I tried the previous version on Ubuntu and got an error; someone else faced the same error and said it was a bug.

Best Regards,
Peixin

On Thu, Oct 27, 2016 at 5:37 PM, Danny Auble wrote:
> Slurm version 16.05.6 is now available and includes around 40 bug fixes
> developed over the past month.
>
> We have also made the third pre-release of version 17.02, which is under
> development and scheduled for release in February 2017.
>
> Slurm downloads are available from http://www.schedmd.com/#repos.
>
> We are excited to see you all next month at SC16, please feel free to come
> by our booth #412.
>
> The Slurm BoF will be Thursday, November 17th 12:15pm - 1:15pm in room 355-E
>
> More information about that can be found at
> http://sc16.supercomputing.org/presentation/?id=bof101=sess321.
[slurm-dev] Re: slurmd: fatal: Frontend not configured correctly in slurm.conf
Hi Gennaro,

My slurm.conf is as follows:

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=peixin
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey=/usr/local/etc/slurm.key
JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
# COMPUTE NODES
NodeName=peixin CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=peixin Default=YES

Does your slurm work well on your Ubuntu 16.04 system?

Best Regards,
Peixin

On Tue, Oct 25, 2016 at 5:12 PM, Gennaro Oliva <oliv...@na.icar.cnr.it> wrote:
>
> Hi Peixin and Alexandre,
>
> On Mon, Oct 24, 2016 at 08:12:04AM -0700, Alexandre Strube wrote:
> > 2016-10-24 17:05 GMT+02:00 Peixin Qiao <pq...@hawk.iit.edu>:
> > > When I install slurm and start it on ubuntu 16.04, I got the error:
> > >
> > > slurmd: fatal: Frontend not configured correctly in slurm.conf. See man
> > > slurm.conf for frontendname
>
> I have installed a vm with ubuntu 16.04 and configured a single node
> with slurm version 15.08.7-1build1 without getting the Frontend error.
> Can you please share your slurm.conf
> Thanks
> --
> Gennaro Oliva
[slurm-dev] Re: plan-based scheduler plugin
I have found it; it is in src/plugins/sched/. I read the Slurm Scheduler Plugin API: http://slurm.schedmd.com/schedplugins.html

"Slurm scheduler plugins are Slurm plugins that implement the Slurm scheduler API described herein. They must conform to the Slurm Plugin API with the following specifications:"

Does this mean that I cannot plug in a plan-based algorithm to replace the queue-based method?

Best Regards,
Peixin

On Tue, Oct 25, 2016 at 10:28 AM, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> Hello,
>
> I want to plug in a plan-based scheduler to replace the FCFS scheduler. I
> cannot find the FCFS and backfill APIs. Could you please tell me where I
> can find them?
>
> Best Regards,
> Peixin
[slurm-dev] plan-based scheduler plugin
Hello,

I want to plug in a plan-based scheduler to replace the FCFS scheduler. I cannot find the FCFS and backfill APIs. Could you please tell me where I can find them?

Best Regards,
Peixin
[slurm-dev] slurm_load_partitions: Unable to contact slurm controller (connect failure)
Hello,

I installed slurm-llnl on Debian on one computer. When I ran slurmctld and slurmd, I got the error: slurm_load_partitions: Unable to contact slurm controller (connect failure). The slurm.conf is as follows:

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=debian
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey=/usr/local/etc/slurm.key
JobCredentialPublicCertificate=/usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
# COMPUTE NODES
NodeName=debian CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=debian Default=YES

Best Regards,
Peixin
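One easy thing to rule out in a single-machine setup like this (a suggestion, not something established in the thread): client commands resolve ControlMachine to find slurmctld, so that value, and NodeName, should match the host's short hostname. A quick comparison sketch, where "debian" is simply the value taken from the slurm.conf above:

```shell
# Compare the name this host reports against the name in slurm.conf.
# "debian" is the ControlMachine/NodeName value from the conf above.
NODE_IN_CONF="debian"
echo "slurm.conf says:  $NODE_IN_CONF"
echo "hostname -s says: $(hostname -s)"
# If they differ, either fix slurm.conf or the hostname; then run
# slurmctld -D -vvv in a terminal to watch the controller come up.
```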
[slurm-dev] slurmd: fatal: Frontend not configured correctly in slurm.conf
Hello,

When I installed Slurm and started it on Ubuntu 16.04, I got the error:

slurmd: fatal: Frontend not configured correctly in slurm.conf. See man slurm.conf for frontendname

After reading man slurm.conf, I am still confused about how to change slurm.conf. Could you please help me with the detailed change to the slurm.conf file?

Best Regards,
Peixin
Ph.D. candidate in Computer Science
Illinois Institute of Technology