[slurm-dev] Slurm Routing Algorithm

2017-02-13 Thread Peixin Qiao
Hi Slurm,

What specific method does Slurm use by default to choose the partition for a submitted job?
Which routing algorithms does Slurm support?

Thanks,
Peixin
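
For later readers: jobs submitted without an explicit -p/--partition go to the
partition marked Default=YES, and the scheduling policy itself is selected by
SchedulerType. A minimal slurm.conf excerpt as a sketch (the node name is a
placeholder, not taken from this thread):

PartitionName=debug Nodes=node01 Default=YES   # jobs that omit -p land here
SchedulerType=sched/backfill                   # sched/builtin gives plain FIFO ordering
#RoutePlugin=route/default                     # RPC message routing; route/topology is the other stock choice

If the question is about RPC message routing rather than job placement, the
RoutePlugin line above is, as far as I recall, the relevant knob.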


[slurm-dev] Re: I am confused about slurm user

2016-11-16 Thread Peixin Qiao
Thanks for your help. I installed Slurm and started it successfully.

Now I need to read the scheduling code and think about how to plug in
plan-based scheduling.

Have a good night.

Best Regards,
Peixin

On Tue, Nov 15, 2016 at 1:57 AM, Alexandre Strube <su...@surak.eti.br>
wrote:

> If you are installing from Ubuntu packages, it creates the user for you.
>
> If you are installing from source, you must create a user with useradd or
> adduser.
>
> Then you chown the directories to this user. Remove the colon, like:
>
> chown slurm /var/spool/slurmctld /var/log/slurm
>
>
> []s
> Alexandre Strube
>
> On 14 Nov 2016, at 23:44, Peixin Qiao <pq...@hawk.iit.edu> wrote:
>
> Hi experts,
>
> When I install and start Slurm on Ubuntu following
> http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:
>
> NOTE: The *SlurmUser* must exist prior to starting Slurm and must exist
> on all nodes of the cluster.
> NOTE: The parent directories for Slurm's log files, process ID files,
> state save directories, etc. are not created by Slurm. They must be created
> and made writable by *SlurmUser* as needed prior to starting Slurm
> daemons.
>
> How do I create the SlurmUser before creating those directories for Slurm's
> log files?
>
> When I create those directories as shown at
> https://wiki.fysik.dtu.dk/niflheim/SLURM:
>
> mkdir /var/spool/slurmctld /var/log/slurm
> chown slurm: /var/spool/slurmctld /var/log/slurm
> chmod 755 /var/spool/slurmctld /var/log/slurm
>
>
> The command line shows:
> chown: invalid spec: "slurm:"
>
> Could any expert explain the meaning of step 7?
>
> Best Regards,
> Peixin
>
>
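
Spelling out Alexandre's suggestion end to end, a minimal sketch for a source
install (paths follow the niflheim wiki; adjust as needed):

sudo useradd --system --shell /usr/sbin/nologin slurm   # create the SlurmUser account first
sudo mkdir -p /var/spool/slurmctld /var/log/slurm
sudo chown slurm /var/spool/slurmctld /var/log/slurm
sudo chmod 755 /var/spool/slurmctld /var/log/slurm

Once the user exists, the original "chown slurm:" form also works; the
trailing colon simply means "use that user's login group", and it only failed
because the slurm account had not been created yet.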


[slurm-dev] I am confused about slurm user

2016-11-14 Thread Peixin Qiao
Hi experts,

When I install and start Slurm on Ubuntu following
http://slurm.schedmd.com/quickstart_admin.html, I am confused about step 7:

NOTE: The *SlurmUser* must exist prior to starting Slurm and must exist on
all nodes of the cluster.
NOTE: The parent directories for Slurm's log files, process ID files, state
save directories, etc. are not created by Slurm. They must be created and
made writable by *SlurmUser* as needed prior to starting Slurm daemons.

How do I create the SlurmUser before creating those directories for Slurm's
log files?

When I create those directories as shown at https://wiki.fysik.dtu.dk/niflheim/SLURM:

mkdir /var/spool/slurmctld /var/log/slurm
chown slurm: /var/spool/slurmctld /var/log/slurm
chmod 755 /var/spool/slurmctld /var/log/slurm


The command line shows:
chown: invalid spec: "slurm:"

Could any expert explain the meaning of step 7?

Best Regards,
Peixin


[slurm-dev]

2016-11-07 Thread Peixin Qiao
Hi,

When I write slurm.conf, I am confused about the following note:

NOTE: The parent directories for Slurm's log files, process ID files, state
save directories, etc. are not created by Slurm. They must be created and
made writable by *SlurmUser* as needed prior to starting Slurm daemons.

How do I create these files and directories, in detail?

Best Regards,
Peixin
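
The note refers to the parent directories of whatever paths slurm.conf sets
for SlurmctldPidFile, SlurmdPidFile, SlurmdSpoolDir, StateSaveLocation,
SlurmctldLogFile and SlurmdLogFile. A hedged sketch, assuming a source install
with the config under /usr/local/etc (use whatever paths your own file lists):

grep -Ei 'pidfile|spooldir|statesave|logfile' /usr/local/etc/slurm.conf   # show the configured paths
sudo mkdir -p /var/spool/slurmd /var/spool/slurmctld /var/log/slurm
sudo chown slurm /var/spool/slurmd /var/spool/slurmctld /var/log/slurm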


[slurm-dev] start munge again after boot?

2016-11-07 Thread Peixin Qiao
Hi,

I installed munge and restarted my computer; after that munge stopped working
and restarting it didn't work. It says:

munged: Error: Failed to check pidfile dir "/var/run/munge": cannot
canonicalize "/var/run/munge": No such file or directory

Then I reconfigured munge and started it again, and it works. Does this mean
that /var/run/munge is a temporary directory and I need to reconfigure,
install, and start munge every time I restart my computer?

This is how munge is running on my computer; is it correct?
ps -ef | grep munged
root      3371  1643  0 14:02 ?        00:00:00 munged
peixin    3377 31934  0 14:02 pts/1    00:00:00 grep --color=auto munged

Best Regards,
Peixin
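
On Ubuntu 16.04 /var/run is a symlink to /run, which is a tmpfs, so anything
created there is gone after a reboot; munge itself does not need to be
reinstalled, only its runtime directory recreated before munged starts. A
sketch using systemd-tmpfiles (the owner here is root because that is what the
ps output above shows munged running as; a packaged munge would use its own
munge user and ships an equivalent init/unit file that handles this):

# /etc/tmpfiles.d/munge.conf
d /run/munge 0755 root root -

# apply it now instead of waiting for the next boot
sudo systemd-tmpfiles --create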


[slurm-dev] Re: Can slurm work on one node?

2016-10-31 Thread Peixin Qiao
Hi Ray,

Which Slurm version did you use on Ubuntu? 15.08.7?

Best Regards,
Peixin

On Mon, Oct 31, 2016 at 10:27 AM, Raymond Wan <rwan.w...@gmail.com> wrote:

>
> Hi,
>
>
> On Mon, Oct 31, 2016 at 11:17 PM, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> > When I input ps -ef | grep munged, the result is as follows
> >
> > root      5312  1168  0 Oct30 ?        00:00:01 munged
> > root      5358  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
> > root      5390  1168  0 Oct30 ?        00:00:00 /usr/sbin/munged --force
> > peixin   15221 15207  0 10:12 pts/2    00:00:00 grep --color=auto munged
>
>
> I'm not much of an expert, but it seems odd that you have two
> instances of munged...
>
> As for your question in your very first post, slurm definitely works
> on a single node.  That's how I usually use slurm (not everyone is
> lucky enough to have a cluster; some (of us) have just a single
> computer but would like a queuing system).
>
> One problem I've had in the past was with the log directories and
> /var/run directories not being created.  That *might* explain the
> "Reason: No such file or directory" error messages.  Double-check that
> all of the necessary directories exist and are writable by the slurm
> user.
>
> One suggestion if you are still stuck is to consider using the older
> version of slurm that comes with Ubuntu 16.04.1.  It is 15.08, but it
> works.  You might want to move on to installing from source once you at
> least get that working.  (Just a suggestion -- I can understand that if
> you're starting from scratch, it would be appealing to use the latest
> version.)
>
> Ray
>
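
If you do try the packaged route Ray mentions, the install is roughly the
following (package name from memory for 16.04, so double-check with
apt-cache search slurm first):

sudo apt-get install munge slurm-wlm

which pulls in slurmctld/slurmd and a munge whose key and runtime directories
are created for you.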


[slurm-dev] Re: Can slurm work on one node?

2016-10-31 Thread Peixin Qiao
When I input ps -ef | grep munged, the result is as follows

root  5312  1168  0 Oct30 ?00:00:01 munged
root  5358  1168  0 Oct30 ?00:00:00 /usr/sbin/munged --force
root  5390  1168  0 Oct30 ?00:00:00 /usr/sbin/munged --force
peixin   15221 15207  0 10:12 pts/200:00:00 grep --color=auto munged

Best Regards,
Peixin

On Mon, Oct 31, 2016 at 10:00 AM, andrealphus <andrealp...@gmail.com> wrote:

>
> did you install munge?
>
> On Mon, Oct 31, 2016 at 7:11 AM, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> > Hi Lachlan,
> >
> > My slurm.conf is as follows:
> >
> > # slurm.conf file generated by configurator easy.html.
> > # Put this file on all nodes of your cluster.
> > # See the slurm.conf man page for more information.
> > #
> > ClusterName=peixin-Inspiron-660s
> > ControlMachine=peixin-Inspiron-660s
> > #ControlAddr=
> > #
> > AuthType=auth/none
> > CacheGroups=0
> > CryptoType=crypto/openssl
> > #MailProg=/bin/mail
> > MpiDefault=none
> > #MpiParams=ports=#-#
> > ProctrackType=proctrack/pgid
> > ReturnToService=0
> > SlurmctldPidFile=/var/run/slurmctld.pid
> > SlurmctldPort=6817
> > SlurmdPidFile=/var/run/slurmd.pid
> > SlurmdPort=6818
> > SlurmdSpoolDir=/var/spool/slurmd
> > SlurmUser=slurm
> > #SlurmdUser=root
> > StateSaveLocation=/var/spool
> > SwitchType=switch/none
> > TaskPlugin=task/none
> > #
> > #
> > # TIMERS
> > InactiveLimit=0
> > KillWait=30
> > MinJobAge=300
> > SlurmctldTimeout=300
> > SlurmdTimeout=300
> > Waittime=0
> > #
> > # SCHEDULING
> > FastSchedule=1
> > SchedulerType=sched/backfill
> > SchedulerPort=7321
> > SelectType=select/linear
> > #
> > #
> > # LOGGING AND ACCOUNTING
> > AccountingStorageType=accounting_storage/none
> > ClusterName=cluster
> > JobCompType=jobcomp/none
> > JobCredentialPrivateKey = /usr/local/etc/slurm.key
> > JobCredentialPublicCertificate = /usr/local/etc/slurm.cert
> > #JobAcctGatherFrequency=30
> > JobAcctGatherType=jobacct_gather/peixin-Inspiron-660s
> > SlurmctldDebug=3
> > #SlurmctldLogFile=
> > SlurmdDebug=3
> > #SlurmdLogFile=
> > #
> > #
> > # COMPUTE NODES
> > NodeName=peixin-Inspiron-660s CPUs=4 RealMemory=5837 Sockets=4
> > PartitionName=debug Nodes=peixin-Inspiron-660s Default=YES
> >
> > The result of systemctl status slurmctld:
> > slurmctld.service
> >    Loaded: not-found (Reason: No such file or directory)
> >    Active: inactive (dead)
> > The result of systemctl status slurmd:
> > slurmd.service
> >    Loaded: not-found (Reason: No such file or directory)
> >    Active: inactive (dead)
> >
> >
> > Best Regards,
> > Peixin
> >
> > On Sun, Oct 30, 2016 at 6:51 PM, Lachlan Musicman <data...@gmail.com>
> wrote:
> >>
> >> I think it should. Can you send through your slurm.conf?
> >>
> >> Also, the logs usually explicitly say why slurmctld/slurmd don't start,
> >> and the best way to judge if slurm is running is with systemd:
> >>
> >> systemctl status slurmctld
> >> systemctl status slurmd
> >>
> >>
> >>
> >> cheers
> >> L.
> >>
> >> --
> >> The most dangerous phrase in the language is, "We've always done it this
> >> way."
> >>
> >> - Grace Hopper
> >>
> >> On 31 October 2016 at 10:34, Peixin Qiao <pq...@hawk.iit.edu> wrote:
> >>>
> >>> I installed slurm-16.05.6 on Ubuntu 16.04 on a single node.
> >>>
> >>> When I start slurmctld and slurmd, they do not start.
> >>>
> >>> When I run sinfo, the output is: slurm_load_partitions: Unable to contact
> >>> slurm controller (connect failure).
> >>> When I run ps -ef | grep slurm, there is no output.
> >>>
> >>> Best Regards,
> >>> Peixin
> >>
> >>
> >
>


[slurm-dev] Re: Can slurm work on one node?

2016-10-31 Thread Peixin Qiao
Hi Lachlan,

My slurm.conf is as follows:

# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=peixin-Inspiron-660s
ControlMachine=peixin-Inspiron-660s
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=0
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=300
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey = /usr/local/etc/slurm.key
JobCredentialPublicCertificate = /usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/peixin-Inspiron-660s
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=peixin-Inspiron-660s CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=peixin-Inspiron-660s Default=YES

The result of systemctl status slurmctld:
slurmctld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)
The result of systemctl status slurmd:
slurmd.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)


Best Regards,
Peixin

On Sun, Oct 30, 2016 at 6:51 PM, Lachlan Musicman <data...@gmail.com> wrote:

> I think it should. Can you send through your slurm.conf?
>
> Also, the logs usually explicitly say why slurmctld/slurmd don't start,
> and the best way to judge if slurm is running is with systemd:
>
> systemctl status slurmctld
> systemctl status slurmd
>
>
>
> cheers
> L.
>
> --
> The most dangerous phrase in the language is, "We've always done it this
> way."
>
> - Grace Hopper
>
> On 31 October 2016 at 10:34, Peixin Qiao <pq...@hawk.iit.edu> wrote:
>
>> I installed slurm-16.05.6 on Ubuntu 16.04 on a single node.
>>
>> When I start slurmctld and slurmd, they do not start.
>>
>> When I run sinfo, the output is: slurm_load_partitions: Unable to contact
>> slurm controller (connect failure).
>> When I run ps -ef | grep slurm, there is no output.
>>
>> Best Regards,
>> Peixin
>>
>
>
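
The "Loaded: not-found" status above just means systemd has no unit files for
slurmctld/slurmd, which is expected after a plain make install; the daemons
have to be started by hand or given unit files. A minimal, illustrative
slurmctld unit, assuming binaries under /usr/local/sbin and the pid file path
from the slurm.conf above (recent Slurm source trees ship proper .service
templates under etc/, which are the better starting point if yours has them):

# /etc/systemd/system/slurmctld.service  (sketch only)
[Unit]
Description=Slurm controller daemon
After=network.target munge.service

[Service]
Type=forking
ExecStart=/usr/local/sbin/slurmctld
PIDFile=/var/run/slurmctld.pid

[Install]
WantedBy=multi-user.target

Then: sudo systemctl daemon-reload && sudo systemctl start slurmctld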


[slurm-dev] Can slurm work on one node?

2016-10-30 Thread Peixin Qiao
I installed slurm-16.05.6 on Ubuntu 16.04 on a single node.

When I start slurmctld and slurmd, they do not start.

When I run sinfo, the output is: slurm_load_partitions: Unable to contact
slurm controller (connect failure).
When I run ps -ef | grep slurm, there is no output.

Best Regards,
Peixin


[slurm-dev] munge running error

2016-10-28 Thread Peixin Qiao
Hi slurm,

I installed munge on Ubuntu 16.04 as follows:

tar xf munge-0.5.12.tar.xz
cd munge-0.5.12
./configure
make
make install
cd /etc/init.d
sudo munged

Then I get the error:
munged: Error: Failed to check keyfile "/usr/local/etc/munge/munge.key": No
such file or directory

Best Regards,
Peixin
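
What configure/make do not create is the secret key (or its directory). A
hedged sketch using the path from the error message:

sudo mkdir -p /usr/local/etc/munge
sudo dd if=/dev/urandom bs=1 count=1024 of=/usr/local/etc/munge/munge.key
sudo chmod 600 /usr/local/etc/munge/munge.key
sudo munged

If munged then complains about other missing directories (its run, log and lib
directories under the configured prefix), create and chmod those the same way.
The dd recipe is the one suggested in munge's own installation notes; newer
munge releases also ship a key-generation helper.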


[slurm-dev] Re: Slurm versions 16.05.6 and 17.02.0-pre3 are now available

2016-10-28 Thread Peixin Qiao
Hi Danny,

Will Slurm version 16.05.6 support Ubuntu 16.04?

I tried the previous version on Ubuntu and got an error. Someone else faced
the same error and said it was a bug.

Best Regards,
Peixin

On Thu, Oct 27, 2016 at 5:37 PM, Danny Auble  wrote:

> Slurm version 16.05.6 is now available and includes around 40 bug fixes
> developed over the past month.
>
> We have also made the third pre-release of version 17.02, which is under
> development and scheduled for release in February 2017.
>
> Slurm downloads are available from http://www.schedmd.com/#repos.
>
> We are excited to see you all next month at SC16, please feel free to come
> by our booth #412.
>
> The Slurm BoF will be Thursday, November 17th 12:15pm - 1:15pm in room
> 355-E
>
> More information about that can be found at
> http://sc16.supercomputing.org/presentation/?id=bof101=sess321.
>


[slurm-dev] Re: slurmd: fatal: Frontend not configured correctly in slurm.conf

2016-10-25 Thread Peixin Qiao
Hi Gennaro,

My slurm.conf is as follows:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=peixin
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey = /usr/local/etc/slurm.key
JobCredentialPublicCertificate = /usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=peixin CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=peixin Default=YES

Does Slurm work well on your Ubuntu 16.04 system?

Best Regards,
Peixin

On Tue, Oct 25, 2016 at 5:12 PM, Gennaro Oliva <oliv...@na.icar.cnr.it>
wrote:

>
> Hi Peixin and Alexandre,
>
> On Mon, Oct 24, 2016 at 08:12:04AM -0700, Alexandre Strube wrote:
> > 2016-10-24 17:05 GMT+02:00 Peixin Qiao <pq...@hawk.iit.edu>:
> > > When I installed Slurm and started it on Ubuntu 16.04, I got the error:
> > >
> > > slurmd: fatal: Frontend not configured correctly in slurm.conf. See man
> > > slurm.conf for FrontendName
>
> I have installed a VM with Ubuntu 16.04 and configured a single node
> with Slurm version 15.08.7-1build1 without getting the Frontend error.
> Can you please share your slurm.conf?
> Thanks
> --
> Gennaro Oliva
>
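
Not a confirmed diagnosis, but two cheap checks when a self-built slurmd
refuses to start on a single machine: make sure ControlMachine/NodeName match
the short hostname, and compare against what slurmd itself detects:

hostname -s    # should match the NodeName/ControlMachine entries in slurm.conf
slurmd -C      # prints the node configuration slurmd would report

FrontendName is only meant for a few special architectures, so on an ordinary
Ubuntu box the message usually points at a build or configuration mismatch
rather than at something that should be added to slurm.conf.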


[slurm-dev] Re: plan-based scheduler plugin

2016-10-25 Thread Peixin Qiao
I have found it; it is in src/plugins/sched/. I read the Slurm Scheduler
Plugin API documentation:

http://slurm.schedmd.com/schedplugins.html

Slurm scheduler plugins are Slurm plugins that implement the Slurm
scheduler API described herein. They must conform to the Slurm Plugin API
with the following specifications:

Does this mean that I cannot plug in a plan-based algorithm to replace the
queue-based method?


Best Regards,

Peixin


On Tue, Oct 25, 2016 at 10:28 AM, Peixin Qiao <pq...@hawk.iit.edu> wrote:

> Hello,
>
> I want to plug in a plan-based scheduler to replace the FCFS scheduler. I
> cannot find the FCFS and backfill APIs. Could you please tell me
> where I can find them?
>
> Best Regards,
> Peixin
>
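
For anyone following this thread later: the queue-based schedulers are not a
fixed part of the core. sched/builtin (FIFO) and sched/backfill are themselves
plugins under src/plugins/sched/, and a new scheduler is added by creating a
sibling directory that exports the standard plugin symbols. A very rough C
skeleton; the name sched/plan is invented for illustration, and the exact
callback set and version constant differ between releases, so copy the real
boilerplate from src/plugins/sched/builtin in your tree:

/* src/plugins/sched/plan/plan.c -- illustrative skeleton only */
#include <stdint.h>
#include "slurm/slurm.h"
#include "slurm/slurm_errno.h"

const char plugin_name[]      = "Plan-based scheduling plugin";
const char plugin_type[]      = "sched/plan";          /* selected via SchedulerType=sched/plan */
const uint32_t plugin_version = SLURM_VERSION_NUMBER;  /* or the numeric constant your release uses */

/* Called when the plugin is loaded: start a planning thread here,
 * much as sched/backfill starts its backfill thread. */
int init(void)
{
    return SLURM_SUCCESS;
}

/* Called when the plugin is unloaded: stop the thread and free state. */
void fini(void)
{
}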


[slurm-dev] plan-based scheduler plugin

2016-10-25 Thread Peixin Qiao
Hello,

I want to plug in a plan-based scheduler to replace the FCFS scheduler. I
cannot find the FCFS and backfill APIs. Could you please tell me where I can
find them?

Best Regards,
Peixin


[slurm-dev] slurm_load_partitions: Unable to contact slurm controller (connect failure)

2016-10-24 Thread Peixin Qiao
Hello,

I installed slurm-llnl on Debian on one computer. When I ran slurmctld and
slurmd, I got the error:
slurm_load_partitions: Unable to contact slurm controller (connect failure).

The slurm.conf is as follows:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=debian
#ControlAddr=
#
AuthType=auth/none
CacheGroups=0
CryptoType=crypto/openssl
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
JobCompType=jobcomp/none
JobCredentialPrivateKey = /usr/local/etc/slurm.key
JobCredentialPublicCertificate = /usr/local/etc/slurm.cert
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
#SlurmctldLogFile=
SlurmdDebug=3
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=debian CPUs=4 RealMemory=5837 Sockets=4
PartitionName=debug Nodes=debian Default=YES

Best Regards,
Peixin
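
Since SlurmctldLogFile/SlurmdLogFile are commented out here, the quickest way
to see why the daemons exit is to run them in the foreground with extra
verbosity and then check that the controller answers; a short sketch:

sudo slurmctld -D -vvv
sudo slurmd -D -vvv      # in a second terminal, once slurmctld stays up
scontrol ping            # reports whether the controller is reachable
sinfo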


[slurm-dev] slurmd: fatal: Frontend not configured correctly in slurm.conf

2016-10-24 Thread Peixin Qiao
Hello,

When I installed Slurm and started it on Ubuntu 16.04, I got the error:

slurmd: fatal: Frontend not configured correctly in slurm.conf. See man
slurm.conf for FrontendName

After reading man slurm.conf, I am still confused about how to change
slurm.conf. Could you please help me with the detailed changes needed in the
slurm.conf file?

Best Regards,
Peixin
Ph.D. candidate in Computer Science
Illinois Institute of Technology