I want to run slurmd on the one compute node I have specified in my configuration file. SLURM is not installed on this node; it is only installed on the master. I have read that slurm.conf needs to be on all nodes. Where does SLURM expect slurm.conf to be on the nodes? Do I need to install SLURM separately on all nodes?

Thanks,
Adam

On Mon, Jun 15, 2015 at 12:12 PM, Cooper, Adam <[email protected]> wrote:

> Thanks, Mike. You were right.
>
> I killed the stale process and am now able to run the slurmctld.
>
> Adam
>
> On Mon, Jun 15, 2015 at 11:51 AM, Michael Robbert <[email protected]> wrote:
>
>> Adam,
>> That error looks like you already have a slurmctld running on this host
>> (or possibly some other program that is listening on the same TCP port).
>>
>> By default slurmctld binds to TCP/6817 and I don’t see a different port
>> specified in your config file. That is probably fine, don’t change it if
>> you don’t need to. Try running netstat to see what is currently listening
>> on that port:
>>
>> # netstat -ltpn|grep 6817
>> tcp 0 0 0.0.0.0:6817 0.0.0.0:* LISTEN 11143/slurmctld
>>
>> It is likely a stale slurmctld process. If so just kill it and try to
>> start again.
>>
>> Mike
>>
>> On Jun 15, 2015, at 9:02 AM, Cooper, Adam <[email protected]> wrote:
>>
>> Hi,
>> I am new to SLURM and I have been tasked to install it on a cluster of 15
>> servers. Right now, I have just installed SLURM on the master, and hope to
>> get the daemons running and scheduling jobs there before I try to get it
>> working for the whole cluster. All of the machines are running Ubuntu
>> 12.04. I have worked through some errors already; however, currently when
>> I run:
>>
>> sudo slurmctld -Dv
>>
>> I get this output:
>>
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: slurmctld version 14.11.7 started on cluster cluster
>> slurmctld: OpenSSL cryptographic signature plugin loaded
>> slurmctld: preempt/none loaded
>> slurmctld: ExtSensors NONE plugin loaded
>> slurmctld: Accounting storage NOT INVOKED plugin loaded
>> slurmctld: layouts: no layout to initialize
>> slurmctld: topology NONE plugin loaded
>> slurmctld: sched: Backfill scheduler plugin loaded
>> slurmctld: route default plugin loaded
>> slurmctld: layouts: loading entities/relations information
>> slurmctld: Recovered state of 1 nodes
>> slurmctld: Recovered information about 0 jobs
>> slurmctld: Recovered state of 0 reservations
>> slurmctld: State of 0 triggers recovered
>> slurmctld: read_slurm_conf: backup_controller not specified.
>> slurmctld: Running as primary controller
>> *slurmctld: error: Error binding slurm stream socket: Address already in use*
>> *slurmctld: fatal: slurm_init_msg_engine_addrname_port error Address already in use*
>>
>> By the way, I am running the daemon with root because my boss does not
>> want me to create a separate 'slurm' user. Any idea what might cause this
>> fatal error? I've attached an rtf of the current slurm configuration file
>> (I've REDACTED some things to keep private), which I made using the online
>> configuration tool.
>>
>> Please let me know any more relevant information that you need. Thank
>> you in advance, and sorry for my lack of knowledge; this is very new work
>> for me.
>>
>> Adam Cooper
>> Brown University Computer Engineering '16
>>
>> <slurm_conf_current.rtf>
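
For anyone who finds this thread later: the port check Mike suggests with netstat in the quoted message can also be done from a script. Below is a minimal Python sketch, assuming the default slurmctld port 6817 mentioned above. The helper name `port_in_use` is hypothetical and not part of SLURM. It only reports whether a plain bind fails with "Address already in use"; it does not say which process holds the port, so netstat is still needed for that.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails
    with EADDRINUSE, i.e. something else already holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is not our case
    return False


if __name__ == "__main__":
    # 6817 is the default slurmctld port named in the quoted message;
    # adjust it if slurm.conf sets a different one.
    print("port 6817 in use:", port_in_use(6817))
```

If this reports the port as in use before slurmctld has been started, a stale daemon or another program is probably still bound to it.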
