Hi, I am new to SLURM and have been tasked with installing it on a cluster of 15 servers. So far I have installed SLURM only on the master, and I hope to get the daemons running and scheduling jobs there before I try to get it working across the whole cluster. All of the machines are running Ubuntu 12.04. I have worked through some errors already; however, currently when I run:
    sudo slurmctld -Dv

I get this output:

    slurmctld: pidfile not locked, assuming no running daemon
    slurmctld: slurmctld version 14.11.7 started on cluster cluster
    slurmctld: OpenSSL cryptographic signature plugin loaded
    slurmctld: preempt/none loaded
    slurmctld: ExtSensors NONE plugin loaded
    slurmctld: Accounting storage NOT INVOKED plugin loaded
    slurmctld: layouts: no layout to initialize
    slurmctld: topology NONE plugin loaded
    slurmctld: sched: Backfill scheduler plugin loaded
    slurmctld: route default plugin loaded
    slurmctld: layouts: loading entities/relations information
    slurmctld: Recovered state of 1 nodes
    slurmctld: Recovered information about 0 jobs
    slurmctld: Recovered state of 0 reservations
    slurmctld: State of 0 triggers recovered
    slurmctld: read_slurm_conf: backup_controller not specified.
    slurmctld: Running as primary controller
    *slurmctld: error: Error binding slurm stream socket: Address already in use*
    *slurmctld: fatal: slurm_init_msg_engine_addrname_port error Address already in use*

By the way, I am running the daemon as root because my boss does not want me to create a separate 'slurm' user.

Any idea what might cause this fatal error? I've attached an RTF of the current slurm configuration file (I've REDACTED some things to keep them private), which I made using the online configuration tool.

Please let me know if you need any more relevant information. Thank you in advance, and sorry for my lack of knowledge; this is very new work for me.

Adam Cooper
Brown University Computer Engineering '16
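P.S. Since "Address already in use" generally means some other process already holds the port, one check that might narrow this down is to try binding the same TCP port by hand. Below is a minimal sketch, not anything from the SLURM tools. It assumes Python 3.3 or later is available on the master and that slurmctld is using the default SlurmctldPort of 6817; if slurm.conf sets a different port, that value should be substituted. The helper name `port_in_use` is made up for illustration.

```python
import errno
import socket

# Assumption: the default SlurmctldPort; replace with the value from slurm.conf.
SLURMCTLD_PORT = 6817


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # any other failure is a different problem, so surface it
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    state = "already in use" if port_in_use(SLURMCTLD_PORT) else "free"
    print("TCP port {} is {}".format(SLURMCTLD_PORT, state))
```

If this reports the port as already in use while no slurmctld is running in the foreground, a leftover slurmctld instance or an unrelated service listening on that port would be the first things to look for.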
slurm_conf_current.rtf
Description: RTF file
