I want to run slurmd on the one compute node I have specified in my
configuration file.  SLURM is not installed on that node; it is installed
only on the master.  I have read that slurm.conf needs to be on all nodes.
Where does SLURM expect to find slurm.conf on the compute nodes? Do I need
to install SLURM separately on every node?

Thanks,
Adam



On Mon, Jun 15, 2015 at 12:12 PM, Cooper, Adam <[email protected]>
wrote:

>  Thanks, Mike. You were right.
>
> I killed the stale process and am now able to run the slurmctld.
>
> Adam
>
> On Mon, Jun 15, 2015 at 11:51 AM, Michael Robbert <[email protected]>
> wrote:
>
>> Adam,
>> That error suggests you already have a slurmctld running on this host
>> (or possibly some other program that is listening on the same TCP port).
>>
>> By default slurmctld binds to TCP/6817 and I don’t see a different port
>> specified in your config file. That is probably fine, don’t change it if
>> you don’t need to. Try running netstat to see what is currently listening
>> on that port:
>>
>> # netstat -ltpn|grep 6817
>> tcp        0      0 0.0.0.0:6817                0.0.0.0:*                   LISTEN      11143/slurmctld
>>
>> It is likely a stale slurmctld process. If so just kill it and try to
>> start again.
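
The same check can be sketched without netstat. This is a minimal sketch, not part of SLURM: the helper name `port_in_use` is hypothetical, and it assumes the default slurmctld port 6817 mentioned above. It tries to bind the port and reports whether the kernel answers "Address already in use". It does not say which process holds the port, so netstat is still needed for that.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding a TCP socket to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration; other bind errors are re-raised.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
        return False


if __name__ == "__main__":
    # 6817 is the default slurmctld port named in the message above.
    print("port 6817 in use:", port_in_use(6817))
```

If this reports that the port is in use while no slurmctld should be running, a stale process is the likely cause, as described above.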
>>
>> Mike
>>
>> On Jun 15, 2015, at 9:02 AM, Cooper, Adam <[email protected]> wrote:
>>
>>  Hi,
>> I am new to SLURM and have been tasked with installing it on a cluster of
>> 15 servers.  Right now, I have just installed SLURM on the master, and hope to
>> get the daemons running and scheduling jobs there before I try to get it
>> working for the whole cluster. All of the machines are running Ubuntu
>> 12.04. I have worked through some errors already; however, currently when I
>> run:
>>
>> sudo slurmctld -Dv
>>
>> I get this output:
>>
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: slurmctld version 14.11.7 started on cluster cluster
>> slurmctld: OpenSSL cryptographic signature plugin loaded
>> slurmctld: preempt/none loaded
>> slurmctld: ExtSensors NONE plugin loaded
>> slurmctld: Accounting storage NOT INVOKED plugin loaded
>> slurmctld: layouts: no layout to initialize
>> slurmctld: topology NONE plugin loaded
>> slurmctld: sched: Backfill scheduler plugin loaded
>> slurmctld: route default plugin loaded
>> slurmctld: layouts: loading entities/relations information
>> slurmctld: Recovered state of 1 nodes
>> slurmctld: Recovered information about 0 jobs
>> slurmctld: Recovered state of 0 reservations
>> slurmctld: State of 0 triggers recovered
>> slurmctld: read_slurm_conf: backup_controller not specified.
>> slurmctld: Running as primary controller
>> *slurmctld: error: Error binding slurm stream socket: Address already in use*
>> *slurmctld: fatal: slurm_init_msg_engine_addrname_port error Address already in use*
>>
>>
>> By the way, I am running the daemon as root because my boss does not
>> want me to create a separate 'slurm' user.  Any idea what might cause this
>> fatal error?  I've attached an RTF of the current slurm configuration file
>> (I've redacted some things to keep them private), which I made using the
>> online configuration tool.
>>
>> Please let me know any other relevant information that you need. Thank
>> you in advance, and sorry for my lack of knowledge; this is very new work
>> for me.
>>
>>
>> Adam Cooper
>>
>> Brown University Computer Engineering '16
>>
>>
>>
>>  <slurm_conf_current.rtf>
>>
>>
>>
>
