Hi! I am trying to install slurm version 2.6.5 on Red Hat Enterprise Linux Server release 5.1.
First I am trying to install slurm on a single node, and I am getting the problems given below.

"ps -el | grep slurmctld" gives nothing.
"ps -el | grep slurmd" gives nothing.
"/etc/init.d/slurm start" gives: /etc/init.d/slurm: command not found
"sudo service slurm start" gives: service: command not found
"sudo slurmd -Dvvvvv" gives the output below:

slurmd: Node configuration differs from hardware: CPUs=6:6(hw) Boards=1:1(hw) SocketsPerBoard=6:1(hw) CoresPerSocket=1:6(hw) ThreadsPerCore=1:1(hw)
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/topology_none.so
slurmd: topology NONE plugin loaded
slurmd: debug3: Success.
slurmd: Gathering cpu frequency information for 6 cpus
slurmd: debug: cpu_freq_init: cpu 0, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 1, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 2, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 3, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 4, reset freq: 1600000, reset governor: userspace
slurmd: debug: cpu_freq_init: cpu 5, reset freq: 1600000, reset governor: userspace
slurmd: debug3: NodeName = localhost
slurmd: debug3: TopoAddr = localhost
slurmd: debug3: TopoPattern = node
slurmd: debug3: CacheGroups = 0
slurmd: debug3: Confile = `/usr/local/etc/slurm.conf'
slurmd: debug3: Debug = 3
slurmd: debug3: CPUs = 6 (CF: 6, HW: 6)
slurmd: debug3: Boards = 1 (CF: 1, HW: 1)
slurmd: debug3: Sockets = 6 (CF: 6, HW: 1)
slurmd: debug3: Cores = 1 (CF: 1, HW: 6)
slurmd: debug3: Threads = 1 (CF: 1, HW: 1)
slurmd: debug3: UpTime = 14155466 = 163-20:04:26
slurmd: debug3: Block Map = 0,1,2,3,4,5
slurmd: debug3: Inverse Map = 0,1,2,3,4,5
slurmd: debug3: RealMemory = 24098
slurmd: debug3: TmpDisk = 48418
slurmd: debug3: Epilog = `(null)'
slurmd: debug3: Logfile = `(null)'
slurmd: debug3: HealthCheck = `(null)'
slurmd: debug3: NodeName = localhost
slurmd: debug3: NodeAddr = (null)
slurmd: debug3: Port = 6818
slurmd: debug3: Prolog = `(null)'
slurmd: debug3: TmpFS = `/tmp'
slurmd: debug3: Public Cert = `(null)'
slurmd: debug3: Slurmstepd = `/usr/local/sbin/slurmstepd'
slurmd: debug3: Spool Dir = `/tmp/slurmd'
slurmd: debug3: Pid File = `/var/run/slurmd.pid'
slurmd: debug3: Slurm UID = 0
slurmd: debug3: TaskProlog = `(null)'
slurmd: debug3: TaskEpilog = `(null)'
slurmd: debug3: TaskPluginParam = 0
slurmd: debug3: Use PAM = 0
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/proctrack_pgid.so
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/task_none.so
slurmd: task NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/auth_munge.so
slurmd: auth plugin for Munge (http://code.google.com/p/munge/) loaded
slurmd: debug3: Success.
slurmd: debug: spank: opening plugin stack /usr/local/etc/plugstack.conf
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/crypto_munge.so
slurmd: Munge cryptographic signature plugin loaded
slurmd: debug3: Success.
slurmd: debug3: initializing slurmd spool directory
slurmd: debug3: slurmd initialization successful
slurmd: Warning: Core limit is only 0 KB
slurmd: slurmd version 2.6.5 started
slurmd: debug3: finished daemonize
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/jobacct_gather_none.so
slurmd: Job accounting gather NOT_INVOKED plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/switch_none.so
slurmd: switch NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: successfully opened slurm listen port *:6818
slurmd: slurmd started on Tue, 11 Feb 2014 15:27:09 +0530
slurmd: CPUs=6 Boards=1 Sockets=6 Cores=1 Threads=1 Memory=24098 TmpDisk=48418 Uptime=14155466
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_energy_none.so
slurmd: AcctGatherEnergy NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_profile_none.so
slurmd: AcctGatherProfile NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_infiniband_none.so
slurmd: AcctGatherInfiniband NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_filesystem_none.so
slurmd: AcctGatherFilesystem NONE plugin loaded
slurmd: debug3: Success.
slurmd: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: error: Unable to register: Unable to contact slurm controller (connect failure)
slurmd: debug: Unable to register with slurm controller, retrying
slurmd: debug3: CPUs=6 Boards=1 Sockets=6 Cores=1 Threads=1 Memory=24098 TmpDisk=48418 Uptime=14155476
slurmd: debug2: _slurm_connect failed: Connection refused
slurmd: debug2: Error connecting slurm stream socket at 127.0.0.1:6817: Connection refused
slurmd: debug: Failed to contact primary controller: Connection refused
slurmd: debug2: _slurm_connect failed: Connection refused

"sudo slurmctld -Dvvvvv" gives the output below:

debug3: Trying to load plugin /usr/local/lib/slurm/accounting_storage_none.so
slurmctld: Accounting storage NOT INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
slurmctld: debug3: Version in assoc_mgr_state header is 1
slurmctld: slurmctld version 2.6.5 started on cluster hybrid_cluster
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/select_linear.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/jobacct_gather_none.so
slurmctld: Job accounting gather NOT_INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/ext_sensors_none.so
slurmctld: ExtSensors NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No backup controller to shutdown
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/switch_none.so
slurmctld: switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: Reading slurm.conf file: /usr/local/etc/slurm.conf
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/topology_none.so
slurmctld: topology NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No DownNodes
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/jobcomp_none.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/sched_backfill.so
slurmctld: sched: Backfill scheduler plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Version string in node_state header is VER006
slurmctld: Recovered state of 1 nodes
slurmctld: debug3: Version string in job_state header is VER014
slurmctld: debug3: Job id in job_state header is 1
slurmctld: debug3: Set job_id_sequence to 1
slurmctld: Recovered information about 0 jobs
slurmctld: debug: Updating partition uid access list
slurmctld: debug3: Version string in resv_state header is VER004
slurmctld: Recovered state of 0 reservations
slurmctld: State of 0 triggers recovered
slurmctld: read_slurm_conf: backup_controller not specified.
slurmctld: Running as primary controller
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/priority_basic.so
slurmctld: debug: Priority BASIC plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: _slurmctld_background pid = 9501
slurmctld: debug3: _slurmctld_rpc_mgr pid = 9501
slurmctld: debug: power_save module disabled, SuspendTime < 0
slurmctld: debug2: slurmctld listening on 0.0.0.0:6817
slurmctld: debug: Spawning registration agent for localhost 1 hosts
slurmctld: debug2: Spawning RPC agent for msg_type 1001
slurmctld: debug2: got 1 threads to send out
slurmctld: debug2: Tree head got back 0 looking for 1
slurmctld: debug3: Tree sending to localhost
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.0.1:6818: Connection refused
slurmctld: debug3: connect refused, retrying
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.0.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.0.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.0.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.0.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug3: problems with localhost
slurmctld: debug2: Tree head got back 1
slurmctld: agent/is_node_resp: node:localhost rpc:1001 : Communication connection failure
slurmctld: debug: backfill: beginning
slurmctld: debug: backfill: no jobs to backfill
slurmctld: debug2: Testing job time limits and checkpoints

To run slurmd on all the compute nodes, do we need to install slurm 2.6.5 on each compute node manually?

I am new to slurm; please help me resolve these problems. I am also attaching my slurm.conf file.

Thanks,
Nagendra
