Hi all, I'd like to know what kinds of errors can lead to a 'Connection refused' message in Slurm.
I've installed Slurm 14.03.6 in a 64bit Ubuntu VM (VirtualBox) and I get this message when I run slurmctld: "problems with erica-VirtualBox", but it has no hints of what's wrong.

The configuration file is:

root@erica-VirtualBox:/usr/local/etc# more slurm.conf
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=erica-VirtualBox
ControlAddr=localhost
#
MailProg=/usr/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/tmp/slurm
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd
#
#
# COMPUTE NODES
NodeName=erica-VirtualBox CPUs=1 RealMemory=2002 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
PartitionName=particao1 Nodes=erica-VirtualBox Default=YES MaxTime=INFINITE State=UP

The log of slurmctld shows:

root@erica-VirtualBox:/usr/local/etc# slurmctld -D -vvvv
slurmctld: pidfile not locked, assuming no running daemon
slurmctld: debug3: Version in last_conf_lite header is 6912
slurmctld: error: Job accounting information gathered, but not stored
slurmctld: slurmctld version 14.03.6 started on cluster cluster
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/select_linear.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_energy_none.so
slurmctld: AcctGatherEnergy NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_profile_none.so
slurmctld: AcctGatherProfile NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_infiniband_none.so
slurmctld: AcctGatherInfiniband NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/acct_gather_filesystem_none.so
slurmctld: AcctGatherFilesystem NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug2: No acct_gather.conf file (/usr/local/etc/acct_gather.conf)
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/jobacct_gather_linux.so
slurmctld: Job accounting gather LINUX plugin loaded
slurmctld: debug3: Success.
slurmctld: WARNING: We will use a much slower algorithm with proctrack/pgid, use Proctracktype=proctrack/linuxproc or some other proctrack when using jobacct_gather/linux
slurmctld: error: WARNING: Even though we are collecting accounting information you have asked for it not to be stored (accounting_storage/none) if this is not what you have in mind you will need to change it.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/ext_sensors_none.so
slurmctld: ExtSensors NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/switch_none.so
slurmctld: switch NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No backup controller to shutdown
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/accounting_storage_none.so
slurmctld: Accounting storage NOT INVOKED plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
slurmctld: debug3: Version in assoc_mgr_state header is 1
slurmctld: debug: Reading slurm.conf file: /usr/local/etc/slurm.conf
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/topology_none.so
slurmctld: topology NONE plugin loaded
slurmctld: debug3: Success.
slurmctld: debug: No DownNodes
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/jobcomp_none.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/sched_backfill.so
slurmctld: sched: Backfill scheduler plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Version string in node_state header is PROTOCOL_VERSION
slurmctld: Recovered state of 1 nodes
slurmctld: debug3: Version string in job_state header is PROTOCOL_VERSION
slurmctld: debug3: Job id in job_state header is 1
slurmctld: debug3: Set job_id_sequence to 1
slurmctld: Recovered information about 0 jobs
slurmctld: debug: Updating partition uid access list
slurmctld: debug3: Version string in resv_state header is PROTOCOL_VERSION
slurmctld: Recovered state of 0 reservations
slurmctld: State of 0 triggers recovered
slurmctld: read_slurm_conf: backup_controller not specified.
slurmctld: Running as primary controller
slurmctld: debug3: Trying to load plugin /usr/local/lib/slurm/priority_basic.so
slurmctld: debug: Priority BASIC plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: _slurmctld_background pid = 5995
slurmctld: debug: power_save module disabled, SuspendTime < 0
slurmctld: debug3: _slurmctld_rpc_mgr pid = 5995
slurmctld: debug2: slurmctld listening on 0.0.0.0:6817
slurmctld: debug: Spawning registration agent for erica-VirtualBox 1 hosts
slurmctld: debug2: Spawning RPC agent for msg_type REQUEST_NODE_REGISTRATION_STATUS
slurmctld: debug2: got 1 threads to send out
slurmctld: debug2: Tree head got back 0 looking for 1
slurmctld: debug3: Tree sending to erica-VirtualBox
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug3: connect refused, retrying
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug2: _slurm_connect failed: Connection refused
slurmctld: debug2: Error connecting slurm stream socket at 127.0.1.1:6818: Connection refused
slurmctld: debug3: problems with erica-VirtualBox
slurmctld: debug2: Tree head got back 1
slurmctld: debug2: Tree head got back 1
slurmctld: agent/is_node_resp: node:erica-VirtualBox rpc:1001 : Communication connection failure
^Cslurmctld: Terminate signal (SIGINT or SIGTERM) received
slurmctld: debug: sched: slurmctld terminating
slurmctld: debug3: _slurmctld_rpc_mgr shutting down
slurmctld: Saving all slurm state
slurmctld: debug3: Writing job id 1 to header record of job_state file
slurmctld: debug3: _slurmctld_background shutting down
slurmctld: Unable to remove pidfile '/var/run/slurmctld.pid': Permission denied

Regards,
--
===============
Erica Riello
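
P.S. All the refused connections in the log target 127.0.1.1:6818, and 6818 is the SlurmdPort value shown commented out in the config. A refused TCP connection generally means nothing is accepting connections at that address and port. Below is a minimal sketch, assuming Python 3 is available on the VM, for checking that independently of Slurm. The helper name port_accepts_connections is hypothetical and not part of any Slurm tooling; the addresses and ports are copied from the log.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, False otherwise.

    Hypothetical helper for illustration only; it is not part of Slurm.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # or a timeout) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Addresses and ports taken from the slurmctld log in this post.
    for host, port in [("127.0.1.1", 6818), ("127.0.0.1", 6817)]:
        state = "open" if port_accepts_connections(host, port) else "refused/unreachable"
        print(f"{host}:{port} -> {state}")
```

If 6818 reports as refused while slurmctld is running, that would match the log and suggest that no process is listening on the slurmd port at that address.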
