Felix,

I would suggest you look in the munged log for errors and make sure time is 
sync'd across all your nodes.
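
As a rough sketch of those two checks (not an official tool), the Python below filters error lines out of a log and flags nodes whose clocks have drifted. The function names are made up for illustration. The 300-second threshold assumes munge's default credential lifetime, and the log location (often /var/log/munge/munged.log) may differ on your install.

```python
import re

# Assumption: munge credentials are valid for 300 seconds by default, so a
# clock difference larger than that is likely to cause rejected credentials.
MAX_SKEW_SECONDS = 300


def error_lines(log_text):
    """Return the lines of a log that contain the word "error" (any case)."""
    pattern = re.compile(r"\berror\b", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]


def skewed_nodes(node_epochs, reference_epoch, max_skew=MAX_SKEW_SECONDS):
    """Return {node: offset_in_seconds} for nodes off by more than max_skew.

    node_epochs maps a node name to the Unix time that node reported, for
    example the output of `date +%s` collected from each node.
    """
    return {
        node: epoch - reference_epoch
        for node, epoch in node_epochs.items()
        if abs(epoch - reference_epoch) > max_skew
    }


if __name__ == "__main__":
    # Made-up sample input, only to show the call shape.
    sample_log = "daemon started\nError: example failure line\n"
    for line in error_lines(sample_log):
        print(line)

    # Hypothetical readings: the controller's time is the reference.
    readings = {"node01": 1426516798, "node02": 1426517500}
    print(skewed_nodes(readings, reference_epoch=1426516800))
```

Since you already use salt, something like `salt '*' cmd.run 'date +%s'` should be enough to collect the per-node times to feed into this.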

Trevor

> On Mar 17, 2015, at 5:31 AM, Felix Willenborg wrote:
> 
> 
> Hi there,
> 
> first of all, I'm fairly new to Slurm, so hopefully I have just missed 
> something very basic here.
> 
> I'm trying to set up a system of six to seven nodes with homogeneous hardware 
> as SLURM nodes. The nodes are connected via InfiniBand. As a controller, I 
> have a system whose hardware specification differs a little. To keep 
> munge.key and slurm.conf identical on all systems I use Salt. So far so good.
> 
> The problem I see is that no node responds to the controller when 
> "sinfo" is run on the controller. However, "scontrol ping" reports on every 
> node that the primary controller is up, which is really confusing. Another 
> thing that seems odd: when I watch the controller's log file, it says 
> that the node is found when slurmd on the node is restarted, and 
> after approximately one minute the connection is lost again.
> 
> I checked pretty much everything that came to mind, such as possibly blocked 
> ports or wrong user/group permissions. Maybe you have an idea; I have run out 
> of them. Below are the anonymized slurm.conf as well as the slurmctld.log 
> and slurmd.log of one node. I'm looking forward to some help!
> 
> Best wishes,
> Felix Willenborg
> 
> slurm.conf
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> # slurm.conf file generated by configurator.html.
> # Put this file on all nodes of your cluster.
> # See the slurm.conf man page for more information.
> #
> ControlMachine=erica
> ControlAddr=***.***.***.***
> #BackupController=
> #BackupAddr=
> #
> AuthType=auth/munge
> CacheGroups=0
> #CheckpointType=checkpoint/none
> CryptoType=crypto/munge
> #DisableRootJobs=NO
> #EnforcePartLimits=NO
> #Epilog=
> #EpilogSlurmctld=
> #FirstJobId=1
> #MaxJobId=999999
> #GresTypes=gpu
> #GroupUpdateForce=0
> #GroupUpdateTime=600
> #JobCheckpointDir=/var/slurm/checkpoint
> #JobCredentialPrivateKey=
> #JobCredentialPublicCertificate=
> #JobFileAppend=0
> #JobRequeue=1
> #JobSubmitPlugins=1
> #KillOnBadExit=0
> #Licenses=foo*4,bar
> #MailProg=/bin/mail
> #MaxJobCount=5000
> #MaxStepCount=40000
> #MaxTasksPerNode=128
> MpiDefault=none
> #MpiParams=ports=#-#
> #PluginDir=
> #PlugStackConfig=
> #PrivateData=jobs
> ProctrackType=proctrack/pgid
> #Prolog=
> #PrologSlurmctld=
> #PropagatePrioProcess=0
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> ReturnToService=1
> #SallocDefaultCommand=
> SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
> SlurmctldPort=6817
> SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
> SlurmdPort=6818
> SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
> SlurmUser=slurm
> #SlurmdUser=root
> #SrunEpilog=
> #SrunProlog=
> StateSaveLocation=/var/lib/slurm-llnl/slurmctld
> SwitchType=switch/none
> #TaskEpilog=
> TaskPlugin=task/none
> #TaskPluginParam=
> #TaskProlog=
> #TopologyPlugin=topology/tree
> #TmpFs=/tmp
> #TrackWCKey=no
> #TreeWidth=
> #UnkillableStepProgram=
> #UsePAM=1
> #
> #
> # TIMERS
> #BatchStartTimeout=10
> #CompleteWait=0
> #EpilogMsgTime=2000
> #GetEnvTimeout=2
> #HealthCheckInterval=0
> #HealthCheckProgram=
> InactiveLimit=0
> KillWait=30
> #MessageTimeout=10
> #ResvOverRun=0
> MinJobAge=300
> #OverTimeLimit=0
> SlurmctldTimeout=120
> SlurmdTimeout=7200
> #UnkillableStepTimeout=60
> #VSizeFactor=0
> Waittime=0
> #
> #
> # SCHEDULING
> #DefMemPerCPU=0
> FastSchedule=1
> #MaxMemPerCPU=0
> #SchedulerRootFilter=1
> #SchedulerTimeSlice=30
> SchedulerType=sched/backfill
> SchedulerPort=7321
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> #
> #
> # JOB PRIORITY
> #PriorityType=priority/basic
> #PriorityDecayHalfLife=
> #PriorityCalcPeriod=
> #PriorityFavorSmall=
> #PriorityMaxAge=
> #PriorityUsageResetPeriod=
> #PriorityWeightAge=
> #PriorityWeightFairshare=
> #PriorityWeightJobSize=
> #PriorityWeightPartition=
> #PriorityWeightQOS=
> #
> #
> # LOGGING AND ACCOUNTING
> #AccountingStorageEnforce=0
> #AccountingStorageHost=
> AccountingStorageLoc=/var/log/slurm-llnl/accounting
> #AccountingStoragePass=
> #AccountingStoragePort=
> AccountingStorageType=accounting_storage/filetxt
> #AccountingStorageUser=
> AccountingStoreJobComment=YES
> ClusterName=cluster
> #DebugFlags=
> #JobCompHost=
> #JobCompLoc=
> #JobCompPass=
> #JobCompPort=
> JobCompType=jobcomp/none
> #JobCompUser=
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/linux
> SlurmctldDebug=7
> SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
> SlurmdDebug=7
> SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
> #SlurmSchedLogFile=
> #SlurmSchedLogLevel=
> #
> #
> # POWER SAVE SUPPORT FOR IDLE NODES (optional)
> #SuspendProgram=
> #ResumeProgram=
> #SuspendTimeout=
> #ResumeTimeout=
> #ResumeRate=
> #SuspendExcNodes=
> #SuspendExcParts=
> #SuspendRate=
> #SuspendTime=
> #
> #
> # COMPUTE NODES
> #NodeName=node[01-06] CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node01 NodeAddr=***.***.***.51 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node02 NodeAddr=***.***.***.52 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node03 NodeAddr=***.***.***.53 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node04 NodeAddr=***.***.***.54 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node05 NodeAddr=***.***.***.55 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> NodeName=node06 NodeAddr=***.***.***.56 CPUs=12 RealMemory=128910 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 State=UNKNOWN
> PartitionName=dft default=YES Nodes=node[01-06] MaxTime=INFINITE State=UP
> 
> 
> slurmctld.log
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> [2015-03-16T15:39:54.813] debug:  sched: slurmctld starting
> [2015-03-16T15:39:54.817] error: Configured MailProg is invalid
> [2015-03-16T15:39:54.817] debug3: Trying to load plugin 
> /usr/lib/slurm/accounting_storage_filetxt.so
> [2015-03-16T15:39:54.817] debug2: slurmdb_init() called
> [2015-03-16T15:39:54.817] Accounting storage FileTxt plugin loaded
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.818] debug3: not enforcing associations and no list was 
> given so we are giving a blank list
> [2015-03-16T15:39:54.818] debug3: Version in assoc_mgr_state header is 1
> [2015-03-16T15:39:54.818] slurmctld version 2.6.5 started on cluster cluster
> [2015-03-16T15:39:54.818] debug3: Trying to load plugin 
> /usr/lib/slurm/crypto_munge.so
> [2015-03-16T15:39:54.818] Munge cryptographic signature plugin loaded
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.818] debug3: Trying to load plugin 
> /usr/lib/slurm/select_cons_res.so
> [2015-03-16T15:39:54.818] Consumable Resources (CR) Node Selection plugin 
> loaded with argument 20
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.818] debug3: Trying to load plugin 
> /usr/lib/slurm/preempt_none.so
> [2015-03-16T15:39:54.818] preempt/none loaded
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.818] debug3: Trying to load plugin 
> /usr/lib/slurm/checkpoint_none.so
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.818] Checkpoint plugin loaded: checkpoint/none
> [2015-03-16T15:39:54.818] debug3: Trying to load plugin 
> /usr/lib/slurm/jobacct_gather_linux.so
> [2015-03-16T15:39:54.818] Job accounting gather LINUX plugin loaded
> [2015-03-16T15:39:54.818] debug3: Success.
> [2015-03-16T15:39:54.819] WARNING: We will use a much slower algorithm with 
> proctrack/pgid, use Proctracktype=proctrack/linuxproc or some other proctrack 
> when using jobacct_gather/linux
> [2015-03-16T15:39:54.819] debug3: Trying to load plugin 
> /usr/lib/slurm/ext_sensors_none.so
> [2015-03-16T15:39:54.819] ExtSensors NONE plugin loaded
> [2015-03-16T15:39:54.819] debug3: Success.
> [2015-03-16T15:39:54.819] debug:  No backup controller to shutdown
> [2015-03-16T15:39:54.819] debug3: Trying to load plugin 
> /usr/lib/slurm/switch_none.so
> [2015-03-16T15:39:54.819] switch NONE plugin loaded
> [2015-03-16T15:39:54.819] debug3: Success.
> [2015-03-16T15:39:54.819] debug:  Reading slurm.conf file: 
> /etc/slurm-llnl/slurm.conf
> [2015-03-16T15:39:54.820] debug3: Trying to load plugin 
> /usr/lib/slurm/topology_none.so
> [2015-03-16T15:39:54.820] topology NONE plugin loaded
> [2015-03-16T15:39:54.820] debug3: Success.
> [2015-03-16T15:39:54.827] debug:  No DownNodes
> [2015-03-16T15:39:54.827] debug3: Trying to load plugin 
> /usr/lib/slurm/jobcomp_none.so
> [2015-03-16T15:39:54.827] debug3: Success.
> [2015-03-16T15:39:54.827] debug3: Trying to load plugin 
> /usr/lib/slurm/sched_backfill.so
> [2015-03-16T15:39:54.827] sched: Backfill scheduler plugin loaded
> [2015-03-16T15:39:54.827] debug3: Success.
> [2015-03-16T15:39:54.828] debug3: Version string in node_state header is 
> VER006
> [2015-03-16T15:39:54.828] Recovered state of 6 nodes
> [2015-03-16T15:39:54.828] debug3: Version string in job_state header is VER014
> [2015-03-16T15:39:54.828] debug3: Job id in job_state header is 42
> [2015-03-16T15:39:54.828] debug3: Set job_id_sequence to 42
> [2015-03-16T15:39:54.828] Recovered information about 0 jobs
> [2015-03-16T15:39:54.828] cons_res: select_p_node_init
> [2015-03-16T15:39:54.828] cons_res: preparing for 1 partitions
> [2015-03-16T15:39:54.828] debug:  Updating partition uid access list
> [2015-03-16T15:39:54.828] debug3: Version string in resv_state header is 
> VER004
> [2015-03-16T15:39:54.828] Recovered state of 0 reservations
> [2015-03-16T15:39:54.828] State of 0 triggers recovered
> [2015-03-16T15:39:54.828] read_slurm_conf: backup_controller not specified.
> [2015-03-16T15:39:54.828] cons_res: select_p_reconfigure
> [2015-03-16T15:39:54.828] cons_res: select_p_node_init
> [2015-03-16T15:39:54.828] cons_res: preparing for 1 partitions
> [2015-03-16T15:39:54.828] Running as primary controller
> [2015-03-16T15:39:54.829] debug3: Trying to load plugin 
> /usr/lib/slurm/priority_basic.so
> [2015-03-16T15:39:54.829] debug:  Priority BASIC plugin loaded
> [2015-03-16T15:39:54.829] debug3: Success.
> [2015-03-16T15:39:54.830] debug3: _slurmctld_rpc_mgr pid = 30521
> [2015-03-16T15:39:54.830] debug3: _slurmctld_background pid = 30521
> [2015-03-16T15:39:54.830] debug:  power_save module disabled, SuspendTime < 0
> [2015-03-16T15:39:54.830] debug2: slurmctld listening on 0.0.0.0:6817
> [2015-03-16T15:39:57.832] debug:  Spawning registration agent for node[01-06] 
> 6 hosts
> [2015-03-16T15:39:57.832] debug2: Spawning RPC agent for msg_type 1001
> [2015-03-16T15:39:57.837] debug2: got 1 threads to send out
> [2015-03-16T15:39:57.840] debug3: Tree sending to node01
> [2015-03-16T15:39:57.841] debug3: Tree sending to node02
> [2015-03-16T15:39:57.842] debug3: Tree sending to node03
> [2015-03-16T15:39:57.843] debug3: Tree sending to node04
> [2015-03-16T15:39:57.844] debug3: Tree sending to node05
> [2015-03-16T15:39:57.844] debug2: Tree head got back 0 looking for 6
> [2015-03-16T15:39:57.844] debug3: Tree sending to node06
> [2015-03-16T15:39:58.989] debug3: Trying to load plugin 
> /usr/lib/slurm/auth_munge.so
> [2015-03-16T15:39:58.989] auth plugin for Munge 
> (http://code.google.com/p/munge/) loaded
> [2015-03-16T15:39:58.989] debug3: Success.
> [2015-03-16T15:39:58.990] debug2: Processing RPC: 
> MESSAGE_NODE_REGISTRATION_STATUS from uid=0
> [2015-03-16T15:39:58.990] debug:  validate_node_specs: node node01 registered 
> with 0 jobs
> [2015-03-16T15:39:58.990] debug2: _slurm_rpc_node_registration complete for 
> node01 usec=69
> [2015-03-16T15:40:02.845] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.845] debug2: Error connecting slurm stream socket at 
> ***.***.***.52:6818: Connection timed out
> [2015-03-16T15:40:02.845] debug3: problems with node01
> [2015-03-16T15:40:02.845] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.845] debug2: Error connecting slurm stream socket at 
> ***.***.***.54:6818: Connection timed out
> [2015-03-16T15:40:02.845] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.845] debug2: Error connecting slurm stream socket at 
> ***.***.***.56:6818: Connection timed out
> [2015-03-16T15:40:02.845] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.846] debug2: Error connecting slurm stream socket at 
> ***.***.***.53:6818: Connection timed out
> [2015-03-16T15:40:02.846] debug2: Tree head got back 1
> [2015-03-16T15:40:02.846] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.846] debug2: Error connecting slurm stream socket at 
> ***.***.***.57:6818: Connection timed out
> [2015-03-16T15:40:02.846] debug3: problems with node06
> [2015-03-16T15:40:02.846] debug3: problems with node03
> [2015-03-16T15:40:02.846] debug3: problems with node05
> [2015-03-16T15:40:02.846] debug3: problems with node02
> [2015-03-16T15:40:02.846] debug2: _slurm_connect poll timeout: Connection 
> timed out
> [2015-03-16T15:40:02.846] debug2: Error connecting slurm stream socket at 
> ***.***.***.55:6818: Connection timed out
> [2015-03-16T15:40:02.846] debug2: Tree head got back 2
> [2015-03-16T15:40:02.846] debug3: problems with node04
> [2015-03-16T15:40:02.846] debug2: Tree head got back 3
> [2015-03-16T15:40:02.846] debug2: Tree head got back 4
> [2015-03-16T15:40:02.846] debug2: Tree head got back 5
> [2015-03-16T15:40:02.846] debug2: Tree head got back 5
> [2015-03-16T15:40:02.846] debug2: Tree head got back 6
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node01 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node06 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node03 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node02 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node04 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:02.846] agent/is_node_resp: node:node05 rpc:1001 : 
> Communication connection failure
> [2015-03-16T15:40:03.113] debug:  node_not_resp: node node01 responded since 
> msg sent
> [2015-03-16T15:40:03.833] error: Nodes node[01-06] not responding
> [2015-03-16T15:40:24.835] debug2: Testing job time limits and checkpoints
> [2015-03-16T15:40:53.000] debug:  backfill: beginning
> [2015-03-16T15:40:53.000] debug:  backfill: no jobs to backfill
> [2015-03-16T15:40:54.838] debug2: Testing job time limits and checkpoints
> [2015-03-16T15:40:54.838] debug2: Performing purge of old job records
> [2015-03-16T15:40:54.838] debug:  sched: Running job scheduler
> [2015-03-16T15:41:24.842] debug2: Testing job time limits and checkpoints
> [2015-03-16T15:41:54.845] debug2: Testing job time limits and checkpoints
> [2015-03-16T15:41:54.845] debug2: Performing purge of old job records
> [2015-03-16T15:41:54.845] debug:  sched: Running job scheduler
> [2015-03-16T15:42:24.848] debug2: Testing job time limits and checkpoints
> 
> 
> slurmd.log
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> [2015-03-16T15:39:58.984] debug3: Trying to load plugin 
> /usr/lib/slurm/topology_none.so
> [2015-03-16T15:39:58.984] topology NONE plugin loaded
> [2015-03-16T15:39:58.984] debug3: Success.
> [2015-03-16T15:39:58.984] Gathering cpu frequency information for 12 cpus
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 0, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 1, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 2, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 3, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 4, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 5, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.984] debug:  cpu_freq_init: cpu 6, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug:  cpu_freq_init: cpu 7, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug:  cpu_freq_init: cpu 8, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug:  cpu_freq_init: cpu 9, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug:  cpu_freq_init: cpu 10, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug:  cpu_freq_init: cpu 11, reset freq: 1200000, 
> reset governor: ondemand
> [2015-03-16T15:39:58.985] debug3: NodeName    = node01
> [2015-03-16T15:39:58.985] debug3: TopoAddr    = node01
> [2015-03-16T15:39:58.985] debug3: TopoPattern = node
> [2015-03-16T15:39:58.985] debug3: CacheGroups = 0
> [2015-03-16T15:39:58.985] debug3: Confile     = `/etc/slurm-llnl/slurm.conf'
> [2015-03-16T15:39:58.985] debug3: Debug       = 7
> [2015-03-16T15:39:58.985] debug3: CPUs        = 12 (CF: 12, HW: 12)
> [2015-03-16T15:39:58.985] debug3: Boards      = 1  (CF:  1, HW:  1)
> [2015-03-16T15:39:58.985] debug3: Sockets     = 2  (CF:  2, HW:  2)
> [2015-03-16T15:39:58.985] debug3: Cores       = 6  (CF:  6, HW:  6)
> [2015-03-16T15:39:58.985] debug3: Threads     = 1  (CF:  1, HW:  1)
> [2015-03-16T15:39:58.985] debug3: UpTime      = 1734749 = 20-01:52:29
> [2015-03-16T15:39:58.985] debug3: Block Map   = 0,1,2,3,4,5,6,7,8,9,10,11
> [2015-03-16T15:39:58.985] debug3: Inverse Map = 0,1,2,3,4,5,6,7,8,9,10,11
> [2015-03-16T15:39:58.985] debug3: RealMemory  = 128910
> [2015-03-16T15:39:58.985] debug3: TmpDisk     = 210195
> [2015-03-16T15:39:58.985] debug3: Epilog      = `(null)'
> [2015-03-16T15:39:58.985] debug3: Logfile     = 
> `/var/log/slurm-llnl/slurmd.log'
> [2015-03-16T15:39:58.985] debug3: HealthCheck = `(null)'
> [2015-03-16T15:39:58.985] debug3: NodeName    = node01
> [2015-03-16T15:39:58.985] debug3: NodeAddr    = ***.***.***.52
> [2015-03-16T15:39:58.985] debug3: Port        = 6818
> [2015-03-16T15:39:58.985] debug3: Prolog      = `(null)'
> [2015-03-16T15:39:58.985] debug3: TmpFS       = `/tmp'
> [2015-03-16T15:39:58.985] debug3: Public Cert = `(null)'
> [2015-03-16T15:39:58.985] debug3: Slurmstepd  = `/usr/sbin/slurmstepd'
> [2015-03-16T15:39:58.985] debug3: Spool Dir   = `/var/lib/slurm-llnl/slurmd'
> [2015-03-16T15:39:58.985] debug3: Pid File    = 
> `/var/run/slurm-llnl/slurmd.pid'
> [2015-03-16T15:39:58.985] debug3: Slurm UID   = 64030
> [2015-03-16T15:39:58.985] debug3: TaskProlog  = `(null)'
> [2015-03-16T15:39:58.985] debug3: TaskEpilog  = `(null)'
> [2015-03-16T15:39:58.985] debug3: TaskPluginParam = 0
> [2015-03-16T15:39:58.985] debug3: Use PAM     = 0
> [2015-03-16T15:39:58.985] debug3: Trying to load plugin 
> /usr/lib/slurm/proctrack_pgid.so
> [2015-03-16T15:39:58.985] debug3: Success.
> [2015-03-16T15:39:58.985] debug3: Trying to load plugin 
> /usr/lib/slurm/task_none.so
> [2015-03-16T15:39:58.985] task NONE plugin loaded
> [2015-03-16T15:39:58.985] debug3: Success.
> [2015-03-16T15:39:58.985] debug3: Trying to load plugin 
> /usr/lib/slurm/auth_munge.so
> [2015-03-16T15:39:58.985] auth plugin for Munge 
> (http://code.google.com/p/munge/) loaded
> [2015-03-16T15:39:58.985] debug3: Success.
> [2015-03-16T15:39:58.985] debug:  spank: opening plugin stack 
> /etc/slurm-llnl/plugstack.conf
> [2015-03-16T15:39:58.985] debug3: Trying to load plugin 
> /usr/lib/slurm/crypto_munge.so
> [2015-03-16T15:39:58.985] Munge cryptographic signature plugin loaded
> [2015-03-16T15:39:58.985] debug3: Success.
> [2015-03-16T15:39:58.985] debug3: initializing slurmd spool directory
> [2015-03-16T15:39:58.985] debug3: slurmd initialization successful
> [2015-03-16T15:39:58.986] Warning: Core limit is only 0 KB
> [2015-03-16T15:39:58.986] slurmd version 2.6.5 started
> [2015-03-16T15:39:58.986] debug3: finished daemonize
> [2015-03-16T15:39:58.986] debug3: Trying to load plugin 
> /usr/lib/slurm/jobacct_gather_linux.so
> [2015-03-16T15:39:58.986] Job accounting gather LINUX plugin loaded
> [2015-03-16T15:39:58.986] debug3: Success.
> [2015-03-16T15:39:58.986] WARNING: We will use a much slower algorithm with 
> proctrack/pgid, use Proctracktype=proctrack/linuxproc or some other proctrack 
> when using jobacct_gather/linux
> [2015-03-16T15:39:58.986] debug3: Trying to load plugin 
> /usr/lib/slurm/switch_none.so
> [2015-03-16T15:39:58.987] switch NONE plugin loaded
> [2015-03-16T15:39:58.987] debug3: Success.
> [2015-03-16T15:39:58.987] debug3: successfully opened slurm listen port 
> ***.***.***.52:6818
> [2015-03-16T15:39:58.987] slurmd started on Mon, 16 Mar 2015 15:39:58 +0100
> [2015-03-16T15:39:58.987] CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1 
> Memory=128910 TmpDisk=210195 Uptime=1734749
> [2015-03-16T15:39:58.987] debug3: Trying to load plugin 
> /usr/lib/slurm/acct_gather_energy_none.so
> [2015-03-16T15:39:58.987] AcctGatherEnergy NONE plugin loaded
> [2015-03-16T15:39:58.987] debug3: Success.
> [2015-03-16T15:39:58.987] debug3: Trying to load plugin 
> /usr/lib/slurm/acct_gather_profile_none.so
> [2015-03-16T15:39:58.987] AcctGatherProfile NONE plugin loaded
> [2015-03-16T15:39:58.987] debug3: Success.
> [2015-03-16T15:39:58.987] debug3: Trying to load plugin 
> /usr/lib/slurm/acct_gather_infiniband_none.so
> [2015-03-16T15:39:58.988] AcctGatherInfiniband NONE plugin loaded
> [2015-03-16T15:39:58.988] debug3: Success.
> [2015-03-16T15:39:58.988] debug3: Trying to load plugin 
> /usr/lib/slurm/acct_gather_filesystem_none.so
> [2015-03-16T15:39:58.988] AcctGatherFilesystem NONE plugin loaded
> [2015-03-16T15:39:58.988] debug3: Success.
> [2015-03-16T15:39:58.988] debug2: No acct_gather.conf file 
> (/etc/slurm-llnl/acct_gather.conf)

-- Trevor
