Hello there, I'm trying to set up slurm for a cluster that consists of a head node and a compute node (Debian 4.6.3-14). I installed munge (munge-0.5.10) via rpm and got it running on both the head node and the compute node. I installed slurm-llnl (2.3.4-2) via rpm as well. On the compute node slurmd is running, but its connection to the head node is refused. The head node is my problem:
With sudo slurmctld -cDvvvvv I get:

slurmctld: pidfile not locked, assuming no running daemon
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/accounting_storage_filetxt.so
slurmctld: debug2: slurmdb_init() called
slurmctld: error: open /var/log/slurm_jobacct.log: Permission denied
slurmctld: error: Couldn't load specified plugin name for accounting_storage/filetxt: Plugin init() callback failed
slurmctld: error: cannot resolve acct_storage plugin operations
slurmctld: debug: Association database appears down, reading from state file.
slurmctld: debug2: No association state file (/var/lib/slurm-llnl/slurmctld/assoc_mgr_state) to recover
slurmctld: slurmctld version 2.3.4 started on cluster cluster
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/crypto_munge.so
slurmctld: Munge cryptographic signature plugin loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/select_linear.so
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/preempt_none.so
slurmctld: preempt/none loaded
slurmctld: debug3: Success.
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/checkpoint_none.so
slurmctld: debug3: Success.
slurmctld: Checkpoint plugin loaded: checkpoint/none
slurmctld: debug3: Trying to load plugin /usr/lib/slurm/accounting_storage_filetxt.so
slurmctld: debug2: slurmdb_init() called
slurmctld: error: open /var/log/slurm_jobacct.log: Permission denied
slurmctld: error: Couldn't load specified plugin name for accounting_storage/filetxt: Plugin init() callback failed
slurmctld: error: cannot resolve acct_storage plugin operations
slurmctld: fatal: failed to initialize accounting_storage plugin

This is what my config file looks like:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=head1
ControlAddr=10.0.0.1
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#PrologSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFs=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# JOB PRIORITY
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/filetxt
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=node1 NodeAddr=10.0.0.101 CPUs=1 State=UNKNOWN
PartitionName=benchmark Nodes=node1 Default=YES MaxTime=INFINITE State=UP

I hope I included all the necessary information and would appreciate any ideas on how to fix this. Thanks in advance for dealing with it,
Maria
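
P.S. Since the config is long and mostly commented out, here is a minimal Python sketch for listing which settings are actually active. It is only an illustration: parse_slurm_conf is a hypothetical helper, not part of Slurm, and the sample text is a short excerpt of the settings quoted above. It shows that AccountingStorageLoc is not set, while the accounting plugin is filetxt and SlurmUser is slurm, which may be relevant to the "Permission denied" on /var/log/slurm_jobacct.log.

```python
# Sketch only: parse_slurm_conf is a hypothetical helper, not part of Slurm.
def parse_slurm_conf(text):
    """Return the active (uncommented) Key=Value settings from slurm.conf text."""
    settings = {}
    for raw in text.splitlines():
        # Everything after '#' is a comment; commented-out settings vanish here.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        # Split at the first '=' only. For NodeName=/PartitionName= lines the
        # value keeps the remaining "Key=Value" pairs as one string.
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Excerpt of the settings from the post (assumed sufficient for illustration).
sample = """\
SlurmUser=slurm
#AccountingStorageLoc=
AccountingStorageType=accounting_storage/filetxt
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
"""

conf = parse_slurm_conf(sample)
for key in ("SlurmUser", "AccountingStorageType", "AccountingStorageLoc"):
    print(key, "=", conf.get(key, "(not set)"))
```

To run it against the real file instead of the excerpt, pass the file's contents, e.g. parse_slurm_conf(open(path).read()), where path is wherever slurm.conf lives on the head node.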
