Oh, you might also want to try this simple patch too, since
you're on a Blue Gene:
http://bugs.schedmd.com/show_bug.cgi?id=95

Mark.

On 31/07/12 03:45, James Sweet wrote:
>
> Hi,
>
> I am trying to configure slurm to run on a 4-rack BG/Q system and am getting
> stuck creating a config that slurmctld accepts. I have already created
> blocks in MMCS both for a whole rack and for each individual midplane. To
> start, I would like to create static blocks in slurm that map to the
> rack/midplane blocks in MMCS. I have read through the slurm BlueGene admin
> guide, but I'm unsure how to fix my config to get past the "Duplicated
> NodeName" error I see when I run slurmctld in debug mode.
>
> The error appears on both 2.4.1 and 2.5.0-0.pre2. The following slurm.conf,
> bluegene.conf and slurmctld -Dvvvvv output are for 2.5.0-0.pre2.
>
> * slurm.conf
>
>> [jim@bgqsn 2.5.0-0.pre2]$ grep -v ^# /opt/slurm/2.5.0-0.pre2/etc/slurm.conf |sed '/^$/d'
>> ClusterName=bgq-pre_ga
>> ControlMachine=bgqsn
>> SlurmUser=slurm
>> SlurmctldPort=6817
>> SlurmdPort=6818
>> AuthType=auth/munge
>> StateSaveLocation=/tmp
>> SlurmdSpoolDir=/tmp/slurmd
>> SwitchType=switch/none
>> MpiDefault=none
>> SlurmctldPidFile=/var/run/slurmctld.pid
>> SlurmdPidFile=/var/run/slurmd.pid
>> ProctrackType=proctrack/pgid
>> CacheGroups=0
>> ReturnToService=0
>> Prolog=/opt/slurm/2.5.0-0.pre2/sbin/slurm_prolog
>> Epilog=/opt/slurm/2.5.0-0.pre2/sbin/slurm_epilog
>> SlurmctldTimeout=300
>> SlurmdTimeout=300
>> InactiveLimit=0
>> MinJobAge=300
>> KillWait=30
>> Waittime=0
>> SchedulerType=sched/backfill
>> SelectType=select/bluegene
>> FastSchedule=1
>> DebugFlags=BGBlockPick,SelectType
>> SlurmctldDebug=3
>> SlurmctldLogFile=/tmp/slurm.log
>> SlurmdDebug=3
>> JobCompType=jobcomp/none
>> NodeName=bgq[0000x1011] State=UNKNOWN
>> PartitionName=DEFAULT Shared=FORCE
>> PartitionName=pbatch State=UP Nodes=bgq[0000x1011] Default=Yes
>
> * bluegene.conf
>
>> [jim@bgqsn 2.5.0-0.pre2]$ grep -v ^# /opt/slurm/2.5.0-0.pre2/etc/bluegene.conf |sed '/^$/d'
>> MloaderImage=/bgsys/drivers/ppcfloor/boot/uloader
>> IONodesPerMP=8 # io semi-poor
>> BridgeAPILogFile=/tmp/bridgeapi.log
>> BridgeAPIVerbose=2
>> DebugFlags=BGBlockPick,SelectType
>> BasePartitionNodeCnt=512
>> NodeCardNodeCnt=32
>> LayoutMode=STATIC
>
> * slurmctld
>
>> [jim@bgqsn 2.5.0-0.pre2]$ sudo ./sbin/slurmctld -Dvvvvv
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: Warning: Core limit is only 0 KB
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/accounting_storage_none.so
>> slurmctld: Accounting storage NOT INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: not enforcing associations and no list was given so we are giving a blank list
>> slurmctld: debug2: No Assoc usage file (/tmp/assoc_usage) to recover
>> slurmctld: slurmctld version 2.5.0-pre2 started on cluster bgq-pre_ga
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/crypto_munge.so
>> slurmctld: Munge cryptographic signature plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/select_bluegene.so
>> slurmctld: BlueGene node selection plugin loading...
>> slurmctld: debug:  Setting dimensions from slurm.conf file
>> slurmctld: Attempting to contact MMCS
>> slurmctld: BlueGene configured with 2122 midplanes
>> slurmctld: debug:  We are using 2122 of the system.
>> slurmctld: BlueGene plugin loaded successfully
>> slurmctld: BlueGene node selection plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/preempt_none.so
>> slurmctld: preempt/none loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/checkpoint_none.so
>> slurmctld: debug3: Success.
>> slurmctld: Checkpoint plugin loaded: checkpoint/none
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/jobacct_gather_none.so
>> slurmctld: Job accounting gather NOT_INVOKED plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug:  No backup controller to shutdown
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/switch_none.so
>> slurmctld: switch NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Prefix is bgq bgq[0000x1011] 4
>> slurmctld: debug3: Trying to load plugin /opt/slurm/2.5.0-0.pre2/lib/slurm/topology_none.so
>> slurmctld: topology NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: fatal: Duplicated NodeName bgq0000 in the config file
>> [jim@bgqsn 2.5.0-0.pre2]$
>
> Any pointers or help would be much appreciated.
>
> Many Thanks
>
> James Sweet
>
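For reference, here is a minimal sketch of how a BG/Q midplane range such as bgq[0000x1011] is read. It assumes the two four-character strings are the lower and upper corners of a box in (A,B,C,D) midplane coordinates. Under that reading the range covers 2x1x2x2 = 8 distinct midplanes. That matches the "2122" dimensions slurmctld reports and 4 racks of 2 midplanes each. The function expand_midplanes is a hypothetical helper written for illustration; it is not part of Slurm. It also assumes single-digit 0-9 coordinates and needs bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): expand a 4-D BG/Q midplane
# range given as prefix, lower corner, upper corner, e.g. bgq 0000 1011,
# into one name per midplane. Assumes each coordinate is a single
# digit 0-9 (no base-36 letters).
expand_midplanes() {
    local prefix=$1 lo=$2 hi=$3
    local a b c d
    # Walk the A, B, C, D dimensions from the lower to the upper corner.
    for ((a = ${lo:0:1}; a <= ${hi:0:1}; a++)); do
        for ((b = ${lo:1:1}; b <= ${hi:1:1}; b++)); do
            for ((c = ${lo:2:1}; c <= ${hi:2:1}; c++)); do
                for ((d = ${lo:3:1}; d <= ${hi:3:1}; d++)); do
                    printf '%s%d%d%d%d\n' "$prefix" "$a" "$b" "$c" "$d"
                done
            done
        done
    done
}

# Expand the NodeName range from the slurm.conf above.
expand_midplanes bgq 0000 1011

# Print any name that occurs more than once (expect no output).
expand_midplanes bgq 0000 1011 | sort | uniq -d
```

Run as a bash script, the first call prints bgq0000, bgq0001, bgq0010, bgq0011, bgq1000, bgq1001, bgq1010 and bgq1011. The duplicate check prints nothing. Under this reading, the NodeName line itself does not name any midplane twice.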
