Folks,


My goal is to run a parallel job on a cluster of KNL nodes that are all configured with the same cluster *and* memory mode.


As a first step, I made a simple prototype with 8 nodes and the following four features: north, east, west and south.

Each node belongs to one quadrant (one north/south feature plus one east/west feature), and there are two nodes per quadrant.


From my slurm.conf (*):

# COMPUTE NODES
NodeName=n[0-1] Procs=4 State=UNKNOWN Feature=north,east
NodeName=n[2-3] Procs=4 State=UNKNOWN Feature=south,east
NodeName=n[4-5] Procs=4 State=UNKNOWN Feature=south,west
NodeName=n[6-7] Procs=4 State=UNKNOWN Feature=north,west
PartitionName=debug Nodes=n[0-7] Default=YES MaxTime=INFINITE State=UP


$ sinfo -o "%30N %20b %f"
NODELIST                       ACTIVE_FEATURES      AVAIL_FEATURES
n[0-1]                         north,east           north,east
n[2-3]                         south,east           south,east
n[4-5]                         south,west           south,west
n[6-7]                         north,west           north,west
$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      8   idle n[0-7]


My submission command is

salloc -N 2 -C '[north|south]&[east|west]' ./hello.sh

and the hello.sh script simply displays the node list (e.g. with echo $SLURM_NODELIST).
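A slightly extended hello.sh could make a mixed allocation easier to spot: besides the node list, it prints each node's active features and checks whether all nodes share one north/south and one east/west feature. This is a minimal sketch assuming bash and the standard `scontrol show hostnames` and `sinfo` commands; `same_quadrant` is a hypothetical helper, not a Slurm tool.

```shell
#!/usr/bin/env bash
# hello.sh (sketch): print the allocated nodes, their features, and whether
# they all share one quadrant. same_quadrant is a hypothetical helper.

# same_quadrant FEATS...: each argument is one node's comma-separated
# feature list. Returns 0 if all nodes carry the same north/south feature
# and the same east/west feature, 1 otherwise.
same_quadrant() {
  local seen=0 ref_ns="" ref_ew="" feats f ns ew
  local -a list
  for feats in "$@"; do
    ns="" ew=""
    IFS=',' read -r -a list <<< "$feats"
    for f in "${list[@]}"; do
      case "$f" in
        north|south) ns=$f ;;
        east|west)   ew=$f ;;
      esac
    done
    if (( seen == 0 )); then
      # the first node defines the reference quadrant
      ref_ns=$ns ref_ew=$ew seen=1
    elif [[ "$ns" != "$ref_ns" || "$ew" != "$ref_ew" ]]; then
      return 1
    fi
  done
  return 0
}

# Only query Slurm when running inside an allocation.
nodelist=${SLURM_JOB_NODELIST:-${SLURM_NODELIST:-}}
if [[ -n "$nodelist" ]]; then
  echo "$nodelist"
  node_feats=()
  for node in $(scontrol show hostnames "$nodelist"); do
    # %b: active features (one line per partition, keep the first)
    f=$(sinfo -h -N -n "$node" -o '%b' | head -n 1)
    echo "$node: $f"
    node_feats+=("$f")
  done
  if same_quadrant "${node_feats[@]}"; then
    echo "OK: all nodes are in the same quadrant"
  else
    echo "KO: nodes span more than one quadrant"
  fi
fi
```

With the n[1,7] allocation described below, the check would report KO, since n1 is north,east and n7 is north,west.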


Initially, n[0-1] are allocated (i.e. the north-east quadrant) => OK

Then I make n0 unavailable, and n[6-7] are allocated (i.e. the north-west quadrant) => OK

Then I make n6 unavailable, and n[1,7] are allocated (i.e. one node is north-east and the other is north-west) => KO
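To make the expectation concrete: once n0 and n6 are unavailable, only two quadrants still have two usable nodes, south-east (n[2-3]) and south-west (n[4-5]), so under the intended reading of the constraint the allocation should come from one of those. The brute-force sketch below models that intended behaviour, not Slurm's actual scheduler; `candidate_quadrants` and the hard-coded node table (copied from the slurm.conf above) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical model of the intended constraint, not of Slurm's scheduler:
# list the quadrants that still have at least NEEDED usable nodes.
# The node table mirrors the slurm.conf above.
declare -A FEATURES=(
  [n0]=north,east [n1]=north,east
  [n2]=south,east [n3]=south,east
  [n4]=south,west [n5]=south,west
  [n6]=north,west [n7]=north,west
)

# candidate_quadrants NEEDED [UNAVAILABLE_NODE...]
candidate_quadrants() {
  local needed=$1; shift
  local -A down=()
  local n ns ew k
  for n in "$@"; do down[$n]=1; done
  for ns in north south; do
    for ew in east west; do
      k=0
      for n in "${!FEATURES[@]}"; do
        [[ -n "${down[$n]:-}" ]] && continue
        # match whole features by wrapping the list in commas
        if [[ ",${FEATURES[$n]}," == *",$ns,"* && ",${FEATURES[$n]}," == *",$ew,"* ]]; then
          k=$((k + 1))
        fi
      done
      if (( k >= needed )); then
        echo "$ns-$ew"
      fi
    done
  done
}
```

For example, `candidate_quadrants 2 n0 n6` prints south-east and south-west, while `candidate_quadrants 2 n0` prints north-west, south-east and south-west, which is consistent with the n[6-7] allocation accepted in the second step.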


Is there something wrong with my command line?

Or is this a bug?

FWIW, I was unsuccessful using parentheses:

$ salloc -N 2 -C '([north|south])&([east|west])' ./hello.sh
salloc: error: Job submit/allocate failed: Invalid feature specification



(*)

I noted that the man page suggests AvailableFeatures and ActiveFeatures can be set with scontrol.

My initial plan was to run

scontrol update NodeName=n[0-7] AvailableFeatures=north,east,west,south

and then

scontrol update NodeName=n[0-1] ActiveFeatures=north,east

...

Both commands seem to work, but all the available features end up active:

$ sinfo -o "%30N %30b %f"
NODELIST                       ACTIVE_FEATURES                AVAIL_FEATURES
n[0-1]                         north,east,west,south          north,east,west,south
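To see node by node whether an scontrol update changed the active set, the two feature columns can also be compared per node rather than per group. A minimal sketch, assuming the same %b (active) and %f (available) sinfo fields used above plus node-oriented output (-N); `compare_features` is a hypothetical helper, not a Slurm command.

```shell
#!/usr/bin/env bash
# compare_features (hypothetical helper): read "node|active|available" lines
# on stdin and report, per node, whether the active features differ from the
# available ones. Lists are compared as plain strings, so order matters.
compare_features() {
  local node active avail
  while IFS='|' read -r node active avail; do
    if [[ "$active" == "$avail" ]]; then
      echo "$node: all available features active ($active)"
    else
      echo "$node: active=$active available=$avail"
    fi
  done
}

# Only query Slurm if sinfo is installed.
if command -v sinfo >/dev/null 2>&1; then
  # -N: one line per node, -h: no header; sort -u drops the duplicate lines
  # printed for nodes that belong to several partitions
  sinfo -N -h -o '%N|%b|%f' | sort -u | compare_features
fi
```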

Did I interpret the man pages correctly?
If so, is this a bug?


Thanks in advance for your help,

Gilles
