Hi guys,

I have an Intel Knights Landing (KNL) server and have set up Slurm on it. However, I have not been able to submit test jobs that use the high-bandwidth memory (MCDRAM).

I need your help.

When I run the command below, this is what appears in my slurmctld.log:

# srun --gres=hbm:1 numactl --membind=1 mpirun -np 24 osu_latency

===============================================================

[2017-03-02T17:19:17.426] slurmctld version 17.02.0-0pre4 started on cluster cluster
[2017-03-02T17:19:17.430] AllowMCDRAM=cache,hybrid,flat,equal,auto AllowNUMA=a2a,snc2,hemi,quad
[2017-03-02T17:19:17.430] AllowUserBoot=ALL
[2017-03-02T17:19:17.430] DefaultMCDRAM=flat DefaultNUMA=quad
[2017-03-02T17:19:17.431] McPath=/sys/devices/system/edac/mc
[2017-03-02T17:19:17.431] SyscfgPath=/usr/bin/syscfg/syscfg
[2017-03-02T17:19:17.431] UmeCheckInterval=0
[2017-03-02T17:19:17.434] layouts: no layout to initialize
[2017-03-02T17:19:17.451] layouts: loading entities/relations information
[2017-03-02T17:19:17.451] Recovered state of 1 nodes
[2017-03-02T17:19:17.452] Recovered information about 0 jobs
[2017-03-02T17:19:17.452] gres/hbm: state for knl02
[2017-03-02T17:19:17.452]   gres_cnt found:TBD configured:0 avail:0 alloc:0
[2017-03-02T17:19:17.452]   gres_bit_alloc:
[2017-03-02T17:19:17.452]   gres_used:(null)
[2017-03-02T17:19:17.453] Recovered state of 0 reservations
[2017-03-02T17:19:17.453] _preserve_plugins: backup_controller not specified
[2017-03-02T17:19:17.453] Running as primary controller
[2017-03-02T17:19:17.454] No parameter for mcs plugin, default values set
[2017-03-02T17:19:17.454] mcs: MCSParameters = (null). ondemand set.
[2017-03-02T17:19:20.013] _update_node_avail_features: nodes knl02 available features set to: a2a,hemi,quad,snc2,snc4,cache,flat,hybrid,auto,knl
[2017-03-02T17:19:20.017] _update_node_active_features: nodes knl02 active features set to: quad,flat
[2017-03-02T17:19:20.017] gres/hbm: state for knl02
[2017-03-02T17:19:20.018]   gres_cnt found:17179869184 configured:17179869184 avail:17179869184 alloc:0
[2017-03-02T17:19:20.018]   gres_bit_alloc:
[2017-03-02T17:19:20.018]   gres_used:(null)
[2017-03-02T17:19:20.462] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=2,partition_job_depth=0,sched_max_job_start=0,sched_min_interval=0
[2017-03-02T17:24:21.535] gres: hbm state for job 171
[2017-03-02T17:24:21.535]   gres_cnt:1 node_cnt:0 type:(null)
[2017-03-02T17:24:21.535] error: gres/hbm: node knl02 gres bitmap size bad (0 < 17179869184)

==========================================================================================
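
For reference, the hbm count in the log reads naturally as a size in bytes: 17179869184 is exactly 16 x 2^30, i.e. 16 GiB, which matches the 16 GB of MCDRAM on a KNL package. The short Python sketch below pulls the two numbers out of the error line and does that conversion. Treating the count as bytes is an assumption inferred from the value, and `parse_bitmap_error` is a hypothetical helper, not part of Slurm.

```python
# Interpret the numbers in the "gres bitmap size bad" line from slurmctld.log.
# Assumption: the hbm GRES count is a size in bytes (it equals 16 GiB exactly).
import re

LOG_LINE = ("[2017-03-02T17:24:21.535] error: gres/hbm: node knl02 "
            "gres bitmap size bad (0 < 17179869184)")


def parse_bitmap_error(line: str) -> tuple[int, int]:
    """Return (bitmap_size, gres_count) from a 'gres bitmap size bad' line."""
    m = re.search(r"gres bitmap size bad \((\d+) < (\d+)\)", line)
    if m is None:
        raise ValueError("not a 'gres bitmap size bad' line")
    return int(m.group(1)), int(m.group(2))


bitmap_size, gres_count = parse_bitmap_error(LOG_LINE)
gib = gres_count / 2**30  # bytes -> GiB, under the byte-unit assumption above
print(f"bitmap size: {bitmap_size}, hbm count: {gres_count} ({gib:g} GiB)")
```

This prints `bitmap size: 0, hbm count: 17179869184 (16 GiB)`, i.e. the controller compared an empty bitmap against the full 16 GiB count.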

slurm.conf

====================================

# LOGGING AND ACCOUNTING
#AccountingStorageType=accounting_storage/none
AccountingStorageType=accounting_storage/filetxt
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeFeaturesPlugins=knl_generic
DebugFlags=NodeFeatures,Gres
GresTypes=hbm
RebootProgram=/sbin/reboot

#Nodes
Nodename=knl02 Sockets=1 CoresPerSocket=68 ThreadsPerCore=4 RealMemory=95891 Feature=knl
PartitionName=hbm Default=YES MaxTime=INFINITE State=UP Nodes=knl02

#Auth
AuthType=auth/none

=======================================================

knl_generic.conf

======================================================

# Sample knl_generic.conf
SyscfgPath=/usr/bin/syscfg/syscfg
DefaultNUMA=quad         # NUMA=all2all
AllowNUMA=quad,a2a,snc2,hemi
DefaultMCDRAM=flat     # MCDRAM=cache

==========================================================

gres.conf

================================================

# Configure support
Name=hbm File=/dev/shm/mcdram

=================================================

What is the problem?

When I use the --gres=hbm option, the job runs normally, but it uses DDR memory rather than MCDRAM.
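
One way to confirm where a job's memory actually ends up is to watch the per-node counters that Linux exposes under /sys/devices/system/node/nodeN/meminfo while the job runs. The Python sketch below assumes the node is in flat mode, so that MCDRAM appears as its own NUMA node (node 1, as the `numactl --membind=1` call implies). `parse_node_meminfo` and `numa_usage_kb` are hypothetical helper names, not part of Slurm or numactl.

```python
# Report per-NUMA-node memory usage from Linux sysfs, to check whether a job's
# memory landed on the MCDRAM node (assumed to be node 1 in flat mode).
import glob
import re


def parse_node_meminfo(text: str) -> dict[str, int]:
    """Parse the contents of a nodeN/meminfo file into {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        # Lines look like: "Node 1 MemUsed:        123456 kB"
        m = re.match(r"Node\s+\d+\s+(\w+):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def numa_usage_kb() -> dict[int, int]:
    """Return {node_id: MemUsed in kB} for every NUMA node the kernel exposes."""
    usage = {}
    for path in glob.glob("/sys/devices/system/node/node[0-9]*/meminfo"):
        node_id = int(re.search(r"node(\d+)/meminfo$", path).group(1))
        with open(path) as f:
            usage[node_id] = parse_node_meminfo(f.read()).get("MemUsed", 0)
    return dict(sorted(usage.items()))


if __name__ == "__main__":
    for node, kb in numa_usage_kb().items():
        print(f"node {node}: {kb / 1024**2:.2f} GiB used")
```

Running it on the compute node before and during the job, and comparing the node 1 figure with node 0 (presumably DDR in quad/flat mode), shows which memory the allocation actually grew on.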
=====================================
Seungwoo Rho
National Institute of Supercomputing and Networking,
KISTI,
52-11, Eoeundong, Yuseonggu,
Daejeon, 305-806, Republic of Korea
e-mail : [email protected]
Phone : +82-42-869-1643
Mobile : +82-10-8849-4001
=====================================