[slurm-dev] Spawning large number of tasks on one node - timeout?

Dominikus Heinzeller Wed, 23 Mar 2016 00:54:04 -0700
Hi all,
        
        I am having a problem with spawning a large number of tasks on a
        single node. My server consists of 4 sockets x 12 cores per socket
        x 2 threads per core = 96 procs.
        
        The slurm.conf contains the following line:
        
        NodeName=keal1  Procs=96 SocketsPerBoard=4 CoresPerSocket=12
        ThreadsPerCore=2 RealMemory=1031770 State=UNKNOWN
        
        
        - The system is a Redhat SL 7.2 system running kernel
        3.10.0-327.10.1.el7.x86_64
        - slurm 15.08.2 (pre-compiled by the vendor, the system comes with
        gnu 4.8.5)
        - mpi library:  mvapich2-2.2b compiled with intel-15.0.4
        (with-pm=none, with-pmi=slurm)
        
        
        srun -N1 -n90 myprogram works, but takes about 30s to get past
        MPI_Init.
        
        srun -N1 -n91 myprogram aborts with:
        
        srun: defined options for program `srun'
        srun: --------------- ---------------------
        srun: user           : `heinzeller-d'
        srun: uid            : 9528
        srun: gid            : 945
        srun: cwd            : /home/heinzeller-d
        srun: ntasks         : 91 (set)
        srun: nodes          : 1 (set)
        srun: jobid          : 1867 (default)
        srun: partition      : default
        srun: profile        : `NotSet'
        srun: job name       : `sh'
        srun: reservation    : `(null)'
        srun: burst_buffer   : `(null)'
        srun: wckey          : `(null)'
        srun: cpu_freq_min   : 4294967294
        srun: cpu_freq_max   : 4294967294
        srun: cpu_freq_gov   : 4294967294
        srun: switches       : -1
        srun: wait-for-switches : -1
        srun: distribution   : unknown
        srun: cpu_bind       : default
        srun: mem_bind       : default
        srun: verbose        : 1
        srun: slurmd_debug   : 0
        srun: immediate      : false
        srun: label output   : false
        srun: unbuffered IO  : false
        srun: overcommit     : false
        srun: threads        : 60
        srun: checkpoint_dir : /var/slurm/checkpoint
        srun: wait           : 0
        srun: account        : (null)
        srun: comment        : (null)
        srun: dependency     : (null)
        srun: exclusive      : false
        srun: qos            : (null)
        srun: constraints    :
        srun: geometry       : (null)
        srun: reboot         : yes
        srun: rotate         : no
        srun: preserve_env   : false
        srun: network        : (null)
        srun: propagate      : NONE
        srun: prolog         : (null)
        srun: epilog         : (null)
        srun: mail_type      : NONE
        srun: mail_user      : (null)
        srun: task_prolog    : (null)
        srun: task_epilog    : (null)
        srun: multi_prog     : no
        srun: sockets-per-node  : -2
        srun: cores-per-socket  : -2
        srun: threads-per-core  : -2
        srun: ntasks-per-node   : -2
        srun: ntasks-per-socket : -2
        srun: ntasks-per-core   : -2
        srun: plane_size        : 4294967294
        srun: core-spec         : NA
        srun: power             :
        srun: sicp              : 0
        srun: remote command    : `./heat.exe'
        srun: launching 1867.7 on host keal1, 91 tasks: [0-90]
        srun: route default plugin loaded
        srun: Node keal1, 91 tasks started
        srun: Sent KVS info to 3 nodes, up to 33 tasks per node
        In: PMI_Abort(1, Fatal error in MPI_Init:
        Other MPI error, error stack:
        MPIR_Init_thread(514)..........:
        MPID_Init(365).................: channel initialization failed
        MPIDI_CH3_Init(495)............:
        MPIDI_CH3I_SHMEM_Helper_fn(908): ftruncate: Invalid argument
        )
        srun: Complete job step 1867.7 received
        slurmstepd: error: *** STEP 1867.7 ON keal1 CANCELLED AT
        2016-03-22T17:12:49 ***
        
        To me, this looks like a timeout after about 30s spent in
        MPI_Init. If so, how can I speed up MPI_Init?
        
        Timings for different numbers of tasks on that node:
        
        srun: Received task exit notification for 90 tasks (status=0x0000).
        srun: keal1: tasks 0-89: Completed
        real 0m32.962s
        user 0m0.022s
        sys 0m0.032s
        
        srun: Received task exit notification for 80 tasks (status=0x0000).
        srun: keal1: tasks 0-79: Completed
        real 0m26.755s
        user 0m0.016s
        sys 0m0.036s
        
        srun: Received task exit notification for 40 tasks (status=0x0000).
        srun: keal1: tasks 0-39: Completed
        
        real 0m12.810s
        user 0m0.014s
        sys 0m0.036s
        
        On a different node with 2 sockets x 10 cores per socket x 2 threads
        per core = 40 procs:
        
        srun: Received task exit notification for 40 tasks (status=0x0000).
        srun: kea05: tasks 0-39: Completed
        real 0m4.949s
        user 0m0.011s
        sys 0m0.012s
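As a rough way to read the timings above, the sketch below divides each wall-clock time by the task count. It assumes the whole "real" time is startup cost, which is only an approximation, since the figure covers the complete srun invocation and not MPI_Init alone. The helper name per_task_seconds is made up for this illustration.

```python
# Wall-clock ("real") times copied from the srun runs above:
# (node, number of tasks, elapsed seconds).
runs = [
    ("keal1", 40, 12.810),
    ("keal1", 80, 26.755),
    ("keal1", 90, 32.962),
    ("kea05", 40, 4.949),
]


def per_task_seconds(tasks, seconds):
    """Average elapsed time per task (assumes startup dominates the run)."""
    return seconds / tasks


for node, tasks, seconds in runs:
    cost = per_task_seconds(tasks, seconds)
    print(f"{node}: {tasks} tasks, {cost:.3f} s per task")
```

If the per-task figure stays roughly constant on keal1 as the task count grows, the elapsed time is scaling about linearly with the number of tasks, and the same 40 tasks cost noticeably less per task on kea05.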
        
        Any help or suggestions on what I could do would be appreciated.
        
        Thanks very much in advance,
        
        Dom