[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Manuel Rodríguez Pascual
Hi,

You are both right :)  The problem is kind of solved now.

As Douglas and Janne stated, after changing my SlurmdSpoolDir to a local one
the error in the subject of this mail disappeared. I can now run "srun -n
2 --tasks-per-node=1   ./helloWorldMPI" with no problem. However, it does
not behave as expected (or at least as I would like): it starts an
independent single-task run on each node instead of one parallel MPI job
spanning both. This leads us to the next point.
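
For reference, the relevant slurm.conf line now points to a node-local
directory; the path below is just an example (the default mentioned in the
man page), not necessarily my exact layout:

    # slurm.conf -- same file on every node, but the directory itself is local to each node
    SlurmdSpoolDir=/var/spool/slurmd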

As Janne pointed out, my mvapich was not compiled with srun support. I
managed to solve the compilation errors by configuring with
"--with-pm=slurm"; the remaining problem was simply that "/usr/local/lib"
was not exported in LD_LIBRARY_PATH.

The drawback of this new mvapich build is that "mpiexec", "mpirun" and the
other MPI launch commands are not created, since the build declares that
srun will be used as the process manager (see
https://wiki.mpich.org/mpich/index.php/Frequently_Asked_Questions#Q:_What_are_process_managers.3F
). So, altogether, at build time you have to choose between executing MPI
jobs with "mpiexec" or with "srun". Is this correct, or am I missing
something?

Thanks for your help and fast support. Best regards,


Manuel



2016-11-18 14:37 GMT+01:00 Douglas Jacobsen:

> Hello,
>
> Is " /home/localsoft/slurm/spool" local to the node?  Or is it on the
> network?  I think each node needs to have separate data (like job_cred)
> stored there, and if each slurmd is competing for that file naming space I
> could imagine that srun could have problems.  I typically use
> /var/spool/slurmd.
>
> From the slurm.conf page:
>
> """
>
> *SlurmdSpoolDir* Fully qualified pathname of a directory into which the
> *slurmd* daemon's state information and batch job script information are
> written. This must be a common pathname for all nodes, but should represent
> a directory which is local to each node (reference a local file system).
> The default value is "/var/spool/slurmd". Any "%h" within the name is
> replaced with the hostname on which the *slurmd* is running. Any "%n"
> within the name is replaced with the Slurm node name on which the *slurmd*
> is running.
>
> """
>
> I hope that helps,
>
> Doug
> On 11/18/16 1:07 AM, Janne Blomqvist wrote:
>
> On 2016-11-17 12:53, Manuel Rodríguez Pascual wrote:
>
> Hi all,
>
> I keep having some issues using Slurm + mvapich2. It seems that I cannot
> correctly configure Slurm and mvapich2 to work together. In particular,
> sbatch works correctly but srun does not.  Maybe someone here can
> provide me some guidance, as I suspect that the error is an obvious one,
> but I just cannot find it.
>
> CONFIGURATION INFO:
> I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
> Mvapich is compiled with "--disable-mcast --with-slurm=<slurm install location>"  <--- there is a note about this at the bottom of the mail
> Slurm is compiled with no special options. After compilation, I executed
> "make && make install" in "contribs/pmi2/" (I read it somewhere)
> Slurm is configured with "MpiDefault=pmi2" in slurm.conf
>
> TESTS:
> I am executing a "helloWorldMPI" that displays a hello world message and
> writes down the node name for each MPI task.
>
> sbatch works perfectly:
>
> $ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 750
>
> $ more slurm-750.out
> Process 0 of 2 is on acme12.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es
> Hello world from process 1 of 2
>
> $sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 748
>
> $ more slurm-748.out
> Process 0 of 2 is on acme11.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es
> Hello world from process 1 of 2
>
>
> However, srun fails.
> On a single node it works correctly:
> $ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
> Process 0 of 2 is on acme11.ciemat.es
> Hello world from process 0 of 2
> Process 1 of 2 is on acme11.ciemat.es
> Hello world from process 1 of 2
>
> But when using more than one node, it fails. Below is the experiment
> with a lot of debugging info, in case it helps.
>
> (note that the job ID will be different sometimes as this mail is the
> result of multiple submissions and copy/pastes)
>
> $ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
> srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
> slurmstepd: error: *** STEP 753.0 ON acme11 CANCELLED AT
> 2016-11-17T10:19:47 ***
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: acme11: task 0: Killed
> srun: error: acme12: task 1: Killed
>
>
> Slurmctld output:
> 

[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Douglas Jacobsen

Hello,

Is " /home/localsoft/slurm/spool" local to the node?  Or is it on the 
network?  I think each node needs to have separate data (like job_cred) 
stored there, and if each slurmd is competing for that file naming space 
I could imagine that srun could have problems. I typically use 
/var/spool/slurmd.


From the slurm.conf page:

"""

*SlurmdSpoolDir*
   Fully qualified pathname of a directory into which the *slurmd*
   daemon's state information and batch job script information are
   written. This must be a common pathname for all nodes, but should
   represent a directory which is local to each node (reference a local
   file system). The default value is "/var/spool/slurmd". Any "%h"
   within the name is replaced with the hostname on which the *slurmd*
   is running. Any "%n" within the name is replaced with the Slurm node
   name on which the *slurmd* is running. 


"""

I hope that helps,

Doug

On 11/18/16 1:07 AM, Janne Blomqvist wrote:

On 2016-11-17 12:53, Manuel Rodríguez Pascual wrote:

Hi all,

I keep having some issues using Slurm + mvapich2. It seems that I cannot
correctly configure Slurm and mvapich2 to work together. In particular,
sbatch works correctly but srun does not.  Maybe someone here can
provide me some guidance, as I suspect that the error is an obvious one,
but I just cannot find it.

CONFIGURATION INFO:
I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
Mvapich is compiled with "--disable-mcast --with-slurm=<slurm install location>"  <--- there is a note about this at the bottom of the mail
Slurm is compiled with no special options. After compilation, I executed
"make && make install" in "contribs/pmi2/" (I read it somewhere)
Slurm is configured with "MpiDefault=pmi2" in slurm.conf

TESTS:
I am executing a "helloWorldMPI" that displays a hello world message and
writes down the node name for each MPI task.

sbatch works perfectly:

$ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 750

$ more slurm-750.out
Process 0 of 2 is on acme12.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es 
Hello world from process 1 of 2

$sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
Submitted batch job 748

$ more slurm-748.out
Process 0 of 2 is on acme11.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme12.ciemat.es 
Hello world from process 1 of 2


However, srun fails.
On a single node it works correctly:
$ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
Process 0 of 2 is on acme11.ciemat.es 
Hello world from process 0 of 2
Process 1 of 2 is on acme11.ciemat.es 
Hello world from process 1 of 2

But when using more than one node, it fails. Below is the experiment
with a lot of debugging info, in case it helps.

(note that the job ID will be different sometimes as this mail is the
result of multiple submissions and copy/pastes)

$ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
slurmstepd: error: *** STEP 753.0 ON acme11 CANCELLED AT
2016-11-17T10:19:47 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: acme11: task 0: Killed
srun: error: acme12: task 1: Killed


Slurmctld output:
slurmctld: debug2: Performing purge of old job records
slurmctld: debug2: Performing full system state save
slurmctld: debug3: Writing job id 753 to header record of job_state file
slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION
from uid=500
slurmctld: debug3: JobDesc: user_id=500 job_id=N/A partition=(null)
name=helloWorldMPI
slurmctld: debug3:cpus=2-4294967294 pn_min_cpus=-1 core_spec=-1
slurmctld: debug3:Nodes=1-[4294967294] Sock/Node=65534
Core/Sock=65534 Thread/Core=65534
slurmctld: debug3:pn_min_memory_job=18446744073709551615
pn_min_tmp_disk=-1
slurmctld: debug3:immediate=0 features=(null) reservation=(null)
slurmctld: debug3:req_nodes=(null) exc_nodes=(null) gres=(null)
slurmctld: debug3:time_limit=-1--1 priority=-1 contiguous=0 shared=-1
slurmctld: debug3:kill_on_node_fail=-1 script=(null)
slurmctld: debug3:argv="./helloWorldMPI"
slurmctld: debug3:stdin=(null) stdout=(null) stderr=(null)
slurmctld: debug3:work_dir=/home/slurm/tests alloc_node:sid=acme31:11229
slurmctld: debug3:power_flags=
slurmctld: debug3:resp_host=172.17.31.165 alloc_resp_port=56804
other_port=33290
slurmctld: debug3:dependency=(null) account=(null) qos=(null)
comment=(null)
slurmctld: debug3:mail_type=0 mail_user=(null) nice=0 num_tasks=2
open_mode=0 overcommit=-1 acctg_freq=(null)
slurmctld: debug3:network=(null) begin=Unknown cpus_per_task=-1
requeue=-1 licenses=(null)
slurmctld: debug3:end_time= signal=0@0 wait_all_nodes=-1 cpu_freq=
slurmctld: debug3:ntasks_per_node=1 ntasks_per_socket=-1

[slurm-dev] Re: problem configuring mvapich + slurm: "error: mpi/pmi2: failed to send temp kvs to compute nodes"

2016-11-18 Thread Janne Blomqvist
On 2016-11-17 12:53, Manuel Rodríguez Pascual wrote:
> Hi all,
>
> I keep having some issues using Slurm + mvapich2. It seems that I cannot
> correctly configure Slurm and mvapich2 to work together. In particular,
> sbatch works correctly but srun does not.  Maybe someone here can
> provide me some guidance, as I suspect that the error is an obvious one,
> but I just cannot find it.
>
> CONFIGURATION INFO:
> I am employing Slurm 17.02.0-0pre2 and mvapich 2.2.
> Mvapich is compiled with "--disable-mcast --with-slurm=<slurm install location>"  <--- there is a note about this at the bottom of the mail
> Slurm is compiled with no special options. After compilation, I executed
> "make && make install" in "contribs/pmi2/" (I read it somewhere)
> Slurm is configured with "MpiDefault=pmi2" in slurm.conf
>
> TESTS:
> I am executing a "helloWorldMPI" that displays a hello world message and
> writes down the node name for each MPI task.
>
> sbatch works perfectly:
>
> $ sbatch -n 2 --tasks-per-node=2 --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 750
>
> $ more slurm-750.out
> Process 0 of 2 is on acme12.ciemat.es 
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es 
> Hello world from process 1 of 2
>
> $sbatch -n 2 --tasks-per-node=1 -p debug --wrap 'mpiexec  ./helloWorldMPI'
> Submitted batch job 748
>
> $ more slurm-748.out
> Process 0 of 2 is on acme11.ciemat.es 
> Hello world from process 0 of 2
> Process 1 of 2 is on acme12.ciemat.es 
> Hello world from process 1 of 2
>
>
> However, srun fails.
> On a single node it works correctly:
> $ srun -n 2 --tasks-per-node=2   ./helloWorldMPI
> Process 0 of 2 is on acme11.ciemat.es 
> Hello world from process 0 of 2
> Process 1 of 2 is on acme11.ciemat.es 
> Hello world from process 1 of 2
>
> But when using more than one node, it fails. Below is the experiment
> with a lot of debugging info, in case it helps.
>
> (note that the job ID will be different sometimes as this mail is the
> result of multiple submissions and copy/pastes)
>
> $ srun -n 2 --tasks-per-node=1   ./helloWorldMPI
> srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
> slurmstepd: error: *** STEP 753.0 ON acme11 CANCELLED AT
> 2016-11-17T10:19:47 ***
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: acme11: task 0: Killed
> srun: error: acme12: task 1: Killed
>
>
> Slurmctld output:
> slurmctld: debug2: Performing purge of old job records
> slurmctld: debug2: Performing full system state save
> slurmctld: debug3: Writing job id 753 to header record of job_state file
> slurmctld: debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION
> from uid=500
> slurmctld: debug3: JobDesc: user_id=500 job_id=N/A partition=(null)
> name=helloWorldMPI
> slurmctld: debug3:cpus=2-4294967294 pn_min_cpus=-1 core_spec=-1
> slurmctld: debug3:Nodes=1-[4294967294] Sock/Node=65534
> Core/Sock=65534 Thread/Core=65534
> slurmctld: debug3:pn_min_memory_job=18446744073709551615
> pn_min_tmp_disk=-1
> slurmctld: debug3:immediate=0 features=(null) reservation=(null)
> slurmctld: debug3:req_nodes=(null) exc_nodes=(null) gres=(null)
> slurmctld: debug3:time_limit=-1--1 priority=-1 contiguous=0 shared=-1
> slurmctld: debug3:kill_on_node_fail=-1 script=(null)
> slurmctld: debug3:argv="./helloWorldMPI"
> slurmctld: debug3:stdin=(null) stdout=(null) stderr=(null)
> slurmctld: debug3:work_dir=/home/slurm/tests alloc_node:sid=acme31:11229
> slurmctld: debug3:power_flags=
> slurmctld: debug3:resp_host=172.17.31.165 alloc_resp_port=56804
> other_port=33290
> slurmctld: debug3:dependency=(null) account=(null) qos=(null)
> comment=(null)
> slurmctld: debug3:mail_type=0 mail_user=(null) nice=0 num_tasks=2
> open_mode=0 overcommit=-1 acctg_freq=(null)
> slurmctld: debug3:network=(null) begin=Unknown cpus_per_task=-1
> requeue=-1 licenses=(null)
> slurmctld: debug3:end_time= signal=0@0 wait_all_nodes=-1 cpu_freq=
> slurmctld: debug3:ntasks_per_node=1 ntasks_per_socket=-1
> ntasks_per_core=-1
> slurmctld: debug3:mem_bind=65534:(null) plane_size:65534
> slurmctld: debug3:array_inx=(null)
> slurmctld: debug3:burst_buffer=(null)
> slurmctld: debug3:mcs_label=(null)
> slurmctld: debug3:deadline=Unknown
> slurmctld: debug3:bitflags=0 delay_boot=4294967294
> slurmctld: debug3: User (null)(500) doesn't have a default account
> slurmctld: debug3: User (null)(500) doesn't have a default account
> slurmctld: debug3: found correct qos
> slurmctld: debug3: before alteration asking for nodes 1-4294967294 cpus
> 2-4294967294
> slurmctld: debug3: after alteration asking for nodes 1-4294967294 cpus
> 2-4294967294
> slurmctld: debug2: found 8 usable nodes from config containing
> acme[11-14,21-24]
> slurmctld: debug3: _pick_best_nodes: job 754