Hi again!

Dirk Eddelbuettel wrote:
Works for me (though I prefer salloc), suggesting that you did something to your network topology or Open MPI configuration:

:~$ cat /tmp/jerome_hw.c
// mpicc -o phello phello.c
// mpirun -np 5 phello
#include <unistd.h>
#include <stdio.h>
#include <mpi.h>

int main(int narg, char *args[]){
  int rank,size;
  char ProcessorName[MPI_MAX_PROCESSOR_NAME];
  int ProcessorNameLength;

  MPI_Init(&narg,&args);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
  MPI_Get_processor_name(ProcessorName,&ProcessorNameLength);

  sleep(11);
  fprintf(stdout,
          "Hello world! I am %d of %d and my name is `%s'\n",
          rank,size,
          ProcessorName);

  MPI_Finalize();
  return 0;
}
//
// End of file `phello.c'.
:~$ mpicc.openmpi -o /tmp/jerome_hw /tmp/jerome_hw.c
:~$ orterun -np 2 /tmp/jerome_hw
Hello world! I am 1 of 2 and my name is `xyz-1'
Hello world! I am 0 of 2 and my name is `xyz-1'
:~$ salloc orterun -np 2 /tmp/jerome_hw
salloc: Granted job allocation 421
Hello world! I am 0 of 2 and my name is `xyz-1'
Hello world! I am 1 of 2 and my name is `xyz-1'
salloc: Relinquishing job allocation 421
:~$
The above submission works the same on my clusters. But in fact, my issue involves the interconnection between the nodes of the cluster: the above examples involve no connection between nodes. My cluster is a cluster of quad-core computers: if, in the sbatch script,

#SBATCH --nodes=7
#SBATCH --ntasks=15

is replaced by

#SBATCH --nodes=1
#SBATCH --ntasks=4

everything is fine, as no interconnection is involved. Can you test the interconnection part of the story?
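For reference, a minimal sketch of an sbatch script that should force the two ranks onto different nodes, so that the launch has to cross the interconnect. It is an illustration only, not the script used in this thread: the node and task counts are made up, and it assumes the binary is visible at the same path on every node (for instance on a shared filesystem), which is not the case for a node-local /tmp.

```shell
#!/bin/bash
# Hypothetical two-node job: one task per node, so the two ranks
# cannot both be placed on the same quad-core machine.
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1

# Binary path taken from the transcript above; assumed to exist at this
# path on every allocated node.
orterun -np 2 /tmp/jerome_hw
```

If the two "Hello world!" lines report different processor names, the ranks were placed on different nodes. The hello program exchanges no messages of its own, so this should only exercise the inter-node launch and the MPI_Init/MPI_Finalize wire-up.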
| I have set no MCA parameter, and the firewalls are off, and the kernels (2.6.16) were built with no security features.

Try simplifying further: no default hosts besides localhost, etc. Try orterun before you try salloc. Simplicity first.
I try to keep things simple (and secure): I will double-check my setup.

Thanks,
Jerome
Dirk