Yes, I have tried NetPipe-Java and iperf for bandwidth and configuration tests. NetPipe-Java achieves a maximum of 9.40 Gbps, while iperf achieves a maximum of 9.61 Gbps. I have also tested my bandwidth program on a 1 Gbps Ethernet connection, where it achieves 901 Mbps. I am using the same program for the 10G network benchmarks. Please find the source file of the bandwidth program attached.
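For the raw TCP check, the iperf runs were along these lines (the exact duration flag here is illustrative, not necessarily the precise options used; host names are the same two hosts shown in the binding output below):

    # on host3 (server side)
    iperf -s

    # on host4 (client side), over the point-to-point 10G link
    iperf -c host3 -t 30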
As far as --bind-to core is concerned, I think it is working fine. Here is the output of the --report-bindings switch (a sketch of the full command line, together with some TCP tuning parameters worth trying, follows the quoted thread below):

[host3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.]

On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain <r...@open-mpi.org> wrote:
> Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure the
> problem isn't in your program? Outside of that, you might want to
> explicitly tell it to --bind-to core just to be sure it does so - it's
> supposed to do that by default, but might as well be sure. You can check by
> adding --report-bindings to the cmd line.
>
> On Apr 14, 2014, at 11:10 PM, Muhammad Ansar Javed <muhammad.an...@seecs.edu.pk> wrote:
>
> Hi,
> I am trying to benchmark Open MPI performance on a 10G Ethernet network
> between two hosts. The performance numbers of the benchmarks are lower than
> expected. The maximum bandwidth achieved by OMPI-C is 5678 Mbps, and I was
> expecting around 9000+ Mbps. Moreover, latency is also considerably higher
> than expected, ranging from 37 to 59 us. Here is the complete set of numbers.
>
> Latency, Open MPI C
> Size (Bytes)   Time (us)
> 1              37.76
> 2              37.75
> 4              37.78
> 8              55.17
> 16             37.89
> 32             39.08
> 64             37.78
> 128            59.46
> 256            39.37
> 512            40.39
> 1024           47.18
> 2048           47.84
>
> Bandwidth, Open MPI C
> Size (Bytes)   Bandwidth (Mbps)
> 2048           412.22
> 4096           539.59
> 8192           827.73
> 16384          1655.35
> 32768          3274.3
> 65536          1995.22
> 131072         3270.84
> 262144         4316.22
> 524288         5019.46
> 1048576        5236.17
> 2097152        5362.61
> 4194304        5495.2
> 8388608        5565.32
> 16777216       5678.32
>
> My environment consists of two hosts with a point-to-point (switch-less)
> 10 Gbps Ethernet connection. The environment (OS, user, directory structure,
> etc.) on both hosts is exactly the same. There is no NAS or shared file
> system between the hosts. The configuration and job-launching commands I am
> using are below. I have also attached the output of ompi_info --all.
>
> Configuration command: ./configure --enable-mpi-java
> --prefix=/home/mpj/installed/openmpi_installed CC=/usr/bin/gcc
> --disable-mpi-fortran
>
> Job launching command: mpirun -np 2 -hostfile machines -npernode 1
> ./latency.out
>
> Are these numbers okay? If not, please suggest performance tuning steps.
>
> Thanks
>
> --
> Ansar Javed
> HPC Lab
> SEECS NUST
> Contact: +92 334 438 9394
> Email: muhammad.an...@seecs.edu.pk
> <ompi_info.tar.bz2>

--
Regards
Ansar Javed
HPC Lab
SEECS NUST
Contact: +92 334 438 9394
Email: muhammad.an...@seecs.edu.pk
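Sketch of the full command line with explicit binding. The -np/-hostfile/-npernode part and the binding switches match what I am running; forcing the TCP BTL and the two btl_tcp buffer values are only illustrative settings to experiment with, not something already applied:

    mpirun -np 2 -hostfile machines -npernode 1 \
           --bind-to core --report-bindings \
           --mca btl tcp,self \
           --mca btl_tcp_sndbuf 4194304 \
           --mca btl_tcp_rcvbuf 4194304 \
           ./latency.out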
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define LEN 16777216

int main(int argc, char** argv)
{
    double start = 0.0, stop = 0.0;
    double totalTime = 0.0;

    MPI_Init(&argc, &argv);

    MPI_Status status;
    char* sbuf = (char *) malloc(LEN * sizeof(char));
    int WARM_UP = 2000;
    int REPEAT = 5000;
    int j = 1, i = 0;
    int rank = 0;
    int LOG2N_MAX = 1000000, log2nbyte = 0, padding = 0;
    double timed = 0.0;
    double latency = 0.0;

    /* Fill the send buffer once before timing */
    for (i = 0; i < LEN; i++) {
        sbuf[i] = 's';
        //rbuf[i] = 'x';
    }

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char name[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(name, &len);
    printf("Hello, world. I am %d on %s\n", rank, name);
    fflush(stdout);

    /* Logarithmic loop over message sizes: 1 byte up to LEN bytes */
    for (log2nbyte = 0; (log2nbyte <= LOG2N_MAX) && (j < LEN); ++log2nbyte) {
        j = (1 << log2nbyte);

        /* Warm-up loop */
        for (i = 0; i < WARM_UP; i++) {
            if (rank == 0) {
                MPI_Recv(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD, &status);
                MPI_Send(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Send(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD);
                MPI_Recv(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD, &status);
            }
        }

        start = MPI_Wtime();

        /* Timed ping-pong loop */
        for (i = 0; i < REPEAT; i++) {
            if (rank == 0) {
                MPI_Send(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD);
                MPI_Recv(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD, &status);
                MPI_Send(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD);
            }
        }

        stop = MPI_Wtime();
        timed = stop - start;

        /* One-way latency in microseconds */
        latency = (timed / (2 * REPEAT)) * 1000 * 1000;

        if (rank == 0) {
            /* message size (bytes), latency (us), bandwidth (Mbps) */
            printf("%d\t%.2f\t%.2f\n", j, latency,
                   (8 * j) / (1024 * 1024 * (latency / (1000 * 1000))));
        }
    } /* end logarithmic loop */

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
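A minimal compile-and-run sketch for the attached program (the source file name bandwidth.c is assumed; the launch command is the one quoted above):

    mpicc -O2 bandwidth.c -o latency.out
    mpirun -np 2 -hostfile machines -npernode 1 ./latency.out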