Yes, I have tried NetPipe-Java and iperf for bandwidth and configuration
tests. NetPipe-Java achieves a maximum of 9.40 Gbps, while iperf achieves a
maximum of 9.61 Gbps. I have also tested my bandwidth program on a 1 Gbps
Ethernet connection, where it achieves 901 Mbps, and I am using the same
program for the 10G network benchmarks. Please find the source file of the
bandwidth program attached.
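
For reference, the raw-TCP iperf numbers come from a simple point-to-point
test; the invocations below are illustrative (assuming iperf2 on this
setup), not necessarily the exact options used:

# on host3 (server)
iperf -s
# on host4 (client)
iperf -c host3 -t 30 -i 1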

As far as --bind-to core is concerned, I think it is working fine. Here is
the output from the --report-bindings switch:
[host3:07134] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././.]
[host4:10282] MCW rank 1 bound to socket 0[core 0[hwt 0]]: [B/././.]
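
For completeness, this output was produced by adding the two binding
options to the usual launch command (same hostfile and executable as in
the command quoted below; shown here just for reference):

mpirun -np 2 -hostfile machines -npernode 1 --bind-to core --report-bindings ./latency.out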

On Tue, Apr 15, 2014 at 8:39 PM, Ralph Castain <r...@open-mpi.org> wrote:

> Have you tried a typical benchmark (e.g., NetPipe or OMB) to ensure the
> problem isn't in your program? Outside of that, you might want to
> explicitly tell it to --bind-to core just to be sure it does so - it's
> supposed to do that by default, but might as well be sure. You can check by
> adding --report-bindings to the cmd line.
>
>
> On Apr 14, 2014, at 11:10 PM, Muhammad Ansar Javed <
> muhammad.an...@seecs.edu.pk> wrote:
>
> Hi,
> I am trying to benchmark Open MPI performance on a 10G Ethernet network
> between two hosts, and the benchmark numbers are lower than expected. The
> maximum bandwidth achieved by OMPI-C is 5678 Mbps, where I was expecting
> around 9000+ Mbps, and latency is also quite a bit higher than expected,
> ranging from 37 to 59 us. Here is the complete set of numbers.
>
>
>
> *Latency (Open MPI C)*
> Size (Bytes)    Time (us)
> 1         37.76
> 2         37.75
> 4         37.78
> 8         55.17
> 16       37.89
> 32       39.08
> 64       37.78
> 128     59.46
> 256     39.37
> 512     40.39
> 1024   47.18
> 2048   47.84
>
>
>
>
> *Bandwidth (Open MPI C)*
> Size (Bytes)    Bandwidth (Mbps)
> 2048               412.22
> 4096               539.59
> 8192               827.73
> 16384             1655.35
> 32768             3274.3
> 65536             1995.22
> 131072           3270.84
> 262144           4316.22
> 524288           5019.46
> 1048576         5236.17
> 2097152         5362.61
> 4194304         5495.2
> 8388608         5565.32
> 16777216       5678.32
>
>
> My environment consists of two hosts with a point-to-point (switch-less)
> 10 Gbps Ethernet connection. The environment (OS, user, directory
> structure, etc.) on both hosts is exactly the same, and there is no NAS or
> shared file system between them. The configuration and job-launch commands
> I am using are below, and I have attached the output of ompi_info --all.
>
> Configuration command: ./configure --enable-mpi-java
> --prefix=/home/mpj/installed/openmpi_installed CC=/usr/bin/gcc
> --disable-mpi-fortran
>
> Job launching command: mpirun -np 2 -hostfile machines -npernode 1
> ./latency.out
>
> Are these numbers okay? If not, please suggest performance tuning steps.
>
> Thanks
>
> --
> Ansar Javed
> HPC Lab
> SEECS NUST
> Contact: +92 334 438 9394
> Email: muhammad.an...@seecs.edu.pk
>  <ompi_info.tar.bz2>



-- 
Regards

Ansar Javed
HPC Lab
SEECS NUST
Contact: +92 334 438 9394
Email: muhammad.an...@seecs.edu.pk
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define LEN     16777216   /* largest message size in bytes (16 MiB) */
#define WARM_UP 2000       /* untimed warm-up iterations per message size */
#define REPEAT  5000       /* timed iterations per message size */

int main(int argc, char **argv) {

  double start = 0.0, stop = 0.0, timed = 0.0, latency = 0.0;
  int i, j, log2nbyte, rank = 0, len;
  MPI_Status status;
  char name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  char *sbuf = (char *) malloc(LEN * sizeof(char));
  if (sbuf == NULL) {
    fprintf(stderr, "malloc of %d bytes failed\n", LEN);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  for (i = 0; i < LEN; i++)
    sbuf[i] = 's';

  MPI_Get_processor_name(name, &len);
  printf("Hello, world.  I am %d on %s\n", rank, name);
  fflush(stdout);

  /* Logarithmic loop: message size doubles each iteration, 1 byte .. LEN */
  for (log2nbyte = 0; (1 << log2nbyte) <= LEN; ++log2nbyte) {

    j = (1 << log2nbyte);

    /* Warm-up loop (not timed) */
    for (i = 0; i < WARM_UP; i++) {
      if (rank == 0) {
        MPI_Recv(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD, &status);
        MPI_Send(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD);
      } else if (rank == 1) {
        MPI_Send(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD);
        MPI_Recv(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD, &status);
      }
    }

    /* Timed ping-pong loop */
    start = MPI_Wtime();
    for (i = 0; i < REPEAT; i++) {
      if (rank == 0) {
        MPI_Send(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD);
        MPI_Recv(sbuf, j, MPI_CHAR, 1, 998, MPI_COMM_WORLD, &status);
      } else if (rank == 1) {
        MPI_Recv(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD, &status);
        MPI_Send(sbuf, j, MPI_CHAR, 0, 998, MPI_COMM_WORLD);
      }
    }
    stop = MPI_Wtime();
    timed = stop - start;

    /* One-way latency in microseconds: half the average round-trip time */
    latency = (timed / (2.0 * REPEAT)) * 1000 * 1000;

    /* Bandwidth in Mbps: 8*j bits per one-way transfer, 1 Mbit = 2^20 bits */
    if (rank == 0) {
      printf("%d\t%.2f\t%.2f\n", j, latency,
             (8.0 * j) / (1024 * 1024 * (latency / (1000 * 1000))));
    }
  } /* end logarithmic loop */

  free(sbuf);
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}
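
For reference, the benchmark is built with the Open MPI wrapper compiler
and launched with the same command given earlier (the latency.c filename
and the -O2 flag here are just illustrative):

mpicc -O2 latency.c -o latency.out
mpirun -np 2 -hostfile machines -npernode 1 ./latency.out

Each rank prints a hello line, and rank 0 then prints one line per message
size: size in bytes, one-way latency in microseconds, and bandwidth in Mbps.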
