Hello, Alina!

I am using the "OSU MPI Multiple Bandwidth / Message Rate Test v4.4.1". I downloaded it from http://mvapich.cse.ohio-state.edu/benchmarks/ and have attached "osu_mbw_mr.c" to this email.

Best regards,
Timur
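For context, osu_mbw_mr measures aggregate unidirectional bandwidth and message rate: each rank in the first half of MPI_COMM_WORLD posts a window of non-blocking sends to a partner rank in the second half, waits for an acknowledgement, and the total bytes moved per second are reported. The code below is only a minimal sketch of that pattern under simplified assumptions (fixed message size, fixed window, no warm-up phase); it is not the attached OSU source.

/* Minimal sketch of a multi-pair bandwidth loop in the spirit of
 * osu_mbw_mr; NOT the OSU source, just an illustration.
 * Run with an even number of ranks: first half send, second half receive. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW  64        /* messages in flight per iteration (illustrative) */
#define LOOPS   100       /* timed iterations (illustrative) */
#define MSGSIZE 1048576   /* 1 MiB messages (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int pairs = size / 2;
    char *buf = malloc((size_t)WINDOW * MSGSIZE);
    MPI_Request req[WINDOW];
    char ack = 0;
    double t = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank < pairs) {                        /* sender half of the ranks */
        int peer = rank + pairs;
        double t0 = MPI_Wtime();
        for (int i = 0; i < LOOPS; i++) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          peer, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(&ack, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        t = MPI_Wtime() - t0;
    } else {                                   /* receiver half of the ranks */
        int peer = rank - pairs;
        for (int i = 0; i < LOOPS; i++) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          peer, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
        }
    }

    /* aggregate MB/s over all sender ranks, as in the OSU output */
    double tmax;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        double bytes = (double)MSGSIZE * WINDOW * LOOPS * pairs;
        printf("%d bytes: %.2f MB/s\n", MSGSIZE, bytes / tmax / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}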
Thursday, June 18, 2015, 18:23 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>Hi Timur,
>
>Can you please tell me which osu version you are using?
>Unless it is from HPCX, please attach the source file of the osu_mbw_mr.c you are using.
>
>Thank you,
>Alina.
>
>On Tue, Jun 16, 2015 at 7:10 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>I'm sorry, I forgot to attach the results.
>>With '--bind-to socket' I get the same result as with '--bind-to core': 3813 MB/s.
>>I have attached the ompi_yalla_socket.out and ompi_yalla_socket.err files to this email.
>>
>>Best regards,
>>Timur
>>
>>Tuesday, June 16, 2015, 18:15 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>Hi Timur,
>>>
>>>Can you please try running your ompi_yalla cmd with '--bind-to socket' (instead of binding to core) and check if it affects the results?
>>>We saw that it made a difference in performance in our lab, so that's why I asked you to try the same.
>>>
>>>Thanks,
>>>Alina.
>>>
>>>On Tue, Jun 16, 2015 at 5:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>Hello, Alina!
>>>>
>>>>If I use --map-by node, I get only intranode communication in osu_mbw_mr, so I use --map-by core instead.
>>>>
>>>>I have 2 nodes; each node has 2 sockets with 8 cores per socket.
>>>>
>>>>When I run osu_mbw_mr on 2 nodes with 32 MPI processes (commands below), I expect the test to report the unidirectional bandwidth of a 4xFDR link.
>>>>
>>>>With Intel MPI I get 6367 MB/s.
>>>>With ompi_yalla I get about 3744 MB/s (the problem: this is roughly half of the Intel MPI result).
>>>>With Open MPI without MXM (ompi_clear) I get 6321 MB/s.
>>>>
>>>>How can I improve the yalla result?
>>>>
>>>>Intel MPI cmd:
>>>>/opt/software/intel/impi/4.1.0.030/intel64/bin/mpiexec.hydra -machinefile machines.pYAvuK -n 32 -binding domain=core ../osu_impi/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>ompi_yalla cmd:
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun -report-bindings -display-map -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --map-by core --bind-to core --hostfile hostlist ../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>ompi_clear cmd:
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5/bin/mpirun -report-bindings -display-map --hostfile hostlist --map-by core --bind-to core ../osu_ompi_clear/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>I have attached the output files to this email:
>>>>ompi_clear.out, ompi_clear.err - ompi_clear results
>>>>ompi_yalla.out, ompi_yalla.err - ompi_yalla results
>>>>impi.out, impi.err - Intel MPI results
>>>>
>>>>Best regards,
>>>>Timur
>>>>
>>>>Sunday, June 7, 2015, 16:11 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>>>Hi Timur,
>>>>>
>>>>>After running the osu_mbw_mr benchmark in our lab, we observed that the binding policy made a difference in performance.
>>>>>Can you please rerun your ompi tests with the following added to your command line (one of them in each run)?
>>>>>
>>>>>1. --map-by node --bind-to socket
>>>>>2. --map-by node --bind-to core
>>>>>
>>>>>Please attach your results.
>>>>>
>>>>>Thank you,
>>>>>Alina.
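A note on the mapping question raised above: osu_mbw_mr pairs rank i with rank i + N/2, so with 32 ranks on two 16-core nodes '--map-by core' places all 16 sender ranks on the first node and all 16 receiver ranks on the second, and every pair crosses the 4xFDR link, while '--map-by node' interleaves ranks across the nodes so each pair lands on the same node (hence the intranode-only traffic described above). The helper below is purely illustrative (it is not part of the benchmark or of this thread) and only reports where the sender/receiver pairs end up for a given mpirun mapping.

/* Illustrative check of where the osu_mbw_mr pairs (rank i and rank i + size/2)
 * are placed by the launcher's mapping; not part of the benchmark itself. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(name, 0, sizeof(name));           /* keep gathered bytes comparable */
    MPI_Get_processor_name(name, &len);

    /* collect every rank's hostname on rank 0 */
    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int pairs = size / 2, internode = 0;
        for (int i = 0; i < pairs; i++) {
            const char *a = all + (size_t)i * MPI_MAX_PROCESSOR_NAME;
            const char *b = all + (size_t)(i + pairs) * MPI_MAX_PROCESSOR_NAME;
            if (strcmp(a, b) != 0)            /* this pair crosses the IB link */
                internode++;
        }
        printf("%d of %d sender/receiver pairs are inter-node\n", internode, pairs);
        free(all);
    }
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with the same --map-by/--bind-to options as the benchmark, it shows how many of the N/2 pairs actually cross the link.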
>>>>>
>>>>>On Thu, Jun 4, 2015 at 6:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>Hello, Alina.
>>>>>>
>>>>>>1. Here is my ompi_yalla command line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --hostfile hostlist $@
>>>>>>echo $HPCX_MPI_DIR
>>>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5
>>>>>>This MPI was configured with: --with-mxm=/path/to/mxm --with-hcoll=/path/to/hcoll --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-mellanox-fca-v1.8.5
>>>>>>
>>>>>>ompi_clear command line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun --hostfile hostlist $@
>>>>>>echo $HPCX_MPI_DIR
>>>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5
>>>>>>This MPI was configured with: --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-clear-v1.8.5
>>>>>>
>>>>>>2. When I run osu_mbw_mr with "-x MXM_TLS=self,shm,rc", it fails with a segmentation fault:
>>>>>>the stdout log is in the attached file osu_mbr_mr_n-2_ppn-16.out;
>>>>>>the stderr log is in the attached file osu_mbr_mr_n-2_ppn-16.err;
>>>>>>cmd line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla -x MXM_TLS=self,shm,rc --hostfile hostlist osu_mbw_mr -v -r=0
>>>>>>I have changed WINDOW_SIZES in osu_mbw_mr.c:
>>>>>>#define WINDOW_SIZES {8, 16, 32, 64, 128, 256, 512, 1024}
>>>>>>
>>>>>>3. I have added the results of running osu_mbw_mr with yalla and without hcoll on 32 and 64 nodes (512 and 1024 MPI processes) to mvs10p_mpi.xls, sheet osu_mbr_mr.
>>>>>>The results are 20 percent lower than the old results (with hcoll).
>>>>>>
>>>>>>Wednesday, June 3, 2015, 10:29 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>>>>>Hello Timur,
>>>>>>>
>>>>>>>I will review your results and try to reproduce them in our lab.
>>>>>>>
>>>>>>>You are using an old OFED - OFED-1.5.4.1 - and we suspect that this may be causing the performance issues you are seeing.
>>>>>>>
>>>>>>>In the meantime, could you please:
>>>>>>>
>>>>>>>1. send us the exact command lines that you were running when you got these results?
>>>>>>>
>>>>>>>2. add the following to the command line that you are running with 'pml yalla' and attach the results?
>>>>>>>"-x MXM_TLS=self,shm,rc"
>>>>>>>
>>>>>>>3. run your command line with yalla and without hcoll?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Alina.
>>>>>>>
>>>>>>>On Tue, Jun 2, 2015 at 4:56 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>Hi, Mike!
>>>>>>>>
>>>>>>>>I have impi v4.1.2 ("impi").
>>>>>>>>I built ompi 1.8.5 with MXM and hcoll ("ompi_yalla").
>>>>>>>>I built ompi 1.8.5 without MXM and hcoll ("ompi_clear").
>>>>>>>>I ran the osu p2p osu_mbw_mr test with these MPIs.
>>>>>>>>You can find the benchmark results in the attached file (mvs10p_mpi.xls, sheet osu_mbr_mr).
>>>>>>>>
>>>>>>>>On 64 nodes (and 1024 MPI processes) ompi_yalla gets 2 times worse performance than ompi_clear.
>>>>>>>>Does mxm with yalla reduce p2p performance compared with ompi_clear (and impi)?
>>>>>>>>Am I doing something wrong?
>>>>>>>>
>>>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Timur
>>>>>>>>
>>>>>>>>Thursday, May 28, 2015, 20:02 +03:00 from Mike Dubman <mi...@dev.mellanox.co.il>:
>>>>>>>>>It is not an apples-to-apples comparison.
>>>>>>>>>
>>>>>>>>>yalla/mxm is a point-to-point library, not a collective library; the collective algorithm runs on top of yalla.
>>>>>>>>>
>>>>>>>>>Intel's collective algorithm for a2a is better than OMPI's built-in collective algorithm.
>>>>>>>>>
>>>>>>>>>To see the benefit of yalla, you should run p2p benchmarks (osu_lat/bw/bibw/mr).
>>>>>>>>>
>>>>>>>>>On Thu, May 28, 2015 at 7:35 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>>>I compare ompi-1.8.5 (hpcx-1.3.3-icc) with impi v4.1.4.
>>>>>>>>>>
>>>>>>>>>>I built ompi with MXM but without HCOLL and without knem (I am working on that). The configure options are:
>>>>>>>>>>./configure --prefix=my_prefix --with-mxm=path/to/hpcx/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm --with-platform=contrib/platform/mellanox/optimized
>>>>>>>>>>
>>>>>>>>>>As a result of the IMB-MPI1 Alltoall test, I got disappointing results: for most message sizes on 64 nodes with 16 processes per node, impi is much (~40%) better.
>>>>>>>>>>
>>>>>>>>>>You can look at the results in the attached file "mvs10p_mpi.xlsx"; the system configuration is also there.
>>>>>>>>>>
>>>>>>>>>>What do you think about this? Is there any way to improve the ompi yalla performance results?
>>>>>>>>>>
>>>>>>>>>>I attach the output of "IMB-MPI1 Alltoall" for yalla and impi.
>>>>>>>>>>
>>>>>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>>>>>
>>>>>>>>>>Best regards,
>>>>>>>>>>Timur
>>>>>>>>>
>>>>>>>>>--
>>>>>>>>>Kind Regards,
>>>>>>>>>M.
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>users mailing list
>>>>>>>>us...@open-mpi.org
>>>>>>>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27029.php
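To make the point about layering concrete: a p2p benchmark such as osu_mbw_mr exercises the pml (yalla/MXM) directly, whereas an Alltoall number is dominated by the collective component (tuned or hcoll) that runs on top of it, so the two comparisons answer different questions. Below is a minimal Alltoall timing loop in the spirit of the IMB-MPI1 test discussed above; the buffer size and repetition count are illustrative, and this is not the IMB source.

/* Minimal Alltoall timing loop, only to illustrate what the IMB-MPI1
 * Alltoall comparison measures; sizes and repetition count are made up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int per_rank = 4096;   /* bytes sent to each peer (illustrative) */
    const int reps = 100;        /* timed repetitions (illustrative) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc((size_t)per_rank * size);
    char *recvbuf = malloc((size_t)per_rank * size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Alltoall(sendbuf, per_rank, MPI_CHAR,
                     recvbuf, per_rank, MPI_CHAR, MPI_COMM_WORLD);
    double t = (MPI_Wtime() - t0) / reps;

    /* report the slowest rank's average time, as collective benchmarks do */
    double tmax;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Alltoall, %d bytes per peer: %.2f us avg\n", per_rank, tmax * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}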