Hello, Alina!

I am using the "OSU MPI Multiple Bandwidth / Message Rate Test v4.4.1". I downloaded it from http://mvapich.cse.ohio-state.edu/benchmarks/ and have attached "osu_mbw_mr.c" to this email.

Best regards,
Timur
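For context, osu_mbw_mr measures aggregate unidirectional bandwidth and message rate: each rank in the first half of MPI_COMM_WORLD posts a window of non-blocking sends to a partner rank in the second half, waits for an acknowledgement, and the total bytes moved per second are reported. The code below is only a minimal sketch of that pattern under simplified assumptions (fixed message size, fixed window, no warm-up phase); it is not the attached OSU source.

/* Minimal sketch of a multi-pair bandwidth loop in the spirit of
 * osu_mbw_mr; NOT the OSU source, just an illustration.
 * Run with an even number of ranks: first half send, second half receive. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW  64        /* messages in flight per iteration (illustrative) */
#define LOOPS   100       /* timed iterations (illustrative) */
#define MSGSIZE 1048576   /* 1 MiB messages (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int pairs = size / 2;
    char *buf = malloc((size_t)WINDOW * MSGSIZE);
    MPI_Request req[WINDOW];
    char ack = 0;
    double t = 0.0;

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank < pairs) {                        /* sender half of the ranks */
        int peer = rank + pairs;
        double t0 = MPI_Wtime();
        for (int i = 0; i < LOOPS; i++) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Isend(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          peer, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(&ack, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        t = MPI_Wtime() - t0;
    } else {                                   /* receiver half of the ranks */
        int peer = rank - pairs;
        for (int i = 0; i < LOOPS; i++) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          peer, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_CHAR, peer, 1, MPI_COMM_WORLD);
        }
    }

    /* aggregate MB/s over all sender ranks, as in the OSU output */
    double tmax;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        double bytes = (double)MSGSIZE * WINDOW * LOOPS * pairs;
        printf("%d bytes: %.2f MB/s\n", MSGSIZE, bytes / tmax / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}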
Thursday, June 18, 2015, 18:23 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>Hi Timur,
>
>Can you please tell me which osu version you are using?
>Unless it is from HPCX, please attach the source file of the osu_mbw_mr.c you are using.
>
>Thank you,
>Alina.
>
>On Tue, Jun 16, 2015 at 7:10 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>I'm sorry, I forgot to attach the results.
>>With '--bind-to socket' I get the same result as with '--bind-to core': 3813 MB/s.
>>I have attached the ompi_yalla_socket.out and ompi_yalla_socket.err files to this email.
>>
>>Best regards,
>>Timur
>>
>>Tuesday, June 16, 2015, 18:15 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>Hi Timur,
>>>
>>>Can you please try running your ompi_yalla cmd with '--bind-to socket' (instead of binding to core) and check if it affects the results?
>>>We saw that it made a difference in performance in our lab, so that's why I asked you to try the same.
>>>
>>>Thanks,
>>>Alina.
>>>
>>>On Tue, Jun 16, 2015 at 5:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>Hello, Alina!
>>>>
>>>>If I use --map-by node, I get only intranode communication in osu_mbw_mr, so I use --map-by core instead.
>>>>
>>>>I have 2 nodes; each node has 2 sockets with 8 cores per socket.
>>>>
>>>>When I run osu_mbw_mr on 2 nodes with 32 MPI processes (commands below), I expect the test to report the unidirectional bandwidth of a 4xFDR link.
>>>>
>>>>With Intel MPI I get 6367 MB/s.
>>>>With ompi_yalla I get about 3744 MB/s (the problem: this is roughly half of the Intel MPI result).
>>>>With Open MPI without MXM (ompi_clear) I get 6321 MB/s.
>>>>
>>>>How can I improve the yalla result?
>>>>
>>>>Intel MPI cmd:
>>>>/opt/software/intel/impi/4.1.0.030/intel64/bin/mpiexec.hydra -machinefile machines.pYAvuK -n 32 -binding domain=core ../osu_impi/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>ompi_yalla cmd:
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5/bin/mpirun -report-bindings -display-map -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --map-by core --bind-to core --hostfile hostlist ../osu_ompi_hcoll/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>ompi_clear cmd:
>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5/bin/mpirun -report-bindings -display-map --hostfile hostlist --map-by core --bind-to core ../osu_ompi_clear/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_mbw_mr -v -r=0
>>>>
>>>>I have attached the output files to this email:
>>>>ompi_clear.out, ompi_clear.err - ompi_clear results
>>>>ompi_yalla.out, ompi_yalla.err - ompi_yalla results
>>>>impi.out, impi.err - Intel MPI results
>>>>
>>>>Best regards,
>>>>Timur
>>>>
>>>>Sunday, June 7, 2015, 16:11 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>>>Hi Timur,
>>>>>
>>>>>After running the osu_mbw_mr benchmark in our lab, we observed that the binding policy made a difference in performance.
>>>>>Can you please rerun your ompi tests with the following added to your command line (one of them in each run)?
>>>>>
>>>>>1. --map-by node --bind-to socket
>>>>>2. --map-by node --bind-to core
>>>>>
>>>>>Please attach your results.
>>>>>
>>>>>Thank you,
>>>>>Alina.
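A note on the mapping question raised above: osu_mbw_mr pairs rank i with rank i + N/2, so with 32 ranks on two 16-core nodes '--map-by core' places all 16 sender ranks on the first node and all 16 receiver ranks on the second, and every pair crosses the 4xFDR link, while '--map-by node' interleaves ranks across the nodes so each pair lands on the same node (hence the intranode-only traffic described above). The helper below is purely illustrative (it is not part of the benchmark or of this thread) and only reports where the sender/receiver pairs end up for a given mpirun mapping.

/* Illustrative check of where the osu_mbw_mr pairs (rank i and rank i + size/2)
 * are placed by the launcher's mapping; not part of the benchmark itself. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(name, 0, sizeof(name));           /* keep gathered bytes comparable */
    MPI_Get_processor_name(name, &len);

    /* collect every rank's hostname on rank 0 */
    char *all = NULL;
    if (rank == 0)
        all = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all, MPI_MAX_PROCESSOR_NAME, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        int pairs = size / 2, internode = 0;
        for (int i = 0; i < pairs; i++) {
            const char *a = all + (size_t)i * MPI_MAX_PROCESSOR_NAME;
            const char *b = all + (size_t)(i + pairs) * MPI_MAX_PROCESSOR_NAME;
            if (strcmp(a, b) != 0)            /* this pair crosses the IB link */
                internode++;
        }
        printf("%d of %d sender/receiver pairs are inter-node\n", internode, pairs);
        free(all);
    }
    MPI_Finalize();
    return 0;
}

Built with mpicc and launched with the same --map-by/--bind-to options as the benchmark, it shows how many of the N/2 pairs actually cross the link.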
>>>>>
>>>>>On Thu, Jun 4, 2015 at 6:53 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>Hello, Alina.
>>>>>>
>>>>>>1. Here is my ompi_yalla command line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --hostfile hostlist $@
>>>>>>echo $HPCX_MPI_DIR
>>>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-fca-v1.8.5
>>>>>>This MPI was configured with: --with-mxm=/path/to/mxm --with-hcoll=/path/to/hcoll --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-mellanox-fca-v1.8.5
>>>>>>
>>>>>>ompi_clear command line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun --hostfile hostlist $@
>>>>>>echo $HPCX_MPI_DIR
>>>>>>/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-clear-v1.8.5
>>>>>>This MPI was configured with: --with-platform=contrib/platform/mellanox/optimized --prefix=/path/to/ompi-clear-v1.8.5
>>>>>>
>>>>>>2. When I run osu_mbw_mr with "-x MXM_TLS=self,shm,rc", it fails with a segmentation fault:
>>>>>>the stdout log is in the attached file osu_mbr_mr_n-2_ppn-16.out;
>>>>>>the stderr log is in the attached file osu_mbr_mr_n-2_ppn-16.err;
>>>>>>cmd line:
>>>>>>$HPCX_MPI_DIR/bin/mpirun -mca coll_hcoll_enable 1 -x HCOLL_MAIN_IB=mlx4_0:1 -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla -x MXM_TLS=self,shm,rc --hostfile hostlist osu_mbw_mr -v -r=0
>>>>>>I have changed WINDOW_SIZES in osu_mbw_mr.c:
>>>>>>#define WINDOW_SIZES {8, 16, 32, 64, 128, 256, 512, 1024}
>>>>>>
>>>>>>3. I have added the results of running osu_mbw_mr with yalla and without hcoll on 32 and 64 nodes (512 and 1024 MPI processes) to mvs10p_mpi.xls, sheet osu_mbr_mr.
>>>>>>The results are 20 percent lower than the old results (with hcoll).
>>>>>>
>>>>>>Wednesday, June 3, 2015, 10:29 +03:00 from Alina Sklarevich <ali...@dev.mellanox.co.il>:
>>>>>>>Hello Timur,
>>>>>>>
>>>>>>>I will review your results and try to reproduce them in our lab.
>>>>>>>
>>>>>>>You are using an old OFED - OFED-1.5.4.1 - and we suspect that this may be causing the performance issues you are seeing.
>>>>>>>
>>>>>>>In the meantime, could you please:
>>>>>>>
>>>>>>>1. send us the exact command lines that you were running when you got these results?
>>>>>>>
>>>>>>>2. add the following to the command line that you are running with 'pml yalla' and attach the results?
>>>>>>>"-x MXM_TLS=self,shm,rc"
>>>>>>>
>>>>>>>3. run your command line with yalla and without hcoll?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Alina.
>>>>>>>
>>>>>>>On Tue, Jun 2, 2015 at 4:56 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>Hi, Mike!
>>>>>>>>
>>>>>>>>I have impi v4.1.2 ("impi").
>>>>>>>>I built ompi 1.8.5 with MXM and hcoll ("ompi_yalla").
>>>>>>>>I built ompi 1.8.5 without MXM and hcoll ("ompi_clear").
>>>>>>>>I ran the osu p2p osu_mbw_mr test with these MPIs.
>>>>>>>>You can find the benchmark results in the attached file (mvs10p_mpi.xls, sheet osu_mbr_mr).
>>>>>>>>
>>>>>>>>On 64 nodes (and 1024 MPI processes) ompi_yalla gets 2 times worse performance than ompi_clear.
>>>>>>>>Does mxm with yalla reduce p2p performance compared with ompi_clear (and impi)?
>>>>>>>>Am I doing something wrong?
>>>>>>>>
>>>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>>>
>>>>>>>>Best regards,
>>>>>>>>Timur
>>>>>>>>
>>>>>>>>Thursday, May 28, 2015, 20:02 +03:00 from Mike Dubman <mi...@dev.mellanox.co.il>:
>>>>>>>>>It is not an apples-to-apples comparison.
>>>>>>>>>
>>>>>>>>>yalla/mxm is a point-to-point library, not a collective library; the collective algorithm runs on top of yalla.
>>>>>>>>>
>>>>>>>>>Intel's collective algorithm for a2a is better than OMPI's built-in collective algorithm.
>>>>>>>>>
>>>>>>>>>To see the benefit of yalla, you should run p2p benchmarks (osu_lat/bw/bibw/mr).
>>>>>>>>>
>>>>>>>>>On Thu, May 28, 2015 at 7:35 PM, Timur Ismagilov <tismagi...@mail.ru> wrote:
>>>>>>>>>>I compare ompi-1.8.5 (hpcx-1.3.3-icc) with impi v4.1.4.
>>>>>>>>>>
>>>>>>>>>>I built ompi with MXM but without HCOLL and without knem (I am working on that). The configure options are:
>>>>>>>>>>./configure --prefix=my_prefix --with-mxm=path/to/hpcx/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm --with-platform=contrib/platform/mellanox/optimized
>>>>>>>>>>
>>>>>>>>>>As a result of the IMB-MPI1 Alltoall test, I got disappointing results: for most message sizes on 64 nodes with 16 processes per node, impi is much (~40%) better.
>>>>>>>>>>
>>>>>>>>>>You can look at the results in the attached file "mvs10p_mpi.xlsx"; the system configuration is also there.
>>>>>>>>>>
>>>>>>>>>>What do you think about this? Is there any way to improve the ompi yalla performance results?
>>>>>>>>>>
>>>>>>>>>>I attach the output of "IMB-MPI1 Alltoall" for yalla and impi.
>>>>>>>>>>
>>>>>>>>>>P.S. My colleague Alexander Semenov is in CC.
>>>>>>>>>>
>>>>>>>>>>Best regards,
>>>>>>>>>>Timur
>>>>>>>>>
>>>>>>>>>--
>>>>>>>>>Kind Regards,
>>>>>>>>>M.
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>users mailing list
>>>>>>>>us...@open-mpi.org
>>>>>>>>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>>Link to this post: http://www.open-mpi.org/community/lists/users/2015/06/27029.php
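To make the point about layering concrete: a p2p benchmark such as osu_mbw_mr exercises the pml (yalla/MXM) directly, whereas an Alltoall number is dominated by the collective component (tuned or hcoll) that runs on top of it, so the two comparisons answer different questions. Below is a minimal Alltoall timing loop in the spirit of the IMB-MPI1 test discussed above; the buffer size and repetition count are illustrative, and this is not the IMB source.

/* Minimal Alltoall timing loop, only to illustrate what the IMB-MPI1
 * Alltoall comparison measures; sizes and repetition count are made up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    const int per_rank = 4096;   /* bytes sent to each peer (illustrative) */
    const int reps = 100;        /* timed repetitions (illustrative) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc((size_t)per_rank * size);
    char *recvbuf = malloc((size_t)per_rank * size);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++)
        MPI_Alltoall(sendbuf, per_rank, MPI_CHAR,
                     recvbuf, per_rank, MPI_CHAR, MPI_COMM_WORLD);
    double t = (MPI_Wtime() - t0) / reps;

    /* report the slowest rank's average time, as collective benchmarks do */
    double tmax;
    MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Alltoall, %d bytes per peer: %.2f us avg\n", per_rank, tmax * 1e6);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}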