Could you send us the mpirun cmd line? I wonder if you are missing some options that could help. Also, you might:

(a) upgrade to 1.3.3 - it looks like you are using some kind of pre-release version

(b) add -mca mpi_show_mca_params env,file - this will cause rank=0 to output what mca params it sees, and where they came from

(c) check that you built a non-debug version, and remembered to compile your application with a -O3 flag - i.e., "mpicc -O3 ...". Remember, OMPI does not automatically add optimization flags to mpicc!
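For example, the launch and build lines might look something like this (illustrative only - I don't know your actual hostfile or launch options; the binary name is taken from your profile output):

    mpirun -np 144 --hostfile myhosts -mca mpi_show_mca_params env,file ./rco2.24pe

and for (c), make sure every object file is built with optimization, e.g. "mpicc -O3 ..." (or "mpif90 -O3 ..." for the Fortran sources).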
Thanks
Ralph


On Wed, Aug 5, 2009 at 7:15 AM, Torgny Faxen <fa...@nsc.liu.se> wrote:

> Pasha,
> no collectives are being used.
>
> A simple grep in the code reveals the following MPI functions being used:
> MPI_Init
> MPI_wtime
> MPI_COMM_RANK
> MPI_COMM_SIZE
> MPI_BUFFER_ATTACH
> MPI_BSEND
> MPI_PACK
> MPI_UNPACK
> MPI_PROBE
> MPI_GET_COUNT
> MPI_RECV
> MPI_IPROBE
> MPI_FINALIZE
>
> where MPI_IPROBE is the clear winner in terms of number of calls.
>
> /Torgny
>
>
> Pavel Shamis (Pasha) wrote:
>
>> Do you know if the application uses some collective operations?
>>
>> Thanks,
>> Pasha
>>
>> Torgny Faxen wrote:
>>
>>> Hello,
>>> we are seeing a large difference in performance for some applications
>>> depending on what MPI is being used.
>>>
>>> Attached are performance numbers and oprofile output (first 30 lines)
>>> from one out of 14 nodes from one application run using OpenMPI, Intel MPI
>>> and Scali MPI respectively.
>>>
>>> Scali MPI is faster than the other two MPIs by factors of 1.6 and 1.75:
>>>
>>> ScaliMPI: walltime for the whole application is 214 seconds
>>> OpenMPI: walltime for the whole application is 376 seconds
>>> Intel MPI: walltime for the whole application is 346 seconds.
>>>
>>> The application is running with the main send/receive calls being
>>> MPI_Bsend, and MPI_Iprobe followed by MPI_Recv (in case there is a
>>> message). Quite often MPI_Iprobe is called just to check whether a
>>> certain message is pending.
>>>
>>> Any ideas on tuning tips, performance analysis, or code modifications to
>>> improve the OpenMPI performance? A lot of time is being spent in
>>> "mca_btl_sm_component_progress", "btl_openib_component_progress" and other
>>> internal routines.
>>>
>>> The code is running on a cluster with 140 HP ProLiant DL160 G5 compute
>>> servers. Infiniband interconnect. Intel Xeon E5462 processors. The
>>> profiled application is using 144 cores on 18 nodes over Infiniband.
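From the description above, the receive side is essentially a polling loop like the following (a rough C sketch, not the actual code - the application is Fortran and also uses MPI_Pack/MPI_Unpack, so names, sizes and tags here are made up purely for illustration):

    /* Bsend + Iprobe/Recv polling pattern, as described above (sketch only). */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024   /* assumed message length, for illustration */

    /* Poll once for any pending message and receive it if one is there.
     * Returns 1 if a message was received, 0 otherwise. "buf" is assumed
     * large enough for any incoming message in this sketch. */
    static int poll_and_receive(double *buf)
    {
        int flag = 0, count;
        MPI_Status status;

        /* This is the call the application spins on; the time it drives in
         * the MPI progress engine is what dominates the OpenMPI profile below. */
        MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &flag, &status);
        if (flag) {
            MPI_Get_count(&status, MPI_DOUBLE, &count);
            MPI_Recv(buf, count, MPI_DOUBLE, status.MPI_SOURCE, status.MPI_TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        return flag;
    }

    int main(int argc, char **argv)
    {
        double msg[N], work[N];
        int rank, size, i, got = 0;
        int bsend_size = N * sizeof(double) + MPI_BSEND_OVERHEAD;
        void *bsend_buf = malloc(bsend_size);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < N; i++)
            msg[i] = (double)rank;

        /* Buffered sends go through a user-attached buffer. */
        MPI_Buffer_attach(bsend_buf, bsend_size);
        MPI_Bsend(msg, N, MPI_DOUBLE, (rank + 1) % size, 0, MPI_COMM_WORLD);

        /* The real code interleaves computation with this kind of polling;
         * here we simply spin until the one expected message has arrived. */
        while (!got)
            got = poll_and_receive(work);

        MPI_Buffer_detach(&bsend_buf, &bsend_size);
        MPI_Finalize();
        free(bsend_buf);
        return 0;
    }

If most MPI_Iprobe calls return with no message, each one still pays for a pass through the progress engine, which is consistent with the time shown under mca_btl_sm_component_progress and btl_openib_component_progress in the profile below.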
>>>
>>> Regards / Torgny
>>>
>>> ===========================================================================
>>> OpenMPI 1.3b2
>>> ===========================================================================
>>>
>>> Walltime: 376 seconds
>>>
>>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>>> Profiling through timer interrupt
>>> samples  %        image name            app name     symbol name
>>> 668288   22.2113  mca_btl_sm.so         rco2.24pe    mca_btl_sm_component_progress
>>> 441828   14.6846  rco2.24pe             rco2.24pe    step_
>>> 335929   11.1650  libmlx4-rdmav2.so     rco2.24pe    (no symbols)
>>> 301446   10.0189  mca_btl_openib.so     rco2.24pe    btl_openib_component_progress
>>> 161033    5.3521  libopen-pal.so.0.0.0  rco2.24pe    opal_progress
>>> 157024    5.2189  libpthread-2.5.so     rco2.24pe    pthread_spin_lock
>>> 99526     3.3079  no-vmlinux            no-vmlinux   (no symbols)
>>> 93887     3.1204  mca_btl_sm.so         rco2.24pe    opal_using_threads
>>> 69979     2.3258  mca_pml_ob1.so        rco2.24pe    mca_pml_ob1_iprobe
>>> 58895     1.9574  mca_bml_r2.so         rco2.24pe    mca_bml_r2_progress
>>> 55095     1.8311  mca_pml_ob1.so        rco2.24pe    mca_pml_ob1_recv_request_match_wild
>>> 49286     1.6381  rco2.24pe             rco2.24pe    tracer_
>>> 41946     1.3941  libintlc.so.5         rco2.24pe    __intel_new_memcpy
>>> 40730     1.3537  rco2.24pe             rco2.24pe    scobi_
>>> 36586     1.2160  rco2.24pe             rco2.24pe    state_
>>> 20986     0.6975  rco2.24pe             rco2.24pe    diag_
>>> 19321     0.6422  libmpi.so.0.0.0       rco2.24pe    PMPI_Unpack
>>> 18552     0.6166  libmpi.so.0.0.0       rco2.24pe    PMPI_Iprobe
>>> 17323     0.5757  rco2.24pe             rco2.24pe    clinic_
>>> 16194     0.5382  rco2.24pe             rco2.24pe    k_epsi_
>>> 15330     0.5095  libmpi.so.0.0.0       rco2.24pe    PMPI_Comm_f2c
>>> 13778     0.4579  libmpi_f77.so.0.0.0   rco2.24pe    mpi_iprobe_f
>>> 13241     0.4401  rco2.24pe             rco2.24pe    s_recv_
>>> 12386     0.4117  rco2.24pe             rco2.24pe    growth_
>>> 11699     0.3888  rco2.24pe             rco2.24pe    testnrecv_
>>> 11268     0.3745  libmpi.so.0.0.0       rco2.24pe    mca_pml_base_recv_request_construct
>>> 10971     0.3646  libmpi.so.0.0.0       rco2.24pe    ompi_convertor_unpack
>>> 10034     0.3335  mca_pml_ob1.so        rco2.24pe    mca_pml_ob1_recv_request_match_specific
>>> 10003     0.3325  libimf.so             rco2.24pe    exp.L
>>> 9375      0.3116  rco2.24pe             rco2.24pe    subbasin_
>>> 8912      0.2962  libmpi_f77.so.0.0.0   rco2.24pe    mpi_unpack_f
>>>
>>> ===========================================================================
>>> Intel MPI, version 3.2.0.011
>>> ===========================================================================
>>>
>>> Walltime: 346 seconds
>>>
>>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>>> Profiling through timer interrupt
>>> samples  %        image name            app name     symbol name
>>> 486712   17.7537  rco2                  rco2         step_
>>> 431941   15.7558  no-vmlinux            no-vmlinux   (no symbols)
>>> 212425    7.7486  libmpi.so.3.2         rco2         MPIDI_CH3U_Recvq_FU
>>> 188975    6.8932  libmpi.so.3.2         rco2         MPIDI_CH3I_RDSSM_Progress
>>> 172855    6.3052  libmpi.so.3.2         rco2         MPIDI_CH3I_read_progress
>>> 121472    4.4309  libmpi.so.3.2         rco2         MPIDI_CH3I_SHM_read_progress
>>> 64492     2.3525  libc-2.5.so           rco2         sched_yield
>>> 52372     1.9104  rco2                  rco2         tracer_
>>> 48621     1.7735  libmpi.so.3.2         rco2         .plt
>>> 45475     1.6588  libmpiif.so.3.2       rco2         pmpi_iprobe__
>>> 44082     1.6080  libmpi.so.3.2         rco2         MPID_Iprobe
>>> 42788     1.5608  libmpi.so.3.2         rco2         MPIDI_CH3_Stop_recv
>>> 42754     1.5595  libpthread-2.5.so     rco2         pthread_mutex_lock
>>> 42190     1.5390  libmpi.so.3.2         rco2         PMPI_Iprobe
>>> 41577     1.5166  rco2                  rco2         scobi_
>>> 40356     1.4721  libmpi.so.3.2         rco2         MPIDI_CH3_Start_recv
>>> 38582     1.4073  libdaplcma.so.1.0.2   rco2         (no symbols)
>>> 37545     1.3695  rco2                  rco2         state_
>>> 35597     1.2985  libc-2.5.so           rco2         free
>>> 34019     1.2409  libc-2.5.so           rco2         malloc
>>> 31841     1.1615  rco2                  rco2         s_recv_
>>> 30955     1.1291  libmpi.so.3.2         rco2         __I_MPI___intel_new_memcpy
>>> 27876     1.0168  libc-2.5.so           rco2         _int_malloc
>>> 26963     0.9835  rco2                  rco2         testnrecv_
>>> 23005     0.8391  libpthread-2.5.so     rco2         __pthread_mutex_unlock_usercnt
>>> 22290     0.8131  libmpi.so.3.2         rco2         MPID_Segment_manipulate
>>> 22086     0.8056  libmpi.so.3.2         rco2         MPIDI_CH3I_read_progress_expected
>>> 19146     0.6984  rco2                  rco2         diag_
>>> 18250     0.6657  rco2                  rco2         clinic_
>>>
>>> ===========================================================================
>>> Scali MPI, version 3.13.10-59413
>>> ===========================================================================
>>>
>>> Walltime:
>>>
>>> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
>>> Profiling through timer interrupt
>>> samples  %        image name            app name     symbol name
>>> 484267   30.0664  rco2.24pe             rco2.24pe    step_
>>> 111949    6.9505  libmlx4-rdmav2.so     rco2.24pe    (no symbols)
>>> 73930     4.5900  libmpi.so             rco2.24pe    scafun_rq_handle_body
>>> 57846     3.5914  libmpi.so             rco2.24pe    invert_decode_header
>>> 55836     3.4667  libpthread-2.5.so     rco2.24pe    pthread_spin_lock
>>> 53703     3.3342  rco2.24pe             rco2.24pe    tracer_
>>> 40934     2.5414  rco2.24pe             rco2.24pe    scobi_
>>> 40244     2.4986  libmpi.so             rco2.24pe    scafun_request_probe_handler
>>> 37399     2.3220  rco2.24pe             rco2.24pe    state_
>>> 30455     1.8908  libmpi.so             rco2.24pe    invert_matchandprobe
>>> 29707     1.8444  no-vmlinux            no-vmlinux   (no symbols)
>>> 29147     1.8096  libmpi.so             rco2.24pe    FMPI_scafun_Iprobe
>>> 27969     1.7365  libmpi.so             rco2.24pe    decode_8_u_64
>>> 27475     1.7058  libmpi.so             rco2.24pe    scafun_rq_anysrc_fair_one
>>> 25966     1.6121  libmpi.so             rco2.24pe    scafun_uxq_probe
>>> 24380     1.5137  libc-2.5.so           rco2.24pe    memcpy
>>> 22615     1.4041  libmpi.so             rco2.24pe    .plt
>>> 21172     1.3145  rco2.24pe             rco2.24pe    diag_
>>> 20716     1.2862  libc-2.5.so           rco2.24pe    memset
>>> 18565     1.1526  libmpi.so             rco2.24pe    openib_wrapper_poll_cq
>>> 18192     1.1295  rco2.24pe             rco2.24pe    clinic_
>>> 17135     1.0638  libmpi.so             rco2.24pe    PMPI_Iprobe
>>> 16685     1.0359  rco2.24pe             rco2.24pe    k_epsi_
>>> 16236     1.0080  libmpi.so             rco2.24pe    PMPI_Unpack
>>> 15563     0.9662  libmpi.so             rco2.24pe    scafun_r_rq_append
>>> 14829     0.9207  libmpi.so             rco2.24pe    scafun_rq_test_finished
>>> 13349     0.8288  rco2.24pe             rco2.24pe    s_recv_
>>> 12490     0.7755  libmpi.so             rco2.24pe    flop_matchandprobe
>>> 12427     0.7715  libibverbs.so.1.0.0   rco2.24pe    (no symbols)
>>> 12272     0.7619  libmpi.so             rco2.24pe    scafun_rq_handle
>>> 12146     0.7541  rco2.24pe             rco2.24pe    growth_
>>> 10175     0.6317  libmpi.so             rco2.24pe    wrp2p_test_finished
>>> 9888      0.6139  libimf.so             rco2.24pe    exp.L
>>> 9179      0.5699  rco2.24pe             rco2.24pe    subbasin_
>>> 9082      0.5639  rco2.24pe             rco2.24pe    testnrecv_
>>> 8901      0.5526  libmpi.so             rco2.24pe    openib_wrapper_purge_requests
>>> 7425      0.4610  rco2.24pe             rco2.24pe    scobimain_
>>> 7378      0.4581  rco2.24pe             rco2.24pe    scobi_interface_
>>> 6530      0.4054  rco2.24pe             rco2.24pe    setvbc_
>>> 6471      0.4018  libfmpi.so            rco2.24pe    pmpi_iprobe
>>> 6341      0.3937  rco2.24pe             rco2.24pe    snap_
>
> --
> ---------------------------------------------------------
> Torgny Faxén
> National Supercomputer Center
> Linköping University
> S-581 83 Linköping
> Sweden
>
> Email: fa...@nsc.liu.se
> Telephone: +46 13 285798 (office), +46 13 282535 (fax)
> http://www.nsc.liu.se
> ---------------------------------------------------------