Hello,

I am now at a loss running Open MPI (version 1.4.3)
on an SGI Altix cluster with 2048 or 4096 cores over InfiniBand.

After fixing several fairly obvious failures with the help of Ralph, Jeff and John,
I have now reached the core of the problem:
- there are no more obvious failures or error messages
- compared to running the application with SGI MPT, the CPU performance I get
is very low, and it decreases as the number of cores increases (see below)
- these results are highly reproducible
- I have tried a very large number of -mca parameters, to no avail

If I take the MPT CPU performance as a reference,
it is about 900 (in arbitrary units), regardless of the
number of cores used (up to 8192).

With OMPI, however, I get:
- 700 with 1024 cores (which is already rather low)
- 300 with 2048 cores
- 60 with 4096 cores.
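To make the degradation explicit, the OMPI figures can be expressed as a fraction of the MPT reference of 900. The short shell sketch below only does that arithmetic on the numbers quoted above; the helper name rel_perf is illustrative, not part of any tool.

```shell
#!/bin/sh
# Express each OMPI throughput figure as a percentage of the MPT reference.
# rel_perf is a hypothetical helper; 900 is the MPT figure quoted above.
rel_perf() {
    awk -v p="$1" -v r="${2:-900}" 'BEGIN { printf "%.1f\n", 100 * p / r }'
}

# Each pair is "<cores>:<OMPI performance>", taken from the list above.
for pair in 1024:700 2048:300 4096:60; do
    cores=${pair%%:*}   # text before the colon
    perf=${pair##*:}    # text after the colon
    echo "$cores cores: $perf ($(rel_perf "$perf")% of MPT)"
done
```

This gives 77.8%, 33.3% and 6.7% of the MPT performance respectively, i.e. the relative efficiency drops by more than a factor of ten between 1024 and 4096 cores.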

The computing loop over which the above CPU performance is measured includes a series of MPI exchanges (per core: 8 x (MPI_Isend + MPI_Irecv), followed by MPI_Waitall).

The application is of the 'domain partition' type,
and the performance, as well as the memory footprint,
is essentially identical on all cores. The memory footprint is about twice as large in
the OMPI case (1.5 GB/core) as in the MPT case (0.7 GB/core).

*What could be wrong with all this, please?*

I have attached the output of '*ompi_info* -all'.
The *config.log* is attached as well.
I compiled OMPI with icc. I checked that NUMA placement and affinity are OK.

I use the following command to run my OMPI application:

mpiexec -mca btl_openib_rdma_pipeline_send_length 65536 \
    -mca btl_openib_rdma_pipeline_frag_size 65536 \
    -mca btl_openib_min_rdma_pipeline_size 65536 \
    -mca btl_self_rdma_pipeline_send_length 262144 \
    -mca btl_self_rdma_pipeline_frag_size 262144 \
    -mca plm_rsh_num_concurrent 4096 -mca mpi_paffinity_alone 1 \
    -mca mpi_leave_pinned 1 -mca btl_sm_max_send_size 128 \
    -mca coll_tuned_pre_allocate_memory_comm_size_limit 128 \
    -mca btl_openib_cq_size 128 -mca btl_ofud_rd_num 128 \
    -mca mpool_rdma_rcache_size_limit 131072 -mca mpi_preconnect_mpi 0 \
    -mca mpool_sm_min_size 131072 -mca mpi_abort_print_stack 1 \
    -mca btl sm,openib,self -mca btl_openib_want_fork_support 0 \
    -mca opal_set_max_sys_limits 1 -mca osc_pt2pt_no_locks 1 \
    -mca osc_rdma_no_locks 1 \
    $PBS_JOBDIR/phmc_tm_p2.$PBS_JOBID -v -f $Jinput

*OpenIB info*:

1) OFED-1.4.1, installed by SGI

2) Linux xxxxxx 2.6.16.60-0.42.10-smp #1 SMP Tue Apr 27 05:11:27 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
OS : SGI ProPack 6SP5 for Linux, Build 605r1.sles10-0909302200

3) Most probably running an SGI subnet manager

4) > ibv_devinfo (on a worker node)
hca_id:    mlx4_0
    fw_ver:                2.7.000
    node_guid:            0030:48ff:ffcc:4c44
    sys_image_guid:            0030:48ff:ffcc:4c47
    vendor_id:            0x02c9
    vendor_part_id:            26418
    hw_ver:                0xA0
    board_id:            SM_2071000001000
    phys_port_cnt:            2
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        2048 (4)
            sm_lid:            1
            port_lid:        6009
            port_lmc:        0x00

        port:    2
            state:            PORT_ACTIVE (4)
            max_mtu:        2048 (4)
            active_mtu:        2048 (4)
            sm_lid:            1
            port_lid:        6010
            port_lmc:        0x00

5) > ifconfig -a (on a worker node)
eth0      Link encap:Ethernet  HWaddr 00:30:48:CE:73:30
          inet adr:192.168.159.10  Bcast:192.168.159.255  Masque:255.255.255.0
          adr inet6: fe80::230:48ff:fece:7330/64 Scope:Lien
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:32337499 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34733462 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 lg file transmission:1000
          RX bytes:11486224753 (10954.1 Mb)  TX bytes:16450996864 (15688.8 Mb)
          Mémoire:fbc60000-fbc80000

eth1      Link encap:Ethernet  HWaddr 00:30:48:CE:73:31
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 lg file transmission:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Mémoire:fbce0000-fbd00000

ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-C0-00-00-00-00-00-00-00-00-00-00
          inet adr:10.148.9.198  Bcast:10.148.255.255  Masque:255.255.0.0
          adr inet6: fe80::230:48ff:ffcc:4c45/64 Scope:Lien
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:115055101 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5390843 errors:0 dropped:182 overruns:0 carrier:0
          collisions:0 lg file transmission:256
          RX bytes:49592870352 (47295.4 Mb)  TX bytes:43566897620 (41548.6 Mb)

ib1       Link encap:UNSPEC  HWaddr 80-00-00-49-FE-C0-00-00-00-00-00-00-00-00-00-00
          inet adr:10.149.9.198  Bcast:10.149.255.255  Masque:255.255.0.0
          adr inet6: fe80::230:48ff:ffcc:4c46/64 Scope:Lien
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:673448 errors:0 dropped:0 overruns:0 frame:0
          TX packets:187 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 lg file transmission:256
          RX bytes:37713088 (35.9 Mb)  TX bytes:11228 (10.9 Kb)

lo        Link encap:Boucle locale
          inet adr:127.0.0.1  Masque:255.0.0.0
          adr inet6: ::1/128 Scope:Hôte
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:33504149 errors:0 dropped:0 overruns:0 frame:0
          TX packets:33504149 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 lg file transmission:0
          RX bytes:5100850397 (4864.5 Mb)  TX bytes:5100850397 (4864.5 Mb)

sit0      Link encap:IPv6-dans-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 lg file transmission:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

6) > limit (on a worker node)
cputime      unlimited
filesize     unlimited
datasize     unlimited
stacksize    300000 kbytes
coredumpsize 0 kbytes
memoryuse    unlimited
vmemoryuse   unlimited
descriptors  16384
memorylocked unlimited
maxproc      303104

If any information is still missing despite my efforts, please ask.

Thanks in advance for any hints.

Best,
G.


Attachment: config.log.gz
Description: GNU Zip compressed data

Attachment: ompi_info-all.001.gz
Description: GNU Zip compressed data
