I recently upgraded from Ubuntu 12.04 to 14.04, and Open MPI now prints the 
following warning when a job starts. It did not appear before the upgrade.

WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

Everything I could find on Google suggests changing log_num_mtt, but I 
cannot do this for the following reasons:
1. There is no log_num_mtt in /sys/module/mlx4_core/parameters/
2. Adding "options mlx4_core log_num_mtt=24" to /etc/modprobe.d/mlx4.conf 
doesn't seem to change anything
3. I am not sure how to restart the driver because there is no 
"/etc/init.d/openibd" file (I have rebooted the system, but log_num_mtt 
still does not appear)
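
For reference, the check behind item 1 can be sketched as a small shell script. It assumes the sysfs path given above and uses the formula from the Open MPI FAQ, max_reg_mem = (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE. The function names max_reg_mem and show_mtt_params are made up for illustration, and the log_mtts_per_seg value of 3 in the usage note is an assumed example, not something read from this system.

```shell
#!/bin/sh
# Sketch only: report the MTT-related mlx4_core module parameters, if the
# loaded driver exposes them, and compute the registerable-memory limit.

# max_reg_mem LOG_NUM_MTT LOG_MTTS_PER_SEG PAGE_SIZE
# Prints (2^log_num_mtt) * (2^log_mtts_per_seg) * page_size, in bytes.
max_reg_mem() {
    echo $(( (1 << $1) * (1 << $2) * $3 ))
}

# show_mtt_params DIR
# Prints "name=value" for each parameter file readable in DIR,
# or "name=missing" when the driver does not expose it.
show_mtt_params() {
    for name in log_num_mtt log_mtts_per_seg; do
        if [ -r "$1/$name" ]; then
            echo "$name=$(cat "$1/$name")"
        else
            echo "$name=missing"
        fi
    done
}

# PARAM_DIR may be overridden; the default is the path from item 1.
show_mtt_params "${PARAM_DIR:-/sys/module/mlx4_core/parameters}"
```

With the value from item 2, max_reg_mem 24 3 4096 would print 549755813888 (512 GiB), which is only meaningful if the driver actually accepts log_num_mtt.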

[Template information]
1. OpenFabrics is from the Ubuntu distribution using "apt-get install 
infiniband-diags ibutils ibverbs-utils libmlx4-dev"
2. OS is Ubuntu 14.04 LTS
3. Subnet manager is from the Ubuntu distribution using "apt-get install opensm"
4. Output of ibv_devinfo is:
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.10.600
        node_guid:                      0002:c903:003d:52b0
        sys_image_guid:                 0002:c903:003d:52b3
        vendor_id:                      0x02c9
        vendor_part_id:                 4099
        hw_ver:                         0x0
        board_id:                       MT_1100120019
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand
5. Output of ifconfig for IB is:
ib0       Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00  
          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:3d:52b1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
          TX packets:34 errors:0 dropped:16 overruns:0 carrier:0
          collisions:0 txqueuelen:256 
          RX bytes:5843 (5.8 KB)  TX bytes:4324 (4.3 KB)
6. ulimit -l is "unlimited"

Thanks,
Rio
