BTW - after you get more comfortable with your new-to-you cluster, I recommend you upgrade your Open MPI installation. v1.2.8 has a lot of bugfixes relative to v1.2.2. Also, Open MPI 1.3 should be available "next month"... so watch for an announcement on that front.
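A quick sanity check, in case it helps: to see which Open MPI version a given install tree actually provides, you can ask ompi_info from that tree's bin directory (just a sketch, reusing the install prefix from the command line quoted below):

  /usr/mpi/gcc4/openmpi-1.2.2-1/bin/ompi_info | grep "Open MPI:"

That should print a line like "Open MPI: 1.2.2". Once a newer build is installed under its own prefix, running "which mpirun" on the head node and on each compute node is a cheap way to confirm that the mpirun you launch with really comes from the new install and not the old one.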
On Thu, Nov 20, 2008 at 3:16 PM, Michael Oevermann
<michael.oeverm...@tu-berlin.de> wrote:
> Hi Ralph,
>
> that was indeed a typo, the command is of course
>
> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
> /home/sysgen/infiniband-mpi-test/machine
> /usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>
> with a blank after /machine. Anyway, your suggested option -mca btl openib,sm,self
> did help!!! Right now I am not able to check the performance results, as the
> cluster is busy with jobs, so I cannot compare with the old benchmark results.
>
> Thanks for the help!
>
> Michael
>
>
> Ralph Castain schrieb:
>>
>> Your command line may have just come across with a typo, but something
>> isn't right:
>>
>> -hostfile
>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>
>> That looks more like a path to a binary than a path to a hostfile. Is
>> there a missing space or filename somewhere?
>>
>> If not, then I would have expected this to error out, since the argument
>> would be taken as the hostfile, leaving no executable specified.
>>
>> If you get that straightened out, then try adding -mca btl openib,sm,self
>> to the command line. This will direct mpirun to use only the OpenIB, shared
>> memory, and loopback transports, so you shouldn't pick up uDAPL any more.
>>
>> Ralph
>>
>>
>> On Nov 20, 2008, at 12:38 PM, Michael Oevermann wrote:
>>
>>> Hi all,
>>>
>>> I have "inherited" a small cluster with a head node and four compute
>>> nodes which I have to administer. The nodes are connected via InfiniBand
>>> (OFED), but the head node is not. I am a complete novice to the InfiniBand
>>> stuff, and here is my problem:
>>>
>>> The InfiniBand configuration seems to be OK. The usual tests suggested
>>> in the OFED install guide give the expected output, e.g.
>>>
>>> ibv_devinfo on the nodes:
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01 ---------
>>> hca_id: mthca0
>>>     fw_ver:          1.2.0
>>>     node_guid:       0002:c902:0025:930c
>>>     sys_image_guid:  0002:c902:0025:930f
>>>     vendor_id:       0x02c9
>>>     vendor_part_id:  25204
>>>     hw_ver:          0xA0
>>>     board_id:        MT_03B0140001
>>>     phys_port_cnt:   1
>>>     port: 1
>>>         state:       PORT_ACTIVE (4)
>>>         max_mtu:     2048 (4)
>>>         active_mtu:  2048 (4)
>>>         sm_lid:      2
>>>         port_lid:    1
>>>         port_lmc:    0x00
>>>
>>> etc. for the other nodes.
>>>
>>> sminfo on the nodes:
>>>
>>> ************************* oscar_cluster *************************
>>> --------- n01 ---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6881 priority 0 state 3 SMINFO_MASTER
>>> --------- n02 ---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6882 priority 0 state 3 SMINFO_MASTER
>>> --------- n03 ---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6883 priority 0 state 3 SMINFO_MASTER
>>> --------- n04 ---------
>>> sminfo: sm lid 2 sm guid 0x2c90200259201, activity count 6884 priority 0 state 3 SMINFO_MASTER
>>>
>>> However, when I start an MPI job directly (without using a scheduler) via:
>>>
>>> /usr/mpi/gcc4/openmpi-1.2.2-1/bin/mpirun -np 4 -hostfile
>>> /home/sysgen/infiniband-mpi-test/machine/usr/mpi/gcc4/openmpi-1.2.2-1/tests/IMB-2.3/IMB-MPI1
>>>
>>> I get the error message:
>>>
>>> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
>>> Another transport will be used instead, although this may result in
>>> lower performance.
>>> --------------------------------------------------------------------------
>>>
>>> MPI over the normal Gigabit Ethernet and IP networking works just fine, but
>>> the InfiniBand does not. The MPI libs I am using for the test are definitely
>>> compiled with IB support, and the tests have been run successfully on
>>> the cluster before.
>>>
>>> Any suggestions what is going wrong here?
>>>
>>> Best regards and thanks for any help!
>>>
>>> Michael

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/
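
One small follow-up on the fix that worked in the quoted thread: instead of typing -mca btl openib,sm,self on every mpirun command line, the same BTL selection can normally be made persistent through Open MPI's regular MCA parameter mechanisms. A minimal sketch (assuming your 1.2.x installs read the per-user parameter file, which is the default behavior):

  # set it for the current shell via an environment variable ...
  export OMPI_MCA_btl=openib,sm,self

  # ... or record it once per user in the MCA parameter file
  mkdir -p $HOME/.openmpi
  echo "btl = openib,sm,self" >> $HOME/.openmpi/mca-params.conf

An explicit -mca argument on the command line still takes precedence over both, so a one-off run can always override the file.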