Hi all,

I have "inherited" a small cluster with a head node and four compute nodes, which I have to administer. The nodes are connected via InfiniBand (OFED). When I run

    cexec :1-4 ibstatus

I get some information indicating that InfiniBand is sort of available:

************************* oscar_cluster *************************
--------- n01---------
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0025:930d
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
--------- n02---------
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0025:931d
        base lid:        0x3
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
--------- n03---------
        default gid:     fe80:0000:0000:0000:0002:c902:0025:9321
        base lid:        0x5
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
--------- n04---------
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0025:9201
        base lid:        0x2
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)

However, when I start running an MPI job, I get the following messages indicating that InfiniBand is not working (I am definitely using the MPI libraries compiled with InfiniBand support):

[0,1,0]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,2]: uDAPL on host n01 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,3]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: uDAPL on host n02 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

I am a complete novice in the InfiniBand area, so can anybody give me some advice on what is going wrong here and how to get the jobs running over InfiniBand?

Thanks for any help,
Michael

_______________________________________________
Oscar-users mailing list
Oscar-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oscar-users
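
For reference, the by-eye check of the ibstatus output in this post could be scripted roughly as follows. This is only a sketch, assuming the text layout shown in the post; the names parse_ibstatus and port_is_up are hypothetical helpers, not part of OFED or OSCAR. It only confirms that a port reports ACTIVE and LinkUp. It says nothing about whether uDAPL or the MPI library can actually use the device.

```python
import re

# Matches the block header printed by ibstatus for each device/port.
HEADER = re.compile(r"Infiniband device '([^']+)' port (\d+) status:")


def parse_ibstatus(text):
    """Return one dict per port block found in ibstatus-style output."""
    ports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Start a new port record at each header line.
            current = {"device": match.group(1), "port": int(match.group(2))}
            ports.append(current)
            continue
        if current is not None and ":" in line:
            # Split on the first colon only, so values such as
            # "4: ACTIVE" or a GID with colons stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return ports


def port_is_up(port):
    """True if the port reports state ACTIVE and phys state LinkUp."""
    return (port.get("state", "").endswith("ACTIVE")
            and port.get("phys state", "").endswith("LinkUp"))


# Sample block copied from the n01 output in the post.
sample = """\
Infiniband device 'mthca0' port 1 status:
        default gid:     fe80:0000:0000:0000:0002:c902:0025:930d
        base lid:        0x1
        sm lid:          0x1
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            10 Gb/sec (4X)
"""

if __name__ == "__main__":
    for p in parse_ibstatus(sample):
        print(p["device"], p["port"], "up" if port_is_up(p) else "DOWN")
```

The per-node output of cexec could be fed to parse_ibstatus one node at a time; blocks without the "Infiniband device" header line (like n03 above) are skipped by this sketch.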