Hi,
When you tested the cluster via the OSCAR Wizard, did all of the tests succeed?
Did you test the other MPIs too?
On Thu, Nov 20, 2008 at 1:41 PM, Michael Oevermann <[EMAIL PROTECTED]> wrote:
> Hi all,
> I have "inherited" a small cluster with a head node and four compute
> nodes which I have to administer. The nodes are connected via infiniband
> (OFED). When I do a
>
> cexec :1-4 ibstatus
>
> I get some information indicating that the infiniband is sort of available:
>
> ************************* oscar_cluster *************************
> --------- n01---------
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0002:c902:0025:930d
> base lid: 0x1
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 10 Gb/sec (4X)
>
> --------- n02---------
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0002:c902:0025:931d
> base lid: 0x3
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 10 Gb/sec (4X)
>
> --------- n03---------
> default gid: fe80:0000:0000:0000:0002:c902:0025:9321
> base lid: 0x5
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 10 Gb/sec (4X)
>
> --------- n04---------
> Infiniband device 'mthca0' port 1 status:
> default gid: fe80:0000:0000:0000:0002:c902:0025:9201
> base lid: 0x2
> sm lid: 0x1
> state: 4: ACTIVE
> phys state: 5: LinkUp
> rate: 10 Gb/sec (4X)
>
> However, when I start running an MPI job I get the following message
> indicating that the infiniband is not working (I am definitely using the
> MPI libs compiled with infiniband support):
>
> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
>
> I am a complete novice in the infiniband area, so can anybody give me
> some advice on what is going wrong here and how to get the jobs running
> with infiniband?
>
>
> Thanks for any help
>
> Michael
>
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Oscar-users mailing list
> Oscar-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>
--
A.Nazemian