What scheduler (if any) are you using to run your MPI command?  What is
the command you are trying to run?  Has this worked in the past?  Are
there any experienced users you can ask how they were running things
previously?
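
While you gather that, here is a minimal Python sketch for checking the
ibstatus output you posted on all nodes at once.  The function names
(parse_ibstatus, unhealthy) and the SAMPLE text are my own illustration,
not part of OSCAR or OFED, and the parser assumes the output has the same
layout as in your message.  It only confirms link state; it says nothing
about which transport the MPI library ends up selecting.

```python
import re

# Two node blocks copied from the quoted message (n03 has no device line there).
SAMPLE = """\
--------- n01---------
Infiniband device 'mthca0' port 1 status:
       default gid:     fe80:0000:0000:0000:0002:c902:0025:930d
       base lid:        0x1
       sm lid:          0x1
       state:           4: ACTIVE
       phys state:      5: LinkUp
       rate:            10 Gb/sec (4X)

--------- n03---------
       default gid:     fe80:0000:0000:0000:0002:c902:0025:9321
       base lid:        0x5
       sm lid:          0x1
       state:           4: ACTIVE
       phys state:      5: LinkUp
       rate:            10 Gb/sec (4X)
"""


def parse_ibstatus(text):
    """Return {node: {field: value}} for output shaped like `cexec ... ibstatus`."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        # Node headers look like "--------- n01---------".
        header = re.match(r"^-+\s*(\w+)\s*-+$", line)
        if header:
            current = nodes.setdefault(header.group(1), {})
            continue
        if current is not None and ":" in line:
            # Split on the first colon only, so GIDs keep their colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return nodes


def unhealthy(nodes):
    """Names of nodes whose port is not both ACTIVE and LinkUp."""
    return sorted(
        name
        for name, fields in nodes.items()
        if not fields.get("state", "").endswith("ACTIVE")
        or not fields.get("phys state", "").endswith("LinkUp")
    )


if __name__ == "__main__":
    status = parse_ibstatus(SAMPLE)
    print("nodes seen:", sorted(status))
    print("nodes needing attention:", unhealthy(status))
```

You could feed it the saved output of your cexec command instead of
SAMPLE; an empty "needing attention" list means every port reported
ACTIVE and LinkUp.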

On Thu, Nov 20, 2008 at 5:11 AM, Michael Oevermann
<[EMAIL PROTECTED]> wrote:
> Hi all,
> I have "inherited" a small cluster with a head node and four compute
> nodes, which I have to administer.  The nodes are connected via
> InfiniBand (OFED).  When I run
>
> cexec :1-4 ibstatus
>
> I get some information indicating that InfiniBand is sort of available:
>
> ************************* oscar_cluster *************************
> --------- n01---------
> Infiniband device 'mthca0' port 1 status:
>        default gid:     fe80:0000:0000:0000:0002:c902:0025:930d
>        base lid:        0x1
>        sm lid:          0x1
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            10 Gb/sec (4X)
>
> --------- n02---------
> Infiniband device 'mthca0' port 1 status:
>        default gid:     fe80:0000:0000:0000:0002:c902:0025:931d
>        base lid:        0x3
>        sm lid:          0x1
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            10 Gb/sec (4X)
>
> --------- n03---------
>        default gid:     fe80:0000:0000:0000:0002:c902:0025:9321
>        base lid:        0x5
>        sm lid:          0x1
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            10 Gb/sec (4X)
>
> --------- n04---------
> Infiniband device 'mthca0' port 1 status:
>        default gid:     fe80:0000:0000:0000:0002:c902:0025:9201
>        base lid:        0x2
>        sm lid:          0x1
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            10 Gb/sec (4X)
>
>
>
>
> However, when I start running an MPI job I get the following message,
> indicating that InfiniBand is not working (I am definitely using the
> MPI libraries compiled with InfiniBand support):
>
> [0,1,0]: uDAPL on host n01 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,2]: uDAPL on host n01 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,3]: uDAPL on host n02 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> [0,1,1]: uDAPL on host n02 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
>
> I am a complete novice in the InfiniBand area, so can anybody give me
> some advice on what is going wrong here and how to get the jobs running
> over InfiniBand?
>
>
> Thanks for any help
>
> Michael
>
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
> Build the coolest Linux based applications with Moblin SDK & win great prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Oscar-users mailing list
> Oscar-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>
