On 08/17/2011 10:38 AM, Chris Dagdigian wrote:
Hi folks,

I'm sorta stymied by the magic of effortless openmpi tight integration
with SGE and am wondering how best to proceed...

[...]

However with the magic/automatic support that SGE has for OpenMPI there
is no written MPI hosts file that I can find ($TMPDIR/hosts does not
exist in the job context) -- the SGE scheduler just sends the selected
host set directly to the OpenMPI starter process and in my case it seems
clear that SGE is sending the "ethernet" hostnames instead of the IB
hostnames and thus my shiny IB fabric is being ignored in favor of
running MPI over the ethernet links.

What does your mpirun command look like? Was OpenMPI built against the IB stack? Is this part of the OFED stack (so OpenMPI should have been built correctly)?

Specifically, could you do an

         ompi_info | egrep '(rdma|openib)'

and see what it reports?

One of our units:

[root@jr5-lab ~]# ompi_info | egrep '(rdma|openib)'
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.3)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.3)


So my basic question is "how to force tightly integrated openmpi to use
a (slightly) different set of hostnames so that the IB fabric is
actually used ..."

Unless you are using switches like

        -mca btl tcp,self,sm

(an explicit BTL list that leaves out openib), it's probably running on IB. Environment variables or config-file settings can also affect this.
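
A quick way to look for such settings, as a rough sketch: Open MPI reads MCA parameters from environment variables named OMPI_MCA_<param>, so listing those inside the job shows whether something is overriding the BTL selection. The helper name list_ompi_mca_env is made up for illustration.

```shell
# Hypothetical helper: print any Open MPI MCA parameters set via the environment.
# Open MPI picks up variables of the form OMPI_MCA_<param>=<value>.
list_ompi_mca_env() {
    # grep exits non-zero when nothing matches; treat that as "nothing set".
    env | grep '^OMPI_MCA_' || true
}
```

Calling list_ompi_mca_env from the job script, just before mpirun, shows what the job actually inherits. MCA parameters can also come from files such as $HOME/.openmpi/mca-params.conf, which this sketch does not inspect.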

Right now I'm thinking of mirroring part of the loose integration method
and writing a simple pe_starter method that will take $pe_hosts and
translate it into a hostfile that has the 'nodeN' to 'inodeN' regex
applied. Then I can modify my job scripts to force mpirun to accept a
machinesfile or hostfile argument.
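
The translation step described here might look roughly like the following. It is only a sketch: it assumes SGE's $PE_HOSTFILE has the usual "host slots queue processor-range" lines and that the IB names differ from the ethernet names only by a leading "i". The function name make_ib_hostfile is invented for illustration. Whether mpirun under tight integration accepts hostnames that differ from the ones SGE allocated is not verified here.

```shell
# Hypothetical helper: build an Open MPI hostfile from an SGE PE hostfile,
# renaming nodeN to inodeN.
#   $1: SGE PE hostfile (assumed line format: "host slots queue processor-range")
#   $2: output hostfile for mpirun ("host slots=N" per line)
make_ib_hostfile() {
    awk '{
        host = $1
        sub(/^node/, "inode", host)   # nodeN -> inodeN (assumed naming scheme)
        print host " slots=" $2
    }' "$1" > "$2"
}
```

In a job script this could be used as make_ib_hostfile "$PE_HOSTFILE" "$TMPDIR/ib_hosts", followed by $MPIRUN --hostfile "$TMPDIR/ib_hosts" ... with the rest of the usual arguments.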

Is there a better way ?

See above. You can turn on significant verbosity and get the layers to report what they are doing.


Also, is there a better way to "prove" what network/interface endpoints
openmpi is using? So far for debugging I've been using the following
options to sorta prove to myself that the non-IB network is being used:

$MPIRUN --display-devel-allocation --display-allocation --verbose
--show-progress

Use

        $MPIRUN   --mca btl sm,self,openib ...

to force TCP off (an explicit BTL list without tcp; "^" exclusions cannot be mixed into an inclusive list). If the code then fails, there is probably a network issue to be addressed.
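
The same selection can also be made through the environment, combining the two suggestions above (an IB-only transport list plus extra verbosity). A minimal sketch, assuming a Bourne-style job script. The verbosity level 30 is an arbitrary choice and my_mpi_app is a placeholder.

```shell
# Restrict Open MPI to shared memory, self (loopback), and InfiniBand.
# TCP is excluded simply by not being listed.
export OMPI_MCA_btl=sm,self,openib

# Ask the BTL framework to report which components it opens and selects.
# The level (30) is an assumed value; higher numbers give more output.
export OMPI_MCA_btl_base_verbose=30

# Then launch as usual from the job script, for example:
#   $MPIRUN -np "$NSLOTS" ./my_mpi_app     # my_mpi_app is hypothetical
```

With these set, a job that cannot bring up the openib BTL should fail rather than quietly fall back to ethernet, and the verbose output should show which BTL components were selected.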


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: [email protected]
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
