Gustavo,

I will definitely try to compile Open MPI myself and see if the problem
persists.
Regarding your note on homogeneous nodes: I tried to do that as much as
possible, but I had no control over two of the nodes, and each of them had a
different setup.
As Jeff suggested, setting the environment in .bashrc seems to solve the
issue.
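
For reference, the kind of setup that goes into ~/.bashrc looks roughly like
this (the install prefix below is only an example; it should be whatever path
Open MPI is installed under on each node):

# make the Open MPI binaries and libraries visible in every session
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH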

Thanks

On Wed, Feb 15, 2012 at 6:52 PM, Gustavo Correa <g...@ldeo.columbia.edu> wrote:

> Hi Tohiko
>
> If you compiled Open MPI on a computer with IB hardware
> and then copied the installation tree to another machine,
> or if you installed from an RPM or other package generated on a
> machine with IB, your Open MPI will have IB support enabled, I think, even
> if the machine where it is running does not have IB.
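>
> [In case it helps]: you can check whether a given installation has the
> openib component built in with something like:
>
> ompi_info | grep btl
>
> If openib shows up among the "MCA btl" components, IB support was
> compiled into that build.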
>
> This is a matter of taste, but here is what I think,
> regarding a previous question you sent.
> I would rather compile Open MPI from source, on the machine[s] where it will
> run, and install it with the same path on all machines {or in a single
> NFS-shared directory}, to make things simpler.
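>
> For example [the prefix below is just an illustration; any path that is
> identical on all nodes will do], a from-source build would go roughly
> like this:
>
> ./configure --prefix=/opt/openmpi
> make all install
>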
> I would use the most homogeneous set of machines possible, to avoid too
> many headaches.
> I.e., use the least common denominator, so to speak.
> Say, everything x86_64, all with Ethernet only [or all with IB + Ethernet,
> but you don't seem to have IB, at least not on all machines].
>
> I hope this helps,
> Gus Correa
>
> On Feb 15, 2012, at 1:27 AM, Tohiko Looka wrote:
>
> > Mm... this is really strange.
> > I don't have that service, there is no ib* output in 'ifconfig -a', and no
> > 'InfiniBand' in 'lspci', which makes me believe that I don't have such a
> > network. I also checked on an identical computer on the same network, with
> > the same results.
> >
> > What's strange is that these messages never used to show up, and they
> > don't show up on that identical computer; only on mine, even though both
> > computers have the same hardware and Open MPI version and are on the same
> > network.
> >
> > I guess I can safely ignore these warnings and run on Ethernet, but it
> would be nice to know what happened there, in case anybody has an idea.
> >
> > Thank you,
> >
> > On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa <g...@ldeo.columbia.edu>
> wrote:
> > Hi Tohiko
> >
> > OpenFabrics network, a.k.a. InfiniBand, a.k.a. IB.
> > To check if the compute nodes have IB interfaces, try:
> >
> > lspci [and search the output for InfiniBand]
> >
> > To see if the IB interface is configured, try:
> >
> > ifconfig -a  [and search the output for ib0, ib1, or similar]
> >
> > To check if the OFED module is up, try:
> >
> > 'service openibd status'
> >
> >
> > As an alternative, you could also try to run your program over Ethernet,
> > avoiding InfiniBand, in case you don't have IB or if it is somehow broken.
> > It is slower than InfiniBand, though.
> >
> > Try something like this:
> >
> > mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program
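> >
> > Alternatively, if you just want to silence the openib warning, you can
> > exclude that one component and let Open MPI pick among the rest
> > [same idea, written as an exclusion instead of an explicit list]:
> >
> > mpiexec -mca btl ^openib -np 4 ./my_mpi_program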
> >
> > I hope this helps,
> > Gus Correa
> >
> > On Feb 14, 2012, at 4:02 PM, Tohiko Looka wrote:
> >
> > > Sorry for the noob question, but how do I check my network type, and
> > > whether the OFED service is running correctly or not? And how do I run it?
> > >
> > > Thank you,
> > >
> > > On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres <jsquy...@cisco.com>
> wrote:
> > > Do you have an OpenFabrics-based network?  (e.g., InfiniBand or iWARP)
> > >
> > > If so, this error message usually means that OFED is either installed
> > > incorrectly, or is not running properly (e.g., its services didn't get
> > > started properly upon boot).
> > >
> > > If you don't have an OpenFabrics-based network, then it usually means
> > > that you have OpenFabrics services running when you really shouldn't
> > > (because you don't have any OpenFabrics-based devices).
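> > >
> > > If that is the case, the usual fix is to stop the service and keep it
> > > from starting at boot. On RHEL-style systems that would be something
> > > like this [the exact service name can vary by distro]:
> > >
> > > service openibd stop
> > > chkconfig openibd off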
> > >
> > >
> > > On Feb 14, 2012, at 4:48 AM, Tohiko Looka wrote:
> > >
> > > > Greetings,
> > > >
> > > > Until today I was running my Open MPI applications with no
> > > > errors/warnings.
> > > > Today I restarted my computer (possibly after an automatic Open MPI
> > > > update) and got these warnings when running my program:
> > > > [tohiko@kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts -np 10 hello
> > > > librdmacm: couldn't read ABI version.
> > > > librdmacm: assuming: 4
> > > > CMA: unable to get RDMA device list
> > > > --------------------------------------------------------------------------
> > > > [[21652,1],0]: A high-performance Open MPI point-to-point messaging module
> > > > was unable to find any relevant network interfaces:
> > > >
> > > > Module: OpenFabrics (openib)
> > > >   Host: kw12614
> > > >
> > > > Another transport will be used instead, although this may result in
> > > > lower performance.
> > > > --------------------------------------------------------------------------
> > > > [kw12614:03195] 10 more processes have sent help message
> > > > help-mpi-btl-base.txt / btl:no-nics
> > > > [kw12614:03195] Set MCA parameter "orte_base_help_aggregate" to 0 to
> > > > see all help / error messages
> > > >
> > > >
> > > > Is this normal? And how come it happened now?
> > > > -- Tohiko
> > >
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> >
> >
> >
>
>
>
