Hello,

 I suggest you pass the '--mca btl sm,openib,self' option to mpirun when
launching your job, or '--mca btl openib,self' if shared memory is not
involved (i.e. only one process runs per node). Since 'tcp' is not in either
list, MPI traffic cannot go over Ethernet.
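The rule above can be written as a small bash sketch. choose_btl is a hypothetical helper, not an Open MPI command. It returns the list to pass to '--mca btl' given how many ranks run on each node. It only encodes the rule and does not inspect the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI): return the '--mca btl' list
# following the advice above.
#   sm     - shared memory, only useful with more than one rank per node
#   openib - InfiniBand through the OpenFabrics verbs interface
#   self   - loopback, used when a rank sends to itself
# 'tcp' is deliberately left out, so MPI traffic cannot fall back to Ethernet.
choose_btl() {
    local ranks_per_node=${1:?usage: choose_btl RANKS_PER_NODE}
    if [ "$ranks_per_node" -gt 1 ]; then
        echo "sm,openib,self"
    else
        echo "openib,self"
    fi
}

# Example launch (commented out so the sketch runs without Open MPI):
# mpirun --mca btl "$(choose_btl 2)" -np 2 ./hellompi
```

Open MPI also reads MCA parameters from the environment. Running 'export OMPI_MCA_btl=openib,self' before mpirun should therefore have the same effect as the command-line option.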

 I don't think there is a cheap way to check this other than switching on
some of the BTL-related MCA debug/verbose parameters. Please refer to the
list printed by 'ompi_info --param btl all' (or to the FAQ).
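One cheap pre-check is to confirm that the openib BTL component was built into this Open MPI install at all. The bash sketch below does this by scanning the component list that ompi_info prints. The function name has_btl is hypothetical. The match assumes ompi_info reports each component on a line containing 'MCA btl: <name>', so compare against your own output. This only shows that IB support is available, not that a given run used it. A stronger check is to leave tcp out of the '--mca btl' list. The job should then fail to start rather than silently use Ethernet.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read 'ompi_info' output on stdin and succeed if a
# BTL component with the given name is listed. Assumes lines such as:
#   MCA btl: openib (MCA v1.0, API v1.0.1, Component v1.2.8)
has_btl() {
    local name=${1:?usage: has_btl COMPONENT}
    # Require a space or end of line after the name, so 'openib' does not
    # match a longer component name that merely starts with it.
    grep -Eq "MCA btl: ${name}( |\$)"
}

# Example (commented out so the sketch runs without Open MPI):
# if ompi_info | has_btl openib; then
#     echo "openib BTL is available"
# fi
# ompi_info --param btl openib   # lists the openib BTL's MCA parameters
```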

 Regards,   Gilbert.

On Wed, 10 Dec 2008, Sangamesh B wrote:

> Now it's working fine.
> 
> Thanks for the suggestion.
> 
> Some clarifications required:
> 
> I think it's possible to specify that communication should happen only
> through IB, not Ethernet, but I can't figure out how to do it.
> 
> How can I check whether IB is actually being used?
> 
> Regards,
> Sangamesh
> On Sun, Dec 7, 2008 at 9:08 PM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >
> > Hi Sangamesh,
> >
> >   I think the problem is that you're loading a different version of OpenMPI
> > at runtime:
> >
> > [master:17781] [ 1] /usr/lib64/openmpi/libmpi.so.0 [0x34b19544b8]
> >
> >   The path there is '/usr/lib64/openmpi', which is probably a
> > system-installed version built with GCC.  You want to use your version in:
> >
> >  /opt/openmpi_intel/1.2.8/
> >
> >   You probably just need to reset your LD_LIBRARY_PATH environment variable
> > to point to this new path, for example:
> >
> > (for bash)
> > export LD_LIBRARY_PATH=/opt/openmpi_intel/1.2.8/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
> >
> >   By doing this, it should find the proper library files (assuming
> > that's the directory they're in; check your install!).  You may also wish
> > to remove the old version of Open MPI that came with the system: a
> > 'yum list' command should show you the package, and then you can remove it.
> > The 'feupdateenv' warning is more of a red herring, I think... it happens
> > (I think!) because the system uses a Linux version of the library instead
> > of an Intel one.  You can add the flag '-shared-intel' to your compile flags
> > or command line to get rid of it, if it bugs you.  Someone else can, I'm
> > sure, explain in far more detail what the issue there is.
> >
> >   Hope that helps.  If not, post the output of 'ldd hellompi' here, as well
> > as that of 'ls /opt/openmpi_intel/1.2.8/'.
> >
> >   Cheers!
> >   - Brian
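Brian's two suggestions can be scripted, sketched here in bash. The script prepends the Intel build's lib directory to LD_LIBRARY_PATH without leaving an empty entry. It then reads 'ldd' output to see which libmpi.so the executable actually resolves. The function names are hypothetical. The expected directory /opt/openmpi_intel/1.2.8/lib is taken from the thread, so adjust it to your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend DIR to LD_LIBRARY_PATH. The ${VAR:+...}
# expansion avoids a trailing ':' when LD_LIBRARY_PATH was empty, since an
# empty entry would make the loader also search the current directory.
prepend_ld_path() {
    local dir=${1:?usage: prepend_ld_path DIR}
    export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Hypothetical helper: read 'ldd' output on stdin and print the path that
# libmpi.so.* resolves to. ldd lines look like:
#   libmpi.so.0 => /usr/lib64/openmpi/libmpi.so.0 (0x...)
resolved_libmpi() {
    awk '$1 ~ /^libmpi\.so/ && $2 == "=>" { print $3; exit }'
}

# Hypothetical helper: succeed only if libmpi is loaded from EXPECTED_DIR.
libmpi_from() {
    local expected_dir=${1:?usage: libmpi_from EXPECTED_DIR}
    local path
    path=$(resolved_libmpi)
    # Strip the file name, keeping only the directory part.
    [ "${path%/*}" = "$expected_dir" ]
}

# Example (commented out so the sketch runs without the cluster install):
# prepend_ld_path /opt/openmpi_intel/1.2.8/lib
# ldd ./hellompi | libmpi_from /opt/openmpi_intel/1.2.8/lib \
#     || echo "hellompi still picks up a different libmpi"
```

With the original setup, the check would fail because the trace shows libmpi.so.0 coming from /usr/lib64/openmpi.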
> >
> >
> >
> > On Sun, Dec 7, 2008 at 9:50 AM, Sangamesh B <forum....@gmail.com> wrote:
> >>
> >> Hello all,
> >>
> >> I installed Open MPI 1.2.8 with the Intel C++ compilers on a CentOS
> >> 4.5-based Rocks 4.3 Linux cluster (with Voltaire InfiniBand). The
> >> installation went smoothly.
> >>
> >> The following warning appeared during compilation:
> >>
> >> # mpicc hellompi.c -o hellompi
> >> /opt/intel/cce/10.1.018/lib/libimf.so: warning: warning: feupdateenv
> >> is not implemented and will always fail
> >>
> >> It produced the executable, but execution failed with a
> >> segmentation fault:
> >>
> >>  # which mpirun
> >> /opt/openmpi_intel/1.2.8/bin/mpirun
> >> # mpirun -np 2 ./hellompi
> >> ./hellompi: Symbol `ompi_mpi_comm_world' has different size in shared
> >> object, consider re-linking
> >> ./hellompi: Symbol `ompi_mpi_comm_world' has different size in shared
> >> object, consider re-linking
> >> [master:17781] *** Process received signal ***
> >> [master:17781] Signal: Segmentation fault (11)
> >> [master:17781] Signal code: Address not mapped (1)
> >> [master:17781] Failing at address: 0x10
> >> [master:17781] [ 0] /lib64/tls/libpthread.so.0 [0x34b150c4f0]
> >> [master:17781] [ 1] /usr/lib64/openmpi/libmpi.so.0 [0x34b19544b8]
> >> [master:17781] [ 2]
> >> /usr/lib64/openmpi/libmpi.so.0(ompi_proc_init+0x14d) [0x34b1954cfd]
> >> [master:17781] [ 3] /usr/lib64/openmpi/libmpi.so.0(ompi_mpi_init+0xba)
> >> [0x34b19567da]
> >> [master:17781] [ 4] /usr/lib64/openmpi/libmpi.so.0(MPI_Init+0x94)
> >> [0x34b1977ab4]
> >> [master:17781] [ 5] ./hellompi(main+0x44) [0x401c0c]
> >> [master:17781] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> >> [0x34b0e1c3fb]
> >> [master:17781] [ 7] ./hellompi [0x401b3a]
> >> [master:17781] *** End of error message ***
> >> [master:17778] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed:
> >> Connection reset by peer (104)
> >> mpirun noticed that job rank 0 with PID 17781 on node master exited on
> >> signal 11 (Segmentation fault).
> >> 1 additional process aborted (not shown)
> >>
> >> This does not happen when compiling or running non-MPI C code:
> >>
> >> # icc sample.c -o sample
> >> # ./sample
> >>
> >> Compiler is working
> >> #
> >>
> >> What might be the reason for this, and how can it be resolved?
> >>
> >> Thanks,
> >> Sangamesh
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >
> 

-- 
*---------------------------------------------------------------------*
  Gilbert Grosdidier                 gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
 ---------------------------------------------------------------------
