Hello,

I suggest you use the '--mca btl sm,openib,self' option when launching your job with the mpirun command, or '--mca btl openib,self' if shared memory is not involved. Since 'tcp' is not in either list, MPI messages will not be sent over Ethernet.
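For concreteness, here is a minimal sketch of the two launches. The btl_cmd helper is hypothetical (it is not part of Open MPI) and only prints the command line instead of running it; the process count and ./hellompi are taken from the original report, so adjust them for your job.

```shell
#!/bin/sh
# Hypothetical helper: print the mpirun command line for a given BTL list.
# It only prints the command; nothing is launched.
btl_cmd() {
    btl_list=$1   # e.g. "sm,openib,self" or "openib,self"
    shift
    printf 'mpirun --mca btl %s' "$btl_list"
    # Append the remaining arguments (process count, program, ...) unchanged.
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Shared memory within a node, InfiniBand between nodes:
btl_cmd sm,openib,self -np 2 ./hellompi

# InfiniBand only (no shared memory):
btl_cmd openib,self -np 2 ./hellompi
```

Copy the printed line into your shell to launch the job. If the openib BTL cannot be used on a node, a run restricted this way should fail rather than quietly fall back to TCP, since 'tcp' is not among the allowed components.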
I think there is no cheap way to check this other than switching on several 'btl'-related MCA debug flags. Please refer to the parameter list provided by the ompi_info command (or the FAQs).

Regards,
Gilbert.

On Wed, 10 Dec 2008, Sangamesh B wrote:

> Now it's working fine.
>
> Thanks for the suggestion.
>
> Some clarifications required:
>
> I think it's possible to mention: communication should happen thru only
> IB, not Ethernet.
> Not getting how to do it.
>
> How to check whether IB is used or not?
>
> Regards,
> Sangamesh
> On Sun, Dec 7, 2008 at 9:08 PM, Brian Dobbins <bdobb...@gmail.com> wrote:
> >
> > Hi Sangamesh,
> >
> > I think the problem is that you're loading a different version of OpenMPI
> > at runtime:
> >
> > [master:17781] [ 1] /usr/lib64/openmpi/libmpi.so.0 [0x34b19544b8]
> >
> > .. The path there is to '/usr/lib64/openmpi', which is probably a
> > system-installed GCC version. You want to use your version in:
> >
> > /opt/openmpi_intel/1.2.8/
> >
> > You probably just need to re-set your LD_LIBRARY_PATH environment variable
> > to reflect this new path, such as:
> >
> > (for bash)
> > export LD_LIBRARY_PATH=/opt/openmpi_intel/1.2.8/lib:${LD_LIBRARY_PATH}
> >
> > ... By doing this, it should find the proper library files (assuming
> > that's the directory they're in - check your install!). You may also wish to
> > remove the old version of OpenMPI that came with the system - a yum 'list'
> > command should show you the package, and then just remove it. The
> > 'feupdateenv' thing is more of a red herring, I think... this happens (I
> > think!) because the system uses a Linux version of the library instead of an
> > Intel one. You can add the flag '-shared-intel' to your compile flags or
> > command line and that should get rid of that, if it bugs you. Someone else
> > can, I'm sure, explain in far more detail what the issue there is.
> >
> > Hope that helps.. if not, post the output of 'ldd hellompi' here, as well
> > as an 'ls /opt/openmpi_intel/1.2.8/'
> >
> > Cheers!
> > - Brian
> >
> >
> > On Sun, Dec 7, 2008 at 9:50 AM, Sangamesh B <forum....@gmail.com> wrote:
> >>
> >> Hello all,
> >>
> >> Installed Open MPI 1.2.8 with Intel C++ compilers on CentOS 4.5 based
> >> Rocks 4.3 Linux cluster (& Voltaire InfiniBand). Installation was
> >> smooth.
> >>
> >> The following error occurred during compilation:
> >>
> >> # mpicc hellompi.c -o hellompi
> >> /opt/intel/cce/10.1.018/lib/libimf.so: warning: warning: feupdateenv
> >> is not implemented and will always fail
> >>
> >> It produced the executable. But during execution it failed with
> >> Segmentation fault:
> >>
> >> # which mpirun
> >> /opt/openmpi_intel/1.2.8/bin/mpirun
> >> # mpirun -np 2 ./hellompi
> >> ./hellompi: Symbol `ompi_mpi_comm_world' has different size in shared
> >> object, consider re-linking
> >> ./hellompi: Symbol `ompi_mpi_comm_world' has different size in shared
> >> object, consider re-linking
> >> [master:17781] *** Process received signal ***
> >> [master:17781] Signal: Segmentation fault (11)
> >> [master:17781] Signal code: Address not mapped (1)
> >> [master:17781] Failing at address: 0x10
> >> [master:17781] [ 0] /lib64/tls/libpthread.so.0 [0x34b150c4f0]
> >> [master:17781] [ 1] /usr/lib64/openmpi/libmpi.so.0 [0x34b19544b8]
> >> [master:17781] [ 2]
> >> /usr/lib64/openmpi/libmpi.so.0(ompi_proc_init+0x14d) [0x34b1954cfd]
> >> [master:17781] [ 3] /usr/lib64/openmpi/libmpi.so.0(ompi_mpi_init+0xba)
> >> [0x34b19567da]
> >> [master:17781] [ 4] /usr/lib64/openmpi/libmpi.so.0(MPI_Init+0x94)
> >> [0x34b1977ab4]
> >> [master:17781] [ 5] ./hellompi(main+0x44) [0x401c0c]
> >> [master:17781] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb)
> >> [0x34b0e1c3fb]
> >> [master:17781] [ 7] ./hellompi [0x401b3a]
> >> [master:17781] *** End of error message ***
> >> [master:17778] [0,0,0]-[0,1,1] mca_oob_tcp_msg_recv: readv failed:
> >> Connection reset by peer (104)
> >> mpirun noticed that job rank 0 with PID 17781 on node master exited on
> >> signal 11 (Segmentation fault).
> >> 1 additional process aborted (not shown)
> >>
> >> But this is not the case, during non-mpi c code compilation or execution.
> >>
> >> # icc sample.c -o sample
> >> # ./sample
> >>
> >> Compiler is working
> >> #
> >>
> >> What might be the reason for this & how it can be resolved?
> >>
> >> Thanks,
> >> Sangamesh
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
*---------------------------------------------------------------------*
  Gilbert Grosdidier                 gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
 ---------------------------------------------------------------------