Hi Brian,

 Thank you for your help. I have attached all the files you asked for
 in a tar file.

 Please find attached the 'config.log' and 'libmpi.la' for my Solaris
 installation.

 The output from 'mpicc -showme' is

 sunos$ mpicc -showme
 gcc -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include
 -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include/openmpi/ompi
 -L/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib
 -lmpi -lorte -lopal -lnsl -lsocket -lthread -laio -lm -lnsl -lsocket -lthread -ldl

 There are serious issues even when running on the Solaris machines only.

 I am using the host file and app file shown below. Both machines run
 SunOS and have similar configurations.

 hosts.txt
 ---------
 csultra01 slots=1
 csultra02 slots=1

 mpiinit_appfile
 ---------------
 -np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
 -np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
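
 For reference, mpiinit_sunos is built from mpiinit.c, a minimal
 MPI_Init/MPI_Finalize test. A rough sketch of such a program (not
 necessarily the exact source) is:

 #include <stdio.h>
 #include <mpi.h>

 int main(int argc, char *argv[])
 {
     int rank, size;

     /* Initialize MPI; this is where the failures described below occur. */
     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
     printf("Hello from rank %d of %d\n", rank, size);
     MPI_Finalize();
     return 0;
 }

 It is compiled on the Solaris machine with "mpicc -o mpiinit_sunos
 mpiinit.c -lrt", as in my previous mail.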

 Running mpirun without the -d option hangs:

 csultra01$ mpirun --hostfile hosts.txt --app mpiinit_appfile
 (hangs)

 Running mpirun with the -d option dumps core; its output is in the
 attached file "mpirun_output_d_option.txt". The core file is also
 attached.

 Running on just one host does not work either. The mpirun output with
 the "-d" option for this scenario is attached as
 "mpirun_output_d_option_one_host.txt".

 I have also attached the list of packages installed on my Solaris
 machine in "pkginfo.txt".

 I hope this helps you resolve the issue.

 Regards,
 Ravi.

> ----- Original Message -----
> From: Brian Barrett <brbar...@open-mpi.org>
> Date: Friday, March 10, 2006 7:09 pm
> Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
> problems on heterogeneous cluster
> To: Open MPI Users <us...@open-mpi.org>
> 
> > On Mar 10, 2006, at 12:09 AM, Ravi Manumachu wrote:
> > 
> > > I am facing problems running OpenMPI-1.0.1 on a heterogeneous
> > > cluster.
> > > I have a Linux machine and a SunOS machine in this cluster.
> > >
> > > linux$ uname -a
> > > Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004
> > > i686 i686 i386 GNU/Linux
> > >
> > > sunos$ uname -a
> > > SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
> > 
> > Unfortunately, this will not work with Open MPI at present.  Open MPI
> > 1.0.x does not have any support for running across platforms with
> > different endianness.  Open MPI 1.1.x has much better support for
> > such situations, but is far from complete, as the MPI datatype engine
> > does not properly fix up endian issues.  We're working on the issue,
> > but can not give a timetable for completion.
> > 
> > Also note that (while not a problem here) Open MPI also does not  
> > support running in a mixed 32 bit / 64 bit environment.  All  
> > processes must be 32 or 64 bit, but not a mix.
> > 
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > > fatal: relocation error: file
> > > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > > symbol nanosleep: referenced symbol not found
> > > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > > fatal: relocation error: file
> > > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > > symbol nanosleep: referenced symbol not found
> > >
> > > I have fixed this by compiling with "-lrt" option to the linker.
> > 
> > You shouldn't have to do this...  Could you send me the config.log
> > file from configure for Open MPI, the installed $prefix/lib/libmpi.la
> > file, and the output of mpicc -showme?
> > 
> > > sunos$ mpicc -o mpiinit_sunos mpiinit.c -lrt
> > >
> > > However when I run this again, I get the error:
> > >
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > [pg1cluster01:19858] ERROR: A daemon on node csultra01 failed to start
> > > as expected.
> > > [pg1cluster01:19858] ERROR: There may be more information available from
> > > [pg1cluster01:19858] ERROR: the remote shell (see above).
> > > [pg1cluster01:19858] ERROR: The daemon exited unexpectedly with
> > > status 255.
> > > 2 processes killed (possibly by Open MPI)
> > 
> > Both of these are quite unexpected.  It looks like there is something
> > wrong with your Solaris build.  Can you run on *just* the Solaris
> > machine?  We only have limited resources for testing on Solaris, but
> > have not run into this issue before.  What happens if you run mpirun
> > on just the Solaris machine with the -d option to mpirun?
> > 
> > > Sometimes I get the error.
> > >
> > > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > > [csultra01:06256] mca_common_sm_mmap_init: ftruncate failed with
> > > errno=28
> > > [csultra01:06256] mca_mpool_sm_init: unable to create shared memory
> > > mapping
> > > --------------------------------------------------------------------------
> > > It looks like MPI_INIT failed for some reason; your parallel process is
> > > likely to abort.  There are many reasons that a parallel process can
> > > fail during MPI_INIT; some of which are due to configuration or
> > > environment problems.  This failure appears to be an internal failure;
> > > here's some additional information (which may only be relevant to an
> > > Open MPI developer):
> > >
> > >   PML add procs failed
> > >   --> Returned value -2 instead of OMPI_SUCCESS
> > > --------------------------------------------------------------------------
> > > *** An error occurred in MPI_Init
> > > *** before MPI was initialized
> > > *** MPI_ERRORS_ARE_FATAL (goodbye)
> > 
> > This looks like you got far enough along that you ran into our
> > endianness issues, so this is about the best case you can hope for in
> > your configuration.  The ftruncate error worries me, however.  But I
> > think this is another symptom of something wrong with your Sun Sparc
> > build.
> > 
> > Brian
> > 
> > -- 
> >   Brian Barrett
> >   Open MPI developer
> >   http://www.open-mpi.org/
> > 
> > 
> > 
> 

Attachment: OpenMPI-1.0.1-SunOS-5.9.tar.gz
Description: GNU Zip compressed data
