Hi Brian,

Thank you for your help. I have attached all the files you asked for
in a tar file.

Please find attached the 'config.log' and 'libmpi.la' for my Solaris
installation.

The output from 'mpicc -showme' is

sunos$ mpicc -showme
gcc -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include
-I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include/openmpi/ompi
-L/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib -lmpi
-lorte -lopal -lnsl -lsocket -lthread -laio -lm -lnsl -lsocket -lthread -ldl

There are serious issues even when running on just the Solaris machines.

I am using the host file and app file shown below. Both machines run
SunOS and have similar configurations.

hosts.txt
---------
csultra01 slots=1
csultra02 slots=1

mpiinit_appfile
---------------
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
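
In case it helps, mpiinit_sunos is built from mpiinit.c, which is
essentially just a bare MPI_Init/MPI_Finalize test. A minimal sketch of
what it does is below (the actual source may differ slightly).

mpiinit.c (sketch)
------------------
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* Initialize MPI, report rank and size, then shut down cleanly */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d initialized\n", rank, size);

    MPI_Finalize();
    return 0;
}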

Running mpirun without the -d option hangs:

csultra01$ mpirun --hostfile hosts.txt --app mpiinit_appfile
(hangs)

Running mpirun with the -d option dumps core. Its output is in the attached
file "mpirun_output_d_option.txt", and the core file is also attached.

Running on just one host is also not working. The output from mpirun
with the "-d" option for this scenario is attached as
"mpirun_output_d_option_one_host.txt".

I have also attached the list of packages installed on my Solaris
machine as "pkginfo.txt".

I hope these help you resolve the issue.

Regards,
Ravi.

----- Original Message -----
From: Brian Barrett <brbar...@open-mpi.org>
Date: Friday, March 10, 2006 7:09 pm
Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
problems on heterogeneous cluster
To: Open MPI Users <us...@open-mpi.org>

> On Mar 10, 2006, at 12:09 AM, Ravi Manumachu wrote:
> 
> > I am facing problems running OpenMPI-1.0.1 on a heterogeneous cluster.
> >
> > I have a Linux machine and a SunOS machine in this cluster.
> >
> > linux$ uname -a
> > Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004
> > i686 i686 i386 GNU/Linux
> >
> > sunos$ uname -a
> > SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
> 
> Unfortunately, this will not work with Open MPI at present.  Open MPI
> 1.0.x does not have any support for running across platforms with
> different endianness.  Open MPI 1.1.x has much better support for
> such situations, but is far from complete, as the MPI datatype engine
> does not properly fix up endian issues.  We're working on the issue,
> but can not give a timetable for completion.
> 
> Also note that (while not a problem here) Open MPI also does not  
> support running in a mixed 32 bit / 64 bit environment.  All  
> processes must be 32 or 64 bit, but not a mix.
> 
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > fatal: relocation error: file
> > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > symbol nanosleep: referenced symbol not found
> > ld.so.1: /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos:
> > fatal: relocation error: file
> > /home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib/libmca_common_sm.so.0:
> > symbol nanosleep: referenced symbol not found
> >
> > I have fixed this by compiling with "-lrt" option to the linker.
> 
> You shouldn't have to do this...  Could you send me the config.log
> file from configuring Open MPI, the installed $prefix/lib/libmpi.la
> file, and the output of mpicc -showme?
> 
> > sunos$ mpicc -o mpiinit_sunos mpiinit.c -lrt
> >
> > However when I run this again, I get the error:
> >
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > [pg1cluster01:19858] ERROR: A daemon on node csultra01 failed to start
> > as expected.
> > [pg1cluster01:19858] ERROR: There may be more information available from
> > [pg1cluster01:19858] ERROR: the remote shell (see above).
> > [pg1cluster01:19858] ERROR: The daemon exited unexpectedly with status 255.
> > 2 processes killed (possibly by Open MPI)
> 
> Both of these are quite unexpected.  It looks like there is something
> wrong with your Solaris build.  Can you run on *just* the Solaris
> machine?  We only have limited resources for testing on Solaris, but
> have not run into this issue before.  What happens if you run mpirun
> on just the Solaris machine with the -d option to mpirun?
> 
> > Sometimes I get the error.
> >
> > $ mpirun --hostfile hosts.txt --app mpiinit_appfile
> > [csultra01:06256] mca_common_sm_mmap_init: ftruncate failed with errno=28
> > [csultra01:06256] mca_mpool_sm_init: unable to create shared memory mapping
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or environment
> > problems.  This failure appears to be an internal failure; here's some
> > additional information (which may only be relevant to an Open MPI
> > developer):
> >
> >   PML add procs failed
> >   --> Returned value -2 instead of OMPI_SUCCESS
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** before MPI was initialized
> > *** MPI_ERRORS_ARE_FATAL (goodbye)
> 
> This looks like you got far enough along that you ran into our
> endianness issues, so this is about the best case you can hope for in
> your configuration.  The ftruncate error worries me, however.  But I
> think this is another symptom of something wrong with your Sun Sparc
> build.
> 
> Brian
> 
> -- 
>   Brian Barrett
>   Open MPI developer
>   http://www.open-mpi.org/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
