Re: [OMPI users] x86_64 head with x86 diskless nodes, Node execution fails with SEGV_MAPERR

2006-07-16 Thread Eric Thibodeau
Thanks, now it all makes more sense to me. I'll try the hard way: multiple builds
for multiple envs ;)

Eric

On Sunday, July 16, 2006, at 18:21, Brian Barrett wrote:
> On Jul 16, 2006, at 4:13 PM, Eric Thibodeau wrote:
> > Now that I have that out of the way, I'd like to know how I am
> > supposed to compile my apps so that they can run on a heterogeneous
> > network with MPI. Here is an example:
> >
> > kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpicc -L/usr/X/lib -lm -lX11 -O3 mandelbrot-mpi.c -o mandelbrot-mpi
> >
> > kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpirun --hostfile hostlist -np 3 ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi
> >
> > --------------------------------------------------------------------------
> > Could not execute the executable
> > "/home/kyron/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi": Exec format error
> >
> > This could mean that your PATH or executable name is wrong, or that you do not
> > have the necessary permissions. Please ensure that the executable is able to be
> > found and executed.
> > --------------------------------------------------------------------------
> >
> > As can be seen from the uname -a that was run previously, I have two
> > "local nodes" on the x86_64 and two i686 nodes. I tried to find
> > examples in the docs on how to compile applications correctly for
> > such a setup without compromising performance, but I came up short of
> > an example.
> 
> From the sound of it, you have a heterogeneous configuration -- some
> nodes are x86_64 and some are x86.  Because of this, you either have
> to compile your application twice, once for each platform, or compile
> your application for the lowest common denominator.  My guess would
> be that it would be easier and more foolproof to compile everything in
> 32-bit mode.  If you run in mixed mode, using application schemas (see
> the mpirun man page) will be the easiest way to make things work.
> 
> 
> Brian
> 

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517



Re: [OMPI users] x86_64 head with x86 diskless nodes, Node execution fails with SEGV_MAPERR

2006-07-16 Thread Brian Barrett

On Jul 16, 2006, at 4:13 PM, Eric Thibodeau wrote:
Now that I have that out of the way, I'd like to know how I am
supposed to compile my apps so that they can run on a heterogeneous
network with MPI. Here is an example:

kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpicc -L/usr/X/lib -lm -lX11 -O3 mandelbrot-mpi.c -o mandelbrot-mpi

kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpirun --hostfile hostlist -np 3 ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi

--------------------------------------------------------------------------
Could not execute the executable
"/home/kyron/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi": Exec format error

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions. Please ensure that the executable is able to be
found and executed.
--------------------------------------------------------------------------

As can be seen from the uname -a that was run previously, I have two
"local nodes" on the x86_64 and two i686 nodes. I tried to find
examples in the docs on how to compile applications correctly for
such a setup without compromising performance, but I came up short of
an example.


From the sound of it, you have a heterogeneous configuration -- some
nodes are x86_64 and some are x86.  Because of this, you either have
to compile your application twice, once for each platform, or compile
your application for the lowest common denominator.  My guess would
be that it would be easier and more foolproof to compile everything in
32-bit mode.  If you run in mixed mode, using application schemas (see
the mpirun man page) will be the easiest way to make things work.
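
An application schema is just a text file with one app context per
line, each with its own -np / -host / executable. A minimal sketch for
your setup, assuming you build one binary per architecture (the
_x86_64 / _i686 suffixes and the process counts are illustrative, and
this is untested):

-np 2 -host headless /home/kyron/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi_x86_64
-np 2 -host node0,node1 /home/kyron/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi_i686

Save that as, say, "appfile" and launch with:

mpirun --app appfile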



Brian

--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




Re: [OMPI users] x86_64 head with x86 diskless nodes, Node execution fails with SEGV_MAPERR

2006-07-16 Thread Eric Thibodeau
/me blushes in shame; it would seem that all I needed to do since the beginning
was to run a make distclean. I apparently had some old compiled files lying
around. Now I get:

kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpirun  --hostfile 
hostlist -np 4 uname -a
Linux headless 2.6.17-ck1-r1 #1 SMP Tue Jul 11 16:39:18 EDT 2006 x86_64 AMD 
Opteron(tm) Processor 244 GNU/Linux
Linux headless 2.6.17-ck1-r1 #1 SMP Tue Jul 11 16:39:18 EDT 2006 x86_64 AMD 
Opteron(tm) Processor 244 GNU/Linux
Linux node0 2.6.16-gentoo-r7 #5 Tue Jul 11 12:30:41 EDT 2006 i686 AMD 
Athlon(TM) XP 2500+ GNU/Linux
Linux node1 2.6.16-gentoo-r7 #5 Tue Jul 11 12:30:41 EDT 2006 i686 AMD 
Athlon(TM) XP 2500+ GNU/Linux

Which is correct. Sorry for the misfire; I hadn't thought of cleaning up the
compilation dir...

Now that I have that out of the way, I'd like to know how I am supposed to
compile my apps so that they can run on a heterogeneous network with MPI. Here is
an example:
kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpicc -L/usr/X/lib 
-lm -lX11 -O3 mandelbrot-mpi.c -o mandelbrot-mpi
kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ mpirun --hostfile 
hostlist -np 3 ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi
--------------------------------------------------------------------------
Could not execute the executable
"/home/kyron/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi": Exec format error

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.
--------------------------------------------------------------------------
As can be seen from the uname -a that was run previously, I have two "local
nodes" on the x86_64 and two i686 nodes. I tried to find examples in the docs on
how to compile applications correctly for such a setup without compromising
performance, but I came up short of an example.
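
For what it's worth, file(1) on the binary should make the mismatch obvious;
the output below is what I'd expect on the head node (abridged, not a captured
transcript) -- a 64-bit ELF that the i686 nodes can't load:

kyron@headless ~/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2 $ file mandelbrot-mpi
mandelbrot-mpi: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), dynamically linked, not stripped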

Thanks,

Eric
PS: I know... maybe I should start another thread ;)

On Sunday, July 16, 2006, at 14:31, Brian Barrett wrote:
> On Jul 15, 2006, at 2:58 PM, Eric Thibodeau wrote:
> > But, for some reason, on the Athlon node (in their image on the
> > server, I should say) OpenMPI still doesn't seem to be built
> > correctly, since it crashes as follows:
> >
> > kyron@node0 ~ $ mpirun -np 1 uptime
> > Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> > Failing at addr:(nil)
> > [0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
> > [1] func:[0xe440]
> > [2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
> > [3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
> > [4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
> > [5] func:mpirun(orterun+0x255) [0x804a015]
> > [6] func:mpirun(main+0x22) [0x8049db6]
> > [7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
> > [8] func:mpirun [0x8049d11]
> > *** End of error message ***
> > Segmentation fault
> >
> >
> > The crash happens both in the chrooted env and on the nodes. I
> > configured both systems to have Linux and POSIX threads, though I
> > see OpenMPI is calling the POSIX version (a message on the mailing
> > list had hinted at keeping the Linux threads around... I have to
> > anyway, since some apps like Matlab extensions still depend on
> > them...). The following is the output for the libc info.
> 
> That's interesting...  We regularly build Open MPI on 32-bit Linux
> machines (and in 32-bit mode on Opteron machines) without too much
> issue.  It looks like we're jumping into a NULL pointer, which
> generally means that an ORTE framework failed to initialize itself
> properly.  It would be useful if you could rebuild with debugging
> symbols (just add -g to CFLAGS when configuring) and run mpirun in
> gdb.  If we can determine where the error is occurring, that would
> definitely help in debugging your problem.
> 
> Brian
> 
> 

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517

Re: [OMPI users] x86_64 head with x86 diskless nodes, Node execution fails with SEGV_MAPERR

2006-07-16 Thread Brian Barrett

On Jul 15, 2006, at 2:58 PM, Eric Thibodeau wrote:
But, for some reason, on the Athlon node (in their image on the
server, I should say) OpenMPI still doesn't seem to be built
correctly, since it crashes as follows:



kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault


The crash happens both in the chrooted env and on the nodes. I
configured both systems to have Linux and POSIX threads, though I
see OpenMPI is calling the POSIX version (a message on the mailing
list had hinted at keeping the Linux threads around... I have to
anyway, since some apps like Matlab extensions still depend on
them...). The following is the output for the libc info.


That's interesting...  We regularly build Open MPI on 32-bit Linux
machines (and in 32-bit mode on Opteron machines) without too much
issue.  It looks like we're jumping into a NULL pointer, which
generally means that an ORTE framework failed to initialize itself
properly.  It would be useful if you could rebuild with debugging
symbols (just add -g to CFLAGS when configuring) and run mpirun in
gdb.  If we can determine where the error is occurring, that would
definitely help in debugging your problem.
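
Something along these lines should do it (a sketch; I'm reusing your
configure arguments, minus the cache file so the new CFLAGS actually
take effect):

./configure CFLAGS=-g --enable-pretty-print-stacktrace --prefix=$HOME/openmpi_`uname -m`
make -j4 && make install
gdb --args mpirun -np 1 uptime
(gdb) run
  [... wait for the SIGSEGV ...]
(gdb) bt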


Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/




[OMPI users] x86_64 head with x86 diskless nodes, Node execution fails with SEGV_MAPERR

2006-07-15 Thread Eric Thibodeau
Hello all,

I've been trying to set up a small test cluster with a dual-Opteron
head and Athlon nodes. My environment in both cases is Gentoo, and the nodes
boot off PXE using an image built and stored on the master node. I chroot into
the node's environment using:

linux32 chroot ${ROOT} /bin/bash

to cross over the 64/32-bit barrier. My user's home directory is loop-mounted
into that environment and NFS-exported to the nodes. I build OpenMPI in the
following way:

In the build folder of OpenMPI-1.1:

./configure --cache-file=config_`uname -m`.cache --enable-pretty-print-stacktrace --prefix=$HOME/openmpi_`uname -m`
make -j4 && make install

I run these exact same commands in the Opteron environment and in the chrooted
environment for the Athlon machines. This gives me the following folders in my
$HOME:
/home/kyron/openmpi_i686
/home/kyron/openmpi_x86_64
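
Each node is then meant to pick the matching build off the shared home;
something like the following in ~/.bashrc takes care of it (a sketch --
the MPI_PREFIX variable name is mine):

# select the Open MPI build matching this node's architecture
MPI_PREFIX=$HOME/openmpi_`uname -m`
export PATH=$MPI_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$MPI_PREFIX/lib:$LD_LIBRARY_PATH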

But, for some reason, on the Athlon node (in their image on the server, I should
say) OpenMPI still doesn't seem to be built correctly, since it crashes as
follows:

kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault

The crash happens both in the chrooted env and on the nodes. I configured both
systems to have Linux and POSIX threads, though I see OpenMPI is calling the
POSIX version (a message on the mailing list had hinted at keeping the Linux
threads around... I have to anyway, since some apps like Matlab extensions still
depend on them...). The following is the output for the libc info.

kyron@headless ~ $ /lib/tls/libc.so.6
GNU C Library stable release version 2.3.6, by Roland McGrath et al.
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.1.1 (Gentoo 4.1.1).
Compiled on a Linux 2.6.11 system on 2006-07-14.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
Native POSIX Threads Library by Ulrich Drepper et al
The C stubs add-on version 2.1.2.
GNU Libidn by Simon Josefsson
BIND-8.2.3-T5B
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

I am attaching the config.log and ompi_info output for both platforms. Before
sending this e-mail I tried compiling OpenMPI on one of the nodes (booted off
the image) and got the exact same problem (so chroot vs. local build doesn't
seem to be a factor). The attached file contains:

config.log.x86_64   <-- config log for the Opteron build (works locally)
config.log_node0    <-- config log for the Athlon build (on the node)
ompi_info.i686      <-- ompi_info on the Athlon node
ompi_info.x86_64    <-- ompi_info on the Opteron master

Thanks,

-- 
Eric Thibodeau
Neural Bucket Solutions Inc.
T. (514) 736-1436
C. (514) 710-0517

ENV_info.tbz
Description: application/tbz