[OMPI users] multi-threaded MPI

2007-11-07 Thread Brian Budge
Hi All -

I am working on a networked cache for an out-of-core application, and
currently I have it set up where I have several worker threads, and
one "request" thread per node.  The worker threads check the cache on
their own node first, and if there's a miss, they make a request to
the other nodes in the cluster to see who has the data.  The request
thread answers requests, and if a node is chosen to deliver data, the
request thread spawns another thread to handle that particular
request.

Currently my application dies in MPI_Barrier before any computation
begins (but after my request threads are spawned).  After looking into
this a bit, it seems that OpenMPI has to have thread support to handle
a model like this (i.e. multiple Sends and Recvs happening at once per
process).  According to

>  ompi_info | grep Thread
  Thread support: posix (mpi: no, progress: no)

I don't have this thread support.  I am running Open MPI v1.1.2 (the
latest openmpi package in Gentoo).  Can anyone recommend which version
I should try?

Thanks,
  Brian
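
For reference, the standard way to find out at run time what thread level an
MPI build actually provides is MPI_Init_thread; a minimal C sketch (the file
name and message wording are illustrative, not from the thread), which only
helps if the library was configured with thread support in the first place:

/* thread_check.c -- request full thread support and report what the
 * library actually grants. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for MPI_THREAD_MULTIPLE: any thread may call MPI at any time. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        printf("Only thread level %d provided; concurrent sends/receives\n"
               "from several threads are not safe with this build.\n",
               provided);
    }

    MPI_Finalize();
    return 0;
}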


[OMPI users] MPI Spawn terminates application

2007-11-07 Thread Murat Knecht
Greetings,
when MPI_Spawn cannot launch an application for whatever reason, the
entire job is cancelled with some message like the following.
Is there a way to handle this nicely, e.g. by throwing an exception? I
understand this does not work when the job is first started with
mpirun, as there is no application yet to fall back on, but in the case
of an already running application it should be possible to simply inform
it that the spawning request failed. Then the application could handle
the error and terminate gracefully. I did enable C++ exceptions, by the
way, so I guess this is simply not implemented. Is there a technical
(e.g. architectural) reason behind this, or is it simply a
yet-to-be-added feature?
All the best,
Murat
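
What is being asked for here is roughly what the MPI standard's
MPI_ERRORS_RETURN error handler is meant to allow. A hedged sketch of the
calling side, assuming the implementation reports the spawn failure instead
of aborting (which is exactly the open question above; the worker command
and process count are made up):

/* spawn_check.c -- how a recoverable spawn failure would look if the
 * library returned the error instead of killing the job. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm child;
    int errcodes[4];
    int rc, i;

    MPI_Init(&argc, &argv);

    /* Replace the default MPI_ERRORS_ARE_FATAL handler so that errors
     * come back as return codes rather than terminating the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    rc = MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                        0, MPI_COMM_WORLD, &child, errcodes);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "spawn failed; shutting down gracefully\n");
        /* application-specific cleanup would go here */
    } else {
        for (i = 0; i < 4; i++)
            if (errcodes[i] != MPI_SUCCESS)
                fprintf(stderr, "child %d failed to launch\n", i);
    }

    MPI_Finalize();
    return 0;
}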



Re: [OMPI users] openib errors as user, but not root

2007-11-07 Thread Andrus, Mr. Brian (Contractor)
Ah! It WAS the torque startup script they provide!

It pays to get into the weeds. 


Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Wednesday, November 07, 2007 4:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib errors as user, but not root

Check out:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more

In particular, see the stuff about using resource managers.



On Nov 7, 2007, at 7:22 PM, Andrus, Mr. Brian (Contractor) wrote:

> Ok, I am having some difficulty troubleshooting this.
>
> If I run my hello program without torque, it works fine:
> [root@login1 root]# mpirun --mca btl openib,self -host
> n01,n02,n03,n04,n05 /data/root/hello
> Hello from process 0 of 5 on node n01
> Hello from process 1 of 5 on node n02
> Hello from process 2 of 5 on node n03
> Hello from process 3 of 5 on node n04
> Hello from process 4 of 5 on node n05
>
> If I submit it as root, it seems happy:
> [root@login1 root]# qsub
> #!/bin/bash
> #PBS -j oe
> #PBS -l nodes=5:ppn=1
> #PBS -W x=NACCESSPOLICY:SINGLEJOB
> #PBS -N TestJob
> #PBS -q long
> #PBS -o output.txt
> #PBS -V
> cd $PBS_O_WORKDIR
> rm -f output.txt
> date
> mpirun --mca btl openib,self /data/root/hello 
> 103.cluster.default.domain
> [root@login1 root]# cat output.txt
> Wed Nov  7 16:20:33 PST 2007
> Hello from process 0 of 5 on node n05
> Hello from process 1 of 5 on node n04
> Hello from process 2 of 5 on node n03
> Hello from process 3 of 5 on node n02
> Hello from process 4 of 5 on node n01
>
> If I do it as me, not so good:
> [andrus@login1 data]$ qsub
> [andrus@login1 data]$ qsub
> #!/bin/bash
> #PBS -j oe
> #PBS -l nodes=1:ppn=1
> #PBS -W x=NACCESSPOLICY:SINGLEJOB
> #PBS -N TestJob
> #PBS -q long
> #PBS -o output.txt
> #PBS -V
> cd $PBS_O_WORKDIR
> rm -f output.txt
> date
> mpirun --mca btl openib,self /data/root/hello 
> 105.littlemac.default.domain
> [andrus@login1 data]$ cat output.txt
> Wed Nov  7 16:23:00 PST 2007
> --
>  The OpenIB BTL failed to initialize while trying to allocate some

> locked memory.  This typically can indicate that the memlock limits 
> are set too low.  For most HPC installations, the memlock limits 
> should be set to "unlimited".  The failure occured here:
>
> Host:  n01
> OMPI source:   btl_openib.c:828
> Function:  ibv_create_cq()
> Device:        mthca0
> Memlock limit: 32768
>
> You may need to consult with your system administrator to get this 
> problem fixed.  This FAQ entry on the Open MPI web site may also be
> helpful:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> --
> 
> --
>  It looks like MPI_INIT failed for some reason; your parallel 
> process is likely to abort.  There are many reasons that a parallel 
> process can fail during MPI_INIT; some of which are due to 
> configuration or environment problems.  This failure appears to be an 
> internal failure; here's some additional information (which may only 
> be relevant to an Open MPI
> developer):
>
>   PML add procs failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --
> 
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
>
>
> I have checked that ulimit is unlimited. I cannot seem to figure this.

> Any help?
> Brian Andrus perotsystems
> Site Manager | Sr. Computer Scientist
> Naval Research Lab
> 7 Grace Hopper Ave, Monterey, CA  93943 Phone (831) 656-4839 | Fax 
> (831) 656-4866 ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openib errors as user, but not root

2007-11-07 Thread Andrus, Mr. Brian (Contractor)
I have checked those out.

I am trying to test limits. If I ssh directly to a node and check,
everything is ok:
[andrus@login1 ~]$ ssh n01 ulimit -l
unlimited

The settings in /etc/security/limits.conf are right too. 


Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 
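
Worth noting: "ssh n01 ulimit -l" reports the limit of an interactive login
shell, not the limit inherited by processes that the Torque daemon (pbs_mom)
starts inside a job, and the two can differ; that is what the FAQ entries
Jeff pointed to discuss, and it is consistent with the eventual finding that
the Torque startup script was the culprit. A small diagnostic along these
lines (the file name is illustrative), launched with mpirun from inside a
Torque job script, shows what the ranks themselves see:

/* memlock_check.c -- print the RLIMIT_MEMLOCK each rank actually runs with.
 * A value like the 32768 in the error output, rather than "unlimited",
 * means the daemons that launched the job did not inherit the raised limit. */
#include <mpi.h>
#include <stdio.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    struct rlimit rl;
    char host[MPI_MAX_PROCESSOR_NAME];
    int rank, len;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    getrlimit(RLIMIT_MEMLOCK, &rl);
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("rank %d on %s: memlock unlimited\n", rank, host);
    else
        printf("rank %d on %s: memlock soft limit %lu\n",
               rank, host, (unsigned long) rl.rlim_cur);

    MPI_Finalize();
    return 0;
}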


-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Wednesday, November 07, 2007 4:26 PM
To: Open MPI Users
Subject: Re: [OMPI users] openib errors as user, but not root

Check out:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more

In particular, see the stuff about using resource managers.



On Nov 7, 2007, at 7:22 PM, Andrus, Mr. Brian (Contractor) wrote:

> Ok, I am having some difficulty troubleshooting this.
>
> If I run my hello program without torque, it works fine:
> [root@login1 root]# mpirun --mca btl openib,self -host
> n01,n02,n03,n04,n05 /data/root/hello
> Hello from process 0 of 5 on node n01
> Hello from process 1 of 5 on node n02
> Hello from process 2 of 5 on node n03
> Hello from process 3 of 5 on node n04
> Hello from process 4 of 5 on node n05
>
> If I submit it as root, it seems happy:
> [root@login1 root]# qsub
> #!/bin/bash
> #PBS -j oe
> #PBS -l nodes=5:ppn=1
> #PBS -W x=NACCESSPOLICY:SINGLEJOB
> #PBS -N TestJob
> #PBS -q long
> #PBS -o output.txt
> #PBS -V
> cd $PBS_O_WORKDIR
> rm -f output.txt
> date
> mpirun --mca btl openib,self /data/root/hello 
> 103.cluster.default.domain
> [root@login1 root]# cat output.txt
> Wed Nov  7 16:20:33 PST 2007
> Hello from process 0 of 5 on node n05
> Hello from process 1 of 5 on node n04
> Hello from process 2 of 5 on node n03
> Hello from process 3 of 5 on node n02
> Hello from process 4 of 5 on node n01
>
> If I do it as me, not so good:
> [andrus@login1 data]$ qsub
> [andrus@login1 data]$ qsub
> #!/bin/bash
> #PBS -j oe
> #PBS -l nodes=1:ppn=1
> #PBS -W x=NACCESSPOLICY:SINGLEJOB
> #PBS -N TestJob
> #PBS -q long
> #PBS -o output.txt
> #PBS -V
> cd $PBS_O_WORKDIR
> rm -f output.txt
> date
> mpirun --mca btl openib,self /data/root/hello 
> 105.littlemac.default.domain
> [andrus@login1 data]$ cat output.txt
> Wed Nov  7 16:23:00 PST 2007
> --
>  The OpenIB BTL failed to initialize while trying to allocate some

> locked memory.  This typically can indicate that the memlock limits 
> are set too low.  For most HPC installations, the memlock limits 
> should be set to "unlimited".  The failure occured here:
>
> Host:  n01
> OMPI source:   btl_openib.c:828
> Function:  ibv_create_cq()
> Device:        mthca0
> Memlock limit: 32768
>
> You may need to consult with your system administrator to get this 
> problem fixed.  This FAQ entry on the Open MPI web site may also be
> helpful:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> --
> 
> --
>  It looks like MPI_INIT failed for some reason; your parallel 
> process is likely to abort.  There are many reasons that a parallel 
> process can fail during MPI_INIT; some of which are due to 
> configuration or environment problems.  This failure appears to be an 
> internal failure; here's some additional information (which may only 
> be relevant to an Open MPI
> developer):
>
>   PML add procs failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --
> 
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
>
>
> I have checked that ulimit is unlimited. I cannot seem to figure this.

> Any help?
> Brian Andrus perotsystems
> Site Manager | Sr. Computer Scientist
> Naval Research Lab
> 7 Grace Hopper Ave, Monterey, CA  93943 Phone (831) 656-4839 | Fax 
> (831) 656-4866 ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openib errors as user, but not root

2007-11-07 Thread Jeff Squyres

Check out:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages-more

In particular, see the stuff about using resource managers.



On Nov 7, 2007, at 7:22 PM, Andrus, Mr. Brian (Contractor) wrote:


Ok, I am having some difficulty troubleshooting this.

If I run my hello program without torque, it works fine:
[root@login1 root]# mpirun --mca btl openib,self -host  
n01,n02,n03,n04,n05 /data/root/hello

Hello from process 0 of 5 on node n01
Hello from process 1 of 5 on node n02
Hello from process 2 of 5 on node n03
Hello from process 3 of 5 on node n04
Hello from process 4 of 5 on node n05

If I submit it as root, it seems happy:
[root@login1 root]# qsub
#!/bin/bash
#PBS -j oe
#PBS -l nodes=5:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q long
#PBS -o output.txt
#PBS -V
cd $PBS_O_WORKDIR
rm -f output.txt
date
mpirun --mca btl openib,self /data/root/hello
103.cluster.default.domain
[root@login1 root]# cat output.txt
Wed Nov  7 16:20:33 PST 2007
Hello from process 0 of 5 on node n05
Hello from process 1 of 5 on node n04
Hello from process 2 of 5 on node n03
Hello from process 3 of 5 on node n02
Hello from process 4 of 5 on node n01

If I do it as me, not so good:
[andrus@login1 data]$ qsub
[andrus@login1 data]$ qsub
#!/bin/bash
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q long
#PBS -o output.txt
#PBS -V
cd $PBS_O_WORKDIR
rm -f output.txt
date
mpirun --mca btl openib,self /data/root/hello
105.littlemac.default.domain
[andrus@login1 data]$ cat output.txt
Wed Nov  7 16:23:00 PST 2007
--
The OpenIB BTL failed to initialize while trying to allocate some
locked memory.  This typically can indicate that the memlock limits
are set too low.  For most HPC installations, the memlock limits
should be set to "unlimited".  The failure occured here:

Host:  n01
OMPI source:   btl_openib.c:828
Function:  ibv_create_cq()
Device:        mthca0
Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)



I have checked that ulimit is unlimited. I cannot seem to figure  
this. Any help?

Brian Andrus perotsystems
Site Manager | Sr. Computer Scientist
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] openib errors as user, but not root

2007-11-07 Thread Andrus, Mr. Brian (Contractor)
Ok, I am having some difficulty troubleshooting this.

If I run my hello program without torque, it works fine:
[root@login1 root]# mpirun --mca btl openib,self -host
n01,n02,n03,n04,n05 /data/root/hello
Hello from process 0 of 5 on node n01
Hello from process 1 of 5 on node n02
Hello from process 2 of 5 on node n03
Hello from process 3 of 5 on node n04
Hello from process 4 of 5 on node n05

If I submit it as root, it seems happy:
[root@login1 root]# qsub
#!/bin/bash
#PBS -j oe
#PBS -l nodes=5:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q long
#PBS -o output.txt
#PBS -V
cd $PBS_O_WORKDIR
rm -f output.txt
date
mpirun --mca btl openib,self /data/root/hello
103.cluster.default.domain
[root@login1 root]# cat output.txt
Wed Nov  7 16:20:33 PST 2007
Hello from process 0 of 5 on node n05
Hello from process 1 of 5 on node n04
Hello from process 2 of 5 on node n03
Hello from process 3 of 5 on node n02
Hello from process 4 of 5 on node n01

If I do it as me, not so good:
[andrus@login1 data]$ qsub
[andrus@login1 data]$ qsub
#!/bin/bash
#PBS -j oe
#PBS -l nodes=1:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q long
#PBS -o output.txt
#PBS -V
cd $PBS_O_WORKDIR
rm -f output.txt
date
mpirun --mca btl openib,self /data/root/hello
105.littlemac.default.domain
[andrus@login1 data]$ cat output.txt
Wed Nov  7 16:23:00 PST 2007

--
The OpenIB BTL failed to initialize while trying to allocate some
locked memory.  This typically can indicate that the memlock limits
are set too low.  For most HPC installations, the memlock limits
should be set to "unlimited".  The failure occured here:

Host:          n01
OMPI source:   btl_openib.c:828
Function:      ibv_create_cq()
Device:        mthca0
Memlock limit: 32768

You may need to consult with your system administrator to get this
problem fixed.  This FAQ entry on the Open MPI web site may also be
helpful:

http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

--

--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or
environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  PML add procs failed
  --> Returned "Error" (-1) instead of "Success" (0)

--
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)



I have checked that ulimit is unlimited. I cannot seem to figure this.
Any help?
Brian Andrus perotsystems 
Site Manager | Sr. Computer Scientist 
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA  93943
Phone (831) 656-4839 | Fax (831) 656-4866 




Re: [OMPI users] Double Standard Output for Non-MPI on Itanium Running Red Hat Enterprise Linux 4.0

2007-11-07 Thread Benjamin, Ted G.
Please understand that I'm decent at the engineering side of it.  As a
system administrator, I'm a decent engineer.

On the previous configurations, this program seems to run with any
number of processors.  I believe these successful users have been using
LAM/MPI.  While I was waiting for a reply, I installed LAM/MPI.  The
results were similar to those from OpenMPI.

While I can choose LAM/MPI, I'd prefer to port it to OpenMPI since that
is where all the development and most of the support are.

I cannot choose the Portland compiler.  I must use either GNU or Intel
compilers on the Itanium2.

Ted (more responses below)

On November 7, 2007 at 8:39 AM, Squyres, Jeff wrote:

On Nov 5, 2007, at 4:12 PM, Benjamin, Ted G. wrote:

>> I have a code that runs with both Portland and Intel compilers
>> on X86, AMD64 and Intel EM64T running various flavors of Linux on
>> clusters.  I am trying to port it to a 2-CPU Itanium2 (ia64) running
>> Red Hat Enterprise Linux 4.0; it has gcc 3.4.6-8 and the Intel
>> Fortran compiler 10.0.026 installed.  I have built Open MPI 1.2.4
>> using these compilers.
>> When I built the Open MPI, I didn't do anything special.  I
>> enabled debug, but that was really all.  Of course, you can see that
>> in the config file that is attached.
>> This system is not part of a cluster.  The two onboard CPUs (an
>> HP zx6000) are the only processors on which the job runs.  The code
>> must run on MPI because the source calls it.  I compiled the target
>> software using the Fortran90 compiler (mpif90).
>> I've been running the code in the foreground so that I could
>> keep an eye on its behavior.
>> When I try to run the compiled and linked code [mpirun -np #
>> {executable file}], it performs as shown below:

>> (1) With the source compiled at optimization -O0 and -np 1, the job
>> runs very slowly (6 days on the wall clock) to the correct answer on
>> the benchmark;
>> (2) With the source compiled at optimization -O0 and -np 2, the
>> benchmark job fails with a segmentation violation;

> Have you tried running your code through a memory-checking debugger,
> and/or examining any corefiles that were generated to see if there is
> a problem in your code?

> I will certainly not guarantee that Open MPI is bug free, but problems
> like this are *usually* application-level issues.  One place I always
> start is running the application in a debugger to see if you can catch
> exactly where the Badness happens.  This can be most helpful.

I have tried to run a debugger, but I am not an expert at it.  I could
not get Intel's idb debugger to give me a prompt, but I could get a
prompt from gdb.  I've looked over the manual, but I'm not sure how to
put in the breakpoints et al. that you geniuses use to evaluate a
program at critical junctures.  I actually used an "mpirun -np 2 dbg"
command to run it on 2 CPUs.  I attached the file at the prompt.  When I
did a run, it ran fine with no optimization and one processor.  With 2
processors, it didn't seem to do anything.  All I will say here is that
I have a lot to learn.  I'm calling on my friends for help on this.

>> (3) With the source compiled at all other optimization (-O1, -O2,
>> -O3) and processor combinations (-np1 and -np 2), it fails in what I
>> would call a "quiescent" manner.  What I mean by this is that it
>> does not produce any error messages.  When I submit the job, it
>> produces a little standard output and it quits after 2-3 seconds.

> That's fun.  Can you tell if it runs the app at all, or if it dies
> before main() starts?  This is probably more of an issue for your
> intel support guy than us...

It's a Fortran program.  It starts in the main program.  I inserted some
PRINT*, statements of the "PRINT*,'Read the input at line 213' " variety
into the main program to see what would print.  It printed the first
four statements, but it didn't reach the last three.  The calls that
were reached were in the set-up section of the program.  The section
that wasn't reached had a lot of matrix-setting and solving subroutine
calls.

I'm going to point my Intel support person to this post and see where it
takes us.

>> In an attempt to find the problem, the technical support agent
>> at Intel has had me run some simple "Hello" problems.
>> The first one is an MPI hello code that is the attached
>> hello_mpi.f.  This ran as expected, and it echoed one "Hello"
>> for each of the two processors.
>> The second one is a non-MPI hello that is the attached
>> hello.f90.  Since it is a non-MPI source, I was told that running it
>> on a workstation with a properly configured MPI should only echo on

Re: [OMPI users] Segmentation fault

2007-11-07 Thread Francesco Pietra
On Wed, Nov 07, 2007 at 07:00:31 -0800, Francesco Pietra wrote:
>
I was lucky, given my modest skill with systems. In a couple of hours the
system was OK again.

DOCK, configured for MPICH and compiled with gcc, is now running in parallel,
pointing to OpenMPI 1.2.3 compiled with ifort/icc. top -i shows all processors
doing their job, and I waited to post until the procedure ended correctly.
Thanks
francesco



> , according to benchmatcs carried out by a number of guys) (intels are free
as
> gnu for my private use). And pointing MPICH for a program compiled gnu C
(like
> DOCK) to OpenMPI compiled intel was OK. ifort does not work if no icc is
> present, so you may understand why.

that is not true - ifort lives happily without icc - also in OpenMPI
context (and MPICH).



Karsten



__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


Re: [OMPI users] Segmentation fault

2007-11-07 Thread Karsten Bolding
On Wed, Nov 07, 2007 at 07:00:31 -0800, Francesco Pietra wrote:
> 



> , according to benchmatcs carried out by a number of guys) (intels are free as
> gnu for my private use). And pointing MPICH for a program compiled gnu C (like
> DOCK) to OpenMPI compiled intel was OK. ifort does not work if no icc is
> present, so you may understand why.

that is not true - ifort lives happily without icc - also in OpenMPI
context (and MPICH).



Karsten


-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--


Re: [OMPI users] Segmentation fault

2007-11-07 Thread Francesco Pietra

--- Adrian Knoth  wrote:

> On Wed, Nov 07, 2007 at 08:09:14AM -0500, Jeff Squyres wrote:
> 
> > I'm not familiar with DOCK or Debian, but you will definitely have  
> 
> And last but not least,

Surely not last. My OpenMPI was Intel-compiled, for a simple reason: Amber9, as
a Fortran program, runs faster with the Intel compilers than with gnu (or any
other known compiler, according to benchmarks carried out by a number of guys),
and the Intel compilers are free, like gnu, for my private use. Pointing MPICH
for a program compiled with gnu C (like DOCK) to the Intel-compiled OpenMPI was
OK. ifort does not work if no icc is present, so you may understand why.

In summary, when posing a question, either one gets a suggestion for how to
possibly get out of the problem, or side comments are junk.

Thanks 
francesco

f.

 I'd like to point to the official Debian package
> for OMPI:
> 
>http://packages.debian.org/openmpi
> 
> 
> -- 
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> 
> private: http://adi.thur.de
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


Re: [OMPI users] problems compiling svn-version

2007-11-07 Thread Karsten Bolding
works fine now.

In earth sciences - at least oceanography and meteorology - Fortran is
still the language of choice.

kb


On Wed, Nov 07, 2007 at 12:25:06 +0100, Adrian Knoth wrote:
> On Wed, Nov 07, 2007 at 10:41:55AM +, Karsten Bolding wrote:
> 
> > Hello
> 
> Hi!
> 
> > there is no support for Fortran - even though F77 and F90 are set as
> 
> Fortran? Who needs Fortran? ;)
> 
> Check line 151 in the Makefile. We've disabled Fortran for our developer
> builds, as we're interested in OMPI, not in Fortran.
> 
> You can simply remove the two "--disable-mpi-*" switches.
> 
> 
> HTH
> 
> -- 
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> 
> private: http://adi.thur.de
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--


Re: [OMPI users] Job does not quit even when the simulation dies

2007-11-07 Thread Ralph H Castain
As Jeff indicated, the degree of capability has improved over time - I'm not
sure which version this represents.

The type of failure also plays a major role in our ability to respond. If a
process actually segfaults or dies, we usually pick that up pretty well and
abort the rest of the job (certainly, that seems to be working pretty well
in the 1.2 series and beyond).

If an MPI communication fails, I'm not sure what the MPI layer does - I
believe it may retry for awhile, but I don't know how robust the error
handling is in that layer. Perhaps someone else could address that question.

If an actual node fails, then we don't handle that very well at all, even in
today's development version. The problem is that we need to rely on the
daemon on that node to tell us that the local procs died - if the node dies,
then the daemon can't do that, so we never know it happened.

We are working on solutions to that problem. Hopefully, we will have at
least a preliminary version in the next release.

Ralph
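
On the middle case (a failed MPI communication), the application does have a
little leverage of its own: to the extent that the library reports the error
back to the failing call at all, a communicator error handler can at least
turn a half-dead job into a clean abort. A sketch, not taken from the thread
and with illustrative names, of that pattern:

/* abort_on_error.c -- install a communicator error handler that calls
 * MPI_Abort, so surviving ranks tear the whole job down instead of
 * hanging when a communication error is reported to them. */
#include <mpi.h>
#include <stdio.h>

static void die_handler(MPI_Comm *comm, int *errcode, ...)
{
    char msg[MPI_MAX_ERROR_STRING];
    int len;

    MPI_Error_string(*errcode, msg, &len);
    fprintf(stderr, "MPI error detected (%s); aborting whole job\n", msg);
    MPI_Abort(*comm, *errcode);   /* asks mpirun to kill all remaining ranks */
}

int main(int argc, char **argv)
{
    MPI_Errhandler eh;

    MPI_Init(&argc, &argv);
    MPI_Comm_create_errhandler(die_handler, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);

    /* ... normal communication; a send/recv failure reported on
     * MPI_COMM_WORLD now triggers die_handler ... */

    MPI_Finalize();
    return 0;
}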



On 11/7/07 6:44 AM, "Jeff Squyres"  wrote:

> Support for failure scenarios is something that is getting better over
> time in Open MPI.
> 
> It looks like the version you are using either didn't properly catch
> that there was a failure and/or then cleanly exit all MPI processes.
> 
> 
> On Nov 6, 2007, at 9:01 PM, Teng Lin wrote:
> 
>> Hi,
>> 
>> 
>> Just realize I have a job run for a long time, while some of the nodes
>> already die. Is there any way to ask other nodes to quit ?
>> 
>> 
>> [kyla-0-1.local:09741] mca_btl_tcp_frag_send: writev failed with
>> errno=104
>> [kyla-0-1.local:09742] mca_btl_tcp_frag_send: writev failed with
>> errno=104
>> 
>> The FAQ does mention it is related  to :
>>  Connection reset by peer: These types of errors usually occur after
>> MPI_INIT has completed, and typically indicate that an MPI process has
>> died unexpectedly (e.g., due to a seg fault). The specific error
>> message indicates that a peer MPI process tried to write to the now-
>> dead MPI process and failed.
>> 
>> Thanks,
>> Teng
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 




Re: [OMPI users] Job does not quit even when the simulation dies

2007-11-07 Thread Jeff Squyres
Support for failure scenarios is something that is getting better over  
time in Open MPI.


It looks like the version you are using either didn't properly catch  
that there was a failure and/or then cleanly exit all MPI processes.



On Nov 6, 2007, at 9:01 PM, Teng Lin wrote:


Hi,


Just realize I have a job run for a long time, while some of the nodes
already die. Is there any way to ask other nodes to quit ?


[kyla-0-1.local:09741] mca_btl_tcp_frag_send: writev failed with
errno=104
[kyla-0-1.local:09742] mca_btl_tcp_frag_send: writev failed with
errno=104

The FAQ does mention it is related  to :
 Connection reset by peer: These types of errors usually occur after
MPI_INIT has completed, and typically indicate that an MPI process has
died unexpectedly (e.g., due to a seg fault). The specific error
message indicates that a peer MPI process tried to write to the now-
dead MPI process and failed.

Thanks,
Teng
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Double Standard Output for Non-MPI on Itanium Running Red Hat Enterprise Linux 4.0

2007-11-07 Thread Jeff Squyres

On Nov 5, 2007, at 4:12 PM, Benjamin, Ted G. wrote:

 I have a code that runs with both Portland and Intel compilers  
on X86, AMD64 and Intel EM64T running various flavors of Linux on  
clusters.  I am trying to port it to a 2-CPU Itanium2 (ia64) running  
Red Hat Enterprise Linux 4.0; it has gcc 3.4.6-8 and the Intel  
Fortran compiler 10.0.026 installed.  I have built Open MPI 1.2.4  
using these compilers.
 When I built the Open MPI, I didn’t do anything special.  I  
enabled debug, but that was really all.  Of course, you can see that  
in the config file that is attached.
 This system is not part of a cluster.  The two onboard CPUs (an  
HP zx6000) are the only processors on which the job runs.  The code  
must run on MPI because the source calls it.  I compiled the target  
software using the Fortran90 compiler (mpif90).
 I’ve been running the code in the foreground so that I could  
keep an eye on its behavior.
 When I try to run the compiled and linked code [mpirun –np #  
{executable file}], it performs as shown below:


(1) With the source compiled at optimization –O0 and –np 1, the job  
runs very slowly (6 days on the wall clock) to the correct answer on  
the benchmark;
(2) With the source compiled at optimization –O0 and –np 2, the  
benchmark job fails with a segmentation violation;


Have you tried running your code through a memory-checking debugger,  
and/or examining any corefiles that were generated to see if there is  
a problem in your code?


I will certainly not guarantee that Open MPI is bug free, but problems  
like this are *usually* application-level issues.  One place I always  
start is running the application in a debugger to see if you can catch  
exactly where the Badness happens.  This can be most helpful.


(3) With the source compiled at all other optimization (-O1, -O2, - 
O3) and processor combinations (-np1 and -np 2), it fails in what I  
would call a “quiescent” manner.  What I mean by this is that it  
does not produce any error messages.  When I submit the job, it  
produces a little standard output and it quits after 2-3 seconds.


That's fun.  Can you tell if it runs the app at all, or if it dies  
before main() starts?  This is probably more of an issue for your  
intel support guy than us...


 In an attempt to find the problem, the technical support agent  
at Intel has had me run some simple “Hello” problems.
 The first one is an MPI hello code that is the attached  
hello_mpi.f.  This ran as expected, and it echoed one “Hello” for  
each of the two processors.
 The second one is a non-MPI hello that is the attached  
hello.f90.  Since it is a non-MPI source, I was told that running it  
on a workstation with a properly configured MPI should only echo one  
“Hello”; the Intel agent told me that two such echoes indicate a  
problem with Open MPI.  It echoed twice, so now I have come to you  
for help.


I'm not sure what you mean by that.  If you:

mpirun -np 4 hostname

where "hostname" is non-MPI program (e.g., /bin/hostname), you'll  
still see the output 4 times because you told MPI to run 4 copies of  
"hostname".  In this way, Open MPI is just being used as a job launcher.


So if I'm understanding you right,

   mpirun -np 2 my_non_mpi_f90_hello_app

should still print 2 copies of "hello".  If it does, then Open MPI is  
doing exactly what it should do.


Specifically: Open MPI's mpirun can be used to launch non-MPI  
applications (the same is not necessarily true for other MPI  
implementations).
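
In other words, the only observable difference in a hello-world test is
whether the program itself calls MPI and can therefore tell its copies apart.
A tiny C illustration of the distinction (the thread's own test programs were
Fortran; file and program names here are made up):

/* rank_hello.c -- unlike a plain non-MPI hello, an MPI hello can tell
 * its copies apart, which is the only real difference mpirun makes for
 * a hello-world test. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* "mpirun -np 2 rank_hello" prints two distinct lines, one per rank;
     * "mpirun -np 2 some_non_mpi_hello" also prints two lines, because
     * mpirun simply launched two copies -- both behaviors are correct. */
    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}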


 The other three attached files are the output requested on the  
“Getting Help” page – (1) the output of /sbin/ifconfig, (2) the  
output of ompi_info --all and (3) the config.log file.
 The installation of the Open MPI itself was as easy as could  
be.  I am really ignorant of how it works beyond what I’ve read from  
the FAQs and learned in a little digging, so I hope it’s a simple  
solution.


FWIW, I see that you're using Open MPI v1.2.  Our latest version is  
v1.2.4; if possible, you might want to try and upgrade (e.g., delete  
your prior installation, recompile/reinstall Open MPI, and then  
recompile/relink your application against the new Open MPI  
installation); it has all of our latest bug fixes, etc.


--
Jeff Squyres
Cisco Systems




Re: [OMPI users] Segmentation fault

2007-11-07 Thread Francesco Pietra
Hi Jeff:
I understand that my question was posed in extremely vague terms. However,
pointing MPICH to the installation of OpenMPI was suggested by the author of
DOCK, and it performed perfectly for a long while, until yesterday. Could you
please instruct me how to verify beyond doubt whether the "apt-get update"
modified the version of OpenMPI that was originally installed (1.2.3)? For its
part, Debian Linux is a perfectly standard Linux.

francesco



--- Jeff Squyres  wrote:

> I'm not familiar with DOCK or Debian, but you will definitely have  
> problems if you mix-n-match MPI implementations.  Specifically, the  
> mpi.h files are not compatible between MPICH and Open MPI.
> 
> Additionally, you may run into problems if you compile your app with  
> one version of Open MPI and then run it with another.  We have not  
> [yet] done anything in terms of binary compatibility between versions.
> 
> 
> On Nov 7, 2007, at 8:05 AM, Francesco Pietra wrote:
> 
> > I wonder whether any suggestion can be offered about segmentation  
> > fault
> > occurring on running a docking program (DOCK 6.1, written in C) on  
> > Debian Linux
> > amd64 etch, i.e. dual opterons machine. Running DOCK6.1 parallel was  
> > OK until
> > yesterday. I vaguely remember that before these problems I carried  
> > out a
> >
> > apt-get upgrade
> >
> > and something was done for OpenMPI.
> >
> > DOCK 6.1 was compiled:
> >
> > ./configure gnu parallel
> > MPICH_HOME=/usr/local
> > export MPICH_HOME
> > make dock
> >
> > by pointing MPICH (for which DOCK 6.1 is configured, to my  
> > installation of
> > OpenMPI 1.2.3
> >
> > In my .bashrc:
> >
> > DOCK_HOME=/usr/local/dock6
> > PATH=$PATH:$DOCK_HOME/bib; export DOCK_HOME PATH
> >
> > MPI_HOME=/usr/local
> > export MPI_home
> >
> >
> > which mpicxx
> > /usr/local/bin/mpicxx
> >
> >
> >
> > updatedb
> > locate mpi.h
> > /usr/include/sc/util/group/memmtmpi.h
> > /usr/include/sc/util/group/messmpi.h
> > /usr/dock6/src/dock/base_mpi.h
> > /usr/local/include/mpi.h
> > /usr/local/openmpi-1.2.3/ompi/include/mpi.h
> > /usr/local/openmpi-1.2.3/ompi/include/mpi.h.in
> > /usr/local/openmpi-1.2.3/ompi/mpi/f77/prototypes_mpi.h
> > ---
> >
> > On these basis, running:
> >
> > mpirun -np 4 dock6.mpi -i dock.in -o dock.out
> >
> > the process halted with error message:
> >
> > Initialing MPI routines 
> > [deb64:03540] *** Process received signal ***
> > [deb64:03540] Signal: Segmentation fault (11)
> > [deb64:03540] Signal code: Address not mapped (1)
> > [deb64:03540] Failing at address: 0x2b9ef5691000
> > dock6.mpi[3540]: segfault at 2b9ef5691 rip 00447b1b  
> > rsp
> > 7fff43c137b0 error 6
> > [deb64:03540] [0] /lib/libthread.so.0 [0x2b9e681bc410]
> > [deb64:03540] [1] dock6.mpi (_ZN60rient12match_ligandER7DOCKMol+0x40b)
> > [0x447b1b]
> > [deb64:03540] [2] dock6.mpi (main+0xaf5) [0x42cc75]
> > [deb64:03540] [3] dock6.mpi /lib/libc.so.6(__libc_start_main+0xda)
> > [0x2b9e682e14ca]
> > [deb64:03540] [4] dock6.mpi (__gxx_personality_v0+0xc2) [0x41b4ea]
> > [deb64:03540] *** End of error message ***
> > mpirun noticed that jpb rank 0 with PID 3537 on node deb64 exited on  
> > signal 15
> > (Terminated).
> > 3 additional processes aborted (not shown)
> >
> >
> > Thanks
> > francesco pietra
> >
> > __
> > Do You Yahoo!?
> > Tired of spam?  Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


Re: [OMPI users] Segmentation fault

2007-11-07 Thread Adrian Knoth
On Wed, Nov 07, 2007 at 08:09:14AM -0500, Jeff Squyres wrote:

> I'm not familiar with DOCK or Debian, but you will definitely have  

And last but not least, I'd like to point to the official Debian package
for OMPI:

   http://packages.debian.org/openmpi


-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI users] Segmentation fault

2007-11-07 Thread Jeff Squyres
I'm not familiar with DOCK or Debian, but you will definitely have  
problems if you mix-n-match MPI implementations.  Specifically, the  
mpi.h files are not compatible between MPICH and Open MPI.


Additionally, you may run into problems if you compile your app with  
one version of Open MPI and then run it with another.  We have not  
[yet] done anything in terms of binary compatibility between versions.
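
A cheap way to catch a mix-n-match build before it fails at run time is to let
the mpi.h that the wrapper compiler actually picked up identify itself. A
sketch along these lines (the Open MPI macros are real; the MPICH macro names
are quoted from memory and should be double-checked against that mpi.h):

/* which_mpi.c -- compile-time sanity check: report which implementation's
 * mpi.h this program was built against. */
#include <mpi.h>
#include <stdio.h>

int main(void)
{
#if defined(OPEN_MPI)
    printf("Compiled against Open MPI %d.%d.%d headers\n",
           OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION);
#elif defined(MPICH_NAME) || defined(MPICH_VERSION)
    printf("Compiled against MPICH headers\n");
#else
    printf("Unknown MPI implementation headers\n");
#endif
    return 0;
}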



On Nov 7, 2007, at 8:05 AM, Francesco Pietra wrote:

I wonder whether any suggestion can be offered about segmentation  
fault
occurring on running a docking program (DOCK 6.1, written in C) on  
Debian Linux
amd64 etch, i.e. dual opterons machine. Running DOCK6.1 parallel was  
OK until
yesterday. I vaguely remember that before these problems I carried  
out a


apt-get upgrade

and something was done for OpenMPI.

DOCK 6.1 was compiled:

./configure gnu parallel
MPICH_HOME=/usr/local
export MPICH_HOME
make dock

by pointing MPICH (for which DOCK 6.1 is configured, to my  
installation of

OpenMPI 1.2.3

In my .bashrc:

DOCK_HOME=/usr/local/dock6
PATH=$PATH:$DOCK_HOME/bib; export DOCK_HOME PATH

MPI_HOME=/usr/local
export MPI_home


which mpicxx
/usr/local/bin/mpicxx



updatedb
locate mpi.h
/usr/include/sc/util/group/memmtmpi.h
/usr/include/sc/util/group/messmpi.h
/usr/dock6/src/dock/base_mpi.h
/usr/local/include/mpi.h
/usr/local/openmpi-1.2.3/ompi/include/mpi.h
/usr/local/openmpi-1.2.3/ompi/include/mpi.h.in
/usr/local/openmpi-1.2.3/ompi/mpi/f77/prototypes_mpi.h
---

On these basis, running:

mpirun -np 4 dock6.mpi -i dock.in -o dock.out

the process halted with error message:

Initialing MPI routines 
[deb64:03540] *** Process received signal ***
[deb64:03540] Signal: Segmentation fault (11)
[deb64:03540] Signal code: Address not mapped (1)
[deb64:03540] Failing at address: 0x2b9ef5691000
dock6.mpi[3540]: segfault at 2b9ef5691 rip 00447b1b rsp
7fff43c137b0 error 6
[deb64:03540] [0] /lib/libthread.so.0 [0x2b9e681bc410]
[deb64:03540] [1] dock6.mpi (_ZN60rient12match_ligandER7DOCKMol+0x40b)
[0x447b1b]
[deb64:03540] [2] dock6.mpi (main+0xaf5) [0x42cc75]
[deb64:03540] [3] dock6.mpi /lib/libc.so.6(__libc_start_main+0xda)
[0x2b9e682e14ca]
[deb64:03540] [4] dock6.mpi (__gxx_personality_v0+0xc2) [0x41b4ea]
[deb64:03540] *** End of error message ***
mpirun noticed that jpb rank 0 with PID 3537 on node deb64 exited on  
signal 15

(Terminated).
3 additional processes aborted (not shown)


Thanks
francesco pietra

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems



[OMPI users] Segmentation fault

2007-11-07 Thread Francesco Pietra
I wonder whether any suggestion can be offered about a segmentation fault
occurring when running a docking program (DOCK 6.1, written in C) on Debian Linux
amd64 etch, i.e. a dual-Opteron machine. Running DOCK 6.1 in parallel was OK until
yesterday. I vaguely remember that before these problems I carried out a

apt-get upgrade

and something was done for OpenMPI. 

DOCK 6.1 was compiled:

./configure gnu parallel
MPICH_HOME=/usr/local
export MPICH_HOME
make dock

by pointing MPICH (for which DOCK 6.1 is configured) to my installation of
OpenMPI 1.2.3

In my .bashrc:

DOCK_HOME=/usr/local/dock6
PATH=$PATH:$DOCK_HOME/bib; export DOCK_HOME PATH

MPI_HOME=/usr/local
export MPI_home


which mpicxx
/usr/local/bin/mpicxx



updatedb
locate mpi.h
/usr/include/sc/util/group/memmtmpi.h
/usr/include/sc/util/group/messmpi.h
/usr/dock6/src/dock/base_mpi.h
/usr/local/include/mpi.h
/usr/local/openmpi-1.2.3/ompi/include/mpi.h
/usr/local/openmpi-1.2.3/ompi/include/mpi.h.in
/usr/local/openmpi-1.2.3/ompi/mpi/f77/prototypes_mpi.h
---

On this basis, running:

mpirun -np 4 dock6.mpi -i dock.in -o dock.out

the process halted with error message:

Initialing MPI routines 
[deb64:03540] *** Process received signal ***
[deb64:03540] Signal: Segmentation fault (11)
[deb64:03540] Signal code: Address not mapped (1)
[deb64:03540] Failing at address: 0x2b9ef5691000
dock6.mpi[3540]: segfault at 2b9ef5691 rip 00447b1b rsp
7fff43c137b0 error 6
[deb64:03540] [0] /lib/libthread.so.0 [0x2b9e681bc410]
[deb64:03540] [1] dock6.mpi (_ZN60rient12match_ligandER7DOCKMol+0x40b)
[0x447b1b]
[deb64:03540] [2] dock6.mpi (main+0xaf5) [0x42cc75]
[deb64:03540] [3] dock6.mpi /lib/libc.so.6(__libc_start_main+0xda)
[0x2b9e682e14ca]
[deb64:03540] [4] dock6.mpi (__gxx_personality_v0+0xc2) [0x41b4ea]
[deb64:03540] *** End of error message ***
mpirun noticed that job rank 0 with PID 3537 on node deb64 exited on signal 15
(Terminated).
3 additional processes aborted (not shown)


Thanks
francesco pietra

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


Re: [OMPI users] problems compiling svn-version

2007-11-07 Thread Adrian Knoth
On Wed, Nov 07, 2007 at 10:41:55AM +, Karsten Bolding wrote:

> Hello

Hi!

> there is no support for Fortran - even though F77 and F90 are set as

Fortran? Who needs Fortran? ;)

Check line 151 in the Makefile. We've disabled Fortran for our developer
builds, as we're interested in OMPI, not in Fortran.

You can simply remove the two "--disable-mpi-*" switches.


HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de


Re: [OMPI users] problems compiling svn-version

2007-11-07 Thread Karsten Bolding
Hello

On Wed, Nov 07, 2007 at 11:03:56AM +0100, Adrian Knoth wrote:
> On Wed, Nov 07, 2007 at 09:45:24AM +, Karsten Bolding wrote:


> 
> Place the attached Makefile as i.e. /tmp/my-ompi/Makefile, get the svn
> snapshot into /tmp/my-ompi/ompi and just run "make" in /tmp/my-ompi/.
> 
> Over here, it looks like this:
> 
> adi@ipc654:/var/tmp/meta-ompi/trunk$ ls
> Makefile  Rakefile  cc.build.job  cunit  ompi  test  tool  unittests
> 
> You don't need to care about the other files, just to outline where to
> place the OMPI source.
> 
> 
> You might want to change CONFIGURE_FLAGS in the Makefile, you'd probably
> comment out the debug line and go for the second variant.

half working - I can compile and I get an installed version. However,
there is no support for Fortran - even though F77 and F90 are set as
environment variables (both set to ifort).

> 
> HTH
> 
> -- 
> Cluster and Metacomputing Working Group
> Friedrich-Schiller-Universität Jena, Germany
> 

kb

-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--


Re: [OMPI users] problems compiling svn-version

2007-11-07 Thread Adrian Knoth
On Wed, Nov 07, 2007 at 09:45:24AM +, Karsten Bolding wrote:

> Hello

Hi!

> Are there any known issues with ubuntus version of libtool. When I run

Libtool is always an issue ;) To circumvent this, we have a Makefile that
fetches the right versions, compiles the whole autotools chain,
prepends the new PATH and then compiles OMPI.

Place the attached Makefile as i.e. /tmp/my-ompi/Makefile, get the svn
snapshot into /tmp/my-ompi/ompi and just run "make" in /tmp/my-ompi/.

Over here, it looks like this:

adi@ipc654:/var/tmp/meta-ompi/trunk$ ls
Makefile  Rakefile  cc.build.job  cunit  ompi  test  tool  unittests

You don't need to care about the other files, just to outline where to
place the OMPI source.


You might want to change CONFIGURE_FLAGS in the Makefile, you'd probably
comment out the debug line and go for the second variant.



HTH

-- 
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany

private: http://adi.thur.de
# Meta-Makefile to build OpenMPI.
#
# (c) Christian Kauhaus 
# -- $Id: Makefile 3640 2007-10-09 14:23:11Z ckauhaus $

SHELL = bash

# 
# Configuration section
#
ARCH := $(shell uname -m -s | tr ' ' '-')
# but compile as 32bit even on amd64
ifeq ($(ARCH),Linux-x86_64)
OMPI_CFLAGS = -m32
OMPI_LDFLAGS = -Wl,-melf_i386
endif
# BuildBot does not set PWD on directory changes
PWD := $(shell pwd | tr -d '\n')
OMPI = $(PWD)/ompi
# *_INSTALL_PREFIX are initially set to the same default, but may be overridden
# individually.
OMPI_INSTALL_PREFIX = $(PWD)/$(ARCH)
TOOLS_INSTALL_PREFIX = $(PWD)/$(ARCH)
BUILD_DIR = $(PWD)/build/$(ARCH)
OMPI_BUILD_DIR = $(BUILD_DIR)/ompi
TOOLS_BUILD_DIR = $(BUILD_DIR)/autotools
TOOLS = $(TOOLS_INSTALL_PREFIX)/bin/autoconf \
  $(TOOLS_INSTALL_PREFIX)/bin/automake \
  $(TOOLS_INSTALL_PREFIX)/bin/libtool
# CONFIGURE_FLAGS are appended to OMPI's ./configure (besides ignore-Fortran)
CONFIGURE_FLAGS = --enable-debug --enable-trace --enable-static --disable-dlopen
#CONFIGURE_FLAGS = --with-platform=optimized --enable-static --disable-dlopen
CONFIGURE_FLAGS := $(CONFIGURE_FLAGS) CFLAGS=$(OMPI_CFLAGS) CXXFLAGS=$(OMPI_CFLAGS) LDFLAGS=$(OMPI_LDFLAGS)

# use our own auto* tools
PATH := $(DESTDIR)$(OMPI_INSTALL_PREFIX)/bin:$(TOOLS_INSTALL_PREFIX)/bin:$(PATH)

ifeq ($(findstring curl, $(shell which curl)),curl)
WGET = curl
else
WGET = wget -q --output-document=- 
endif


#
# Get required versions from distribution script
#
GETVERSION = $(shell grep "^$(1)_" $(OMPI)/contrib/dist/make_dist_tarball | sed -e 's/.*=//g')
LT_VERSION = $(call GETVERSION,LT)
AM_VERSION = $(call GETVERSION,AM)
AC_VERSION = $(call GETVERSION,AC)


#
# main targets
#
all: openmpi

.PHONY: test
test: openmpi
ulimit -u unlimited; umask 022; rake test


#
# Toolchain
#
.PHONY: tools
tools: $(TOOLS_BUILD_DIR) $(TOOLS)

$(TOOLS_BUILD_DIR):
mkdir -p $@

# build GNU libtool
$(TOOLS_INSTALL_PREFIX)/bin/libtool: $(TOOLS_BUILD_DIR)/libtool/Makefile \
  $(TOOLS_INSTALL_PREFIX)/bin/automake
cd $(dir $<) && umask 022 && $(MAKE) && $(MAKE) install

$(TOOLS_BUILD_DIR)/libtool/Makefile: $(TOOLS_BUILD_DIR)/libtool-$(LT_VERSION)/*
mkdir -p $(dir $@)
cd $(dir $@) && $(dir $<)configure --prefix=$(TOOLS_INSTALL_PREFIX) 

# build GNU automake
$(TOOLS_INSTALL_PREFIX)/bin/automake: $(TOOLS_BUILD_DIR)/automake/Makefile \
  $(TOOLS_INSTALL_PREFIX)/bin/autoconf
cd $(dir $<) && umask 022 && $(MAKE) && $(MAKE) install

$(TOOLS_BUILD_DIR)/automake/Makefile: $(TOOLS_BUILD_DIR)/automake-$(AM_VERSION)/*
mkdir -p $(dir $@)
cd $(dir $@) && $(dir $<)configure --prefix=$(TOOLS_INSTALL_PREFIX) 

# build GNU autoconf
$(TOOLS_INSTALL_PREFIX)/bin/autoconf: $(TOOLS_BUILD_DIR)/autoconf/Makefile
cd $(dir $<) && umask 022 && $(MAKE) && $(MAKE) install

$(TOOLS_BUILD_DIR)/autoconf/Makefile: $(TOOLS_BUILD_DIR)/autoconf-$(AC_VERSION)/*
mkdir -p $(dir $@)
cd $(dir $@) && $(dir $<)configure --prefix=$(TOOLS_INSTALL_PREFIX) 

# the download magic
$(TOOLS_BUILD_DIR)/autoconf-$(AC_VERSION)/*:
$(WGET) \
   ftp://ftp.gnu.org/pub/gnu/autoconf/autoconf-$(AC_VERSION).tar.gz |\
   (cd $(TOOLS_BUILD_DIR) && gzip -dc | tar xf -)

$(TOOLS_BUILD_DIR)/automake-$(AM_VERSION)/*:
$(WGET) \
   ftp://ftp.gnu.org/pub/gnu/automake/automake-$(AM_VERSION).tar.gz |\
   (cd $(TOOLS_BUILD_DIR) && gzip -dc | tar xf -)

LT_URL=$(if $(findstring 2.1a, $(LT_VERSION)), \
 http://www.open-mpi.org/svn/libtool.tar.gz, \
 ftp://ftp.gnu.org/pub/gnu/libtool/libtool-$(LT_VERSION).tar.gz)

$(TOOLS_BUILD_DIR)/libtool-$(LT_VERSION)/*:
$(WGET) \
   $(LT_URL) |\
   (cd $(TOOLS_BUILD_DIR) && gzip -dc | tar xf -)


# 
# build OpenMPI
#
.PHONY: openmpi compile install reinstall delete_install
openmpi: install
compile: $(OMPI_BUILD_DIR)
install: $(DESTDIR)$(OMPI_INSTALL_PREFIX)
reinstall: compile delete_install install

delete_install:
rm -rf $(DESTDIR)$(OMPI_INST

[OMPI users] problems compiling svn-version

2007-11-07 Thread Karsten Bolding

Hello


As it seems I need a feature only present in the svn-version of OpenMPI
I'm in the process of installing and compiling this version.

I've tried on two different machines.

1) debian everything worked OK.
autoconf 2.61-4
automake 1:1.10+nogfdl-1
libtool  1.5.24-1
ifort Version 10.0

2) ubuntu (single processor/quad-core)
autoconf 2.61-4
automake 1:1.10+nogfdl-1
libtool  1.5.24-1ubuntu1
ifort Version 10.0

make[2]: Entering directory
`/data/kb/compile/openmpi-svn/orte/tools/orteboot'
/bin/sh ../../../libtool --tag=CC   --mode=link gcc  -g -Wall -Wundef
-Wno-long-long -Wsign-compare -Wmissing-prototypes -Wstrict-prototypes
-Wcomment -pedantic -Werror-implicit-function-declaration
-finline-functions -fno-strict-aliasing -pthread  -export-dynamic   -o
orteboot orteboot.o ../../../orte/libopen-rte.la  -lnsl -lutil  -lm 
gcc -g -Wall -Wundef -Wno-long-long -Wsign-compare -Wmissing-prototypes
-Wstrict-prototypes -Wcomment -pedantic
-Werror-implicit-function-declaration -finline-functions
-fno-strict-aliasing -pthread -o .libs/orteboot orteboot.o
-Wl,--export-dynamic  ../../../orte/.libs/libopen-rte.so -lnsl -lutil
-lm  -Wl,--rpath -Wl,/opt/openmpi-svn/lib
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_sys_limits'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_cr_finalize'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_cr_set_enabled'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_path_access'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_crs_base_extract_expected_component'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_crs_base_state_str'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_mutex_check_locks'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_progress_set_yield_when_idle'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_cr_init'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_progress_set_event_flag'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_crs_base_snapshot_t_class'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_cr_reg_coord_callback'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_cr_output'
../../../orte/.libs/libopen-rte.so: undefined reference to
`opal_get_num_processors'
collect2: ld returned 1 exit status
make[2]: *** [orteboot] Error 1


If I do:
strings orte/.libs/libopen-rte.so.0.0.0 | grep opal_get_num_processors
I get:
opal_get_num_processors

Are there any known issues with Ubuntu's version of libtool? When I run
./autogen.sh 
I get this:

[Running] autoheader
** Adjusting libtool for OMPI :-(
   ++ patching for pathscale multi-line output (LT 1.5.x)
[Running] autoconf
[Running] libtoolize --automake --copy --ltdl
   -- Moving libltdl to opal/
** Adjusting libltdl for OMPI :-(
   ++ patching for argz bugfix in libtool 1.5
  -- your libtool doesn't need this! yay!
   ++ patching 64-bit OS X bug in ltmain.sh
  -- your libtool doesn't need this! yay!
   ++ RTLD_GLOBAL in libltdl
  -- your libltdl doesn't need this! yay!


I don't get that on machine 1.

I tried to copy orte/.libs/libopen-rte.so from 1 to 2 without luck.

kb

-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--


Re: [OMPI users] machinefile and rank

2007-11-07 Thread Sharon Melamed
Yes, this feature is currently in the SVN.
You can use the syntax in:
https://svn.open-mpi.org/trac/ompi/ticket/1023
Currently the process affinity doesn't work, but the ranks are running
on the machines as specified in the hostfile.

Ralph is currently working on removing the new syntax from the hostfile,
and together we will implement it in a new config file.

Sharon.

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Karsten Bolding
Sent: Wednesday, November 07, 2007 9:40 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] machinefile and rank

On Tue, Nov 06, 2007 at 09:22:50 -0500, Jeff Squyres wrote:
> Unfortunately, not yet.  I believe that this kind of functionality is

> slated for the v1.3 series -- is that right Ralph/Voltaire?

that's a pity, since the performance of the setup is horrible if I can't
control the order.

the svn code will develop into v1.3 - right? Is the feature already in
svn?

kb

-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] mpicc Segmentation Fault with Intel Compiler

2007-11-07 Thread Michael Schulz

On 06.11.2007, at 10:42, Åke Sandgren wrote:

Hi,


On Tue, 2007-11-06 at 10:28 +0100, Michael Schulz wrote:

Hi,

I've the same problem described by some other users, that I can't
compile anything if I'm using the open-mpi compiled with the Intel-
Compiler.


ompi_info --all

Segmentation fault

OpenSUSE 10.3
Kernel: 2.6.22.9-0.4-default
Intel P4

Configure-Flags: CC=icc, CXX=icpc, F77=ifort, F90=ifort

Intel-Compiler: both, C and Fortran 10.0.025

Is there any known solution?


I had the same problem with pathscale.
Try this, i think it is the solution i found.

diff -ru site/opal/runtime/opal_init.c
amd64_ubuntu606-psc/opal/runtime/opal_init.c
--- site/opal/runtime/opal_init.c   2007-10-20 03:00:35.0
+0200
+++ amd64_ubuntu606-psc/opal/runtime/opal_init.c2007-10-23
16:12:15.0 +0200
@@ -169,7 +169,7 @@
 }

 /* register params for opal */
-if (OPAL_SUCCESS !=  opal_register_params()) {
+if (OPAL_SUCCESS !=  (ret = opal_register_params())) {
 error = "opal_register_params";
 goto return_error;
 }


Thanks, but this doesn't solve my segv problem.

Michael




Re: [OMPI users] machinefile and rank

2007-11-07 Thread Karsten Bolding
On Tue, Nov 06, 2007 at 09:22:50 -0500, Jeff Squyres wrote:
> Unfortunately, not yet.  I believe that this kind of functionality is  
> slated for the v1.3 series -- is that right Ralph/Voltaire?

that's a pity, since the performance of the setup is horrible if I can't
control the order.

the svn code will develop into v1.3 - right? Is the feature already in
svn?

kb

-- 
--
Karsten Bolding        Bolding & Burchard Hydrodynamics
Strandgyden 25         Phone: +45 64422058
DK-5466 Asperup        Fax:   +45 64422068
Denmark                Email: kars...@bolding-burchard.com

http://www.findvej.dk/Strandgyden25,5466,11,3
--


Re: [OMPI users] mpicc Segmentation Fault with Intel Compiler

2007-11-07 Thread Åke Sandgren
On Tue, 2007-11-06 at 20:49 -0500, Jeff Squyres wrote:
> On Nov 6, 2007, at 4:42 AM, Åke Sandgren wrote:
> 
> > I had the same problem with pathscale.
> 
> There is a known outstanding problem with the pathscale problem.  I am  
> still waiting for a solution from their engineers (we don't know yet  
> whether it's an OMPI issue or a Pathscale issue, but my [biased] money  
> is on a Pathscale issue :-) -- it doesn't happen with any other  
> compiler).
> 
> > Try this, i think it is the solution i found.
> >
> > diff -ru site/opal/runtime/opal_init.c
> > amd64_ubuntu606-psc/opal/runtime/opal_init.c
> > --- site/opal/runtime/opal_init.c   2007-10-20 03:00:35.0
> > +0200
> > +++ amd64_ubuntu606-psc/opal/runtime/opal_init.c2007-10-23
> > 16:12:15.0 +0200
> > @@ -169,7 +169,7 @@
> > }
> >
> > /* register params for opal */
> > -if (OPAL_SUCCESS !=  opal_register_params()) {
> > +if (OPAL_SUCCESS !=  (ret = opal_register_params())) {
> > error = "opal_register_params";
> > goto return_error;
> > }
> 
> I don't see why this change would make any difference in terms of a  
> segv...?
> 
> I see that ret is an uninitialized variable in the error case (which  
> I'll fix -- thanks for pointing it out :-) ) -- but I don't see how  
> that would fix a segv.  Am I missing something?

The problem is that I don't really remember what fixed my problem (or if
it got interrupted before I managed to fix it in the first place).
I have been busy building other software for a couple of weeks.
The above was simply the only patch I had made whose exact effect I
didn't know.

But judging from trying to run that version of ompi_info i still have
problems.

I've been working with this for a while and can hopefully continue
pursuing it next week or so.



Re: [OMPI users] machinefile and rank

2007-11-07 Thread Gleb Natapov
On Tue, Nov 06, 2007 at 09:22:50PM -0500, Jeff Squyres wrote:
> Unfortunately, not yet.  I believe that this kind of functionality is  
> slated for the v1.3 series -- is that right Ralph/Voltaire?
> 
Yes, the file format will be different, but arbitrary mapping will be
possible.

> 
> On Nov 5, 2007, at 11:22 AM, Karsten Bolding wrote:
> 
> > Hello
> >
> > I'm using a machinefile like:
> > n03
> > n04
> > n03
> > n03
> > n03
> > n02
> > n01
> > ..
> > ..
> > ..
> >
> > the order of the entries is determined by an external program for load
> > balancing reasons. When the job is started the ranks do not correspond
> > to entries in the machinefile. Is there a way to force that entry  
> > one in
> > the machinefile gets rank=0, sencond entry gets rank=1 etc.
> >
> >
> > Karsten
> >
> >
> > -- 
> > --
> > Karsten Bolding        Bolding & Burchard Hydrodynamics
> > Strandgyden 25         Phone: +45 64422058
> > DK-5466 Asperup        Fax:   +45 64422068
> > Denmark                Email: kars...@bolding-burchard.com
> >
> > http://www.findvej.dk/Strandgyden25,5466,11,3
> > --
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Gleb.