Re: [OMPI users] Open MPI 2009 released

2009-04-01 Thread Damien Hocking
Outstanding. I'll have two. Damien George Bosilca wrote: The Open MPI Team, representing a consortium of bailed-out banks, car manufacturers, and insurance companies, is pleased to announce the release of the "unbreakable" / bug-free version Open MPI 2009 (expected to be available by

[OMPI users] Open MPI 2009 released

2009-04-01 Thread George Bosilca
The Open MPI Team, representing a consortium of bailed-out banks, car manufacturers, and insurance companies, is pleased to announce the release of the "unbreakable" / bug-free version Open MPI 2009 (expected to be available by mid-2011). This release is essentially a complete rewrite of Open

Re: [OMPI users] Cannot build OpenMPI 1.3 with PGI pgf90 and Gnu gcc/g++.

2009-04-01 Thread Jeff Squyres
On Mar 31, 2009, at 4:21 PM, Gus Correa wrote: Please, correct my argument below if I am wrong. I am not sure yet if the problem is caused by libtool, because somehow it was not present in OpenMPI 1.2.8. Just as a comparison, the libtool commands on 1.2.8 and 1.3 are very similar, although

Re: [OMPI users] job runs with mpirun on a node but not if submitted via Torque.

2009-04-01 Thread Rahul Nabar
On Wed, Apr 1, 2009 at 1:13 AM, Ralph Castain wrote: > So I gather that by "direct" you mean that you don't get an allocation from > Maui before running the job, but for the other you do? Otherwise, OMPI > should detect that it is running under Torque and automatically use the

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread Josh Hursey
On Apr 1, 2009, at 12:42 PM, Dave Love wrote: Josh Hursey writes: The configure flag that you are looking for is: --with-ft=cr Is there a good reason why --with-blcr doesn't imply it? Not really. Though it is most likely difficult to make it happen given the

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Dave Love
Rolf Vandevaart writes: > No, orte_leave_session_attached is needed to avoid the errno=2 errors > from the sm btl. (It is fixed in 1.3.2 and trunk) [It does cause other trouble, but I forget what the exact behaviour was when I lost it as a default.] >> Yes, but there's

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread Dave Love
Josh Hursey writes: > The configure flag that you are looking for is: > --with-ft=cr Is there a good reason why --with-blcr doesn't imply it? > You may also want to consider using the thread options too for > improved C/R response: > --enable-mpi-threads
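Based on the flags exchanged in this thread, a checkpoint/restart-enabled build might be configured as sketched below. The --prefix and BLCR install paths are placeholders for illustration, not taken from the thread.

```shell
# Hedged sketch: configuring Open MPI 1.3.x with BLCR checkpoint/restart,
# combining the flags discussed above. /opt/blcr and the --prefix path
# are placeholder locations; adjust for your site.
./configure --prefix=/opt/openmpi-1.3.1 \
    --with-blcr=/opt/blcr \
    --with-ft=cr \
    --enable-mpi-threads
make all install
```

As the exchange notes, --with-blcr alone does not enable the fault-tolerance machinery; --with-ft=cr must be given explicitly.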

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread PN
Thanks.

$ cat hpl-8cpu-test.sge
#!/bin/bash
#
#$ -N HPL_8cpu_GB
#$ -pe orte 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -V
#
/opt/openmpi-gcc/bin/mpirun --display-allocation --display-map -v -np $NSLOTS --host node0001,node0002 hostname

$ cat HPL_8cpu_GB.o46
== ALLOCATED NODES

[OMPI users] mpirun: symbol lookup error: /usr/local/lib/openmpi/mca_plm_lsf.so: undefined symbol: lsb_init

2009-04-01 Thread Alessandro Surace
Hi guys, I try to repost my question... I've a problem with the last stable build and the last nightly snapshot. When I run a job directly with mpirun, no problem. If I try to submit it with LSF: bsub -a openmpi -m grid01 mpirun.lsf /mnt/ewd/mpi/fibonacci/fibonacci_mpi I get the following error:

Re: [OMPI users] mpirun interaction with pbsdsh

2009-04-01 Thread Ralph Castain
Ick is the proper response. :-) The old 1.2 series would attempt to spawn a local orted on each of those nodes, and that is what is failing. Best guess is that it is because pbsdsh doesn't fully replicate a key part of the environment that is expected. One thing you could try is do this

[OMPI users] mpirun interaction with pbsdsh

2009-04-01 Thread Brock Palen
Ok, this is weird, and the correct answer is probably "don't do that". Anyway: User wants to run many many small jobs, faster than our scheduler + Torque can start them; he uses pbsdsh to start them in parallel, under TM. pbsdsh bash -c 'cd $PBS_O_WORKDIR/$PBS_VNODENUM; mpirun -np 1 application'

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain
Rolf has correctly reminded me that display-allocation occurs prior to host filtering, so you will see all of the allocated nodes. You'll see the impact of the host specifications in display-map. Sorry for the confusion - thanks to Rolf for pointing it out. Ralph On Apr 1, 2009, at 7:40 AM,

Re: [OMPI users] Strange Net problem

2009-04-01 Thread Gabriele Fatigati
Hi Ralph, unfortunately, on this machine I can't upgrade OpenMPI at the moment. Is there a way to limit or reduce the probability of this error? 2009/4/1 Ralph Castain : > Hi Gabriele > > I don't think this is a timeout issue. OMPI 1.2.x doesn't scale very well to > that size

Re: [OMPI users] Strange Net problem

2009-04-01 Thread Ralph Castain
Hi Gabriele I don't think this is a timeout issue. OMPI 1.2.x doesn't scale very well to that size due to a requirement that the underlying out-of-band system fully connect at the TCP level. Thus, every process in your job will be opening 2002 sockets (one to every other process, one to
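The scaling argument above can be made concrete with a small illustration (not Open MPI code): in a fully connected out-of-band mesh, each of n processes opens a socket to every other process, so per-process and job-wide socket counts grow as below. The message's figure of 2002 sockets per process presumably adds a few daemon connections on top of the 1999 peer sockets; only the peer-to-peer part is counted here.

```python
# Why fully-connected TCP out-of-band wiring does not scale:
# with n processes, each opens a socket to the other n - 1,
# and the job as a whole holds n * (n - 1) / 2 distinct connections.

def sockets_per_process(n):
    """Sockets one process opens to its peers in a full mesh."""
    return n - 1

def total_connections(n):
    """Distinct TCP connections across the whole job."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    n = 2000  # roughly the job size discussed in this thread
    print(sockets_per_process(n))  # 1999 peer sockets per process
    print(total_connections(n))    # 1999000 connections job-wide
```

At 2000 ranks that is nearly two million connections, which is why later Open MPI releases moved away from requiring full TCP connectivity in the out-of-band layer.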

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Ralph Castain
As an FYI: you can debug allocation issues more easily by: mpirun --display-allocation --do-not-launch -n 1 foo This will read the allocation, do whatever host filtering you specify with -host and -hostfile options, report out the result, and then terminate without trying to launch

Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted daemon killed?

2009-04-01 Thread Ralph Castain
There is indeed a heartbeat mechanism you can use - it is "off" by default. You can set it to check every N seconds with: -mca orte_heartbeat_rate N on your command line. Or if you want it to always run, add "orte_heartbeat_rate = N" to your default MCA param file. OMPI will declare the
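The two configuration routes described above can be sketched as follows; N = 10 seconds is an arbitrary example value, and the per-user MCA parameter file path is the conventional location, not stated in the message.

```shell
# One-off: enable the heartbeat for a single run.
mpirun -mca orte_heartbeat_rate 10 -np 4 ./my_app

# Persistent: set it in the default MCA parameter file
# (~/.openmpi/mca-params.conf is the usual per-user location).
echo "orte_heartbeat_rate = 10" >> ~/.openmpi/mca-params.conf
```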

Re: [OMPI users] Strange behaviour of SGE+OpenMPI

2009-04-01 Thread Rolf Vandevaart
It turns out that the use of --host and --hostfile acts as a filter of which nodes to run on when you are running under SGE. So, listing them several times does not affect where the processes land. However, this still does not explain why you are seeing what you are seeing. One thing you can

Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted daemon killed?

2009-04-01 Thread Guanyinzhu
I mean I killed the orted daemon process while the MPI job was running, but the job hung and couldn't notice that one of its ranks had failed. > Date: Wed, 1 Apr 2009 19:09:34 +0800 > From: ml.jgmben...@mailsnare.net > To: us...@open-mpi.org > Subject: Re: [OMPI users] Beginner's question: how to

[OMPI users] Can't find libsvml in the execution

2009-04-01 Thread Marce
Hi all, I have compiled OpenMPI 1.2.7 with Intel Compilers (icc and ifort) in a cluster with Centos 4.7. It was ok, but when I try to launch an execution, mpirun can't find some libraries. When I check the linked libraries in the nodes, the output was: [marce@nodo1 ~]$ ldd
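A common way out of this class of problem (a hedged sketch: libsvml is part of the Intel compiler runtime, and the directory paths below are typical Intel 10.x locations assumed for illustration, not taken from the message) is to make the Intel runtime libraries visible on the compute nodes:

```shell
# Option 1: forward the library search path with the job
# (-x exports an environment variable to all ranks).
export LD_LIBRARY_PATH=/opt/intel/cce/10.1.018/lib:$LD_LIBRARY_PATH
mpirun -np 4 -x LD_LIBRARY_PATH ./my_app

# Option 2: register the directory system-wide on every node (as root),
# then refresh the dynamic linker cache.
echo "/opt/intel/cce/10.1.018/lib" > /etc/ld.so.conf.d/intel.conf
ldconfig
```

Option 2 survives across jobs and shells; option 1 is less invasive when you cannot modify the nodes.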

Re: [OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted daemon killed?

2009-04-01 Thread Jerome BENOIT
Is there a firewall somewhere? Jerome Guanyinzhu wrote: Hi! I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on Redhat Linux x86_64. I run a test like this: I just killed the orted process and the job hung for a long time (it hung for 2~3 hours, then I killed the job). I

[OMPI users] Strange Net problem

2009-04-01 Thread Gabriele Fatigati
Dear OpenMPI developers, I have a strange problem while running my application (2000 processors). I'm using openmpi 1.2.22 over Infiniband. The following is the mca-params.conf:

btl = ^tcp
btl_tcp_if_exclude = eth0,ib0,ib1
oob_tcp_include = eth1,lo,eth0
btl_openib_warn_default_gid_prefix = 0

[OMPI users] Beginner's question: how to avoid a running mpi job hang if host or network failed or orted daemon killed?

2009-04-01 Thread Guanyinzhu
Hi! I'm using OpenMPI 1.3 on ten nodes connected with Gigabit Ethernet on Redhat Linux x86_64. I run a test like this: I just killed the orted process and the job hung for a long time (it hung for 2~3 hours, then I killed the job). I have the following questions: when the network failed or

Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem

2009-04-01 Thread M C
Hi Josh, Yep, adding that "--with-ft=cr" flag did the trick. Thanks. Cheers, m > From: jjhur...@open-mpi.org > To: us...@open-mpi.org > Date: Tue, 31 Mar 2009 15:48:05 -0400 > Subject: Re: [OMPI users] OpenMPI 1.3.1 + BLCR build problem > > I think that the missing configure option might be

Re: [OMPI users] job runs with mpirun on a node but not if submitted via Torque.

2009-04-01 Thread Ralph Castain
The difference you are seeing here indicates that the "direct" run is using the rsh launcher, while the other run is using the Torque launcher. So I gather that by "direct" you mean that you don't get an allocation from Maui before running the job, but for the other you do? Otherwise,
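One way to confirm which launcher Open MPI selects in each situation (a sketch; the framework name and verbosity mechanism match the 1.3 series):

```shell
# Raise the launcher (plm) framework's verbosity so mpirun reports
# which component (rsh vs. tm/Torque) it selects before launching.
mpirun -mca plm_base_verbose 5 -np 2 hostname
```

Running this once from an interactive shell and once inside a Torque job should show the rsh and tm launchers respectively.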

Re: [OMPI users] job runs with mpirun on a node but not if submitted via Torque.

2009-04-01 Thread Rahul Nabar
2009/3/31 Ralph Castain : > I have no idea why your processes are crashing when run via Torque - are you > sure that the processes themselves crash? Are they segfaulting - if so, can > you use gdb to find out where? I have to admit I'm a newbie with gdb. I am trying to recompile