Re: [OMPI users] Backwards compatibility?

2009-07-23 Thread David Doria
On Thu, Jul 23, 2009 at 5:47 PM, Ralph Castain wrote: > I doubt those two would work together - however, a combination of 1.3.2 and > 1.3.3 should. > > You might look at the ABI compatibility discussion threads (there have been > several) on this list for the reasons. Basically,

Re: [OMPI users] Interaction of MPI_Send and MPI_Barrier

2009-07-23 Thread Richard Treumann
No - it is not guaranteed. (it is highly probable though) The return from the MPI_Send only guarantees that the data is safely held somewhere other than the send buffer so you are free to modify the send buffer. The MPI standard does not say where the data is to be held. It only says that once

Re: [OMPI users] Receiving an unknown number of messages

2009-07-23 Thread Eugene Loh
Shaun Jackman wrote: Eugene Loh wrote: Shaun Jackman wrote: For my MPI application, each process reads a file and for each line sends a message (MPI_Send) to one of the other processes determined by the contents of that line. Each process posts a single MPI_Irecv and uses

[OMPI users] Interaction of MPI_Send and MPI_Barrier

2009-07-23 Thread Shaun Jackman
Hi, Two processes run the following program: request = MPI_Irecv MPI_Send (to the other process) MPI_Barrier flag = MPI_Test(request) Without the barrier, there's a race and MPI_Test may or may not return true, indicating whether the message has been received. With the barrier, is it

[OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler

2009-07-23 Thread Sims, James S. Dr.
I have an OpenMPI program compiled with a version of OpenMPI built using the ifort 10.1 compiler. I can compile and run this code with no problem, using the 32-bit version of ifort. And I can also submit batch jobs using torque with this 32-bit code. However, compiling the same code to produce

Re: [OMPI users] Receiving an unknown number of messages

2009-07-23 Thread Shaun Jackman
Eugene Loh wrote: Shaun Jackman wrote: For my MPI application, each process reads a file and for each line sends a message (MPI_Send) to one of the other processes determined by the contents of that line. Each process posts a single MPI_Irecv and uses MPI_Request_get_status to test for a

Re: [OMPI users] Open-MPI-1.3.2 compatibility with old torque?

2009-07-23 Thread Song, Kai Song
Hi Ralph, Thanks for the fast reply! I turned on the --display-allocation and --display-map flags, and it looks like the node allocation is just fine, but the job still hangs. The output looks like this: /home/kaisong/test node0001 node0001 node node Starting parallel job

[OMPI users] TCP btl misbehaves if btl_tcp_port_min_v4 is not set.

2009-07-23 Thread Eric Thibodeau
Hello all, (this _might_ be related to https://svn.open-mpi.org/trac/ompi/ticket/1505) I just compiled and installed 1.3.3 in a CentOS 5 environment and we noticed the processes would deadlock as soon as they started using TCP communications. The test program is one that has been

Re: [OMPI users] Profiling performance by forcing transport choice.

2009-07-23 Thread Eugene Loh
Nifty Tom Mitchell wrote: On Thu, Jun 25, 2009 at 08:37:21PM -0400, Jeff Squyres wrote: Subject: Re: [OMPI users] 50% performance reduction due to OpenMPI v 1.3.2 forcing all MPI traffic over Ethernet instead of using Infiniband While the previous thread on "performance

Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney
Rolf Vandevaart wrote: > I think what you are looking for is this: > > --mca plm_rsh_disable_qrsh 1 > > This means we will disable the use of qrsh and use rsh or ssh instead. > > The --mca pls ^sge does not work anymore for two reasons. First, the > "pls" framework was renamed "plm".

Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
> You don't specify and based on your description I infer that you are not > using a batch/queueing system, but just a rsh/ssh based start-up mechanism. You are absolutely correct. I am using an rsh/ssh-based start-up mechanism. A batch/queueing system might be able to tell you whether a remote

Re: [OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Rolf Vandevaart
I think what you are looking for is this: --mca plm_rsh_disable_qrsh 1 This means we will disable the use of qrsh and use rsh or ssh instead. The --mca pls ^sge does not work anymore for two reasons. First, the "pls" framework was renamed "plm". Secondly, the gridgengine plm was folded

Re: [OMPI users] Network connection check

2009-07-23 Thread Durga Choudhury
The 'system' command will fork a separate process to run. If I remember correctly, forking within MPI can lead to undefined behavior. Can someone in OpenMPI development team clarify? What I don't understand is: why is your TCP network so unstable that you are worried about reachability? For MPI

[OMPI users] Problem launching jobs in SGE (with loose integration), OpenMPI 1.3.3

2009-07-23 Thread Craig Tierney
I have built OpenMPI 1.3.3 without support for SGE. I just want to launch jobs with loose integration right now. Here is how I configured it: ./configure CC=pgcc CXX=pgCC F77=pgf90 F90=pgf90 FC=pgf90 --prefix=/opt/openmpi/1.3.3-pgi --without-sge --enable-io-romio

Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
Thank you all, Jeff, Jody, Prentice and Bogdan, for your invaluable clarifications, solutions and suggestions. > Open MPI should return a failure if TCP connectivity is lost, even with a > non-blocking point-to-point operation. The failure should be returned in > the call to MPI_TEST (and friends).

Re: [OMPI users] Network connection check

2009-07-23 Thread Bogdan Costescu
On Thu, 23 Jul 2009, vipin kumar wrote: 1: Slave machine is reachable or not, (How I will do that ??? Given - I have IP address and Host Name of Slave machine.) 2: if reachable, check whether program(orted and "slaveprocess") is alive or not. You don't specify and based on your

Re: [OMPI users] Network connection check

2009-07-23 Thread Prentice Bisbal
Jeff Squyres wrote: > On Jul 22, 2009, at 10:05 AM, vipin kumar wrote: > >> Actually requirement is how a C/C++ program running in "master" node >> should find out whether "slave" node is reachable (as we check this >> using "ping" command) or not ? Because IP address may change at any >> time,

Re: [OMPI users] Network connection check

2009-07-23 Thread jody
Maybe you could call system() to ping the other machine. char sCommand[512]; // build the command string sprintf(sCommand, "ping -c %d -q %s > /dev/null", numPings, sHostName); // execute the command int iResult = system(sCommand); If the ping was successful, iResult will
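
The suggestion above can be sketched a little more defensively. This is only an illustration under stated assumptions, not the code from the original post: the helper names build_ping_command and host_is_reachable are hypothetical, the host-name check (letters, digits, '.', '-') is an added precaution so that arbitrary text is never passed to the shell, snprintf replaces sprintf to bound the write, and decoding the result with WIFEXITED/WEXITSTATUS assumes a POSIX system where ping exits with status 0 on success. As another reply in this thread cautions, system() forks a child process, which may be problematic inside an MPI program.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: writes "ping -c N -q HOST > /dev/null" into buf.
 * Returns 0 on success, -1 if the arguments are invalid, the host name
 * contains characters other than letters, digits, '.' or '-', or the
 * command does not fit in the buffer. */
int build_ping_command(char *buf, size_t size, int num_pings, const char *host)
{
    if (buf == NULL || host == NULL || *host == '\0' || num_pings < 1)
        return -1;

    /* Refuse anything the shell could interpret (spaces, ';', '|', ...). */
    for (const char *p = host; *p != '\0'; ++p) {
        if (!isalnum((unsigned char)*p) && *p != '.' && *p != '-')
            return -1;
    }

    int n = snprintf(buf, size, "ping -c %d -q %s > /dev/null", num_pings, host);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Hypothetical helper: returns 1 if ping exited with status 0,
 * 0 if it exited with a non-zero status, and -1 if the command
 * could not be built or run. */
int host_is_reachable(const char *host, int num_pings)
{
    char cmd[512];

    if (build_ping_command(cmd, sizeof cmd, num_pings, host) != 0)
        return -1;

    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

A caller might write host_is_reachable("node0001", 3) and treat any result other than 1 as "not reachable". Splitting the string construction from the system() call keeps the part that touches the shell small and lets the command text be checked on its own.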

Re: [OMPI users] Warning: declaration ‘struct MPI::Grequest_intercept_t’ does not declare anything

2009-07-23 Thread Jeff Squyres
On Jul 22, 2009, at 3:17 AM, Alexey Sokolov wrote: from /home/user/NetBeansProjects/Correlation_orig/Correlation/Correlation.cpp:2: /usr/include/openmpi/1.2.4-gcc/openmpi/ompi/mpi/cxx/request_inln.h:347: warning: declaration ‘struct MPI::Grequest_intercept_t’ does not

Re: [OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"

2009-07-23 Thread Igor Kozin
Hi Gus, I played with collectives a few months ago. Details are here: http://www.cse.scitech.ac.uk/disco/publications/WorkingNotes.ConnectX.pdf That was in the context of 1.2.6. You can get available tuning options by doing ompi_info -all -mca coll_tuned_use_dynamic_rules 1 | grep alltoall and

Re: [OMPI users] Network connection check

2009-07-23 Thread Jeff Squyres
On Jul 23, 2009, at 7:36 AM, vipin kumar wrote: I can't use blocking communication routines in my main program ("masterprocess") because any type of network failure (maybe due to physical connectivity, TCP connectivity, or the MPI connection, as you said) may occur. So I am using non

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Jeff Squyres
FWIW, for the Fortran MPI programmers out there, the MPI Forum is hard at work on a new Fortran 03 set of bindings for MPI-3. We have a prototype in a side branch of Open MPI that is "mostly" working. We (the MPI Forum) expect to release a short document describing the new features and

Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-23 Thread Jeff Squyres
On Jul 23, 2009, at 6:39 AM, Dave Love wrote: > The MPI ABI has not changed since 1.3.2. Good, thanks. I hadn't had time to investigate the items in the release notes that looked suspicious. Are there actually any known ABI incompatibilities between 1.3.0 and 1.3.2? We haven't noticed

Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
On Thu, Jul 23, 2009 at 3:03 PM, Ralph Castain wrote: > It depends on which network fails. If you lose all TCP connectivity, Open > MPI should abort the job as the out-of-band system will detect the loss of > connection. If you only lose the MPI connection (whether TCP or some

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres writes: > I *think* that there are compiler flags that you can use with ifort to > make it behave similarly to gfortran in terms of sizes and constant > values, etc. At a slight tangent, if there are flags that might be helpful to add to gfortran for

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread Dave Love
Jeff Squyres writes: > See https://svn.open-mpi.org/source/xref/ompi_1.3/README#257. Ah, neat. I'd never thought of that, possibly due to ELF not being relevant when I first started worrying about that sort of thing. > Indeed. In OMPI, we tried to make this as simple as

Re: [OMPI users] [Open MPI Announce] Open MPI v1.3.3 released

2009-07-23 Thread Dave Love
Jeff Squyres writes: > The MPI ABI has not changed since 1.3.2. Good, thanks. I hadn't had time to investigate the items in the release notes that looked suspicious. Are there actually any known ABI incompatibilities between 1.3.0 and 1.3.2? We haven't noticed any as far

Re: [OMPI users] Network connection check

2009-07-23 Thread Ralph Castain
It depends on which network fails. If you lose all TCP connectivity, Open MPI should abort the job as the out-of-band system will detect the loss of connection. If you only lose the MPI connection (whether TCP or some other interconnect), then I believe the system will eventually generate

Re: [OMPI users] ifort and gfortran module

2009-07-23 Thread rahmani
Hi Martin, I have a question about your solution below: in step 2, "move the Fortran module to the directory ...", what is the "Fortran module"? And in step 3, don't we need to install Open MPI? Thanks. - Original Message - From: "Martin Siegert" To: "Open MPI Users"

Re: [OMPI users] Network connection check

2009-07-23 Thread vipin kumar
> > Are you asking to find out this information before issuing "mpirun"? Open > MPI does assume that the nodes you are trying to use are reachable. > > No. The scenario is a pair of processes running, one on the "master" node, say "masterprocess", and one on the "slave" node, say "slaveprocess". When

[OMPI users] Tuned collectives: How to choose them dynamically? (-mca coll_tuned_dynamic_rules_filename dyn_rules)"

2009-07-23 Thread Gus Correa
Dear OpenMPI experts I would like to experiment with the OpenMPI tuned collectives, hoping to improve the performance of some programs we run in production mode. However, I could not find any documentation on how to select the different collective algorithms and other parameters. In particular,