Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread TERRY DONTJE
good idea anyway. It hurts nothing, takes milliseconds to do, and guarantees nothing got left behind (e.g., if someone was using a debug version of OMPI and directed opal_output to a file). On Nov 4, 2011, at 4:43 AM, TERRY DONTJE wrote: David, are you saying your jobs consistently leave behind

Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread TERRY DONTJE
On 11/23/2011 2:02 PM, Paul Kapinos wrote: Hello Ralph, hello all, Two pieces of news, as usual one good and one bad. The good: we believe we have found out *why* it hangs. The bad: it seems to me that this is a bug or at least an undocumented feature of Open MPI 1.5.x. In detail: As said, we see mysterious hang-ups

Re: [OMPI users] Deadlock at MPI_FInalize

2011-11-28 Thread TERRY DONTJE
Are all the other processes gone? What version of OMPI are you using? On 11/28/2011 9:00 AM, Mudassar Majeed wrote: Dear people, In my MPI application, all the processes call the MPI_Finalize (all processes reach there) but the rank 0 process could not finish with MPI_F

Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-15 Thread TERRY DONTJE
IIRC, RNR's are usually due to the receiving side not having a segment registered and ready to receive data on a QP. The btl does go through a big dance and does its own flow control to make sure this doesn't happen. So when this happens are both the sending and receiving nodes using mthca's

Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-19 Thread TERRY DONTJE
w. On Thu, Dec 15, 2011, at 07:00 AM, TERRY DONTJE wrote: IIRC, RNR's are usually due to the receiving side not having a segment registered and ready to receive data on a QP. The btl does go through a big dance and does its own flow control to make sure this doesn't happen. So when this

Re: [OMPI users] using MPI_Recv in two different threads.

2012-01-11 Thread TERRY DONTJE
I am a little confused by your problem statement. Are you saying you want each MPI process to have multiple threads that can call MPI concurrently? If so, you'll want to read up on the MPI_Init_thread function. --td On 1/11/2012 7:19 AM, Hamilton Fischer wrote: Hi, I'm actually usin

Re: [OMPI users] Openmpi SGE and BLACS

2012-01-13 Thread TERRY DONTJE
Do you have a stack of where exactly things are seg faulting in blacs_pinfo? --td On 1/13/2012 8:12 AM, Conn ORourke wrote: Dear Openmpi Users, I am reserving several processors with SGE upon which I want to run a number of openmpi jobs, all of which individually (and combined) use less tha

Re: [OMPI users] localhost only

2012-01-17 Thread TERRY DONTJE
Is there a way to set up an interface analogous to Unix's loopback? I suspect setting "-mca btl self,sm" wouldn't help since this is probably happening while the processes are bootstrapping. --td On 1/16/2012 7:26 PM, Ralph Castain wrote: The problem is that OMPI is looking for a tcp port fo

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread TERRY DONTJE
ompi_info should tell you the current version of Open MPI your path is pointing to. Are you sure your path is pointing to the area that the OpenFOAM package delivered Open MPI into? --td On 1/27/2012 5:02 AM, Brett Tully wrote: Interesting. In the same set of updates, I installed OpenFOAM from

Re: [OMPI users] Strange OpenMPI messages

2012-02-15 Thread TERRY DONTJE
Do you get any interfaces shown when you run "ibstat" on any of the nodes your job is spawned on? --td On 2/15/2012 1:27 AM, Tohiko Looka wrote: Mm... This is really strange I don't have that service and there is no ib* output in 'ifconfig -a' or 'Infinband' in 'lspci' Which makes me believe

Re: [OMPI users] HRM problem

2012-04-24 Thread TERRY DONTJE
To determine if an MPI process is waiting for a message, do what Rayson suggested: attach a debugger to the processes and see if any of them are stuck in MPI, either internally in an MPI_Recv or MPI_Wait call or looping on an MPI_Test call. Other things to consider. Is this the first time y

Re: [OMPI users] HRM problem

2012-04-24 Thread TERRY DONTJE
and Infiniband, --td On Tue, Apr 24, 2012 at 3:02 PM, TERRY DONTJE <terry.don...@oracle.com> wrote: To determine if an MPI process is waiting for a message do what Rayson suggested and attach a debugger to the processes and see if any of them are stuck in MPI

Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-26 Thread TERRY DONTJE
On 4/25/2012 1:00 PM, Jeff Squyres wrote: On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote: Sounds rather bizarre. Do you have lstopo on your machine? Might be useful to see the output of that so we can understand what it thinks the topology is like as this underpins the binding code. The

Re: [OMPI users] MPI over tcp

2012-05-04 Thread TERRY DONTJE
On 5/4/2012 8:26 AM, Rolf vandeVaart wrote: 2. If that works, then you can also run with a debug switch to see what connections are being made by MPI. You can see the connections being made in the attached log: [archimedes:29820] btl: tcp: attempting to connect() to [[60576,1],2] address 13

Re: [OMPI users] MPI over tcp

2012-05-04 Thread TERRY DONTJE
On 5/4/2012 1:17 PM, Don Armstrong wrote: On Fri, 04 May 2012, Rolf vandeVaart wrote: On Behalf Of Don Armstrong On Thu, 03 May 2012, Rolf vandeVaart wrote: 2. If that works, then you can also run with a debug switch to see what connections are being made by MPI. You can see the connections

Re: [OMPI users] Regarding the execution time calculation

2012-05-08 Thread TERRY DONTJE
On 5/7/2012 8:40 PM, Jeff Squyres (jsquyres) wrote: On May 7, 2012, at 8:31 PM, Jingcha Joba wrote: So in the above stated example, end-start will be: + 20ms ? (time slice of P2 + P3 = 20ms) More or less (there's nonzero amount of time required for the kernel scheduler, and the time quantu

Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 Thread TERRY DONTJE
BTW, the changes prior to r26496 failed some of the MTT test runs on several systems. So if the current implementation is deemed not "correct" I suspect we will need to figure out if there are changes to the tests that need to be done. See http://www.open-mpi.org/mtt/index.php?do_redir=2066 f

Re: [OMPI users] problem with sctp.h on Solaris

2012-06-05 Thread TERRY DONTJE
This looks like a missing check in the sctp configure.m4. I am working on a patch. --td On 6/5/2012 10:10 AM, Siegmar Gross wrote: Hello, I compiled "openmpi-1.6" on "Solaris 10 sparc" and "Solaris 10 x86" with "gcc-4.6.2" and "Sun C 5.12". Today I searched my log-files for "WARNING" and fou

Re: [OMPI users] "-library=stlport4" neccessary for Sun C

2012-06-06 Thread TERRY DONTJE
On 6/6/2012 4:38 AM, Siegmar Gross wrote: Hello, I compiled "openmpi-1.6" on "Solaris 10 sparc", "Solaris 10 x86", and Linux (openSuSE 12.1) with "Sun C 5.12". Today I searched my log-files for "WARNING" and found the following message. WARNING: ***

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Can you get on one of the nodes and see the job's processes? If so can you then attach a debugger to it and get a stack? I wonder if the processes are stuck in MPI_Init? --td On 6/7/2012 6:06 AM, Duke wrote: Hi again, Somehow the verbose flag (-v) did not work for me. I tried --debug-daem

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Another sanity check to try is to see if you can run your test program on just one of the nodes. If that works, more than likely MPI is having issues setting up connections between the nodes. --td On 6/7/2012 6:06 AM, Duke wrote: Hi again, Somehow the verbose flag (-v) did not work for me. I tr

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
his being a firewall issue is something to look into. --td On 6/7/2012 6:36 AM, Duke wrote: On 6/7/12 5:31 PM, TERRY DONTJE wrote: Can you get on one of the nodes and see the job's processes? If so can you then attach a debugger to it and get a stack? I wonder if the processes are st

Re: [OMPI users] MPI_Comm_spawn and exit of parent process.

2012-06-18 Thread TERRY DONTJE
On 6/16/2012 8:03 AM, Roland Schulz wrote: Hi, I would like to start a single process without mpirun and then use MPI_Comm_spawn to start up as many processes as required. I don't want the parent process to take up any resources, so I tried to disconnect the inter communicator and then finali

Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE
I am not sure I am understanding the problem correctly so let me describe it back to you with a couple clarifications. So your program using sf_open compiles successfully when using gcc and mpicc. However, when you run the executable compiled using mpicc sndFile is null? If the above is rig

Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE
sing to the compiler. This should give you an idea of the difference between your gcc and mpicc compilation. I would suspect either mpicc is using a compiler significantly different than gcc or that mpicc might be passing some optimization parameter that is messing up the code execution (just a gu

Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE
I have to make this program. Please have a look in a picture from link below, maybe it will be more clear. http://vipjg.nazwa.pl/sndfile_error.png 2012/7/30 TERRY DONTJE: On 7/30/2012 6:11 AM, Paweł Jaromin wrote: Hello Thanks for fast answer, but the problem looks a little different.

Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread TERRY DONTJE
Do you know what r# of 1.6 you were trying to compile? Is this via the tarball or svn? thanks, --td On 7/30/2012 9:41 AM, Daniel Junglas wrote: Hi, I compiled OpenMPI 1.6 on a 64bit Solaris ultrasparc machine. Compilation and installation worked without a problem. However, when trying to ru

Re: [OMPI users] OpenMPI Giving problems when using -mca btl mx, , sm, self

2007-09-28 Thread Terry Dontje
Hi Hammad, It looks to me like none of the btl's could resolve a route between the node that process rank 0 is on to the other nodes. I would suggest trying np=2 over a couple pairs of machines to see if that works and you can truly be sure that only the first node is having this problem. It

Re: [OMPI users] core from today

2007-11-13 Thread Terry Dontje
Marcin, A couple questions: What OS are you running on? Did you run this job oversubscribed, that is more processes than there are cpus? I've found with oversubscribed jobs that the recursive calls to opal_progress by the SM BTL that the yield within opal_progress (intending to give up the c

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-03 Thread Terry Dontje
What kind of system lib errors are you seeing and do you have a stack trace? Note, I was trying something similar with Solaris and 64-bit on a SPARC machine and was seeing segv's inside the MPI Library due to a pointer being passed through an integer (thus dropping the upper 32 bits). Funny t
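The failure mode mentioned above (a pointer passed through an integer, dropping the upper 32 bits) can be illustrated with a minimal sketch. This is not the BLACS or Open MPI code; the address is a made-up value and the function names are hypothetical. It only models what survives when a 64-bit value is stored in a 32-bit integer.

```python
# Illustrative only: models a 64-bit handle squeezed through a 32-bit integer.
# The address used below is a hypothetical value, not one taken from the thread.

MASK_32 = 0xFFFFFFFF


def through_32bit_int(value: int) -> int:
    """Return what is left of a 64-bit value after storage in a 32-bit integer."""
    return value & MASK_32


def survives_round_trip(value: int) -> bool:
    """True if the value is unchanged after passing through a 32-bit integer."""
    return through_32bit_int(value) == value


if __name__ == "__main__":
    pointer = 0x00007F3A12345678  # hypothetical 64-bit address
    print(hex(through_32bit_int(pointer)))   # upper 32 bits are gone
    print(survives_round_trip(pointer))      # the handle no longer matches
    print(survives_round_trip(0x12345678))   # small values pass through intact
```

A value that fits in 32 bits passes through unchanged, which is one plausible reason the same code can appear to work in a 32-bit build and fail only in a 64-bit one.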

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Terry Dontje
e on SPARC except the address is smaller on the former. Greg, I would be interested in knowing if you are still seeing the problem on Leopard and whether the above setting helps any. --td Subject: Re: [OMPI users] ScaLapack and BLACS on Leopard From: Terry Dontje (Terry.Dontje_at_[h

Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-07 Thread Terry Dontje
WHATMPI. Sorry for the misinformation, --td Terry Dontje wrote: Ok, I think I found the cause of the SPARC segv when trying to use a 64-bit compiled Open MPI library. If one does not set the WHATMPI variable in the Bmake.inc it defaults to UseF77Mpi which assumes all handles are ints. This is a

Re: [OMPI users] Problem with Sun Fortran Install with OpenMPI

2008-04-21 Thread Terry Dontje
drop another email with the results and I'll see what I can do. hth, Terry Dontje Sun Microsystems, Inc. Date: Mon, 21 Apr 2008 11:01:50 -0400 From: cfdm...@aim.com Subject: [OMPI users] Problem with Sun Fortran Install with OpenMPI To: us...@open-mpi.org Message-ID: <8ca71d7c32c3

Re: [OMPI users] Problem with Sun Fortran Install with OpenMPI

2008-04-22 Thread Terry Dontje
eciate the help, Rob -----Original Message----- From: Terry Dontje To: us...@open-mpi.org Sent: Mon, 21 Apr 2008 11:36 am Subject: Re: [OMPI users] Problem with Sun Fortran Install with OpenMPI Looking at the gcc.error attachment, that looks to be one of the problems talked about

Re: [OMPI users] Communicators in Fortran and C

2008-06-05 Thread Terry Dontje
You can translate the communicator from Fortran to C using the MPI_COMM_F2C routine. --td Message: 4 Date: Thu, 05 Jun 2008 08:53:55 +0200 From: Samuel Sarholz Subject: [OMPI users] Communicators in Fortran and C To: us...@open-mpi.org Message-ID: <48478d83.6080...@rz.rwth-aachen.de> Content-T

Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

2008-07-29 Thread Terry Dontje
I have not tested this type of setup so the following disclaimer needs to be said. These are not exactly the same release number. They are close but their code could have something in them that makes them incompatible. One idea that comes to mind is whether the two nodes are on the same subnet. If

Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

2008-07-29 Thread Terry Dontje
ly out of the hands of the TCP stack and lower). Alexander Shabarshin P.S. Between Linuxes I even tried different versions of OpenMPI 1.2.4 and 1.2.5 - these versions work together correctly, but not with ClusterTools... Are the Linux boxes on the same subnet? --td - Original Message - F

Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

2008-07-29 Thread Terry Dontje
Date: Tue, 29 Jul 2008 14:19:14 -0400 From: "Alexander Shabarshin" Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools To: Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=resp

Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

2008-07-29 Thread Terry Dontje
Terry Dontje wrote: Date: Tue, 29 Jul 2008 14:19:14 -0400 From: "Alexander Shabarshin" Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools To: Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin> Content-Type: text/plain; format=flowed; cha

Re: [OMPI users] Communitcation between OpenMPI and ClusterTools

2008-07-30 Thread Terry Dontje
really do expect this to work. --td Terry Dontje wrote: Terry Dontje wrote: Date: Tue, 29 Jul 2008 14:19:14 -0400 From: "Alexander Shabarshin" Subject: Re: [OMPI users] Communitcation between OpenMPI and ClusterTools To: Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin&

Re: [OMPI users] SM btl slows down bandwidth?

2008-08-14 Thread Terry Dontje
Interestingly enough on the SPARC platform the Solaris memcpy's actually use non-temporal stores for copies >= 64KB. By default some of the mca parameters to the sm BTL stop at 32KB. I've experimented with bumping the sm segment sizes to above 64K and seen incredible speedup on our M90

Re: [OMPI users] SM btl slows down bandwidth?

2008-08-16 Thread Terry Dontje
); different compiler authors have chosen to implement different optimizations that work well in different applications. So yes, you may well see different run-time performance with different compilers depending on your application and/or MPI implementations. Some compilers may ha

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-17 Thread Terry Dontje
Sofia, I took your program and actually ran it successfully on my systems using Open MPI r19400. A couple questions: 1. Have you tried to run the program on a single node? mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out 2. Can you try and run the code the

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-17 Thread Terry Dontje
Date: Wed, 17 Sep 2008 16:23:59 +0200 From: "Sofia Aparicio Secanellas" Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv To: "Open MPI Users" Message-ID: <0625EEFB84E04647A1930A963A8DF7E3@aparicio1> Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=r

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-18 Thread Terry Dontje
It might also be interesting to see the result of "ifconfig -a" on both of your machines. --td Date: Thu, 18 Sep 2008 10:19:37 +0200 From: "Sofia Aparicio Secanellas" Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv To: "Open MPI Users" Message-ID: Content-Type: text/plain; form

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-18 Thread Terry Dontje
Turns out you debugged the mpirun. I was actually wanting you to attach to your program, PruebaSumaParalela.out, on both nodes and dump each of their stacks. Is there a reason why you are using 1.2.2 instead of 1.2.7 or something from the 1.3 branch? I am wondering if maybe there is some sort

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-19 Thread Terry Dontje
Hello Sofia, Ok, so I really wanted the stack of when you run with "-mca mpi_preconnect_all 1". I believe you'll see that one of the processes will be in init. However, the stack still probably will not help me help you. What needs to happen is to step through the code in dbx while the conn

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-19 Thread Terry Dontje
Hello Sofia, After further reflection I wonder if you have a firewall that is preventing connections to certain ports. --td Terry Dontje wrote: Hello Sofia, Ok, so I really wanted the stack of when you run with "-mca mpi_preconnect_all 1". I believe you'll see that one o

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-23 Thread Terry Dontje
ocal -mca btl self,tcp -mca btl_tcp_if_include eth1 ./PruebaSumaParalela.out I enclose you the results. Thank you. Sofia - Original Message - From: "Terry Dontje" To: Sent: Friday, September 19, 2008 7:54 PM Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv > Hello Sofia, >

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-23 Thread Terry Dontje
Hello Sofia, After talking with another OMPI member, can you humor me and run "/sbin/iptables -L" on both your machines? You'll need to be root to do so. --td List-Post: users@lists.open-mpi.org Date: Tue, 23 Sep 2008 06:02:30 -0400 From: Terry Dontje Subject: Re: [OMPI user

Re: [OMPI users] Problem with MPI_Send and MPI_Recv

2008-09-23 Thread Terry Dontje
Hello Sofia, Very puzzling indeed. Can you try to run hostname or uptime with mpirun? That is something like: mpirun -np 2 --host 10.1.10.208,10.1.10.240 --mca mpi_preconnect_all 1 --prefix /usr/local -mca btl self,tcp -mca btl_tcp_if_include eth1 hostname --td List-Post: users@lists.o

Re: [OMPI users] OMPI link error with petsc 2.3.3

2008-10-07 Thread Terry Dontje
Yann, I'll take a look at this. It looks like there definitely is an issue between our libmpi.so and libmpi_f90.so files. I noticed that the linkage message is a warning. Does the code actually fail when running? --td List-Post: users@lists.open-mpi.org Date: Tue, 07 Oct 2008 16:55:14 +0200 Fr

[OMPI users] OMPI link error with petsc 2.3.3

2008-10-07 Thread Terry Dontje
Yann, How were you trying to link your code with PETSc? Did you use mpif90 or mpif77 wrappers or were you using cc or mpicc wrappers? I ran some basic tests that test the usage of MPI_STATUS_IGNORE using mpif90 (and mpif77) and it works fine. However I was able to generate a similar error as y

Re: [OMPI users] OMPI link error with petsc 2.3.3

2008-10-08 Thread Terry Dontje
t/SUNWhpc/HPC8.0/lib/amd64/libmpi.so value=0x8; file /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi_f90.so value=0x14); /opt/SUNWhpc/HPC8.0/lib/amd64/libmpi.so definition taken /usr/bin/rm -f solv_ksp.o Thanks for your help, Yann Terry Dontje wrote: > Yann, > > How were you trying to link yo

Re: [OMPI users] OMPI link error with petsc 2.3.3

2008-10-08 Thread Terry Dontje
Yann, Well, when you use f90 to link it passes the linker the -t option, which is described in the manpage as follows: Turns off the warning for multiply-defined symbols that have different sizes or different alignments. That's why :-) To your original question should y

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Terry Dontje
We've found on certain applications binding to processors can have up to a 2x difference. ScaliMPI automatically binds processes by socket so if you are not running a one process per cpu job each process will land on a different socket. OMPI defaults to not binding at all. You may want to tr

Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI

2009-08-05 Thread Terry Dontje
A comment to the below. I meant the 2x performance was for shared memory communications. --td Message: 3 Date: Wed, 05 Aug 2009 09:55:42 -0400 From: Terry Dontje Subject: Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI To: us...@open-mpi.org Message-ID

Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Terry Dontje
Craig, Did your affinity script bind the processes per socket or linearly to cores? If the former, you'll want to look at using rankfiles and place the ranks based on sockets. We have found this especially useful if you are not running fully subscribed on your machines. Also, if you think t

Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Terry Dontje
Hi Neeraj, Were there specific collectives that were slower? Also what kind of cluster were you running on? How many nodes and cores per node? thanks, --td Message: 3 Date: Fri, 7 Aug 2009 16:51:05 +0530 From: nee...@crlindia.com Subject: Re: [OMPI users] Performance question about OpenMPI

Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB

2009-08-07 Thread Terry Dontje
Date: Fri, 07 Aug 2009 07:12:45 -0600 From: Craig Tierney Subject: Re: [OMPI users] Performance question about OpenMPI and MVAPICH2 on IB To: Open MPI Users Message-ID: <4a7c284d.3040...@noaa.gov> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Terry Dontje

Re: [OMPI users] Fortran Library Problem using openMPI

2009-10-30 Thread Terry Dontje
A copy of the configure line for Open MPI would be helpful. Which Intel compiler are you using (version and bitness)? Can you run file on libmpi_f77.so? Also, are you sure that /usr/local/lib is where you installed your Open MPI build and that it isn't something latent? --td Date: Fri, 30 Oct 2

Re: [OMPI users] Fortran Library Problem using openMPI

2009-10-30 Thread Terry Dontje
Georg, I think your problem is that you are using an ia32 (32-bit) compiler with a 64-bit built library. Either you need to use the intel64 compiler or build Open MPI with the 32-bit compiler. --td Subject: Re: [OMPI users] Fortran Library Problem using openMPI From: Georg A. Reichstein (rei

Re: [OMPI users] Fortran Library Problem using openMPI

2009-10-30 Thread Terry Dontje
Let me try this one more time. Your application is being built with a 32 bit compiler ia32. However the Open MPI libraries look to be built with the 64 bit compiler intel64. One or the other needs to change. --td Terry Dontje wrote: Georg, I think your problem is you are using a ia32
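One way to check which side of a 32-bit/64-bit mismatch a library is on, similar in spirit to running file on it, is to read the class byte of its ELF header. The sketch below is an illustration under assumptions: it only applies to ELF platforms such as Linux and Solaris, the helper names are hypothetical, and the path in the usage note merely combines the directory and library name mentioned in the thread.

```python
# Sketch: report whether an ELF file is a 32-bit or 64-bit object.
# Byte 4 of the ELF identification header (EI_CLASS) is 1 for ELFCLASS32
# and 2 for ELFCLASS64. Helper names here are hypothetical.

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}


def elf_bitness(header: bytes) -> str:
    """Classify the leading bytes of a file as 32-bit, 64-bit, unknown, or not ELF."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return "not ELF"
    return ELF_CLASSES.get(header[EI_CLASS], "unknown")


def file_bitness(path: str) -> str:
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return elf_bitness(handle.read(EI_CLASS + 1))
```

For example, calling file_bitness("/usr/local/lib/libmpi_f77.so") and then file_bitness on one of the application's own object files would show whether the two were built for the same word size; if they differ, one side has to be rebuilt.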

Re: [OMPI users] Fortran Library Problem using openMPI

2009-10-30 Thread Terry Dontje
Also, is the configure line you are giving below the application configure line? I was actually asking for the Open MPI configure line. --td Terry Dontje wrote: Let me try this one more time. Your application is being built with a 32 bit compiler ia32. However the Open MPI libraries look

Re: [OMPI users] Runtime error while running mpirun

2009-10-30 Thread Terry Dontje
Hi Basant, I am not familiar with Windows builds of Open MPI. However, can you see if your Open MPI build actually created a mca_paffinity_window's dll? I could imagine the issue might be that the dll is not finding a needed dependency. Under Windows is there a command similar to Unix's ldd

Re: [OMPI users] Help: Firewall problems

2009-11-05 Thread Terry Dontje
Technically MPI Spec may not put a requirement on TCP/IP, however Open MPI's runtime environment needs some way to launch jobs and pass data around in a standard way and it currently uses TCP/IP. That being said there have been rumblings for some time to use other protocols but that has not ye

Re: [OMPI users] exceedingly virtual memory consumption of MPI, environment if higher-setting "ulimit -s"

2009-11-19 Thread Terry Dontje
A couple things to note. First Sun MPI 8.2.1 is effectively OMPI 1.3.4. I also reproduced the below issue using a C code so I think this is a general issue with OMPI and not Fortran based. I did a pmap of a process and there were two anon spaces equal to the stack space set by ulimit. In o

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-11 Thread Terry Dontje
Date: Thu, 10 Dec 2009 17:57:27 -0500 From: Jeff Squyres On Dec 10, 2009, at 5:53 PM, Gus Correa wrote: > How does the efficiency of loopback > (let's say, over TCP and over IB) compare with "sm"? Definitely not as good; that's why we have sm. :-) I don't have any quantificatio

Re: [OMPI users] totalview and message queue, empty windows

2010-02-02 Thread Terry Dontje
Hi DevL, what compiler and options are you using to build OMPI? I am seeing something similar (Warning messages and the Message Queue window having bizarre values) when building with the Pathscale compiler but I don't see this with SunStudio, gcc, Intel or PGI. However, I do see pending recei

Re: [OMPI users] totalview and message queue, empty windows

2010-02-04 Thread Terry Dontje
e the debugging symbols. Unfortunately, I still haven't tracked down Ashley's issue which I think probably has more to do with the OMPI code instead of the debugging information not being generated. --td Terry Dontje wrote: Hi DevL, what compiler and options are you using to build OMPI

Re: [OMPI users] Anybody built a working 1.4.1 on Solaris 8 (Sparc)?

2010-02-05 Thread Terry Dontje
We haven't tried Solaris 8 in quite some time. However, for your first issue did you include the --enable-heterogeneous option on your configure command? Since you are mixing IA-32 and SPARC nodes you'll want to include this so the endian issue doesn't bite you. --td Message: 5 Date: Thu, 04

Re: [OMPI users] Anybody built a working 1.4.1 on Solaris 8, (Sparc)?

2010-02-09 Thread Terry Dontje
Date: Fri, 05 Feb 2010 16:16:29 -0800 From: "David Mathog" > We haven't tried Solaris 8 in quite some time. However, for your first > issue did you include the --enable-heterogeneous option on your > configure command? > > Since you are mixing IA-32 and SPARC nodes you'll want to include this s

Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-02-10 Thread Terry Dontje
Jeff Squyres wrote: Iain did the genius for the new assembly. Iain -- can you respond? Iain is on vacation right now so he probably won't be able to respond until next week. --td On Feb 9, 2010, at 5:44 PM, Mostyn Lewis wrote: The old opal_atomic_cmpset_32 worked: static inline int

Re: [OMPI users] Segmentation fault when Send/Recv, onheterogeneouscluster (32/64 bit machines)

2010-03-08 Thread Terry Dontje
We (Oracle) have not done much extensive limits testing going between 32- and 64-bit applications. Most of the testing we've done is more around endianness (SPARC vs x86_64). Though the below is kind of interesting. Sounds like the eager limit isn't being normalized on the 64 bit machines.

Re: [OMPI users] users Digest, Vol 1546, Issue 2

2010-04-19 Thread Terry Dontje
FWIW, I took your code compiled it on a linux system using OMPI 1.4 r22761 and Solaris Studio C compilers. Then I ran it with "mpirun -np 4 a.out" and it seems to work for me: Hello MPI World From process 0: Num processes: 4 Hello MPI World from process 1! Hello MPI World from process 2! Hello

Re: [OMPI users] Fwd: Open MPI v1.4 cant find default

2010-04-19 Thread Terry Dontje
Sorry, reposting this under the correct subject. FWIW, I took your code compiled it on a linux system using OMPI 1.4 r22761 and Solaris Studio C compilers. Then I ran it with "mpirun -np 4 a.out" and it seems to work for me: Hello MPI World From process 0: Num processes: 4 Hello MPI World from p

Re: [OMPI users] 'readv failed: Connection timed out' issue

2010-04-20 Thread Terry Dontje
Hi Jonathan, Do you know what the top level function is or communication pattern? Is it some type of collective or a pattern that has a many to one? What might be happening is that since OMPI uses lazy connections by default, if all processes are trying to establish communications to the same

Re: [OMPI users] mpirun -np 4 hello_world; on a eight processor shared memory machine produces wrong output

2010-04-23 Thread Terry Dontje
This looks like you are using an mpirun or mpiexec from mvapich to run an executable compiled with OMPI. Can you make sure that you are using the right mpirun? --td Pankatz, Klaus wrote: Yes, I did that. It is basically the same problem with a Fortran version of this little program. With

Re: [OMPI users] mpirun -np 4 hello_world; on a eight processor shared memory machine produces wrong output

2010-04-23 Thread Terry Dontje
rs/pankatz/OPENmpi/bin/mpirun which is the right one. From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Terry Dontje [terry.don...@oracle.com] Sent: Friday, April 23, 2010 14:29 To: Open MPI Users Subject: Re: [OMPI users] mp

Re: [OMPI users] deadlock when calling MPI_gatherv

2010-04-27 Thread Terry Dontje
How does the stack for the non-SM BTL run look, I assume it probably is the same? Also, can you dump the message queues for rank 1? What's interesting is you have a bunch of pending receives, do you expect that to be the case when the MPI_Gatherv occurred? --td Teng Lin wrote: Hi, We rece

Re: [OMPI users] MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 0.

2010-04-28 Thread Terry Dontje
Does the "NODE 0: no room for lattice" message have any significance? Could it be that for whatever reason the application is running out of memory? --td kishore kumar wrote: Hi, I am trying to run SPEC MPI 2007 workload on a quad-core machine. However getting this error message. I also tri

Re: [OMPI users] Questions about binding processes

2010-05-10 Thread Terry Dontje
NGUYEN Laurent wrote: Hello, I'm trying to understand the difference between these two options: " --mca mpi_paffinity_alone 1 " and " --bind-to-core " To me, it's the same thing (maybe paffinity has maffinity in addition) but the purpose of these options is to bind MPI process to process

Re: [OMPI users] Questions about binding processes

2010-05-10 Thread Terry Dontje
NGUYEN Laurent wrote: Ok, thank you for your answer. I think this rankfile feature is very interesting to run some jobs like MPMD jobs or hybrid jobs (multithreaded or GPU for examples). Regards, Point taken. The basic premise Jeff and I are working on is to see if we could come up with a s

Re: [OMPI users] Questions about MPI_Isend

2010-05-11 Thread Terry Dontje
Gijsbert Wiesenekker wrote: On May 11, 2010, at 9:29 , Gabriele Fatigati wrote: Dear Gijsbert, >Ideally I would like to check how many MPI_Isend messages have not been processed yet, so that I can stop >sending messages if there are 'too many' waiting. Is there a way to do this? you can

Re: [OMPI users] Buffer size limit and memory consumption problem on heterogeneous (32 bit / 64 bit) machines

2010-05-20 Thread Terry Dontje
Olivier Riff wrote: Hello, I assume this question has been already discussed many times, but I can not find on Internet a solution to my problem. It is about buffer size limit of MPI_Send and MPI_Recv with heterogeneous system (32 bit laptop / 64 bit cluster). My configuration is : open mpi 1

Re: [OMPI users] OpenMPI-Ranking problem

2010-06-08 Thread Terry Dontje
Which version of OMPI are you running, and on which OS version? Can you try replacing the rankfile specification with --bind-to-core and tell me if that works any better? --td Chamila Janath wrote: _rankfile_ rank 0=10.16.71.1 slot=0 I launched my mpi app using, $ mpirun -np 1 -rf rankfile

Re: [OMPI users] Specifying slots in rankfile

2010-06-10 Thread Terry Dontje
It looks like the rankfile "*" syntax was broken between version r22761 and r23214. So, it looks like a regression to me. Ethan is looking into trying to narrow this down more. --td Ralph Castain wrote: I would have to look at the code, but I suspect it doesn't handle "*". Could be upgraded

Re: [OMPI users] Specifying slots in rankfile

2010-06-10 Thread Terry Dontje
Sorry, there was a miscommunication between Ethan and me. The "*" nomenclature never worked in OMPI; it is the specification of "n:*" that works and, we believe, still works. --td Terry Dontje wrote: It looks like the rankfile "*" syntax was broke between versi

Re: [OMPI users] using the carto facility

2009-01-06 Thread Terry Dontje
Lydia, sorry I led you astray. I meant for you to use the rankfile feature as described in the mpirun manpage under the heading "Specifying Ranks". --td Message: 1 Date: Mon, 5 Jan 2009 17:09:41 + (GMT) From: Lydia Heck Subject: [OMPI users] using the carto facility To: us...@open-mpi.org

Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-14 Thread Terry Dontje
You know this all looks very similar to the reason why rolfv putback r20351 which essentially defined out restrict within opal_config_bottom.h when using Sun Studio. --td List-Post: users@lists.open-mpi.org Date: Fri, 13 Mar 2009 16:40:49 -0400 From: Jeff Squyres Subject: Re: [OMPI users] PGI

Re: [OMPI users] Linux opteron infiniband sunstudio configure problem

2009-03-30 Thread Terry Dontje
Sorry for the delay in responding; I was out of the office late last week. Can you tell me what version of Open MPI you are trying to build (1.2 or 1.3 branch)? Are you using the tarball on the Open MPI site or code downloaded from the svn repository? Can you tell me which distribution and vers

Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-30 Thread Terry Dontje
I also was unable to reproduce the configure error with the latest 1.3 tarball. I was on a SLES distribution. What distribution are you on and can you possibly try and configure using gcc instead of Sun Studio? I have a feeling this issue is a larger configure issue and not Sun Studio specif

Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-30 Thread Terry Dontje
Terry Dontje wrote: I also was unable to reproduce the configure error with the latest 1.3 tarball. I was on a SLES distribution. What distribution are you on and can you possibly try and configure using gcc instead of Sun Studio? I have a feeling this issue is a larger configure issue and

Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-30 Thread Terry Dontje
Date: Mon, 30 Mar 2009 19:05:25 +0100 From: Kevin McManus Subject: Re: [OMPI users] Linux opteron infiniband sunstudioconfigure problem To: Open MPI Users Message-ID: <20090330180524.gt13...@gre.ac.uk> Content-Type: text/plain; charset=us-ascii > > I will try to reproduce the

Re: [OMPI users] Linux opteron infiniband sunstudio configure, problem

2009-03-31 Thread Terry Dontje
ype; you must specify one Can you manually run UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')` in your shell without error? --td Terry Dontje wrote: Date: Mon, 30 Mar 2009 19:05:25 +0100 From: Kevin McManus Subject: Re: [OMPI users] Linux opteron infiniband sunstudio c

Re: [OMPI users] Problems Compiling OpenMPI with Sun Studio 12

2009-04-03 Thread Terry Dontje
Which version of the Sun Studio compilers are you using, and which version of OMPI are you trying to build? I am successful with building OMPI with the Sun Studio Express release 200811 on Linux systems if I don't use the C++ compiler. With prior releases we did suffer from some issues. A "cc -V"
