Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Well, it now is launching just fine, so that's one thing! :-) Afraid I'll have to let the TCP btl guys take over from here. It looks like everything is up and running, but something strange is going on in the MPI comm layer. You can turn off those mca params I gave you as you are now past that

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Yeah, it's the lib confusion - this is the problem: [saturna.cluster:07360] [[14551,0],0] ORTE_ERROR_LOG: Buffer type (described vs non-described) mismatch - operation not allowed in file base/odls_base_default_fns.c at line 2475. Have you tried configuring with

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 11-Aug-09, at 6:16 AM, Jeff Squyres wrote: This means that OMPI is finding an mca_iof_proxy.la file at run time from a prior version of Open MPI. You might want to use "find" or "locate" to search your nodes and find it. I suspect that you somehow have an OMPI 1.3.x install that
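A sketch of the search Jeff suggests, assuming the xserveNN host names seen in this thread and typical install prefixes (adjust both for your cluster):

    # Look on each node for stale component files left by an older install
    for node in xserve01 xserve02 xserve03 xserve04; do
        echo "== $node =="
        ssh $node 'find /usr/local /opt -name "mca_iof_proxy*" 2>/dev/null'
    done
    # or, if the locate database on a node is current:
    ssh xserve01 'locate mca_iof_proxy'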

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 11-Aug-09, at 7:03 AM, Ralph Castain wrote: Sigh - too early in the morning for this old brain, I fear... You are right - the ranks are fine, and local rank doesn't matter. It sounds like a problem where the TCP messaging is getting a message ack'd from someone other than the process

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 11-Aug-09, at 6:28 AM, Ralph Castain wrote: -mca plm_base_verbose 5 --debug-daemons -mca odls_base_verbose 5 I'm afraid the output will be a tad verbose, but I would appreciate seeing it. Might also tell us something about the lib issue. Command line was:
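For reference, a command of the shape Ralph asked for, with MyProg standing in for the actual executable:

    mpirun -mca plm_base_verbose 5 --debug-daemons \
           -mca odls_base_verbose 5 ./MyProg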

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Sigh - too early in the morning for this old brain, I fear... You are right - the ranks are fine, and local rank doesn't matter. It sounds like a problem where the TCP messaging is getting a message ack'd from someone other than the process that was supposed to recv the message. This should cause

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Jeff Squyres
On Aug 11, 2009, at 9:43 AM, Klymak Jody wrote: [xserve03.local][[61029,1],4][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[61029,1],3] This could well be caused by a version mismatch between your nodes. E.g., if one node is
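A quick way to check for the mismatch Jeff describes is to compare what each node actually resolves (host names again follow the pattern in this thread):

    for node in xserve01 xserve02 xserve03 xserve04; do
        echo "== $node =="
        ssh $node 'which mpirun; ompi_info | grep "Open MPI:"'
    done

Every node should report the same path and the same 1.3.3 version line.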

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 11-Aug-09, at 6:28 AM, Ralph Castain wrote: The reason your job is hanging is sitting in the orte-ps output. You have multiple processes declaring themselves to be the same MPI rank. That definitely won't work. It's the "local rank", if that makes any difference... Any thoughts on this

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Oops - I should have looked at your output more closely. The component_find warnings clearly indicate some old libs lying around, but that isn't why your job is hanging. The reason your job is hanging is sitting in the orte-ps output. You have multiple processes declaring themselves to be
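The orte-ps tool ships with Open MPI 1.3; a minimal check while the job is still hanging (run it as the same user, on the node where mpirun itself is running, which is an assumption about a typical setup):

    orte-ps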

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Sorry, but Jeff is correct - that error message clearly indicates a version mismatch. Somewhere, one or more of your nodes is still picking up an old version. On Tue, Aug 11, 2009 at 7:16 AM, Jeff Squyres wrote: On Aug 11, 2009, at 9:11 AM, Klymak Jody wrote: I have

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Jeff Squyres
On Aug 11, 2009, at 9:11 AM, Klymak Jody wrote: I have removed all the OS-X-supplied libraries, recompiled and installed openmpi 1.3.3, and I am *still* getting this warning when running ompi_info: [saturna.cluster:50307] mca: base: component_find: iof "mca_iof_proxy" uses an MCA interface

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 10-Aug-09, at 8:03 PM, Ralph Castain wrote: Interesting! Well, I always make sure I have my personal OMPI build before any system stuff, and I work exclusively on Mac OS-X: I am still finding this very mysterious. I have removed all the OS-X-supplied libraries, recompiled and

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
On Aug 11, 2009, at 5:17 AM, Ashley Pittman wrote: On Tue, 2009-08-11 at 03:03 -0600, Ralph Castain wrote: If it isn't already there, try putting a print statement right at program start, another just prior to MPI_Init, and another just after MPI_Init. It could be that something is hanging

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ashley Pittman
On Tue, 2009-08-11 at 03:03 -0600, Ralph Castain wrote: If it isn't already there, try putting a print statement right at program start, another just prior to MPI_Init, and another just after MPI_Init. It could be that something is hanging somewhere during program startup since it sounds
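If rebuilding the application just to add prints is awkward, a small wrapper script shows at least whether every process reaches program start (a sketch; MyProg is a placeholder, and the prints just before and after MPI_Init still have to go into the source itself):

    #!/bin/sh
    # wrapper.sh -- launch as: mpirun ./wrapper.sh
    echo "starting on $(hostname) at $(date)"
    exec ./MyProg "$@"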

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Klymak Jody
On 10-Aug-09, at 8:03 PM, Ralph Castain wrote: Interesting! Well, I always make sure I have my personal OMPI build before any system stuff, and I work exclusively on Mac OS-X: Note that I always configure with --prefix=somewhere-in-my-own-dir, never to a system directory. Avoids this kind
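A minimal sketch of that style of build, with an example prefix:

    ./configure --prefix=$HOME/openmpi
    make all install
    # make sure this install is found ahead of anything system-wide:
    export PATH=$HOME/openmpi/bin:$PATH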

Re: [OMPI users] torque pbs behaviour...

2009-08-11 Ralph Castain
Interesting! Well, I always make sure I have my personal OMPI build before any system stuff, and I work exclusively on Mac OS-X: rhc$ echo $PATH /Library/Frameworks/Python.framework/Versions/Current/bin:/Users/rhc/openmpi/bin:/Users/rhc/bin:/opt/local/bin:/usr/X11R6/bin:/usr/local/

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Klymak Jody
On 10-Aug-09, at 6:44 PM, Ralph Castain wrote: Check your LD_LIBRARY_PATH - there is an earlier version of OMPI in your path that is interfering with operation (i.e., it comes before your 1.3.3 installation). Hmm, the OS X FAQ says not to do this: "Note that there is no need to add

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Klymak Jody
So, mpirun --display-allocation -pernode --display-map hostname gives me the output below. Simple jobs seem to run, but the MITgcm does not, either under ssh or torque. It hangs at some early point in execution, before anything is written, so it's hard for me to tell what the error is.

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Ralph Castain
No problem - actually, that default works with any environment, not just Torque. On Aug 10, 2009, at 4:37 PM, Gus Correa wrote: Thank you for the correction, Ralph. I didn't know there was a (wise) default for the number of processes when using Torque-enabled OpenMPI. Gus Correa Ralph

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Gus Correa
Thank you for the correction, Ralph. I didn't know there was a (wise) default for the number of processes when using Torque-enabled OpenMPI. Gus Correa. Ralph Castain wrote: Just to correct something said here. You need to tell mpirun how many processes to launch, regardless of whether you
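Concretely (assuming the default Ralph describes is one process per allocated slot), under the nodes=2:ppn=8 allocation from this thread both of the following launch 16 processes; MyProg is a placeholder:

    mpirun ./MyProg          # default: one process per Torque-allocated slot
    mpirun -np 16 ./MyProg   # the same, with the count made explicit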

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Ralph Castain
No problem - yes indeed, 1.1.x would be a bad choice :-) On Aug 10, 2009, at 3:58 PM, Jody Klymak wrote: On Aug 10, 2009, at 14:39, Ralph Castain wrote: mpirun --display-allocation -pernode --display-map hostname Ummm, hmm, this is embarrassing, none of those command line arguments

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Jody Klymak
On Aug 10, 2009, at 14:39, Ralph Castain wrote: mpirun --display-allocation -pernode --display-map hostname Ummm, hmm, this is embarrassing, none of those command line arguments worked, making me suspicious... It looks like somehow I decided to build and run openMPI 1.1.5, or at

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Gus Correa
Hi Jody, list. See comments inline. Jody Klymak wrote: On Aug 10, 2009, at 13:01, Gus Correa wrote: Hi Jody, we don't have Mac OS-X but Linux; not sure if this applies to you. Did you configure your OpenMPI with Torque support, and point it to the same library that provides the Torque
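A sketch of a Torque-aware build, assuming Torque is installed under /usr/local/torque (point --with-tm at your own install):

    ./configure --prefix=$HOME/openmpi --with-tm=/usr/local/torque
    make all install
    # afterwards the tm support should be listed:
    ompi_info | grep tm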

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Ralph Castain
On Aug 10, 2009, at 3:25 PM, Jody Klymak wrote: Hi Ralph, On Aug 10, 2009, at 13:04, Ralph Castain wrote: Umm...are you saying that your $PBS_NODEFILE contains the following: No, if I put cat $PBS_NODEFILE in the pbs script I get xserve02.local ... xserve02.local xserve01.local ...

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Jody Klymak
Hi Ralph, On Aug 10, 2009, at 13:04, Ralph Castain wrote: Umm...are you saying that your $PBS_NODEFILE contains the following: No, if I put cat $PBS_NODEFILE in the pbs script I get xserve02.local ... xserve02.local xserve01.local ... xserve01.local each repeated 8 times. So that seems

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Jody Klymak
On Aug 10, 2009, at 13:01, Gus Correa wrote: Hi Jody, we don't have Mac OS-X but Linux; not sure if this applies to you. Did you configure your OpenMPI with Torque support, and point it to the same library that provides the Torque you are using

Re: [OMPI users] torque pbs behaviour...

2009-08-10 Ralph Castain
Umm...are you saying that your $PBS_NODEFILE contains the following: xserve01.local np=8 xserve02.local np=8 If so, that could be part of the problem - it isn't the standard notation we are expecting to see in that file. What Torque normally provides is one line for each slot, so we would
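So for a nodes=2:ppn=8 job, the expected file has sixteen lines, one per slot, along these lines (node order may differ):

    xserve01.local
    xserve01.local
    ...  (8 lines in all)
    xserve02.local
    xserve02.local
    ...  (8 lines in all)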

[OMPI users] torque pbs behaviour...

2009-08-10 Jody Klymak
Hi All, I've been trying to get torque pbs to work on my OS X 10.5.7 cluster with openMPI (after finding that Xgrid was pretty flaky about connections). I *think* this is an MPI problem (perhaps via operator error!) If I submit openMPI with: #PBS -l nodes=2:ppn=8 mpirun MyProg pbs
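For reference, a minimal sketch of the kind of submission script being described (the job name and program are placeholders):

    #!/bin/sh
    #PBS -l nodes=2:ppn=8
    #PBS -N mpi_test
    cd $PBS_O_WORKDIR
    cat $PBS_NODEFILE   # sanity check: one line per allocated slot
    mpirun ./MyProg     # a Torque-aware Open MPI reads the allocation itself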