Re: [OMPI users] alltoall messages > 2^26

2011-04-04 Thread David Zhang
Any error messages? Maybe the nodes ran out of memory? I know MPI implements some kind of buffering under the hood, so even though you're sending arrays over 2^26 in size, it may require more memory than that for MPI to actually send them. On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico
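
One quick way to rule out memory pressure is to run a trivial command across the same nodes before the alltoall job; a minimal sketch, assuming bash, a Linux cluster, and a hostfile named 'list' like the one used elsewhere in this digest (adjust -np to your node count):

  # Print each node's hostname and free memory, one process per node.
  mpirun -np 4 -bynode --hostfile list sh -c 'hostname; free -m'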

Re: [OMPI users] mpi problems,

2011-04-04 Thread Terry Dontje
libfui.so is a library that is part of the Solaris Studio Fortran tools. It should be located under lib, beneath the prefix where your Solaris Studio compilers are installed. So one question is whether you actually have Studio Fortran installed on all your nodes or not? --td On 04/04/2011 04:02 PM, Ralph
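
A quick way to check that across the cluster (a sketch, assuming passwordless ssh, a hostfile named 'list', and the Studio prefix /opt/sun/sunstudio12.1 quoted later in this digest):

  # Look for the Studio Fortran runtime library on every node in the hostfile.
  for h in $(cat list); do
    echo "== $h"
    ssh "$h" 'ls /opt/sun/sunstudio12.1/lib/libfui.so* 2>/dev/null || echo "  libfui.so not found"'
  done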

Re: [OMPI users] mpi problems,

2011-04-04 Thread Samuel K. Gutierrez
What does 'ldd ring2' show? How was it compiled? -- Samuel K. Gutierrez Los Alamos National Laboratory On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote: [jian@therock ~]$ echo $LD_LIBRARY_PATH /opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-
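
For reference, a minimal version of that check (ring2 is the binary from this thread; grepping for "not found" flags any library the loader cannot resolve):

  # List the shared libraries ring2 depends on, then show only the unresolved ones.
  ldd ./ring2
  ldd ./ring2 | grep "not found"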

Re: [OMPI users] mpi problems,

2011-04-04 Thread Ralph Castain
Well, where is libfui located? Is that location in your ld path? Is the lib present on all nodes in your hostfile? On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote: > [jian@therock ~]$ echo $LD_LIBRARY_PATH >

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
[jian@therock ~]$ echo $LD_LIBRARY_PATH /opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64:/home/jian/.crlibs:/home/jian/.crlibs32 [jian@therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -np 4 -hostfile list ring2 ring2: error while loading

Re: [OMPI users] mpi problems,

2011-04-04 Thread Samuel K. Gutierrez
Hi, Try prepending the path to your compiler libraries. Example (bash-like): export LD_LIBRARY_PATH=/compiler/prefix/lib:/ompi/prefix/lib:$LD_LIBRARY_PATH -- Samuel K. Gutierrez Los Alamos National Laboratory On Apr 4, 2011, at 1:33 PM, Nehemiah Dacres wrote: altering LD_LIBRARY_PATH
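
Filled in with the concrete paths quoted in this thread (a sketch: the Studio lib path comes from the user's LD_LIBRARY_PATH below, and the ClusterTools lib directory is assumed to sit next to its bin directory):

  # Prepend the Sun Studio and Sun ClusterTools library directories.
  export LD_LIBRARY_PATH=/opt/sun/sunstudio12.1/lib:/opt/SUNWhpc/HPC8.2.1c/sun/lib:$LD_LIBRARY_PATH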

Re: [OMPI users] mpi problems,

2011-04-04 Thread Jeff Squyres
I don't know what libfui.so.1 is, but this FAQ entry may answer your question...? http://www.open-mpi.org/faq/?category=mpi-apps#override-wrappers-after-v1.0 On Apr 4, 2011, at 3:33 PM, Nehemiah Dacres wrote: > altering LD_LIBRARY_PATH alter's the process's path to mpi's libraries, how >

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
Altering LD_LIBRARY_PATH alters the process's path to MPI's libraries; how do I alter its path to compiler libs like libfui.so.1? It needs to find them because it was compiled by a Sun compiler. On Mon, Apr 4, 2011 at 10:06 AM, Nehemiah Dacres wrote: > > As Ralph indicated,

Re: [OMPI users] question about running openmpi with different interconnects

2011-04-04 Thread Jeff Squyres
On Apr 4, 2011, at 10:30 AM, Borenstein, Bernard S wrote: > We have added clusters with different interconnects and decided to build one > OPENMPI 1.4.3 version to handle all the possible interconnects > and run everywhere. I have two questions about this: > > 1 - is there a way for Openmpi

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Jeff Squyres
On Apr 4, 2011, at 10:18 AM, Rob Latham wrote: > What I see happening here is the OpenMPI finalize routine is deleting > attributes. One of those attributes is ROMIO's, which in turn tries > to free keyvals. Is the deadlock that nothing "under" ompi_attr_delete > can itself call ompi_*

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Dave Goodell
FWIW, we solved this problem with ROMIO in MPICH2 by making the "big global lock" a recursive mutex. In the past it was implicitly so because of the way that recursive MPI calls were handled. In current MPICH2 it's explicitly initialized with type PTHREAD_MUTEX_RECURSIVE instead. -Dave On

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
> As Ralph indicated, he'll add the hostname to the error message (but that > might be tricky; that error message is coming from rsh/ssh...). > > In the meantime, you might try (csh style): > > foreach host (`cat list`) > echo $host > ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted > end > >

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
That's an excellent suggestion. On Mon, Apr 4, 2011 at 9:45 AM, Jeff Squyres wrote: > As Ralph indicated, he'll add the hostname to the error message (but that > might be tricky; that error message is coming from rsh/ssh...). > > In the meantime, you might try (csh style): >

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-04 Thread Ralph Castain
Hmmm...yes, I guess we did get off-track then. This solution is exactly what I proposed in the first response to your thread, and it was repeated by others later on. :-/ So long as mpirun is executed on the node where the "sister mom" is located, and as long as your script "B" does -not- include an

Re: [OMPI users] mpi problems,

2011-04-04 Thread Ralph Castain
On Apr 4, 2011, at 8:42 AM, Nehemiah Dacres wrote: > you do realize that this is Sun Cluster Tools branch (it is a branch right? > or is it a *port* of openmpi to sun's compilers?) I'm not sure if your > changes made it into sunct 8.2.1 My point was that the error message currently doesn't

Re: [OMPI users] mpi problems,

2011-04-04 Thread Jeff Squyres
As Ralph indicated, he'll add the hostname to the error message (but that might be tricky; that error message is coming from rsh/ssh...). In the meantime, you might try (csh style): foreach host (`cat list`) echo $host ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted end On Apr 4, 2011,
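
The same check written as a bash loop, with ssh added so the ls runs on each remote node (the csh loop as quoted runs ls locally); a sketch, assuming the hostfile is still named 'list':

  # Confirm orted exists at the expected path on every node.
  for host in $(cat list); do
    echo "$host"
    ssh "$host" ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
  done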

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
You do realize that this is the Sun Cluster Tools branch (it is a branch, right? or is it a *port* of openmpi to Sun's compilers?). I'm not sure if your changes made it into Sun CT 8.2.1. On Mon, Apr 4, 2011 at 9:34 AM, Ralph Castain wrote: > Guess I can/will add the node name to the

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-04 Thread Jeff Squyres
On Apr 4, 2011, at 10:38 AM, Laurence Marks wrote: > Thanks, I think we may have a mistaken communication here; I assume > that the computer where they have disabled rsh and ssh they have > "something" to communicate with so we don't need to use pbsdsh. Clarification in terminology: -

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-04 Thread Laurence Marks
Thanks, I think we may have a miscommunication here; I assume that on the computer where they have disabled rsh and ssh, they have "something" to communicate with, so we don't need to use pbsdsh. If they don't, there is not much a lowly user like me can do. I think we can close this, since like

Re: [OMPI users] mpi problems,

2011-04-04 Thread Ralph Castain
Guess I can/will add the node name to the error message - should have been there before now. If it is a debug build, you can add "-mca plm_base_verbose 1" to the cmd line and get output tracing the launch and showing you what nodes are having problems. On Apr 4, 2011, at 8:24 AM, Nehemiah
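
On the command line from this thread, that would look something like the sketch below ('list' and 'ring2' are the hostfile and binary named elsewhere in the digest; the extra output only appears with a debug build, as noted above):

  # Trace the launch phase to see which nodes fail to start a daemon.
  /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -mca plm_base_verbose 1 -np 4 -hostfile list ring2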

[OMPI users] question about running openmpi with different interconnects

2011-04-04 Thread Borenstein, Bernard S
We have added clusters with different interconnects and decided to build one OPENMPI 1.4.3 version to handle all the possible interconnects and run everywhere. I have two questions about this: 1 - is there a way for Openmpi to print out the interconnect it selected to use at run time? I am
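
A common approach (hedged: these are standard MCA parameters, but confirm them against your build with 'ompi_info --param btl all'; the process count and executable are placeholders) is to raise BTL verbosity, or to restrict the run to specific interconnects so it fails loudly if they are unavailable:

  # Print which byte-transfer-layer (interconnect) components get selected at run time.
  mpirun -np 16 --mca btl_base_verbose 30 ./my_app

  # Or allow only loopback, shared memory, and InfiniBand, and nothing else.
  mpirun -np 16 --mca btl self,sm,openib ./my_app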

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Ralph Castain
On Apr 4, 2011, at 8:18 AM, Rob Latham wrote: > On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote: >> >> opal_mutex_lock(): Resource deadlock avoided >> #0 0x0012e416 in __kernel_vsyscall () >> #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 >> #2

Re: [OMPI users] mpi problems,

2011-04-04 Thread Nehemiah Dacres
I have installed it via a symlink on all of the nodes; I can run 'tentakel which mpirun' and it finds it. I'll check the library paths, but isn't there a way to find out which nodes are returning the error? On Thu, Mar 31, 2011 at 7:30 AM, Jeff Squyres wrote: > The error

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Rob Latham
On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote: > > opal_mutex_lock(): Resource deadlock avoided > #0 0x0012e416 in __kernel_vsyscall () > #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 > #2 0x01038e42 in abort () at abort.c:92 > #3 0x00d9da68 in

Re: [OMPI users] MPI-2 I/O functions (Open MPI 1.5.x on Windows)

2011-04-04 Thread Rob Latham
On Sat, Apr 02, 2011 at 09:07:55PM +0900, Satoi Ogawa wrote: > Dear Developers and Users, > > Thank you for your development of Open MPI. > > I want to use Open MPI 1.5.3 on Windows 7 32bit, one PC. > But there is something wrong with the part using MPI-2 I/O functions > in my program. > It

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Pascal Deveze
Why don't you use the command "mpirun" to run your MPI program? Pascal fa...@email.com wrote: Pascal Deveze wrote: > Could you check that your program closes all MPI-IO files before calling MPI_Finalize? Yes, I checked that. All files should be closed. I've also written a small test

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread fah10
Pascal Deveze wrote: > Could you check that your program closes all MPI-IO files before calling > MPI_Finalize? Yes, I checked that. All files should be closed. I've also written a small test program, which is attached below. The output refers to openmpi-1.5.3 with threading support,

Re: [OMPI users] openmpi/pbsdsh/Torque problem

2011-04-04 Thread Ralph Castain
I apologize - I realized late last night that I had a typo in my recommended command. It should read: mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1 Also, if you know that #procs <= #cores on your nodes,
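
Put together inside a Torque job script, the corrected line might look like the sketch below (m1 is the machinefile from this thread; the process count and executable are placeholders):

  # Launch the daemons via pbsdsh instead of rsh/ssh, and exclude the Torque (tm) RAS component.
  mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1 -np 8 ./my_app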

Re: [OMPI users] WRF run on multiple Nodes

2011-04-04 Thread John Hearns
On 2 April 2011 04:16, Ahsan Ali wrote: > Hello, > I want to run WRF on multiple nodes in a Linux cluster using openmpi; > giving the command mpirun -np 4 ./wrf.exe just submits it to a single node. > I don't know how to run it on other nodes as well. Help needed.

Re: [OMPI users] Deadlock with mpi_init_thread + mpi_file_set_view

2011-04-04 Thread Pascal Deveze
Could you check that your program closes all MPI-IO files before calling MPI_Finalize? fa...@email.com wrote: > Even inside MPICH2, I have given little attention to thread safety and > the MPI-IO routines. In MPICH2, each MPI_File* function grabs the big > critical section lock -- not

Re: [OMPI users] WRF run on multiple Nodes

2011-04-04 Thread Ahsan Ali
Dear David, I don't know where the machinefile is. I found the command "mpirun -np 168 -mca btl self,sm,openib --hostfile /home/demo/hostfile-ompi.14 -mca mpi_paffinity_alone 1 ~/WRFV3.2.1/run/wrf.exe" for running with Open MPI on a Dell PowerEdge M610 14-node cluster with Mellanox QDR InfiniBand
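
The machinefile/hostfile is not shipped with Open MPI; it is a plain text file you create yourself, one node per line. A minimal sketch (node names, slot counts, and the process count are placeholders):

  # Create a hostfile listing the compute nodes and the MPI slots on each.
  printf 'node01 slots=4\nnode02 slots=4\n' > ~/hostfile-ompi

  # Run WRF across the nodes listed in that hostfile.
  mpirun -np 8 --hostfile ~/hostfile-ompi ~/WRFV3.2.1/run/wrf.exe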