Any error messages? Maybe the nodes ran out of memory? I know MPI
implementations do some buffering under the hood, so even though you're
sending arrays over 2^26 in size, MPI may need more memory than that to
actually send them.
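For context, a minimal sketch of the kind of transfer being discussed (my
own illustration, not the poster's code; the element type and exact count
are assumptions, and it needs at least two ranks):

/* Minimal repro sketch: send a 2^26-element buffer between two ranks.
 * At this size most MPI implementations switch to a rendezvous protocol
 * and may need internal buffer space beyond the user buffer itself. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1 << 26;    /* 2^26 elements (assumed datatype: double) */
    int rank;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc((size_t)n * sizeof(double));   /* ~512 MB per rank */
    if (rank == 0)
        MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    free(buf);
    MPI_Finalize();
    return 0;
}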
On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico
libfui.so is a library that is part of the Solaris Studio Fortran tools. It
should be located under lib in the directory where your Solaris Studio
compilers are installed. So one question is whether you actually have Studio
Fortran installed on all your nodes.
--td
On 04/04/2011 04:02 PM, Ralph
What does 'ldd ring2' show? How was it compiled?
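(A quick aside: 'ldd ring2 | grep fui' is a fast way to see whether the
libfui dependency resolves; an unresolved library shows up as "not found".)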
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote:
[jian@therock ~]$ echo $LD_LIBRARY_PATH
/opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-
Well, where is libfui located? Is that location in your ld path? Is the lib
present on all nodes in your hostfile?
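One quick way to check is a loop along the lines of the csh one suggested
elsewhere in this thread (the library path below is a guess based on the
LD_LIBRARY_PATH shown above):

foreach host (`cat list`)
    echo $host
    ssh $host ls -l /opt/sun/sunstudio12.1/lib/libfui.so.1
end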
On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote:
> [jian@therock ~]$ echo $LD_LIBRARY_PATH
>
[jian@therock ~]$ echo $LD_LIBRARY_PATH
/opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64:/home/jian/.crlibs:/home/jian/.crlibs32
[jian@therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -np 4 -hostfile list ring2
ring2: error while loading
Hi,
Try prepending the path to your compiler libraries.
Example (bash-like):
export LD_LIBRARY_PATH=/compiler/prefix/lib:/ompi/prefix/lib:$LD_LIBRARY_PATH
--
Samuel K. Gutierrez
Los Alamos National Laboratory
On Apr 4, 2011, at 1:33 PM, Nehemiah Dacres wrote:
altering LD_LIBRARY_PATH
I don't know what libfui.so.1 is, but this FAQ entry may answer your
question...?
http://www.open-mpi.org/faq/?category=mpi-apps#override-wrappers-after-v1.0
On Apr 4, 2011, at 3:33 PM, Nehemiah Dacres wrote:
> Altering LD_LIBRARY_PATH alters the process's path to MPI's libraries, but how
>
Altering LD_LIBRARY_PATH alters the process's path to MPI's libraries, but how
do I alter its path to compiler libs like libfui.so.1? It needs to find them
because it was compiled by a Sun compiler.
On Mon, Apr 4, 2011 at 10:06 AM, Nehemiah Dacres wrote:
>
> As Ralph indicated,
On Apr 4, 2011, at 10:30 AM, Borenstein, Bernard S wrote:
> We have added clusters with different interconnects and decided to build one
> Open MPI 1.4.3 version to handle all the possible interconnects
> and run everywhere. I have two questions about this:
>
> 1 - is there a way for Open MPI
On Apr 4, 2011, at 10:18 AM, Rob Latham wrote:
> What I see happening here is that the Open MPI finalize routine is deleting
> attributes. One of those attributes is ROMIO's, which in turn tries
> to free keyvals. Is the deadlock that nothing "under" ompi_attr_delete
> can itself call ompi_*
FWIW, we solved this problem with ROMIO in MPICH2 by making the "big global
lock" a recursive mutex. In the past it was implicitly recursive because of
the way that recursive MPI calls were handled. In current MPICH2 it's
explicitly initialized with type PTHREAD_MUTEX_RECURSIVE instead.
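(For anyone following along, a minimal sketch of an explicitly recursive
pthread mutex; this is an illustration, not MPICH2's actual source:)

#include <pthread.h>

/* Hypothetical stand-in for the "big global lock" discussed above. */
static pthread_mutex_t big_lock;

static void big_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* A recursive mutex lets the same thread lock it again (e.g. when an
     * attribute-delete callback re-enters the MPI library) without
     * deadlocking, provided each lock is matched by an unlock. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&big_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}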
-Dave
On
> As Ralph indicated, he'll add the hostname to the error message (but that
> might be tricky; that error message is coming from rsh/ssh...).
>
> In the meantime, you might try (csh style):
>
> foreach host (`cat list`)
>     echo $host
>     ssh $host ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
> end
>
>
that's an excellent suggestion
On Mon, Apr 4, 2011 at 9:45 AM, Jeff Squyres wrote:
> As Ralph indicated, he'll add the hostname to the error message (but that
> might be tricky; that error message is coming from rsh/ssh...).
>
> In the meantime, you might try (csh style):
>
Hmmm...yes, I guess we did get off-track then. This solution is exactly what I
proposed in the first response to your thread, and was repeated by others later
on. :-/
So long as mpirun is executed on the node where the "sister mom" is located,
and as long as your script "B" does -not- include an
On Apr 4, 2011, at 8:42 AM, Nehemiah Dacres wrote:
> You do realize that this is the Sun Cluster Tools branch (it is a branch,
> right? Or is it a *port* of Open MPI to Sun's compilers?) I'm not sure if
> your changes made it into Sun CT 8.2.1.
My point was that the error message currently doesn't
As Ralph indicated, he'll add the hostname to the error message (but that might
be tricky; that error message is coming from rsh/ssh...).
In the meantime, you might try (csh style):
foreach host (`cat list`)
    echo $host
    ssh $host ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
end
On Apr 4, 2011,
You do realize that this is the Sun Cluster Tools branch (it is a branch,
right? Or is it a *port* of Open MPI to Sun's compilers?) I'm not sure if
your changes made it into Sun CT 8.2.1.
On Mon, Apr 4, 2011 at 9:34 AM, Ralph Castain wrote:
> Guess I can/will add the node name to the
On Apr 4, 2011, at 10:38 AM, Laurence Marks wrote:
> Thanks, I think we may have a miscommunication here; I assume
> that on the computer where they have disabled rsh and ssh, they have
> "something" to communicate with, so we don't need to use pbsdsh.
Clarification in terminology:
-
Thanks, I think we may have a miscommunication here; I assume
that on the computer where they have disabled rsh and ssh, they have
"something" to communicate with, so we don't need to use pbsdsh. If
they don't, there is not much a lowly user like me can do.
I think we can close this, since like
Guess I can/will add the node name to the error message - should have been
there before now.
If it is a debug build, you can add "-mca plm_base_verbose 1" to the cmd line
and get output tracing the launch and showing you what nodes are having
problems.
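For example, combined with the mpirun invocation from earlier in the thread:

/opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun -mca plm_base_verbose 1 -np 4 -hostfile list ring2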
On Apr 4, 2011, at 8:24 AM, Nehemiah
We have added clusters with different interconnects and decided to build one
Open MPI 1.4.3 version to handle all the possible interconnects
and run everywhere. I have two questions about this:
1 - is there a way for Open MPI to print out the interconnect it selected to use
at run time? I am
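(As an aside, one way to see which transport was selected is to raise the
BTL verbosity, e.g.:

mpirun -mca btl_base_verbose 30 -np 4 ./a.out

The verbosity level and './a.out' are illustrative; they are not from this
thread.)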
On Apr 4, 2011, at 8:18 AM, Rob Latham wrote:
> On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote:
>>
>> opal_mutex_lock(): Resource deadlock avoided
>> #0 0x0012e416 in __kernel_vsyscall ()
>> #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #2
I have installed it via a symlink on all of the nodes; I can run 'tentakel
which mpirun' and it finds it. I'll check the library paths, but isn't there
a way to find out which nodes are returning the error?
On Thu, Mar 31, 2011 at 7:30 AM, Jeff Squyres wrote:
> The error
On Sat, Apr 02, 2011 at 04:59:34PM -0400, fa...@email.com wrote:
>
> opal_mutex_lock(): Resource deadlock avoided
> #0 0x0012e416 in __kernel_vsyscall ()
> #1 0x01035941 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #2 0x01038e42 in abort () at abort.c:92
> #3 0x00d9da68 in
On Sat, Apr 02, 2011 at 09:07:55PM +0900, Satoi Ogawa wrote:
> Dear Developers and Users,
>
> Thank you for your development of Open MPI.
>
> I want to use Open MPI 1.5.3 on Windows 7 32bit, one PC.
> But there is something wrong with the part using MPI-2 I/O functions
> in my program.
> It
Why don't you use the command "mpirun" to run your MPI program?
Pascal
fa...@email.com wrote:
Pascal Deveze wrote:
> Could you check that your program closes all MPI-IO files before
> calling MPI_Finalize?
Yes, I checked that. All files should be closed. I've also written a
small test
Pascal Deveze wrote:
> Could you check that your program closes all MPI-IO files before calling
> MPI_Finalize?
Yes, I checked that. All files should be closed. I've also written a small
test program, which is attached below. The output refers to openmpi-1.5.3
with threading support,
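(The attachment itself is not preserved in this digest. A minimal sketch of
such a test, under my own assumptions about file name and access pattern:)

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank;

    /* Each rank writes its rank number at its own offset. */
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * (MPI_Offset)sizeof(int),
                      &value, 1, MPI_INT, MPI_STATUS_IGNORE);

    /* The file is closed before MPI_Finalize, as Pascal asked about. */
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}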
I apologize - I realized late last night that I had a typo in my recommended
command. It should read:
mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1
Also, if you know that #procs <= #cores on your nodes,
On 2 April 2011 04:16, Ahsan Ali wrote:
> Hello,
> I want to run WRF on multiple nodes in a Linux cluster using Open MPI, but
> giving the command 'mpirun -np 4 ./wrf.exe' just submits it to a single node.
> I don't know how to run it on the other nodes as well. Help needed.
Could you check that your program closes all MPI-IO files before
calling MPI_Finalize?
fa...@email.com wrote:
> Even inside MPICH2, I have given little attention to thread safety and
> the MPI-IO routines. In MPICH2, each MPI_File* function grabs the big
> critical section lock -- not
Dear David,
I don't know where the machinefile is. I found a command "Running with
Open MPI: mpirun -np 168 -mca btl self,sm,openib --hostfile
/home/demo/hostfile-ompi.14 -mca mpi_paffinity_alone 1
~/WRFV3.2.1/run/wrf.exe" for a Dell PowerEdge M610 14-node cluster
with Mellanox QDR InfiniBand
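(As an aside: a machinefile/hostfile is just a plain text file you create
yourself, one node per line. A minimal sketch, with hypothetical hostnames
and slot counts:

node01 slots=8
node02 slots=8

mpirun -np 16 --hostfile ./my_hostfile ~/WRFV3.2.1/run/wrf.exe)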