Hello. I'm getting some odd error messages in certain situations
associated with the btl components (happens with both 1.0.2 and 1.1).
When certain btl components are NOT loaded, Open MPI issues error
messages associated with those very components. For instance, consider
an application that
I am actually running the released 1.1. I can send you my code, if you want,
and you could try running it off a single node with -np 4 or 5
(oversubscribing) and see if you get a BUS_ADRALN error off one node. The only
restriction on compiling the code is that the X libs be available (display is
Brian Barrett wrote:
On Wed, 2006-06-28 at 09:43 -0400, Patrick Jessee wrote:
Hello. I've tracked down the source of the previously reported startup
problem with Open MPI 1.1. On startup, it fails with the messages:
mca_oob_tcp_accept: accept() failed with errno 9.
:
This didn't happen with 1.0.2.
The problem was resolved in the 1.1 series... so I didn't push any further.
Thanks!
On Wednesday, 28 June 2006 at 09:21, openmpi-user wrote:
> Hi Eric (and all),
>
> I don't know if this really messes things up, but you have set up LAM/MPI
> in your path variables, too:
>
> [enterprise:24786]
Yeah, bummer, but something tells me it might not be Open MPI's fault. Here's
why:
1- The tech that takes care of these machines told me that he gets RTC errors
on bootup (the CPU boards are apparently "out of sync" since the clocks aren't
set correctly).
2- There is also a possibility that the
Hello. I've tracked down the source of the previously reported startup
problem with Open MPI 1.1. On startup, it fails with the messages:
mca_oob_tcp_accept: accept() failed with errno 9.
:
This didn't happen with 1.0.2.
The trigger for this behavior is if standard input happens to be
Hi Eric (and all),
I don't know if this really messes things up, but you have set up LAM/MPI
in your path variables, too:
[enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH:
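(One quick way to spot a leftover LAM entry in a path variable, sketched as a small shell helper; the example path value below is hypothetical:)

```shell
# Print any entries of a colon-separated path variable that mention "lam".
# A stale LAM/MPI entry ahead of the Open MPI one can cause this kind of mixup.
find_lam_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i lam
}

# Example with a hypothetical PATH value; in practice pass "$PATH"
# and "$LD_LIBRARY_PATH".
find_lam_entries "/opt/lam-7.1.2/bin:/usr/local/openmpi/bin:/usr/bin"
```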
(this thread started as a LAM question
[http://www.lam-mpi.org/MailArchives/lam/2006/06/12497.php], and one
message contained an Open MPI question, so I took the liberty of moving
it to the OMPI user's list)
> As for openmpi, I get a lot of messages like this
>
> global_ssi(1441) malloc: ***
If I mpirun the MPI application ('hello world') on a single computer (dual core)
itself, it works. But it is not successful when I mpirun it across multiple
nodes. The rsh/ssh agent works; I can rsh/ssh to other nodes. Every time I
mpirun 'hostname', the remote rsh/ssh agent asks for the
Bummer! :-(
Just to be sure -- you had a clean config.cache file before you ran configure,
right? (e.g., the file didn't exist -- just to be sure it didn't get
potentially erroneous values from a previous run of configure) Also, FWIW,
it's not necessary to specify
leton (MCA v1.0, API v1.0, Component v1.1)
Enclosed you'll find the config.log.
Yours,
Frank
---
A common problem that I have seen is that all nodes in the cluster may
not be configured identically. For example, can you confirm that eth1
is your gigE interface on all nodes? It might have accidentally been
configured to be your IPoIB interface on some nodes.
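(That consistency check can be sketched as a small shell helper. In practice you would collect each node's answer with something like `ssh node "ip -o -4 addr show eth1"`; the interface names below are hypothetical examples:)

```shell
# Return success only if every node reported the same value
# (e.g. the same interface name or address for the gigE role).
all_same() {
  first=$1
  for v in "$@"; do
    [ "$v" = "$first" ] || return 1
  done
  return 0
}

# Hypothetical answers from three nodes: two say eth1, one says ib0,
# which is exactly the accidental-IPoIB misconfiguration described above.
if all_same eth1 eth1 ib0; then
  echo "interfaces consistent"
else
  echo "MISMATCH: check the interface configuration on each node"
fi
```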
If that's not the case, let us
Can you provide a little more information?
What exactly are you trying to mpirun across multiple nodes? Is it an MPI
application or a non-MPI application? For example, can you mpirun "hostname"
(i.e., the Unix hostname utility) across multiple nodes successfully?
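(For example, a check along these lines, with hypothetical host names and assuming Open MPI's mpirun is on the PATH:)

```shell
# Launch the plain Unix hostname utility -- not an MPI program -- on two
# nodes. If this asks for a password, fix passwordless rsh/ssh first; if it
# works but your MPI application does not, look at the MPI layer instead.
mpirun -np 2 --host node1,node2 hostname
```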
If you're trying to mpirun