Hello. I'm getting some odd error messages in certain situations
associated with the btl components (this happens with both 1.0.2 and 1.1).
When certain btl components are NOT loaded, Open MPI issues error
messages associated with those very components. For instance, consider
an application that
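For context, which BTLs get used is normally controlled with MCA parameters on
the mpirun command line. A minimal sketch, with a placeholder application name
and assuming the tcp and self components are built:

  # Run using only the TCP and self BTLs:
  mpirun -np 4 --mca btl tcp,self ./my_app

  # Or exclude a single BTL with the "^" prefix (e.g. openib, if present):
  mpirun -np 4 --mca btl ^openib ./my_app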
I am actually running the released 1.1. I can send you my code, if you want,
and you could try running it off a single node with -np 4 or 5
(oversubscribing) and see if you get a BUS_ADRALN error off one node. The only
restriction on compiling the code is that the X libs be available (display is not
Well, I've been using the trunk and not 1.1. I also just built 1.1.1a1r10538
and ran it with no bus error. However, you are running 1.1b5r10421, so we're
not yet running the same thing.
I have a cluster of two V440s that have 4 CPUs each, running Solaris 10. The
tests I am running are np
Terry,
I was about to comment on this. Could you tell me the specs of your
machine? As you will notice in "my thread", I am running into problems on SPARC
SMP systems where the CPU boards' RTCs are in a doubtful state. Are you running
1.1 on SMP machines? If so, on how many procs and wh
Frank,
Can you set your limit coredumpsize to non-zero, rerun the program,
and then get the stack via dbx?
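Something along these lines should do it under csh/tcsh on Solaris (the
application name is just a placeholder):

  # Allow a core file to be written:
  limit coredumpsize unlimited

  # Rerun the failing case so it dumps core:
  mpirun -np 4 ./my_app

  # Load the program and the core file into dbx...
  dbx ./my_app core
  # ...and at the (dbx) prompt print the stack with:
  #   where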
So, I have a similar case of BUS_ADRALN on SPARC systems with an
older version (June 21st) of the trunk. I've since run using the latest
trunk and the bus error went away. I am now going to try
Brian Barrett wrote:
On Wed, 2006-06-28 at 09:43 -0400, Patrick Jessee wrote:
Hello. I've tracked down the source of the previously reported startup
problem with Open MPI 1.1. On startup, it fails with the messages:
mca_oob_tcp_accept: accept() failed with errno 9.
:
This didn't happen with 1.0.2.
The problem was resolved in the 1.1 series... so I didn't push any further.
Thanks!
On Wednesday, June 28, 2006 at 09:21, openmpi-user wrote:
> Hi Eric (and all),
>
> I don't know if this really messes things up, but you have set up LAM/MPI
> in your path variables, too:
>
> [enterprise:24786] pls:
On Wed, 2006-06-28 at 09:43 -0400, Patrick Jessee wrote:
> Hello. I've tracked down the source of the previously reported startup
> problem with Open MPI 1.1. On startup, it fails with the messages:
>
> mca_oob_tcp_accept: accept() failed with errno 9.
> :
>
> This didn't happen with 1.0.2.
Yeah, bummer, but something tells me it might not be Open MPI's fault. Here's
why:
1- The tech who takes care of these machines told me that he gets RTC errors
on bootup (the CPU boards are apparently "out of sync" since the clocks aren't
set correctly).
2- There is also a possibility that the p
Hello. I've tracked down the source of the previously reported startup
problem with Open MPI 1.1. On startup, it fails with the messages:
mca_oob_tcp_accept: accept() failed with errno 9.
:
This didn't happen with 1.0.2.
The trigger for this behavior is if standard input happens to be closed
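A minimal way to reproduce that trigger, assuming a Bourne-style shell and a
placeholder application name, might be:

  # 0<&- closes standard input before mpirun starts;
  # errno 9 is EBADF, i.e. a bad file descriptor.
  mpirun -np 2 ./my_app 0<&-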
Hi Eric (and all),
I don't know if this really messes things up, but you have set up LAM/MPI
in your path variables, too:
[enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH:
/export/lca/home/lca0/etudiants/ac38820/openmpi_sun4u/lib:/export/lca/appl/Forte/SUNWspro/WS6U2/lib:/usr/local/lib:*/usr/l
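A quick way to check which MPI installation is actually being picked up, using
only standard shell commands:

  # Which mpirun is first in the PATH?
  which mpirun

  # Look for stray LAM/MPI entries in the relevant variables:
  echo $PATH            | tr ':' '\n' | grep -i lam
  echo $LD_LIBRARY_PATH | tr ':' '\n' | grep -i lam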
This *may* be due to stdio blocking issues (e.g., not getting the
password/passphrase to ssh properly, so the application never actually launches
on the remote node).
The first thing I would do is find out why you are getting prompted for a
password. Open MPI requires that you are not prompted for a password when
launching processes on remote nodes.
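One common way to set that up, sketched with standard OpenSSH commands
("remotenode" is a placeholder host name; adjust key type and paths to your
site's policy):

  # Generate a key pair (use an empty passphrase, or use ssh-agent instead):
  ssh-keygen -t rsa

  # Append the public key to authorized_keys on each remote node:
  cat ~/.ssh/id_rsa.pub | ssh remotenode 'cat >> ~/.ssh/authorized_keys'

  # Verify: this should print the remote host name with no prompt.
  ssh remotenode hostname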
(this thread started as a LAM question
[http://www.lam-mpi.org/MailArchives/lam/2006/06/12497.php], and one
message contained an Open MPI question, so I took the liberty of moving
it to the OMPI users' list)
> As for openmpi, I get a lot of messages like this
>
> global_ssi(1441) malloc: *** Dea
If I mpirun the MPI application ('hello world') on a single computer (dual core)
by itself, it works. But it does not succeed when I mpirun it across multiple
nodes. The rsh/ssh agent works; I can rsh/ssh to other nodes. Every time I
mpirun 'hostname', the remote rsh/ssh agent asks for the password.
Bummer! :-(
Just to be sure -- you had a clean config.cache file before you ran configure,
right? (e.g., the file didn't exist -- just to be sure it didn't get
potentially erroneous values from a previous run of configure) Also, FWIW,
it's not necessary to specify --enable-ltdl-convenience;
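For reference, a minimal clean-rebuild sequence (the install prefix is just an
example):

  # Make sure there is no stale configure cache:
  rm -f config.cache
  ./configure --prefix=/opt/openmpi-1.1
  make all install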
, API v1.0, Component v1.1)
Enclosed you'll find the config.log.
Yours,
Frank
-- next part --
An embedded and charset-unspecified text was scrubbed...
Name: config.log
Url:
http://www.open-mpi.org/MailArchives/users/attachments/20060628/a640acf1/config.pl
A common problem that I have seen is that all nodes in the cluster may
not be configured identically. For example, can you confirm that eth1
is your gigE interface on all nodes? It might have accidentally been
configured to be your IPoIB interface on some nodes.
If that's not the case, let us k
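One way to check this, and then to pin the TCP BTL to the intended interface
once you know which one it is (node names and the application name are
placeholders):

  # Confirm what eth1 is on every node:
  for n in node01 node02 node03; do
      echo "== $n =="
      ssh $n /sbin/ifconfig eth1
  done

  # If needed, tell the TCP BTL to use only the GigE interface:
  mpirun -np 8 --mca btl_tcp_if_include eth1 ./my_app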
Can you provide a little more information?
What exactly are you trying to mpirun across multiple nodes? Is it an MPI
application or a non-MPI application? For example, can you mpirun "hostname"
(i.e., the Unix hostname utility) across multiple nodes successfully?
If you're trying to mpirun
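For example, a non-MPI sanity check with placeholder host names could look
like this:

  # Run the plain Unix "hostname" utility on two nodes; each node's name
  # should be printed, with no password prompt along the way.
  mpirun -np 2 --host node1,node2 hostname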
Hi!
I've recently updated to Open MPI 1.1 on a few nodes and am running into a
problem that wasn't there with Open MPI 1.0.2.
Submitting a job to Xgrid with Open MPI 1.1 yields a Bus error that
isn't there when the job is not submitted to Xgrid:
[g5dual:/Network/CFD/MVH-1.0] motte% mpirun -