[OMPI users] error messages for btl components that aren't loaded

2006-06-28 Thread Patrick Jessee
Hello. I'm getting some odd error messages in certain situations associated with the btl components (happens with both 1.0.2 and 1.1). When certain btl components are NOT loaded, Open MPI issues error messages associated with those very components. For instance, consider an application that

Re: [OMPI users] Re : OpenMPI 1.1: Signal:10, info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN)

2006-06-28 Thread Eric Thibodeau
I am actually running the released 1.1. I can send you my code, if you want, and you could try running it off a single node with -np 4 or 5 (oversubscribing) and see if you get a BUS_ADRALN error off one node. The only restriction to compiling the code is that X libs be available (display is

Re: [OMPI users] Openmpi 1.1: startup problem caused by file descriptor state

2006-06-28 Thread Patrick Jessee
Brian Barrett wrote: On Wed, 2006-06-28 at 09:43 -0400, Patrick Jessee wrote: Hello. I've tracked down the source of the previously reported startup problem with Openmpi 1.1. On startup, it fails with the messages: mca_oob_tcp_accept: accept() failed with errno 9. : This didn't

Re: [OMPI users] users Digest, Vol 317, Issue 4

2006-06-28 Thread Eric Thibodeau
The problem was resolved in the 1.1 series...so I didn't push any further. Thanks! On Wednesday 28 June 2006 09:21, openmpi-user wrote: > Hi Eric (and all), > > don't know if this really messes things up, but you have set up lam-mpi > in your path-variables, too: > > [enterprise:24786]

Re: [OMPI users] Installing OpenMPI on a solaris

2006-06-28 Thread Eric Thibodeau
Yeah, bummer, but something tells me it might not be OpenMPI's fault. Here's why: 1- The tech that takes care of these machines told me that he gets RTC errors on bootup (the CPU boards are apparently "out of sync" since the clocks aren't set correctly). 2- There is also a possibility that the

[OMPI users] Openmpi 1.1: startup problem caused by file descriptor state

2006-06-28 Thread Patrick Jessee
Hello. I've tracked down the source of the previously reported startup problem with Openmpi 1.1. On startup, it fails with the messages: mca_oob_tcp_accept: accept() failed with errno 9. : This didn't happen with 1.0.2. The trigger for this behavior is if standard input happens to be

Re: [OMPI users] users Digest, Vol 317, Issue 4

2006-06-28 Thread openmpi-user
Hi Eric (and all), don't know if this really messes things up, but you have set up lam-mpi in your path-variables, too: [enterprise:24786] pls:rsh: reset LD_LIBRARY_PATH:

[OMPI users] FW: mpi_allreduce error

2006-06-28 Thread Jeff Squyres (jsquyres)
(this thread started as a LAM question [http://www.lam-mpi.org/MailArchives/lam/2006/06/12497.php], and one message contained an Open MPI question, so I took the liberty of moving it to the OMPI user's list) > As for openmpi, I get a lot of messages like this > > global_ssi(1441) malloc: ***

Re: [OMPI users] rsh/ssh is work but mpirun hang ?

2006-06-28 Thread shen T.T.
If I mpirun the MPI application 'hello world' on a single computer (dual core), it works. But it doesn't succeed when I mpirun it across multiple nodes. The rsh/ssh agent works; I can rsh/ssh to other nodes. Every time I mpirun 'hostname', the remote rsh/ssh agent asks for the

Re: [OMPI users] Installing OpenMPI on a solaris

2006-06-28 Thread Jeff Squyres (jsquyres)
Bummer! :-( Just to be sure -- you had a clean config.cache file before you ran configure, right? (e.g., the file didn't exist -- just to be sure it didn't get potentially erroneous values from a previous run of configure) Also, FWIW, it's not necessary to specify

Re: [OMPI users] OpenMPI 1.1: Signal:10 info.si_errno:0(Unknown, error: 0), si_code:1(BUS_ADRALN) (Terry D. Dontje)

2006-06-28 Thread openmpi-user
leton (MCA v1.0, API v1.0, Component v1.1) Enclosed you'll find the config.log. Yours, Frank -- next part -- An embedded and charset-unspecified text was scrubbed... Name: config.log Url: http://www.open-mpi.org/MailArchives/users/attachments/20060628/a640acf1/config.pl

Re: [OMPI users] Fw: OpenMPI version 1.1

2006-06-28 Thread Jeff Squyres (jsquyres)
A common problem that I have seen is that all nodes in the cluster may not be configured identically. For example, can you confirm that eth1 is your gigE interface on all nodes? It might have accidentally been configured to be your IPoIB interface on some nodes. If that's not the case, let us

Re: [OMPI users] rsh/ssh is work but mpirun hang ?

2006-06-28 Thread Jeff Squyres (jsquyres)
Can you provide a little more information? What exactly are you trying to mpirun across multiple nodes? Is it an MPI application or a non-MPI application? For example, can you mpirun "hostname" (i.e., the Unix hostname utility) across multiple nodes successfully? If you're trying to mpirun