Hi,
I am seeing problems with a small linux cluster when running OpenMPI
jobs. The error message I get is:
[frontend][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=110
Following the FAQ, I looked to see what this error code corresponds to:
$ perl -e 'die$!=110'
Connection timed out at -e line 1.
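(For the archives: the same errno lookup can be done with Python's standard library, which also gives the symbolic name; on Linux, errno 110 is ETIMEDOUT. This is just an illustrative alternative to the perl one-liner.)

```python
import errno
import os

# ETIMEDOUT is the symbolic name for errno 110 on Linux.
# os.strerror() turns the number into the human-readable message.
code = errno.ETIMEDOUT
print(code, os.strerror(code))
```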
Hi Adrian,
On 11/06/07, Adrian Knoth wrote:
Which OMPI version?
1.2.2
> $ perl -e 'die$!=110'
> Connection timed out at -e line 1.
Looks pretty much like a routing issue. Can you sniff on eth1 on the
frontend node?
I don't have root access, so I'm afraid that's not possible.
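(A rough routing check that needs no root privileges: asking the kernel which local address it would use to reach a given node. This is a generic diagnostic sketch, not something from the thread; the peer address is whatever compute node you are testing against.)

```python
import socket

def local_ip_toward(peer: str) -> str:
    """Return the local source address the routing table would pick
    to reach `peer`. A UDP connect() sends no packets; it only sets
    the default destination, so no privileges are needed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((peer, 9))  # port 9 (discard); nothing is transmitted
        return s.getsockname()[0]
    finally:
        s.close()

# Run on the frontend with a compute node's address: if the printed
# address belongs to eth0 rather than eth1, the kernel is routing
# toward the cluster over the wrong interface.
print(local_ip_toward("127.0.0.1"))
```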
On 12/06/07, George Bosilca wrote:
Jonathan,
It will be difficult to make it work in this configuration. The problem
is that on the head node the network interface that has to be used is
eth1, while on the compute nodes it is eth0. Therefore, the
btl_tcp_if_include parameter will not help.
On 12/06/07, George Bosilca <bosi...@cs.utk.edu> wrote:
Jonathan Underwood wrote:
> Presumably switching the two interfaces on the frontend (eth0<->eth1)
> would also solve this problem?
>
If you have root privileges this seems to be a another good approach.
I don't.
Thanks Adrian - that's a useful suggestion, I'll explore that.
Jonathan.
On 18/08/06, Brian Barrett <brbar...@open-mpi.org> wrote:
On Aug 17, 2006, at 4:43 PM, Jonathan Underwood wrote:
> Compiling an mpi program with gcc options -pedantic -Wall gives the
> following warning:
>
> mpi.h:147: warning: ISO C90 does not support 'long long'