[OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood
Hi, I am seeing problems with a small Linux cluster when running Open MPI jobs. The error message I get is: [frontend][0,1,0][btl_tcp_endpoint.c:572:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=110 Following the FAQ, I looked to see what this error code corresponds to: $
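
For readers who want to repeat the errno lookup described in the message above, here is a minimal sketch in Python. It is not part of the original thread: the helper name describe_errno is hypothetical, and the mapping from number to message is platform-specific, so the result for 110 depends on the system where it is run.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and system message for an errno value.

    Hypothetical helper for illustration only. Errno numbering differs
    between operating systems, so run this on the machine that reported
    the error.
    """
    # errno.errorcode maps numeric codes to names such as 'ETIMEDOUT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message for that code.
    return f"{code} {name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 110 is the value reported by connect() in the error message above.
    print(describe_errno(110))
```

Running the lookup on the node that printed the error avoids relying on another platform's numbering.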

Re: [OMPI users] TCP connection errors

2007-06-11 Thread Jonathan Underwood
Hi Adrian, On 11/06/07, Adrian Knoth wrote: Which OMPI version? 1.2.2 > $ perl -e 'die$!=110' > Connection timed out at -e line 1. Looks pretty much like a routing issue. Can you sniff on eth1 on the frontend node? I don't have root access, so am afraid

Re: [OMPI users] TCP connection errors

2007-06-12 Thread Jonathan Underwood
On 12/06/07, George Bosilca wrote: Jonathan, It will be difficult to make it work in this configuration. The problem is that on the head node the network interface that has to be used is eth1, while on the compute nodes it is eth0. Therefore, the tcp_if_include will not help

Re: [OMPI users] TCP connection errors

2007-06-12 Thread Jonathan Underwood
On 12/06/07, George Bosilca <bosi...@cs.utk.edu> wrote: Jonathan Underwood wrote: > Presumably switching the two interfaces on the frontend (eth0<->eth1) > would also solve this problem? > If you have root privileges this seems to be another good approach. I don't, but w

Re: [OMPI users] TCP connection errors

2007-06-13 Thread Jonathan Underwood
Thanks Adrian - that's a useful suggestion, I'll explore that. Jonathan.

Re: [OMPI users] mpi.h - not conforming to C90 spec

2006-08-17 Thread Jonathan Underwood
On 18/08/06, Brian Barrett <brbar...@open-mpi.org> wrote: On Aug 17, 2006, at 4:43 PM, Jonathan Underwood wrote: > Compiling an mpi program with gcc options -pedantic -Wall gives the > following warning: > > mpi.h:147: warning: ISO C90 does not support 'long lon