To echo what Josh said, there are no special compile flags being used.
If you send me a patch with debug output, I'd be happy to run it for you.
Both odin and sif are fairly normal Linux-based clusters, with Ethernet
and openib IP networks. The Ethernet network has both IPv4 and IPv6, and
the op
On Fri, Apr 18, 2008 at 01:00:40PM -0400, Josh Hursey wrote:
> The trick is to force Open MPI to use only tcp,self and nothing else.
> Did you try adding this (-mca btl tcp,self) to the runtime parameter
> set?
Sure. Even with 64 processes, I cannot trigger this behaviour. Neither
on Linux no
I'm seeing this problem as well even running just 4 processes on a
single node (though not as frequently as with higher process counts).
The trick is to force Open MPI to use only tcp,self and nothing else.
Did you try adding this (-mca btl tcp,self) to the runtime parameter
set?
-- Josh
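
For anyone trying to reproduce this, the suggestion above might look roughly like the following on the command line. This is only a sketch: the binary name ./mpi_test is a placeholder that does not come from this thread, and the process count of 4 is taken from the single-node case mentioned above.

```shell
# Sketch only: ./mpi_test is a hypothetical test binary, not named in this thread.
NP=4                   # process count from the single-node case described above
BTL_LIST="tcp,self"    # restrict Open MPI to the tcp and self BTLs, nothing else
CMD="mpirun -np ${NP} -mca btl ${BTL_LIST} ./mpi_test"
echo "${CMD}"          # print the command line; run it with: eval "${CMD}"
```

Raising NP should make the failure show up more often, going by the report above that it is less frequent at low process counts.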
On Fri, Apr 18, 2008 at 08:04:17AM -0400, Tim Prins wrote:
> Hi Adrian,
Hi!
> After this change, I am getting a lot of errors of the form:
> [sif2][[12854,1],9][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by
> peer (104)
>
> See for instanc
Hi Adrian,
After this change, I am getting a lot of errors of the form:
[sif2][[12854,1],9][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed: Connection reset by
peer (104)
See for instance: http://www.open-mpi.org/mtt/index.php?do_redir=615
I have found this espe