You might want to upgrade to 1.10.1, or at least to 1.8.8, as 1.6.5 is pretty old.
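
If it helps to confirm which version a given job is actually running, here is a minimal sketch that pulls the version out of the build path shown in the error log and compares it with 1.8.8. It assumes the path in the log reflects the Open MPI build in use; the helper name `openmpi_version` and the hard-coded threshold are illustrative only and not part of Open MPI.

```python
import re

def openmpi_version(text):
    """Return the first 'openmpi-X.Y.Z' version found in text as a tuple of ints, or None."""
    m = re.search(r"openmpi-(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(p) for p in m.groups()) if m else None

# Line taken from the error output quoted below (joined onto one line).
log_line = ("[n047:25850] [[36630,0],1] ORTE_ERROR_LOG: Unreachable in file "
            "../../../../../openmpi-1.6.5/orte/mca/grpcomm/bad/grpcomm_bad_module.c "
            "at line 412")

found = openmpi_version(log_line)
# Tuples compare element by element, so (1, 6, 5) < (1, 8, 8).
needs_upgrade = found is not None and found < (1, 8, 8)
print(found, needs_upgrade)
```

Comparing integer tuples rather than raw strings avoids ordering "1.10.1" before "1.8.8".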

> On Nov 26, 2015, at 1:49 PM, Grigory Shamov <grigory.sha...@umanitoba.ca> 
> wrote:
> 
> Hi All,
> 
> For a parallel MPI job, we sometimes (not always) get the following
> message:
> 
> [n047:25850] [[36630,0],1] -> [[36630,0],0] (node: n230) oob-tcp: Number
> of attempts to create TCP connection has been exceeded.  Can not
> communicate with peer
> [n047:25850] [[36630,0],1] ORTE_ERROR_LOG: Unreachable in file
> ../../../../../openmpi-1.6.5/orte/mca/grpcomm/bad/grpcomm_bad_module.c at
> line 412
> [n047:25850] [[36630,0],1] -> [[36630,0],0] (node: n230) oob-tcp: Number
> of attempts to create TCP connection has been exceeded.  Can not
> communicate with peer
> 
> These appear in the middle of a running job; we use OpenMPI 1.6.5 and OFED
> 2.4 on CentOS 6.  
> 
> -- 
> Grigory Shamov
> HPC Analyst,
> Westgrid/ComputeCanada Site Lead
> University of Manitoba
> E2-588 EITC Building,
> (204) 474-9625
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/11/28113.php
