Try running:

which mpirun
ssh cl2n022 which mpirun
ssh cl2n010 which mpirun

and

ldd your_mpi_executable
ssh cl2n022 ldd your_mpi_executable
ssh cl2n010 ldd your_mpi_executable

Compare the results and ensure that you're finding the same mpirun on all 
nodes, and the same libmpi.so on all nodes.  There may well be another Open MPI 
installed in some non-default location of which you're unaware.
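
If you have more than a couple of nodes, the comparison above can be scripted. The following is only a rough sketch, not an Open MPI tool: the names REMOTE, EXE, all_same, and check_nodes are made up for illustration, and it assumes passwordless ssh to each node and the usual "name => path (address)" layout of ldd output. It compares paths only, so two different builds installed at the same path on different nodes would still look identical.

```shell
#!/bin/sh
# Illustrative sketch; REMOTE, EXE, all_same and check_nodes are hypothetical names.
REMOTE=${REMOTE:-ssh}                 # command used to reach a node
EXE=${EXE:-./your_mpi_executable}     # placeholder: path to your MPI binary

# Read "node value" lines on stdin, print each distinct value with the
# nodes that reported it, and return non-zero if more than one value was seen.
all_same() {
    awk '{ node = $1; sub(/^[^ ]+ +/, ""); seen[$0] = seen[$0] " " node }
         END {
             for (v in seen) { n++; print v " <-" seen[v] }
             exit (n > 1)
         }'
}

# Compare the mpirun path and the resolved libmpi.so path across the given nodes.
check_nodes() {
    for n in "$@"; do
        v=$($REMOTE "$n" which mpirun 2>/dev/null)
        printf '%s %s\n' "$n" "${v:-MISSING}"
    done | all_same || echo "WARNING: mpirun path differs between nodes"

    for n in "$@"; do
        # Third field of the ldd line is the resolved path ("not" if unresolved).
        v=$($REMOTE "$n" ldd "$EXE" 2>/dev/null |
            awk '/libmpi\.so/ { print ($3 == "not" ? "MISSING" : $3); exit }')
        printf '%s %s\n' "$n" "${v:-MISSING}"
    done | all_same || echo "WARNING: libmpi.so path differs between nodes"
}
```

After sourcing it, something like "check_nodes cl2n022 cl2n010" prints each distinct path followed by the nodes that reported it, plus a warning line if the nodes disagree.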


On May 31, 2012, at 8:21 PM, Edmund Sumbar wrote:

> Thanks for the tip Jeff,
> 
> I wish it was that simple. Unfortunately, this is the only version installed. 
> When I added --prefix to the mpiexec command line, I still got a seg fault, 
> but without the backtrace. Oh well, I'll keep trying (compiler upgrade etc).
> 
> [cl2n022:03057] *** Process received signal ***
> [cl2n022:03057] Signal: Segmentation fault (11)
> [cl2n022:03057] Signal code: Address not mapped (1)
> [cl2n022:03057] Failing at address: 0x10
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file util/nidmap.c at line 776
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ess_tm_module.c at line 310
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file base/odls_base_default_fns.c at line 2342
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file util/nidmap.c at line 776
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ess_tm_module.c at line 310
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file base/odls_base_default_fns.c at line 2342
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file util/nidmap.c at line 776
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ess_tm_module.c at line 310
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file base/odls_base_default_fns.c at line 2342
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file util/nidmap.c at line 776
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file ess_tm_module.c at line 310
> [cl2n022:03048] [[45689,0],7] ORTE_ERROR_LOG: Data unpack would read past end 
> of buffer in file base/odls_base_default_fns.c at line 2342
> [cl2n010:16470] *** Process received signal ***
> [cl2n010:16470] Signal: Segmentation fault (11)
> [cl2n010:16470] Signal code: Address not mapped (1)
> [cl2n010:16470] Failing at address: 0x10
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 32 with PID 3057 on node cl2n022 exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> 
> On Thu, May 31, 2012 at 2:54 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> This type of error usually means that you are inadvertently mixing versions 
> of Open MPI (e.g., version A.B.C on one node and D.E.F on another node).
> 
> 
> 
> -- 
> Edmund Sumbar
> University of Alberta
> +1 780 492 9360
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

