Hello.

I have a user running a Fortran code that can be built and run on
both 32-bit and 64-bit architectures.  When this code is built for the
x86-64 machines in our cluster, running on OMPI 1.2.7, it runs fine.
However, if we build and run it on 32-bit x86 machines, with the same
GNU/Linux distribution and the same OMPI 1.2.7, it crashes with
errors like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
[node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv]
mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104
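
For reference, the two errno values above can be decoded with a short script. This is only a minimal sketch, assuming it is run on one of the Linux nodes (errno numbering is platform-specific); `describe_errno` is a hypothetical helper name, not part of OMPI. On Linux, 110 and 104 normally correspond to ETIMEDOUT and ECONNRESET, which would suggest a peer timing out or dropping its TCP connection rather than a local fault in the receiving process.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform.

    Hypothetical helper for illustration; the mapping depends on the OS.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two values reported by mca_btl_tcp_frag_recv in the log above.
for code in (110, 104):
    name, message = describe_errno(code)
    print(f"errno={code}: {name} ({message})")
```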

We have tried two different Fortran compilers (PathScale and gfortran)
and get these crashes with both; they occur after varying numbers of
iterations.  Running under MPI on a single node seems to work OK.

Are there any suggestions on how to figure out whether the problem lies
in the code or in the OMPI installation/software on the system?  We have
tried "--debug-daemons", but it revealed no new or interesting information.
Is there a way to trap segfault messages, get more detailed MPI
transaction information, or collect anything else that could help
diagnose this?

Thanks.
-- 
  V. Ram
  v_r_...@fastmail.fm
