Hello.

I have a user running a Fortran code that can be built and run on both 32-bit and 64-bit architectures. When this code is built for the x86-64 machines in our cluster, running OMPI 1.2.7, it runs fine. However, if we build and run it on 32-bit x86 machines, running the same GNU/Linux distribution and the same OMPI 1.2.7, it crashes with errors like:

[node4][0,1,4][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] [node3][0,1,3][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=110
mca_btl_tcp_frag_recv: readv failed with errno=104

We have tried different Fortran compilers (both PathScale and gfortran) and keep getting these crashes, which occur after varying numbers of iterations. Running on a single node using MPI seems to work OK.

Are there any suggestions on how to figure out if it's a problem with the code or the OMPI installation/software on the system? We have tried "--debug-daemons" with no new/interesting information being revealed. Is there a way to trap segfault messages or more detailed MPI transaction information or anything else that could help diagnose this?

Thanks.

--
V. Ram
v_r_...@fastmail.fm