Follow-up to a mislabeled thread: "How could OpenMPI (or MVAPICH) affect floating-point results?"
I have found a solution to my problem, but I would like to understand the underlying issue better.

To rehash: an Intel-compiled executable linked with MVAPICH runs fine; the same code linked with OpenMPI fails. The earliest symptom I could see was a strange difference in the numerical values of quantities that should be unaffected by MPI calls. Tim's advice led me to assume memory corruption, and Eugene's advice led me to examine the detailed differences in compilation. I noticed that the MVAPICH mpif90 wrapper adds -fPIC. I added -fPIC and -mcmodel=medium to the compilation of the OpenMPI-linked executable, and now it works fine. I haven't tried it without -mcmodel=medium, but my guess is that -fPIC did the trick.

Does anyone know why compiling with -fPIC helped? Does it point to an application problem or an OpenMPI problem?

For reference: this is an InfiniBand-based cluster. The application does pretty basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, isend, irecv, waitall. One task uses iprobe with MPI_ANY_TAG, but that task is only involved in certain cases (including this one). So far, no case that avoids iprobe has been observed to crash, so I suspect this call is the problem.

Thanks,
Ed

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 11:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

Thank you for this explanation. I will assume that my problem here is some kind of memory corruption.

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?
On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:
> It appears to be a side effect of linkage that is able to change a
> compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind
> of corruption may be going on.
>
Those intrinsics have direct instruction-set translations, which shouldn't vary from -O1 on up, nor with linkage options, nor be affected by MPI or by the insertion of WRITEs.

-- 
Tim Prince

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users