Hi,

We've run into an MPI I/O issue with Open MPI 1.4.1 and earlier versions. We can reproduce it with a test of around 120 lines of code. I'd like to find out whether we're simply doing something incorrectly with the build or whether this is in fact a known bug. I've included the following, in order:

1. Configure options used on all versions tested
2. Successful run on 1.4.3
3. Failed run on 1.3.1
4. Failed run on 1.4.1
5. Source code of test
6. ompi_info

We're running this on a single node with 2 processes. An additional thing to note is we can load the 1.4.2 or 1.4.3 environment and successfully run the 1.4.1 or 1.3.1 executable.

Thanks.
Steve

1.
./configure --prefix=/share/apps/openmpi/1.4.1/intel-12 --with-tm=/opt/torque --enable-debug --with-openib --with-wrapper-cflags="-shared-intel" --with-wrapper-cxxflags="-shared-intel" --with-wrapper-fflags="-shared-intel" --with-wrapper-fcflags="-shared-intel"

2.
[smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.3 10
iotest running on mpi_size: 2
writing 10 ints to file iotest.dat...
rank 0 writing: 0 to 4
rank 1 writing: 5 to 9
reading 10 ints from file iotest.dat...
just read: 0 0
just read: 1 1
just read: 2 2
just read: 3 3
just read: 4 4
just read: 5 5
just read: 6 6
just read: 7 7
just read: 8 8
just read: 9 9
File looks good.

3.
[smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.3.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 0 writing: 0 to 49
rank 1 writing: 50 to 99
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.3.1: iotest.cpp:105: int main(int, char**): Assertion `ibuf == i' failed.
[compute-1-1:18731] *** Process received signal ***
[compute-1-1:18731] Signal: Aborted (6)
[compute-1-1:18731] Signal code: (-6)
[compute-1-1:18731] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:18731] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:18731] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:18731] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:18731] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.3.1(main+0x3db) [0x408e7f]
[compute-1-1:18731] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:18731] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.3.1(__gxx_personality_v0+0x139) [0x408989]
[compute-1-1:18731] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 18731 on node compute-1-1.local exited on signal 6 (Aborted).
--------------------------------------------------------------------------

4.
[smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 1 writing: 50 to 99
rank 0 writing: 0 to 49
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.4.1: iotest.cpp:105: int __unixcall main(int, char **): Assertion `ibuf == i' failed.
[compute-1-1:19057] *** Process received signal ***
[compute-1-1:19057] Signal: Aborted (6)
[compute-1-1:19057] Signal code: (-6)
[compute-1-1:19057] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:19057] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:19057] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:19057] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:19057] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.4.1(main+0x472) [0x401ab2]
[compute-1-1:19057] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:19057] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.4.1(__gxx_personality_v0+0x41) [0x401589]
[compute-1-1:19057] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 19057 on node compute-1-1.local exited on signal 6 (Aborted).
--------------------------------------------------------------------------

5.
[smjones@frontend iotest]$ cat iotest.cpp
#include <iostream>
#include <math.h>
#include <assert.h>
#include <mpi.h>

using std::cout;
using std::cerr;
using std::endl;

// iotest
// This simple test reproduces a problem with writing in MPI_Type_indexed in openmpi.
//

int main(int argc,char * argv[]) {

  MPI_Init(&argc,&argv);

  int mpi_size;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

  int mpi_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

  if (mpi_rank == 0)
    cout << "iotest running on mpi_size: " << mpi_size << endl;

  if (argc != 2) {
    if (mpi_rank == 0)
      cout << "\n\nUsage: \n\nmpirun -np X iotest <global_number_of_ints>\n\n" << endl;
    MPI_Finalize();
    return(-1);
  }

  // how many ints to write...
  int n = atoi(argv[1]);
  if (mpi_rank == 0)
    cout << "writing " << n << " ints to file iotest.dat..." << endl;

  // everybody figure out their local offset and size...
  int my_disp = mpi_rank*n/mpi_size;
  int my_n = (mpi_rank+1)*n/mpi_size - my_disp;

  cout << "rank " << mpi_rank << " writing: " << my_disp << " to " << my_disp+my_n-1 << endl;

  MPI_File fh;
  int ierr = MPI_File_open(MPI_COMM_WORLD,"iotest.dat",
                           MPI_MODE_WRONLY | MPI_MODE_CREATE,
                           MPI_INFO_NULL,&fh);
  assert(ierr == 0);

  // build the type...
  MPI_Datatype int_type;
  MPI_Type_indexed(1,&my_n,&my_disp,MPI_INT,&int_type);
  MPI_Type_commit(&int_type);

  // fill a buffer of ints with increasing values, starting with our offset...
  int * buf = new int[my_n];
  for (int i = 0; i < my_n; ++i)
    buf[i] = my_disp + i;

  // set our view into the file...
  MPI_Offset offset = 0;
  MPI_File_set_view(fh, offset, MPI_INT, int_type, "native", MPI_INFO_NULL);

  // and write...
  MPI_Status status;
  MPI_File_write_all(fh, buf, my_n, MPI_INT, &status);

  // trim the file to the current size and close...
  offset += n*sizeof(int);
  MPI_File_set_size(fh,offset);
  MPI_File_close(&fh);

  // cleanup...
  delete[] buf;
  MPI_Type_free(&int_type);

  // ---------------------------------------------------
  // now let rank 0 read the file using standard io and check for
  // correctness...

  if (mpi_rank == 0) {

    if (mpi_rank == 0)
      cout << "reading " << n << " ints from file iotest.dat..." << endl;

    FILE * fp = fopen("iotest.dat","rb");
    for (int i = 0; i < n; ++i) {
      // just read one at a time - ouch!
      int ibuf;
      fread(&ibuf,sizeof(int),1,fp);
      cout << "just read: " << i << " " << ibuf << endl;
      assert(ibuf == i);
    }
    fclose(fp);

    cout << "File looks good." << endl;

  }

  MPI_Barrier(MPI_COMM_WORLD);

  // shut down MPI stuff...
  MPI_Finalize();
  return(0);

}

6.
[smjones@frontend iotest]$ ompi_info
Package: Open MPI r...@frontend.somewhere.com Distribution
Open MPI: 1.4.3
Open MPI SVN revision: r23834
Open MPI release date: Oct 05, 2010
Open RTE: 1.4.3
Open RTE SVN revision: r23834
Open RTE release date: Oct 05, 2010
OPAL: 1.4.3
OPAL SVN revision: r23834
OPAL release date: Oct 05, 2010
Ident string: 1.4.3
Prefix: /share/apps/openmpi/1.4.3/intel-12
Configured architecture: x86_64-unknown-linux-gnu
Configure host: frontend.somewhere.com
Configured by: root
Configured on: Mon Sep 12 18:02:17 PDT 2011
Configure host: frontend.somewhere.com
Built by: root
Built on: Mon Sep 12 18:13:08 PDT 2011
Built host: frontend.somewhere.com
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: icc
C compiler absolute: /opt/intel/composerxe-2011.2.137/bin/intel64/icc
C++ compiler: icpc
C++ compiler absolute: /opt/intel/composerxe-2011.2.137/bin/intel64/icpc
Fortran77 compiler: ifort
Fortran77 compiler abs: /opt/intel/composerxe-2011.2.137/bin/intel64/ifort
Fortran90 compiler: ifort
Fortran90 compiler abs: /opt/intel/composerxe-2011.2.137/bin/intel64/ifort
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: no, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4.3)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4.3)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.3)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4.3)
MCA carto: file (MCA v2.0, API v2.0, Component v1.4.3)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.3)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.4.3)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4.3)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4.3)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4.3)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4.3)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4.3)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: self (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: sync (MCA v2.0, API v2.0, Component v1.4.3)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4.3)
MCA io: romio (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4.3)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: csum (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4.3)
MCA pml: v (MCA v2.0, API v2.0, Component v1.4.3)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4.3)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: ofud (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: openib (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: self (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.4.3)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4.3)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.4.3)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4.3)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.4.3)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.4.3)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4.3)
MCA odls: default (MCA v2.0, API v2.0, Component v1.4.3)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA ras: tm (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4.3)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4.3)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.4.3)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA plm: tm (MCA v2.0, API v2.0, Component v1.4.3)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4.3)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: env (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4.3)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.4.3)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4.3)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4.3)