Alexander,

I was able to reproduce this behaviour. Basically, bad things happen when the garbage collector is invoked. I was even able to reproduce crashes (though they occur at random stages) very early in the run by manually inserting calls to the garbage collector (e.g. System.gc()).
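For illustration, here is roughly what that change looks like in the main loop of your reproducer (quoted below). This is only a sketch of where a System.gc() call can be inserted; the exact placement and frequency are arbitrary, and this is not the exact code I ran:

    // main loop of the reproducer quoted below, with an explicit GC call added
    for (int step = 0; step < NSTEP; step++) {
        if (step % 128 == 0) {
            System.out.println(step);
            System.gc();   // explicitly request a collection to provoke the crash sooner
        }
        int index;
        do {
            Status status = Request.testAnyStatus(receiveRequests);
            if (status != null)
                receiveRequests[status.getIndex()].start();
            index = Request.testAny(sendRequests);
        } while (index == MPI.UNDEFINED);
        sendRequests[index].free();
        sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffers[index],
                BUFFSIZE, MPI.BYTE,
                random.nextInt(MPI.COMM_WORLD.getSize()), 0);
    }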
Cheers,

Gilles

On 2015/01/19 9:03, Alexander Daryin wrote:
> Hi
>
> I am using Java MPI bindings and periodically get fatal errors. This is
> illustrated by the following model Java program.
>
> import mpi.MPI;
> import mpi.MPIException;
> import mpi.Prequest;
> import mpi.Request;
> import mpi.Status;
>
> import java.nio.ByteBuffer;
> import java.util.Random;
>
> public class TestJavaMPI {
>
>     private static final int NREQ = 16;
>     private static final int BUFFSIZE = 0x2000;
>     private static final int NSTEP = 1000000000;
>
>     public static void main(String... args) throws MPIException {
>         MPI.Init(args);
>         Random random = new Random();
>         Prequest[] receiveRequests = new Prequest[NREQ];
>         Request[] sendRequests = new Request[NREQ];
>         ByteBuffer[] receiveBuffers = new ByteBuffer[NREQ];
>         ByteBuffer[] sendBuffers = new ByteBuffer[NREQ];
>         for (int i = 0; i < NREQ; i++) {
>             receiveBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>             sendBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>             receiveRequests[i] = MPI.COMM_WORLD.recvInit(receiveBuffers[i],
>                     BUFFSIZE, MPI.BYTE, MPI.ANY_SOURCE, MPI.ANY_TAG);
>             receiveRequests[i].start();
>             sendRequests[i] = MPI.COMM_WORLD.iSend(sendBuffers[i], 0,
>                     MPI.BYTE, MPI.PROC_NULL, 0);
>         }
>         for (int step = 0; step < NSTEP; step++) {
>             if (step % 128 == 0) System.out.println(step);
>             int index;
>             do {
>                 Status status = Request.testAnyStatus(receiveRequests);
>                 if (status != null)
>                     receiveRequests[status.getIndex()].start();
>                 index = Request.testAny(sendRequests);
>             } while (index == MPI.UNDEFINED);
>             sendRequests[index].free();
>             sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffers[index],
>                     BUFFSIZE, MPI.BYTE,
>                     random.nextInt(MPI.COMM_WORLD.getSize()), 0);
>         }
>         MPI.Finalize();
>     }
> }
>
> On Linux, this produces a segfault after about a million steps. On OS X,
> instead of a segfault it prints the following error message
>
> java(64053,0x127e4d000) malloc: *** error for object 0x7f80eb828808:
> incorrect checksum for freed object - object was probably modified after
> being freed.
> *** set a breakpoint in malloc_error_break to debug
> [mbp:64053] *** Process received signal ***
> [mbp:64053] Signal: Abort trap: 6 (6)
> [mbp:64053] Signal code: (0)
> [mbp:64053] [ 0] 0   libsystem_platform.dylib   0x00007fff86b5ff1a _sigtramp + 26
> [mbp:64053] [ 1] 0   ???                        0x0000000000000000 0x0 + 0
> [mbp:64053] [ 2] 0   libsystem_c.dylib          0x00007fff80c7bb73 abort + 129
> [mbp:64053] [ 3] 0   libsystem_malloc.dylib     0x00007fff8c26ce06 szone_error + 625
> [mbp:64053] [ 4] 0   libsystem_malloc.dylib     0x00007fff8c2645c8 small_free_list_remove_ptr + 154
> [mbp:64053] [ 5] 0   libsystem_malloc.dylib     0x00007fff8c2632bf szone_free_definite_size + 1856
> [mbp:64053] [ 6] 0   libjvm.dylib               0x000000010e257d89 _ZN2os4freeEPvt + 63
> [mbp:64053] [ 7] 0   libjvm.dylib               0x000000010dea2b0a _ZN9ChunkPool12free_all_butEm + 136
> [mbp:64053] [ 8] 0   libjvm.dylib               0x000000010e30ab33 _ZN12PeriodicTask14real_time_tickEi + 77
> [mbp:64053] [ 9] 0   libjvm.dylib               0x000000010e3372a3 _ZN13WatcherThread3runEv + 267
> [mbp:64053] [10] 0   libjvm.dylib               0x000000010e25d87e _ZL10java_startP6Thread + 246
> [mbp:64053] [11] 0   libsystem_pthread.dylib    0x00007fff8f1402fc _pthread_body + 131
> [mbp:64053] [12] 0   libsystem_pthread.dylib    0x00007fff8f140279 _pthread_body + 0
> [mbp:64053] [13] 0   libsystem_pthread.dylib    0x00007fff8f13e4b1 thread_start + 13
> [mbp:64053] *** End of error message ***
>
> OpenMPI version is 1.8.4. Java version is 1.8.0_25-b17.
>
> Best regards,
> Alexander Daryin
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/01/26215.php