Alexander,

I was able to reproduce this behaviour.

Basically, bad things happen when the garbage collector is invoked.
By manually inserting calls to the garbage collector (e.g. System.gc();),
I was even able to reproduce some crashes very early in the code,
although they happen at random stages.
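
A minimal sketch of this kind of instrumentation, written in plain Java
so that it runs without the MPI bindings. The class name GcStress, the
method maybeGc and the interval of 100 steps are illustrative
assumptions, not part of Open MPI. The comment inside the loop marks
where the MPI calls from the test program would go. Note that
System.gc() is only a request, which the JVM may ignore.

```java
public class GcStress {

    // Hypothetical helper, not part of the Open MPI Java bindings.
    // Requests a garbage collection every `interval` steps and reports
    // whether a request was made. An interval <= 0 disables it.
    public static boolean maybeGc(int step, int interval) {
        if (interval > 0 && step % interval == 0) {
            System.gc(); // a hint to the JVM, not a guaranteed collection
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int requested = 0;
        for (int step = 0; step < 1000; step++) {
            if (maybeGc(step, 100)) {
                requested++;
            }
            // ... the MPI calls under test (testAny, iSend, ...) would go here ...
        }
        // Steps 0, 100, ..., 900 trigger a request, so this prints 10.
        System.out.println("requested collections: " + requested);
    }
}
```

Lowering the interval makes collections more frequent, which should
make a lifetime bug in native code show up sooner, at the cost of a much
slower run.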

Cheers,

Gilles

On 2015/01/19 9:03, Alexander Daryin wrote:
> Hi
>
> I am using the Java MPI bindings and periodically get fatal errors. This is
> illustrated by the following model Java program.
>
> import mpi.MPI;
> import mpi.MPIException;
> import mpi.Prequest;
> import mpi.Request;
> import mpi.Status;
>
> import java.nio.ByteBuffer;
> import java.util.Random;
>
> public class TestJavaMPI {
>
>    private static final int NREQ = 16;
>    private static final int BUFFSIZE = 0x2000;
>    private static final int NSTEP = 1000000000;
>
>    public static void main(String... args) throws MPIException {
>        MPI.Init(args);
>        Random random = new Random();
>        Prequest[] receiveRequests = new Prequest[NREQ];
>        Request[] sendRequests = new Request[NREQ];
>        ByteBuffer[] receiveBuffers = new ByteBuffer[NREQ];
>        ByteBuffer[] sendBuffers = new ByteBuffer[NREQ];
>        for(int i = 0; i < NREQ; i++) {
>            receiveBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>            sendBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>            receiveRequests[i] = MPI.COMM_WORLD.recvInit(receiveBuffers[i],
>                    BUFFSIZE, MPI.BYTE, MPI.ANY_SOURCE, MPI.ANY_TAG);
>            receiveRequests[i].start();
>            sendRequests[i] = MPI.COMM_WORLD.iSend(sendBuffers[i], 0,
>                    MPI.BYTE, MPI.PROC_NULL, 0);
>        }
>        for(int step = 0; step < NSTEP; step++) {
>            if( step % 128 == 0 ) System.out.println(step);
>            int index;
>            do {
>                Status status = Request.testAnyStatus(receiveRequests);
>                if( status != null )
>                    receiveRequests[status.getIndex()].start();
>                index = Request.testAny(sendRequests);
>            } while( index == MPI.UNDEFINED );
>            sendRequests[index].free();
>            sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffers[index],
>                    BUFFSIZE, MPI.BYTE,
>                    random.nextInt(MPI.COMM_WORLD.getSize()), 0);
>        }
>        MPI.Finalize();
>    }
> }
>
> On Linux, this produces a segfault after about a million steps. On OS X,
> instead of a segfault, it prints the following error message:
>
> java(64053,0x127e4d000) malloc: *** error for object 0x7f80eb828808: incorrect checksum for freed object - object was probably modified after being freed.
> *** set a breakpoint in malloc_error_break to debug
> [mbp:64053] *** Process received signal ***
> [mbp:64053] Signal: Abort trap: 6 (6)
> [mbp:64053] Signal code:  (0)
> [mbp:64053] [ 0] 0   libsystem_platform.dylib            0x00007fff86b5ff1a _sigtramp + 26
> [mbp:64053] [ 1] 0   ???                                 0x0000000000000000 0x0 + 0
> [mbp:64053] [ 2] 0   libsystem_c.dylib                   0x00007fff80c7bb73 abort + 129
> [mbp:64053] [ 3] 0   libsystem_malloc.dylib              0x00007fff8c26ce06 szone_error + 625
> [mbp:64053] [ 4] 0   libsystem_malloc.dylib              0x00007fff8c2645c8 small_free_list_remove_ptr + 154
> [mbp:64053] [ 5] 0   libsystem_malloc.dylib              0x00007fff8c2632bf szone_free_definite_size + 1856
> [mbp:64053] [ 6] 0   libjvm.dylib                        0x000000010e257d89 _ZN2os4freeEPvt + 63
> [mbp:64053] [ 7] 0   libjvm.dylib                        0x000000010dea2b0a _ZN9ChunkPool12free_all_butEm + 136
> [mbp:64053] [ 8] 0   libjvm.dylib                        0x000000010e30ab33 _ZN12PeriodicTask14real_time_tickEi + 77
> [mbp:64053] [ 9] 0   libjvm.dylib                        0x000000010e3372a3 _ZN13WatcherThread3runEv + 267
> [mbp:64053] [10] 0   libjvm.dylib                        0x000000010e25d87e _ZL10java_startP6Thread + 246
> [mbp:64053] [11] 0   libsystem_pthread.dylib             0x00007fff8f1402fc _pthread_body + 131
> [mbp:64053] [12] 0   libsystem_pthread.dylib             0x00007fff8f140279 _pthread_body + 0
> [mbp:64053] [13] 0   libsystem_pthread.dylib             0x00007fff8f13e4b1 thread_start + 13
> [mbp:64053] *** End of error message ***
>
> OpenMPI version is 1.8.4. Java version is 1.8.0_25-b17.
>
> Best regards,
> Alexander Daryin
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/01/26215.php
