We'll follow up on this GitHub issue.

Alexander -- thanks for the bug report.  If you'd like to follow the progress 
of this issue, comment on https://github.com/open-mpi/ompi/issues/369.


> On Feb 1, 2015, at 5:08 PM, Oscar Vega-Gisbert <ov...@dsic.upv.es> wrote:
> 
> Hi,
> 
> I created an issue with a simplified example:
> 
> https://github.com/open-mpi/ompi/issues/369
> 
> Regards,
> Oscar
> 
> 
> On 25/01/15 at 19:36, Oscar Vega-Gisbert wrote:
>> Hi,
>> 
>> I can also reproduce this behaviour, but I don't think this crash is related to the garbage collector. Java is much better than you think.
>> 
>> Maybe MPI corrupts the Java runtime heap.
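>> 
>> One quick experiment -- just a sketch, and it assumes the bindings accept any direct ByteBuffer (e.g. from ByteBuffer.allocateDirect()), not only buffers from MPI.newByteBuffer() -- would be to run the same traffic pattern with JDK-allocated buffers and see whether the corruption still appears. If it does not, the MPI-allocated buffers would be the next thing to look at. Something like this (the class name and sizes are arbitrary, and I use one request slot instead of 16):
>> 
>> import java.nio.ByteBuffer;
>> import java.util.Random;
>> 
>> import mpi.MPI;
>> import mpi.MPIException;
>> import mpi.Prequest;
>> import mpi.Request;
>> import mpi.Status;
>> 
>> public class DirectBufferCheck {
>> 
>>     private static final int BUFFSIZE = 0x2000;
>>     private static final int NSTEP = 1000000;
>> 
>>     public static void main(String... args) throws MPIException {
>>         MPI.Init(args);
>>         Random random = new Random();
>>         // Assumption: the bindings accept any direct buffer, so allocate
>>         // with the JDK instead of MPI.newByteBuffer(BUFFSIZE).
>>         ByteBuffer receiveBuffer = ByteBuffer.allocateDirect(BUFFSIZE);
>>         ByteBuffer sendBuffer = ByteBuffer.allocateDirect(BUFFSIZE);
>>         Prequest[] receiveRequests = { MPI.COMM_WORLD.recvInit(receiveBuffer,
>>                 BUFFSIZE, MPI.BYTE, MPI.ANY_SOURCE, MPI.ANY_TAG) };
>>         receiveRequests[0].start();
>>         Request[] sendRequests = { MPI.COMM_WORLD.iSend(sendBuffer, 0,
>>                 MPI.BYTE, MPI.PROC_NULL, 0) };
>>         for (int step = 0; step < NSTEP; step++) {
>>             int index;
>>             do {
>>                 // restart the wildcard receive whenever it completes,
>>                 // then poll until the send slot is free again
>>                 Status status = Request.testAnyStatus(receiveRequests);
>>                 if (status != null)
>>                     receiveRequests[status.getIndex()].start();
>>                 index = Request.testAny(sendRequests);
>>             } while (index == MPI.UNDEFINED);
>>             sendRequests[index].free();
>>             sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffer, BUFFSIZE,
>>                     MPI.BYTE, random.nextInt(MPI.COMM_WORLD.getSize()), 0);
>>         }
>>         MPI.Finalize();
>>     }
>> }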
>> 
>> Regards,
>> Oscar
>> 
>> On 22/01/15 at 08:07, Gilles Gouaillardet wrote:
>>> Alexander,
>>> 
>>> I was able to reproduce this behaviour.
>>> 
>>> Basically, bad things happen when the garbage collector is invoked...
>>> I was even able to reproduce some crashes (though they happen at random
>>> stages) very early in the code by manually inserting calls to the
>>> garbage collector (e.g. System.gc();).
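>>> 
>>> For concreteness, here is the kind of insertion I mean -- just a sketch, using a stripped-down variant of the reproducer's send loop (the class name, buffer size and iteration count are arbitrary):
>>> 
>>> import java.nio.ByteBuffer;
>>> 
>>> import mpi.MPI;
>>> import mpi.MPIException;
>>> import mpi.Request;
>>> 
>>> public class GcSketch {
>>> 
>>>     public static void main(String... args) throws MPIException {
>>>         MPI.Init(args);
>>>         ByteBuffer sendBuffer = MPI.newByteBuffer(0x2000);
>>>         for (int step = 0; step < 100000; step++) {
>>>             // a dummy nonblocking send to PROC_NULL, as in the reproducer
>>>             Request send = MPI.COMM_WORLD.iSend(sendBuffer, 0,
>>>                     MPI.BYTE, MPI.PROC_NULL, 0);
>>>             // the manually inserted collection call
>>>             System.gc();
>>>             // a send to PROC_NULL completes immediately, so this
>>>             // poll loop normally exits on its first pass
>>>             while (Request.testAny(new Request[] { send }) == MPI.UNDEFINED) {
>>>                 // spin until the request has completed
>>>             }
>>>         }
>>>         MPI.Finalize();
>>>     }
>>> }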
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On 2015/01/19 9:03, Alexander Daryin wrote:
>>>> Hi
>>>> 
>>>> I am using the Java MPI bindings and periodically get fatal errors. This is illustrated by the following model Java program.
>>>> 
>>>> import mpi.MPI;
>>>> import mpi.MPIException;
>>>> import mpi.Prequest;
>>>> import mpi.Request;
>>>> import mpi.Status;
>>>> 
>>>> import java.nio.ByteBuffer;
>>>> import java.util.Random;
>>>> 
>>>> public class TestJavaMPI {
>>>> 
>>>>     private static final int NREQ = 16;
>>>>     private static final int BUFFSIZE = 0x2000;
>>>>     private static final int NSTEP = 1000000000;
>>>> 
>>>>     public static void main(String... args) throws MPIException {
>>>>         MPI.Init(args);
>>>>         Random random = new Random();
>>>>         Prequest[] receiveRequests = new Prequest[NREQ];
>>>>         Request[] sendRequests = new Request[NREQ];
>>>>         ByteBuffer[] receiveBuffers = new ByteBuffer[NREQ];
>>>>         ByteBuffer[] sendBuffers = new ByteBuffer[NREQ];
>>>>         for(int i = 0; i < NREQ; i++) {
>>>>             receiveBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>>>>             sendBuffers[i] = MPI.newByteBuffer(BUFFSIZE);
>>>>             receiveRequests[i] = MPI.COMM_WORLD.recvInit(receiveBuffers[i],
>>>>                     BUFFSIZE, MPI.BYTE, MPI.ANY_SOURCE, MPI.ANY_TAG);
>>>>             receiveRequests[i].start();
>>>>             sendRequests[i] = MPI.COMM_WORLD.iSend(sendBuffers[i], 0,
>>>>                     MPI.BYTE, MPI.PROC_NULL, 0);
>>>>         }
>>>>         for(int step = 0; step < NSTEP; step++) {
>>>>             if( step % 128 == 0 ) System.out.println(step);
>>>>             int index;
>>>>             do {
>>>>                 Status status = Request.testAnyStatus(receiveRequests);
>>>>                 if( status != null )
>>>>                     receiveRequests[status.getIndex()].start();
>>>>                 index = Request.testAny(sendRequests);
>>>>             } while( index == MPI.UNDEFINED );
>>>>             sendRequests[index].free();
>>>>             sendRequests[index] = MPI.COMM_WORLD.iSend(sendBuffers[index],
>>>>                     BUFFSIZE, MPI.BYTE,
>>>>                     random.nextInt(MPI.COMM_WORLD.getSize()), 0);
>>>>         }
>>>>         MPI.Finalize();
>>>>     }
>>>> }
>>>> 
>>>> On Linux, this produces a segfault after about a million steps. On OS X, instead of a segfault, it prints the following error message:
>>>> 
>>>> java(64053,0x127e4d000) malloc: *** error for object 0x7f80eb828808: incorrect checksum for freed object - object was probably modified after being freed.
>>>> *** set a breakpoint in malloc_error_break to debug
>>>> [mbp:64053] *** Process received signal ***
>>>> [mbp:64053] Signal: Abort trap: 6 (6)
>>>> [mbp:64053] Signal code:  (0)
>>>> [mbp:64053] [ 0] 0   libsystem_platform.dylib 0x00007fff86b5ff1a _sigtramp + 26
>>>> [mbp:64053] [ 1] 0   ??? 0x0000000000000000 0x0 + 0
>>>> [mbp:64053] [ 2] 0   libsystem_c.dylib 0x00007fff80c7bb73 abort + 129
>>>> [mbp:64053] [ 3] 0   libsystem_malloc.dylib 0x00007fff8c26ce06 szone_error + 625
>>>> [mbp:64053] [ 4] 0   libsystem_malloc.dylib 0x00007fff8c2645c8 small_free_list_remove_ptr + 154
>>>> [mbp:64053] [ 5] 0   libsystem_malloc.dylib 0x00007fff8c2632bf szone_free_definite_size + 1856
>>>> [mbp:64053] [ 6] 0   libjvm.dylib 0x000000010e257d89 _ZN2os4freeEPvt + 63
>>>> [mbp:64053] [ 7] 0   libjvm.dylib 0x000000010dea2b0a _ZN9ChunkPool12free_all_butEm + 136
>>>> [mbp:64053] [ 8] 0   libjvm.dylib 0x000000010e30ab33 _ZN12PeriodicTask14real_time_tickEi + 77
>>>> [mbp:64053] [ 9] 0   libjvm.dylib 0x000000010e3372a3 _ZN13WatcherThread3runEv + 267
>>>> [mbp:64053] [10] 0   libjvm.dylib 0x000000010e25d87e _ZL10java_startP6Thread + 246
>>>> [mbp:64053] [11] 0   libsystem_pthread.dylib 0x00007fff8f1402fc _pthread_body + 131
>>>> [mbp:64053] [12] 0   libsystem_pthread.dylib 0x00007fff8f140279 _pthread_body + 0
>>>> [mbp:64053] [13] 0   libsystem_pthread.dylib 0x00007fff8f13e4b1 thread_start + 13
>>>> [mbp:64053] *** End of error message ***
>>>> 
>>>> The Open MPI version is 1.8.4 and the Java version is 1.8.0_25-b17.
>>>> 
>>>> Best regards,
>>>> Alexander Daryin
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
