Have you turned on Valgrind support in Open MPI? That is required to
quiet these bogus warnings.
-Nathan
On Wed, Jan 14, 2015 at 10:17:50AM +, Victor Vysotskiy wrote:
> Hi,
>
> Our parallel application behaves strangely when it is compiled with Open MPI
> v1.8.4 on both Linux and Mac OS X platforms. Valgrind reports memory
> problems in Open MPI rather than in our code:
>
> ==4440== Invalid read of size 1
> ==4440==    at 0xCAD6D37: ompi_osc_rdma_callback (osc_rdma_data_move.c:1650)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: ompi_request_default_wait_all (req_wait.c:281)
> ==4440==    by 0xD1578C3: ompi_coll_tuned_sendrecv_zero (coll_tuned_barrier.c:77)
> ==4440==    by 0xD157FC1: ompi_coll_tuned_barrier_intra_two_procs (coll_tuned_barrier.c:318)
> ==4440==    by 0xD149BCB: ompi_coll_tuned_barrier_intra_dec_fixed (coll_tuned_decision_fixed.c:194)
> ==4440==  Address 0xd6d5d80 is 0 bytes inside a block of size 8,208 alloc'd
> ==4440==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4440==    by 0x60BAD50: opal_malloc (malloc.c:101)
> ==4440==    by 0xCAD07DF: component_select (osc_rdma_component.c:462)
> ==4440==    by 0x51A2253: ompi_osc_base_select (osc_base_init.c:73)
> ==4440==    by 0x50EF1E7: ompi_win_create (win.c:152)
> ==4440==    by 0x51625AB: PMPI_Win_create (pwin_create.c:79)
> ==4440==    by 0x5B3647: gtsk_setup_ (gtsk_nxtval.c:94)
>
> ==4440== Invalid read of size 2
> ==4440==    at 0xCAD68C4: process_frag (osc_rdma_data_move.c:1554)
> ==4440==    by 0xCAD6DBB: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: ompi_request_default_wait_all (req_wait.c:281)
> ==4440==    by 0xD1578C3: ompi_coll_tuned_sendrecv_zero (coll_tuned_barrier.c:77)
> ==4440==    by 0xD157FC1: ompi_coll_tuned_barrier_intra_two_procs (coll_tuned_barrier.c:318)
> ==4440==  Address 0xd6d5d88 is 8 bytes inside a block of size 8,208 alloc'd
> ==4440==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4440==    by 0x60BAD50: opal_malloc (malloc.c:101)
> ==4440==    by 0xCAD07DF: component_select (osc_rdma_component.c:462)
> ==4440==    by 0x51A2253: ompi_osc_base_select (osc_base_init.c:73)
> ==4440==    by 0x50EF1E7: ompi_win_create (win.c:152)
> ==4440==    by 0x51625AB: PMPI_Win_create (pwin_create.c:79)
> ==4440==    by 0x5B3647: gtsk_setup_ (gtsk_nxtval.c:94)
> ...
>
> Enclosed please find the complete report for the master process. Could it
> be that these invalid memory operations are caused by our code? Line 94
> of our code reads:
>
> MPI_Win_create(buff,size,sizeof(long int),MPI_INFO_NULL,MPI_COMM_WORLD,&twin);
>
> /* char *buff;
> MPI_Aint size;
> MPI_Win twin;
> */
>
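As a quick sanity check on the call above: MPI_Win_create expects the window size in bytes. Each target displacement used later in one-sided operations is scaled by disp_unit, which here is sizeof(long int). The helpers below are hypothetical, illustrative names that are not part of MPI or Open MPI. They compute those quantities so you can confirm that size and any displacements you use stay inside buff. This is a minimal sketch that assumes every element accessed through the window is a long int, so the element size equals disp_unit.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to pass as the `size` argument of
 * MPI_Win_create for a window exposing `count` long ints.
 * MPI_Win_create takes the size in bytes, not in elements. */
static size_t window_size_bytes(size_t count)
{
    return count * sizeof(long int);
}

/* Hypothetical helper: byte offset from the window base addressed by a
 * target displacement, following the MPI rule
 * target_addr = window_base + disp * disp_unit. */
static size_t target_byte_offset(size_t disp, int disp_unit)
{
    return disp * (size_t)disp_unit;
}

/* Hypothetical helper: returns 1 if accessing `nelems` elements starting
 * at displacement `disp` stays inside a window of `win_bytes` bytes,
 * 0 otherwise. Assumes each element is exactly disp_unit bytes long. */
static int access_in_bounds(size_t win_bytes, size_t disp, int disp_unit,
                            size_t nelems)
{
    size_t start = target_byte_offset(disp, disp_unit);
    size_t len = nelems * (size_t)disp_unit;
    /* Compare without overflow: start must fit, then len must fit after it. */
    return start <= win_bytes && len <= win_bytes - start;
}
```

For example, a window holding 8 long ints on a platform where sizeof(long int) is 8 spans 64 bytes. Displacement 7 is then the last valid element, and displacement 8 falls just past the end of buff.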
> I would greatly appreciate any help you can give me in solving this problem.
>
> With best regards,
> Victor.
>
> P.S. The output of "ompi_info --all" is also attached.
> ==4440== Memcheck, a memory error detector
> ==4440== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> ==4440== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> ==4440==
> ==4440== Warning: set address range perms: large range [0x1617c028, 0x2617c058) (noaccess)
> ==4440== Invalid read of size 1
> ==4440==    at 0xCAD6D37: ompi_osc_rdma_callback (osc_rdma_data_move.c:1650)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: