Re: [OMPI users] Valgrind reports plenty of Invalid reads in osc_rdma_data_move.c

2015-01-15 Thread Victor Vysotskiy
Hi Nathan,

yes, Open MPI was indeed compiled with Valgrind support:

%/opt/mpi/openmpi-1.8.4.dbg/bin/ompi_info | grep -i memchecker
  MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.8.4)

The following configure options were used:

--enable-mem-debug --enable-debug --enable-memchecker --with-valgrind 
--with-mpi-param-check
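
Putting those together, the full configure invocation looked roughly like
this (the install prefix is taken from the path above; the exact flag
ordering is from memory):

  ./configure --prefix=/opt/mpi/openmpi-1.8.4.dbg \
      --enable-mem-debug --enable-debug --enable-memchecker \
      --with-valgrind --with-mpi-param-check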

Best,
Victor.

Re: [OMPI users] Valgrind reports plenty of Invalid reads in osc_rdma_data_move.c

2015-01-14 Thread Nathan Hjelm

Have you turned on Valgrind support in Open MPI? That is required to
quiet these bogus warnings.
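
You can also point Valgrind at the suppression file that ships with Open
MPI; it hides more of the known-benign reports. A typical invocation looks
like the following (the path assumes a default install layout, and
"./your_app" is just a placeholder):

  mpirun -np 2 valgrind \
      --suppressions=<prefix>/share/openmpi/openmpi-valgrind.supp ./your_app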

-Nathan

On Wed, Jan 14, 2015 at 10:17:50AM +, Victor Vysotskiy wrote:
> Hi, 
> 
> Our parallel application behaves strangely when it is compiled with Open MPI 
> v1.8.4 on both Linux and Mac OS X platforms.  Valgrind reports memory 
> problems in Open MPI rather than in our code:
> 
> ==4440== Invalid read of size 1
> ==4440==    at 0xCAD6D37: ompi_osc_rdma_callback (osc_rdma_data_move.c:1650)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: ompi_request_default_wait_all (req_wait.c:281)
> ==4440==    by 0xD1578C3: ompi_coll_tuned_sendrecv_zero (coll_tuned_barrier.c:77)
> ==4440==    by 0xD157FC1: ompi_coll_tuned_barrier_intra_two_procs (coll_tuned_barrier.c:318)
> ==4440==    by 0xD149BCB: ompi_coll_tuned_barrier_intra_dec_fixed (coll_tuned_decision_fixed.c:194)
> ==4440==  Address 0xd6d5d80 is 0 bytes inside a block of size 8,208 alloc'd
> ==4440==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4440==    by 0x60BAD50: opal_malloc (malloc.c:101)
> ==4440==    by 0xCAD07DF: component_select (osc_rdma_component.c:462)
> ==4440==    by 0x51A2253: ompi_osc_base_select (osc_base_init.c:73)
> ==4440==    by 0x50EF1E7: ompi_win_create (win.c:152)
> ==4440==    by 0x51625AB: PMPI_Win_create (pwin_create.c:79)
> ==4440==    by 0x5B3647: gtsk_setup_ (gtsk_nxtval.c:94)
> 
> ==4440== Invalid read of size 2
> ==4440==    at 0xCAD68C4: process_frag (osc_rdma_data_move.c:1554)
> ==4440==    by 0xCAD6DBB: ompi_osc_rdma_callback (osc_rdma_data_move.c:1656)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: ompi_request_default_wait_all (req_wait.c:281)
> ==4440==    by 0xD1578C3: ompi_coll_tuned_sendrecv_zero (coll_tuned_barrier.c:77)
> ==4440==    by 0xD157FC1: ompi_coll_tuned_barrier_intra_two_procs (coll_tuned_barrier.c:318)
> ==4440==  Address 0xd6d5d88 is 8 bytes inside a block of size 8,208 alloc'd
> ==4440==    at 0x4C2CD7B: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==4440==    by 0x60BAD50: opal_malloc (malloc.c:101)
> ==4440==    by 0xCAD07DF: component_select (osc_rdma_component.c:462)
> ==4440==    by 0x51A2253: ompi_osc_base_select (osc_base_init.c:73)
> ==4440==    by 0x50EF1E7: ompi_win_create (win.c:152)
> ==4440==    by 0x51625AB: PMPI_Win_create (pwin_create.c:79)
> ==4440==    by 0x5B3647: gtsk_setup_ (gtsk_nxtval.c:94)
> ...
> 
> Enclosed please find the complete report for the master process.  Could it 
> be that these invalid memory operations are caused by our code?  Line 94 
> of our code looks like this:
> 
> MPI_Win_create(buff, size, sizeof(long int), MPI_INFO_NULL, MPI_COMM_WORLD, &twin);
> 
> /* char *buff;
> MPI_Aint size;
> MPI_Win twin;
> */
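> 
> For context, a stripped-down, self-contained version of that window setup
> might look like the sketch below (the buffer size, the use of MPI_Alloc_mem,
> and the surrounding program structure are illustrative assumptions, not our
> actual gtsk_nxtval.c code):
> 
> #include <mpi.h>
> 
> int main(int argc, char **argv)
> {
>     MPI_Aint size = 8192;   /* window size in bytes -- an assumed value */
>     char    *buff;
>     MPI_Win  twin;
> 
>     MPI_Init(&argc, &argv);
> 
>     /* Let the MPI library allocate memory suitable for RMA/RDMA. */
>     MPI_Alloc_mem(size, MPI_INFO_NULL, &buff);
> 
>     /* Expose buff as a one-sided window; the displacement unit is
>        sizeof(long int), as in the call above. */
>     MPI_Win_create(buff, size, sizeof(long int), MPI_INFO_NULL,
>                    MPI_COMM_WORLD, &twin);
> 
>     /* ... one-sided communication on twin would go here ... */
> 
>     MPI_Win_free(&twin);
>     MPI_Free_mem(buff);
>     MPI_Finalize();
>     return 0;
> }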
> 
> I would greatly appreciate any help you can give me in resolving this problem.
> 
> With best regards,
> Victor.
> 
> P.S. The output of "ompi_info --all" is also attached. 

> ==4440== Memcheck, a memory error detector
> ==4440== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
> ==4440== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
> ==4440== 
> ==4440== Warning: set address range perms: large range [0x1617c028, 0x2617c058) (noaccess)
> ==4440== Invalid read of size 1
> ==4440==    at 0xCAD6D37: ompi_osc_rdma_callback (osc_rdma_data_move.c:1650)
> ==4440==    by 0xC05E87F: ompi_request_complete (request.h:402)
> ==4440==    by 0xC05F1F6: recv_request_pml_complete (pml_ob1_recvreq.h:181)
> ==4440==    by 0xC060476: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:243)
> ==4440==    by 0xB9F9D4E: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==4440==    by 0xB9FC23C: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==4440==    by 0x606C1C7: opal_progress (opal_progress.c:187)
> ==4440==    by 0x50E7A22: opal_condition_wait (condition.h:78)
> ==4440==    by 0x50E8360: