Yes, I already removed the ones that you mentioned several weeks ago. It's good that I now have this setup on my own Linux box; before, I had been working exclusively on NERSC machines. Running valgrind in the Cray environment gives a huge number of warnings, which makes it pretty much useless.

Sherry

On Wed, Mar 2, 2016 at 3:16 PM, Barry Smith <[email protected]> wrote:

>
> > On Mar 2, 2016, at 4:22 PM, Xiaoye S. Li <[email protected]> wrote:
> >
> > I didn't do that. I will re-configure MPICH.
> >
> > But, the superlu_dist_valgrind_errors file you sent me a few days ago also contains this kind of error.
>
> You are right. I apologize; for the tests we run now we do not see any valgrind errors inside SuperLU_Dist. Looks like you have eliminated all of the ones we used to see with your previous set of fixes.
>
> Barry
>
> >
> > Sherry
> >
> > On Wed, Mar 2, 2016 at 2:19 PM, Barry Smith <[email protected]> wrote:
> >
> > When you configured MPICH did you use the flag --enable-g=meminit so it would not generate its own valgrind errors?
> >
> > Barry
> >
> > > On Mar 2, 2016, at 4:11 PM, Xiaoye S. Li <[email protected]> wrote:
> > >
> > > I checked that file; it also shows not stripped. Not sure why it doesn't work. Now I am using a static library build to run valgrind, which works fine.
> > >
> > > Now on to the valgrind output, I see quite a few warnings are unnecessary.
> > > For example,
> > >
> > > ==13292== Conditional jump or move depends on uninitialised value(s)
> > > ==13292==    at 0x5452D86: MPIC_Waitall (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > ==13292==    by 0x53AB23F: MPIR_Alltoall_intra (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > ==13292==    by 0x53ABFD4: MPIR_Alltoall (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > ==13292==    by 0x53AC08D: MPIR_Alltoall_impl (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > ==13292==    by 0x53AC896: PMPI_Alltoall (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > ==13292==    by 0x418161: dReDistribute_A (pddistribute.c:108)
> > > ==13292==    by 0x41950B: pddistribute (pddistribute.c:450)
> > > ==13292==    by 0x407D6A: pdgssvx (pdgssvx.c:1080)
> > > ==13292==    by 0x4027E5: main (pddrive.c:171)
> > >
> > > The line at pddistribute.c:108 is this:
> > >
> > >     MPI_Alltoall( nnzToSend, 1, mpi_int_t, nnzToRecv, 1, mpi_int_t,
> > >                   grid->comm);
> > >
> > > For both buffers nnzToSend and nnzToRecv, I use the "calloc" version to allocate memory, i.e., malloc first, followed by zeroing the buffer.
> > > mpi_int_t is defined as MPI_INT.
> > > Why does it complain about uninitialized values?
> > >
> > > Sherry
> > >
> > > On Tue, Mar 1, 2016 at 8:27 PM, Satish Balay <[email protected]> wrote:
> > > sometimes 'cmake' does a 'strip' during install of the library [which
> > > can delete the debug symbols]. We had to track this down for one of
> > > the cmake packages. I don't remember what we did to work around it.
> > >
> > > >>
> > > petsc@es:/scratch/petsc/petsc/arch-linux-pkgs-valgrind/lib$ file libsuperlu_dist.so.5.0.0
> > > libsuperlu_dist.so.5.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped
> > > <<
> > >
> > > looks like superlu_dist installed by petsc is not stripped.
> > > Perhaps you can try:
> > >
> > > file /home/xiaoye/Dropbox/Codes/SuperLU/superlu_dist.git/lib/libsuperlu_dist.so.5.0.0
> > >
> > > Satish
> > >
> > > On Tue, 1 Mar 2016, Barry Smith wrote:
> > >
> > > >
> > > > Satish will know far better than me. I only use Linux when my Mac OS fails me :-(
> > > >
> > > > > On Mar 1, 2016, at 8:41 PM, Xiaoye S. Li <[email protected]> wrote:
> > > > >
> > > > > This is on Linux (Ubuntu). I did compile with -g, but only the example driver (which is outside the library) shows line numbers; the routines in the *.so do not show line numbers, see this:
> > > > >
> > > > > ==31609== Conditional jump or move depends on uninitialised value(s)
> > > > > ==31609==    at 0x51EED86: MPIC_Waitall (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > > > ==31609==    by 0x5148F99: MPIR_Alltoallv_intra (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > > > ==31609==    by 0x5149916: MPIR_Alltoallv (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > > > ==31609==    by 0x51499F6: MPIR_Alltoallv_impl (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > > > ==31609==    by 0x514A0C7: PMPI_Alltoallv (in /home/xiaoye/mpich-install/lib/libmpi.so.12.1.0)
> > > > > ==31609==    by 0x4E7C56A: pdCompRow_loc_to_CompCol_global (in /home/xiaoye/Dropbox/Codes/SuperLU/superlu_dist.git/lib/libsuperlu_dist.so.5.0.0)
> > > > > ==31609==    by 0x4E71761: pdgssvx (in /home/xiaoye/Dropbox/Codes/SuperLU/superlu_dist.git/lib/libsuperlu_dist.so.5.0.0)
> > > > > ==31609==    by 0x401400: main (pddrive.c:171)
> > > > >
> > > > > Here are the flags:
> > > > >
> > > > > C_FLAGS = -DUSE_VENDOR_BLAS -DAdd_ -DDEBUGlevel=0 -DPRNTlevel=0 -std=c99 -g -fPIC -I/home/xiaoye/Dropbox/Codes/SuperLU/superlu_dist.git/SRC -I/home/xiaoye/lib/parmetis-4.0.3/include -I/home/xiaoye/lib/parmetis-4.0.3/metis/include -I/home/xiaoye/mpich-install/include
> > > > >
> > > > > Any idea?
> > > > > Sherry
> > > > >
> > > > > On Tue, Mar 1, 2016 at 6:00 PM, Barry Smith <[email protected]> wrote:
> > > > >
> > > > > > On Mar 1, 2016, at 7:41 PM, Xiaoye S. Li <[email protected]> wrote:
> > > > > >
> > > > > > Barry,
> > > > > >
> > > > > > I am cleaning up the valgrind errors. I did a build with the shared library option, but valgrind doesn't give me the source code line number. Is it true that I need to build as a static library?
> > > > >
> > > > > No, but if you are running on an Apple you may need the additional valgrind option --dsymutil=yes (yes it is totally goofy that it doesn't just do this automatically). Also, of course, the source code needs to be compiled with the -g option.
> > > > >
> > > > > Barry
> > > > >
> > > > > > Sherry
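
For reference, the "calloc"-style allocation described in the thread for nnzToSend and nnzToRecv (malloc first, then zeroing the buffer) can be sketched roughly as below. This is only an illustration under stated assumptions: `int_calloc_sketch` and `count_nonzero` are hypothetical names invented here, not the actual SuperLU_DIST routines, and plain `int` stands in for whatever integer type the library uses. A buffer allocated this way has every byte written before use, which fits the thread's reading that the report originates inside MPICH (hence the question about --enable-g=meminit) rather than in the caller's buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the SuperLU_DIST routine): allocate n ints with
 * malloc and then zero them, i.e. the "malloc first, followed by zeroing"
 * pattern described in the thread. Returns NULL on failure or when n == 0. */
int *int_calloc_sketch(size_t n)
{
    if (n == 0 || n > (size_t)-1 / sizeof(int)) {
        return NULL;                     /* nothing to allocate, or size overflow */
    }
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL) {
        return NULL;                     /* allocation failed */
    }
    memset(buf, 0, n * sizeof *buf);     /* every byte is now initialised */
    return buf;
}

/* Hypothetical check: count entries that are not zero. A result of 0 on a
 * fresh buffer means every entry reads back as zero. */
size_t count_nonzero(const int *buf, size_t n)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (buf[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

A caller would allocate both the send and receive count arrays with such a helper, one entry per process, and free them with `free` after the exchange.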
