On 12/09/2015 10:00 AM, Nikolaus Rath wrote:
> Hi Philippe,
>
> I found that I can work around the problem of gdb failing to produce
> backtraces by compiling with -O0. Switching to -O1 or higher is enough to
> cause issues. I also experimented using dwarf-2, dwarf-3, or dwarf-3 debug
> information but that did not seem to matter.
>
> I tried to narrow down the problem with -O1, -gdwarf2, newer valgrind, and
> newer gdb:
>
> $ valgrind --tool=massif --vgdb-error=0 ../../Q2D/LamyRidge/src/model/LR_model
> ==4881== Massif, a heap profiler
> ==4881== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
> ==4881== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
> ==4881== Command: ../../Q2D/LamyRidge/src/model/LR_model
> [...]
>
> $ gdb ../../Q2D/LamyRidge/src/model/LR_model
> GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> [...]
> (gdb) target remote | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> Remote debugging using | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> [...]
> (gdb) b taehdf5.f90:1936
> (gdb) c
> (gdb) c
> (gdb) b H5FL_reg_calloc
> (gdb) c
> Continuing.
>
> [...]
>
> So as far as I can tell, valgrind is getting the backtrace right. Is this
> correct?
>
> If so, I guess the only explanation is that I am not setting the breakpoint
> at the time where massif takes the snapshot?

Ok, I fell into a trap. I assumed that whatever causes gdb to hang when
trying to print a backtrace also causes valgrind to produce wrong
stacktraces. But that is not the case.

When compiling with -O1 -gdwarf-2, the valgrind and gdb backtraces agree.
However, when compiling with -O3 -gdwarf-2, there is a difference.

Valgrind thinks:

(gdb) monitor v.info scheduler
[...]
Thread 1: status = VgTs_Runnable
==5489==    at 0x1010750: H5FL_reg_calloc (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xFA64E9: H5A_create (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xFA0610: H5Acreate2 (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xF8F3BD: h5acreate_c_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xF897B6: h5a_mp_h5acreate_f_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xB99FC6: taehdf5_mp_h5append_data_double_0d_ (taehdf5.f90:1936)
==5489==    by 0xB248E6: plot_m_mp_plots_ (plot_hdf5.f:144)
==5489==    by 0xB3B722: lr_mod_m_mp_check_dt_ (LR_model.F:487)
==5489==    by 0xB272E3: lr_mod_m_mp_lr_step_ (LR_model.F:252)
==5489==    by 0xB261DD: MAIN__ (LR_model.F:544)
==5489==    by 0x406E3D: main (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
client stack range: [0xFFEBFE000 0xFFF000FFF] client SP: 0xFFEC2CFC8
valgrind stack top usage: 12424 of 1048576

But gdb says:

(gdb) bt
#0  0x0000000001010750 in H5FL_reg_calloc ()
#1  0x0000000000fa64ea in H5A_create ()
#2  0x0000000000fa0611 in H5Acreate2 ()
#3  0x0000000000f8f3be in h5acreate_c_ ()
#4  0x0000000000f897b7 in h5a_mp_h5acreate_f_ ()
#5  0x0000000000b99fc7 in h5dump_attr_int (loc_id=<optimized out>, f=<optimized out>, name=..., .tmp.NAME.len_V$1086=<optimized out>)
    at /home/nrath/Q2D/utils/src/taehdf5.f90:1936
#6  h5append_data_double_0d (group_id=1, f=<error reading variable: Cannot access memory at address 0xa000008>, name=..., .tmp.NAME.len_V$1cd8=272)
    at /home/nrath/Q2D/utils/src/taehdf5.f90:4193
#7  0x0000000000b248e7 in plot_m::plots (idt=1) at /home/nrath/Q2D/LamyRidge/src/model/plot_hdf5.f:144
#8  0x0000000000b3b723 in lr_mod_m::check_dt (idt=1) at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:487
#9  0x0000000000b272e4 in lr_mod_m::lr_step (idt=1, dt_r=<error reading variable: Cannot access memory at address 0xa000008>, t_r=0)
    at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:252
#10 0x0000000000b261de in lr_model () at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:544

Interestingly enough, both stacktraces are incorrect: gdb is missing the
call to taehdf5_mp_h5append_data_double_0d_, and valgrind is missing the
call to h5dump_attr_int.

This is with valgrind 3.10.0 and gdb 7.7.1 (as above). I also tried
compiling with just "-O3" (which should be using dwarf-3), with
"-O3 -gdwarf-4", and with just "-O2", but the stacktrace difference was
there in every case.

Short of only using -O0 or -O1, is there a way to fix this?

Best,
-Nikolaus

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users