On 12/09/2015 10:00 AM, Nikolaus Rath wrote:
> Hi Philippe,
> 
> I found that I can work around the problem of gdb failing to produce 
> backtraces by compiling with -O0. Switching to -O1 or higher is enough to 
> cause issues. I also experimented with dwarf-2, dwarf-3, and dwarf-4 debug 
> information, but that did not seem to matter.
> 
> I tried to narrow down the problem with -O1, -gdwarf-2, newer valgrind, and 
> newer gdb:
> 
> $ valgrind --tool=massif --vgdb-error=0 ../../Q2D/LamyRidge/src/model/LR_model
> ==4881== Massif, a heap profiler
> ==4881== Copyright (C) 2003-2013, and GNU GPL'd, by Nicholas Nethercote
> ==4881== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
> ==4881== Command: ../../Q2D/LamyRidge/src/model/LR_model
> [...]
> 
> $ gdb ../../Q2D/LamyRidge/src/model/LR_model
> GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> [...]
> (gdb) target remote | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> Remote debugging using | /usr/lib/valgrind/../../bin/vgdb --pid=4881
> [...]
> (gdb) b taehdf5.f90:1936
> (gdb) c
> (gdb) c
> (gdb) b H5FL_reg_calloc
> (gdb) c
> Continuing.
> 
>[...]
> 
> So as far as I can tell, valgrind is getting the backtrace right. Is this 
> correct?
> 
> If so, I guess the only explanation is that I am not setting the breakpoint 
> at the point where massif takes the snapshot?

Ok, I fell into a trap. I assumed that whatever causes gdb to hang when trying 
to print a backtrace also causes valgrind to produce wrong stacktraces. But 
that is not the case. 

So, when compiling with -O1 -gdwarf-2, the valgrind and gdb backtraces agree. 
However, when compiling with -O3 -gdwarf-2, there is a difference:

Valgrind thinks:

(gdb) monitor v.info scheduler
[...]
Thread 1: status = VgTs_Runnable
==5489==    at 0x1010750: H5FL_reg_calloc (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xFA64E9: H5A_create (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xFA0610: H5Acreate2 (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xF8F3BD: h5acreate_c_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xF897B6: h5a_mp_h5acreate_f_ (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
==5489==    by 0xB99FC6: taehdf5_mp_h5append_data_double_0d_ (taehdf5.f90:1936)
==5489==    by 0xB248E6: plot_m_mp_plots_ (plot_hdf5.f:144)
==5489==    by 0xB3B722: lr_mod_m_mp_check_dt_ (LR_model.F:487)
==5489==    by 0xB272E3: lr_mod_m_mp_lr_step_ (LR_model.F:252)
==5489==    by 0xB261DD: MAIN__ (LR_model.F:544)
==5489==    by 0x406E3D: main (in /mnt/nfs-home/nrath/Q2D/LamyRidge/src/model/build/LR_model)
client stack range: [0xFFEBFE000 0xFFF000FFF] client SP: 0xFFEC2CFC8
valgrind stack top usage: 12424 of 1048576

But gdb says:

(gdb) bt
#0  0x0000000001010750 in H5FL_reg_calloc ()
#1  0x0000000000fa64ea in H5A_create ()
#2  0x0000000000fa0611 in H5Acreate2 ()
#3  0x0000000000f8f3be in h5acreate_c_ ()
#4  0x0000000000f897b7 in h5a_mp_h5acreate_f_ ()
#5  0x0000000000b99fc7 in h5dump_attr_int (loc_id=<optimized out>, f=<optimized out>, name=..., 
    .tmp.NAME.len_V$1086=<optimized out>) at /home/nrath/Q2D/utils/src/taehdf5.f90:1936
#6  h5append_data_double_0d (group_id=1, 
    f=<error reading variable: Cannot access memory at address 0xa000008>, name=..., 
    .tmp.NAME.len_V$1cd8=272) at /home/nrath/Q2D/utils/src/taehdf5.f90:4193
#7  0x0000000000b248e7 in plot_m::plots (idt=1) at /home/nrath/Q2D/LamyRidge/src/model/plot_hdf5.f:144
#8  0x0000000000b3b723 in lr_mod_m::check_dt (idt=1)
    at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:487
#9  0x0000000000b272e4 in lr_mod_m::lr_step (idt=1, 
    dt_r=<error reading variable: Cannot access memory at address 0xa000008>, t_r=0)
    at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:252
#10 0x0000000000b261de in lr_model () at /home/nrath/Q2D/LamyRidge/src/model/LR_model.F:544

Interestingly enough, both stacktraces are incorrect: gdb is missing the call to 
taehdf5_mp_h5append_data_double_0d_, and valgrind is missing the call to 
h5dump_attr_int. 
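
To compare the two traces frame by frame rather than by eye, a rough Python
sketch along the following lines could be used. It is only an illustration: the
helper names (valgrind_frames, gdb_frames, normalize, compare) are made up, and
the name normalization assumes that the compiler mangles Fortran module
procedures as "module_mp_proc_" (as in the valgrind output) while gdb prints
them as "module::proc" or as a bare procedure name.

```python
import re

def valgrind_frames(text):
    """Function names from valgrind 'at/by 0xADDR: name (...)' lines."""
    return re.findall(r"==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+)", text)

def gdb_frames(text):
    """Function names from gdb '#N [0xADDR in] name (' lines."""
    pattern = r"^#\d+\s+(?:0x[0-9a-f]+ in )?(\S+) \("
    return re.findall(pattern, text, flags=re.M)

def normalize(name):
    """Reduce a mangled or module-qualified name to the bare procedure name.

    Assumption: 'module_mp_proc_' and 'module::proc' denote the same procedure.
    """
    name = name.split("::")[-1]
    if "_mp_" in name:
        name = name.split("_mp_", 1)[1].rstrip("_")
    return name.lower()

def compare(valgrind_text, gdb_text):
    """Return (names only in the valgrind trace, names only in the gdb trace)."""
    v = [normalize(n) for n in valgrind_frames(valgrind_text)]
    g = [normalize(n) for n in gdb_frames(gdb_text)]
    only_valgrind = [n for n in v if n not in g]
    only_gdb = [n for n in g if n not in v]
    return only_valgrind, only_gdb
```

Pasting the "monitor v.info scheduler" output and the "bt" output into two
strings and passing them to compare() lists the frames that appear in only one
of the traces. Entry-point frames whose names legitimately differ between the
two tools will also show up and have to be ignored by hand.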

This is with valgrind 3.10.0 and gdb 7.7.1 (as above).

(I also tried compiling with just "-O3" (which should be using dwarf-3),
"-O3 -gdwarf-4", and just "-O2", but the stacktrace difference was there in
every case.)


Short of only using -O1 and -O0, is there a way to fix this?

Best,
-Nikolaus

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
