[Valgrind-users] Debugging a GC with valgrind

john skaller Mon, 17 Jan 2011 19:06:32 -0800

I have some kind of memory corruption in a C++ program generated by a tool.
The program uses my own exact garbage collector, which may be the cause
of the problem. The data being processed is too big to trace anything by
hand, so I thought I'd try that excellent and magical tool, valgrind.


My problem is basically filtering out the false positives to find the real
problem. To repeat, I know for sure I am writing to the wrong place, and that's
causing my program to crash. The fault is intermittent in the sense that the
exact cause and time of the crash vary a little (for example, the buggy program
doesn't crash under valgrind :)

There are several possible sources of my bug. 

(a) bug in code generator (unlikely)
(b) bug in library using some hand written C++ (unlikely)
(c) bug in the gc -- most likely

A GC bug is most likely to be deleting a reachable object. It's unlikely
to be an actual *bug* in the code as such, though that's possible
since I just found one yesterday and fixed it :)

Let's look at what Valgrind is telling me:

==21994== Invalid read of size 8
==21994==    at 0x100011E94: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
==21994==    by 0x7FFF5FBFCC3F: ???
==21994==  Address 0x7fff5fbfc908 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes

My GC does a conservative scan of the stack. It's possible it looks beyond the
top (lowest address) of the stack, although this shouldn't happen (I will have
to subtract sizeof(void*) from the stack value I calculate to fix this problem).
However, it is perfectly *legal* to do this on the x86_64 platform: the ABI
specifies a red zone, and code is free to use a certain number of bytes (128)
on the wrong side of the stack pointer. So technically this is a bug in
Valgrind: the read isn't invalid, it's just suspicious.
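
To make the stack scan concrete, here is a minimal sketch of a conservative
scan over a half-open range, so that nothing below the low bound is ever read.
The name scan_words and the [lo, hi) convention are illustrative assumptions,
not the collector's actual code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the real collector code.
// Reads every aligned word in [lo, hi) and returns the values, each of
// which a conservative GC would treat as a candidate pointer.
// lo is the lowest live stack address, hi is one past the highest.
std::vector<std::uintptr_t> scan_words(const void *lo, const void *hi)
{
  const std::uintptr_t mask = sizeof(std::uintptr_t) - 1;
  // Round lo up and hi down to word boundaries so no word straddles a bound.
  std::uintptr_t first = (reinterpret_cast<std::uintptr_t>(lo) + mask) & ~mask;
  std::uintptr_t last = reinterpret_cast<std::uintptr_t>(hi) & ~mask;

  std::vector<std::uintptr_t> out;
  for (std::uintptr_t a = first; a < last; a += sizeof(std::uintptr_t))
    out.push_back(*reinterpret_cast<const std::uintptr_t *>(a));
  return out;
}
```

With this convention an off-by-one in computing lo shows up as exactly the
"just below the stack ptr" read reported above.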

==21994== Use of uninitialised value of size 8
==21994==    at 0x100084E57: JudyLGet (in ./ls)
==21994==    by 0x10006041D: JudyLLast (in ./ls)
==21994==    by 0x10001183F: flx::gc::collector::flx_collector_t::scan_object(void*, int) (in ./ls)
==21994==    by 0x100011EA2: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
==21994== 

The attempt to allocate an object (the "new" just above) has triggered a
garbage collection. I have no idea what the address of the uninitialised
value is. Why doesn't Valgrind tell me?

I get a lot of these. Almost certainly they're all false positives. The only
(intentionally) uninitialised value being read is the one word on the wrong
side of the stack, from the error mentioned above.

However, I don't know how Valgrind is tracking whether something is
initialised or not. How does it do it?

What is actually happening above is that I'm using a Judy array. This is a
digital trie, so it is "calculating" pointers. But there should not be any
cases of reading store at an address that isn't initialised. Here is the call:

void flx_collector_t::scan_object(void *p, int reclimit)
{
  Word_t reachable = (parity & 1UL) ^ 1UL;
again:
  if(debug)
    fprintf(stderr,"Scan object %p, reachable bit value = %d\n",p,(int)reachable);
  Word_t cand = (Word_t)p;
  Word_t fp=cand;
  Word_t *w = (Word_t*)JudyLLast(j_shape,&fp,&je);

"je" is error storage, so not relevant. j_shape is a mapping from
objects to shapes; all hell would break loose if that were uninitialised.
The variable fp is manifestly initialised. This code dereferences p if,
and only if, it is known to be an allocated object, BUT that doesn't happen
at this point. In fact, the call on that last line is actually the check to see
if the object is allocated!
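
To show what that check amounts to, here is a minimal sketch using std::map
in place of the Judy array. JudyLLast finds the last index less than or equal
to the one passed in; the map below does the same with upper_bound. The names
shape_map_t and find_object, and storing a byte size as the mapped value, are
assumptions for illustration only, not what j_shape really holds:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

typedef std::uintptr_t Word_t; // stand-in for Judy's Word_t

// Hypothetical stand-in for j_shape: object base address -> size in bytes.
typedef std::map<Word_t, std::size_t> shape_map_t;

// Returns the base address of the allocated object containing cand,
// or 0 if cand does not point into any known object.
// cand is only compared against keys; it is never dereferenced.
Word_t find_object(const shape_map_t &shapes, Word_t cand)
{
  shape_map_t::const_iterator it = shapes.upper_bound(cand); // first key > cand
  if (it == shapes.begin())
    return 0; // no key <= cand
  --it; // last key <= cand, the same lookup JudyLLast performs
  return cand < it->first + it->second ? it->first : 0;
}
```

Note that the candidate value itself is the search key, so whatever the
stack scan picked up is what the lookup operates on.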

So I'm confused. What does the diagnostic actually mean?



--
john skaller
skal...@users.sourceforge.net




