Re: [Valgrind-users] Debugging a GC with valgrind

Dave Goodell Tue, 18 Jan 2011 11:04:33 -0800

A few things that might help you here:

1) Build your program with debugging information (e.g. "-g" with GCC or 
Clang), so that your stack traces show source files and line numbers instead 
of just "(in ./ls)" and you can see exactly which line is causing a problem.

2) Tracking down "uninitialized value" warnings is much easier if you use the 
"--track-origins=yes" option to Valgrind.

3) I have a pretty limited understanding of Valgrind's handling of stack red 
zones, but there's a handy comment in memcheck/mc_main.c that sheds some light 
on the situation:

----8<----
   Dealing with stack redzones, and the NIA cache
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   This is one of the few non-obvious parts of the implementation.

   Some ABIs (amd64-ELF, ppc64-ELF, ppc32/64-XCOFF) define a small
   reserved area below the stack pointer, that can be used as scratch
   space by compiler generated code for functions.  In the Memcheck
   sources this is referred to as the "stack redzone".  The important
   thing here is that such redzones are considered volatile across
   function calls and returns.  So Memcheck takes care to mark them as
   undefined for each call and return, on the afflicted platforms.
   Past experience shows this is essential in order to get reliable
   messages about uninitialised values that come from the stack.
----8<----

The key bit is that Valgrind marks the whole red zone as "undefined" at every 
function call and return, so only the parts that are actually written during 
the current function can end up marked as "defined".  Given this, you'll 
probably need to play some games with Valgrind's client request mechanism to 
temporarily tell Valgrind that accesses to the red zone are safe.  I'm guessing 
that the solution would look something like this:

----8<----
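#include <valgrind/memcheck.h>

#define RZ_SZB (128)  /* red zone size on amd64 */
char *sp = /* stack pointer value */;
unsigned char vbits[RZ_SZB] = {0};
/* save the current V (definedness) bits of the red zone */
VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
/* temporarily mark the red zone as addressable and defined */
VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
/* ... scan the red zone here ... */
/* put the saved V bits back */
VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);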
----8<----
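
To flesh that out a little, here is a rough sketch of the same 
save/define/scan/restore sequence wrapped in a function.  The helper name 
scan_words and the heap_lo/heap_hi bounds are made up for illustration and are 
not from your collector; the __has_include fallback only defines no-op 
stand-ins so that the file still builds where the Valgrind headers are missing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Use the real client requests when the Valgrind headers are installed;
   otherwise fall back to no-op stand-ins (an assumption of this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_GET_VBITS(a, v, n)     ((void)(a), (void)(v), (void)(n), 0)
#  define VALGRIND_SET_VBITS(a, v, n)     ((void)(a), (void)(v), (void)(n), 0)
#  define VALGRIND_MAKE_MEM_DEFINED(a, n) ((void)(a), (void)(n), 0)
#endif

/* Hypothetical helper: count the words in [base, base+nbytes) whose value
   lies inside the heap range [heap_lo, heap_hi). */
size_t scan_words(const void *base, size_t nbytes,
                  uintptr_t heap_lo, uintptr_t heap_hi)
{
    unsigned char *vbits = malloc(nbytes ? nbytes : 1);
    int saved = 0;
    size_t hits = 0;

    if (vbits != NULL) {
        /* 1 = V bits copied out; 0 = not running under Valgrind;
           3 = part of the range was not addressable. */
        saved = (VALGRIND_GET_VBITS(base, vbits, nbytes) == 1);
    }
    /* Tell Memcheck the range is addressable and defined while we scan. */
    (void)VALGRIND_MAKE_MEM_DEFINED(base, nbytes);

    for (size_t off = 0; off + sizeof(uintptr_t) <= nbytes;
         off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, (const unsigned char *)base + off, sizeof word);
        if (word >= heap_lo && word < heap_hi)
            hits++;
    }

    /* Restore the original definedness only if we managed to save it. */
    if (saved)
        (void)VALGRIND_SET_VBITS(base, vbits, nbytes);
    free(vbits);
    return hits;
}
```

For the red zone you would call it as scan_words(sp - RZ_SZB, RZ_SZB, heap_lo, 
heap_hi).  Note that if VALGRIND_GET_VBITS reports that part of the range was 
unaddressable, there is nothing to restore and the range stays marked as 
defined afterwards.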

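On point 2, and on why the "uninitialised value" report has no address: 
Memcheck complains at the point where an undefined value is used (for example 
as an address or in a branch), not where it was stored.  If you want to find 
out which candidate word is the undefined one, you can ask Memcheck directly 
before handing the word to JudyLLast.  A minimal sketch, assuming a 
hypothetical helper first_undefined_word that is not part of your collector 
(the fallback macro is again only a no-op stand-in):

```c
#include <stddef.h>
#include <stdint.h>

/* Use the real client request when the Valgrind headers are installed;
   otherwise fall back to a no-op stand-in (an assumption of this sketch). */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
#  define VALGRIND_CHECK_MEM_IS_DEFINED(a, n) ((void)(a), (void)(n), 0)
#endif

/* Hypothetical helper: return the index of the first word that Memcheck
   does not consider fully addressable and defined, or -1 if there is none.
   Outside Valgrind the check always returns 0, so the result is -1. */
long first_undefined_word(const uintptr_t *candidates, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Under Memcheck this prints an error and returns the address of
           the first offending byte; it returns 0 when everything is fine. */
        if (VALGRIND_CHECK_MEM_IS_DEFINED(&candidates[i],
                                          sizeof candidates[i]) != 0)
            return (long)i;
    }
    return -1;
}
```

Memcheck prints its own error message, with a stack trace, at the failing 
check, so together with "--track-origins=yes" that should tell you both which 
word is undefined and where it came from.
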
-Dave

On Jan 17, 2011, at 9:03 PM CST, john skaller wrote:

> I have some kind of memory corruption in a C++ program generated by a tool.
> The program uses my own exact garbage collector which may be the cause
> of the problem. The size of the data being processed is too big to trace
> anything by hand, so I thought I'd try that excellent and magical tool,
> valgrind.
> 
> My problem is basically filtering out the false positives to find the real
> problem. To repeat, I know for sure I am writing to the wrong place, and
> that's causing my program to crash. The fault is intermittent in the sense
> that the exact crash cause and time vary a little bit (for example, the
> buggy program doesn't crash under valgrind :)
> 
> There are several possible sources of my bug. 
> 
> (a) bug in code generator (unlikely)
> (b) bug in library using some hand written C++ (unlikely)
> (c) bug in the gc -- most likely
> 
> A GC bug is most likely to be deleting a reachable object. It's unlikely
> to be an actual *bug* in the code as such, though that's possible
> since I just found one yesterday and fixed it :)
> 
> Let's look at what Valgrind is telling me:
> 
> ==21994== Invalid read of size 8
> ==21994==    at 0x100011E94: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
> ==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
> ==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
> ==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
> ==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
> ==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
> ==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
> ==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
> ==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
> ==21994==    by 0x7FFF5FBFCC3F: ???
> ==21994==  Address 0x7fff5fbfc908 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes
> 
> My GC does a conservative scan of the stack. It's possible it looks beyond
> the top (lowest address) of the stack, although this shouldn't happen (I will
> have to subtract sizeof(void*) from the stack value I calculate to fix this
> problem). However, it is perfectly *legal* to do this on the x86_64 platform:
> the ABI specifies a hot zone and code is free to use a certain number of
> bytes (256?) on the wrong side of the stack. So technically this is a bug in
> Valgrind: the read isn't invalid, it's just suspicious.
> 
> ==21994== Use of uninitialised value of size 8
> ==21994==    at 0x100084E57: JudyLGet (in ./ls)
> ==21994==    by 0x10006041D: JudyLLast (in ./ls)
> ==21994==    by 0x10001183F: flx::gc::collector::flx_collector_t::scan_object(void*, int) (in ./ls)
> ==21994==    by 0x100011EA2: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
> ==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
> ==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
> ==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
> ==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
> ==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
> ==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
> ==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
> ==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
> ==21994== 
> 
> The attempt to allocate an object (the "new" just above) has triggered a
> garbage collection. I have no idea what the address of the uninitialised
> value is. Why doesn't Valgrind tell me?
> 
> I get a lot of these. They're almost all certainly false positives. The only
> (intentionally) uninitialised value being read is the one word on the wrong
> side of the stack mentioned above.
> 
> However, I don't know how Valgrind is tracking whether something is
> initialised or not. How does it do it?
> 
> What is actually happening above is I'm using a JudyArray. This is a digital
> trie, so it is "calculating" pointers. But there should not be any cases of
> reading store at an address that isn't initialised. Here is the call:
> 
> void flx_collector_t::scan_object(void *p, int reclimit)
> {
>  Word_t reachable = (parity & 1UL) ^ 1UL;
> again:
>  if(debug)
>    fprintf(stderr,"Scan object %p, reachable bit value = %d\n",p,(int)reachable);
>  Word_t cand = (Word_t)p;
>  Word_t fp=cand;
>  Word_t *w = (Word_t*)JudyLLast(j_shape,&fp,&je);
> 
> "je" is an error storage, so not relevant. j_shape is a mapping from
> objects to shapes, all hell would break loose if that were uninitialised.
> The variable fp is manifestly initialised. This code dereferences p if,
> and only if, it is known to be an allocated object, BUT that doesn't happen
> at this point, in fact the call on that last line is actually the check to see
> if the object is allocated!
> 
> So I'm confused. What does the diagnostic actually mean?
> 
> 
> 
> --
> john skaller
> skal...@users.sourceforge.net


_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users