I have some kind of memory corruption in a C++ program generated by a tool. The program uses my own exact garbage collector, which may be the cause of the problem. The data being processed is too big to trace anything by hand, so I thought I'd try that excellent and magical tool, Valgrind.

My problem is basically filtering out the false positives to find the real problem. To repeat, I know for sure I am writing to the wrong place, and that's causing my program to crash. The fault is intermittent in the sense that the exact crash cause and time varies a little bit (for example, the buggy program doesn't crash under Valgrind :)

There are several possible sources of my bug.

(a) bug in code generator (unlikely)
(b) bug in library using some hand written C++ (unlikely)
(c) bug in the gc -- most likely

A GC bug is most likely to be deleting a reachable object. It's unlikely to be an actual *bug* in the code as such, though that's possible since I just found one yesterday and fixed it :)

Let's look at what Valgrind is telling me:

==21994== Invalid read of size 8
==21994==    at 0x100011E94: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
==21994==    by 0x7FFF5FBFCC3F: ???
==21994==  Address 0x7fff5fbfc908 is just below the stack ptr.  To suppress, use: --workaround-gcc296-bugs=yes

My GC does a conservative scan of the stack.

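For concreteness, a conservative scan over a range of stack words might look roughly like the sketch below. This is not the actual collector code: `word_range` and `scan_words` are hypothetical names, and the half-open range convention is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stack range, as a half-open interval of words.
struct word_range {
    const std::uintptr_t* begin;  // lowest address to scan (inclusive)
    const std::uintptr_t* end;    // one past the highest address to scan
};

// Treat every aligned word in the range as a possible pointer and collect it.
// The loop reads exactly (end - begin) words: it never reads *end and never
// reads anything below begin.
std::vector<std::uintptr_t> scan_words(const word_range& r) {
    std::vector<std::uintptr_t> candidates;
    for (const std::uintptr_t* p = r.begin; p != r.end; ++p) {
        candidates.push_back(*p);
    }
    return candidates;
}
```

With a half-open range like this, getting `begin` wrong by a single word is enough to read one word below the live part of the stack.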
It's possible it looks beyond the top (lowest address) of the stack, although this shouldn't happen (I will have to subtract sizeof(void*) from the stack value I calculate to fix this problem). However, it is perfectly *legal* to do this on the x86_64 platform: the ABI specifies a red zone, and code is free to use a certain number of bytes (128) on the wrong side of the stack pointer. So technically this is a bug in Valgrind: the read isn't invalid, it's just suspicious.

==21994== Use of uninitialised value of size 8
==21994==    at 0x100084E57: JudyLGet (in ./ls)
==21994==    by 0x10006041D: JudyLLast (in ./ls)
==21994==    by 0x10001183F: flx::gc::collector::flx_collector_t::scan_object(void*, int) (in ./ls)
==21994==    by 0x100011EA2: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
==21994==    by 0x100008839: flx::gc::generic::collector_t::collect() (in ./ls)
==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t&, flx::gc::generic::gc_shape_t&, bool) (in ./ls)
==21994==    by 0x100000D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
==21994==

The attempt to allocate an object (the "new" just above) has triggered a garbage collection. I have no idea what the address of the uninitialised value is; why doesn't Valgrind tell me? I get a lot of these. They're almost certainly all false positives.

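The call chain in the trace above (mark, then scan_object, then JudyLLast) boils down to a "largest key not above this candidate" lookup. Here is a minimal sketch of that kind of query, with std::map standing in for the JudyL array. The names `object_table` and `find_object` are hypothetical, and storing a byte size per object and accepting interior pointers are assumptions for illustration, not a description of the real collector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical stand-in for the object table: start address -> size in bytes.
using object_table = std::map<std::uintptr_t, std::size_t>;

// Find the allocated object containing `candidate`, if any.
// This mirrors a "last index <= key" query: take the largest start address
// that is not above the candidate, then check the candidate lies inside it.
// Returns the object's start address, or 0 if no object contains candidate.
std::uintptr_t find_object(const object_table& objects, std::uintptr_t candidate) {
    auto it = objects.upper_bound(candidate);  // first start address > candidate
    if (it == objects.begin()) return 0;       // nothing at or below candidate
    --it;                                      // largest start address <= candidate
    bool inside = candidate - it->first < it->second;
    return inside ? it->first : 0;
}
```

Note that the lookup only ever uses the candidate word as a key. The candidate itself is never dereferenced here, whatever bits it happens to contain.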
The only (intentionally) uninitialised value being read is the one word on the wrong side of the stack, as in the error mentioned above. However, I don't know how Valgrind is tracking whether something is initialised or not. How does it do it?

What is actually happening above is that I'm using a Judy array. This is a digital trie, so it is "calculating" pointers. But there should not be any cases of reading store at an address that isn't initialised. Here is the call:

    void flx_collector_t::scan_object(void *p, int reclimit)
    {
      Word_t reachable = (parity & 1UL) ^ 1UL;
    again:
      if(debug)
        fprintf(stderr,"Scan object %p, reachable bit value = %d\n",p,(int)reachable);
      Word_t cand = (Word_t)p;
      Word_t fp=cand;
      Word_t *w = (Word_t*)JudyLLast(j_shape,&fp,&je);

"je" is an error storage, so not relevant. j_shape is a mapping from objects to shapes; all hell would break loose if that were uninitialised. The variable fp is manifestly initialised.

This code dereferences p if, and only if, it is known to be an allocated object, BUT that doesn't happen at this point. In fact, the call on that last line is actually the check to see if the object is allocated!

So I'm confused. What does the diagnostic actually mean?

--
john skaller
skal...@users.sourceforge.net

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users