Re: [Valgrind-users] Debugging a GC with valgrind
> > ----8<----
> > #define RZ_SZB (128)
> > char *sp = /* stack pointer value */;
> > char vbits[RZ_SZB] = {0};
> > VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
> > VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
> > /* ... scan the red zone here ... */
> > VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
> > ----8<----
>
> I can try that, but really the problem isn't reading uninitialised values. My theory is gc is deleting reachable store: the actual bug is NOT reading something it should not be :)
> ...
> That would be nice but it is unclear. Consider: the only way** a GC could cause the problem of an over-write would be to delete a reachable object.
>
> ** unless there were some stupid bug in the GC. I did have one: chasing pointers down with recursion .. blows the stack on a long enough list.
> ...

I understood that this GC is directly calling malloc/free for each object (so there is no GC pool management). Then valgrind memcheck is (supposed to be) able to detect that the GC is freeing a piece of memory which is then dereferenced. You might need to increase the size of the list of freed blocks that memcheck keeps aside in order to detect such bugs, by using the option --freelist-vol=

If the GC is maintaining its own mempool, then the valgrind mempool client requests need to be used.

If the application is multi-threaded, it might also be a race condition. You could try helgrind or drd.

Philippe
___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
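As a minimal sketch of the mempool route mentioned in the message above: the collector in this thread calls malloc/free per object, so this only applies if a GC carves objects out of its own arena. The allocator below (`pool_t`, `pool_init`, `pool_alloc`, `pool_free`, `pool_destroy`) is hypothetical and not taken from Felix. The `VALGRIND_*` macros are the mempool client requests from the Valgrind headers; they are compiled away here if the headers are not installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the real client requests when the Valgrind headers are installed;
   otherwise define them away so the sketch still builds. */
#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_CREATE_MEMPOOL
#  define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed) ((void)0)
#  define VALGRIND_DESTROY_MEMPOOL(pool)                ((void)0)
#  define VALGRIND_MEMPOOL_ALLOC(pool, addr, size)      ((void)0)
#  define VALGRIND_MEMPOOL_FREE(pool, addr)             ((void)0)
#  define VALGRIND_MAKE_MEM_NOACCESS(addr, len)         ((void)0)
#endif

/* Hypothetical bump allocator standing in for a GC-managed arena. */
typedef struct {
    char  *base;  /* start of the arena obtained from malloc */
    size_t size;  /* arena size in bytes */
    size_t used;  /* bump offset of the next free byte */
} pool_t;

int pool_init(pool_t *p, size_t size) {
    p->base = malloc(size);
    if (p->base == NULL) return -1;
    p->size = size;
    p->used = 0;
    /* The arena is off limits until pieces are handed out. */
    VALGRIND_MAKE_MEM_NOACCESS(p->base, size);
    VALGRIND_CREATE_MEMPOOL(p, 0, 0);
    return 0;
}

void *pool_alloc(pool_t *p, size_t n) {
    n = (n + 15u) & ~(size_t)15u;          /* keep 16-byte alignment */
    if (n > p->size - p->used) return NULL;
    void *obj = p->base + p->used;
    p->used += n;
    /* Tell Memcheck this piece is now an allocated (undefined) block. */
    VALGRIND_MEMPOOL_ALLOC(p, obj, n);
    return obj;
}

void pool_free(pool_t *p, void *obj) {
    (void)p; (void)obj;                    /* unused when macros are no-ops */
    /* After this, Memcheck reports any access to obj as invalid. */
    VALGRIND_MEMPOOL_FREE(p, obj);
}

void pool_destroy(pool_t *p) {
    VALGRIND_DESTROY_MEMPOOL(p);
    free(p->base);
    p->base = NULL;
}
```

With these annotations a collector that frees a still-reachable object would show up as an invalid read or write at the later dereference, with the allocation stack trace of the object attached.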
Re: [Valgrind-users] Debugging a GC with valgrind
On Jan 20, 2011, at 6:43 PM CST, john skaller wrote:

> OK, so I do this now:
> ...
> if(debug)
>   fprintf(stderr, "Check if *%p=%p is a pointer\n",i,*(void**)i);
> scan_object(*(void**)i, reclimit);
> ...
> The VALGRIND macro there doesn't seem to be working, I must be doing something wrong. I'm trying to just mark the whole stack as defined. Here's output:
> ...

The way that I read this output is that your range variable and the data in that range are probably defined, because Valgrind isn't flagging i or the value at address i when you perform the "Check if" fprintf above. Rather, it's flagging an fprintf inside of scan_object on line 451, whose output you did not include. Maybe it's the reclimit variable?

> ==7159== Uninitialised value was created by a stack allocation
> ==7159==    at 0x1E492: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)

It's odd that the stack allocation is attributed to line 337, which AFAICT is the line where scan_object is called. Is there some sort of odd automatic C++ temporary allocation happening here that I can't see because some other code isn't shown here or (more likely) my C++ is too rusty somehow?

Perhaps under optimization an uninitialized and otherwise-unused reclimit variable only gets created at the time that the arguments to scan_object are pushed onto the stack?

-Dave
Re: [Valgrind-users] Debugging a GC with valgrind
On 22/01/2011, at 2:30 AM, Dave Goodell wrote:

> On Jan 20, 2011, at 6:43 PM CST, john skaller wrote:
>
> > OK, so I do this now:
> > ...
> > if(debug)
> >   fprintf(stderr, "Check if *%p=%p is a pointer\n",i,*(void**)i);
> > scan_object(*(void**)i, reclimit);
> > ...
> > The VALGRIND macro there doesn't seem to be working, I must be doing something wrong. I'm trying to just mark the whole stack as defined. Here's output:
> > ...
>
> The way that I read this output is that your range variable and the data in that range is probably defined because Valgrind isn't flagging i or the value at address i when you perform the "Check if" fprintf above.

Good point..

> Rather, it's flagging an fprintf inside of scan_object on line 451, whose output you did not include. Maybe it's the reclimit variable?

I shouldn't think so, though it's hard to be sure of anything. scan_object looks to see if the pointer value it gets is actually a pointer into the heap. If so, it looks at the pointers inside the pointed-at object (recursively). I know where the pointers are because I know the type of every heap object. It doesn't chase down the pointer unless it's an actual pointer into the heap: it doesn't chase ints, raw C pointers (not Felix heap allocated), or pointers into the stack or static storage.

In this program there is only ONE data structure on the heap: list nodes, which contain exactly one pointer (to the next node). There are no list nodes on the stack.

The pointers used for lists are actually tagged pointers:

struct _uctor_ { int variant; void *data; };

variant = 0 means end of list and 1 means a node with data in it is being pointed at. [So the last node has a value in it and a pointer with variant 0 and data=NULL.] It is the variant which is sometimes overwritten with a value like 99762976. It should be only 0 or 1. My code has switches on the variant, which include a wildcard for values other than 0 or 1; that case prints a "match failure" diagnostic and then aborts the program.
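The tagged-pointer list described above can be sketched roughly as follows. Only `struct _uctor_` comes from the message; the node layout (`struct node`), the function name `list_length`, and the use of -1 to signal the "match failure" case are assumptions made for illustration, not the generated Felix code.

```c
#include <assert.h>
#include <stddef.h>

/* Tagged pointer, as given in the message. */
struct _uctor_ {
    int   variant;  /* 0 = end of list, 1 = points at a node */
    void *data;     /* NULL when variant == 0 */
};

/* Hypothetical node layout: one value plus the tagged pointer to the
   next node. */
struct node {
    long           value;
    struct _uctor_ next;
};

/* Walk the list. Returns the number of nodes, or -1 if a tag holds
   anything other than 0 or 1 (the wildcard case). */
long list_length(struct _uctor_ head) {
    long n = 0;
    for (;;) {
        switch (head.variant) {
        case 0:                     /* end of list */
            return n;
        case 1:                     /* a node is being pointed at */
            n++;
            head = ((struct node *)head.data)->next;
            break;
        default:                    /* tag was overwritten */
            return -1;
        }
    }
}
```

A corrupted tag such as 99762976 lands in the `default` branch, which is how the overwrite becomes visible long after it happened.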
> ==7159== Uninitialised value was created by a stack allocation
> ==7159==    at 0x1E492: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
>
> It's odd that the stack allocation is attributed to line 337, which AFAICT is the line where scan_object is called. Is there some sort of odd automatic C++ temporary allocation happening here that I can't see because some other code isn't shown here or (more likely) my C++ is too rusty somehow?

I don't think so.

> Perhaps under optimization an uninitialized and otherwise-unused reclimit variable only gets created at the time that the arguments to scan_object are pushed onto the stack?

Well, that's possible, but I'm doing a debug build. I *think* that doesn't do C++ optimisation (normally I run with -O3 on gcc).

The nasty thing is now it is working, some quirk has changed something.. :)

--
john skaller
skal...@users.sourceforge.net
Re: [Valgrind-users] Debugging a GC with valgrind
On Jan 20, 2011, at 4:35 AM CST, john skaller wrote:

> On 20/01/2011, at 3:04 AM, Dave Goodell wrote:
>
> > On Jan 19, 2011, at 9:52 AM CST, john skaller wrote:
> >
> > > On 20/01/2011, at 2:39 AM, Dave Goodell wrote:
> > >
> > > > On Jan 18, 2011, at 10:56 PM CST, john skaller wrote:
> > > >
> > > > > I could rewrite the GC so that the stack scan is in a separate subroutine, and then just exclude that using Valgrind's nice suppression mechanism.
> > > >
> > > > Yes, although this doesn't get rid of the uninitialized values, which could potentially propagate elsewhere in your code. It just suppresses error _reporting_.
>
> Well there's no way to get rid of these uninitialised values. Most are in fact initialised. The problem is something like: when a subroutine is called the return address is pushed on the stack, along with callee-save registers. I suspect Valgrind thinks these are uninitialised values.

Valgrind tracks the V bits (validity bits) for registers too, so the validity of any pushed register values on the stack will depend on the validity of the register contents before the push.

> In other cases: consider, I have a data structure:
>
> struct { int x; long y; } a;
>
> on the stack with say 4 bytes padding after 'x'. When I read the 8 byte word at address a, half is uninitialised. I'm not sure what Valgrind would say here: most of my uninitialised values are Value8.

There is a way to get rid of the uninitialized values, one that I posted earlier in the thread.
Try reading over the manual's explanation of how Valgrind tracks defined/undefined data:

http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine

And then the section on client requests:

http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq

If you follow that, then I think the best approach is to adapt the client request code sketch that I posted before for scanning the whole stack:

----8<----
#define RZ_SZB (128)
char *sp = /* stack pointer value */;
char vbits[RZ_SZB] = {0};
VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
/* ... scan the red zone here ... */
VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
----8<----

> > Right. I was thinking this was an external thread or signal handler examining another stack, in which case you would need to scan the whole red zone. But if it all happens as a result of an explicit new/allocate call then scanning the red zone shouldn't be necessary.
>
> Yeah. I'm not sure what happens on x86_64 Unix (OSX, Linux) with signals: I have a feeling they do not use the application's stack? Or do they bump the RSP by the size of the redzone before calling the handler?

I don't know for sure, although I'm certain there are several people on this list who could tell you about the mechanism in excruciating detail if they feel like it. Signals can definitely be handled on a separate signal stack if SA_ONSTACK is passed to sigaction.

> Anyhow, I've got some other test code exhibiting problems sometimes and not others, and I'm no closer to a solution. Sometimes the code works, sometimes it segfaults, sometimes it just overwrites the wrong place and I trap the problem and report it. The behaviour is always the same for the same program, data, and GC tuning parameters: the fault is unpredictable but thankfully not intermittent.
>
> Unfortunately running Valgrind is one of the things the bug is sensitive to, it runs and hides when running Valgrind :)

That happens sometimes, especially if there are threads in your program because Valgrind changes the way that threads are scheduled. Valgrind also replaces the standard malloc and some other routines, which can change the way that they behave.

Hopefully if you can clean up the false positives from your stack-scanning code then you will be able to find some bugs from the remaining warnings.

-Dave
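The padding case raised in this message can be made concrete with a small sketch. It only computes where the members of `struct { int x; long y; }` sit; the names `struct a_t`, `padding_after_x`, and `first_word_member_mask` are invented for the example. The assumption is that a conservative scanner which loads the struct's first machine word also loads any padding after `x`, which no member assignment is required to write, so Memcheck can treat those bytes as undefined.

```c
#include <assert.h>
#include <stddef.h>

/* The layout from the message, given a name so offsetof can be used. */
struct a_t { int x; long y; };

/* Number of padding bytes between x and y (x is at offset 0). */
size_t padding_after_x(void) {
    return offsetof(struct a_t, y) - sizeof(int);
}

/* Bit i of the result is set if byte i of the struct's first
   pointer-sized word belongs to a member rather than to padding. */
unsigned first_word_member_mask(void) {
    unsigned mask = 0;
    for (size_t i = 0; i < sizeof(void *); i++) {
        int in_x = i < sizeof(int);
        int in_y = i >= offsetof(struct a_t, y) &&
                   i < offsetof(struct a_t, y) + sizeof(long);
        if (in_x || in_y) mask |= 1u << i;
    }
    return mask;
}
```

On a typical LP64 target the mask is 0x0F: the low four bytes of the word are `x` and the high four are padding. That matches the "half is uninitialised" picture in the message, provided both members have been assigned.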
Re: [Valgrind-users] Debugging a GC with valgrind
On 21/01/2011, at 2:55 AM, Dave Goodell wrote:

> Valgrind tracks the V bits (validity bits) for registers too, so the validity of any pushed register values on the stack will depend on the validity of the register contents before the push.

OK thx .. I haven't looked at the VEX machine.

> If you follow that, then I think the best approach is to adapt the client request code sketch that I posted before for scanning the whole stack:

BTW: it says in the docs that #including valgrind has a tiny overhead which won't be noticed except in inner loops perhaps. I have not yet looked at this in detail. Is there really a cost (if you don't use it)?

> ----8<----
> #define RZ_SZB (128)
> char *sp = /* stack pointer value */;
> char vbits[RZ_SZB] = {0};
> VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
> VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
> /* ... scan the red zone here ... */
> VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
> ----8<----

I can try that, but really the problem isn't reading uninitialised values. My theory is gc is deleting reachable store: the actual bug is NOT reading something it should not be :)

> > Unfortunately running Valgrind is one of the things the bug is sensitive to, it runs and hides when running Valgrind :)
>
> That happens sometimes, especially if there are threads in your program because Valgrind changes the way that threads are scheduled.

No threads, at least none I know of. The actual program is just "ls regexp" where regexp is a Google Re2 (perl) regexp. The code actually collects all the files in a list before printing them, basically to test the subroutine that collects all the files in a list, since that's going into my library.

> Valgrind also replaces the standard malloc and some other routines, which can change the way that they behave. Hopefully if you can clean up the false positives from your stack-scanning code then you will be able to find some bugs from the remaining warnings.

That would be nice but it is unclear.
Consider: the only way** a GC could cause the problem of an over-write would be to delete a reachable object.

** unless there were some stupid bug in the GC. I did have one: chasing pointers down with recursion .. blows the stack on a long enough list.

--
john skaller
skal...@users.sourceforge.net
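The recursion problem mentioned at the end of this message is commonly avoided by marking with an explicit worklist instead of the C call stack. The sketch below shows that general technique under assumptions: `struct obj` with a mark flag and two outgoing pointers is a hypothetical object layout, and `mark_from` is an invented name. It is not the Felix collector's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap object: a mark flag and up to two outgoing pointers. */
struct obj {
    int         marked;
    struct obj *child[2];
};

/* Mark everything reachable from root without recursion. Returns the
   number of objects newly marked, or (size_t)-1 if the worklist could
   not be allocated or grown. */
size_t mark_from(struct obj *root) {
    size_t cap = 16, top = 0, count = 0;
    struct obj **work = malloc(cap * sizeof *work);
    if (work == NULL) return (size_t)-1;
    if (root != NULL) work[top++] = root;

    while (top > 0) {
        struct obj *o = work[--top];
        if (o->marked) continue;        /* already visited, e.g. a cycle */
        o->marked = 1;
        count++;
        for (int i = 0; i < 2; i++) {
            struct obj *c = o->child[i];
            if (c == NULL || c->marked) continue;
            if (top == cap) {           /* grow the worklist on the heap */
                struct obj **grown = realloc(work, 2 * cap * sizeof *work);
                if (grown == NULL) { free(work); return (size_t)-1; }
                work = grown;
                cap *= 2;
            }
            work[top++] = c;
        }
    }
    free(work);
    return count;
}
```

For a singly linked list the worklist never holds more than one entry, so the depth of the C stack stays constant however long the list is.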
Re: [Valgrind-users] Debugging a GC with valgrind
On Jan 20, 2011, at 6:07 PM CST, john skaller wrote:

> Hmmm .. ok, did some client request stuff .. looks like I found a bug in OSX! First, what's this?
>
> ==7005== Command: tools/flx_ls
> ==7005==
> --7005-- warning: addVar: in range 0xcb2 .. 0xcd7 outside segment 0x1 .. 0x1000eefff (top)
> --7005-- warning: addVar: in range 0xcf4 .. 0xd88 outside segment 0x1 .. 0x1000eefff (top)
> --7005-- warning: addVar: in range 0xd89 .. 0xde1 outside segment 0x1 .. 0x1000eefff (top)
> ... [more] ...

Dunno. Someone else probably does.

> Here's the bug? in OSX:
>
> ==7005== Conditional jump or move depends on uninitialised value(s)
> ==7005==    at 0x10031BC28: pthread_rwlock_init (in /usr/lib/libSystem.B.dylib)
> ==7005==    by 0x1161F: re2::Mutex::Mutex() (mutex.h:108)
> ==7005==    by 0x100031605: re2::RE2::Init(re2::StringPiece const, re2::RE2::Options const) (re2.cc:147)
> ==7005==    by 0x100031FEF: re2::RE2::RE2(std::string const) (re2.cc:98)
> ==7005==    by 0x1767C: flxusr::flx_ls::_init_::resume() (in tools/flx_ls)
> ==7005== Uninitialised value was created by a heap allocation
> ==7005==    at 0x1001CA374: operator new(unsigned long) (vg_replace_malloc.c:261)
> ==7005==    by 0x1000315F2: re2::RE2::Init(re2::StringPiece const, re2::RE2::Options const) (re2.cc:147)
> ==7005==    by 0x100031FEF: re2::RE2::RE2(std::string const) (re2.cc:98)
> ==7005==    by 0x1767C: flxusr::flx_ls::_init_::resume() (in tools/flx_ls)
> ==7005==
>
> A rwlock (BSD) is an int. It is not initialised by re2. AFAICS it shouldn't have to be either, since that's what pthread_rwlock_init is for, but that routine appears to be doing a conditional jump on it.

Looks like a regression or variant of https://bugs.kde.org/show_bug.cgi?id=196528.

-Dave
Re: [Valgrind-users] Debugging a GC with valgrind
OK, so I do this now:

pthread::memory_range_t range = *i; // line 325, this is mark routine, marks reachable objects
if(debug)
{
  unsigned long n = (char*)range.e - (char*)range.b;
  fprintf(stderr, "Conservate scan of memory %p-%p, %ld bytes\n",range.b, range.e, n);
}
VALGRIND_MAKE_MEM_DEFINED(range.b, (char*)range.e-(char*)range.b);
void *end = range.e;
for ( void *i = range.b; i != end; i = (void*)((void**)i+1))
{
  if(debug)
    fprintf(stderr, "Check if *%p=%p is a pointer\n",i,*(void**)i);
  scan_object(*(void**)i, reclimit);
}
if(debug)
  fprintf(stderr, "DONE: Conservate scan of memory %p-%p\n",range.b, range.e);

The VALGRIND macro there doesn't seem to be working, I must be doing something wrong. I'm trying to just mark the whole stack as defined. Here's output:

Actually collect
Request to collect, thread 1004f6be0
Thread 1004f6be0 Stopping world, active threads=1
World stop thread=1004f6be0, stack=0x7fff5fbff128!
Stack size = 1648
World STOPPED
Collecting, thread 1004f6be0
Collector: Running mark
Conservate scan of memory 0x7fff5fbff128-0x7fff5fbff798, 1648 bytes
Check if *0x7fff5fbff128=0x18 is a pointer
==7159== Conditional jump or move depends on uninitialised value(s)
==7159==    at 0x10032E579: __vfprintf (in /usr/lib/libSystem.B.dylib)
==7159==    by 0x10032C06E: __vfprintf (in /usr/lib/libSystem.B.dylib)
==7159==    by 0x10036F07A: vfprintf_l (in /usr/lib/libSystem.B.dylib)
==7159==    by 0x10036EFFD: fprintf (in /usr/lib/libSystem.B.dylib)
==7159==    by 0x1DCFA: flx::gc::collector::flx_collector_t::scan_object(void*, int) (flx_collector.cpp:451)
==7159==    by 0x1E4A7: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==    by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() (flx_collector.cpp:535)
==7159==    by 0x1000111A6: flx::gc::collector::flx_ts_collector_t::v_collect() (flx_ts_collector.cpp:21)
==7159==    by 0x14D23: flx::gc::generic::collector_t::collect() (flx_gc.hpp:108)
==7159==    by 0x100010B9F: flx::gc::generic::gc_profile_t::actually_collect() (flx_gc.cpp:59)
==7159==    by 0x100010DBF: flx::gc::generic::gc_profile_t::maybe_collect() (flx_gc.cpp:53)
==7159==    by 0x100010E0C: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (flx_gc.cpp:81)
==7159== Uninitialised value was created by a stack allocation
==7159==    at 0x1E492: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
.. more ..
Scan object 0x18, reachable bit value = 1
==7159== Use of uninitialised value of size 8
==7159==    at 0x100081737: JudyLGet (JudyLGet.c:327)
==7159==    by 0x10005CCFD: JudyLLast (JudyLFirst.c:118)
==7159==    by 0x1DD2F: flx::gc::collector::flx_collector_t::scan_object(void*, int) (flx_collector.cpp:454)
==7159==    by 0x1E4A7: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==    by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() (flx_collector.cpp:535)
==7159==    by 0x1000111A6: flx::gc::collector::flx_ts_collector_t::v_collect() (flx_ts_collector.cpp:21)
==7159==    by 0x14D23: flx::gc::generic::collector_t::collect() (flx_gc.hpp:108)
==7159==    by 0x100010B9F: flx::gc::generic::gc_profile_t::actually_collect() (flx_gc.cpp:59)
==7159==    by 0x100010DBF: flx::gc::generic::gc_profile_t::maybe_collect() (flx_gc.cpp:53)
==7159==    by 0x100010E0C: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (flx_gc.cpp:81)
==7159==    by 0x100010F58: operator new(unsigned long, flx::gc::generic::gc_profile_t, flx::gc::generic::gc_shape_t, bool) (flx_gc.cpp:117)
==7159==    by 0x112D5: flxusr::lr::rev(flxusr::lr::thread_frame_t*, flx::rtl::_uctor_) (in ./lr)
==7159== Uninitialised value was created by a stack allocation
==7159==    at 0x1E492: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==
==7159== Use of uninitialised value of size 8
==7159==    at 0x100080E0D: JudyLGet (JudyLGet.c:125)
==7159==    by 0x10005CCFD: JudyLLast (JudyLFirst.c:118)
==7159==    by 0x1DD2F: flx::gc::collector::flx_collector_t::scan_object(void*, int) (flx_collector.cpp:454)
==7159==    by 0x1E4A7: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==    by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() (flx_collector.cpp:535)
==7159==    by 0x1000111A6:
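A stripped-down C version of the scan loop in this message may help when experimenting outside the collector. Everything here is a sketch under assumptions: `struct block`, `is_heap_pointer`, and `scan_range` are invented names, and a linear table lookup stands in for the JudyL array the collector uses. `VALGRIND_MAKE_MEM_DEFINED` is the real Memcheck client request and becomes a no-op if the Valgrind headers are not installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)0)
#endif

/* Hypothetical table of managed blocks (stand-in for the JudyL array). */
struct block { uintptr_t start; size_t len; };

/* Returns 1 if p points at or into one of the n blocks. */
int is_heap_pointer(const struct block *blocks, size_t n, uintptr_t p) {
    for (size_t k = 0; k < n; k++)
        if (p >= blocks[k].start && p - blocks[k].start < blocks[k].len)
            return 1;
    return 0;
}

/* Conservatively scan the words in [b, e) and count those that look
   like heap pointers. */
size_t scan_range(void *b, void *e, const struct block *blocks, size_t n) {
    size_t bytes = (size_t)((char *)e - (char *)b);
    size_t words = bytes / sizeof(void *);  /* ignore a trailing partial word */
    size_t hits = 0;
    void **w = (void **)b;

    /* Tell Memcheck the whole range may be read during the scan. */
    VALGRIND_MAKE_MEM_DEFINED(b, bytes);

    for (size_t i = 0; i < words; i++)
        if (is_heap_pointer(blocks, n, (uintptr_t)w[i]))
            hits++;
    return hits;
}
```

Two choices differ from the loop in the message. Iterating by word count terminates even when the range length is not a multiple of `sizeof(void *)`, whereas an `i != end` test would step past `end`. Marking the range defined is also permanent in this sketch; saving and restoring the V bits with `VALGRIND_GET_VBITS`/`VALGRIND_SET_VBITS`, as suggested elsewhere in the thread, avoids hiding genuine uninitialised reads later.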
Re: [Valgrind-users] Debugging a GC with valgrind
On Jan 19, 2011, at 9:52 AM CST, john skaller wrote:

> On 20/01/2011, at 2:39 AM, Dave Goodell wrote:
>
> > On Jan 18, 2011, at 10:56 PM CST, john skaller wrote:
> >
> > > I could rewrite the GC so that the stack scan is in a separate subroutine, and then just exclude that using Valgrind's nice suppression mechanism.
> >
> > Yes, although this doesn't get rid of the uninitialized values, which could potentially propagate elsewhere in your code. It just suppresses error _reporting_. You could try using the various MEMPOOL client requests so that Valgrind might also be able to report errors in terms of a particular object's allocation stack trace. But I'm not sure I understand your situation 100%, so there's no guarantee that it will help.
>
> Not using a pool: standard malloc/free.

Got it. Then using the MEMPOOL macros won't help you any.

> > > > The key bit is that Valgrind is marking the whole red zone as undefined at function entrance/exit, so only areas that are actually written during that function are potentially going to be marked as defined.
> > >
> > > Yeah, I see, so actually the red zone is only safe to use within a function as scratch area.
> >
> > Sure, but it makes sense that a conservative collector like yours must scan the whole red zone. You're just doing something unconventional from Valgrind's point of view, so you need to tell it that you Know What You Are Doing.
>
> Well that's interesting. It isn't scanning the whole red zone. Maybe it should!! See above, how I get the low address bound. Still, if the redzone is only active inside a function and never across a function call, there's no need to scan it (since, say, the get_stack_pointer routine is a function call it should invalidate the red-zone, if I understand the comments you posted in your last email).

Right. I was thinking this was an external thread or signal handler examining another stack, in which case you would need to scan the whole red zone.
But if it all happens as a result of an explicit new/allocate call then scanning the red zone shouldn't be necessary.

-Dave
Re: [Valgrind-users] Debugging a GC with valgrind
A few things that might help you here:

1) Build your program with debugging information, which will help you to understand exactly which line is causing a problem in your stack traces.

2) Tracking down uninitialized value warnings is much easier if you use the --track-origins=yes option to Valgrind.

3) I have a pretty limited understanding of Valgrind's handling of stack red zones, but there's a handy comment in memcheck/mc_main.c that sheds some light on the situation:

----8<----
Dealing with stack redzones, and the NIA cache
~~
This is one of the few non-obvious parts of the implementation.

Some ABIs (amd64-ELF, ppc64-ELF, ppc32/64-XCOFF) define a small reserved area below the stack pointer, that can be used as scratch space by compiler generated code for functions. In the Memcheck sources this is referred to as the stack redzone. The important thing here is that such redzones are considered volatile across function calls and returns. So Memcheck takes care to mark them as undefined for each call and return, on the afflicted platforms. Past experience shows this is essential in order to get reliable messages about uninitialised values that come from the stack.
----8<----

The key bit is that Valgrind is marking the whole red zone as undefined at function entrance/exit, so only areas that are actually written during that function are potentially going to be marked as defined.

Given this, you'll probably need to play some games with Valgrind's client request mechanism to temporarily tell valgrind that accesses to the red zone are safe. I'm guessing that the solution would look something like this:

----8<----
#define RZ_SZB (128)
char *sp = /* stack pointer value */;
char vbits[RZ_SZB] = {0};
VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
/* ... scan the red zone here ... */
VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
----8<----

-Dave

On Jan 17, 2011, at 9:03 PM CST, john skaller wrote:

> I have some kind of memory corruption in a C++ program generated by a tool. The program uses my own exact garbage collector which may be the cause of the problem. The size of the data being processed is too big to trace anything by hand .. so I thought I'd try that excellent and magical tool, valgrind.
>
> My problem is basically filtering out the false positives to find the real problem. To repeat, I know for sure I am writing to the wrong place, and that's causing my program to crash. The fault is intermittent in the sense that the exact crash cause and time varies a little bit (for example the buggy program doesn't crash under valgrind :)
>
> There are several possible sources of my bug.
>
> (a) bug in code generator (unlikely)
> (b) bug in library using some hand written C++ (unlikely)
> (c) bug in the gc -- most likely
>
> A GC bug is most likely to be deleting a reachable object. It's unlikely to be an actual *bug* in the code as such, though that's possible since I just found one yesterday and fixed it :)
>
> Let's look at what Valgrind is telling me:
>
> ==21994== Invalid read of size 8
> ==21994==    at 0x100011E94: flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t, std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
> ==21994==    by 0x100012718: flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
> ==21994==    by 0x1000148C8: flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
> ==21994==    by 0x18839: flx::gc::generic::collector_t::collect() (in ./ls)
> ==21994==    by 0x1000142C1: flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
> ==21994==    by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() (in ./ls)
> ==21994==    by 0x10001452E: flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, unsigned long, bool) (in ./ls)
> ==21994==    by 0x10001467A: operator new(unsigned long, flx::gc::generic::gc_profile_t, flx::gc::generic::gc_shape_t, bool) (in ./ls)
> ==21994==    by 0x10D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, flx::rtl::_uctor_) (in ./ls)
> ==21994==    by 0x7FFF5FBFCC3F: ???
> ==21994== Address 0x7fff5fbfc908 is just below the stack ptr. To suppress, use: --workaround-gcc296-bugs=yes
>
> My GC does a conservative scan of the stack. It's possible it looks beyond the top (lowest address) of the stack although this shouldn't happen (I will have to subtract sizeof(void*) from the stack value I calculate to fix this problem). However it is perfectly *legal* to do this on x86_64 platform: the ABI specifies a hot zone and code is free to use a certain number of bytes (256?) on the wrong side of the stack. So technically this is a bug in Valgrind: the read isn't invalid, it's just suspicious.
>
> ==21994== Use of uninitialised value of size 8
> ==21994==    at 0x100084E57: JudyLGet (in ./ls)
> ==21994==    by 0x10006041D:
Re: [Valgrind-users] Debugging a GC with valgrind
On 19/01/2011, at 5:58 AM, Dave Goodell wrote:

> A few things that might help you here:
>
> 1) Build your program with debugging information, which will help you to understand exactly which line is causing a problem in your stack traces.

Done.

> 2) Tracking down uninitialized value warnings is much easier if you use the --track-origins=yes option to Valgrind.

Also done. It told me the function making the original uninit value, but I already knew that anyhow. I needed to know which variable.

It's likely valgrind is being too smart: the code is looking on the current stack for pointers, and it's likely some words were part of a struct with an uninit value. This would be harmless. The values are looked up in a table (actually a JudyArray) to see if they're managed pointers, so it's ok if they're uninitialised values.

My problem is that something is overwriting valid storage, either a bug in my list handling code or the GC deleting reachable objects. I can't tell which. Both pieces of code seem to work at least some of the time. There's never a problem *unless* the GC is called, but that doesn't prove it's the GC. It's possible the GC is deleting an object and a new one is created at the same address (malloc will certainly do this), and then the overwrite is causing a problem. Still .. the code *works* when it doesn't crash.

> 3) I have a pretty limited understanding of Valgrind's handling of stack red zones, but there's a handy comment in memcheck/mc_main.c that sheds some light on the situation:
>
> ----8<----
> Dealing with stack redzones, and the NIA cache
> ~~
> This is one of the few non-obvious parts of the implementation.
>
> Some ABIs (amd64-ELF, ppc64-ELF, ppc32/64-XCOFF) define a small reserved area below the stack pointer, that can be used as scratch space by compiler generated code for functions. In the Memcheck sources this is referred to as the stack redzone. The important thing here is that such redzones are considered volatile across function calls and returns.
> So Memcheck takes care to mark them as undefined for each call and return, on the afflicted platforms. Past experience shows this is essential in order to get reliable messages about uninitialised values that come from the stack.
> ----8<----
>
> The key bit is that Valgrind is marking the whole red zone as undefined at function entrance/exit, so only areas that are actually written during that function are potentially going to be marked as defined.

Yeah, I see, so actually the red zone is only safe to use within a function as scratch area.

--
john skaller
skal...@users.sourceforge.net