Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-21 Thread WAROQUIERS Philippe
 
 8
 #define RZ_SZB (128)
 char *sp = /* stack pointer value */;
 char vbits[RZ_SZB] = {0};
 VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
 VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
 /* ... scan the red zone here ... */
 VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
 8

I can try that, but really the problem isn't reading 
uninitialised values.

My theory is gc is deleting reachable store: the actual bug is NOT
reading some thing it should not be :)
...
That would be nice but it is unclear. Consider: the only way** 
a GC could cause 
the problem of an over-write would be to delete a reachable object.

** unless there were some stupid bug in the GC, I did have one: chasing
pointers down with recursion .. blows the stack on a long enough list.
...

I understood that this GC is directly calling malloc/free for each
object (so there is no GC pool management).
Then valgrind memcheck is (supposed to be) able to detect that the GC is
freeing a piece of memory which is then dereferenced.

You might need to increase the size of the list of freed blocks that
memcheck keeps aside (rather than recycling them) to detect such bugs,
by using the option
   --freelist-vol=

If the GC is maintaining its own mempool, then the valgrind mempool client
requests need to be used.

If the application is multi-threaded, it might also be a race condition.
You could try helgrind or drd.

Philippe


 

___
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users


Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-21 Thread Dave Goodell
On Jan 20, 2011, at 6:43 PM CST, john skaller wrote:

 
 OK, so I do this now:
...
if(debug)
  fprintf(stderr, "Check if *%p=%p is a pointer\n",i,*(void**)i);
scan_object(*(void**)i, reclimit);
...
 The VALGRIND macro there doesn't seem to be working, I must be
 doing something wrong. I'm trying to just mark the whole stack as defined.
 
 Here's output:
...

The way that I read this output is that your range variable and the data in 
that range are probably defined, because Valgrind isn't flagging i or the value 
at address i when you perform the "Check if" fprintf above.  Rather, it's 
flagging an fprintf inside of scan_object on line 451, whose output you did not 
include.  Maybe it's the reclimit variable?

 ==7159==  Uninitialised value was created by a stack allocation
 ==7159==at 0x1E492: 
 flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
  std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)

It's odd that the stack allocation is attributed to line 337, which AFAICT is 
the line where scan_object is called.  Is there some sort of odd automatic C++ 
temporary allocation happening here that I can't see because some other code 
isn't shown here or (more likely) my C++ is too rusty somehow?  Perhaps under 
optimization an uninitialized and otherwise-unused reclimit variable only gets 
created at the time that the arguments to scan_object are pushed onto the stack?

-Dave




Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-21 Thread john skaller

On 22/01/2011, at 2:30 AM, Dave Goodell wrote:

 On Jan 20, 2011, at 6:43 PM CST, john skaller wrote:
 
 
 OK, so I do this now:
 ...
   if(debug)
 fprintf(stderr, "Check if *%p=%p is a pointer\n",i,*(void**)i);
   scan_object(*(void**)i, reclimit);
 ...
 The VALGRIND macro there doesn't seem to be working, I must be
 doing something wrong. I'm trying to just mark the whole stack as defined.
 
 Here's output:
 ...
 
 The way that I read this output is that your range variable and the data in 
 that range is probably defined because Valgrind isn't flagging i or the value 
 at address i when you perform the "Check if" fprintf above.  

Good point..

 Rather, it's flagging an fprintf inside of scan_object on line 451, whose 
 output you did not include.  Maybe it's the reclimit variable?

I shouldn't think so, though it's hard to be sure of anything.

scan_object looks to see if the pointer value it gets is actually a pointer
into the heap. If so, it looks at the pointers inside the pointed-at object
(recursively). I know where the pointers are because I know the type of every
heap object.

It doesn't chase down the pointer unless it's an actual pointer into the heap:
it doesn't chase ints, raw C pointers (not Felix heap allocated), or pointers
into the stack or static storage.

In this program there is only ONE data structure on the heap: list nodes.
Which contain exactly one pointer (to the next node). There are no list
nodes on the stack.

The pointers used for lists are actually tagged pointers:

struct _uctor_ { int variant; void *data; };
variant = 0 means end of list, and 1 means a node with data in it is being
pointed at.
[So the last node has a value in it and a pointer with variant 0 and
data=NULL.]

It is the variant which is sometimes overwritten with a value like 99762976.
It should only be 0 or 1. My code has switches on the variant, which include
a wildcard for values other than 0 or 1; that case prints a "match failure"
diagnostic and then aborts the program.
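
As an illustration only, the tagged-pointer list described above could be walked like this in plain C. The struct mirrors the quoted _uctor_ layout (renamed uctor here); the node type, the enum names and list_length are hypothetical and are not taken from the Felix runtime. The default branch plays the role of the wildcard case that reports a match failure.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the quoted _uctor_: a tag plus a payload pointer. */
struct uctor { int variant; void *data; };

/* Hypothetical list node: one value and one tagged "next" pointer. */
struct node { long value; struct uctor next; };

enum { LIST_END = 0, LIST_NODE = 1 };

/* Walk the list and store its length in *len.
   Returns 0 on success, -1 if a tag other than 0 or 1 is found
   (the "match failure" case, i.e. the tag was overwritten). */
static int list_length(struct uctor head, size_t *len) {
    size_t n = 0;
    struct uctor cur = head;
    assert(len != NULL);
    for (;;) {
        switch (cur.variant) {
        case LIST_END:
            *len = n;
            return 0;
        case LIST_NODE:
            n++;
            cur = ((struct node *)cur.data)->next;
            break;              /* leaves the switch, loop continues */
        default:
            return -1;          /* corrupted variant */
        }
    }
}
```

Checking the tag on every step, rather than only where the list is consumed, narrows down when the overwrite happened relative to a collection.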

 
 ==7159==  Uninitialised value was created by a stack allocation
 ==7159==at 0x1E492: 
 flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
  std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
 
 It's odd that the stack allocation is attributed to line 337, which AFAICT is 
 the line where scan_object is called.  Is there some sort of odd automatic 
 C++ temporary allocation happening here that I can't see because some other 
 code isn't shown here or (more likely) my C++ is too rusty somehow?  

I don't think so.

 Perhaps under optimization an uninitialized and otherwise-unused reclimit 
 variable only gets created at the time that the arguments to scan_object are 
 pushed onto the stack?


Well that's possible, but I'm doing a debug build, I *think* that doesn't do 
C++ optimisation
(normally I run with -O3 on gcc).

The nasty thing is now it is working, some quirk has changed something.. :)


--
john skaller
skal...@users.sourceforge.net







Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-20 Thread Dave Goodell
On Jan 20, 2011, at 4:35 AM CST, john skaller wrote:

 On 20/01/2011, at 3:04 AM, Dave Goodell wrote:
 
 On Jan 19, 2011, at 9:52 AM CST, john skaller wrote:
 
 On 20/01/2011, at 2:39 AM, Dave Goodell wrote:
 
 On Jan 18, 2011, at 10:56 PM CST, john skaller wrote:
 
 I could rewrite the GC so that the stack scan is in a separate subroutine,
 and then just exclude that using Valgrind's nice suppression mechanism.
 
 Yes, although this doesn't get rid of the uninitialized values, which could 
 potentially propagate elsewhere in your code.  It just suppresses error 
 _reporting_
 
 Well there's no way to get rid of these uninitialised values. Most are in 
 fact initialised.
 The problem is something like: when a subroutine is called the return address 
 is pushed
 on the stack, along with callee-save registers.
 
 I suspect Valgrind thinks these are uninitialised values.

Valgrind tracks the V bits (validity bits) for registers too, so the validity 
of any pushed register values on the stack will depend on the validity of the 
register contents before the push.

 In other cases: consider, I have a data structure:
 
 struct { int x; long y; } a;
 
 on the stack with say 4 bytes padding after 'x'. When I read the 8 byte word
 at address a half is uninitialised. I'm not sure what Valgrind would say 
 here:
 most of my uninitialised values are Value8.
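
To make the padding case concrete, here is a small sketch in C. The names struct padded, padding_after_x and padded_init are made up for the illustration. It measures the gap between x and y on the current ABI and shows one way to give the padding bytes a defined value before a word-sized scan reads them. In practice member stores leave the padding alone, although the C standard does not promise that.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct padded { int x; long y; };

/* Number of padding bytes between x and y on this ABI
   (4 on typical LP64 targets, 0 where int and long have the same size). */
static size_t padding_after_x(void) {
    return offsetof(struct padded, y)
         - (offsetof(struct padded, x) + sizeof(int));
}

/* Write every byte of the object, padding included, before setting fields,
   so that a later word-sized read sees only initialised bytes. */
static void padded_init(struct padded *p, int x, long y) {
    assert(p != NULL);
    memset(p, 0, sizeof *p);
    p->x = x;
    p->y = y;
}
```

Without the memset, an 8-byte read at the address of the struct on LP64 would cover four written bytes and four never-written ones, which is the situation described above.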

There is a way to get rid of the uninitialized values, one that I posted 
earlier in the thread.  Try reading over the manual's explanation of how 
Valgrind tracks defined/undefined data: 
http://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine

And then the section on client requests: 
http://valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.clientreq

If you follow that, then I think the best approach is to adapt the client 
request code sketch that I posted before for scanning the whole stack:

8
#define RZ_SZB (128)
char *sp = /* stack pointer value */;
char vbits[RZ_SZB] = {0};
VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
/* ... scan the red zone here ... */
VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
8

 Right.  I was thinking this was an external thread or signal handler 
 examining another stack, in which case you would need to scan the whole red 
 zone.  But if it all happens as a result of an explicit new/allocate call 
 then scanning the red zone shouldn't be necessary.
 
 Yeah. I'm not sure what happens on x86_64 Unix (OSX, Linux) with signals: I 
 have a feeling
 they do not use the applications stack? or do they bump the RSP by the size 
 of the redzone
 before calling the handler?

I don't know for sure, although I'm certain there are several people on this 
list who could tell you about the mechanism in excruciating detail if they feel 
like it.  Signals can definitely be handled on a separate signal stack if 
SA_ONSTACK is passed to sigaction.
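
For the separate-signal-stack case, a minimal POSIX sketch follows. The function and variable names are hypothetical, and the 64 KiB stack size is an arbitrary choice for the example. It installs an alternate stack with sigaltstack, registers a handler with SA_ONSTACK, raises the signal, and reports whether the handler's frame landed inside the alternate stack.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose sigaltstack/sigaction under strict -std= modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static char *alt_base;                         /* start of the alternate stack */
static size_t alt_size;
static volatile sig_atomic_t ran_on_alt = -1;  /* set by the handler */

static void on_usr1(int signo) {
    char probe;                                /* lives in the handler's own frame */
    uintptr_t here = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)signo;
    ran_on_alt = (here >= lo && here < lo + alt_size);
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it did not,
   and -1 if any of the setup calls failed. */
static int handler_runs_on_alt_stack(void) {
    stack_t ss;
    struct sigaction sa, old;
    int result = -1;

    alt_size = 64 * 1024;
    assert(alt_size >= (size_t)MINSIGSTKSZ);
    alt_base = malloc(alt_size);
    if (alt_base == NULL) return -1;

    memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_base;
    ss.ss_size = alt_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) { free(alt_base); return -1; }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;                  /* deliver on the alternate stack */
    if (sigaction(SIGUSR1, &sa, &old) == 0) {
        if (raise(SIGUSR1) == 0) result = ran_on_alt;
        sigaction(SIGUSR1, &old, NULL);        /* restore the previous disposition */
    }

    ss.ss_flags = SS_DISABLE;                  /* stop using the stack before freeing it */
    sigaltstack(&ss, NULL);
    free(alt_base);
    return result;
}
```

Without SA_ONSTACK the handler runs on the interrupted thread's own stack, so this sketch does not answer the question of how the kernel treats the red zone in that case.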

 Anyhow, I've got some other test code exhibiting problems sometimes and not 
 others,
 and I'm no closer to a solution. Sometime code works, sometimes it segfaults,
 sometimes it just overwrites the wrong place and I trap the problem and 
 report it.
 The behaviour is always the same for the same program, data, and GC tuning 
 parameters: the fault is unpredictable but thankfully not intermittent.
 
 Unfortunately running Valgrind is one of the things the bug is sensitive to,
 it runs and hides when running Valgrind :)

That happens sometimes, especially if there are threads in your program because 
Valgrind changes the way that threads are scheduled.  Valgrind also replaces 
the standard malloc and some other routines, which can change the way that they 
behave.  Hopefully if you can clean up the false positives from your 
stack-scanning code then you will be able to find some bugs from the remaining 
warnings.

-Dave




Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-20 Thread john skaller

On 21/01/2011, at 2:55 AM, Dave Goodell wrote:
 
 Valgrind tracks the V bits (validity bits) for registers too, so the validity 
 of any pushed register values on the stack will depend on the validity of the 
 register contents before the push.

OK thx .. I haven't looked at the VEX machine.

 If you follow that, then I think the best approach is to adapt the client 
 request code sketch that I posted before for scanning the whole stack:

BTW: it says in the docs that #including the valgrind header has a tiny
overhead which won't be noticed except perhaps in inner loops. I have not yet
looked at this in detail.
Is there really a cost (if you don't use it)?


 8
 #define RZ_SZB (128)
 char *sp = /* stack pointer value */;
 char vbits[RZ_SZB] = {0};
 VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
 VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
 /* ... scan the red zone here ... */
 VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
 8

I can try that, but really the problem isn't reading uninitialised values.

My theory is the gc is deleting reachable store: the actual bug is NOT
reading something it should not be :)


 
 Unfortunately running Valgrind is one of the things the bug is sensitive to,
 it runs and hides when running Valgrind :)
 
 That happens sometimes, especially if there are threads in your program 
 because Valgrind changes the way that threads are scheduled.  

No threads, at least none I know of. 
The actual program is just ls regexp where regexp is a Google Re2 (perl) 
regexp.
The code actually collects all the files in a list before printing them, 
basically to test
the subroutine that collects all the files in a list, since that's going into 
my library.

 Valgrind also replaces the standard malloc and some other routines, which can 
 change the way that they behave.  Hopefully if you can clean up the false 
 positives from your stack-scanning code then you will be able to find some 
 bugs from the remaining warnings.


That would be nice but it is unclear. Consider: the only way** a GC could cause 
the problem of an over-write would be to delete a reachable object.

** unless there were some stupid bug in the GC, I did have one: chasing
pointers down with recursion .. blows the stack on a long enough list.
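
The recursion problem mentioned in the footnote can be avoided with an explicit worklist. The sketch below is a generic illustration in C, not the Felix collector: obj_t, its two child slots and mark_iterative are hypothetical. The pending objects live in a heap-allocated array that grows as needed, so the depth of the object graph no longer translates into machine stack depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap object: a mark bit and up to two outgoing pointers. */
typedef struct obj {
    bool marked;
    struct obj *child[2];
} obj_t;

/* Mark everything reachable from root without recursion.
   Returns the number of objects newly marked, or (size_t)-1 if the
   worklist could not be allocated or grown. */
static size_t mark_iterative(obj_t *root) {
    size_t cap = 16, top = 0, count = 0;
    obj_t **work = malloc(cap * sizeof *work);
    if (work == NULL) return (size_t)-1;
    if (root != NULL) work[top++] = root;

    while (top > 0) {
        obj_t *o = work[--top];
        if (o->marked) continue;            /* already visited via another path */
        o->marked = true;
        count++;
        for (int i = 0; i < 2; i++) {
            obj_t *c = o->child[i];
            if (c == NULL || c->marked) continue;
            if (top == cap) {               /* grow the worklist, not the call stack */
                obj_t **grown = realloc(work, 2 * cap * sizeof *work);
                if (grown == NULL) { free(work); return (size_t)-1; }
                work = grown;
                cap *= 2;
            }
            assert(top < cap);
            work[top++] = c;
        }
    }
    free(work);
    return count;
}
```

For a singly linked list the worklist never holds more than one entry, so a list of a million nodes needs no more memory than a short one.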

--
john skaller
skal...@users.sourceforge.net







Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-20 Thread Dave Goodell
On Jan 20, 2011, at 6:07 PM CST, john skaller wrote:

 Hmmm ..ok did some client request stuff .. looks like I found a bug in OSX!
 
 First, what's this?
 
 ==7005== Command: tools/flx_ls
 ==7005== 
 --7005-- warning: addVar: in range 0xcb2 .. 0xcd7 outside segment 0x1 
 .. 0x1000eefff (top)
 --7005-- warning: addVar: in range 0xcf4 .. 0xd88 outside segment 0x1 
 .. 0x1000eefff (top)
 --7005-- warning: addVar: in range 0xd89 .. 0xde1 outside segment 0x1 
 .. 0x1000eefff (top)
 ... [more] ...

Dunno.  Someone else probably does.

 Here's the bug? in OSX:
 
 ==7005== Conditional jump or move depends on uninitialised value(s)
 ==7005==at 0x10031BC28: pthread_rwlock_init (in 
 /usr/lib/libSystem.B.dylib)
 ==7005==by 0x1161F: re2::Mutex::Mutex() (mutex.h:108)
 ==7005==by 0x100031605: re2::RE2::Init(re2::StringPiece const, 
 re2::RE2::Options const) (re2.cc:147)
 ==7005==by 0x100031FEF: re2::RE2::RE2(std::string const) (re2.cc:98)
 ==7005==by 0x1767C: flxusr::flx_ls::_init_::resume() (in tools/flx_ls)
 ==7005==  Uninitialised value was created by a heap allocation
 ==7005==at 0x1001CA374: operator new(unsigned long) 
 (vg_replace_malloc.c:261)
 ==7005==by 0x1000315F2: re2::RE2::Init(re2::StringPiece const, 
 re2::RE2::Options const) (re2.cc:147)
 ==7005==by 0x100031FEF: re2::RE2::RE2(std::string const) (re2.cc:98)
 ==7005==by 0x1767C: flxusr::flx_ls::_init_::resume() (in tools/flx_ls)
 ==7005== 
 
 A rwlock (BSD) is an int. It is not initialised by re2. AFAICS it shouldn't 
 have to be
 either, since that's what pthread_rwlock_init is for, but that routine appears
 to be doing a conditional jump on it.

Looks like a regression or variant of 
https://bugs.kde.org/show_bug.cgi?id=196528.

-Dave




Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-20 Thread john skaller

OK, so I do this now:

  pthread::memory_range_t range = *i;  // line 325; this is the mark routine, marks reachable objects
  if(debug)
  {
    unsigned long n = (char*)range.e - (char*)range.b;
    fprintf(stderr, "Conservate scan of memory %p-%p, %lu bytes\n", range.b, range.e, n);
  }
  VALGRIND_MAKE_MEM_DEFINED(range.b, (char*)range.e-(char*)range.b);
  void *end = range.e;
  for ( void *i = range.b; i != end; i = (void*)((void**)i+1))
  {
    if(debug)
      fprintf(stderr, "Check if *%p=%p is a pointer\n", i, *(void**)i);
    scan_object(*(void**)i, reclimit);
  }
  if(debug)
    fprintf(stderr, "DONE: Conservate scan of memory %p-%p\n", range.b, range.e);

The VALGRIND macro there doesn't seem to be working, I must be
doing something wrong. I'm trying to just mark the whole stack as defined.

Here's output:


Actually collect
Request to collect, thread 1004f6be0
Thread 1004f6be0 Stopping world, active threads=1
World stop thread=1004f6be0, stack=0x7fff5fbff128!
Stack size = 1648
World STOPPED
Collecting, thread 1004f6be0
Collector: Running mark
Conservate scan of memory 0x7fff5fbff128-0x7fff5fbff798, 1648 bytes
Check if *0x7fff5fbff128=0x18 is a pointer
==7159== Conditional jump or move depends on uninitialised value(s)
==7159==at 0x10032E579: __vfprintf (in /usr/lib/libSystem.B.dylib)
==7159==by 0x10032C06E: __vfprintf (in /usr/lib/libSystem.B.dylib)
==7159==by 0x10036F07A: vfprintf_l (in /usr/lib/libSystem.B.dylib)
==7159==by 0x10036EFFD: fprintf (in /usr/lib/libSystem.B.dylib)
==7159==by 0x1DCFA: 
flx::gc::collector::flx_collector_t::scan_object(void*, int) 
(flx_collector.cpp:451)
==7159==by 0x1E4A7: 
flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
 std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() 
(flx_collector.cpp:535)
==7159==by 0x1000111A6: flx::gc::collector::flx_ts_collector_t::v_collect() 
(flx_ts_collector.cpp:21)
==7159==by 0x14D23: flx::gc::generic::collector_t::collect() 
(flx_gc.hpp:108)
==7159==by 0x100010B9F: flx::gc::generic::gc_profile_t::actually_collect() 
(flx_gc.cpp:59)
==7159==by 0x100010DBF: flx::gc::generic::gc_profile_t::maybe_collect() 
(flx_gc.cpp:53)
==7159==by 0x100010E0C: 
flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, 
unsigned long, bool) (flx_gc.cpp:81)
==7159==  Uninitialised value was created by a stack allocation
==7159==at 0x1E492: 
flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
 std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)

.. more ..

Scan object 0x18, reachable bit value = 1
==7159== Use of uninitialised value of size 8
==7159==at 0x100081737: JudyLGet (JudyLGet.c:327)
==7159==by 0x10005CCFD: JudyLLast (JudyLFirst.c:118)
==7159==by 0x1DD2F: 
flx::gc::collector::flx_collector_t::scan_object(void*, int) 
(flx_collector.cpp:454)
==7159==by 0x1E4A7: 
flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
 std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() 
(flx_collector.cpp:535)
==7159==by 0x1000111A6: flx::gc::collector::flx_ts_collector_t::v_collect() 
(flx_ts_collector.cpp:21)
==7159==by 0x14D23: flx::gc::generic::collector_t::collect() 
(flx_gc.hpp:108)
==7159==by 0x100010B9F: flx::gc::generic::gc_profile_t::actually_collect() 
(flx_gc.cpp:59)
==7159==by 0x100010DBF: flx::gc::generic::gc_profile_t::maybe_collect() 
(flx_gc.cpp:53)
==7159==by 0x100010E0C: 
flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, 
unsigned long, bool) (flx_gc.cpp:81)
==7159==by 0x100010F58: operator new(unsigned long, 
flx::gc::generic::gc_profile_t, flx::gc::generic::gc_shape_t, bool) 
(flx_gc.cpp:117)
==7159==by 0x112D5: flxusr::lr::rev(flxusr::lr::thread_frame_t*, 
flx::rtl::_uctor_) (in ./lr)
==7159==  Uninitialised value was created by a stack allocation
==7159==at 0x1E492: 
flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
 std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159== 
==7159== Use of uninitialised value of size 8
==7159==at 0x100080E0D: JudyLGet (JudyLGet.c:125)
==7159==by 0x10005CCFD: JudyLLast (JudyLFirst.c:118)
==7159==by 0x1DD2F: 
flx::gc::collector::flx_collector_t::scan_object(void*, int) 
(flx_collector.cpp:454)
==7159==by 0x1E4A7: 
flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
 std::allocator<flx::pthread::memory_range_t> >*) (flx_collector.cpp:337)
==7159==by 0x1EFF6: flx::gc::collector::flx_collector_t::impl_collect() 
(flx_collector.cpp:535)
==7159==by 0x1000111A6: 

Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-19 Thread Dave Goodell
On Jan 19, 2011, at 9:52 AM CST, john skaller wrote:

 On 20/01/2011, at 2:39 AM, Dave Goodell wrote:
 
 On Jan 18, 2011, at 10:56 PM CST, john skaller wrote:

 I could rewrite the GC so that the stack scan is in a separate subroutine,
 and then just exclude that using Valgrind's nice suppression mechanism.

Yes, although this doesn't get rid of the uninitialized values, which could 
potentially propagate elsewhere in your code.  It just suppresses error 
_reporting_.

 You could try using the various MEMPOOL client requests so that Valgrind 
 might also be able to report errors in terms of a particular object's 
 allocation stack trace.  But I'm not sure I understand your situation 100%, 
 so there's no guarantee that it will help.
 
 Not using a pool: standard malloc/free.

Got it.  Then using the MEMPOOL macros won't help you any.

 The key bit is that Valgrind is marking the whole red zone as undefined 
 at function entrance/exit, so only areas that are actually written during 
 that function are potentially going to be marked as defined.  
 
 Yeah, I see, so actually the red zone is only safe to use within a 
 function as scratch area.
 
 Sure, but it makes sense that a conservative collector like yours must scan 
 the whole red zone. You're just doing something unconventional from 
 Valgrind's point of view, so you need to tell it that you Know What You Are 
 Doing.
 
 Well that's interesting. It isn't scanning the whole red zone. Maybe it 
 should!!
 See above, how I get the low address bound. Still, if the redzone is only 
 active inside a
 function and never across a function call, there's no need to scan it (since, 
 say,
 the get_stack_pointer routine is a function call it should invalidate the 
 red-zone,
 if I understand the comments you posted in your last email).

Right.  I was thinking this was an external thread or signal handler examining 
another stack, in which case you would need to scan the whole red zone.  But if 
it all happens as a result of an explicit new/allocate call then scanning the 
red zone shouldn't be necessary.

-Dave




Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-18 Thread Dave Goodell
A few things that might help you here:

1) Build your program with debugging information, which will help you to 
understand exactly which line is causing a problem in your stack traces.

2) Tracking down uninitialized value warnings is much easier if you use the 
--track-origins=yes option to Valgrind.

3) I have a pretty limited understanding of Valgrind's handling of stack red 
zones, but there's a handy comment in memcheck/mc_main.c that sheds some light 
on the situation:

8
   Dealing with stack redzones, and the NIA cache
   ~~

   This is one of the few non-obvious parts of the implementation.

   Some ABIs (amd64-ELF, ppc64-ELF, ppc32/64-XCOFF) define a small
   reserved area below the stack pointer, that can be used as scratch
   space by compiler generated code for functions.  In the Memcheck
   sources this is referred to as the stack redzone.  The important
   thing here is that such redzones are considered volatile across
   function calls and returns.  So Memcheck takes care to mark them as
   undefined for each call and return, on the afflicted platforms.
   Past experience shows this is essential in order to get reliable
   messages about uninitialised values that come from the stack.
8

The key bit is that Valgrind is marking the whole red zone as undefined at 
function entrance/exit, so only areas that are actually written during that 
function are potentially going to be marked as defined.  Given this, you'll 
probably need to play some games with Valgrind's client request mechanism to 
temporarily tell valgrind that accesses to the red zone are safe.  I'm guessing 
that the solution would look something like this:

8
#define RZ_SZB (128)
char *sp = /* stack pointer value */;
char vbits[RZ_SZB] = {0};
VALGRIND_GET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
VALGRIND_MAKE_MEM_DEFINED(sp-RZ_SZB, RZ_SZB);
/* ... scan the red zone here ... */
VALGRIND_SET_VBITS(sp-RZ_SZB, vbits, RZ_SZB);
8
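
A slightly fuller version of the same idea, written as a sketch for an arbitrary word-aligned range rather than the red zone specifically. The helper names scan_words_as_defined and count_nonnull are hypothetical. Only VALGRIND_GET_VBITS, VALGRIND_MAKE_MEM_DEFINED and VALGRIND_SET_VBITS come from memcheck.h, and the no-op fallbacks are an assumption added so that the file still builds where the Valgrind headers are not installed. The return values of the client requests are ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif
#ifndef HAVE_MEMCHECK_H
/* No-op fallbacks so the sketch builds without the Valgrind headers. */
#  define VALGRIND_GET_VBITS(addr, vbits, n) ((void)(addr), (void)(vbits), (void)(n), 0)
#  define VALGRIND_SET_VBITS(addr, vbits, n) ((void)(addr), (void)(vbits), (void)(n), 0)
#  define VALGRIND_MAKE_MEM_DEFINED(addr, n) ((void)(addr), (void)(n), 0)
#endif

typedef void (*word_visitor)(void *candidate, void *ctx);

/* Scan [lo, hi) one pointer-sized word at a time.
   The V bits of the range are saved, the range is marked defined for the
   duration of the scan, and the saved V bits are put back afterwards, so
   Memcheck's view of the memory is unchanged once the function returns.
   Returns 0 on success, -1 if the V-bit buffer cannot be allocated. */
static int scan_words_as_defined(void *lo, void *hi,
                                 word_visitor visit, void *ctx) {
    size_t nbytes = (size_t)((char *)hi - (char *)lo);
    unsigned char *vbits;
    assert(nbytes % sizeof(void *) == 0);

    vbits = malloc(nbytes ? nbytes : 1);
    if (vbits == NULL) return -1;

    (void)VALGRIND_GET_VBITS(lo, vbits, nbytes);     /* save */
    (void)VALGRIND_MAKE_MEM_DEFINED(lo, nbytes);     /* silence the scan */
    for (void **p = lo; (void *)p != hi; p++)
        visit(*p, ctx);                              /* candidate is now "defined" */
    (void)VALGRIND_SET_VBITS(lo, vbits, nbytes);     /* restore */

    free(vbits);
    return 0;
}

/* Example visitor: counts the non-null words it is shown. */
static void count_nonnull(void *candidate, void *ctx) {
    if (candidate != NULL)
        ++*(size_t *)ctx;
}
```

Because each word is loaded while the range is marked defined, the value handed to the visitor carries defined V bits, so later lookups on it should not be reported as uses of uninitialised values.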

-Dave

On Jan 17, 2011, at 9:03 PM CST, john skaller wrote:

 I have some kind of memory corruption in a C++ program generated by a tool.
 The program uses my own exact garbage collector which may be the cause
 of the problem. The size of the data being processed is too big to trace
 anything by hand .. so I thought I'd try that excellent and magical tool, 
 valgrind.
 
 My problem is basically filtering out the false positives to find the real
 problem. To repeat, I know for sure I am writing to the wrong place, and 
 that's
 causing my program to crash. The fault is intermittent in the sense that the
 exact crash cause and time varies a little bit (for example the buggy program
 doesn't crash under valgrind :)
 
 There are several possible sources of my bug. 
 
 (a) bug in code generator (unlikely)
 (b) bug in library using some hand written C++ (unlikely)
 (c) bug in the gc -- most likely
 
 A GC bug is most likely to be deleting a reachable object. It's unlikely
 to be an actual *bug* in the code as such, though that's possible
 since I just found one yesterday and fixed it :)
 
 Let's look at what Valgrind is telling me:
 
 ==21994== Invalid read of size 8
 ==21994==at 0x100011E94: 
 flx::gc::collector::flx_collector_t::mark(std::vector<flx::pthread::memory_range_t,
  std::allocator<flx::pthread::memory_range_t> >*) (in ./ls)
 ==21994==by 0x100012718: 
 flx::gc::collector::flx_collector_t::impl_collect() (in ./ls)
 ==21994==by 0x1000148C8: 
 flx::gc::collector::flx_ts_collector_t::v_collect() (in ./ls)
 ==21994==by 0x18839: flx::gc::generic::collector_t::collect() (in 
 ./ls)
 ==21994==by 0x1000142C1: 
 flx::gc::generic::gc_profile_t::actually_collect() (in ./ls)
 ==21994==by 0x1000144E1: flx::gc::generic::gc_profile_t::maybe_collect() 
 (in ./ls)
 ==21994==by 0x10001452E: 
 flx::gc::generic::gc_profile_t::allocate(flx::gc::generic::gc_shape_t*, 
 unsigned long, bool) (in ./ls)
 ==21994==by 0x10001467A: operator new(unsigned long, 
 flx::gc::generic::gc_profile_t, flx::gc::generic::gc_shape_t, bool) (in 
 ./ls)
 ==21994==by 0x10D51: flxusr::ls::rev(flxusr::ls::thread_frame_t*, 
 flx::rtl::_uctor_) (in ./ls)
 ==21994==by 0x7FFF5FBFCC3F: ???
 ==21994==  Address 0x7fff5fbfc908 is just below the stack ptr.  To suppress, 
 use: --workaround-gcc296-bugs=yes
 
 My GC does a conservative scan of the stack. It's possible it looks beyond
 the top (lowest address) of the stack, although this shouldn't happen (I will
 have to subtract sizeof(void*) from the stack value I calculate to fix this
 problem). However it is perfectly *legal* to do this on the x86_64 platform:
 the ABI specifies a red zone and code is free to use a certain number of
 bytes (256?) on the wrong side of the stack. So technically this is a bug in
 Valgrind: the read isn't invalid, it's just suspicious.
 
 ==21994== Use of uninitialised value of size 8
 ==21994==at 0x100084E57: JudyLGet (in ./ls)
 ==21994==by 0x10006041D: 

Re: [Valgrind-users] Debugging a GC with valgrind

2011-01-18 Thread john skaller

On 19/01/2011, at 5:58 AM, Dave Goodell wrote:

 A few things that might help you here:
 
 1) Build your program with debugging information, which will help you to 
 understand exactly which line is causing a problem in your stack traces.

Done.

 2) Tracking down uninitialized value warnings is much easier if you use the 
 --track-origins=yes option to Valgrind.

Also done. Told me the function making the original uninit value, but I 
already
knew that anyhow. I needed to know which variable.

It's likely valgrind is being too smart: the code is looking on the current 
stack
for pointers, it's likely some words were part of a struct with an uninit value,
this would be harmless. The values are looked up in a table (actually
a JudyArray) to see if they're managed pointers. So it's ok if they're
uninitialised values.

My problem is that something is overwriting valid storage,
either a bug in my list handling code or the GC deleting
reachable objects. I can't tell which. 

Both pieces of code seem to work at least some of the time.
There's never a problem *unless* the GC is called, but that doesn't
prove it's the GC; it's possible the GC is deleting an object and a new
one is created at the same address (malloc will certainly do this),
and then the overwrite is causing a problem.

Still .. the code *works* when it doesn't crash. 


 3) I have a pretty limited understanding of Valgrind's handling of stack red 
 zones, but there's a handy comment in memcheck/mc_main.c that sheds some 
 light on the situation:
 
 8
   Dealing with stack redzones, and the NIA cache
   ~~
 
   This is one of the few non-obvious parts of the implementation.
 
   Some ABIs (amd64-ELF, ppc64-ELF, ppc32/64-XCOFF) define a small
   reserved area below the stack pointer, that can be used as scratch
   space by compiler generated code for functions.  In the Memcheck
   sources this is referred to as the stack redzone.  The important
   thing here is that such redzones are considered volatile across
   function calls and returns.  So Memcheck takes care to mark them as
   undefined for each call and return, on the afflicted platforms.
   Past experience shows this is essential in order to get reliable
   messages about uninitialised values that come from the stack.
 8
 
 The key bit is that Valgrind is marking the whole red zone as undefined at 
 function entrance/exit, so only areas that are actually written during that 
 function are potentially going to be marked as defined.  

Yeah, I see, so actually the red zone is only safe to use within a function 
as scratch area.


--
john skaller
skal...@users.sourceforge.net




