Nick,

Thanks for the hints. The penalty is expected. I've played with PIN and its pinatrace and MemTrace tools for a while. They work, and the performance is not too bad, though a bit slower than Valgrind.
I need the virtual address of each read/write, so HW counters can't do the job. I did consider Precise Event Based Sampling (PEBS), but PEBS only supports L1/L2 cache load miss events, so it can't do the job either. If HW counters could work, they would be the best solution since they are the most lightweight. My second choice is a binary instrumentation framework like Valgrind.

I still have one question: is the memory trace generated by Valgrind or PIN representative enough to model real program behaviour, considering multi-threading?

Thanks

On Tue, 18 Aug 2009 08:24:47 +1000
Nicholas Nethercote <n.netherc...@gmail.com> wrote:

> On Tue, Aug 18, 2009 at 7:18 AM, Peng Du<imdup...@gmail.com> wrote:
> > Hello, everyone
> >
> > A newbie question, according to Valgrind's manual for the lackey
> > tool:
> >
> > "It (lackey) could be made to run a lot faster by doing a slightly
> > more sophisticated job of the instrumentation ..."
> >
> > Now I need a very simple memory read/write counting tool, just like
> > lackey. But the tool has to be fast.
> >
> > Can anyone elaborate a little on how to make lackey A LOT FASTER? Or
> > has anyone done so? If yes, do you mind sharing the source?
>
> I see two possibilities:
>
> - Currently there is one C function call per original instruction.
> You could batch these up to a degree, eg. count multiple instructions
> with a single C function call. Cachegrind does something like this.
>
> - You could use inline (Vex IR) instrumentation rather than C function
> calls. See VEX/pub/libvex_ir.h for details of Vex IR.
>
> You could also do both of these together. Doing so is left as an
> exercise to the reader.
>
> Even if you do all that, the tool would still have a significant
> slowdown -- the limit case is Nulgrind (--tool=none) which does no
> instrumentation and typically has a 5x slow-down. I don't know if
> that is fast enough for your purposes.
> You could look at Pin or DynamoRIO as alternative instrumentation
> frameworks that are better suited to simple tools such as the one you
> need. Pin in particular may have such a tool included in its
> distribution. Or you could look at hardware program counters, if they
> provide the information you need.
>
> Nick

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
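
For anyone reading the thread later, the batching suggestion in the quoted reply can be sketched as a toy model. This is not Valgrind, lackey, or Cachegrind code: the block layout, the names (BLOCKS, TRACE, Counters, count_per_access, count_per_block), and the helper_calls field standing in for the cost of a C helper call are all made up for illustration. It also assumes every block runs to completion.

```python
# Toy model of "count multiple instructions with a single call".
# NOT Valgrind code; every name below is hypothetical.

# Static program: each block is a list of (kind, address) memory accesses.
BLOCKS = [
    [("load", 0x1000), ("store", 0x1008), ("load", 0x1010)],
    [("load", 0x2000)],
]

# Dynamic execution: the order in which the blocks run.
TRACE = [0, 1, 0, 0, 1]


class Counters:
    def __init__(self):
        self.loads = 0
        self.stores = 0
        self.helper_calls = 0  # stand-in for the cost of a helper call


def count_per_access(blocks, trace, c):
    """One helper call for every executed memory access."""
    for index in trace:
        for kind, _addr in blocks[index]:
            c.helper_calls += 1
            if kind == "load":
                c.loads += 1
            else:
                c.stores += 1


def summarize(block):
    """Computed once per block, at instrumentation time."""
    loads = sum(1 for kind, _addr in block if kind == "load")
    return loads, len(block) - loads


def count_per_block(blocks, trace, c):
    """One helper call per executed block, adding precomputed totals."""
    summaries = [summarize(block) for block in blocks]
    for index in trace:
        loads, stores = summaries[index]
        c.helper_calls += 1
        c.loads += loads
        c.stores += stores


if __name__ == "__main__":
    for counter in (count_per_access, count_per_block):
        c = Counters()
        counter(BLOCKS, TRACE, c)
        print(counter.__name__, c.loads, c.stores, c.helper_calls)
```

Both functions arrive at the same load and store totals, but the per-block version makes one call per executed block instead of one per access. Note that this only helps with counting: the per-block totals are known before the block runs, whereas the addresses needed for a full trace are only known at run time, so a tracing tool still has to record something for each access.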