Karen,

On Mon, Jan 4, 2010 at 10:38 PM, Kristen Walcott <kr...@cs.virginia.edu> wrote:
> I've read that BTS tracing induces ~20% overhead, but when using an
> extremely stripped down version of the  bts_smpl code provided in libpfm4,
> the overhead is much more than this.  On a test run of bzip2, bts_smpl took
> 403 seconds whereas a normal run takes only 16.6 seconds.  In that test, I
> even fully removed the call to process_smpl_buf in the main loop.   I am
> using 2.6.32-rc7 with the most recent libpfm tree. Am I going about
> gathering BTS information the wrong way or is it inherently this slow to
> access?
>
BTS is a debugging mechanism, not a sampling mechanism. I suspect bzip2 is
very branchy code, which exacerbates the overhead of BTS.

The kernel sets up BTS such that it records about 2040 entries before
it interrupts. Then, the entries are copied over to the sampling buffer.
That incurs some overhead.
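
To put rough numbers on this, here is a minimal sketch in C. The function
names are hypothetical helpers, not part of libpfm4 or the kernel; the only
inputs taken from this thread are the two run times and the ~2040 entries
per interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Slowdown factor: traced run time divided by baseline run time. */
double slowdown(double traced_s, double base_s)
{
    assert(base_s > 0.0);
    return traced_s / base_s;
}

/*
 * Number of buffer-full interrupts needed to record `branches` entries
 * when the kernel interrupts every `entries_per_irq` entries (about 2040
 * according to this thread). Rounds up, so a partial buffer counts.
 */
uint64_t bts_interrupts(uint64_t branches, uint64_t entries_per_irq)
{
    assert(entries_per_irq > 0);
    return (branches + entries_per_irq - 1) / entries_per_irq;
}
```

With the figures quoted above, slowdown(403.0, 16.6) comes out at a bit
over 24x, far beyond a ~20% overhead. The branch count passed to
bts_interrupts() would have to be measured separately; no value for it is
given in this thread.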

Have you tried increasing your sampling buffer size, i.e., via mmap()?
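
As a sketch of what that could look like, assuming the sampling buffer is
the perf_events ring buffer, whose mmap() length is one metadata page plus
a power-of-two number of data pages: the helper name perf_mmap_len is
hypothetical and the exponent is just a tuning knob for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes to pass to mmap() for a perf_events buffer:
 * 1 metadata page + 2^data_pages_log2 data pages.
 * (Hypothetical helper; the upper bound of 20 is an arbitrary sanity cap.)
 */
size_t perf_mmap_len(unsigned data_pages_log2, size_t page_size)
{
    assert(page_size > 0);
    assert(data_pages_log2 < 20);
    return ((size_t)1 + ((size_t)1 << data_pages_log2)) * page_size;
}

/*
 * Possible use, with fd being the descriptor returned for the event:
 *
 *   size_t len = perf_mmap_len(7, (size_t)sysconf(_SC_PAGESIZE));
 *   void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 */
```

A larger buffer should not make the BTS interrupts themselves any less
frequent, but it may reduce how often your tool has to wake up and drain
the samples.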

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
