Thank you for your swift response. I really appreciate it.

stephane eranian wrote:
> Hello Erik,
>
> On Tue, Jul 15, 2008 at 1:37 PM, Erik Junberger <[EMAIL PROTECTED]> wrote:
>
>> Hello perfmon developers and users.
>>
>> I am currently working on my thesis project, which in short is to
>> integrate sampling capabilities into a commercial Java virtual machine.
>> This is achieved via the perfmon2 kernel interface and libraries. The
>> samples obtained are then used to make various dynamic optimizations.
>> The idea is to use one system-wide context per CPU to measure last-level
>> cache misses using PEBS. On buffer overflow, the PEBS buffer will be
>> read and aggregated in a data structure for further analysis by another
>> thread.
>>
> I just want to warn you that not ALL events support PEBS.
> The LAST_LEVEL_CACHE_MISSES does not support PEBS.
>
> You need to look at the example in libpfm/examples/x86/smpl_core_pebs.c
>
I am aware of this, and I am actually using MEM_LOAD_RETIRED with
Umask-02 or 03, which as far as I can tell corresponds to the same thing
on a Core2?
Umask-02 : 0x04 : [L2_MISS] : Retired loads that miss the L2 cache (precise event)
Umask-03 : 0x08 : [L2_LINE_MISS] : L2 cache line missed by retired loads (precise event)

I have also tried with INST_RETIRED:ANY_P or INSTRUCTIONS_RETIRED, which
are also supported by PEBS.

>> I have managed to implement this by looking at the examples supplied
>> with perfmon. I can create the contexts, program them, bind them to a
>> CPU and start monitoring without any problems. The values I receive
>> however are a little bit strange, and I wonder if anyone has a clue of
>> what might be going on.
>>
> I assume you've looked at examples/x86/smpl_pebs_core.c. The important
> trick is about PFM_REGFL_NO_EMUL64 on the PMC controlling the counter.
>
I set this flag in my code also.

>> If I for instance set the /reg_value, reg_long_reset, reg_short_reset/
>> and /pfm_pebs_core_smpl_arg_t.cnt_reset/ to -10000 (on pmd0), the values
>> are initially set correctly. Every time pmd0 wraps around however, the
>> lower 32 bits of pmd0 will be set to 0, and the upper 32 to 1. This
>> effectively means that I can't choose any other sampling period than 2³².
>> When I read the /pfm_ds_area_core_t.pebs_cnt_reset/ value, the correct
>> reset value is always returned, but this doesn't reflect reality.
>>
> With PEBS, your sampling value can only be 32 bits wide due to the wrmsrl()
> restriction that it can only modify the lower 32 bits. In fact you actually
> have 31 bits, bit 31 being the sign bit.
>
I see. This should be sufficient though.

>> When PMD0 wraps around, no interrupt is generated, and no overflow is
>> registered in /pfm_pebs_core_smpl_hdr_t.overflows/. /pebs_index/ is not
>> incremented either.
>>
> With PEBS, there is no interrupt until the buffer fills up. That's
> the whole idea. Amortize the cost of taking the interrupt over a large
> number of samples.
> But even though the PMC is set not to interrupt, the CPU will catch the
> overflow and micro-code will write a sample in the buffer. You'll only
> get an overflow once the buffer fills up, i.e., when the current
> position = threshold. After a sample is recorded, the micro-code reloads
> the counter with the cnt_reset value. That field is never actually
> modified by HW.
>
Yes, the concept of amortizing the cost of many samples is clear to me.
But I never get any interrupts at all. I have also tried to have a
separate thread do /read(fd, &msg, sizeof(msg))/ during the execution of
the program. This call never returns. Also, the /pebs_index/ value never
changes during execution, indicating that no samples are written to
memory, if I am not mistaken? Shouldn't it also be impossible for pmd0
to reach a value lower than 2^64 - cnt_reset? Or is bit 31 to be
considered a sign bit even in perfmon's virtual 64-bit counters?

>> I have tried a lot of setup combinations in order to get this to work,
>> but nothing has worked. PEBS monitoring on a per-thread basis works
>> fine, so I don't think there is anything wrong with my system. I have
>> tried this with both the 2.6.24 and 2.6.25 kernel versions, with
>> libpfm-3.3 and 3.4 respectively.
>>
> I have tried this using pfmon in system-wide mode:
>
> $ pfmon --smpl-ignore-pids --system-wide --cpu-list=0
> --smpl-module=pebs -einstructions_retired
> --long-smpl-periods=240000000 --pin-command my_test_program
>
> What happens on your system with this?
>
This seems to work fine, as I am at least getting some samples:

# results for CPU0
# total samples : 1
# total buffer overflows : 0
#
## counts   %self    %cum          code addr
      1 100.00% 100.00% 0x00007f576e5fde20

If I reduce --long-smpl-periods I get more samples + overflows.
# results for CPU0
# total samples : 626
# total buffer overflows : 3
#
## counts   %self    %cum          code addr
     57   9.11%   9.11% 0x00007f1a9534be48
     22   3.51%  12.62% 0x00007f1a9534c18c
     21   3.35%  15.97% 0x00007f1a9534be53
     21   3.35%  19.33% 0x00007f1a9534c1ba
-----------------------------------------------------------------------------------------

Best regards:
Erik Junberger

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
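
As a footnote to the sampling-period discussion in this thread: the 31-bit limit can be checked with plain integer arithmetic before a reset value is ever handed to perfmon2. The sketch below is only an illustration under stated assumptions. The helper names (pebs_reset_value, sign_extends_from_32) and the PEBS_MAX_PERIOD bound of 2^31 - 1 are hypothetical and are not part of libpfm or the kernel interface. It turns a period such as 10000 into the negative reset value and tests whether a 64-bit value survives a write that carries only 32 significant bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the usable period is limited to 31 bits, as stated in the
 * thread (bit 31 acts as the sign bit). The exact bound used here,
 * 2^31 - 1, is a conservative guess and is not taken from perfmon2. */
#define PEBS_MAX_PERIOD UINT64_C(0x7FFFFFFF)

/* True if bits 63..31 of v are all zero or all one, i.e. sign-extending
 * the low 32 bits of v reproduces the full 64-bit value. */
static bool sign_extends_from_32(uint64_t v)
{
    uint64_t upper = v >> 31;            /* bits 63..31, 33 bits in total */
    return upper == 0 || upper == UINT64_C(0x1FFFFFFFF);
}

/* Hypothetical helper: convert a sampling period into the two's
 * complement reset value (the bit pattern of -period).
 * Returns false when the period is zero or does not fit in 31 bits. */
static bool pebs_reset_value(uint64_t period, uint64_t *reset)
{
    if (period == 0 || period > PEBS_MAX_PERIOD)
        return false;
    *reset = UINT64_C(0) - period;       /* unsigned wrap-around is well defined */
    assert(sign_extends_from_32(*reset));
    return true;
}
```

With this sketch, a period of 10000 yields 0xFFFFFFFFFFFFD8F0, and the 240000000 used in the pfmon command line is accepted as well. A counter value with the upper 32 bits equal to 1 and the lower 32 bits equal to 0, as described earlier in the thread, fails sign_extends_from_32.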