Thank you for your swift response. I really appreciate it.

stephane eranian wrote:
> Hello Erik,
>
>
> On Tue, Jul 15, 2008 at 1:37 PM, Erik Junberger <[EMAIL PROTECTED]> wrote:
>   
>> Hello perfmon developers and users.
>>
>> I am currently working on my thesis project, which in short is to
>> integrate sampling capabilities into a commercial Java virtual machine.
>> This is achieved via the perfmon2 kernel interface and libraries. The
>> samples obtained are then used to make various dynamic optimizations.
>> The idea is to use one system-wide context per CPU to measure last-level
>> cache misses using PEBS. On buffer overflow, the PEBS buffer will be
>> read and aggregated in a data structure for further analysis by another
>> thread.
>>
>>     
> I just want to warn you that not ALL events support PEBS.
> The LAST_LEVEL_CACHE_MISSES does not support PEBS.
>
> You need to look at the example in libpfm/examples/x86/smpl_core_pebs.c
>   
I am aware of this. I am actually using MEM_LOAD_RETIRED with Umask-02 or
Umask-03, which, as far as I can tell, both correspond to last-level cache
misses on a Core 2?

Umask-02 : 0x04 : [L2_MISS] : Retired loads that miss the L2 cache (precise event)
Umask-03 : 0x08 : [L2_LINE_MISS] : L2 cache line missed by retired loads (precise event)

I have also tried INST_RETIRED:ANY_P and INSTRUCTIONS_RETIRED, both of which
also support PEBS.

>
>   
>> I have managed to implement this by looking at the examples supplied
>> with  perfmon. I can create the contexts, program them, bind them to a
>> CPU and start monitoring without any problems. The values I receive
>> however are a little bit strange, and I wonder if anyone has a clue of
>> what might be going on.
>>
>>     
> I assume you've looked at examples/x86/smpl_pebs_core.c. The important
> trick is about PFM_REGFL_NO_EMUL64 on the PMC controlling the counter.
>
>   
I set this flag in my code also.

>> If I for instance set the /reg_value, reg_long_reset, reg_short_reset/
>> and /pfm_pebs_core_smpl_arg_t.cnt_reset/ to -10000 (on pmd0), the values
>> are initially set correctly. Every time pmd0 wraps around however, the
>> lower 32-bits of pmd0 will be set to 0, and the upper 32 to 1. This
>> effectively means that I can't choose any other sampling period than 2³².
>> When I read the /pfm_ds_area_core_t.pebs_cnt_reset/ value, the correct
>> reset value is always returned, but this doesn't reflect reality.
>>
>>     
> With PEBS, your sampling value can only be 32 bits wide due to the wrmsrl()
> restriction that it can only modify the lower 32 bits. In fact, you actually
> have 31 bits, bit 31 being the sign bit.
>   
I see. 31 bits should be sufficient for my purposes, though.

>
>   
>> When PMD0 wraps around, no interrupt is generated, and no overflow is
>> registered in /pfm_pebs_core_smpl_hdr_t.overflows/. /pebs_index/ is not
>> incremented either.
>>
>>     
> With PEBS, there is no interrupt until the buffer fills up. That's
> the whole idea: amortize the cost of taking the interrupt over a large
> number of samples. But even though the PMC is set not to interrupt, the
> CPU will catch the overflow and micro-code will write a sample in the
> buffer. You'll only get an overflow once the buffer fills up, i.e., when
> the current position = threshold. After a sample is recorded, the
> micro-code reloads the counter with the cnt_reset value. That field is
> never actually modified by HW.
>
>
>   
Yes, the idea of amortizing the interrupt cost over many samples is clear to me.
But I never get any interrupts at all.
I have also tried having a separate thread call /read(fd, &msg, sizeof(msg))/
while the program runs. This call never returns.
Also, the /pebs_index/ value never changes during execution, which indicates
that no samples are written to memory, if I am not mistaken?
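
For debugging, a timed wait would at least let the reader thread report that
nothing arrived instead of hanging forever. The following is only a sketch:
read_with_timeout is a hypothetical helper name of mine, and I am assuming
that the perfmon context file descriptor can be handed to poll() like an
ordinary descriptor.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable, then read
 * at most len bytes into buf.
 * Returns the number of bytes read, 0 on timeout (or end of file),
 * and -1 on error.
 * Hypothetical helper; fd is assumed to support poll().
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;      /* poll() itself failed */
    if (ready == 0)
        return 0;       /* nothing arrived within the timeout */
    return read(fd, buf, len);
}
```

Called in a loop with a timeout of, say, one second, the thread could log
each timeout together with the current /pebs_index/ value.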

Shouldn't it also be impossible for pmd0 to reach a value lower than 
2^64 - cnt_reset? Or is bit 31 to be considered a sign bit even in 
perfmon's virtual 64-bit counters?
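
To make the question concrete, here is the arithmetic I have in mind, as a
small sketch. The helper names are my own and not part of perfmon or libpfm,
and the sketch assumes a plain 64-bit counter that counts upwards and wraps
at 2^64. It says nothing about how the hardware or the kernel actually
reloads the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Reset value for a sampling period: the counter counts up, so it is
 * loaded with -period (two's complement) to overflow after period events. */
uint64_t reset_value(uint32_t period)
{
    assert(period > 0 && period <= 0x7fffffffu); /* 31 usable bits with PEBS */
    return (uint64_t)0 - (uint64_t)period;
}

/* Upper and lower 32-bit halves of a 64-bit counter value. */
uint32_t upper32(uint64_t v) { return (uint32_t)(v >> 32); }
uint32_t lower32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }

/* Events left before a 64-bit counter holding v wraps around to zero. */
uint64_t events_until_wrap(uint64_t v) { return (uint64_t)0 - v; }

/* Non-zero if v is below the reset value for the given period, which I
 * would not expect to observe between two overflows. */
int below_reset(uint64_t v, uint32_t period)
{
    return v < reset_value(period);
}
```

With a period of 10000, reset_value() gives 0xFFFFFFFFFFFFD8F0, while the
value I observe after the wrap (upper half 1, lower half 0) is
0x0000000100000000, for which below_reset() is true.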
 
>> I have tried a lot of setup combinations in order to get this to work,
>> but nothing has worked. PEBS monitoring on a per-thread basis works
>> fine, so I don't think there is anything wrong with my system. I have
>> tried this with both the 2.6.24 and 2.6.25 kernel versions, with
>> libpfm-3.3 and 3.4 respectively.
>>
>>     
> I have tried this using pfmon in system-wide mode:
>
> $ pfmon --smpl-ignore-pids --system-wide --cpu-list=0 \
>     --smpl-module=pebs -einstructions_retired \
>     --long-smpl-periods=240000000 --pin-command my_test_program
>
> What happens on your system with this?
>   
This seems to work fine; I am at least getting some samples:

# results for CPU0
# total samples          : 1
# total buffer overflows : 0
#
## counts   %self    %cum          code addr
       1 100.00% 100.00% 0x00007f576e5fde20

If I reduce --long-smpl-periods, I get more samples and buffer overflows:

# results for CPU0
# total samples          : 626
# total buffer overflows : 3
#
## counts   %self    %cum          code addr
      57   9.11%   9.11% 0x00007f1a9534be48
      22   3.51%  12.62% 0x00007f1a9534c18c
      21   3.35%  15.97% 0x00007f1a9534be53
      21   3.35%  19.33% 0x00007f1a9534c1ba
-----------------------------------------------------------------------------------------

Best regards,

Erik Junberger


_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
