On 01/09/2011 6:45 AM, stephane eranian wrote:
> On Thu, Sep 1, 2011 at 3:29 PM, stephane eranian
> <[email protected]> wrote:
>> On Thu, Sep 1, 2011 at 3:06 PM, Ryan Johnson
>> <[email protected]> wrote:
>>> On 01/09/2011 1:55 AM, stephane eranian wrote:
>>>> On Thu, Sep 1, 2011 at 1:07 AM, Corey Ashford
>>>> <[email protected]> wrote:
>>>>> On 08/25/2011 07:19 AM, stephane eranian wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Sorry for late reply.
>>>>>>
>>>>>> The current support for mmaped count is broken on perf_event x86.
>>>>>> It simply does not work. I think it only works on PPC at this point.
>>>>> Just as an aside, you can access the counter registers from user space
>>>>> on Power (aka PPC) machines, but because the kernel is free to schedule
>>>>> the events onto whatever counters that meet the resource constraints,
>>>>> it's not at all clear which hardware counter to read from user space,
>>>>> and in fact, with event rotation, the counter being used can change from
>>>>> one system tick to the next.
>>>>>
>>>>> If you program a single event, you can be guaranteed that it won't move
>>>>> around, but you still will have to guess or somehow determine which
>>>>> hardware counter is being used by the kernel.
>>>>>
>>>> Yes, and that's why they have this 'lock' field in there. It's not
>>>> really a lock but rather a generation counter. You need to read it
>>>> before you attempt to read the count, and check it again when you're
>>>> done reading. If the two values don't match, the counter changed and
>>>> you need to retry; a change means the event may have moved to a
>>>> different hardware counter.
>>> This protocol is actually documented pretty well in
>>> <linux/perf_event.h>, too. Read the lock, read the index, read hw
>>> counter[index-1], read lock again to verify.
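A minimal sketch of that retry loop, for illustration only: the struct
below abbreviates the relevant fields of struct perf_event_mmap_page
rather than reproducing its real layout, and the rdpmc instruction is
abstracted behind a function pointer, so this is not a drop-in
implementation.

```c
#include <stdint.h>

/* Hypothetical mirror of the three fields the protocol touches; the
 * real layout lives in <linux/perf_event.h>. */
struct mmap_page {
    uint32_t lock;    /* generation counter, bumped by the kernel */
    uint32_t index;   /* hw counter index + 1, or 0 if unavailable */
    int64_t  offset;  /* sw-maintained base to add to the raw count */
};

/* rdpmc_fn stands in for the actual rdpmc instruction so the retry
 * loop itself can be exercised without PMU access. */
static uint64_t read_count(volatile struct mmap_page *pc,
                           uint64_t (*rdpmc_fn)(uint32_t))
{
    uint32_t seq, idx;
    uint64_t count;

    do {
        seq = pc->lock;
        __sync_synchronize();     /* order the lock read vs. data reads */
        idx = pc->index;
        if (idx == 0)
            return 0;             /* not available; caller falls back to read() */
        count = pc->offset + rdpmc_fn(idx - 1);
        __sync_synchronize();
    } while (pc->lock != seq);    /* generation changed: event may have
                                     moved counters, so retry */
    return count;
}
```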
>>>
>>>> But the key problem here is the time scaling. In case you are
>>>> multiplexing, you need to be able to retrieve time_enabled and
>>>> time_running to scale the count. But that's not exposed, thus it
>>>> does not work as soon as you have multiplexing. Well, unless you
>>>> only care about deltas and not the absolute values.
>>> Doesn't perf_event_mmap_page expose both those, also protected by the
>>> generation counter? Or are you saying the kernel doesn't actually update
>>> those fields right now?
>>>
>> Yes, it does. I am not sure they're updated correctly, though.
>> I have not tried that in a very long time.
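Assuming time_enabled and time_running do come through correctly (read
under the same generation-counter protocol as the count), the scaling
arithmetic itself is simple. A sketch:

```c
#include <stdint.h>

/* Scale a raw count for multiplexing: the event only counted while it
 * actually occupied a hardware counter (time_running), so extrapolate
 * to the full time it was enabled (time_enabled). */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                            uint64_t time_running)
{
    if (time_running == 0)
        return 0;             /* event never got onto the PMU */
    /* go through double to avoid 64-bit overflow in raw * time_enabled */
    return (uint64_t)((double)raw * time_enabled / time_running);
}
```

When the event is not multiplexed, time_enabled == time_running and the
raw count comes back unchanged.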
>>
> Did you manage to make libpfm4's self_count program work correctly?
> Even by just looking at the raw count coming out of rdpmc?
>
> I think there are issues with hdr->offset, i.e., the 64-bit sw-maintained
> base for the counter.
I only did limited testing because other things took priority over the
last couple of weeks, but I'll be back on it in the next couple of
weeks. Meanwhile, here's what I know:
The machine is a Westmere EX (which is why I can't just use an older
kernel+perfctr) running kernel 2.6.38. I've got the CVS head of PAPI,
wired up with git version 9fc1bc1e of libpfm4. self_count segfaults by
default because rdpmc is privileged, and PAPI's unit tests cause the
machine to hard-lock (I have to use the hypervisor to reboot). One
definite culprit is ctests/overflow_allcounters, but I haven't done a
bisection search on 2.6.38 to see if there are any others. After
upgrading to kernel 2.6.39, ctests/overflow_allcounters is the only
unit test failure, but it "only" hard-locks the perf events
infrastructure rather than the whole machine: the test's process hangs
with 0% CPU utilization and becomes unkillable, and any later process
attempting to use perf events suffers the same fate. The mmap+rdpmc
support is apparently disabled in 2.6.39, in that index=0 for all time.
The self_count test runs without errors and reports monotonically
increasing values, but I never attempted to verify that the starting
count was meaningful.
For now I've rolled back to 2.6.38, since the later version is a step
backwards for my needs. With the kernel module I mentioned before,
user-level rdpmc seems to stay enabled indefinitely and self_count runs
without errors. I've extended the test slightly to run fib with
n={30,35,40}, to track which counter number it used directly (if any),
and to report the deltas between measurements. Here's the output I get:
> $ ./self_count
> raw=0xcd73 offset=0x0, ena=36278 run=36278 idx=-1 direct=0
> 52595 PERF_COUNT_HW_CPU_CYCLES (delta= cd73)
> raw=0xffff811d738b offset=0x7fffffff, ena=36278 run=36278 idx=0 direct=1
> 281474995417994 PERF_COUNT_HW_CPU_CYCLES (delta= 10000011ca617)
> raw=0xffff8d588633 offset=0x7fffffff, ena=36278 run=36278 idx=0 direct=1
> 281475200615986 PERF_COUNT_HW_CPU_CYCLES (delta= c3b12a8)
> raw=0xffff94aa8789 offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477470914439 PERF_COUNT_HW_CPU_CYCLES (delta= 87520155)
> raw=0xffff95c33ede offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477489311452 PERF_COUNT_HW_CPU_CYCLES (delta= 118b755)
> raw=0xffffa1ea0c92 offset=0xfffffffe, ena=36278 run=36278 idx=0 direct=1
> 281477693181072 PERF_COUNT_HW_CPU_CYCLES (delta= c26cdb4)
> raw=0xffffa8e995ed offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281479958074858 PERF_COUNT_HW_CPU_CYCLES (delta= 86ff895a)
> raw=0xffffaa0262b9 offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281479976477366 PERF_COUNT_HW_CPU_CYCLES (delta= 118cccc)
> raw=0xffffb6284921 offset=0x17ffffffd, ena=36278 run=36278 idx=0 direct=1
> 281480180287774 PERF_COUNT_HW_CPU_CYCLES (delta= c25e668)
Judging from the above, the offset does seem to be broken; truncated to
32 bits, perhaps? If I force it to always call read(), the numbers make
more sense:
> $ ./self_count
> raw=0xda66 offset=0x0, ena=39065 run=39065 idx=-1 direct=0
> 55910 PERF_COUNT_HW_CPU_CYCLES (delta= da66)
> raw=0x11dc60e offset=0x0, ena=10052007 run=10052007 idx=-1 direct=0
> 18728462 PERF_COUNT_HW_CPU_CYCLES (delta= 11ceba8)
> raw=0xd590016 offset=0x0, ena=120077612 run=120077612 idx=-1 direct=0
> 223936534 PERF_COUNT_HW_CPU_CYCLES (delta= c3b3a08)
> raw=0x9466a0de offset=0x0, ena=1334882738 run=1334882738 idx=-1 direct=0
> 2489753822 PERF_COUNT_HW_CPU_CYCLES (delta= 870da0c8)
> raw=0x957f95c4 offset=0x0, ena=1344755931 run=1344755931 idx=-1 direct=0
> 2508166596 PERF_COUNT_HW_CPU_CYCLES (delta= 118f4e6)
> raw=0xa1b53a47 offset=0x0, ena=1454582523 run=1454582523 idx=-1 direct=0
> 2713008711 PERF_COUNT_HW_CPU_CYCLES (delta= c35a483)
The counter itself seems to work fine, though, and I'd only be using it
for deltas anyway.
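One more data point on the reconstruction itself: whatever the kernel
does with hdr->offset, the user-space side has to sign-extend the raw
rdpmc value to the hardware counter width before adding the base. A
sketch, assuming a 48-bit counter width (Westmere's counters are 48
bits, but in general the width should come from the kernel rather than
be hard-coded; this also relies on GCC's arithmetic right shift of
signed values):

```c
#include <stdint.h>

/* Reconstruct a 64-bit count from a raw rdpmc value: the hardware
 * counter is narrower than 64 bits, so the raw value must be
 * sign-extended to the counter width before the sw base is added. */
static uint64_t full_count(uint64_t raw, int64_t offset, unsigned width)
{
    int shift = 64 - width;
    int64_t signed_raw = ((int64_t)(raw << shift)) >> shift;
    return offset + signed_raw;
}
```

Feeding it the first broken sample above, raw=0xffff811d738b with
offset=0x7fffffff, yields a value around 18.7M cycles, in the same
ballpark as the read() numbers, rather than the 2.8e14 the direct path
reported.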
Ryan
_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel