Hi Stephane,

Thanks for the answer, but I am not sure I understand it.

The vol3b (the version I am using is dated June 2013 at the bottom
of the first page) for Westmere describes offcore_rsp bits such as
REMOTE_CACHE_FWD and REMOTE_DRAM. These events only concern
dual-socket configurations, don't they? If not, what does "remote" mean
for a single-socket system?

In which other Intel documentation can I find the offcore_resp bits
for dual-socket systems, and where did you (as the libpfm4 author) find
the information that lets libpfm4 report the results I copied in my
question? Moreover, are there simply additional offcore_resp bits to
set, with the single-socket ones staying valid, or is it a totally
different configuration, as it seems to be?
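
To make the mismatch concrete, here is a minimal Python sketch that
decodes the response-type byte (bits 8 to 15) of a config1 value under
the two layouts quoted below: the one printed by showevtinfo for wsm_dp
and the one from table 18.15 of vol3b. The names decode_response,
LIBPFM4_WSM_DP and VOL3B_TABLE are mine, for illustration only; they are
not part of libpfm4 or of any Intel tool, and the low byte is treated as
an opaque request mask.

```python
# Response-type bit names (bits 8-15 of config1), transcribed from the
# showevtinfo output for wsm_dp quoted in this thread.
LIBPFM4_WSM_DP = {
    8: "UNCORE_HIT",
    9: "OTHER_CORE_HIT_SNP",
    10: "OTHER_CORE_HITM",
    11: "REMOTE_CACHE_HITM",
    12: "LOCAL_DRAM_AND_REMOTE_CACHE_HIT",
    13: "REMOTE_DRAM",
    14: "OTHER_LLC_MISS",
    15: "NON_DRAM",
}

# Same bit positions as listed in vol3b table 18.15 (also quoted below).
VOL3B_TABLE = {
    8: "UNCORE_HIT",
    9: "OTHER_CORE_HIT_SNP",
    10: "OTHER_CORE_HITM",
    11: "Reserved",
    12: "REMOTE_CACHE_FWD",
    13: "REMOTE_DRAM",
    14: "LOCAL_DRAM",
    15: "NON_DRAM",
}


def decode_response(config1, layout):
    """Split config1 into (request mask, list of response bit names)."""
    request_mask = config1 & 0xFF  # bits 0-7: request type, left undecoded
    names = [layout[bit] for bit in range(8, 16) if config1 & (1 << bit)]
    return request_mask, names


if __name__ == "__main__":
    for value in (0x833, 0x1033):
        for label, layout in (("libpfm4 wsm_dp", LIBPFM4_WSM_DP),
                              ("vol3b 18.15", VOL3B_TABLE)):
            request, names = decode_response(value, layout)
            print(f"{value:#x} [{label}] request={request:#x} response={names}")
```

Under the libpfm4 layout, 0x833 selects REMOTE_CACHE_HITM, whereas under
the vol3b layout the same bit 11 is Reserved and 0x1033 would select
REMOTE_CACHE_FWD, which is exactly the difference I am asking about.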

Manuel

2014-02-07 13:58 GMT+01:00 Stephane Eranian <eran...@googlemail.com>:
> Hi,
>
>
> On Tue, Feb 4, 2014 at 4:50 PM, Manuel Selva <selva.man...@gmail.com> wrote:
>> Hi all,
>>
>> I am using the checkevents program (on an Intel Xeon 5560, family 0X
>> 06_2C) to get the values of the configuration fields I have to give to
>> the perf_event_open system call, as follows:
>>
>>> check_events OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_CACHE_HITM
>>
>> the result is:
>>
>> ........
>> Detected PMU models:
>> [18, ix86arch, "Intel X86 architectural PMU"]
>> [51, perf, "perf_events generic PMU"]
>> [53, wsm_dp, "Intel Westmere DP"]
>> [54, wsm_unc, "Intel Westmere uncore"]
>> Total events: 3042 available, 229 supported
>> Requested Event: OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_CACHE_HITM
>> Actual    Event:
>> wsm_dp::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:REMOTE_CACHE_HITM:k=1:u=1:e=0:i=0:c=0:t=0
>> PMU            : Intel Westmere DP
>> IDX            : 111149145
>> Codes          : 0x5301b7 0x833
>>
>> After looking at the Intel documentation, it seems that the second
>> field (config1 for perf_event_open) should be 0x1033 and not 0x833 as
>> reported by checkevents.
>>
>> Moreover, the output of showevtinfo contains the following
>> regarding the OFFCORE_RESPONSE_0 event:
>>
>> .......
>> Umask-13 : 0x100 : PMU : [UNCORE_HIT] : None : Response: counts L3
>> Hit: local or remote home requests that hit L3 cache in the uncore
>> with no coherency actions required (snooping)
>> Umask-14 : 0x200 : PMU : [OTHER_CORE_HIT_SNP] : None : Response:
>> counts L3 Hit: local or remote home requests that hit L3 cache in the
>> uncore and was serviced by another core with a cross core snoop where
>> no modified copies were found (clean)
>> Umask-15 : 0x400 : PMU : [OTHER_CORE_HITM] : None : Response: counts
>> L3 Hit: local or remote home requests that hit L3 cache in the uncore
>> and was serviced by another core with a cross core snoop where
>> modified copies were found (HITM)
>> Umask-16 : 0x800 : PMU : [REMOTE_CACHE_HITM] : None : Response: counts
>> L3 Hit: local or remote home requests that hit a remote L3 cacheline
>> in modified (HITM) state
>> Umask-17 : 0x1000 : PMU : [LOCAL_DRAM_AND_REMOTE_CACHE_HIT] : None :
>> Response: counts L3 Miss: local home requests that missed the L3 cache
>> and were serviced by local DRAM or a remote cache
>> Umask-18 : 0x2000 : PMU : [REMOTE_DRAM] : None : Response: counts L3
>> Miss: remote home requests that missed the L3 cache and were serviced
>> by remote DRAM
>> Umask-19 : 0x4000 : PMU : [OTHER_LLC_MISS] : None : Response: counts
>> L3 Miss: remote home requests that missed the L3 cache
>> Umask-20 : 0x8000 : PMU : [NON_DRAM] : None : Response: Non-DRAM
>> requests that were serviced by IOH
>> .......
>>
>> The Intel documentation (Intel 64 and IA-32 Architectures Software
>> Developer's Manual Volume 3B: System Programming Guide, Part 2)
>> chapter 18.6.1.3 (Off-core Response Performance Monitoring in the
>> Processor Core Programming) defines, in table 18.15, bits 8 to 15
>> for the response type as follows:
>>
>>  8  UNCORE_HIT
>>  9  OTHER_CORE_HIT_SNP
>> 10  OTHER_CORE_HITM
>> 11  Reserved
>> 12  REMOTE_CACHE_FWD
>> 13  REMOTE_DRAM
>> 14  LOCAL_DRAM
>> 15  NON_DRAM
>>
>> The masks reported by showevtinfo are not the same as the bits
>> indicated by Intel's documentation. The terminology is not exactly the
>> same, but more importantly there is no reserved bit in the showevtinfo
>> output.
> Reserved bits are not exposed by libpfm4. By definition, they don't count
> anything.
>
>>
>> What am I missing here? Where can these differences come from?
>>
> You are on a Westmere-DP processor (dual-socket). As such, you have a
> remote cache, and thus offcore_resp has more bits to set. This is the
> same on SandyBridge and IvyBridge.
>
> The vol3b for Westmere does seem to describe only the single socket
> offcore_rsp.
>
>> ------------------------------------------------------------------------------
>> Managing the Performance of Cloud-Based Applications
>> Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
>> Read the Whitepaper.
>> http://pubads.g.doubleclick.net/gampad/clk?id=121051231&iu=/4140/ostg.clktrk
>> _______________________________________________
>> perfmon2-devel mailing list
>> perfmon2-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
