Hi Stephane,

Thanks for the answer, but I am not sure I understand it.
The vol3b for Westmere (my copy is dated June 2013 at the bottom of the first page) describes offcore_rsp bits such as REMOTE_CACHE_FWD and REMOTE_DRAM. These events only concern dual-socket configurations, don't they? If not, what does "remote" mean on a single-socket system?

In which other Intel documentation can I find the offcore_resp bits for dual-socket systems, and where did you (as the libpfm4 author) find the information that lets libpfm4 report the results I copied in my question?

Moreover, are there simply additional offcore_resp bits to set, with the single-socket ones remaining valid, or is it a totally different configuration, as it seems to be?

Manuel

2014-02-07 13:58 GMT+01:00 Stephane Eranian <eran...@googlemail.com>:
> Hi,
>
> On Tue, Feb 4, 2014 at 4:50 PM, Manuel Selva <selva.man...@gmail.com> wrote:
>> Hi all,
>>
>> I am using the checkevents program (on intel Xeon 5560, family 0X
>> 06_2C) to get the values of the configuration fields I have to give to
>> perf_event_open system call as following:
>>
>>> check_events OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_CACHE_HITM
>>
>> the result is:
>>
>> ........
>> Detected PMU models:
>> [18, ix86arch, "Intel X86 architectural PMU"]
>> [51, perf, "perf_events generic PMU"]
>> [53, wsm_dp, "Intel Westmere DP"]
>> [54, wsm_unc, "Intel Westmere uncore"]
>> Total events: 3042 available, 229 supported
>> Requested Event: OFFCORE_RESPONSE_0:ANY_DATA:REMOTE_CACHE_HITM
>> Actual Event:
>> wsm_dp::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO:REMOTE_CACHE_HITM:k=1:u=1:e=0:i=0:c=0:t=0
>> PMU : Intel Westmere DP
>> IDX : 111149145
>> Codes : 0x5301b7 0x833
>>
>> After looking at the Intel documentation, it seems that the second
>> field (config1 for perf_event_open) should be 0x1033 and not 0x833 as
>> reported by checkevents.
>>
>> Moreover, the output of showevtinfo contains the following
>> regarding the OFFCORE_RESPONSE_0 event:
>>
>> .......
>> Umask-13 : 0x100 : PMU : [UNCORE_HIT] : None : Response: counts L3
>> Hit: local or remote home requests that hit L3 cache in the uncore
>> with no coherency actions required (snooping)
>> Umask-14 : 0x200 : PMU : [OTHER_CORE_HIT_SNP] : None : Response:
>> counts L3 Hit: local or remote home requests that hit L3 cache in the
>> uncore and was serviced by another core with a cross core snoop where
>> no modified copies were found (clean)
>> Umask-15 : 0x400 : PMU : [OTHER_CORE_HITM] : None : Response: counts
>> L3 Hit: local or remote home requests that hit L3 cache in the uncore
>> and was serviced by another core with a cross core snoop where
>> modified copies were found (HITM)
>> Umask-16 : 0x800 : PMU : [REMOTE_CACHE_HITM] : None : Response: counts
>> L3 Hit: local or remote home requests that hit a remote L3 cacheline
>> in modified (HITM) state
>> Umask-17 : 0x1000 : PMU : [LOCAL_DRAM_AND_REMOTE_CACHE_HIT] : None :
>> Response: counts L3 Miss: local home requests that missed the L3 cache
>> and were serviced by local DRAM or a remote cache
>> Umask-18 : 0x2000 : PMU : [REMOTE_DRAM] : None : Response: counts L3
>> Miss: remote home requests that missed the L3 cache and were serviced
>> by remote DRAM
>> Umask-19 : 0x4000 : PMU : [OTHER_LLC_MISS] : None : Response: counts
>> L3 Miss: remote home requests that missed the L3 cache
>> Umask-20 : 0x8000 : PMU : [NON_DRAM] : None : Response: Non-DRAM
>> requests that were serviced by IOH
>> .......
>>
>> The Intel documentation (Intel 64 and IA-32 Architectures Software
>> Developer's Manual Volume 3B: System Programming Guide, Part 2)
>> chapter 18.6.1.3 (Off-core Response Performance Monitoring in the
>> Processor Core Programming) defines in table 18.15 bits 8 to 15
>> for the response type as follows:
>>
>>  8 UNCORE_HIT
>>  9 OTHER_CORE_HIT_SNP
>> 10 OTHER_CORE_HITM
>> 11 Reserved
>> 12 REMOTE_CACHE_FWD
>> 13 REMOTE_DRAM
>> 14 LOCAL_DRAM
>> 15 NON_DRAM
>>
>> The masks reported by showevtinfo are not the same as the bits
>> indicated by Intel's documentation. The terminology is not exactly the
>> same, but more importantly there is no reserved bit in the showevtinfo
>> output.
> Reserved bits are not exposed by libpfm4. By definition, they don't count
> anything.
>
>>
>> What am I missing here? Where can these differences come from?
>>
> You are on a Westmere-DP processor (dual-socket), as such you have
> remote cache and thus offcore_resp has more bits to set. This is the
> same on SandyBridge, IvyBridge.
>
> The vol3b for Westmere does seem to describe only the single socket
> offcore_rsp.
>
>> _______________________________________________
>> perfmon2-devel mailing list
>> perfmon2-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
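
To make the discrepancy concrete, here is a minimal Python sketch (not taken from libpfm4 or from the SDM) that rebuilds both candidate config1 values from the two response-bit layouts quoted in this thread. The helper name config1 and the dictionary names are hypothetical. The 0x33 request mask for ANY_DATA is an assumption inferred from the check_events output (0x833 minus the 0x800 response bit), not a documented constant.

```python
# Response-type bits 8..15 as listed in table 18.15 of the SDM vol. 3B
# (June 2013 copy), as quoted in this thread. Bit 11 is reserved there.
SDM_RESPONSE_BITS = {
    "UNCORE_HIT": 8,
    "OTHER_CORE_HIT_SNP": 9,
    "OTHER_CORE_HITM": 10,
    "REMOTE_CACHE_FWD": 12,
    "REMOTE_DRAM": 13,
    "LOCAL_DRAM": 14,
    "NON_DRAM": 15,
}

# Response-type bits as printed by showevtinfo for wsm_dp
# (umask 0x100 is bit 8, ..., umask 0x8000 is bit 15).
LIBPFM4_RESPONSE_BITS = {
    "UNCORE_HIT": 8,
    "OTHER_CORE_HIT_SNP": 9,
    "OTHER_CORE_HITM": 10,
    "REMOTE_CACHE_HITM": 11,
    "LOCAL_DRAM_AND_REMOTE_CACHE_HIT": 12,
    "REMOTE_DRAM": 13,
    "OTHER_LLC_MISS": 14,
    "NON_DRAM": 15,
}

# Assumption: request mask that check_events expands ANY_DATA to
# (DMND_DATA_RD:DMND_RFO:PF_DATA_RD:PF_RFO), inferred as 0x833 - 0x800.
ANY_DATA_REQUEST_MASK = 0x33


def config1(request_mask, response_names, layout):
    """Combine a request mask with named response bits from one layout."""
    value = request_mask
    for name in response_names:
        value |= 1 << layout[name]
    return value


if __name__ == "__main__":
    libpfm4_value = config1(ANY_DATA_REQUEST_MASK,
                            ["REMOTE_CACHE_HITM"], LIBPFM4_RESPONSE_BITS)
    sdm_value = config1(ANY_DATA_REQUEST_MASK,
                        ["REMOTE_CACHE_FWD"], SDM_RESPONSE_BITS)
    print(hex(libpfm4_value))  # value reported by check_events
    print(hex(sdm_value))      # value I expected from table 18.15
```

The two layouts differ only in the names and meaning attached to bits 11, 12 and 14, which is exactly the one-bit shift between 0x833 and 0x1033.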