Hi,

I've tried out offcore and uncore counters on a Xeon E7-4850 (Westmere-EX) 4-socket server and compared the output of both counters.
Here are the results:

perf stat --per-socket --interval-print 1000 -a \
  -e "uncore_mbox_0/event=bbox_cmds_read/"
  -e "uncore_mbox_1/event=bbox_cmds_read/"
  -e "cpu/config=0x5301b7,config1=0x40ff/"   <- all accesses to local DRAM
  -e "cpu/config=0x5301bb,config1=0x20ff/"   <- all accesses to remote DRAM

taskset -c 10 ./stream.1000M.1000
-> start stream with single thread on socket 1
-> first touch policy allocates memory on socket 1

51.200567532 S0  1     51212 uncore_mbox_0/event=bbox_cmds_read/
51.200567532 S0  1     53875 uncore_mbox_1/event=bbox_cmds_read/
51.200567532 S0 20       930 cpu/config=0x5301b7,config1=0x40ff/
51.200567532 S0 20       256 cpu/config=0x5301bb,config1=0x20ff/
51.200567532 S1  1  35026588 uncore_mbox_0/event=bbox_cmds_read/
51.200567532 S1  1  35027264 uncore_mbox_1/event=bbox_cmds_read/
51.200567532 S1 20  70051225 cpu/config=0x5301b7,config1=0x40ff/
51.200567532 S1 20     94532 cpu/config=0x5301bb,config1=0x20ff/
51.200567532 S2  1      1100 uncore_mbox_0/event=bbox_cmds_read/
51.200567532 S2  1      1313 uncore_mbox_1/event=bbox_cmds_read/
51.200567532 S2 20       502 cpu/config=0x5301b7,config1=0x40ff/
51.200567532 S2 20       543 cpu/config=0x5301bb,config1=0x20ff/
51.200567532 S3  1      1837 uncore_mbox_0/event=bbox_cmds_read/
51.200567532 S3  1      1995 uncore_mbox_1/event=bbox_cmds_read/
51.200567532 S3 20       422 cpu/config=0x5301b7,config1=0x40ff/
51.200567532 S3 20       937 cpu/config=0x5301bb,config1=0x20ff/

Observation: uncore_mbox_0 + uncore_mbox_1 = offcore_response_0 (config1=0x40ff)

taskset -pc 20 $(pgrep stream.1000M.10)
-> move process to socket 2

58.372255828 S0  1     34562 uncore_mbox_0/event=bbox_cmds_read/
58.372255828 S0  1     36453 uncore_mbox_1/event=bbox_cmds_read/
58.372255828 S0 20      1076 cpu/config=0x5301b7,config1=0x40ff/
58.372255828 S0 20       419 cpu/config=0x5301bb,config1=0x20ff/
58.372255828 S1  1  27712533 uncore_mbox_0/event=bbox_cmds_read/
58.372255828 S1  1  27713447 uncore_mbox_1/event=bbox_cmds_read/
58.372255828 S1 20        98 cpu/config=0x5301b7,config1=0x40ff/
58.372255828 S1 20       490 cpu/config=0x5301bb,config1=0x20ff/
58.372255828 S2  1     17692 uncore_mbox_0/event=bbox_cmds_read/
58.372255828 S2  1     18255 uncore_mbox_1/event=bbox_cmds_read/
58.372255828 S2 20     34914 cpu/config=0x5301b7,config1=0x40ff/
58.372255828 S2 20  55478954 cpu/config=0x5301bb,config1=0x20ff/
58.372255828 S3  1      1734 uncore_mbox_0/event=bbox_cmds_read/
58.372255828 S3  1      2057 uncore_mbox_1/event=bbox_cmds_read/
58.372255828 S3 20       407 cpu/config=0x5301b7,config1=0x40ff/
58.372255828 S3 20      1110 cpu/config=0x5301bb,config1=0x20ff/

Observation: uncore_mbox_0 + uncore_mbox_1 = offcore_response_1 (config1=0x20ff)

./check_events OFFCORE_RESPONSE_0:ANY_REQUEST:LOCAL_DRAM_AND_REMOTE_CACHE_HIT
Requested Event: OFFCORE_RESPONSE_0:ANY_REQUEST:LOCAL_DRAM_AND_REMOTE_CACHE_HIT
Actual Event: wsm_dp::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:OTHER:LOCAL_DRAM_AND_REMOTE_CACHE_HIT:k=1:u=1:e=0:i=0:c=0:t=0
PMU   : Intel Westmere DP
IDX   : 111149145
Codes : 0x5301b7 0x10ff

./check_events OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_DRAM
Requested Event: OFFCORE_RESPONSE_0:ANY_REQUEST:REMOTE_DRAM
Actual Event: wsm_dp::OFFCORE_RESPONSE_0:DMND_DATA_RD:DMND_RFO:DMND_IFETCH:WB:PF_DATA_RD:PF_RFO:PF_IFETCH:OTHER:REMOTE_DRAM:k=1:u=1:e=0:i=0:c=0:t=0
PMU   : Intel Westmere DP
IDX   : 111149145
Codes : 0x5301b7 0x20ff

offcore_response_0 (config1=0x10ff) --> gives wrong results on Westmere-EX

45.122619772 S0  1     47750 uncore_mbox_0/event=bbox_cmds_read/
45.122619772 S0  1     49549 uncore_mbox_1/event=bbox_cmds_read/
45.122619772 S0  1     47864 uncore_bbox_0/counter=0x1,event=0x1D/
45.122619772 S0  1     49504 uncore_bbox_1/counter=0x1,event=0x1D/
45.122619772 S0 20       212 cpu/config=0x5301b7,config1=0x10ff/
45.122619772 S0 20       290 cpu/config=0x5301bb,config1=0x20ff/
45.122619772 S1  1  37402338 uncore_mbox_0/event=bbox_cmds_read/
45.122619772 S1  1  37398016 uncore_mbox_1/event=bbox_cmds_read/
45.122619772 S1  1  37397916 uncore_bbox_0/counter=0x1,event=0x1D/
45.122619772 S1  1  37442759 uncore_bbox_1/counter=0x1,event=0x1D/
45.122619772 S1 20       574 cpu/config=0x5301b7,config1=0x10ff/
45.122619772 S1 20     85665 cpu/config=0x5301bb,config1=0x20ff/
45.122619772 S2  1      1382 uncore_mbox_0/event=bbox_cmds_read/
45.122619772 S2  1      1921 uncore_mbox_1/event=bbox_cmds_read/
45.122619772 S2  1      1385 uncore_bbox_0/counter=0x1,event=0x1D/
45.122619772 S2  1      1920 uncore_bbox_1/counter=0x1,event=0x1D/
45.122619772 S2 20       276 cpu/config=0x5301b7,config1=0x10ff/
45.122619772 S2 20      1108 cpu/config=0x5301bb,config1=0x20ff/
45.122619772 S3  1      1289 uncore_mbox_0/event=bbox_cmds_read/
45.122619772 S3  1      1367 uncore_mbox_1/event=bbox_cmds_read/
45.122619772 S3  1      1257 uncore_bbox_0/counter=0x1,event=0x1D/
45.122619772 S3  1      1335 uncore_bbox_1/counter=0x1,event=0x1D/
45.122619772 S3 20       258 cpu/config=0x5301b7,config1=0x10ff/
45.122619772 S3 20       665 cpu/config=0x5301bb,config1=0x20ff/

Table 18-21, "MSR_OFFCORE_RSP_0 and MSR_OFFCORE_RSP_1 Bit Field Definition", which is documented for Nehalem, seems to be valid for Westmere-EX as well, so libpfm4 seems to be wrong???
(Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3 (3A, 3B & 3C): System Programming Guide - February 2014)

Best regards,
Andreas Hollmann

2014-02-24 14:31 GMT+01:00 Stephane Eranian <eran...@googlemail.com>:
> On Mon, Feb 24, 2014 at 2:03 PM, Manuel Selva <selva.man...@gmail.com> wrote:
>> Hi,
>>
>> Following my investigations I reached the following documentation of
>> the Intel VTune Amplifier tool:
>>
>> http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/~amplifierxe/pmw_dp/events/offcore_response.html
>>
> Yes, that's another good source of information.
>
>> In this document, bit 14 is described as nothing and bit 12
>> is defined as remote cache forward AND local RAM accesses.
>>
>> According to this document, to the libpfm showevtinfo output and to my
>> experiments, I conclude that the Intel documentation is wrong, and
>> that offcore response events can only count local RAM accesses and
>> remote cache accesses together. It's impossible to count these
>> events separately. This idea is supported by the existence of a core
>> event named MEM_UNCORE_RETIRED.LOCAL_DRAM_AND_REMOTE_CACHE_HIT and the
>> absence of separate events.
>>
> I believe your conclusion is correct AFAIR. On Westmere, you cannot measure
> those events separately. You'd want to try on IvyBridge-EP (IvyTown), I think.
>
>> Nevertheless I was not able to confirm this hypothesis from any
>> official Intel documentation and was wondering where you (libpfm
>> author) got the information to write your library.
>>
> Waiting for an official answer from them as well.

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
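
For anyone who wants to re-check the two "Observation" lines in the measurements above, here is a minimal Python sketch. The counts are copied from the perf stat output in this message; the helper names (compare, response_bits) are made up for illustration and are not part of perf or libpfm4. The second helper only reports which of bits 8..15 of config1 is set; it does not claim what that bit means on Westmere-EX, since that is exactly the open question in this thread.

```python
# Counts copied from the perf stat output quoted in this message.
# Run 1: stream on socket 1, memory on socket 1 (all counters read on S1).
# Run 2: stream moved to socket 2, memory still on socket 1
#        (mbox counters read on S1, offcore counter read on S2).
RUNS = {
    "local, config1=0x40ff": (35026588, 35027264, 70051225),
    "remote, config1=0x20ff": (27712533, 27713447, 55478954),
}


def compare(mbox_0, mbox_1, offcore):
    """Return (mbox_0 + mbox_1, relative difference to the offcore count)."""
    total = mbox_0 + mbox_1
    return total, abs(total - offcore) / offcore


def response_bits(config1):
    """Return the positions of the set bits in bits 8..15 of config1."""
    return [bit for bit in range(8, 16) if (config1 >> bit) & 1]


if __name__ == "__main__":
    for name, (m0, m1, off) in RUNS.items():
        total, rel = compare(m0, m1, off)
        print(f"{name}: mbox sum={total} offcore={off} rel. diff={rel:.4%}")
    for cfg in (0x40FF, 0x20FF, 0x10FF):
        print(f"config1={cfg:#06x}: response bits set = {response_bits(cfg)}")
```

In both runs the summed mbox reads and the offcore count differ by well under 0.1%, which is what the observations state. The three config1 values differ only in a single response bit (14, 13 and 12 respectively), with all eight request-type bits (0xff) set in each.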