Re: [perfmon2] Implementing Intel's drill-down technique for performance analysis

2010-11-29 Thread Eduard Diner
Thanks, that helped! Currently I am trying to understand how I can count L1 cache misses. Can I get them simply with MEM_LOAD_RETIRED:L1D_HIT:i:c=1? How do I know when I can use the counter mask? In which cases does it make sense?

2010/11/25 stephane eranian
> Hi,
>
> I think the section
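
On the counter-mask question in the message above, here is a minimal sketch of how the `c=` (counter mask) and `i` (invert) modifiers are generally documented to behave on Intel PMUs: with a non-zero mask the counter stops summing events and instead counts cycles in which the per-cycle event count is at least the mask, and invert flips that comparison. The function names and the per-cycle trace are hypothetical, made up for illustration; this is a toy model, not libpfm4 code and not a measurement.

```python
def counter_increment(events_this_cycle, cmask=0, invert=False):
    """Toy model of how much a counter advances in one cycle.

    Assumption: cmask == 0 means plain event counting, and the invert
    flag only has an effect when cmask is non-zero.
    """
    if cmask == 0:
        return events_this_cycle
    condition = events_this_cycle >= cmask
    if invert:
        condition = not condition
    return 1 if condition else 0


def count(trace, cmask=0, invert=False):
    """Sum the per-cycle increments over a whole trace."""
    return sum(counter_increment(n, cmask, invert) for n in trace)


# Hypothetical number of retired loads that hit L1D in each of six cycles.
trace = [0, 2, 1, 0, 0, 3]

plain = count(trace)                        # total L1D-hit loads
busy = count(trace, cmask=1)                # cycles with at least one hit
idle = count(trace, cmask=1, invert=True)   # cycles with no hit at all

print(plain, busy, idle)
```

Under this model, an event string such as MEM_LOAD_RETIRED:L1D_HIT:i:c=1 would count cycles in which no L1D-hit load retired. That is a cycle count, not a count of L1 misses, so it should not be read as a miss counter.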

Re: [perfmon2] Implementing Intel's drill-down technique for performance analysis

2010-11-25 Thread stephane eranian
Hi, I think the section of the manual you're referring to talks about drill-down on Intel Core (Core 2) processors. There you do have RS_UOPS_DISPATCHED. If you want to do something similar on Nehalem, I recommend you look at the presentations here: https://openlab-mu-internal.web.cern.ch/op

Re: [perfmon2] Implementing Intel's drill-down technique for performance analysis

2010-11-25 Thread stephane eranian
Hi,

What machine are you trying to do this on?

On Thu, Nov 25, 2010 at 4:39 PM, Eduard Diner wrote:
> Hi guys,
> I have instrumented my code on Nehalem with libpfm4 and now I am a little
> bit confused.
> I am reading Intel's "IA64-Optimization-Reference". In Appendix B3.4 they
> describe a dril

[perfmon2] Implementing Intel's drill-down technique for performance analysis

2010-11-25 Thread Eduard Diner
Hi guys, I have instrumented my code on Nehalem with libpfm4 and now I am a little bit confused. I am reading Intel's "IA64-Optimization-Reference". In Appendix B3.4 they describe a drill-down technique for performance analysis. I think this is a good starting point for analysis. Basically I want t