Re: [perfmon2] Replicating 'What can performance counters do for memory subsystem analysis?' results

Stephane Eranian Mon, 13 Sep 2010 15:36:12 -0700

Hi,

Another thing to watch out for is user vs. kernel.
I have changed the logic inside libpfm4. If you're using the latest
version, it measures user+kernel execution by default. The kernel may
touch the memory before you do. Thus, changing the optimization level
may not impact your results.


You can try forcing user level only:

 ./task -e LLC_MISSES:u,LLC_REFERENCES:u ./mcol
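For anyone driving the kernel interface directly instead of libpfm4's task example, here is a minimal sketch of user-level-only counting with Linux perf_event_open(2). It assumes Linux 2.6.31 or later and the generic PERF_COUNT_HW_CACHE_MISSES event, which the kernel maps to a model-specific event (on many Intel parts, last-level-cache misses); that mapping varies by CPU. Setting exclude_kernel (plus exclude_hv) is roughly what the :u modifier requests. The helper names init_user_only_attr and count_user_only are hypothetical and not part of libpfm4.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Describe one hardware event counted at user level only: ring 0 and the
 * hypervisor are excluded, so work the kernel does on the program's behalf
 * (for example, touching freshly faulted pages) is not counted. */
void init_user_only_attr(struct perf_event_attr *attr, uint64_t config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = config;      /* e.g. PERF_COUNT_HW_CACHE_MISSES */
    attr->disabled = 1;         /* created stopped; enabled around the region */
    attr->exclude_kernel = 1;   /* roughly ":u": skip ring-0 execution */
    attr->exclude_hv = 1;       /* also skip hypervisor execution */
}

/* Count `config` in user mode while fn(arg) runs on the calling thread.
 * Returns -1 if the counter cannot be opened (no PMU access, restrictive
 * perf_event_paranoid setting, VM without a virtual PMU, ...). */
long long count_user_only(uint64_t config, void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    struct perf_event_attr attr;
    init_user_only_attr(&attr, config);

    /* pid = 0: this thread; cpu = -1: any CPU; group_fd = -1; no flags. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0;
    ssize_t n = read(fd, &value, sizeof(value));
    close(fd);
    return n == (ssize_t)sizeof(value) ? (long long)value : -1;
}
```

A caller passes its measurement loop as fn and reports the returned count, treating -1 as "counter unavailable" rather than as a result.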

And yes, make sure the compiler is not playing tricks on you.
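On the compiler point: a random-walk test stays honest if each load address depends on the previous load and the final value is actually used (printing it is enough). The C sketch below is a hypothetical stand-in, not the mcol program discussed here. It builds one random cycle with Sattolo's algorithm, so a pointer chase from index 0 visits every element exactly once before returning to 0; with an array far larger than the last-level cache, each step would ideally miss. The names build_random_cycle and chase are illustrative, and rand() is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill next[] with a single random cycle over 0..n-1 (Sattolo's algorithm):
 * following next[] from any index visits all n indices before repeating.
 * rand() may have a small range (RAND_MAX can be 32767), which biases j for
 * large n, but j < i still guarantees a single cycle. */
void build_random_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 1);
    srand(seed);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j < i: this is what makes one cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Dependent-load walk: each address comes from the previous load, so the
 * compiler cannot skip or reorder the accesses as long as the caller uses
 * the returned sum (e.g. prints it). */
size_t chase(const size_t *next, size_t steps)
{
    size_t idx = 0, sum = 0;
    for (size_t s = 0; s < steps; s++) {
        idx = next[idx];
        sum += idx;
    }
    return sum;
}
```

Sizing next[] well beyond the LLC, printing the value returned by chase(), and reading the output of gcc -O2 -S for the loop is a quick way to confirm the compiler kept every access.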


On Tue, Sep 14, 2010 at 12:07 AM, Vince Weaver <vweav...@eecs.utk.edu> wrote:
> On Mon, 13 Sep 2010, DRAM Ninjas wrote:
>> Thanks for your reply. I've tried with -O0 and without any optimization
>> flags (not sure what gcc defaults to, now that I think about it) and I get
>> roughly the same thing. If I print out the values of the two sums, that
>> will force it to not optimize them out, right?
>
> yes.  You have to be careful when you use no optimizations, because the
> compiler generates really naive code in that case.  Have you checked the
> assembly output generated by the compiler yet?
>
>> And your point about it being impossible to correlate measured to expected:
>> could you provide any more insight? I'm quite baffled by the miss rates for
>> workloads that I know will miss _every_ access in the main program loop
>> (i.e. random memory walks in a large array).
>
> You can look at the presentations here:
>   http://www.cs.utk.edu/~vweaver1/presentations/
>
> These are recent results (so no paper-length versions of them yet).  You
> want the slides at the end that show cache miss results for various x86_64
> processors.  This is for a very simple array-walk workload, and the
> results are very hard to interpret.  In no case were the results ever the
> "expected" result, even when doing backward or random strides.
>
> My entire PhD thesis was on my attempt to match perf-counters to simulator
> results.  It turns out to be very difficult on anything more recent
> than a MIPS R12000.
>
> Vince
> ------------------------------------------------------------------------------
> Start uncovering the many advantages of virtual appliances
> and start using them to simplify application deployment and
> accelerate your shift to cloud computing.
> http://p.sf.net/sfu/novell-sfdev2dev
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
>
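The "expected" figures that such array-walk experiments are compared against typically come from a naive model like the one sketched below, which assumes a cold cache, an array much larger than the LLC, and no hardware prefetching or TLB effects. Real processors violate those assumptions, which is one plausible reason measured counts diverge from it. The 64-byte line size used in the checks and the helper name expected_llc_misses are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Naive expected LLC misses for `accesses` loads separated by `stride` bytes
 * over an array much larger than the LLC: cold cache, no prefetching.
 * `stride` is a magnitude, so backward walks use the same formula.
 * Random walks are modeled as one miss per access. */
unsigned long long expected_llc_misses(unsigned long long accesses,
                                       size_t stride, size_t line_size,
                                       int random_order)
{
    assert(line_size > 0);
    if (random_order || stride >= line_size)
        return accesses;                      /* every access touches a new line */
    unsigned long long bytes = accesses * stride;
    return (bytes + line_size - 1) / line_size;   /* one miss per line touched */
}
```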
