Re: [perfmon2] Replicating 'What can performance counters do for memory subsystem analysis?' results

DRAM Ninjas Mon, 13 Sep 2010 20:19:15 -0700

Ah, I actually just started reading your paper yesterday before getting
sidetracked. I'll have to go back and finish reading. Thanks for the slides,
I'll take a look.


I installed msr-tools on Ubuntu and gave your prefetch disabler a shot, and
it definitely succeeded:

$ ./task -e LLC_MISSES:u,LLC_REFERENCES:u ./mcol
Allocating 16 MiB for 1024x1024 matrix
493.45MiB/s
val=563167266032218500+562975029221663500i
         130,976,448 LLC_MISSES:u (16,545,745,424 : 16,545,745,424)
         131,414,516 LLC_REFERENCES:u (16,545,745,424 : 16,545,745,424)

(with -O0 and -O2)

Stephan:
Thanks for your input as well. A quick question: can libpfm4 track kernel
mode even if I'm not running 'task' as root?

Thanks,
Paul

On Mon, Sep 13, 2010 at 6:07 PM, Vince Weaver <[email protected]> wrote:

> On Mon, 13 Sep 2010, DRAM Ninjas wrote:
> > Thanks for your reply. I've tried with -O0 and without any optimization
> > flags (not sure what gcc defaults to, now that I think about it) and I
> > get roughly the same thing. If I print out the values of the two sums,
> > that will force it to not optimize them out, right?
>
> Yes.  You have to be careful when you use no optimizations, because the
> compiler generates really naive code in that case.  Have you checked the
> assembly output generated by the compiler yet?
>
> > And your point about being impossible to correlate measured to expected,
> > could you provide any more insight? I'm quite baffled at the miss rates
> > for workloads that I know will miss _every_ access in the main program
> > loop (i.e. random memory walks in large array).
>
> You can look at the presentations here:
>   http://www.cs.utk.edu/~vweaver1/presentations/
>
> These are recent results (so no paper-length versions of them yet).  You
> want the slides at the end that show cache miss results for various x86_64
> processors.  This is for a very simple array-walk workload, and the
> results are very hard to interpret.  In no case were the results ever the
> "expected" result, even when doing backward or random strides.
>
> My entire PhD thesis was on my attempt to match perf-counters to simulator
> results.  It turns out to be very difficult on anything more recent
> than a MIPS R12000.
>
> Vince
