Ah, I actually started reading your paper yesterday before getting
sidetracked. I'll have to go back and finish it. Thanks for the slides;
I'll take a look.
I installed msr-tools on Ubuntu and gave your prefetch disabler a shot, and
it definitely worked:
$ ./task -e LLC_MISSES:u,LLC_REFERENCES:u ./mcol
Allocating 16 MiB for 1024x1024 matrix
493.45MiB/s
val=563167266032218500+562975029221663500i
130,976,448 LLC_MISSES:u (16,545,745,424 : 16,545,745,424)
131,414,516 LLC_REFERENCES:u (16,545,745,424 : 16,545,745,424)
(I get this with both -O0 and -O2.)
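As a rough sanity check on the counter values above, here is a small Python sketch that works out the LLC miss ratio (about 99.7%) and the element size implied by "16 MiB for 1024x1024". The 64-byte cache line and the row-major layout are assumptions for illustration; neither is stated in this thread.

```python
# Counter values copied from the 'task' output above.
llc_misses = 130_976_448
llc_references = 131_414_516

# Fraction of last-level-cache references that missed.
miss_ratio = llc_misses / llc_references
print(f"LLC miss ratio: {miss_ratio:.2%}")

# "Allocating 16 MiB for 1024x1024 matrix" implies the element size.
n = 1024
matrix_bytes = 16 * 1024 * 1024
bytes_per_element = matrix_bytes // (n * n)
print(f"bytes per element: {bytes_per_element}")

# ASSUMPTIONS (not from the thread): row-major storage and 64-byte lines.
CACHE_LINE_BYTES = 64
column_stride_bytes = n * bytes_per_element

# Walking down a column jumps one full row per access, so if the stride is
# at least one cache line, consecutive accesses never share a line.
new_line_every_access = column_stride_bytes >= CACHE_LINE_BYTES
print(f"column stride: {column_stride_bytes} bytes, "
      f"new cache line on every access: {new_line_every_access}")
```

Under those assumptions, a column walk with the prefetcher disabled should miss on nearly every reference, which is consistent with the ratio measured here.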
Stephan:
Thanks for your input as well. A quick question: can libpfm4 track kernel
mode even if I'm not running 'task' as root?
Thanks,
Paul
On Mon, Sep 13, 2010 at 6:07 PM, Vince Weaver <vweav...@eecs.utk.edu> wrote:
> On Mon, 13 Sep 2010, DRAM Ninjas wrote:
> > Thanks for your reply. I've tried with -O0 and without any optimization
> > flags (not sure what gcc defaults to, now that I think about it) and I
> > get roughly the same thing. If I print out the values of the two sums,
> > that will force it to not optimize them out, right?
>
> Yes. You have to be careful when you use no optimizations, because the
> compiler generates really naive code in that case. Have you checked the
> assembly output generated by the compiler yet?
>
> > And your point about being impossible to correlate measured to expected,
> > could you provide any more insight? I'm quite baffled at the miss rates
> > for workloads that I know will miss _every_ access in the main program
> > loop (i.e. random memory walks in large array).
>
> You can look at the presentations here:
> http://www.cs.utk.edu/~vweaver1/presentations/
>
> These are recent results (so no paper-length versions of them yet). You
> want the slides at the end that show cache miss results for various x86_64
> processors. This is for a very simple array-walk workload, and the
> results are very hard to interpret. In no case were the results ever the
> "expected" result, even when doing backward or random strides.
>
> My entire PhD thesis was on my attempt to match perf-counters to simulator
> results. It turns out to be very difficult on anything more recent
> than a MIPS R12000.
>
> Vince
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel