On Mon, 13 Sep 2010, DRAM Ninjas wrote:
> Thanks for your reply. I've tried with -O0 and without any optimization
> flags (not sure what gcc defaults to, now that I think about it) and I
> get roughly the same thing. If I print out the values of the two sums,
> that will force it to not optimize them out, right?

Yes.  You have to be careful when you use no optimizations, because the 
compiler generates really naive code in that case.  Have you checked the 
assembly output generated by the compiler yet?
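
As a minimal sketch of the point above (this is not the code under
discussion; the names sum_array and run_and_report are made up for
illustration): printing the sum makes the result observable, so the
compiler cannot discard the computation.  It may still restructure the
loop at higher optimization levels, which is why checking the assembly
is worthwhile.  For reference, gcc with no -O flag behaves like -O0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* The loop being measured: sum n ints from a. */
long sum_array(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical driver: fill an array, sum it, and print the result.
 * Printing makes the sum observable, so the compiler has to compute it.
 * Returns the sum, or -1 if the allocation fails. */
long run_and_report(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i % 7);      /* arbitrary fill pattern */
    long sum = sum_array(a, n);
    printf("sum = %ld\n", sum);
    free(a);
    return sum;
}
```

Compiling with "gcc -O2 -S file.c" (file.c being whatever the source is
called) writes the assembly to file.s, where you can confirm that the
summing loop is still there and see what form it took.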

> And your point about being impossible to correlate measured to
> expected, could you provide any more insight? I'm quite baffled at the
> miss rates for workloads that I know will miss _every_ access in the
> main program loop (i.e. random memory walks in large array).

You can look at the presentations here:
   http://www.cs.utk.edu/~vweaver1/presentations/

These are recent results (so no paper-length versions of them yet).  You 
want the slides at the end that show cache miss results for various x86_64 
processors.  This is for a very simple array-walk workload, and the 
results are very hard to interpret.  In no case were the results ever the 
"expected" result, even when doing backward or random strides.
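
For concreteness, here is one way a random-walk workload of this kind
could be written.  This is only a sketch and not the benchmark behind
the slides; build_cycle and chase are hypothetical names.  It uses
Sattolo's algorithm to link the array into a single random cycle, so
every load depends on the previous one and the walk visits every element
before repeating.  With an array much larger than the last-level cache
one would naively expect nearly every hop to miss, which is exactly the
kind of expectation the counters tend not to confirm.

```c
#include <assert.h>
#include <stdlib.h>

/* Link next[0..n-1] into one random cycle (Sattolo's algorithm).
 * rand() has a limited range on some platforms, so a better generator
 * would be needed for very large arrays. */
void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(next != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    if (n < 2)
        return;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i, never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for the given number of hops and return the final
 * index.  Using (for example, printing) the return value keeps the
 * compiler from removing the loop. */
size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t idx = start;
    for (size_t s = 0; s < steps; s++)
        idx = next[idx];
    return idx;
}
```

Because the permutation is a single cycle of length n, chase(next, 0, n)
ends up back at index 0, which gives a quick sanity check before
measuring anything.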

My entire PhD thesis was on my attempt to match perf-counters to simulator 
results.  It turns out to be very difficult on anything more recent
than a MIPS R12000.

Vince
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
