On Tue, Sep 14, 2010 at 5:18 AM, DRAM Ninjas <dramnin...@gmail.com> wrote:
> Ah, I actually just started reading your paper yesterday before getting
> sidetracked. I'll have to go back and finish reading. Thanks for the slides,
> I'll take a look.
>
> I installed msr-tools on ubuntu and gave your prefetch disabler a shot, and
> it definitely succeeded:
>
> $ ./task -e LLC_MISSES:u,LLC_REFERENCES:u ./mcol
> Allocating 16 MiB for 1024x1024 matrix
> 493.45MiB/s
> val=563167266032218500+562975029221663500i
> 130,976,448 LLC_MISSES:u (16,545,745,424 : 16,545,745,424)
> 131,414,516 LLC_REFERENCES:u (16,545,745,424 : 16,545,745,424)
>
> (with -O0 and -O2)
>
> Stephan:
> Thanks for your input as well. A quick question: can libpfm4 track kernel
> mode even if I'm not running 'task' as root?

Yes, it can. Either drop :u or add :u:k.

> Thanks,
> Paul
>
> On Mon, Sep 13, 2010 at 6:07 PM, Vince Weaver <vweav...@eecs.utk.edu> wrote:
>>
>> On Mon, 13 Sep 2010, DRAM Ninjas wrote:
>> > Thanks for your reply. I've tried with -O0 and without any optimization
>> > flags (not sure what gcc
>> > defaults to, now that I think about it) and I get roughly the same
>> > thing. If I print out the values of
>> > the two sums, that will force it to not optimize them out, right?
>>
>> yes. You have to be careful when you use no optimizations, because the
>> compiler generates really naive code in that case. Have you checked the
>> assembly output generated by the compiler yet?
>>
>> > And your point about being impossible to correlate measured to expected,
>> > could you provide any more
>> > insight? I'm quite baffled at the miss rates for workloads that I know
>> > will miss _every_ access in the
>> > main program loop (i.e. random memory walks in large array).
>>
>> You can look at the presentations here:
>> http://www.cs.utk.edu/~vweaver1/presentations/
>>
>> These are recent results (so no paper-length versions of them yet). You
>> want the slides at the end that show cache miss results for various x86_64
>> processors. This is for a very simple array-walk workload, and the
>> results are very hard to interpret. In no case were the results ever the
>> "expected" result, even when doing backward or random strides.
>>
>> My entire PhD thesis was on my attempt to match perf-counters to simulator
>> results. It turns out to be very difficult on anything more recent
>> than a MIPS R12000.
>>
>> Vince
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
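
For readers who want to turn the counter lines quoted in this thread into a miss ratio, here is a minimal Python sketch. It is not part of libpfm4 or the `task` example. The function names `parse_counts` and `miss_ratio` are hypothetical, and the line layout (count, event name, then a parenthesized pair assumed to be time enabled and time running) is inferred only from the output pasted above.

```python
import re

# Counter lines copied verbatim from the `task` output quoted in this thread.
TASK_OUTPUT = """\
130,976,448 LLC_MISSES:u (16,545,745,424 : 16,545,745,424)
131,414,516 LLC_REFERENCES:u (16,545,745,424 : 16,545,745,424)
"""

# Assumed layout: <count> <event> (<first> : <second>). The meaning of the
# parenthesized pair (time enabled : time running) is an assumption.
LINE_RE = re.compile(r"^\s*([\d,]+)\s+(\S+)\s+\(([\d,]+)\s*:\s*([\d,]+)\)\s*$")


def to_int(text):
    """Convert a number with thousands separators, e.g. '1,024', to an int."""
    return int(text.replace(",", ""))


def parse_counts(text):
    """Return {event: {'value', 'enabled', 'running'}} for each matching line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip program output and anything else that is not a counter line
        value, event, enabled, running = match.groups()
        counts[event] = {
            "value": to_int(value),
            "enabled": to_int(enabled),
            "running": to_int(running),
        }
    return counts


def miss_ratio(counts, misses="LLC_MISSES:u", refs="LLC_REFERENCES:u"):
    """Fraction of last-level cache references that were counted as misses."""
    return counts[misses]["value"] / counts[refs]["value"]


counts = parse_counts(TASK_OUTPUT)
print(f"LLC miss ratio: {miss_ratio(counts):.2%}")
```

With the numbers above this prints a miss ratio of 99.67%, which is close to the "miss on every access" behaviour the workload was expected to show once the prefetcher was disabled. The two values in parentheses are equal on both lines; under the assumption stated above, that would mean the counts were not scaled for multiplexing.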