Re: [perfmon2] Replicating 'What can performance counters do for memory subsystem analysis?' results

stephane eranian Mon, 13 Sep 2010 23:31:55 -0700

On Tue, Sep 14, 2010 at 5:18 AM, DRAM Ninjas <dramnin...@gmail.com> wrote:
> Ah, I actually just started reading your paper yesterday before getting
> sidetracked. I'll have to go back and finish reading. Thanks for the slides,
> I'll take a look.
> I installed msr-tools on Ubuntu and gave your prefetch disabler a shot, and
> it definitely succeeded:
> $ ./task -e LLC_MISSES:u,LLC_REFERENCES:u ./mcol
> Allocating 16 MiB for 1024x1024 matrix
> 493.45MiB/s
> val=563167266032218500+562975029221663500i
>          130,976,448 LLC_MISSES:u (16,545,745,424 : 16,545,745,424)
>          131,414,516 LLC_REFERENCES:u (16,545,745,424 : 16,545,745,424)
> (with -O0 and -O2)
> Stephane:
> Thanks for your input as well. A quick question: can libpfm4 track kernel
> mode even if I'm not running 'task' as root?


Yes, it can. Either drop the :u modifier or use :u:k (e.g. LLC_MISSES:u:k).

> Thanks,
> Paul
> On Mon, Sep 13, 2010 at 6:07 PM, Vince Weaver <vweav...@eecs.utk.edu> wrote:
>>
>> On Mon, 13 Sep 2010, DRAM Ninjas wrote:
>> > Thanks for your reply. I've tried with -O0 and without any optimization
>> > flags (not sure what gcc defaults to, now that I think about it) and I
>> > get roughly the same thing. If I print out the values of the two sums,
>> > that will force it to not optimize them out, right?
>>
>> Yes.  You have to be careful when you use no optimizations, because the
>> compiler generates really naive code in that case.  Have you checked the
>> assembly output generated by the compiler yet?
>>
>> > And your point about it being impossible to correlate measured to
>> > expected, could you provide any more insight? I'm quite baffled at the
>> > miss rates for workloads that I know will miss _every_ access in the
>> > main program loop (i.e. random memory walks in a large array).
>>
>> You can look at the presentations here:
>>   http://www.cs.utk.edu/~vweaver1/presentations/
>>
>> These are recent results (so no paper-length versions of them yet).  You
>> want the slides at the end that show cache miss results for various x86_64
>> processors.  This is for a very simple array-walk workload, and the
>> results are very hard to interpret.  In no case were the results ever the
>> "expected" result, even when doing backward or random strides.
>>
>> My entire PhD thesis was on my attempt to match perf-counters to simulator
>> results.  It turns out to be very difficult on anything more recent
>> than a MIPS R12000.
>>
>> Vince
>
> ------------------------------------------------------------------------------
> Start uncovering the many advantages of virtual appliances
> and start using them to simplify application deployment and
> accelerate your shift to cloud computing.
> http://p.sf.net/sfu/novell-sfdev2dev
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
>

