Re: [Valgrind-users] cachegrind for Xeon e5-2643v2

Josef Weidendorfer Fri, 04 Jul 2014 01:05:16 -0700

On 03.07.2014 19:22, Jerry Scharf wrote:
> Tom and Josef,
> 
> Thank you for your speedy responses.
> 
> Is cachegrind an instrumentation of the real system or a simulation of the 
> processor cache based on the code executed? From your responses, it sounds 
> like the latter.


Yes, it is a simulation.

For the former (measuring the real system), use perf/PAPI/oprofile/VTune/...,
which use the performance counters of your processor.


> If so, this is actually better for what I want to do.
> 
> Because of the fact that the cache is shared among the cores, it is hard to 
> tell from single runs what it will be like with concurrent jobs. If it is 
> simulated and I can just set the cache size to whatever I want, I can run a 
> job with different cache parameters and find at least a first order guess of 
> how the job responds to cache size. This is the most useful thing for me 
> right now.

That's true. If you want to run 6 separate processes on your 6-core CPU,
and you have 25MB of L3, you should check how one process performs with
25MB / 6 of L3, i.e. around 4MB, as the processes do not share an
address space.

However, if you have multithreaded code: cachegrind currently does not
simulate a shared L3 for multithreaded code, but assumes the full
hierarchy is private to each thread.

> It's a bit daunting when someone says I want millions of simulations running 
> as fast as possible. That is a good hard challenge. I've done work on long 
> lasting compute bound jobs, but this has a bunch more moving parts than just 
> the sparse matrix solver and forward error tests.

At least it sounds embarrassingly parallel. Not too bad.

Josef

> 
> jerry
> 
> ----- Original Message -----
> | From: "Tom Hughes" <t...@compton.nu>
> | To: "Jerry Scharf" <jsch...@finsix.com>, Valgrind-users@lists.sourceforge.net
> | Sent: Thursday, July 3, 2014 12:32:59 AM
> | Subject: Re: cachegrind for Xeon e5-2643v2
> | 
> | On 03/07/14 07:06, Jerry Scharf wrote:
> | 
> | > Why is it insisting that the numbers be powers of 2? If I use --LL
> | > with the real numbers it refuses to start. If I round these down
> | > to the next lower power of 2 (16M and 16 associations) it doesn't
> | > complain but it still doesn't run.
> | 
> | The internal design of the data structures used by cachegrind assumes
> | that sizes will be powers of two and that it can compute indexes by
> | bit masking.
> | 
> | That is, as you have found, no longer true for modern processors, but
> | to date nobody has stepped up to fix cachegrind. On top of that, it
> | isn't clear that there is a good fix that wouldn't produce a loss of
> | performance.
> | 
> | Tom
> | 
> | --
> | Tom Hughes (t...@compton.nu)
> | http://compton.nu/
> | 
> 
> _______________________________________________
> Valgrind-users mailing list
> Valgrind-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
> 
