On 03.07.2014 19:22, Jerry Scharf wrote:
> Tom and Josef,
>
> Thank you for your speedy responses.
>
> Is cachegrind an instrumentation of the real system or a simulation of the
> processor cache based on the code executed? From your responses, it sounds
> like the latter.

Yes, it is a simulation. For the former, use perf/PAPI/oprofile/VTune/...,
which use the performance counters of your processor.

> If so, this is actually better for what I want to do.
>
> Because of the fact that the cache is shared among the cores, it is hard to
> tell from single runs what it will be like with concurrent jobs. If it is
> simulated and I can just set the cache size to whatever I want, I can run a
> job with different cache parameters and find at least a first order guess of
> how the job responds to cache size. This is the most useful thing for me
> right now.

That's true. If you want to run 6 separate processes on your 6-core machine
and you have 25 MB of L3, you should check how one process behaves with
25 MB / 6 of L3, i.e. around 4 MB, since processes do not share an address
space.

However, if you have multithreaded code: cachegrind currently does not
simulate a shared L3 for multithreaded code, but treats the full hierarchy
as private to each thread.

> It's a bit daunting when someone says I want millions of simulations running
> as fast as possible. That is a good hard challenge. I've done work on long
> lasting compute bound jobs, but this has a bunch more moving parts than just
> the sparse matrix solver and forward error tests.

At least it sounds embarrassingly parallel. Not too bad.

Josef

>
> jerry
>
> ----- Original Message -----
> | From: "Tom Hughes" <t...@compton.nu>
> | To: "Jerry Scharf" <jsch...@finsix.com>,
> Valgrind-users@lists.sourceforge.net
> | Sent: Thursday, July 3, 2014 12:32:59 AM
> | Subject: Re: cachegrind for Xeon e5-2643v2
> |
> | On 03/07/14 07:06, Jerry Scharf wrote:
> |
> | > Why is it insisting that the numbers be powers of 2? If I use --LL
> | > with the real numbers it refuses to start. If I round these down
> | > to the next lower power of 2 (16M and 16 associations) it doesn't
> | > complain but it still doesn't run.
> |
> | The internal design of the data structures used by cachegrind assumes
> | that sizes will be powers of two and that it can compute indexes by
> | bit masking.
> |
> | That is, as you have found, no longer true for modern processors, but
> | to date nobody has stepped up to fix cachegrind. On top of that it isn't
> | clear that there is a good fix that wouldn't produce a loss of
> | performance.
> |
> | Tom
> |
> | --
> | Tom Hughes (t...@compton.nu)
> | http://compton.nu/
> |

_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
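
The arithmetic discussed in this thread can be sketched as follows. This is only an illustration, not cachegrind's actual code: the helper names are made up, the 64-byte line size is an assumption, and the associativity of 16 is the rounded-down value Jerry mentions. It computes the per-process share of a 25 MB L3 for 6 processes, rounds it down to a power of two, formats it in the `--LL=<size>,<associativity>,<line size>` form, and shows why a power-of-two set count lets the set index be taken with a bit mask instead of a modulo.

```python
def round_down_pow2(n: int) -> int:
    """Largest power of two that is <= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n.bit_length() - 1)


def per_process_ll_size(total_bytes: int, processes: int) -> int:
    """Equal share of the last-level cache, rounded down to a power of two."""
    return round_down_pow2(total_bytes // processes)


def ll_option(size: int, assoc: int, line: int) -> str:
    """Format a --LL=<size>,<associativity>,<line size> argument."""
    return f"--LL={size},{assoc},{line}"


def set_index(addr: int, size: int, assoc: int, line: int) -> int:
    """Set index via bit masking; only valid if the set count is a power of two."""
    sets = size // (assoc * line)
    if sets & (sets - 1) != 0:
        raise ValueError("set count is not a power of two; masking would be wrong")
    return (addr // line) & (sets - 1)


if __name__ == "__main__":
    total_l3 = 25 * 1024 * 1024   # 25 MB L3, as stated in the thread
    assoc, line = 16, 64          # assoc from the thread; line size is an assumption
    share = per_process_ll_size(total_l3, 6)
    print(ll_option(share, assoc, line))
    print(set_index(0x12345678, share, assoc, line))
```

With these inputs the share comes out as 4194304 bytes (4 MB), matching the "around 4MB" estimate above, and the formatted option is `--LL=4194304,16,64`. Feeding the unrounded 25 MB / 6 share into `set_index` raises an error, which is the masking assumption Tom describes.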