I am using Cachegrind to study the cache miss behavior of our program. I need to collect the last level cache misses in bytes. According to the online manual http://valgrind.org/docs/manual/cg-manual.html (related parts are copied at the end for reference), Cachegrind outputs the number of misses.
To my understanding, the amount in bytes can be calculated as follows:
Misses in bytes = (the number of misses) * (line size of the cache)
where the line size can be configured with an option such as
--LL=<size>,<associativity>,<line size>
Is my understanding correct? If not,
could you tell me how to calculate the misses in bytes?
That product is the total bytes of traffic that are caused by misses.
But it ignores the width of the bus, which determines the duration
of the transfers. Most desktop computers have a 64-bit data bus
(72 bits if ECC) to DDR3 or DDR4 SDRAM. Some embedded devices
have a 32-bit bus (or even narrower). Desktop graphics cards
usually have a memory bus of 32, 64, 128, 192, 256, or 384 bits
[and no cache :-)]. The bus width to L1 and L2 caches can be wider.
It's 128 or 256 bits on PowerPC chips, for instance.
(Yes: the icache fetches 4 or 8 32-bit instructions at a time,
and all can be decoded and executed in parallel except for
dataflow constraints. Aligning branch destinations to 32-byte
boundaries might make a big difference in execution speed.)
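
As a rough illustration of the arithmetic discussed above, here is a minimal Python sketch. It assumes the miss count is read by hand from Cachegrind's output and that the line size is the third field of the --LL option. The function names, the sample miss count, and the sample cache geometry are all made up for illustration; none of this is part of Cachegrind. The second helper only counts how many bus transfers the traffic would need at a given data-bus width, which is a crude stand-in for transfer duration.

```python
def parse_ll_option(option: str) -> tuple[int, int, int]:
    """Split '--LL=<size>,<associativity>,<line size>' into three integers."""
    name, _, value = option.partition("=")
    if name != "--LL":
        raise ValueError(f"not an --LL option: {option!r}")
    size, associativity, line_size = (int(field) for field in value.split(","))
    return size, associativity, line_size


def miss_traffic_bytes(misses: int, line_size: int) -> int:
    """Bytes moved because of misses: one full cache line per miss."""
    return misses * line_size


def bus_transfers(traffic_bytes: int, bus_width_bits: int) -> int:
    """Number of bus transfers needed to move the traffic (rounded up)."""
    bytes_per_transfer = bus_width_bits // 8
    return -(-traffic_bytes // bytes_per_transfer)  # ceiling division


if __name__ == "__main__":
    # Hypothetical values: an 8 MiB, 16-way LL cache with 64-byte lines,
    # and 1000 last-level misses taken from a Cachegrind summary.
    _, _, line_size = parse_ll_option("--LL=8388608,16,64")
    traffic = miss_traffic_bytes(1000, line_size)
    print("miss traffic in bytes:", traffic)
    print("transfers on a 64-bit bus:", bus_transfers(traffic, 64))
    print("transfers on a 32-bit bus:", bus_transfers(traffic, 32))
```

With the same miss count, halving the bus width doubles the number of transfers, which is why the byte total alone does not tell you how long the misses take.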
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users