On Mon, 24 Jan 2011, Nilay Vaish wrote:
On Mon, 24 Jan 2011, Steve Reinhardt wrote:
Yes, that's right. So there's probably no big win in trying to further
reduce the number of calls to lookup() in Ruby; the possibilities I see for
improvement are:
1. Adding an instruction buffer to SimpleCPU
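A rough sketch of that first item (all names here -- FetchBuffer, fill(), invalidate() -- are hypothetical, not actual gem5 code): the CPU keeps the bytes of the last line it fetched and services instruction fetches from that buffer, only sending a real request to the cache on a miss. SMC is the hazard, so the buffer has to be invalidated whenever the line may have been written:

#include <cstdint>
#include <cstring>
#include <vector>

class FetchBuffer
{
  public:
    // lineBytes must be a power of two.
    explicit FetchBuffer(unsigned lineBytes)
        : m_lineBytes(lineBytes), m_valid(false), m_lineAddr(0),
          m_data(lineBytes) {}

    // Returns true and copies the bytes if the request falls entirely
    // within the buffered line; otherwise the caller sends a real fetch.
    bool fetch(uint64_t addr, unsigned size, uint8_t *dest)
    {
        uint64_t lineAddr = addr & ~uint64_t(m_lineBytes - 1);
        if (!m_valid || lineAddr != m_lineAddr ||
            addr + size > lineAddr + m_lineBytes)
            return false;
        std::memcpy(dest, m_data.data() + (addr - lineAddr), size);
        return true;
    }

    // Called when a fetch response brings back a whole line.
    void fill(uint64_t lineAddr, const uint8_t *line)
    {
        std::memcpy(m_data.data(), line, m_lineBytes);
        m_lineAddr = lineAddr;
        m_valid = true;
    }

    // SMC hook: call whenever the buffered line may have been modified.
    void invalidate() { m_valid = false; }

  private:
    const unsigned m_lineBytes;
    bool m_valid;
    uint64_t m_lineAddr;
    std::vector<uint8_t> m_data;
};

Every fetch that hits the buffer avoids a sendTiming()/recvTiming() round trip into Ruby entirely, which is where the 140 million sendFetch() calls mentioned later in this thread would be cut down.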
On Thu, Jan 27, 2011 at 4:36 AM, Nilay Vaish ni...@cs.wisc.edu wrote:
I tried caching the index for the MRU block, so that the hash table need
not be looked up. It is hard to tell whether there is a speedup or not.
When I run m5.prof, the profile results show that the time taken by
CacheMemory::lookup()
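For concreteness, the MRU-index idea can be sketched roughly like this (the names are made up; the real CacheMemory interface differs). The last block that hit is checked before falling back to the hash table:

#include <cstdint>
#include <unordered_map>

struct CacheEntry { uint64_t tag; /* state, data, ... */ };

class CacheMemorySketch
{
  public:
    CacheEntry *lookup(uint64_t addr)
    {
        // Fast path: the last block that hit.
        if (m_mruValid && m_mruAddr == addr)
            return m_mruEntry;

        // Slow path: the existing hash-table lookup.
        auto it = m_map.find(addr);
        if (it == m_map.end())
            return nullptr;

        // Pointers into an unordered_map stay valid across rehashing,
        // but m_mruValid must be cleared if this block is deallocated.
        m_mruValid = true;
        m_mruAddr = addr;
        m_mruEntry = &it->second;
        return m_mruEntry;
    }

  private:
    std::unordered_map<uint64_t, CacheEntry> m_map;
    bool m_mruValid = false;
    uint64_t m_mruAddr = 0;
    CacheEntry *m_mruEntry = nullptr;
};

Note this only pays off when consecutive lookups repeatedly touch the same block; otherwise every call still lands on the hash table, which matches the inconclusive speedup reported above.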
From Steve's response, it looks like I'm jumping into the conversation on
the wrong page. To be clear, Nilay, were you optimizing the lookup()
calls themselves or trying to reduce the number of times lookup() gets
called? My MRU comments and keeping things in the SimpleCPU were directed
toward the latter.
On Thu, Jan 27, 2011 at 4:01 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
I also have an implementation that performs a linear search of the cache
sets instead of a hash table lookup. Again, I saw a small improvement.
But as you mentioned, when the associativity goes up, linear search will perform
On Mon, 24 Jan 2011, Steve Reinhardt wrote:
Steve, we can try caching the MRU cache block. We can also try replacing
the hash table with a two-dimensional array indexed by cache set and
cache way. This should at least show some decent speedup (depending on
the SMC code).
The O3 caches the MRU and, ironically, I had just patched the InOrder model
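A rough sketch of that set/way organization (again, hypothetical names): address bits pick the set directly, the ways are scanned linearly for a matching tag, and a per-set MRU way is checked first so repeated hits skip the scan:

#include <cstdint>
#include <vector>

struct Block { bool valid = false; uint64_t tag = 0; };

class SetWayCache
{
  public:
    SetWayCache(unsigned numSets, unsigned assoc, unsigned blkBytes)
        : m_numSets(numSets), m_blkBytes(blkBytes),
          m_blocks(numSets, std::vector<Block>(assoc)),
          m_mruWay(numSets, 0) {}

    Block *lookup(uint64_t addr)
    {
        uint64_t blkAddr = addr / m_blkBytes;
        unsigned set = blkAddr % m_numSets;   // set-index bits
        uint64_t tag = blkAddr / m_numSets;   // remaining tag bits

        // Check the MRU way first; a hit here skips the scan entirely.
        Block &mru = m_blocks[set][m_mruWay[set]];
        if (mru.valid && mru.tag == tag)
            return &mru;

        // Linear scan over the ways; cost grows with associativity.
        for (unsigned way = 0; way < m_blocks[set].size(); ++way) {
            Block &b = m_blocks[set][way];
            if (b.valid && b.tag == tag) {
                m_mruWay[set] = way;
                return &b;
            }
        }
        return nullptr;  // miss
    }

  private:
    const unsigned m_numSets, m_blkBytes;
    std::vector<std::vector<Block>> m_blocks;
    std::vector<unsigned> m_mruWay;
};

Worst case this still touches every way in the set, which is why the linear search mentioned above stops winning over the hash table as associativity grows.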
Quoting Steve Reinhardt ste...@gmail.com:
Gabe, how many bytes at a time does the x86 predecoder fetch? If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?
All of the predecoders expect to receive one
I dug more into the code today. There are three paths along which calls
are made to RubyPort::M5Port::recvTiming(), which eventually result
in calls to CacheMemory::lookup().
1. TimingSimpleCPU::sendFetch() - 140 million
2. TimingSimpleCPU::handleReadPacket() - 30 million
3.
On Sun, 23 Jan 2011, Korey Sewell wrote:
In sendFetch(), it calls sendTiming(), which would then call recvTiming()
on the cache port, since those two should be bound as peers.
I'm a little unsure of how the RubyPort, Sequencer, CacheMemory, and
CacheController (?) relationship is working
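The peer relationship described there amounts to something like this simplified sketch; the real m5 Port class carries much more machinery, but the core is that sendTiming() on one port just invokes recvTiming() on the port it was bound to:

class Packet;

class Port
{
  public:
    // Ports are bound pairwise; each side records the other as its peer.
    void setPeer(Port *peer) { m_peer = peer; }

    // The CPU side calls sendTiming(); the bound cache port's
    // recvTiming() runs as a direct consequence. (A real implementation
    // would check that m_peer is non-null.)
    bool sendTiming(Packet *pkt) { return m_peer->recvTiming(pkt); }

    virtual bool recvTiming(Packet *pkt) = 0;
    virtual ~Port() = default;

  protected:
    Port *m_peer = nullptr;
};

So every TimingSimpleCPU fetch or data access that crosses its port lands in RubyPort::M5Port::recvTiming() on the other side, which is how the call counts above map onto lookup() activity.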
I profiled m5 again, using the following command.
./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
--maxtick 2000 -n 8 --topology Mesh --mesh-rows 2 --num-l2cache 8
--num-dir 8
Results have been copied below. CacheMemory::lookup() still consumes some
time but is
What's the deal with Histogram::add()? Either it's too slow or it's being
called too much, I'd say, unless we're tracking some incredibly vital
statistics there. Can you use the call graph part of the profile to find
where most of the calls are coming from?
Also, can you look at the stats and
Don't spend too much time fixing
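On getting the callers: since m5.prof is built for profiling, a run leaves a gmon.out in the working directory, and the call-graph-only view can be printed with, for example:

gprof -q ./build/ALPHA_FS_MOESI_hammer/m5.prof gmon.out

The call-graph section lists, for each function such as Histogram::add(), which callers account for how many of its calls, which is exactly what's being asked for here.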
Some more data from the same simulation, this time from m5out/stats.txt.
The number of memory references is slightly more than 30,000,000, whereas
the number of lookups in the cache is about 256,000,000. So that would be
a ratio of about 1 : 8.5. I suspect that the reason for this might be that
every memory reference results in a lookup in all 8 caches.
On Wed, Jan 19, 2011 at 3:56 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
Do you mean that we do a lookup in all 8 caches for each reference? Is
this part of an assertion that's checking coherence invariants? It seems
like that would be something we'd want to do only in debug mode (or maybe
only when explicitly enabled).
Seems like something that we could
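For illustration, the kind of gating being suggested might look like this; the helper name is invented, not from the Ruby sources, and is stubbed so the sketch stands alone:

#include <cassert>
#include <cstdint>

// Hypothetical helper standing in for a per-cache tag lookup.
static bool isBlockExclusive(int cacheId, uint64_t addr)
{
    (void)cacheId; (void)addr;
    return false;  // stub for the sketch
}

// The expensive all-caches check compiles away in release builds; a
// runtime flag would be the alternative if it must stay available.
void checkCoherenceInvariant(uint64_t addr, int numCaches)
{
#ifndef NDEBUG
    int exclusiveCopies = 0;
    for (int c = 0; c < numCaches; ++c)
        if (isBlockExclusive(c, addr))
            ++exclusiveCopies;
    // No two caches may hold the same block in an exclusive/owned state.
    assert(exclusiveCopies <= 1);
#else
    (void)addr;
    (void)numCaches;
#endif
}

Gating the per-reference walk over all 8 caches this way would bring the lookup count back toward one per memory reference in optimized builds, which is what the 1 : 8.5 ratio above suggests is being lost.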