Hi Nilay,
I apologize for the delay, but I was mostly travelling / in meetings last week
and I didn't have a chance to review your patches and emails until this morning.
Overall, your patches are definitely solid steps in the right direction and
your profiling data sounds very promising. If
On Mon, Dec 20, 2010 at 8:21 AM, Beckmann, Brad brad.beckm...@amd.com wrote:
Hi Nilay,
I apologize for the delay, but I was mostly travelling / in meetings last
week and I didn't have a chance to review your patches and emails until this
morning.
Overall, your patches are definitely solid
These profile results are from testing ALPHA_FS_MESI_CMP_directory with
configs/example/ruby_fs.py. The simulation was allowed to run for
200,000,000,000 ticks.
Profile Result with unmodified SLICC
% cumulative self self total
time seconds seconds calls s/call
Nice work! No need to send the full profile, but what is the net speedup
here? It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.
Also, what workload are you running on top? With all the time spent in
PerfectSwitch I'm guessing there's a lot of
I am running m5.prof multiple times to get an idea of average performance.
I will get back to you later today with the numbers.
Thanks
Nilay
On Mon, 20 Dec 2010, Steve Reinhardt wrote:
Nice work! No need to send the full profile, but what is the net speedup
here? It seems like we should
Brad
I have tested the changes that I made to the files relating to SLICC and
the MESI_CMP_directory protocol. I see a 90% decrease in the number of calls
to isTagPresent() when I run m5.prof for 200,000,000,000 ticks using
configs/example/ruby_fs.py.
Thanks
Nilay
On Fri, 17 Dec 2010, Nilay
Brad
We would need to change the lookup functions for TBETable and CacheMemory.
Currently the lookup functions assume that the address passed on to the
lookup is present. This requires two lookups to the data structures
associated with these classes, one for checking whether the address is in
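A minimal sketch of the two lookup styles being contrasted here, assuming a plain hash table keyed by address. TagTable, Entry, lookupAssumingPresent and the probes counter are hypothetical names for illustration, not the real TBETable or CacheMemory interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins; these are not the real Ruby classes.
using Address = std::uint64_t;
struct Entry { int state = 0; };

class TagTable {
  public:
    void allocate(Address a) { m_map[a] = Entry{}; }

    // Two-step style: the caller probes once to test presence...
    bool isTagPresent(Address a) { ++probes; return m_map.count(a) != 0; }
    // ...and probes again to fetch; at() throws if the address is absent.
    Entry& lookupAssumingPresent(Address a) { ++probes; return m_map.at(a); }

    // Single-step style: one probe, nullptr signals "not present".
    Entry* lookup(Address a) {
        ++probes;
        auto it = m_map.find(a);
        return it == m_map.end() ? nullptr : &it->second;
    }

    int probes = 0;  // counts hash-table probes, for illustration only

  private:
    std::unordered_map<Address, Entry> m_map;
};
```

With the pointer-returning form, a caller that needs both the presence check and the entry pays for one probe instead of two.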
to the
end or removed entirely.
Brad
-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of
Nilay Vaish
Sent: Wednesday, December 08, 2010 11:53 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Hi Brad,
A couple
...@m5sim.org] On
Behalf Of Nilay Vaish
Sent: Thursday, December 09, 2010 5:24 PM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Hi Brad
Is there a way to access the StateMachine object inside any of the AST
class functions? I know the name of the machine can be accessed
It works perfectly. Thanks!
Nilay
On Thu, 9 Dec 2010, Beckmann, Brad wrote:
Hi Nilay,
Yes, I believe a machine can be accessed within AST class functions, though I
don't remember ever doing it myself. Look at the generate() function in
TypeFieldEnumAST. Here you see that the machine
: [m5-dev] Implementation of findTagInSet
Brad,
Let's try to break the required changes into small portions. Given my feeble
knowledge of Ruby, it would be hard for me to visualize what change is going to
have what effect.
One question, should we use pointers to pass the cache entry around, or should
...@m5sim.org] On Behalf
Of Nilay Vaish
Sent: Tuesday, December 07, 2010 5:21 PM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Brad,
Let's try to break the required changes into small portions. Given my
feeble knowledge of Ruby, it would be hard for me to visualize
Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Brad,
Let's try to break the required changes into small portions. Given my feeble
knowledge of Ruby, it would be hard for me to visualize what change is going to
have what effect.
One question, should we use pointers to pass
-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of
Nilay Vaish
Sent: Wednesday, December 08, 2010 11:53 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Hi Brad,
A couple of observations
a. If we make use of pointers, would
I have made changes to SLICC to support local reference variables. I think
we should use reference variables in functions where back-to-back calls are
made to the lookup/getCacheEntry functions.
Overall, I am still unclear how we can handle this issue.
Nilay
On Tue, 30 Nov 2010, Nilay Vaish wrote:
...@m5sim.org] On Behalf Of
Nilay Vaish
Sent: Tuesday, December 07, 2010 12:16 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
I have made changes to SLICC to support local reference variables. I think we
should use reference variables in functions where back-to-back calls
this over a phone conversation.
Brad
-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of
Nilay Vaish
Sent: Tuesday, December 07, 2010 12:16 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
I have made changes to SLICC
This is what I have thought of. Currently, the doTransition() function takes in
the cache state of the address that is being supplied. This function
further calls the setState() function, one of the functions that repeatedly
calls isTagPresent(). Instead, if we pass the cache state and the cache entry
, November 27, 2010 11:40 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Is it not possible to redesign the functions to accept a CacheEntry as a
parameter instead of an Address parameter?
___
m5-dev mailing list
m5-dev
-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of
Nilay Vaish
Sent: Saturday, November 27, 2010 11:40 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet
Is it not possible to redesign the functions to accept a CacheEntry as a
parameter instead of an Address
I conducted an experiment to figure out how many calls are made to the
hash table to check if the given address exists in the cache. For the same
setup as before, fewer than 10% of the calls are made. That is, out of about
880,000,000 calls to the isTagPresent function, only about 81,000,000
actually
Is it not possible to redesign the functions to accept a CacheEntry as a
parameter instead of an Address parameter?
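A rough sketch of what entry-based signatures could look like, assuming the caller performs one lookup and hands the resulting pointer to every helper. The State enum, the CacheEntry layout and upgradeOnWrite() are hypothetical and only illustrate the idea; they are not the SLICC-generated code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
enum class State { Invalid, Shared, Modified };
struct CacheEntry { State state = State::Invalid; };

// Entry-based helpers: no address, so no tag lookup inside.
// A null pointer means the block is not in the cache.
State getState(const CacheEntry* entry) {
    return entry ? entry->state : State::Invalid;
}

void setState(CacheEntry* entry, State next) {
    if (entry) entry->state = next;
}

// A transition that reads and then writes the state using the same
// pointer, without consulting the tag array again.
State upgradeOnWrite(CacheEntry* entry) {
    if (getState(entry) != State::Invalid) setState(entry, State::Modified);
    return getState(entry);
}
```

The trade-off is that the pointer must stay valid for the duration of the transition, so any code that deallocates the entry has to stop using it afterwards.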
On Sat, 27 Nov 2010, Nilay Vaish wrote:
I conducted an experiment to figure out how many calls are made to the hash
table to check if the given address exists in the cache. For
I profiled the un-modified and the modified m5 ten times (this time there
was no load on the machine). Here are the average results:
               % time   std. dev   actual time   std. dev
un-modified
isTagPresent    19.99     0.35        47.17        1.23
cumulative     100        0.00
Hi Nilay,
Good job, this is clearly progress... you've sped up isTagPresent by 2X and
the simulation overall by almost 10%. That's nothing to sneeze at. It's
sad that isTagPresent is still the top function though. Can you do some
tracing or other experiments to get a feel for whether keeping
Brad and I had a discussion on Tuesday. We are still thinking about how to
resolve this issue.
As a stopgap measure, I added a couple of variables to the
CacheMemory class which track the last address for which the lookup was
performed. I am posting the results from profiling before and after
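The stopgap described above can be sketched roughly as follows, assuming the tags live in a hash set and that the cached result is invalidated whenever the contents change. MemoizedTags and its members are hypothetical names, not the actual CacheMemory fields.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

using Address = std::uint64_t;

// Hypothetical wrapper that remembers the most recent lookup.
class MemoizedTags {
  public:
    void insert(Address a) { m_tags.insert(a); m_valid = false; }
    void erase(Address a) { m_tags.erase(a); m_valid = false; }

    bool isTagPresent(Address a) {
        // Fast path: same address as the previous call, reuse the answer.
        if (m_valid && a == m_lastAddr) return m_lastResult;
        // Slow path: probe the hash table and remember the outcome.
        ++hashProbes;
        m_lastAddr = a;
        m_lastResult = m_tags.count(a) != 0;
        m_valid = true;
        return m_lastResult;
    }

    int hashProbes = 0;  // for illustration: how often the table was probed

  private:
    std::unordered_set<Address> m_tags;
    Address m_lastAddr = 0;
    bool m_lastResult = false;
    bool m_valid = false;  // false until the first lookup or after a change
};
```

This only helps when repeated calls target the same address back to back, which is the pattern the profiling in this thread points to.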
Thanks for tracking that down; that confirms my suspicions.
I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.
Brad and I will be having a discussion today on how to resolve this issue.
--
Nilay
On Tue, 23 Nov 2010, Steve Reinhardt wrote:
Thanks for tracking that down; that confirms my suspicions.
I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag
I'm not the guy to ask for that... but actually I doubt the protocol itself
matters that much, you just need to look at the code path that gets
exercised on an L1 cache hit and see where the calls are. That part should
be almost if not entirely independent of the coherence protocol.
Steve
On
On Fri, 12 Nov 2010, Steve Reinhardt wrote:
Right now I am profiling with coherence protocol as MOESI_hammer. I am
thinking of profiling using a different protocol to make sure that it is not
an artifact of the protocol in use.
That sounds like a good idea.
All in all, we would ideally
I was looking at the MOESI hammer protocol. I think Steve's observation
that extra tag lookups are going on in the cache is correct. In particular
I noticed that in the getState() and setState() functions, first
isTagPresent(address) is called and on the basis of the result (which is
true or
I tried a couple of ideas for improving the performance of the
findTagInSet() function. These include having one hash table per cache set
and replacing the hash table with a two-dimensional array indexed by
cache set and cache way. Neither of these ideas showed significant enough
change in the
I went through the implementation of hash_map in C++. I realized that the
number of buckets gets resized on the fly as the number of elements
increases. This means that we would have more than num_of_cache_sets *
num_ways buckets in the hash table.
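The growth behaviour can be observed with a small experiment. This sketch uses std::unordered_map as a stand-in for hash_map (an assumption; the two containers are not the same implementation), and bucketsAfterInserting() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Inserts n distinct keys into an initially empty table and reports
// how many buckets the table ended up with.
std::size_t bucketsAfterInserting(std::size_t n) {
    std::unordered_map<std::uint64_t, int> table;
    for (std::size_t i = 0; i < n; ++i) {
        table[i] = 0;  // may trigger a rehash to a larger bucket array
    }
    return table.bucket_count();
}
```

With the default maximum load factor of 1.0, std::unordered_map keeps at least as many buckets as elements, so a table holding one entry per cache line ends up with at least num_of_cache_sets * num_ways buckets.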
--
Nilay
On Sat, 6 Nov 2010, Nilay Vaish
I ran ALPHA_FS_MOESI_hammer using the following command --
./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py
I don't know how the benchmark is picked in case none is specified. Below
is the gprof output --
% cumulative self self total
time seconds
You can look at the call graph profile further down in the gprof output to
figure out how much time is spent in functions that get called from
isTagPresent. If it's not specifically calling out findTagInSet, it may be
because it's inlined in isTagPresent.
Steve
On Fri, Nov 5, 2010 at 7:58 AM,
Do you know what hash function is in use? It seems to me that the default
hash function is to hash to self. Maybe we should test with a different
hash function.
--
Nilay
On Fri, 5 Nov 2010, Steve Reinhardt wrote:
You can look at the call graph profile further down in the gprof output to
I had another look at the profile output. On the machine that I am using
(a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns.
Assuming that the pipeline is functioning at its best, I think the number
of uops executed would be ~500. Is that too much for this function?
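The arithmetic behind that estimate can be written out as below. The clock rate and per-call time come from the message; the peak rate of 3 uops per cycle is an assumption about the processor, and the function names are hypothetical.

```cpp
#include <cassert>

constexpr double kClockGHz = 3.2;          // from the message: 3.2 GHz
constexpr double kCallNs = 57.0;           // from the message: ~57 ns per call
constexpr double kPeakUopsPerCycle = 3.0;  // ASSUMPTION: best-case uop throughput

// ns * (cycles per ns) = cycles spent in one call.
constexpr double cyclesPerCall() { return kCallNs * kClockGHz; }

// Upper bound on uops per call if the pipeline never stalls.
constexpr double peakUopsPerCall() {
    return cyclesPerCall() * kPeakUopsPerCycle;
}
```

Under these assumptions each call costs about 182 cycles, and the best-case uop count comes out in the neighbourhood of the ~500 quoted above. Cache misses and stalls would lower the real number of uops executed in the same time.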
--
Nilay
If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-). Doing both is even better.
At a high level, the process of looking something up in an N-way associative
cache should not take that many instructions if N is small (a shift and
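For comparison, a textbook N-way set-associative lookup is sketched below. It is not the Ruby implementation: the block size, set count, way count and all names are assumptions picked for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kBlockBits = 6;  // ASSUMPTION: 64-byte blocks
constexpr int kSetBits = 7;    // ASSUMPTION: 128 sets
constexpr int kWays = 4;       // ASSUMPTION: 4-way associative

struct Line {
    bool valid = false;
    std::uint64_t tag = 0;
};
using Cache = std::array<std::array<Line, kWays>, (1u << kSetBits)>;

// Set index: drop the block offset, keep the low kSetBits bits.
constexpr std::uint64_t setIndex(std::uint64_t addr) {
    return (addr >> kBlockBits) & ((1u << kSetBits) - 1);
}

// Tag: everything above the set index.
constexpr std::uint64_t tagOf(std::uint64_t addr) {
    return addr >> (kBlockBits + kSetBits);
}

// Returns the matching way, or -1 on a miss. At most kWays comparisons.
int findWay(const Cache& cache, std::uint64_t addr) {
    const auto& set = cache[setIndex(addr)];
    for (int way = 0; way < kWays; ++way) {
        if (set[way].valid && set[way].tag == tagOf(addr)) return way;
    }
    return -1;
}
```

The work per lookup is one shift, one mask and a handful of compares, which is the kind of instruction count a small-N lookup would be expected to need.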
I tried running ruby_fs.py; below is the error message that I received. I
don't think there is any documentation or mailing list discussion on how
to run ruby_fs.py. To me it seems that some parameter relating to the DMA
controller is missing from the command I tried out.
--
Nilay
You also have to build a binary that supports ruby, like
ALPHA_FS_MOESI_hammer. If you can't get that to work, try
ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the
workload you run doesn't really matter that much as long as it's long enough
to get a meaningful profile.
I profiled M5 but surprisingly I did not find any mention of the function
findTagInSet() in the output obtained from gprof. Does it matter what
coherence protocol is in use? I carried out the following step -
1. Compiled m5.prof using
scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof
What was the gprof output?
On Wed, Nov 3, 2010 at 4:45 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
I profiled M5 but surprisingly I did not find any mention of the function
findTagInSet() in the output obtained from gprof. Does it matter what
coherence protocol is in use? I carried out the
I ran m5.prof two times. Here are the top five functions --
% cumulative self self total
time seconds seconds calls s/call s/call name
8.35 5.05 5.05 58969209 0.00 0.00
BaseSimpleCPU::preExecute()
6.32 8.87 3.82 58975463 0.00
Ah, the issue is that you're using the old M5 memory hierarchy and not
Ruby. You need to run one of the Ruby versions, and use ruby_fs.py instead
of fs.py.
Steve
On Wed, Nov 3, 2010 at 9:00 PM, Nilay ni...@cs.wisc.edu wrote:
I ran m5.prof two times. Here are the top five functions --
%
I just compiled m5.prof and ran it (forgot what workload I ran on it,
probably one of the parsec benchmarks; it probably doesn't matter a lot).
If you've never used gprof before, this is a great time to learn!
Steve
On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:
I am