Re: [m5-dev] Implementation of findTagInSet

2010-12-20 Thread Beckmann, Brad
Hi Nilay, I apologize for the delay, but I was mostly travelling / in meetings last week and I didn't have a chance to review your patches and emails until this morning. Overall, your patches are definitely solid steps in the right direction and your profiling data sounds very promising. If

Re: [m5-dev] Implementation of findTagInSet

2010-12-20 Thread Steve Reinhardt
On Mon, Dec 20, 2010 at 8:21 AM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Nilay, I apologize for the delay, but I was mostly travelling / in meetings last week and I didn't have a chance to review your patches and emails until this morning. Overall, your patches are definitely solid

Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Nilay Vaish
These profile results are from testing ALPHA_FS_MESI_CMP_directory with configs/example/ruby_fs.py. The simulation was allowed to run for 200,000,000,000 ticks. Profile Result with unmodified SLICC % cumulative self self total time seconds seconds calls s/call

Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Steve Reinhardt
Nice work! No need to send the full profile, but what is the net speedup here? It seems like we should have eliminated about 10% of the runtime, but I wanted to verify that. Also, what workload are you running on top? With all the time spent in PerfectSwitch I'm guessing there's a lot of

Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Nilay Vaish
I am running m5.prof multiple times to get an idea of average performance. I will get back to you later today with the numbers. Thanks Nilay On Mon, 20 Dec 2010, Steve Reinhardt wrote: Nice work! No need to send the full profile, but what is the net speedup here? It seems like we should

Re: [m5-dev] Implementation of findTagInSet

2010-12-19 Thread Nilay Vaish
Brad, I have tested the changes that I made to files relating to SLICC and the MESI_CMP_directory protocol. I see a 90% decrease in the number of calls to isTagPresent() when I run m5.prof for 200,000,000,000 ticks using configs/example/ruby_fs.py. Thanks Nilay On Fri, 17 Dec 2010, Nilay

Re: [m5-dev] Implementation of findTagInSet

2010-12-11 Thread Nilay Vaish
Brad, we would need to change the lookup functions for TBETable and CacheMemory. Currently the lookup functions assume that the address passed to the lookup is present. This requires two lookups to the data structures associated with these classes, one for checking whether the address is in
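
A minimal C++ sketch of the double-lookup problem described in this message and one way to fold the presence check into the lookup. `TagStore`, `Entry`, `lookupPtr()` and the two `getState...` helpers are hypothetical names for illustration; they are not the actual Ruby `CacheMemory` or `TBETable` interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins, not the real Ruby types.
using Address = std::uint64_t;

struct Entry {
    int state = 0;
};

class TagStore {
  public:
    // Existing style: the caller checks presence first, then looks up again.
    bool isTagPresent(Address addr) const { return m_map.count(addr) != 0; }

    Entry& lookup(Address addr) {
        auto it = m_map.find(addr);
        assert(it != m_map.end());  // precondition: the address is present
        return it->second;
    }

    // Combined style: a single probe; nullptr means "not present".
    Entry* lookupPtr(Address addr) {
        auto it = m_map.find(addr);
        return it == m_map.end() ? nullptr : &it->second;
    }

    void allocate(Address addr) { m_map.emplace(addr, Entry{}); }

  private:
    std::unordered_map<Address, Entry> m_map;
};

// Two hash probes on every hit (returns -1 when the address is absent).
inline int getStateTwoLookups(TagStore& c, Address a) {
    return c.isTagPresent(a) ? c.lookup(a).state : -1;
}

// One hash probe per access, same result.
inline int getStateOneLookup(TagStore& c, Address a) {
    Entry* e = c.lookupPtr(a);
    return e ? e->state : -1;
}
```

Both helpers return the same value; the second simply avoids probing the table twice for an address that is present.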

Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Nilay Vaish
to the end or removed entirely. Brad -Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish Sent: Wednesday, December 08, 2010 11:53 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Hi Brad, A couple

Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Beckmann, Brad
...@m5sim.org] On Behalf Of Nilay Vaish Sent: Thursday, December 09, 2010 5:24 PM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Hi Brad Is there a way to access the StateMachine object inside any of the AST class functions? I know the name of the machine can be accessed

Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Nilay Vaish
It works perfectly. Thanks! Nilay On Thu, 9 Dec 2010, Beckmann, Brad wrote: Hi Nilay, Yes, I believe a machine can be accessed within AST class functions, though I don't remember ever doing it myself. Look at the generate() function in TypeFieldEnumAST. Here you see that the machine

Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Beckmann, Brad
: [m5-dev] Implementation of findTagInSet Brad, Let's try to break the required changes into small portions. Given my feeble knowledge of Ruby, it would be for me to visualize what change is going to have what effect. One question, should we use pointers to pass the cache entry around, or should

Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Steve Reinhardt
...@m5sim.org] On Behalf Of Nilay Vaish Sent: Tuesday, December 07, 2010 5:21 PM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Brad, Let's try to break the required changes into small portions. Given my feeble knowledge of Ruby, it would be for me to visualize

Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Nilay Vaish
Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Brad, Let's try to break the required changes into small portions. Given my feeble knowledge of Ruby, it would be for me to visualize what change is going to have what effect. One question, should we use pointers to pass

Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Beckmann, Brad
-Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish Sent: Wednesday, December 08, 2010 11:53 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Hi Brad, A couple of observations a. If we make use of pointers, would

Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Nilay Vaish
I have made changes to SLICC to support local reference variables. I think we should use reference variables in functions where back-to-back calls are made to lookup/getCacheEntry functions. Overall, I am still unclear how we can handle this issue. Nilay On Tue, 30 Nov 2010, Nilay Vaish wrote:

Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Beckmann, Brad
...@m5sim.org] On Behalf Of Nilay Vaish Sent: Tuesday, December 07, 2010 12:16 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet I have made changes to SLICC to support local reference variables. I think we should use reference variables in functions where back-to-back calls

Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Nilay Vaish
this over a phone conversation. Brad -Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish Sent: Tuesday, December 07, 2010 12:16 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet I have made changes to SLICC

Re: [m5-dev] Implementation of findTagInSet

2010-12-03 Thread Nilay Vaish
This is what I have thought of. Currently, the doTransition() function takes in the cache state of the address that is being supplied. This function further calls the setState() function, one of the functions that repeatedly calls isTagPresent(). Instead, if we pass the cache state and the cache Entry

Re: [m5-dev] Implementation of findTagInSet

2010-11-30 Thread Nilay Vaish
, November 27, 2010 11:40 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Is it not possible to redesign the functions to accept CacheEntry as a parameter instead of an Address parameter?

Re: [m5-dev] Implementation of findTagInSet

2010-11-29 Thread Beckmann, Brad
-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish Sent: Saturday, November 27, 2010 11:40 AM To: M5 Developer List Subject: Re: [m5-dev] Implementation of findTagInSet Is it not possible to redesign the functions to accept CacheEntry as a parameter instead of an Address

Re: [m5-dev] Implementation of findTagInSet

2010-11-27 Thread Nilay Vaish
I conducted an experiment to figure out how many calls are made to the hash table to check if the given address exists in the cache. For the same setup as before, fewer than 10% of the calls reach the hash table. That is, out of about 880,000,000 calls to the isTagPresent function, only about 81,000,000 actually

Re: [m5-dev] Implementation of findTagInSet

2010-11-27 Thread Nilay Vaish
Is it not possible to redesign the functions to accept CacheEntry as a parameter instead of an Address parameter? On Sat, 27 Nov 2010, Nilay Vaish wrote: I conducted an experiment to figure out how many calls are made to the hash table to check if the given address exists in the cache. For

Re: [m5-dev] Implementation of findTagInSet

2010-11-26 Thread Nilay Vaish
I profiled the un-modified and the modified m5 ten times (this time there was no load on the machine). Here are the average results:
                  % time   std. dev   actual time   std. dev
un-modified
  isTagPresent     19.99       0.35         47.17       1.23
  cumulative      100          0.00

Re: [m5-dev] Implementation of findTagInSet

2010-11-26 Thread Steve Reinhardt
Hi Nilay, Good job, this is clearly progress... you've sped up isTagPresent by 2X and the simulation overall by almost 10%. That's nothing to sneeze at. It's sad that isTagPresent is still the top function though. Can you do some tracing or other experiments to get a feel for whether keeping

Re: [m5-dev] Implementation of findTagInSet

2010-11-25 Thread Nilay Vaish
Brad and I had a discussion on Tuesday. We are still thinking about how to resolve this issue. As a stopgap, I added a couple of variables to the CacheMemory class which track the last address for which the lookup was performed. I am posting the results from profiling before and after
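
The stopgap described in this message (remembering the last address that was looked up) can be sketched roughly as follows. This is an illustration under assumptions, not the patch that was posted: `CachedTagLookup`, its member names and the probe counter are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Address = std::uint64_t;  // hypothetical stand-in for Ruby's Address

class CachedTagLookup {
  public:
    // Returns the stored location for addr, or -1 if addr is not present.
    int findTag(Address addr) {
        if (m_lastValid && addr == m_lastAddr)
            return m_lastLoc;  // repeated lookup: skip the hash table
        ++m_hashProbes;
        auto it = m_tags.find(addr);
        m_lastAddr = addr;
        m_lastLoc = (it == m_tags.end()) ? -1 : it->second;
        m_lastValid = true;
        return m_lastLoc;
    }

    bool isTagPresent(Address addr) { return findTag(addr) != -1; }

    // Any change to the table invalidates the remembered result.
    void insert(Address addr, int loc) {
        assert(loc >= 0);  // -1 is reserved for "not present"
        m_tags[addr] = loc;
        m_lastValid = false;
    }

    void erase(Address addr) {
        m_tags.erase(addr);
        m_lastValid = false;
    }

    // Number of lookups that actually reached the hash table.
    unsigned long hashProbes() const { return m_hashProbes; }

  private:
    std::unordered_map<Address, int> m_tags;
    Address m_lastAddr = 0;
    int m_lastLoc = -1;
    bool m_lastValid = false;
    unsigned long m_hashProbes = 0;
};
```

Back-to-back calls for the same address, such as a presence check followed by a lookup, reach the hash table only once in this sketch. The cost is that every mutation has to invalidate the remembered entry.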

Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Steve Reinhardt
Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access; I don't know if that's just an API change or if that's something that needs to be folded into SLICCer.

Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Nilay Vaish
Brad and I will be having a discussion today on how to resolve this issue. -- Nilay On Tue, 23 Nov 2010, Steve Reinhardt wrote: Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag

Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Steve Reinhardt
I'm not the guy to ask for that... but actually I doubt the protocol itself matters that much, you just need to look at the code path that gets exercised on an L1 cache hit and see where the calls are. That part should be almost if not entirely independent of the coherence protocol. Steve On

Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Nilay Vaish
On Fri, 12 Nov 2010, Steve Reinhardt wrote: Right now I am profiling with coherence protocol as MOESI_hammer. I am thinking of profiling using a different protocol to make sure that it is not an artifact of the protocol in use. That sounds like a good idea. All in all, we would ideally

Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Nilay Vaish
I was looking at the MOESI hammer protocol. I think Steve's observation that extra tag lookups are going on in the cache is correct. In particular I noticed that in the getState() and setState() functions, first isTagPresent(address) is called and on the basis of the result (which is true or

Re: [m5-dev] Implementation of findTagInSet

2010-11-12 Thread Nilay Vaish
I tried a couple of ideas for improving the performance of the findTagInSet() function. These include having one hash table per cache set and replacing the hash table with a two-dimensional array indexed by cache set and cache way. Neither of these ideas showed significant enough change in the
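
For readers unfamiliar with the second idea, here is a rough C++ sketch of a tag array indexed by cache set and cache way. The class name, the power-of-two set count and the block-offset parameter are assumptions made for illustration; the message reports that this approach did not change the profile significantly, so the sketch only shows the data layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Address = std::uint64_t;  // hypothetical stand-in for Ruby's Address

// Marker for an empty way; a real line number never has all bits set
// once the block-offset bits have been shifted out.
constexpr Address kInvalidLine = ~Address{0};

class SetAssocTags {
  public:
    SetAssocTags(unsigned numSets, unsigned numWays, unsigned blockBits)
        : m_numSets(numSets), m_numWays(numWays), m_blockBits(blockBits),
          m_tags(numSets, std::vector<Address>(numWays, kInvalidLine))
    {
        // The mask in setIndex() assumes a power-of-two number of sets.
        assert(numSets != 0 && (numSets & (numSets - 1)) == 0);
        assert(blockBits > 0);
    }

    // Shift out the block offset, then mask to select the set.
    unsigned setIndex(Address addr) const {
        return static_cast<unsigned>((addr >> m_blockBits) & (m_numSets - 1));
    }

    // Returns the way holding addr's block, or -1 if it is not cached.
    int findTagInSet(Address addr) const {
        const Address line = addr >> m_blockBits;
        const std::vector<Address>& set = m_tags[setIndex(addr)];
        for (unsigned way = 0; way < m_numWays; ++way) {
            if (set[way] == line)
                return static_cast<int>(way);
        }
        return -1;
    }

    // Place addr's block in the given way of its set.
    void fill(Address addr, unsigned way) {
        assert(way < m_numWays);
        m_tags[setIndex(addr)][way] = addr >> m_blockBits;
    }

  private:
    unsigned m_numSets;
    unsigned m_numWays;
    unsigned m_blockBits;
    std::vector<std::vector<Address>> m_tags;  // [set][way]
};
```

With 4 sets, 2 ways and 64-byte blocks, for example, addresses 0x40 and 0x7f fall in the same block of set 1, while 0x140 maps to the same set with a different tag.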

Re: [m5-dev] Implementation of findTagInSet

2010-11-07 Thread Nilay Vaish
I went through the implementation of hash_map in C++. I realized that the number of buckets gets resized on the fly as the number of elements increases. This means that we would have more than num_of_cache_sets * num_ways buckets in the hash table. -- Nilay On Sat, 6 Nov 2010, Nilay Vaish

Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish
I ran ALPHA_FS_MOESI_hammer using the following command -- ./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py I don't know how the benchmark is picked in case none is specified. Below is the gprof output -- % cumulative self self total time seconds

Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Steve Reinhardt
You can look at the call graph profile further down in the gprof output to figure out how much time is spent in functions that get called from isTagPresent. If it's not specifically calling out findTagInSet, it may be because it's inlined in isTagPresent. Steve On Fri, Nov 5, 2010 at 7:58 AM,

Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish
Do you know what hash function is in use? Seems to me that the default hash function is to hash to self. Maybe we should test with a different hash function. -- Nilay On Fri, 5 Nov 2010, Steve Reinhardt wrote: You can look at the call graph profile further down in the gprof output to

Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish
I had another look at the profile output. On the machine that I am using (a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns. Assuming that the pipeline is functioning at its best, I think the number of uops executed would be ~500. Is that too much for this function? -- Nilay
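
The estimate in this message can be checked with a short calculation. The sketch below uses the two figures quoted above (57 ns per call, 3.2 GHz clock); the rate of 3 uops per cycle used in the usage note is an assumption chosen for illustration, and the helper names are hypothetical.

```cpp
#include <cassert>

// Cycles elapsed in `ns` nanoseconds on a core clocked at `ghz` GHz.
inline double cyclesFor(double ns, double ghz) {
    assert(ns >= 0.0 && ghz > 0.0);
    return ns * ghz;  // 1 GHz is one cycle per nanosecond
}

// Upper bound on uops executed, given an assumed sustained uops-per-cycle rate.
inline double uopsUpperBound(double cycles, double uopsPerCycle) {
    assert(cycles >= 0.0 && uopsPerCycle > 0.0);
    return cycles * uopsPerCycle;
}
```

`cyclesFor(57.0, 3.2)` gives about 182 cycles per call. At an assumed 3 uops per cycle that is an upper bound of roughly 547 uops, the same order as the ~500 figure in the message.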

Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Steve Reinhardt
If that's where a significant amount of time is being spent, we need to either call it less or make it run faster :-). Doing both is even better. At a high level, the process of looking something up in an N-way associative cache should not take that many instructions if N is small (a shift and

Re: [m5-dev] Implementation of findTagInSet

2010-11-04 Thread Nilay Vaish
I tried running ruby_fs.py, below is the error message that I received. I don't think there is any documentation or mailing list discussion on how to run ruby_fs.py. To me it seems that some parameter relating to the DMA controller is missing from the command I tried out. -- Nilay

Re: [m5-dev] Implementation of findTagInSet

2010-11-04 Thread Steve Reinhardt
You also have to build a binary that supports ruby, like ALPHA_FS_MOESI_hammer. If you can't get that to work, try ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the workload you run doesn't really matter that much as long as it's long enough to get a meaningful profile.

Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Nilay Vaish
I profiled M5 but surprisingly I did not find any mention of the function findTagInSet() in the output obtained from gprof. Does it matter what coherence protocol is in use? I carried out the following steps: 1. Compiled m5.prof using scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof

Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Steve Reinhardt
What was the gprof output? On Wed, Nov 3, 2010 at 4:45 PM, Nilay Vaish ni...@cs.wisc.edu wrote: I profiled M5 but surprisingly I did not find any mention of the function findTagInSet() in the output obtained from gprof. Does it matter what coherence protocol is in use? I carried out the

Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Nilay
I ran m5.prof two times. Here are the top five functions --
  %   cumulative   self               self     total
 time   seconds   seconds     calls  s/call   s/call  name
 8.35      5.05      5.05  58969209    0.00     0.00  BaseSimpleCPU::preExecute()
 6.32      8.87      3.82  58975463    0.00

Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Steve Reinhardt
Ah, the issue is that you're using the old M5 memory hierarchy and not Ruby. You need to run one of the Ruby versions, and use ruby_fs.py instead of fs.py. Steve On Wed, Nov 3, 2010 at 9:00 PM, Nilay ni...@cs.wisc.edu wrote: I ran m5.prof two times. Here are the top five functions -- %

Re: [m5-dev] Implementation of findTagInSet

2010-11-02 Thread Steve Reinhardt
I just compiled m5.prof and ran it (forgot what workload I ran on it, probably one of the parsec benchmarks; it probably doesn't matter a lot). If you've never used gprof before, this is a great time to learn! Steve On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote: I am