Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

I apologize for the delay, but I was mostly travelling / in meetings last week and I didn't have a chance to review your patches and emails until this morning.

Overall, your patches are definitely solid steps in the right direction and your profiling data sounds very promising. If you get the chance, please send it to me. I would be interested to know what are the top performance bottlenecks after your change.

Before you spend time converting the other protocols, I do want to discuss the three points you brought up last week (see below). I have a bunch of free time over the next three days (Mon. - Wed.) and I do think a telephone conversation is best to discuss these details. Let me know what times work for you.

Brad

1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the calls to the doTransition() function. To set these, we would need to make calls to a function that returns the pointer if the address is in the cache, NULL otherwise. I think we should retain the getEntry functions in the .sm files because, in the case of an L1 cache, both the instruction and the data cache need to be checked. This is something that I probably would prefer keeping out of SLICC. In fact, we should add getEntry functions for TBEs wherever required. These getEntry functions would now return a pointer instead of a reference. We would need to add support for return_by_pointer to SLICC. Also, since these functions would be used inside the Wakeup function, we would need to assume a common name for them across all protocols, just like the getState() function.

[BB] I would be very interested why you believe we should keep the getEntry functions out of SLICC. In my mind, this is one of the few functions that is very consistent across protocols. As I mentioned before, I really want to keep any notion of pointers out of the .sm files and avoid the changes you are proposing to getCacheEntry. We should probably discuss this in detail over the phone.

2. I still think we would need to change the changePermission function in the CacheMemory class. Presently it calls findTagInSet() twice. Instead, we would pass in the CacheEntry whose permissions need to be changed. This would save one call. We should also put the variable m_locked in the AbstractCacheEntry (maybe make it part of the permission variable) to avoid the second call.

[BB] I like moving the locked field to AbstractCacheEntry and removing the separate m_locked data structure. However, just a minor point, but we should avoid duplicating code in CacheMemory to support this change. Other than that, this looks good to me.

3. In the getState() and setState() functions, we need to specify that the function assumes that implicit TBE and CacheEntry pointers have been passed as arguments. How should we do this? I think we would need to push them into the symbol table before they can be used inside the function.

[BB] I'm a little confused by your current patch. It appears that you are proposing having two pairs of getState and setState functions. I would really like to avoid that and just have one pair of getState and setState functions. Also, when I say implicitly pass the TBE and CacheEntry pointers, I mean that for the actions (similar to address). However, I think it is fine to explicitly pass these parameters into getState and setState (also similar to Address and State).

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
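Point 2 above can be sketched as follows. This is a minimal Python model, not the C++ CacheMemory code: the class names, the `tag_searches` counter, and the rule that the lock is cleared whenever write permission is lost are all assumptions made for illustration. It only shows why keeping the lock with the entry removes the tag searches from changePermission.

```python
# Hypothetical model of point 2. The real CacheMemory is C++ and its exact
# interface is not shown in this thread; every name here is illustrative.

class Entry:
    """Stands in for AbstractCacheEntry: permission and lock live together."""

    def __init__(self, address):
        self.address = address
        self.permission = "Invalid"
        self.locked_by = None  # id of the context holding the lock, or None


class CacheModel:
    def __init__(self):
        self._entries = {}
        self.tag_searches = 0  # counts simulated findTagInSet() calls

    def allocate(self, address):
        entry = Entry(address)
        self._entries[address] = entry
        return entry

    def find_tag(self, address):
        self.tag_searches += 1
        return self._entries.get(address)

    def change_permission_by_address(self, address, permission):
        # Address-based shape: one search to reach the permission and a
        # second search to reach the lock state.
        self.find_tag(address).permission = permission
        if permission != "Read_Write":  # assumed rule, see lead-in
            self.find_tag(address).locked_by = None

    def change_permission_by_entry(self, entry, permission):
        # Entry-based shape: the caller already holds the entry, so no
        # search is needed for either field.
        entry.permission = permission
        if permission != "Read_Write":  # assumed rule, see lead-in
            entry.locked_by = None


cache = CacheModel()
entry = cache.allocate(0x40)
entry.locked_by = 1

cache.change_permission_by_address(0x40, "Read_Only")
searches_by_address = cache.tag_searches

cache.tag_searches = 0
cache.change_permission_by_entry(entry, "Read_Write")
searches_by_entry = cache.tag_searches
```

In this model the address-based call performs two searches and the entry-based call performs none, which is the saving the proposal is after.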
Re: [m5-dev] Implementation of findTagInSet
On Mon, Dec 20, 2010 at 8:21 AM, Beckmann, Brad brad.beckm...@amd.com wrote:

> Overall, your patches are definitely solid steps in the right direction and your profiling data sounds very promising. If you get the chance, please send it to me. I would be interested to know what are the top performance bottlenecks after your change.

Ditto for me on basically all of Brad's points. I'd like to see where the profile stands now. I'm also interested in catching up on Nilay's current changes; I'll try to read through the patches today.

Steve
Re: [m5-dev] Implementation of findTagInSet (fwd)
These profile results are from testing ALPHA_FS_MESI_CMP_directory with configs/example/ruby_fs.py. The simulation was allowed to run for 200,000,000,000 ticks.

Profile result with unmodified SLICC

  %    cumulative    self                  self    total
 time    seconds   seconds       calls   s/call   s/call  name
12.19      34.51     34.51   551229802     0.00     0.00  CacheMemory::isTagPresent(Address const) const
 8.41      58.33     23.82    17760155     0.00     0.00  PerfectSwitch::wakeup()
 4.49      71.03     12.70   235904391     0.00     0.00  Histogram::add(long long)
 2.54      78.23      7.20   172127510     0.00     0.00  CacheMemory::lookup(Address const)
 2.33      84.82      6.59    93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtrMessage, long long)
 2.10      90.77      5.95   105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 2.06      96.61      5.84    34537891     0.00     0.00  BaseSimpleCPU::preExecute()
 1.95     102.12      5.51    43900461     0.00     0.00  RubyPort::M5Port::recvTiming(Packet*)
 1.93     107.58      5.46   580192104     0.00     0.00  Set::Set(Set const)
 1.92     113.02      5.44    46506080     0.00     0.00  L1Cache_Controller::wakeup()

Profile result with modified SLICC

  %    cumulative    self                  self    total
 time    seconds   seconds       calls   s/call   s/call  name
 9.97      24.78     24.78    17760155     0.00     0.00  PerfectSwitch::wakeup()
 5.42      38.27     13.49   101906879     0.00     0.00  CacheMemory::lookup_ptr(Address const)
 5.32      51.50     13.23   235904391     0.00     0.00  Histogram::add(long long)
 2.30      57.21      5.71   580192104     0.00     0.00  Set::Set(Set const)
 2.29      62.91      5.70    93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtrMessage, long long)
 2.19      68.36      5.45    46506080     0.00     0.00  L1Cache_Controller::wakeup()
 2.14      73.67      5.31    34537891     0.00     0.00  BaseSimpleCPU::preExecute()
 2.10      78.89      5.22    11125106     0.00     0.00  MemoryControl::executeCycle()
 2.06      84.02      5.13    96775149     0.00     0.00  RubyEventQueueNode::process()
 1.98      88.94      4.92   105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 . . .
 1.30     121.31      3.23    51172611     0.00     0.00  CacheMemory::isTagPresent(Address const) const

I can send the complete data generated by gprof, if required. I have inlined my comments.
On Mon, 20 Dec 2010, Beckmann, Brad wrote:

> Before you spend time converting the other protocols, I do want to discuss the three points you brought up last week (see below). I have a bunch of free time over the next three days (Mon. - Wed.) and I do think a telephone conversation is best to discuss these details. Let me know what times work for you.

The semester is over, so I am available almost throughout the day. Today, I have a meeting at 3, which I think should be at most an hour long. Over the next two days, I do not have anything scheduled so far, so any time will work.

> 1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the calls to the doTransition() function. To set these, we would need to make calls to a function that returns the pointer if the address is in the cache, NULL otherwise. I think we should retain the getEntry functions in the .sm files because, in the case of an L1 cache, both the instruction and the data cache need to be checked. This is something that I probably would prefer keeping out of SLICC. In fact, we should add getEntry functions for TBEs wherever required. These getEntry functions would now return a pointer instead of a reference. We would need to add support for return_by_pointer to SLICC. Also, since these functions would be used inside the Wakeup function, we would need to assume a common name for them across all protocols, just like the getState() function.
>
> [BB] I would be very interested why you believe we should keep the getEntry functions out of SLICC. In my mind, this is one of the few functions that is very consistent across protocols. As I mentioned before, I really want to keep any notion of pointers out of the .sm files and avoid the changes you are proposing to getCacheEntry. We should probably discuss this in detail over the phone.

We would need to figure out which cache memories the machine has, their hierarchy, and whether there are I and D caches. In fact, MOESI-hammer has the L1I cache, the L1D cache and the L2 all in the same machine. I think we should not do this analysis in the compiler.

> 2. I
Re: [m5-dev] Implementation of findTagInSet (fwd)
Nice work! No need to send the full profile, but what is the net speedup here? It seems like we should have eliminated about 10% of the runtime, but I wanted to verify that.

Also, what workload are you running on top? With all the time spent in PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're running the tester then that's not so bad, but if you're running a regular program that seems high.

Thanks,
Steve

On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

> These profile results are from testing ALPHA_FS_MESI_CMP_directory with configs/example/ruby_fs.py. The simulation was allowed to run for 200,000,000,000 ticks.
Re: [m5-dev] Implementation of findTagInSet (fwd)
I am running m5.prof multiple times to get an idea of average performance. I will get back to you later today with the numbers.

Thanks
Nilay

On Mon, 20 Dec 2010, Steve Reinhardt wrote:

> Nice work! No need to send the full profile, but what is the net speedup here? It seems like we should have eliminated about 10% of the runtime, but I wanted to verify that.
>
> Also, what workload are you running on top?
Re: [m5-dev] Implementation of findTagInSet
Brad,

I have tested the changes that I made to the files relating to SLICC and the MESI_CMP_directory protocol. I see a 90% decrease in the number of calls to isTagPresent() when I run m5.prof for 200,000,000,000 ticks using configs/example/ruby_fs.py.

Thanks
Nilay

On Fri, 17 Dec 2010, Nilay Vaish wrote:

> Hi Brad,
>
> I have attached the patch for the changes that I have made so far. This patch, I believe, makes all the required changes to the file MESI_CMP_directory-L1cache.sm, apart from making changes to SLICC. Can you go through this? If this looks fine, then I will make changes to the other protocol files. I think we should have a telephone discussion about this some time.
>
> Thanks
> Nilay
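The 90% figure can be checked against the gprof flat profiles posted in this thread. The sketch below is plain arithmetic on the two CacheMemory::isTagPresent rows (call counts and self seconds); the variable names are made up for illustration, and it says nothing about net speedup, since the cumulative-seconds column reflects each row's position in its listing rather than total runtime.

```python
# Call counts and self seconds for CacheMemory::isTagPresent, copied from the
# two gprof flat profiles in this thread (unmodified vs. modified SLICC).
calls_before = 551_229_802
calls_after = 51_172_611
self_seconds_before = 34.51
self_seconds_after = 3.23

# Relative reduction = (before - after) / before
call_reduction = (calls_before - calls_after) / calls_before
time_reduction = (self_seconds_before - self_seconds_after) / self_seconds_before

print(f"isTagPresent calls reduced by {call_reduction:.1%}")
print(f"isTagPresent self time reduced by {time_reduction:.1%}")
```

Both ratios come out a little above 90%, which agrees with the number reported in the message.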
Re: [m5-dev] Implementation of findTagInSet
Brad,

We would need to change the lookup functions for TBETable and CacheMemory. Currently the lookup functions assume that the address passed to the lookup is present. This requires two lookups into the data structures associated with these classes: one to check whether the address is in the cache, and a second to return a reference to the actual cache entry. Instead of returning a reference, we can return a pointer to the entry. This pointer will be NULL if the address is not present in the cache.

Nilay

On Wed, 8 Dec 2010, Beckmann, Brad wrote:

> Hi Nilay,
>
> Breaking the changes into small portions is a good idea, but we first need to decide exactly what we are doing. So far we've only thrown out some ideas. We have yet to scope out a complete solution. I think we've settled on passing some sort of reference to the cache and tbe entries, but exactly whether that is by reference variables or pointers isn't clear. My initial preference is to use pointers in the generated code and set the pointers to NULL when a cache and/or tbe entry doesn't exist. However, one thing I really want to strive for is to keep pointer manipulation out of the .sm files. Writing SLICC code is hard enough and we don't want to burden the SLICC programmer with memory management as well.
>
> So how about this plan?
>
> - Let's remove all the getCacheEntry functions from the slicc files. I believe that almost all of these functions look exactly the same and it is easy enough for SLICC to just generate them instead.
> - Similarly, let's get rid of all isCacheTagPresent functions as well.
> - Then let's replace all the getCacheEntry calls with an implicit SLICC-supported variable called cache_entry and all the TBEs[addr*] calls with an implicit SLICC-supported variable called tbe_entry.
> - Underneath, these variables can actually be implemented as local inlined functions that assert whether the entries are valid and then return variables local to the state machine set to the current cache and tbe entry.
> - The trigger function will implicitly set these variables (pointers underneath) to NULL or valid values, and the only way they can be reset is through explicit functions set_cache_entry, reset_cache_entry, set_tbe_entry, and reset_tbe_entry. These functions would be called by the appropriate actions or possibly be merged with the existing check_allocate function.
>
> I think that will give us what we want, but I realize I've just proposed changing 100's if not 1000's of lines of SLICC code. I hope that these changes are straightforward, but any change like that is never really straightforward. Let's think it over some more and let me know if you want to discuss this in more detail over the phone.
>
> Brad
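The lookup change described in this message can be sketched as below. It is a minimal Python model under stated assumptions: `TableModel`, `is_present` and the `searches` counter are hypothetical names, Python's `None` stands in for a NULL pointer, and only `lookup_ptr` borrows its name from the profile posted elsewhere in the thread. The point is just the search count: a presence check followed by a reference-returning lookup searches twice, while a pointer-returning lookup searches once.

```python
class TableModel:
    """Hypothetical stand-in for CacheMemory or TBETable."""

    def __init__(self):
        self._entries = {}
        self.searches = 0  # how many times the underlying structure is searched

    def allocate(self, address, entry):
        self._entries[address] = entry

    def is_present(self, address):
        self.searches += 1
        return address in self._entries

    def lookup(self, address):
        # Reference-style lookup: assumes the address is present, so callers
        # must check is_present() first.
        self.searches += 1
        return self._entries[address]

    def lookup_ptr(self, address):
        # Pointer-style lookup: None plays the role of NULL when absent.
        self.searches += 1
        return self._entries.get(address)


table = TableModel()
table.allocate(0x80, "entry-for-0x80")

# Two-step style: check, then fetch.
found_two_step = table.lookup(0x80) if table.is_present(0x80) else None
searches_two_step = table.searches

# Single-step style: fetch and test the result.
table.searches = 0
found_single = table.lookup_ptr(0x80)
searches_single = table.searches

missing = table.lookup_ptr(0xC0)  # absent address yields None, no exception
```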
Re: [m5-dev] Implementation of findTagInSet
Hi Brad,

Is there a way to access the StateMachine object inside any of the AST class functions? I know the name of the machine can be accessed. But can the machine itself be accessed? I need one of the variables in the StateMachine object to know whether or not TBETable exists in this machine.

Nilay

On Wed, 8 Dec 2010, Beckmann, Brad wrote:

> Hi Nilay,
>
> I think we can avoid handling pointers in the getState and setState functions if we also add bool functions is_cache_entry_valid and is_tbe_entry_valid that are implicitly defined in SLICC. I don't think we should try to get rid of getState and setState since they often contain valuable, protocol-specific checks. Instead, for getState and setState, I believe we should simply replace the current isTagPresent calls with the new is_*_valid calls.
>
> As far as changePermission() goes, your solution seems reasonable, but we may also want to consider just not changing that function at all. changePermission() doesn't actually use a cache entry within the .sm file, so it doesn't necessarily need to be changed. Going back to breaking this work into smaller portions, that is definitely a portion I feel can be pushed to the end or removed entirely.
>
> Brad
>
> -----Original Message-----
> From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
> Sent: Wednesday, December 08, 2010 11:53 AM
> To: M5 Developer List
> Subject: Re: [m5-dev] Implementation of findTagInSet
>
> Hi Brad,
>
> A couple of observations:
>
> a. If we make use of pointers, would we not need to handle them in the getState and setState functions?
>
> b. changePermission() seems to be a problem. It would still perform a lookup because whether a CacheEntry is locked or not is maintained in the CacheMemory object and not with the entry itself. We can move that variable to be part of the AbstractCacheEntry, or we can combine it with the permission variable which is already there in the AbstractCacheEntry class. I think the lock is only used in the implementation of LL/SC instructions.
>
> Nilay
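The is_*_valid idea from this exchange, together with the set/reset function names proposed in the thread, might look roughly like the model below. It is a sketch only: `ControllerModel`, the dictionary-based entries, the default state "I", and the TBE-before-cache order in `get_state` are assumptions for illustration, not what SLICC generates.

```python
class ControllerModel:
    """Hypothetical model of a generated controller holding implicit entries."""

    def __init__(self):
        self._cache_entry = None  # None plays the role of a NULL pointer
        self._tbe_entry = None

    # Validity checks that protocol code can call instead of touching pointers.
    def is_cache_entry_valid(self):
        return self._cache_entry is not None

    def is_tbe_entry_valid(self):
        return self._tbe_entry is not None

    # Explicit functions are the only way the implicit variables change.
    def set_cache_entry(self, entry):
        self._cache_entry = entry

    def reset_cache_entry(self):
        self._cache_entry = None

    def set_tbe_entry(self, entry):
        self._tbe_entry = entry

    def reset_tbe_entry(self):
        self._tbe_entry = None

    @property
    def cache_entry(self):
        # Accessor asserts validity before handing out the entry.
        assert self._cache_entry is not None, "cache_entry used while invalid"
        return self._cache_entry

    def get_state(self):
        # Assumed protocol-style check order: TBE first, then cache entry,
        # otherwise a default invalid state.
        if self.is_tbe_entry_valid():
            return self._tbe_entry["state"]
        if self.is_cache_entry_valid():
            return self.cache_entry["state"]
        return "I"


ctrl = ControllerModel()
state_empty = ctrl.get_state()

ctrl.set_cache_entry({"state": "S"})
state_cached = ctrl.get_state()

ctrl.set_tbe_entry({"state": "IS"})
state_in_flight = ctrl.get_state()

ctrl.reset_tbe_entry()
ctrl.reset_cache_entry()
```

Here getState never sees a pointer; it only asks whether each implicit entry is valid, which is the property the message argues for.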
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

Yes, I believe a machine can be accessed within AST class functions, though I don't remember ever doing it myself. Look at the generate() function in TypeFieldEnumAST. There you can see that the machine (a.k.a. StateMachine) is grabbed from the symbol table and then different StateMachine functions are called on it. You can imagine adding a new function to StateMachine.py that returns whether the TBETable exists. That seems like it should work to me, but let me know if it doesn't.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Thursday, December 09, 2010 5:24 PM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

> Is there a way to access the StateMachine object inside any of the AST class functions? I know the name of the machine can be accessed. But can the machine itself be accessed? I need one of the variables in the StateMachine object to know whether or not TBETable exists in this machine.
Re: [m5-dev] Implementation of findTagInSet
It works perfectly. Thanks!

Nilay

On Thu, 9 Dec 2010, Beckmann, Brad wrote:

> You can imagine adding a new function to StateMachine.py that returns whether the TBETable exists. That seems like it should work to me, but let me know if it doesn't.
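The approach Brad suggests (fetch the machine from the symbol table inside an AST node, then ask it whether a TBETable exists) could be modelled as below. Every class, method and attribute name here is hypothetical; the thread names StateMachine.py and TypeFieldEnumAST but does not show their interfaces, so this is not the SLICC API, only the shape of the idea.

```python
class StateMachineModel:
    """Hypothetical stand-in for the SLICC StateMachine object."""

    def __init__(self, name, tbe_type=None):
        self.name = name
        self.tbe_type = tbe_type  # None when the machine declares no TBETable

    def has_tbe_table(self):
        # The kind of small query function the message suggests adding.
        return self.tbe_type is not None


class SymbolTableModel:
    """Hypothetical symbol table that maps machine names to machines."""

    def __init__(self):
        self._machines = {}

    def register(self, machine):
        self._machines[machine.name] = machine

    def find_machine(self, name):
        return self._machines[name]


class AstNodeModel:
    """Hypothetical AST node that only knows the machine's name up front."""

    def __init__(self, symtab, machine_name):
        self.symtab = symtab
        self.machine_name = machine_name

    def generate(self):
        # Grab the machine itself from the symbol table, then query it.
        machine = self.symtab.find_machine(self.machine_name)
        params = ["cache_entry"]
        if machine.has_tbe_table():
            params.append("tbe_entry")
        return params


symtab = SymbolTableModel()
symtab.register(StateMachineModel("L1Cache", tbe_type="TBE"))
symtab.register(StateMachineModel("NoTbeMachine"))

with_tbe = AstNodeModel(symtab, "L1Cache").generate()
without_tbe = AstNodeModel(symtab, "NoTbeMachine").generate()
```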
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

Breaking the changes into small portions is a good idea, but we first need to decide exactly what we are doing. So far we've only thrown out some ideas; we have yet to scope out a complete solution. I think we've settled on passing some sort of reference to the cache and TBE entries, but whether that is by reference variables or by pointers isn't clear. My initial preference is to use pointers in the generated code and set the pointers to NULL when a cache and/or TBE entry doesn't exist. However, one thing I really want to strive for is to keep pointer manipulation out of the .sm files. Writing SLICC code is hard enough, and we don't want to burden the SLICC programmer with memory management as well.

So how about this plan?

- Let's remove all the getCacheEntry functions from the SLICC files. I believe that almost all of these functions look exactly the same, and it is easy enough for SLICC to just generate them instead.
- Similarly, let's get rid of all the isCacheTagPresent functions as well.
- Then let's replace all the getCacheEntry calls with an implicit SLICC-supported variable called cache_entry, and all the TBEs[addr*] calls with an implicit SLICC-supported variable called tbe_entry.
- Underneath, these variables can actually be implemented as local inlined functions that assert that the entries are valid and then return variables local to the state machine set to the current cache and TBE entry.
- The trigger function will implicitly set these variables (pointers underneath) to NULL or valid values, and the only way they can be reset is through the explicit functions set_cache_entry, reset_cache_entry, set_tbe_entry, and reset_tbe_entry. These functions would be called by the appropriate actions or possibly be merged with the existing check_allocate function.

I think that will give us what we want, but I realize I've just proposed changing hundreds if not thousands of lines of SLICC code. I hope that these changes are straightforward, but any change like that is never really straightforward. Let's think it over some more, and let me know if you want to discuss this in more detail over the phone.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Tuesday, December 07, 2010 5:21 PM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Brad,

Let's try to break the required changes into small portions. Given my feeble knowledge of Ruby, it would be hard for me to visualize what change is going to have what effect.

One question: should we use pointers to pass the cache entry around, or should we make use of reference variables? Currently the lookup functions return references to cache entries.

Nilay
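To make the plan in Brad's message above more concrete, here is a minimal C++ sketch of what the generated controller code might look like. Everything in it is an assumption for illustration: the class ControllerSketch, the stand-in CacheEntry and TBE structs, and the member names are hypothetical and not taken from the actual SLICC output. It only shows the shape of the idea: pointers live in the generated code, accessors assert validity, and the pointers change only through trigger and the explicit set/reset functions.

```cpp
#include <cassert>

// Hypothetical stand-ins for the protocol's generated entry types.
struct CacheEntry { int state = 0; };
struct TBE { int state = 0; };

// Sketch of a generated controller. SLICC code would only ever name
// cache_entry and tbe_entry; the raw pointers stay hidden in here.
class ControllerSketch {
  public:
    // Assumed to be called by the trigger path before any action runs.
    // Either pointer may be null when no entry exists for the address.
    void trigger(CacheEntry* cache, TBE* tbe) {
        m_cache_entry_ptr = cache;
        m_tbe_entry_ptr = tbe;
    }

    // Implicit variables, implemented as inlined accessors that assert
    // the entry is valid before handing back a reference.
    CacheEntry& cache_entry() {
        assert(m_cache_entry_ptr != nullptr);
        return *m_cache_entry_ptr;
    }
    TBE& tbe_entry() {
        assert(m_tbe_entry_ptr != nullptr);
        return *m_tbe_entry_ptr;
    }

    // The only other ways the pointers can change.
    void set_cache_entry(CacheEntry* e) { assert(e != nullptr); m_cache_entry_ptr = e; }
    void reset_cache_entry() { m_cache_entry_ptr = nullptr; }
    void set_tbe_entry(TBE* t) { assert(t != nullptr); m_tbe_entry_ptr = t; }
    void reset_tbe_entry() { m_tbe_entry_ptr = nullptr; }

  private:
    CacheEntry* m_cache_entry_ptr = nullptr;
    TBE* m_tbe_entry_ptr = nullptr;
};
```

With this shape, an action that allocates a TBE would call set_tbe_entry once, and later actions in the same transition would read tbe_entry() without any further table lookup.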
Re: [m5-dev] Implementation of findTagInSet
This sounds like a great direction to me. Continuing in this vein, would it be possible to factor out the protocol-specific implementations of getState() and setState() entirely? I'm thinking that each of these calls involves a check to see whether the block is in a TBE, followed by the code to handle the case where it's not in a TBE but is in the cache, and if there's a way to do the TBE check only once per access, that could save even more.

In terms of keeping changes small, you should save this for after you do the changes Brad suggests, and maybe it's actually not even a good idea, but I wanted to plant the seed.

Steve

On Wed, Dec 8, 2010 at 9:00 AM, Beckmann, Brad brad.beckm...@amd.com wrote:

Hi Nilay,

Breaking the changes into small portions is a good idea, but we first need to decide exactly what we are doing. So far we've only thrown out some ideas; we have yet to scope out a complete solution. I think we've settled on passing some sort of reference to the cache and TBE entries, but whether that is by reference variables or by pointers isn't clear. My initial preference is to use pointers in the generated code and set the pointers to NULL when a cache and/or TBE entry doesn't exist. However, one thing I really want to strive for is to keep pointer manipulation out of the .sm files. Writing SLICC code is hard enough, and we don't want to burden the SLICC programmer with memory management as well.

So how about this plan?

- Let's remove all the getCacheEntry functions from the SLICC files. I believe that almost all of these functions look exactly the same, and it is easy enough for SLICC to just generate them instead.
- Similarly, let's get rid of all the isCacheTagPresent functions as well.
- Then let's replace all the getCacheEntry calls with an implicit SLICC-supported variable called cache_entry, and all the TBEs[addr*] calls with an implicit SLICC-supported variable called tbe_entry.
- Underneath, these variables can actually be implemented as local inlined functions that assert that the entries are valid and then return variables local to the state machine set to the current cache and TBE entry.
- The trigger function will implicitly set these variables (pointers underneath) to NULL or valid values, and the only way they can be reset is through the explicit functions set_cache_entry, reset_cache_entry, set_tbe_entry, and reset_tbe_entry. These functions would be called by the appropriate actions or possibly be merged with the existing check_allocate function.

I think that will give us what we want, but I realize I've just proposed changing hundreds if not thousands of lines of SLICC code. I hope that these changes are straightforward, but any change like that is never really straightforward. Let's think it over some more, and let me know if you want to discuss this in more detail over the phone.

Brad
Re: [m5-dev] Implementation of findTagInSet
Hi Brad,

A couple of observations:

a. If we make use of pointers, would we not need to handle them in the getState and setState functions?

b. changePermission() seems to be a problem. It would still perform a lookup, because whether a CacheEntry is locked or not is maintained in the CacheMemory object and not with the entry itself. We can move that variable into AbstractCacheEntry, or we can combine it with the permission variable that is already in the AbstractCacheEntry class. I think the lock is only used in the implementation of LL/SC instructions.

Nilay

On Wed, 8 Dec 2010, Beckmann, Brad wrote:

Hi Nilay,

Breaking the changes into small portions is a good idea, but we first need to decide exactly what we are doing. So far we've only thrown out some ideas; we have yet to scope out a complete solution. I think we've settled on passing some sort of reference to the cache and TBE entries, but whether that is by reference variables or by pointers isn't clear. My initial preference is to use pointers in the generated code and set the pointers to NULL when a cache and/or TBE entry doesn't exist. However, one thing I really want to strive for is to keep pointer manipulation out of the .sm files. Writing SLICC code is hard enough, and we don't want to burden the SLICC programmer with memory management as well.

So how about this plan?

- Let's remove all the getCacheEntry functions from the SLICC files. I believe that almost all of these functions look exactly the same, and it is easy enough for SLICC to just generate them instead.
- Similarly, let's get rid of all the isCacheTagPresent functions as well.
- Then let's replace all the getCacheEntry calls with an implicit SLICC-supported variable called cache_entry, and all the TBEs[addr*] calls with an implicit SLICC-supported variable called tbe_entry.
- Underneath, these variables can actually be implemented as local inlined functions that assert that the entries are valid and then return variables local to the state machine set to the current cache and TBE entry.
- The trigger function will implicitly set these variables (pointers underneath) to NULL or valid values, and the only way they can be reset is through the explicit functions set_cache_entry, reset_cache_entry, set_tbe_entry, and reset_tbe_entry. These functions would be called by the appropriate actions or possibly be merged with the existing check_allocate function.

I think that will give us what we want, but I realize I've just proposed changing hundreds if not thousands of lines of SLICC code. I hope that these changes are straightforward, but any change like that is never really straightforward. Let's think it over some more, and let me know if you want to discuss this in more detail over the phone.

Brad
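Nilay's observation (b) above, that the lock could live with the entry rather than in a per-set table inside CacheMemory, might look roughly like the sketch below. All names here (AbstractCacheEntrySketch, PermissionSketch, the member functions) are hypothetical. The rule that dropping to NotPresent clears the lock is an assumption made for the example, not something stated in the thread.

```cpp
#include <cassert>

// Hypothetical, reduced permission set for illustration only.
enum PermissionSketch { Perm_NotPresent, Perm_ReadOnly, Perm_ReadWrite };

// Sketch of an entry that carries its own lock, so changing the
// permission needs the entry itself and no tag lookup.
struct AbstractCacheEntrySketch {
    PermissionSketch m_Permission = Perm_NotPresent;
    int m_locked = -1;  // context holding the LL/SC lock, -1 when unlocked

    void setLocked(int context) { m_locked = context; }
    void clearLocked() { m_locked = -1; }
    bool isLocked(int context) const { return m_locked == context; }

    void changePermission(PermissionSketch new_perm) {
        m_Permission = new_perm;
        // Assumed rule: a block that is no longer present cannot stay locked.
        if (new_perm == Perm_NotPresent) {
            clearLocked();
        }
    }
};
```

The design trade-off is the one raised in the thread: the lock becomes one more field on every entry, in exchange for removing a set/way lookup from each permission change.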
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

I think we can avoid handling pointers in the getState and setState functions if we also add bool functions is_cache_entry_valid and is_tbe_entry_valid that are implicitly defined in SLICC. I don't think we should try to get rid of getState and setState, since they often contain valuable, protocol-specific checks. Instead, for getState and setState, I believe we should simply replace the current isTagPresent calls with the new is_*_valid calls.

As far as changePermission() goes, your solution seems reasonable, but we may also want to consider just not changing that function at all. changePermission() doesn't actually use a cache entry within the .sm file, so it doesn't necessarily need to be changed. Going back to breaking this work into smaller portions, that is definitely a portion I feel can be pushed to the end or removed entirely.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Wednesday, December 08, 2010 11:53 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Hi Brad,

A couple of observations:

a. If we make use of pointers, would we not need to handle them in the getState and setState functions?

b. changePermission() seems to be a problem. It would still perform a lookup, because whether a CacheEntry is locked or not is maintained in the CacheMemory object and not with the entry itself. We can move that variable into AbstractCacheEntry, or we can combine it with the permission variable that is already in the AbstractCacheEntry class. I think the lock is only used in the implementation of LL/SC instructions.

Nilay
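As a rough illustration of Brad's suggestion above, the C++ that SLICC generates for setState could test validity with the is_*_valid helpers where it previously called isTagPresent. This is a sketch under assumptions: the state enum, the struct fields, and the free-function form are all hypothetical, and a real protocol would keep its own protocol-specific checks inside setState.

```cpp
#include <cassert>

// Hypothetical three-state protocol for illustration.
enum StateSketch { State_I, State_S, State_M };

struct CacheEntry { StateSketch CacheState = State_I; };
struct TBE { StateSketch TBEState = State_I; };

// Validity helpers: a pointer comparison in place of a hash-table lookup.
inline bool is_cache_entry_valid(const CacheEntry* e) { return e != nullptr; }
inline bool is_tbe_entry_valid(const TBE* t) { return t != nullptr; }

// Sketch of a generated setState: update whichever entries exist.
void setState(TBE* tbe, CacheEntry* cache_entry, StateSketch state) {
    if (is_tbe_entry_valid(tbe)) {
        tbe->TBEState = state;
    }
    if (is_cache_entry_valid(cache_entry)) {
        cache_entry->CacheState = state;
    }
}
```

In the .sm source the same function would presumably read is_valid checks on the implicit variables, with no pointer syntax visible to the protocol writer.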
Re: [m5-dev] Implementation of findTagInSet
I have made changes to SLICC to support local reference variables. I think we should use reference variables in functions where back-to-back calls are made to the lookup/getCacheEntry functions. Overall, I am still unclear on how we can handle this issue.

Nilay

On Tue, 30 Nov 2010, Nilay Vaish wrote:

Is it possible to have variables local to a function in .sm files? I am thinking of storing getCacheEntry()'s return value in a local variable.

Nilay
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

Yes, this is not an easy issue to fix. First, to answer your other question, I believe TBE stands for Transaction Buffer Entry, or something to that effect.

As you suggested, we can pass the cache entry and possibly even the TBE entry into the trigger function. Thus all actions will implicitly include these two parameters as inputs and will not require continual lookups or even local variables. However, I believe that to make this work we need to change the semantics for allocating and deallocating cache and TBE entries. In particular, these operations should probably be handled by specialized operators (similar to trigger) that correctly manage the pointers underneath. Does that make sense? Let me know if you'd like to brainstorm more about this over a phone conversation.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Tuesday, December 07, 2010 12:16 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

I have made changes to SLICC to support local reference variables. I think we should use reference variables in functions where back-to-back calls are made to the lookup/getCacheEntry functions. Overall, I am still unclear on how we can handle this issue.

Nilay
Re: [m5-dev] Implementation of findTagInSet
Brad,

Let's try to break the required changes into small portions. Given my feeble knowledge of Ruby, it would be hard for me to visualize what change is going to have what effect.

One question: should we use pointers to pass the cache entry around, or should we make use of reference variables? Currently the lookup functions return references to cache entries.

Nilay

On Tue, 7 Dec 2010, Beckmann, Brad wrote:

Hi Nilay,

Yes, this is not an easy issue to fix. First, to answer your other question, I believe TBE stands for Transaction Buffer Entry, or something to that effect.

As you suggested, we can pass the cache entry and possibly even the TBE entry into the trigger function. Thus all actions will implicitly include these two parameters as inputs and will not require continual lookups or even local variables. However, I believe that to make this work we need to change the semantics for allocating and deallocating cache and TBE entries. In particular, these operations should probably be handled by specialized operators (similar to trigger) that correctly manage the pointers underneath. Does that make sense? Let me know if you'd like to brainstorm more about this over a phone conversation.

Brad
Re: [m5-dev] Implementation of findTagInSet
This is what I have thought of. Currently, the doTransition() function takes in the cache state of the address that is being supplied. This function in turn calls the setState() function, one of the functions that repeatedly calls isTagPresent(). Instead, if we pass the cache state and the cache entry reference to doTransition(), then we will not have to make all those isTagPresent() calls.

The problem with this is that before the cache is looked up, a structure called the TBE is looked up. What does TBE stand for? We can pass references to a TBE entry and a cache entry. If the TBE entry has a valid state, then it is used. Otherwise, the state of the cache entry is looked up and used if valid.

Nilay

On Tue, 30 Nov 2010, Nilay Vaish wrote:

Is it possible to have variables local to a function in .sm files? I am thinking of storing getCacheEntry()'s return value in a local variable.

Nilay
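The lookup order Nilay describes above (TBE first, then the cache entry, otherwise not present) can be written down in a few lines. This is only a sketch: the type names, the state names, and the choice of State_I as the fallback are assumptions for illustration, and null pointers are used here to model "no entry" even though the thread had not yet settled on pointers versus references.

```cpp
#include <cassert>

// Hypothetical states: I (invalid), S (shared), IM (transient, I to M).
enum StateSketch { State_I, State_S, State_IM };

struct TBESketch { StateSketch state = State_I; };
struct CacheEntrySketch { StateSketch state = State_I; };

// Resolve the state once, from entries handed in by the caller, so no
// repeated isTagPresent() calls are needed further down.
StateSketch getState(const TBESketch* tbe, const CacheEntrySketch* cache_entry) {
    if (tbe != nullptr) {
        return tbe->state;          // an in-flight transaction takes priority
    }
    if (cache_entry != nullptr) {
        return cache_entry->state;  // otherwise use the stable cache state
    }
    return State_I;                 // assumed default when neither exists
}
```

A caller that has already looked up both structures for the address would pass them in, and the resolved state would then be handed to doTransition().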
Re: [m5-dev] Implementation of findTagInSet
Is it possible to have variables local to a function in .sm files? I am thinking of storing getCacheEntry()'s return value in a local variable.

Nilay

On Mon, 29 Nov 2010, Beckmann, Brad wrote:

Hi Nilay,

I don't think we want to replace the implicit Address parameter inside the state machines with the CacheEntry parameter, but we might want to supplement the state machine functions to include both. I don't think we can replace the Address parameter because certain transitions within a state machine don't operate on a CacheEntry, but they do operate on an Address. However, as we discussed last week, we might be able to pass the CacheEntry into the trigger function along with the Address, which is then implicitly included in all actions.

The key in my mind is that we want to maintain the current programming invariant that SLICC does not expose pointers, while underneath the generated code needs to handle the fact that the CacheEntry pointer may sometimes equal NULL. In particular, I would like to minimize any added complexity we put on the setState function. I think we can make this work, but we need to think through the details, including how replacements are handled. I have a few other things I need to take care of first, but I may be able to look into the details of how to make this work by the end of the week.

Brad
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

I don't think we want to replace the implicit Address parameter inside the state machines with the CacheEntry parameter, but we might want to supplement the state machine functions to include both. I don't think we can replace the Address parameter because certain transitions within a state machine don't operate on a CacheEntry, but they do operate on an Address. However, as we discussed last week, we might be able to pass the CacheEntry into the trigger function along with the Address, which is then implicitly included in all actions.

The key in my mind is that we want to maintain the current programming invariant that SLICC does not expose pointers, while underneath the generated code needs to handle the fact that the CacheEntry pointer may sometimes equal NULL. In particular, I would like to minimize any added complexity we put on the setState function. I think we can make this work, but we need to think through the details, including how replacements are handled. I have a few other things I need to take care of first, but I may be able to look into the details of how to make this work by the end of the week.

Brad

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish
Sent: Saturday, November 27, 2010 11:40 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Is it not possible to redesign the functions to accept a CacheEntry as a parameter instead of an Address parameter?
Re: [m5-dev] Implementation of findTagInSet
I conducted an experiment to figure out how many calls actually reach the hash table to check whether the given address exists in the cache. For the same setup as before, fewer than 10% of the calls do. That is, out of about 880,000,000 calls to the isTagPresent function, only about 81,000,000 actually go and search the hash table.

I think we should work towards removing some of the redundant calls. I have a partial fix for some portion of the code, but again, it is not a design change. I am unsure how to change the design of Ruby and/or SLICC to get rid of these redundant calls. Brad, do you have something in mind on this?

Thanks
Nilay

On Fri, 26 Nov 2010, Steve Reinhardt wrote:

Hi Nilay,

Good job, this is clearly progress... you've sped up isTagPresent by 2X and the simulation overall by almost 10%. That's nothing to sneeze at. It's sad that isTagPresent is still the top function though.

Can you do some tracing or other experiments to get a feel for whether keeping the last N tags instead of the last 1 (for some small value of N, like 2 or 3) would be useful? Just printing out a trace of calls to isTagPresent should be enough to get a feeling for whether that's worth hacking in a test implementation.

Also, I see a lot of your patch has to do with removing const labels from isTagPresent... this is exactly the scenario the 'mutable' keyword was designed for; if you mark your m_mru_* fields as mutable, then you shouldn't have to remove the const labels from any of the function calls.

Steve

On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I profiled the un-modified and the modified m5 ten times (this time there was no load on the machine). Here are the average results:

                            % time  std. dev  actual time  std. dev
un-modified  isTagPresent    19.99      0.35        47.17      1.23
             cumulative     100         0.00       235.91      3.37
modified     isTagPresent    10.35      0.28        21.22      0.57
             cumulative     100         0.00       205.11      2.94

Below is the patch, though it may not apply cleanly to the current version of m5 since I have a few un-committed patches enqueued.
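Steve's question about keeping the last N tags instead of the last one can be explored with a small stand-alone model before touching CacheMemory. The following is only a sketch under stated assumptions: `TagIndexWithRecent`, its methods and the round-robin replacement of remembered slots are hypothetical and are not part of m5; a `std::unordered_map` stands in for `m5::hash_map`. It counts how many calls fall through to the hash table, which is the same ratio reported above (about 81,000,000 of 880,000,000, roughly 9%).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the tag index inside CacheMemory (not m5 code).
// It remembers the results of the last N lookups, including "not present".
template <std::size_t N>
class TagIndexWithRecent {
  public:
    void insert(std::uint64_t lineAddr, int way) {
        index_[lineAddr] = way;
        remember(lineAddr, way);
    }

    void erase(std::uint64_t lineAddr) {
        index_.erase(lineAddr);
        // Forget everything rather than risk returning a stale way.
        for (auto& slot : recent_) slot.valid = false;
    }

    // Returns the way holding lineAddr, or -1 if it is not present.
    int find(std::uint64_t lineAddr) {
        ++calls_;
        for (const auto& slot : recent_) {
            if (slot.valid && slot.addr == lineAddr) return slot.way;
        }
        ++hashSearches_;  // only now do we pay for the hash table search
        auto it = index_.find(lineAddr);
        int way = (it == index_.end()) ? -1 : it->second;
        remember(lineAddr, way);
        return way;
    }

    std::uint64_t calls() const { return calls_; }
    std::uint64_t hashSearches() const { return hashSearches_; }

  private:
    struct Slot { bool valid; std::uint64_t addr; int way; };

    void remember(std::uint64_t lineAddr, int way) {
        for (auto& slot : recent_) {
            if (slot.valid && slot.addr == lineAddr) { slot.way = way; return; }
        }
        recent_[next_] = Slot{true, lineAddr, way};  // round-robin replacement
        next_ = (next_ + 1) % N;
    }

    std::unordered_map<std::uint64_t, int> index_;
    std::array<Slot, N> recent_{};
    std::size_t next_ = 0;
    std::uint64_t calls_ = 0;
    std::uint64_t hashSearches_ = 0;
};
```

Replaying a trace of isTagPresent addresses through `TagIndexWithRecent<1>`, `<2>` and `<3>` and comparing `hashSearches()` would give a first feel for whether N > 1 is worth a test implementation.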
Re: [m5-dev] Implementation of findTagInSet
Is it not possible to redesign the functions to accept a CacheEntry as a parameter instead of an Address parameter?

On Sat, 27 Nov 2010, Nilay Vaish wrote:

I conducted an experiment to figure out how many calls actually reach the hash table to check whether the given address exists in the cache. For the same setup as before, fewer than 10% of the calls do. That is, out of about 880,000,000 calls to the isTagPresent function, only about 81,000,000 actually go and search the hash table. I think we should work towards removing some of the redundant calls. I have a partial fix for some portion of the code, but again, it is not a design change. I am unsure how to change the design of Ruby and/or SLICC to get rid of these redundant calls. Brad, do you have something in mind on this?

Thanks
Nilay
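The question above, passing the cache entry instead of the address, can be illustrated with a toy model. This is a sketch under assumptions, not a proposal for the actual SLICC-generated code: `ToyCache`, `Entry`, `State`, `lookupPtr` and `upgradeToModified` are hypothetical names, and whether pointers should be visible at this level at all is still under discussion in this thread. The point is only that one lookup can serve both a state read and a state write.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, simplified types; the real entries are protocol specific.
enum class State { Invalid, Shared, Modified };
struct Entry { State state = State::Invalid; };

class ToyCache {
  public:
    void allocate(std::uint64_t lineAddr) { entries_[lineAddr] = Entry{}; }

    // Single search: returns the entry, or nullptr if the line is absent.
    Entry* lookupPtr(std::uint64_t lineAddr) {
        ++lookups_;
        auto it = entries_.find(lineAddr);
        return it == entries_.end() ? nullptr : &it->second;
    }

    int lookups() const { return lookups_; }

  private:
    std::unordered_map<std::uint64_t, Entry> entries_;
    int lookups_ = 0;
};

// These take the entry, not the address, so they never search the cache.
State getState(const Entry* entry) {
    return entry ? entry->state : State::Invalid;
}

void setState(Entry* entry, State newState) {
    if (entry) entry->state = newState;
}

// One "transition": the entry is looked up once and then handed around.
State upgradeToModified(ToyCache& cache, std::uint64_t lineAddr) {
    Entry* entry = cache.lookupPtr(lineAddr);
    State before = getState(entry);
    setState(entry, State::Modified);
    return before;
}
```

With address-based accessors, the same transition would search the cache at least once in each of getState and setState.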
Re: [m5-dev] Implementation of findTagInSet
I profiled the un-modified and the modified m5 ten times (this time there was no load on the machine). Here are the average results:

                            % time  std. dev  actual time  std. dev
un-modified  isTagPresent    19.99      0.35        47.17      1.23
             cumulative     100         0.00       235.91      3.37
modified     isTagPresent    10.35      0.28        21.22      0.57
             cumulative     100         0.00       205.11      2.94

Below is the patch, though it may not apply cleanly to the current version of m5 since I have a few un-committed patches enqueued.

# HG changeset patch
# Parent 7ac53378e03b5116c48e6076167de6a2a2e06158

diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
--- a/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 17:30:58 2010 -0600
@@ -84,6 +84,8 @@
             m_locked[i][j] = -1;
         }
     }
+
+    m_valid_mru_address = false;
 }
 
 CacheMemory::~CacheMemory()
@@ -135,15 +137,26 @@
 // Given a cache index: returns the index of the tag in a set.
 // returns -1 if the tag is not found.
 int
-CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
+CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
 {
     assert(tag == line_address(tag));
+
+    if(m_valid_mru_address && m_mru_address == tag) return m_mru_tag_index;
+
     // search the set for the tags
+    m_valid_mru_address = true;
+    m_mru_address.setAddress(tag.getAddress());
+
     m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag);
     if (it != m_tag_index.end())
         if (m_cache[cacheSet][it->second]->m_Permission !=
             AccessPermission_NotPresent)
+        {
+            m_mru_tag_index = it->second;
             return it->second;
+        }
+
+    m_mru_tag_index = -1;
     return -1; // Not found
 }
 
@@ -215,7 +228,7 @@
 
 // tests to see if an address is present in the cache
 bool
-CacheMemory::isTagPresent(const Address& address) const
+CacheMemory::isTagPresent(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
@@ -276,6 +289,10 @@
             m_locked[cacheSet][i] = -1;
             m_tag_index[address] = i;
 
+            m_valid_mru_address = true;
+            m_mru_address.setAddress(address.getAddress());
+            m_mru_tag_index = i;
+
             m_replacementPolicy_ptr->
                 touch(cacheSet, i, g_eventQueue_ptr->getTime());
 
@@ -300,6 +317,8 @@
                 address);
         m_locked[cacheSet][loc] = -1;
         m_tag_index.erase(address);
+
+        m_valid_mru_address = false;
     }
 }
 
@@ -327,18 +346,18 @@
 }
 
 // looks an address up in the cache
-const AbstractCacheEntry&
-CacheMemory::lookup(const Address& address) const
+/*const AbstractCacheEntry&
+CacheMemory::lookup(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
     int loc = findTagInSet(cacheSet, address);
     assert(loc != -1);
     return *m_cache[cacheSet][loc];
-}
+}*/
 
 AccessPermission
-CacheMemory::getPermission(const Address& address) const
+CacheMemory::getPermission(const Address& address)
 {
     assert(address == line_address(address));
     return lookup(address).m_Permission;
diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
--- a/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 17:30:58 2010 -0600
@@ -74,7 +74,7 @@
                         DataBlock*& data_ptr);
 
     // tests to see if an address is present in the cache
-    bool isTagPresent(const Address& address) const;
+    bool isTagPresent(const Address& address);
 
     // Returns true if there is:
     //   a) a tag match on this address or there is
@@ -92,10 +92,10 @@
 
     // looks an address up in the cache
     AbstractCacheEntry& lookup(const Address& address);
-    const AbstractCacheEntry& lookup(const Address& address) const;
+    //const AbstractCacheEntry& lookup(const Address& address) const;
 
     // Get/Set permission of cache block
-    AccessPermission getPermission(const Address& address) const;
+    AccessPermission getPermission(const Address& address);
     void changePermission(const Address& address, AccessPermission new_perm);
 
     int getLatency() const { return m_latency; }
@@ -138,7 +138,7 @@
 
     // Given a cache tag: returns the index of the tag in a set.
     // returns -1 if the tag is not found.
-    int findTagInSet(Index line, const Address& tag) const;
+    int findTagInSet(Index line, const Address& tag);
     int findTagInSetIgnorePermissions(Index cacheSet,
                                       const Address& tag) const;
 
@@ -170,6 +170,10 @@
     int m_cache_num_set_bits;
     int m_cache_assoc;
     int m_start_index_bit;
+
+    Address m_mru_address;
+    int m_mru_tag_index;
+    bool m_valid_mru_address;
 };
 
 #endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__

On Thu,
Re: [m5-dev] Implementation of findTagInSet
Hi Nilay,

Good job, this is clearly progress... you've sped up isTagPresent by 2X and the simulation overall by almost 10%. That's nothing to sneeze at. It's sad that isTagPresent is still the top function though.

Can you do some tracing or other experiments to get a feel for whether keeping the last N tags instead of the last 1 (for some small value of N, like 2 or 3) would be useful? Just printing out a trace of calls to isTagPresent should be enough to get a feeling for whether that's worth hacking in a test implementation.

Also, I see a lot of your patch has to do with removing const labels from isTagPresent... this is exactly the scenario the 'mutable' keyword was designed for; if you mark your m_mru_* fields as mutable, then you shouldn't have to remove the const labels from any of the function calls.

Steve

On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I profiled the un-modified and the modified m5 ten times (this time there was no load on the machine). Here are the average results:

                            % time  std. dev  actual time  std. dev
un-modified  isTagPresent    19.99      0.35        47.17      1.23
             cumulative     100         0.00       235.91      3.37
modified     isTagPresent    10.35      0.28        21.22      0.57
             cumulative     100         0.00       205.11      2.94

Below is the patch, though it may not apply cleanly to the current version of m5 since I have a few un-committed patches enqueued.
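Steve's point about `mutable` can be shown in isolation. The sketch below is a hypothetical miniature, not the CacheMemory class: `ConstLookupCache` and its member names are made up for illustration, and `std::unordered_map` replaces `m5::hash_map`. The lookup functions stay `const` while the remembered most-recent result, which is bookkeeping rather than logical state, is declared `mutable`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical miniature of a tag store with a one-entry "last lookup" memo.
class ConstLookupCache {
  public:
    void insert(std::uint64_t lineAddr, int way) {
        tagIndex_[lineAddr] = way;
        mruValid_ = true;
        mruAddr_ = lineAddr;
        mruWay_ = way;
    }

    void erase(std::uint64_t lineAddr) {
        tagIndex_.erase(lineAddr);
        mruValid_ = false;  // never keep a memo that might be stale
    }

    // const, yet allowed to update the memo because those fields are mutable.
    int findTag(std::uint64_t lineAddr) const {
        if (mruValid_ && mruAddr_ == lineAddr) return mruWay_;
        ++hashSearches_;
        auto it = tagIndex_.find(lineAddr);
        mruWay_ = (it == tagIndex_.end()) ? -1 : it->second;
        mruAddr_ = lineAddr;
        mruValid_ = true;
        return mruWay_;
    }

    bool isTagPresent(std::uint64_t lineAddr) const {
        return findTag(lineAddr) != -1;
    }

    int hashSearches() const { return hashSearches_; }

  private:
    std::unordered_map<std::uint64_t, int> tagIndex_;
    mutable bool mruValid_ = false;
    mutable std::uint64_t mruAddr_ = 0;
    mutable int mruWay_ = -1;
    mutable int hashSearches_ = 0;  // instrumentation only
};
```

Callers holding a `const ConstLookupCache&` can keep calling `isTagPresent` unchanged, which is why the const qualifiers would not need to be removed from the existing signatures.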
Re: [m5-dev] Implementation of findTagInSet
Brad and I had a discussion on Tuesday. We are still thinking about how to resolve this issue. As a stop-gap arrangement, I added a couple of variables to the CacheMemory class which track the last address for which the lookup was performed. I am posting the results from profiling before and after the change. I compiled m5 with the MOESI_hammer protocol and the simulation was allowed to run for 20,000,000,000 ticks. I would suggest not looking at the absolute time values, for they vary depending on the load on the machine.

Before the change:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call   s/call  name
18.27     61.32    61.32  888688475    0.00     0.00  CacheMemory::isTagPresent(Address const&) const
 5.97     81.36    20.04  219389124    0.00     0.00  Histogram::add(long long)
 2.99     91.39    10.03  204574578    0.00     0.00  CacheMemory::lookup(Address const&)
 2.56     99.97     8.58   12852725    0.00     0.00  MemoryControl::executeCycle()
 2.51    108.38     8.41   45887816    0.00     0.00  L1Cache_Controller::wakeup()

After the change:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call   s/call  name
11.38     41.64    41.64  888688475    0.00     0.00  CacheMemory::isTagPresent(Address const&)
 5.99     63.55    21.91  219389124    0.00     0.00  Histogram::add(long long)
 2.90     74.16    10.61   45887816    0.00     0.00  L1Cache_Controller::wakeup()
 2.76     84.25    10.09   12852725    0.00     0.00  MemoryControl::executeCycle()
 2.49     93.36     9.11   34522950    0.00     0.00  BaseSimpleCPU::preExecute()

I can post the patch on the review board if this looks good.

--
Nilay

On Tue, 23 Nov 2010, Nilay Vaish wrote:

Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay

On Tue, 23 Nov 2010, Steve Reinhardt wrote:

Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access; I don't know if that's just an API change or if that's something that needs to be folded into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by building a software cache into isTagPresent(), by storing the last address looked up along with a pointer to the block, then just checking on each call to see if we're looking up the same address as last time and if so just returning the same pointer before resorting to the hash table. I hope that doesn't lead to any coherence problems with the block changing out from under this cached copy... if so, perhaps an additional block check is required on hits.

Steve
Re: [m5-dev] Implementation of findTagInSet
Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access; I don't know if that's just an API change or if that's something that needs to be folded into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on it, or likely to work on it again?) In the short term, it's possible that some of the overhead can be avoided by building a software cache into isTagPresent(), by storing the last address looked up along with a pointer to the block, then just checking on each call to see if we're looking up the same address as last time and if so just returning the same pointer before resorting to the hash table. I hope that doesn't lead to any coherence problems with the block changing out from under this cached copy... if so, perhaps an additional block check is required on hits. Steve On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote: I was looking at the MOESI hammer protocol. I think Steve's observation that extra tag lookups are going on in the cache is correct. In particular I noticed that in the getState() and setState() functions, first isTagPresent(address) is called and on the basis of the result (which is true or false), getCacheEntry(address) is called. Surprisingly, the getCacheEntry() function calls the isTagPresent() function again. These calls are in the file src/mem/protocol/MOESI_hammer-cache.sm Thanks Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Implementation of findTagInSet
Brad and I will be having a discussion today on how to resolve this issue. -- Nilay On Tue, 23 Nov 2010, Steve Reinhardt wrote: Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access; I don't know if that's just an API change or if that's something that needs to be folded into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on it, or likely to work on it again?) In the short term, it's possible that some of the overhead can be avoided by building a software cache into isTagPresent(), by storing the last address looked up along with a pointer to the block, then just checking on each call to see if we're looking up the same address as last time and if so just returning the same pointer before resorting to the hash table. I hope that doesn't lead to any coherence problems with the block changing out from under this cached copy... if so, perhaps an additional block check is required on hits. Steve On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote: I was looking at the MOESI hammer protocol. I think Steve's observation that extra tag lookups are going on in the cache is correct. In particular I noticed that in the getState() and setState() functions, first isTagPresent(address) is called and on the basis of the result (which is true or false), getCacheEntry(address) is called. Surprisingly, the getCacheEntry() function calls the isTagPresent() function again. These calls are in the file src/mem/protocol/MOESI_hammer-cache.sm Thanks Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Implementation of findTagInSet
I'm not the guy to ask for that... but actually I doubt the protocol itself matters that much, you just need to look at the code path that gets exercised on an L1 cache hit and see where the calls are. That part should be almost if not entirely independent of the coherence protocol. Steve On Tue, Nov 16, 2010 at 7:13 AM, Nilay Vaish ni...@cs.wisc.edu wrote: I profiled M5 using MOESI_CMP_directory and MOESI_CMP_token protocols. isTagPresent() dominates in both of these protocols as well. But the percentage of simulation time used is lesser (varies from 10% to 16%). I will take a look at the assembly code for the direct cache set indexing approach. In order to reduce the number of calls made to tag lookup, I would need to read about the protocol itself. Can you point some documentation on MOESI_hammer protocol? -- Nilay On Fri, 12 Nov 2010, Steve Reinhardt wrote: On Fri, Nov 12, 2010 at 1:10 PM, Nilay Vaish ni...@cs.wisc.edu wrote: I tried couple of ideas for improving the performance of the findTagInSet() function. These include having one hash table per cache set and replacing the hash table with a two dimensional array indexed using cache set, cache way. Neither of these ideas showed significant enough change in the percentage of the time taken by isTagPresent() function to come to a definite conclusion. I looked the assembly code generated for the isTagPresent() function. Note that the compiler inlines the funtion findTagIsPresent(). The assembly code is about 100 lines long. Again, the find function for std::hash_map gets inlined. There is one division operation in the code and several load operations. If we were to assume that the hash function is able to keep the average occupancy of each bucket close to 1, then I think the time taken by the function would be determined by the time taken by the loads. It might be that a lot of loads end up missing in the cache. 
I'm a little surprised that the direct cache set indexing approach was not faster since I'd think that would be far less than 100 instructions, but you're right that issues like whether loads hit or miss in the cache will have a large impact. As far as reordering the tags is concerned, since the hash_map is not directly under our control, we will have to delete the tag entry in the hash table and insert it again to make sure that it is the first entry that is searched. The reordering only makes sense if we replace the hash table with a more conventional tag array. Right now I am profiling with coherence protocol as MOESI_hammer. I am thinking of profiling using a different protocol to make sure that it is not an artifact of the protocol in use. That sounds like a good idea. All in all, we would ideally like to both speed up individual calls and reduce the number of calls. IIRC, gprof indicated that findTagInSet() was called 4-5X more frequently than there were cache accesses, which makes no sense to me; it seems like a typical cache hit should only require a single tag lookup. That's another thing to keep in mind, is that typical programs really have very high cache hit rates, so another approach is to look at what happens in the process of servicing an L1 cache hit and optimize that path as much as possible. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Implementation of findTagInSet
On Fri, 12 Nov 2010, Steve Reinhardt wrote: Right now I am profiling with coherence protocol as MOESI_hammer. I am thinking of profiling using a different protocol to make sure that it is not an artifact of the protocol in use. That sounds like a good idea. All in all, we would ideally like to both speed up individual calls and reduce the number of calls. IIRC, gprof indicated that findTagInSet() was called 4-5X more frequently than there were cache accesses, which makes no sense to me; it seems like a typical cache hit should only require a single tag lookup. Should this not be true in case a multiprocessor system is being simulated? I am not aware the configuration that ruby_fs.py makes use of. That's another thing to keep in mind, is that typical programs really have very high cache hit rates, so another approach is to look at what happens in the process of servicing an L1 cache hit and optimize that path as much as possible. Steve -- Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Implementation of findTagInSet
I was looking at the MOESI hammer protocol. I think Steve's observation that extra tag lookups are going on in the cache is correct. In particular I noticed that in the getState() and setState() functions, first isTagPresent(address) is called and on the basis of the result (which is true or false), getCacheEntry(address) is called. Surprisingly, the getCacheEntry() function calls the isTagPresent() function again. These calls are in the file src/mem/protocol/MOESI_hammer-cache.sm Thanks Nilay On Tue, 16 Nov 2010, Nilay Vaish wrote: I profiled M5 using MOESI_CMP_directory and MOESI_CMP_token protocols. isTagPresent() dominates in both of these protocols as well. But the percentage of simulation time used is lesser (varies from 10% to 16%). I will take a look at the assembly code for the direct cache set indexing approach. In order to reduce the number of calls made to tag lookup, I would need to read about the protocol itself. Can you point some documentation on MOESI_hammer protocol? -- Nilay On Fri, 12 Nov 2010, Steve Reinhardt wrote: On Fri, Nov 12, 2010 at 1:10 PM, Nilay Vaish ni...@cs.wisc.edu wrote: I tried couple of ideas for improving the performance of the findTagInSet() function. These include having one hash table per cache set and replacing the hash table with a two dimensional array indexed using cache set, cache way. Neither of these ideas showed significant enough change in the percentage of the time taken by isTagPresent() function to come to a definite conclusion. I looked the assembly code generated for the isTagPresent() function. Note that the compiler inlines the funtion findTagIsPresent(). The assembly code is about 100 lines long. Again, the find function for std::hash_map gets inlined. There is one division operation in the code and several load operations. 
If we were to assume that the hash function is able to keep the average occupancy of each bucket close to 1, then I think the time taken by the function would be determined by the time taken by the loads. It might be that a lot of loads end up missing in the cache. I'm a little surprised that the direct cache set indexing approach was not faster since I'd think that would be far less than 100 instructions, but you're right that issues like whether loads hit or miss in the cache will have a large impact. As far as reordering the tags is concerned, since the hash_map is not directly under our control, we will have to delete the tag entry in the hash table and insert it again to make sure that it is the first entry that is searched. The reordering only makes sense if we replace the hash table with a more conventional tag array. Right now I am profiling with coherence protocol as MOESI_hammer. I am thinking of profiling using a different protocol to make sure that it is not an artifact of the protocol in use. That sounds like a good idea. All in all, we would ideally like to both speed up individual calls and reduce the number of calls. IIRC, gprof indicated that findTagInSet() was called 4-5X more frequently than there were cache accesses, which makes no sense to me; it seems like a typical cache hit should only require a single tag lookup. That's another thing to keep in mind, is that typical programs really have very high cache hit rates, so another approach is to look at what happens in the process of servicing an L1 cache hit and optimize that path as much as possible. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
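The redundant lookups in getState() described at the top of this message can be made visible with a toy model that counts tag searches. Everything here is hypothetical: `CountingCache`, `getStateRedundant`, `getStateSingle` and the integer "state" are illustrative stand-ins, and the assumption that getCacheEntry performs its own search after re-checking presence is mine, not something confirmed from MOESI_hammer-cache.sm.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

constexpr int kNotPresent = -1;  // hypothetical "no such line" state id

// Toy model that counts every search of the tag store.
class CountingCache {
  public:
    void set(std::uint64_t lineAddr, int state) { states_[lineAddr] = state; }

    bool isTagPresent(std::uint64_t lineAddr) const {
        return probe(lineAddr) != nullptr;
    }

    // Mirrors the reported pattern: the accessor checks presence again
    // before fetching the entry. Callers must ensure the line is present.
    int getCacheEntry(std::uint64_t lineAddr) const {
        (void)isTagPresent(lineAddr);
        return *probe(lineAddr);
    }

    // Pattern from the message: presence test, then the accessor.
    int getStateRedundant(std::uint64_t lineAddr) const {
        return isTagPresent(lineAddr) ? getCacheEntry(lineAddr) : kNotPresent;
    }

    // Alternative: one search whose result answers both questions.
    int getStateSingle(std::uint64_t lineAddr) const {
        const int* state = probe(lineAddr);
        return state ? *state : kNotPresent;
    }

    int probes() const { return probes_; }
    void resetProbes() { probes_ = 0; }

  private:
    const int* probe(std::uint64_t lineAddr) const {
        ++probes_;
        auto it = states_.find(lineAddr);
        return it == states_.end() ? nullptr : &it->second;
    }

    std::unordered_map<std::uint64_t, int> states_;
    mutable int probes_ = 0;
};
```

In this model a hit costs three searches through the redundant path and one through the single-search path, which is in the same direction as the 4-5X call inflation Steve recalled from gprof.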
Re: [m5-dev] Implementation of findTagInSet
I tried couple of ideas for improving the performance of the findTagInSet() function. These include having one hash table per cache set and replacing the hash table with a two dimensional array indexed using cache set, cache way. Neither of these ideas showed significant enough change in the percentage of the time taken by isTagPresent() function to come to a definite conclusion. I looked the assembly code generated for the isTagPresent() function. Note that the compiler inlines the funtion findTagIsPresent(). The assembly code is about 100 lines long. Again, the find function for std::hash_map gets inlined. There is one division operation in the code and several load operations. If we were to assume that the hash function is able to keep the average occupancy of each bucket close to 1, then I think the time taken by the function would be determined by the time taken by the loads. It might be that a lot of loads end up missing in the cache. As far as reordering the tags is concerned, since the hash_map is not directly under our control, we will have to delete the tag entry in the hash table and insert it again to make sure that it is the first entry that is searched. Right now I am profiling with coherence protocol as MOESI_hammer. I am thinking of profiling using a different protocol to make sure that it is not an artifact of the protocol in use. -- Nilay On Fri, 5 Nov 2010, Steve Reinhardt wrote: If that's where a significant amount of time is being spent, we need to either call it less or make it run faster :-). Doing both is even better. At a high level, the process of looking something up in an N-way associative cache should not take that many instructions if N is small (a shift and an add to find the tag index, at most N loads and compares to match the tags). If we reorder the tags to search the MRU block first then we will probabilistically keep the average number of tags searched well below N. 
Steve On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish ni...@cs.wisc.edu wrote: I had another look at the profile output. On the machine that I am using (a 3.2 GHz Pentium 4), each call to isTagPresent() take about 57 ns. Assuming that the pipeline is functioning at is best, I think the number of uops executed would be ~500. Is that too much for this function? -- Nilay On Fri, 5 Nov 2010, Nilay Vaish wrote: Do you know what hash function is in use? Seems to me that the default hash function is to hash to self. May be we should test with a different hash function. -- Nilay On Fri, 5 Nov 2010, Steve Reinhardt wrote: You can look at the call graph profile further down in the gprof output to figure out how much time is spent in functions that get called from isTagPresent. If it's not specifically calling out findTagInSet, it may be because it's inlined in isTagPresent. Steve On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote: I ran ALPHA_FS_MOESI_hammer using the following command -- ./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py I don't know how the benchmark is picked in case none is specified. Below is the gprof output -- % cumulative self self total time seconds secondscalls s/call s/call name 19.72 51.2251.22 925285266 0.00 0.00 CacheMemory::isTagPresent(Address const) const 5.59 65.7414.52 229035720 0.00 0.00 Histogram::add(long long) 3.57 75.02 9.28 212664644 0.00 0.00 CacheMemory::lookup(Address const) 2.53 81.59 6.57 47830136 0.00 0.00 L1Cache_Controller::wakeup() The output shows that about a fifth of the time is spent in the isTagPresent() function. 
bool CacheMemory::isTagPresent(const Address address) const { assert(address == line_address(address)); Index cacheSet = addressToCacheSet(address); int loc = findTagInSet(cacheSet, address); if (loc == -1) { // We didn't find the tag DPRINTF(RubyCache, No tag match for address: %s\n, address); return false; } DPRINTF(RubyCache, address: %s found\n, address); return true; } Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert() and the DPRINTF() will not get compiled. The addressToCacheSet() function does some bitwise operations and some arithmetic operations. So it is expected that it would not consume much time. So, most likely the findTagInSet() function takes a major portion of the overall time required by the isTagPresent() function. -- Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
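Steve's description of a conventional lookup (a shift to find the set, at most N compares, MRU block searched first) can be sketched as follows. This is an illustrative model only: `TagArray` and its members are hypothetical, the whole line number is stored as the tag for simplicity, and `insert` assumes the line is not already present. It is not the two-dimensional array variant that was actually tried.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kEmptyTag = ~std::uint64_t{0};  // marks an unused way

// Hypothetical set-associative tag array kept in MRU-first order per set.
class TagArray {
  public:
    TagArray(unsigned lineBits, unsigned setBits, unsigned ways)
        : lineBits_(lineBits),
          setMask_((std::uint64_t{1} << setBits) - 1),
          ways_(ways),
          tags_(static_cast<std::size_t>(ways) << setBits, kEmptyTag) {}

    // Shift and mask to find the set, then at most ways_ compares.
    bool lookup(std::uint64_t addr) {
        std::uint64_t line = addr >> lineBits_;
        std::uint64_t* set = &tags_[(line & setMask_) * ways_];
        for (unsigned w = 0; w < ways_; ++w) {
            ++compares_;
            if (set[w] == line) {
                // Move the hit to the front so it is compared first next time.
                for (unsigned k = w; k > 0; --k) set[k] = set[k - 1];
                set[0] = line;
                return true;
            }
        }
        return false;
    }

    // Places the line at the MRU position; the last way falls out.
    void insert(std::uint64_t addr) {
        std::uint64_t line = addr >> lineBits_;
        std::uint64_t* set = &tags_[(line & setMask_) * ways_];
        for (unsigned k = ways_ - 1; k > 0; --k) set[k] = set[k - 1];
        set[0] = line;
    }

    std::uint64_t compares() const { return compares_; }

  private:
    unsigned lineBits_;
    std::uint64_t setMask_;
    unsigned ways_;
    std::vector<std::uint64_t> tags_;
    std::uint64_t compares_ = 0;
};
```

Constructing it as `TagArray(6, 8, 2)` models a 256-set, 2-way cache with 64-byte lines; repeated hits to the same line then cost one compare each.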
Re: [m5-dev] Implementation of findTagInSet
I went through the implementation of hash_map in C++. I realized that the number of buckets gets resized on the fly as the number of elements increases. This means that we would have more than num_of_cache_sets * num_ways buckets in the hash table.

--
Nilay

On Sat, 6 Nov 2010, Nilay Vaish wrote:

I am still digging into how the cache configuration is specified. Currently, the system has four caches: two 32 KB, 256-set, 2-way set associative caches, one 2 MB, 2048-set, 16-way set associative cache, and one 4 MB, 16384-set, 4-way set associative cache. Each of these caches makes use of a hash table having only 193 buckets. This seems too low to me. Clearly, if the cache is being used to full capacity, lookups in the hash table will have to go through many entries before figuring out whether the address being searched for is in the cache or not.

--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:

If that's where a significant amount of time is being spent, we need to either call it less or make it run faster :-). Doing both is even better. At a high level, the process of looking something up in an N-way associative cache should not take that many instructions if N is small (a shift and an add to find the tag index, at most N loads and compares to match the tags). If we reorder the tags to search the MRU block first then we will probabilistically keep the average number of tags searched well below N.

Steve

On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I had another look at the profile output. On the machine that I am using (a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns. Assuming that the pipeline is functioning at its best, I think the number of uops executed would be ~500. Is that too much for this function?

--
Nilay

On Fri, 5 Nov 2010, Nilay Vaish wrote:

Do you know what hash function is in use? It seems to me that the default hash function is to hash to self. Maybe we should test with a different hash function.

--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:

You can look at the call graph profile further down in the gprof output to figure out how much time is spent in functions that get called from isTagPresent. If it's not specifically calling out findTagInSet, it may be because it's inlined in isTagPresent.

Steve
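The numbers quoted in this message can be cross-checked with a few lines of arithmetic. The helper names below are hypothetical and the code only restates figures from the thread. At 3.2 GHz, 57 ns is about 182 cycles, so the "~500 uops" estimate corresponds to roughly three uops per cycle, a rate that is my assumption and not stated in the thread. The gprof figures (51.22 s of self time over 925,285,266 calls) give about 55 ns per call, in line with the quoted 57 ns. The last helper shows how crowded a table would be if it really stayed at 193 buckets, which the resizing noted above suggests it does not.

```cpp
#include <cassert>
#include <cmath>

// Cycles spent per call: nanoseconds times cycles per nanosecond (GHz).
double cyclesPerCall(double nsPerCall, double clockGHz) {
    return nsPerCall * clockGHz;
}

// Average self time per call in nanoseconds, from gprof's flat profile.
double nsPerCall(double selfSeconds, double calls) {
    return selfSeconds / calls * 1e9;
}

// Average entries per bucket for a full cache if the bucket count were fixed.
double linesPerBucket(double sets, double ways, double buckets) {
    return sets * ways / buckets;
}
```

For example, `cyclesPerCall(57.0, 3.2)` is 182.4, and `linesPerBucket(2048, 16, 193)` is about 170 entries per bucket for the 2 MB cache under the fixed-193 assumption.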
Re: [m5-dev] Implementation of findTagInSet
I ran ALPHA_FS_MOESI_hammer using the following command:

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

I don't know how the benchmark is picked when none is specified. Below is the gprof output:

  %   cumulative   self              self     total
 time   seconds   seconds     calls  s/call   s/call  name
19.72     51.22    51.22  925285266    0.00     0.00  CacheMemory::isTagPresent(Address const&) const
 5.59     65.74    14.52  229035720    0.00     0.00  Histogram::add(long long)
 3.57     75.02     9.28  212664644    0.00     0.00  CacheMemory::lookup(Address const&)
 2.53     81.59     6.57   47830136    0.00     0.00  L1Cache_Controller::wakeup()

The output shows that about a fifth of the time is spent in the isTagPresent() function.

bool
CacheMemory::isTagPresent(const Address& address) const
{
    assert(address == line_address(address));
    Index cacheSet = addressToCacheSet(address);
    int loc = findTagInSet(cacheSet, address);

    if (loc == -1) {
        // We didn't find the tag
        DPRINTF(RubyCache, "No tag match for address: %s\n", address);
        return false;
    }
    DPRINTF(RubyCache, "address: %s found\n", address);
    return true;
}

Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert() and the DPRINTF() calls are compiled out. The addressToCacheSet() function does some bitwise operations and some arithmetic operations, so it is not expected to consume much time. So, most likely, the findTagInSet() function takes a major portion of the overall time required by the isTagPresent() function.

--
Nilay

On Thu, 4 Nov 2010, Steve Reinhardt wrote:

You also have to build a binary that supports ruby, like ALPHA_FS_MOESI_hammer. If you can't get that to work, try ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the workload you run doesn't really matter that much as long as it's long enough to get a meaningful profile.

Steve
Re: [m5-dev] Implementation of findTagInSet
You can look at the call graph profile further down in the gprof output to figure out how much time is spent in functions that get called from isTagPresent. If it's not specifically calling out findTagInSet, it may be because it's inlined in isTagPresent.

Steve
Re: [m5-dev] Implementation of findTagInSet
Do you know what hash function is in use? It seems to me that the default hash function is to hash to self. Maybe we should test with a different hash function.

-- Nilay
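If the hash in use really is the identity, as suspected above, one experiment is to plug in a functor that mixes the address bits. The sketch below is only an illustration under stated assumptions: it uses std::unordered_map rather than whatever hash map CacheMemory actually uses, it assumes 64-byte lines (6 offset bits), and LineAddressHash, TagMap and findWay are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical hash functor for line-aligned addresses.
struct LineAddressHash
{
    // Assumption: 64-byte cache lines, so the low 6 bits are always zero.
    static constexpr unsigned kBlockOffsetBits = 6;

    std::size_t operator()(std::uint64_t line_addr) const
    {
        // Drop the offset bits that carry no information.
        std::uint64_t x = line_addr >> kBlockOffsetBits;
        // Multiply by a large odd constant and fold the high half down so
        // that nearby lines spread across buckets.
        x *= 0x9E3779B97F4A7C15ULL;
        x ^= x >> 32;
        return static_cast<std::size_t>(x);
    }
};

// Maps a line address to the way that currently holds it.
using TagMap = std::unordered_map<std::uint64_t, int, LineAddressHash>;

// Returns the way for line_addr, or -1 when the tag is absent.
inline int findWay(const TagMap &tags, std::uint64_t line_addr)
{
    auto it = tags.find(line_addr);
    return it == tags.end() ? -1 : it->second;
}
```

Whether this helps at all depends on how the map chooses its bucket count, so it would need to be profiled against the identity hash rather than assumed to be faster.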
Re: [m5-dev] Implementation of findTagInSet
I had another look at the profile output. On the machine that I am using (a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns. Assuming that the pipeline is functioning at its best, I think the number of uops executed would be ~500. Is that too much for this function?

-- Nilay
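The per-call estimate can be re-derived from the flat profile quoted in this thread. The sketch below divides the self time of isTagPresent() by its call count, which gives roughly 55 ns per call, in the same range as the figure of about 57 ns. The peak of 3 uops per cycle is an assumption of this sketch, not something stated in the thread, and all function names are hypothetical.

```cpp
#include <cassert>

// Figures copied from the gprof flat profile for isTagPresent().
const double kSelfSeconds = 51.22;       // self time in seconds
const double kCalls = 925285266.0;       // number of calls
const double kClockGHz = 3.2;            // clock of the profiling machine

// Assumption (not from the thread): at best 3 uops complete per cycle.
const double kAssumedUopsPerCycle = 3.0;

// Average time per call in nanoseconds.
inline double nsPerCall(double self_seconds, double calls)
{
    assert(calls > 0.0);
    return self_seconds / calls * 1e9;
}

// Cycles per call: nanoseconds times cycles per nanosecond (GHz).
inline double cyclesPerCall(double ns, double clock_ghz)
{
    return ns * clock_ghz;
}

// Upper bound on uops per call if the pipeline ran at its peak rate.
inline double peakUops(double cycles, double uops_per_cycle)
{
    return cycles * uops_per_cycle;
}
```

Under these assumptions the result is on the order of 180 cycles and a little over 500 uops per call, which matches the ~500 estimate. It is an upper bound: cache misses and branch mispredictions inside the lookup would mean far fewer uops actually executed in the same time.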
Re: [m5-dev] Implementation of findTagInSet
If that's where a significant amount of time is being spent, we need to either call it less or make it run faster :-). Doing both is even better.

At a high level, the process of looking something up in an N-way associative cache should not take that many instructions if N is small (a shift and an add to find the tag index, at most N loads and compares to match the tags). If we reorder the tags to search the MRU block first then we will probabilistically keep the average number of tags searched well below N.

Steve
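The lookup Steve describes (index the set, compare at most N tags, keep the MRU tag first) can be sketched as follows. This is a simplified illustration, not the CacheMemory implementation: TagStore and its members are hypothetical names, tags are stored as full line addresses, and the set count is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marker for an empty way; a line-aligned address can never equal it.
const std::uint64_t kInvalidTag = ~std::uint64_t(0);

// Hypothetical tag store: each set is a small array kept in MRU order.
class TagStore
{
  public:
    TagStore(std::size_t num_sets, std::size_t assoc, unsigned block_bits)
        : m_numSets(num_sets), m_assoc(assoc), m_blockBits(block_bits),
          m_tags(num_sets * assoc, kInvalidTag)
    {
        // A power-of-two set count lets the index be a shift and a mask.
        assert(num_sets > 0 && (num_sets & (num_sets - 1)) == 0);
        assert(assoc > 0);
    }

    // Returns true on a hit and moves the matching tag to way 0 of its set.
    bool lookup(std::uint64_t line_addr)
    {
        std::uint64_t *set = &m_tags[setIndex(line_addr) * m_assoc];
        for (std::size_t way = 0; way < m_assoc; ++way) {
            if (set[way] == line_addr) {
                // Slide the more recently used tags down one slot.
                for (; way > 0; --way)
                    set[way] = set[way - 1];
                set[0] = line_addr;
                return true;
            }
        }
        return false;
    }

    // Installs a line as MRU; the tag in the last (LRU) way is dropped.
    void insert(std::uint64_t line_addr)
    {
        if (lookup(line_addr))
            return;
        std::uint64_t *set = &m_tags[setIndex(line_addr) * m_assoc];
        for (std::size_t way = m_assoc - 1; way > 0; --way)
            set[way] = set[way - 1];
        set[0] = line_addr;
    }

  private:
    std::size_t setIndex(std::uint64_t line_addr) const
    {
        return (line_addr >> m_blockBits) & (m_numSets - 1);
    }

    std::size_t m_numSets;
    std::size_t m_assoc;
    unsigned m_blockBits;
    std::vector<std::uint64_t> m_tags;  // m_numSets * m_assoc entries
};
```

A hit on the MRU way costs one compare; the move-to-front on other hits costs a few stores but keeps the expected search length short when accesses have temporal locality. In a real cache the per-way data and state would have to move with the tag, or an indirection would be needed, which this sketch leaves out.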
Re: [m5-dev] Implementation of findTagInSet
I tried running ruby_fs.py, below is the error message that I received. I don't think there is any documentation or mailing list discussion on how to run ruby_fs.py. To me it seems that some parameter relating to the DMA controller is missing from the command I tried out.

-- Nilay

    ./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1 --detailed --caches --l2cache -F 50

    M5 Simulator System
    Copyright (c) 2001-2008 The Regents of The University of Michigan
    All Rights Reserved

    M5 compiled Nov 3 2010 18:10:26
    M5 revision 3b2f82286e5d 7724 default WarnPatch qtip tip
    M5 started Nov 4 2010 08:50:01
    M5 executing on scamorza.cs.wisc.edu
    command line: ./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1 --detailed --caches --l2cache -F 50
    Error: could not create sytem for ruby protocol MI_example
    Traceback (most recent call last):
      File "<string>", line 1, in ?
      File "/scratch/nilay/GEM5/sibling/src/python/m5/main.py", line 359, in main
        exec filecode in scope
      File "./configs/example/ruby_fs.py", line 117, in ?
        system._dma_devices)
      File "/scratch/nilay/GEM5/sibling/configs/ruby/Ruby.py", line 69, in create_system
        (cpu_sequencers, dir_cntrls, all_cntrls) = \
      File "<string>", line 0, in ?
      File "/scratch/nilay/GEM5/sibling/configs/ruby/MI_example.py", line 138, in create_system
        system.dma_cntrl.dma_sequencer.port = dma_device.dma
      File "/scratch/nilay/GEM5/sibling/src/python/m5/SimObject.py", line 586, in __getattr__
        raise AttributeError, "object '%s' has no attribute '%s'" \
    AttributeError: object 'LinuxAlphaSystem' has no attribute 'dma_cntrl'
Re: [m5-dev] Implementation of findTagInSet
You also have to build a binary that supports ruby, like ALPHA_FS_MOESI_hammer. If you can't get that to work, try ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the workload you run doesn't really matter that much as long as it's long enough to get a meaningful profile.

Steve
Re: [m5-dev] Implementation of findTagInSet
I profiled M5 but surprisingly I did not find any mention of the function findTagInSet() in the output obtained from gprof. Does it matter what coherence protocol is in use? I carried out the following steps -

1. Compiled m5.prof using

    scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof

2. Ran the blackscholes benchmark using the instructions specified in the technical report 'Running PARSEC v2.1 in the M5 Simulator' by Gebhart et al. Specifically I ran the following command --

    ./build/ALPHA_FS/m5.prof ./configs/example/fs.py -n 1 --script=/scratch/nilay/GEM5/system/scripts/blackscholes.rcS --detailed --caches --l2cache -F 50

These are the contents of blackscholes.rcS

    #!/bin/sh
    # File to run the blackscholes benchmark
    cd /parsec/install/bin
    /sbin/m5 switchcpu
    /sbin/m5 dumpstats
    /sbin/m5 resetstats
    ./blackscholes 64 /parsec/install/inputs/blackscholes/in_64K.txt /parsec/install/inputs/blackscholes/prices.txt
    echo Done :D
    /sbin/m5 exit
    /sbin/m5 exit

Thanks
Nilay
Re: [m5-dev] Implementation of findTagInSet
What was the gprof output?
Re: [m5-dev] Implementation of findTagInSet
I ran m5.prof two times. Here are the top five functions --

      %   cumulative    self                 self    total
     time    seconds  seconds      calls   s/call   s/call  name
     8.35       5.05     5.05   58969209     0.00     0.00  BaseSimpleCPU::preExecute()
     6.32       8.87     3.82   58975463     0.00     0.00  AtomicSimpleCPU::tick()
     4.69      11.71     2.84    5060904     0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
     3.60      13.89     2.18   84144079     0.00     0.00  CacheSet::findBlk(unsigned long long) const
     2.86      15.62     1.73   79820055     0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*, int, std::list<Packet*, std::allocator<Packet*> >)

      %   cumulative    self                 self    total
     time    seconds  seconds      calls   s/call   s/call  name
     6.89       4.15     4.15   58969209     0.00     0.00  BaseSimpleCPU::preExecute()
     6.78       8.23     4.08   58975463     0.00     0.00  AtomicSimpleCPU::tick()
     4.92      11.19     2.96    5060904     0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
     3.90      13.54     2.35   84144079     0.00     0.00  CacheSet::findBlk(unsigned long long) const
     3.34      15.55     2.01   79820055     0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*, int, std::list<Packet*, std::allocator<Packet*> >)

-- Nilay
Re: [m5-dev] Implementation of findTagInSet
Ah, the issue is that you're using the old M5 memory hierarchy and not Ruby. You need to run one of the Ruby versions, and use ruby_fs.py instead of fs.py.

Steve
Re: [m5-dev] Implementation of findTagInSet
I just compiled m5.prof and ran it (forgot what workload I ran on it, probably one of the parsec benchmarks; it probably doesn't matter a lot). If you've never used gprof before, this is a great time to learn!

Steve

On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I am looking at possible performance optimizations in Ruby. As you can grasp from the mail excerpt below, the function findTagInSet() consumes a lot of time. I am thinking of making the changes suggested by Brad. I have questions for m5-dev members, in particular for Derek and Steve. How did you arrive at the conclusion that findTagInSet() is a problem? Which benchmarks and profiling tools should I use?

Thanks
Nilay

-- Forwarded message --
Date: Mon, 20 Sep 2010 22:57:39 -0500
From: Beckmann, Brad brad.beckm...@amd.com
To: 'Nilay Vaish' ni...@cs.wisc.edu
Cc: Daniel Gibson gib...@cs.wisc.edu
Subject: RE: Performane Optimizations in Ruby

== CacheMemory findTagInSet ==

Recently Steve mentioned to me that a huge percentage of time was being spent in CacheMemory's findTagInSet function. Right now that function uses a hashmap across the entire cache to map tags to way ids. I think Derek recently implemented this change in hopes of improving performance, and it might have for small caches, but I don't think it helps for larger caches. There are a couple of possible solutions: per-set hashmaps, or reordering the ways so that the MRU blocks are at the lower ids and using a loop. I think we should investigate both solutions and see which is better.
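The first of the two options in the forwarded message, per-set hashmaps, could look roughly like the sketch below. It is an illustration only: PerSetTagIndex and its methods are hypothetical names, std::unordered_map stands in for whatever hash map the simulator uses, and the set count is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical per-set tag index: one small map per cache set instead of a
// single map covering the whole cache.
class PerSetTagIndex
{
  public:
    PerSetTagIndex(std::size_t num_sets, unsigned block_bits)
        : m_blockBits(block_bits), m_sets(num_sets)
    {
        assert(num_sets > 0 && (num_sets & (num_sets - 1)) == 0);
    }

    // Records that line_addr now lives in the given way of its set.
    void setWay(std::uint64_t line_addr, int way)
    {
        m_sets[setIndex(line_addr)][line_addr] = way;
    }

    // Forgets a line, for example when it is evicted.
    void erase(std::uint64_t line_addr)
    {
        m_sets[setIndex(line_addr)].erase(line_addr);
    }

    // Returns the way holding line_addr, or -1 when the tag is absent.
    int findWay(std::uint64_t line_addr) const
    {
        const std::unordered_map<std::uint64_t, int> &set =
            m_sets[setIndex(line_addr)];
        auto it = set.find(line_addr);
        return it == set.end() ? -1 : it->second;
    }

  private:
    std::size_t setIndex(std::uint64_t line_addr) const
    {
        return (line_addr >> m_blockBits) & (m_sets.size() - 1);
    }

    unsigned m_blockBits;
    std::vector<std::unordered_map<std::uint64_t, int> > m_sets;
};
```

Each map holds at most as many entries as the associativity, so its size no longer grows with the cache. For small associativities a plain MRU-ordered loop may still be cheaper than any hashing, which is why comparing both options under a profiler, as the message suggests, seems the sensible way to decide.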