Re: [m5-dev] Implementation of findTagInSet

2010-12-20 Thread Beckmann, Brad
Hi Nilay,

I apologize for the delay, but I was mostly travelling / in meetings last week 
and I didn't have a chance to review your patches and emails until this morning.

Overall, your patches are definitely solid steps in the right direction and 
your profiling data sounds very promising.  If you get the chance, please send 
it to me.  I would be interested to know what the top performance bottlenecks 
are after your change.

Before you spend time converting the other protocols, I do want to discuss the 
three points you brought up last week (see below).  I have a bunch of free time 
over the next three days (Mon. - Wed.) and I do think a telephone conversation 
is best to discuss these details.  Let me know what times work for you.

Brad


1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the 
calls to the doTransition() function. To set them properly, we would need to call 
a function that returns the pointer if the address is in the cache, and NULL 
otherwise.

I think we should retain the getEntry functions in the .sm files because, in the 
case of the L1 cache, both the instruction and the data cache need to be checked. 
This is something that I would prefer to keep out of SLICC. In fact, we should 
add getEntry functions for TBEs wherever required.

These getEntry functions would now return a pointer instead of a reference. We 
would need to add support for return_by_pointer to SLICC. Also, since these 
functions would be used inside the wakeup function, we would need to assume a 
common name for them across all protocols, just like the getState() function.

[BB] I would be very interested to know why you believe we should keep the 
getEntry functions out of SLICC.  In my mind, this is one of the few functions 
that is very consistent across protocols.  As I mentioned before, I really want 
to keep any notion of pointers out of the .sm files and avoid the changes you 
are proposing to getCacheEntry.  We should probably discuss this in detail over 
the phone.
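
For concreteness, here is a rough, self-contained C++ sketch of the idea behind 
point 1. The names (SimpleCache, getL1CacheEntry, Addr) are illustrative 
stand-ins, not the actual Ruby classes. The lookup returns a pointer that is 
NULL when the block is absent, so the caller needs no separate isTagPresent() 
probe, and the protocol-side helper encodes the L1-specific knowledge that both 
the I and D caches must be checked.

#include <cstdint>
#include <iostream>
#include <unordered_map>

using Addr = std::uint64_t;

struct Entry {
    int state = 0;   // placeholder for the protocol state / data block
};

class SimpleCache {
  public:
    // One hash probe replaces the isTagPresent() + lookup() pair:
    // returns a pointer to the entry, or nullptr if the tag is not present.
    Entry* lookup(Addr addr) {
        auto it = m_tags.find(addr);
        return it == m_tags.end() ? nullptr : &it->second;
    }
    void allocate(Addr addr) { m_tags[addr] = Entry{}; }

  private:
    std::unordered_map<Addr, Entry> m_tags;
};

// Protocol-specific helper: the L1 controller owns both an I and a D cache,
// so a generic, compiler-generated getEntry cannot know to probe both.
Entry* getL1CacheEntry(SimpleCache& dcache, SimpleCache& icache, Addr addr) {
    if (Entry* e = dcache.lookup(addr))
        return e;
    return icache.lookup(addr);   // may still be nullptr
}

int main() {
    SimpleCache icache, dcache;
    icache.allocate(0x40);
    std::cout << (getL1CacheEntry(dcache, icache, 0x40) != nullptr) << "\n";  // prints 1
    std::cout << (getL1CacheEntry(dcache, icache, 0x80) != nullptr) << "\n";  // prints 0
}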

2. I still think we would need to change the changePermission function in the 
CacheMemory class. Presently it calls findTagInSet() twice. Instead, we would 
pass in the CacheEntry whose permissions need to be changed, which would save 
one call. We should also put the m_locked variable in the AbstractCacheEntry 
(maybe make it part of the permission variable) to avoid the second call.

[BB] I like moving the locked field to AbstractCacheEntry and removing the 
separate m_locked data structure.  One minor point: we should avoid duplicating 
code in CacheMemory to support this change.  Other than that, this looks good 
to me.
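
A minimal sketch of point 2, again with stand-in types (AbstractEntry and the 
permission enum are not the real Ruby classes): once the caller already holds 
the entry pointer and the locked bit lives in the entry itself, changing 
permissions needs no tag lookup at all.

#include <cassert>

enum class AccessPermission { Invalid, ReadOnly, ReadWrite };

struct AbstractEntry {
    AccessPermission perm = AccessPermission::Invalid;
    bool locked = false;   // previously a separate m_locked structure keyed by tag
};

// Old shape (pseudocode): changePermission(Address) had to call findTagInSet()
// once to reach the entry and once more to update m_locked.
// New shape: operate directly on the entry the transition already holds.
void changePermission(AbstractEntry* entry, AccessPermission newPerm) {
    assert(entry != nullptr);
    entry->perm = newPerm;
    if (newPerm != AccessPermission::ReadWrite)
        entry->locked = false;   // assumption: the LL/SC lock only matters while writable
}

int main() {
    AbstractEntry e;
    e.locked = true;
    changePermission(&e, AccessPermission::Invalid);
    assert(e.perm == AccessPermission::Invalid && !e.locked);
}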

3. In the getState() and setState() functions, we need to specify that the 
function assumes the implicit TBE and CacheEntry pointers have been passed as 
arguments. How should we do this? I think we would need to push them into the 
symbol table before they can be used inside the function.

[BB] I'm a little confused by your current patch.  It appears that you are 
proposing having two pairs of getState and setState functions.  I would really 
like to avoid that and just have one pair of getState and setState functions.  
Also, when I say implicitly pass the TBE and CacheEntry pointers, I mean for 
the actions (similar to the address).  However, I think it is fine to 
explicitly pass these parameters into getState and setState (also similar to 
Address and State).
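
A small C++ sketch of what point 3 could look like, assuming illustrative types 
(Entry, TBE, State are not any particular protocol's definitions): getState and 
setState take the already-looked-up pointers explicitly and test them for 
validity instead of calling isTagPresent() or probing the TBE table again.

#include <cstdint>
#include <iostream>

using Addr = std::uint64_t;
enum class State { NP, I, S, M };   // NP = not present

struct Entry { State state = State::I; };
struct TBE   { State state = State::I; };   // transient state while a TBE is allocated

State getState(TBE* tbe, Entry* entry, Addr /*addr*/) {
    if (tbe != nullptr)         // block is in flight: the TBE state wins
        return tbe->state;
    if (entry != nullptr)       // block is resident in the cache
        return entry->state;
    return State::NP;           // neither: not present
}

void setState(TBE* tbe, Entry* entry, Addr /*addr*/, State s) {
    if (tbe != nullptr)   tbe->state = s;
    if (entry != nullptr) entry->state = s;
}

int main() {
    Entry e;
    setState(nullptr, &e, 0x100, State::S);
    std::cout << static_cast<int>(getState(nullptr, &e, 0x100)) << "\n";      // prints 2 (S)
    std::cout << static_cast<int>(getState(nullptr, nullptr, 0x200)) << "\n"; // prints 0 (NP)
}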


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-20 Thread Steve Reinhardt
On Mon, Dec 20, 2010 at 8:21 AM, Beckmann, Brad brad.beckm...@amd.com wrote:

 Hi Nilay,

 I apologize for the delay, but I was mostly travelling / in meetings last
 week and I didn't have a chance to review your patches and emails until this
 morning.

 Overall, your patches are definitely solid steps in the right direction and
 your profiling data sounds very promising.  If you get the chance, please
 send it to me.  I would be interested to know what are the top performance
 bottlenecks after your change.

 Before you spend time converting the other protocols, I do want to discuss
 the three points you brought up last week (see below).  I have a bunch of
 free time over the next three days (Mon. - Wed.) and I do think a telephone
 conversation is best to discuss these details.  Let me know what times work
 for you.


Ditto for me on basically all of Brad's points.  I'd like to see where the
profile stands now.  I'm also interested in catching up on Nilay's current
changes; I'll try to read through the patches today.

Steve
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Nilay Vaish
These are profile results from testing ALPHA_FS_MESI_CMP_directory with 
configs/example/ruby_fs.py. The simulation was allowed to run for 
200,000,000,000 ticks.


Profile Result with unmodified SLICC

  %   cumulative   self              self     total
 time   seconds   seconds      calls   s/call   s/call  name
12.19      34.51    34.51  551229802     0.00     0.00  CacheMemory::isTagPresent(Address const&) const
 8.41      58.33    23.82   17760155     0.00     0.00  PerfectSwitch::wakeup()
 4.49      71.03    12.70  235904391     0.00     0.00  Histogram::add(long long)
 2.54      78.23     7.20  172127510     0.00     0.00  CacheMemory::lookup(Address const&)
 2.33      84.82     6.59   93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.10      90.77     5.95  105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 2.06      96.61     5.84   34537891     0.00     0.00  BaseSimpleCPU::preExecute()
 1.95     102.12     5.51   43900461     0.00     0.00  RubyPort::M5Port::recvTiming(Packet*)
 1.93     107.58     5.46  580192104     0.00     0.00  Set::Set(Set const&)
 1.92     113.02     5.44   46506080     0.00     0.00  L1Cache_Controller::wakeup()


Result with modified SLICC

  %   cumulative   self              self     total
 time   seconds   seconds      calls   s/call   s/call  name
 9.97      24.78    24.78   17760155     0.00     0.00  PerfectSwitch::wakeup()
 5.42      38.27    13.49  101906879     0.00     0.00  CacheMemory::lookup_ptr(Address const&)
 5.32      51.50    13.23  235904391     0.00     0.00  Histogram::add(long long)
 2.30      57.21     5.71  580192104     0.00     0.00  Set::Set(Set const&)
 2.29      62.91     5.70   93838596     0.00     0.00  MessageBuffer::enqueue(RefCountingPtr<Message>, long long)
 2.19      68.36     5.45   46506080     0.00     0.00  L1Cache_Controller::wakeup()
 2.14      73.67     5.31   34537891     0.00     0.00  BaseSimpleCPU::preExecute()
 2.10      78.89     5.22   11125106     0.00     0.00  MemoryControl::executeCycle()
 2.06      84.02     5.13   96775149     0.00     0.00  RubyEventQueueNode::process()
 1.98      88.94     4.92  105280086     0.00     0.00  RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 .
 .
 .
 1.30     121.31     3.23   51172611     0.00     0.00  CacheMemory::isTagPresent(Address const&) const



I can send the complete data generated by gprof, if required.

I have inlined my comments.


On Mon, 20 Dec 2010, Beckmann, Brad wrote:


Hi Nilay,

I apologize for the delay, but I was mostly travelling / in meetings last 
week and I didn't have a chance to review your patches and emails until this 
morning.


Overall, your patches are definitely solid steps in the right direction and 
your profiling data sounds very promising.  If you get the chance, please 
send it to me.  I would be interested to know what are the top performance 
bottlenecks after your change.


Before you spend time converting the other protocols, I do want to discuss 
the three points you brought up last week (see below).  I have a bunch of 
free time over the next three days (Mon. - Wed.) and I do think a telephone 
conversation is best to discuss these details.  Let me know what times work 
for you.


The semester is over, so I am available almost throughout the day. Today, 
I have a meeting at 3, which I think should be at most an hour long. Over the 
next two days, I do not have anything scheduled so far, so any time will work.




Brad


1. Currently the implicit TBE and Cache Entry pointers are set to NULL in the 
calls to doTransition() function. To set these, we would need to make calls 
to a function that returns the pointer if the address is in the cache, NULL 
otherwise.


I think we should retain the getEntry functions in the .sm files for in case 
of L1 cache both instruction and the data cache needs to be checked. This is 
something that I probably would prefer keeping out of SLICC. In fact, we 
should add getEntry functions for TBEs where ever required.


These getEntry would now return a pointer instead of a reference. We would 
need to add support for return_by_pointer to SLICC. Also, since these 
functions would be used inside the Wakeup function, we would need to assume a 
common name for them across all protocols, just like getState() function.


[BB] I would be very interested why you believe we should keep the getEntry 
functions out of SLICC.  In my mind, this is one of the few functions that is 
very consistent across protocols.  As I mentioned before, I really want to 
keep any notion of pointers out of the .sm files and avoid the changes you 
are proposing to getCacheEntry.  We should probably discuss this in detail 
over-the-phone.


We would need to figure out which cache memories a machine has, their 
hierarchy, and whether there are separate I and D caches. In fact, MOESI_hammer 
has the L1I cache, L1D cache, and L2 all in the same machine. I think we should 
not do this analysis in the compiler.




2. I 

Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Steve Reinhardt
Nice work!  No need to send the full profile, but what is the net speedup
here?  It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.

Also, what workload are you running on top?  With all the time spent in
PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're
running the tester then that's not so bad, but if you're running a regular
program that seems high.

Thanks,

Steve

On Mon, Dec 20, 2010 at 9:47 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 These profile results from testing ALPHA_FS_MESI_CMP_directory with
 configs/example/ruby_fs.py. The simulation was allowed to run for
 200,000,000,000 ticks.


Re: [m5-dev] Implementation of findTagInSet (fwd)

2010-12-20 Thread Nilay Vaish
I am running m5.prof multiple times to get an idea of average performance. 
I will get back to you later today with the numbers.


Thanks
Nilay

On Mon, 20 Dec 2010, Steve Reinhardt wrote:


Nice work!  No need to send the full profile, but what is the net speedup
here?  It seems like we should have eliminated about 10% of the runtime, but
I wanted to verify that.

Also, what workload are you running on top?  With all the time spent in
PerfectSwitch I'm guessing there's a lot of interconnect traffic; if you're
running the tester then that's not so bad, but if you're running a regular
program that seems high.

Thanks,

Steve

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-19 Thread Nilay Vaish

Brad

I have tested the changes that I made to the files relating to SLICC and the 
MESI_CMP_directory protocol. I see a 90% decrease in the number of calls 
to isTagPresent() when I run m5.prof for 200,000,000,000 ticks using 
configs/example/ruby_fs.py.


Thanks
Nilay


On Fri, 17 Dec 2010, Nilay Vaish wrote:


Hi Brad

I have attached the patch for the changes that I have made so far. This 
patch, I believe, makes all the required changes to the file 
MESI_CMP_directory-L1cache.sm, apart from making changes to SLICC. Can you go 
through this? If this looks fine, then I will make changes to the other 
protocol files.


I think we should have a telephonic discussion on this some time.

Thanks
Nilay



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-11 Thread Nilay Vaish

Brad

We would need to change the lookup functions for TBETable and CacheMemory. 
Currently the lookup functions assume that the address passed to lookup is 
present. This requires two lookups in the data structures associated with these 
classes: one to check whether the address is in the cache, and a second to 
return a reference to the actual cache entry. Instead of returning a reference, 
we can return a pointer to the entry. This pointer will be NULL if the address 
is not present in the cache.
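
A self-contained C++ sketch of the change described above. Cache, Line, and the 
method bodies are illustrative (though lookup_ptr matches the symbol that later 
shows up in the modified profile): the reference-returning lookup forces callers 
to probe the tag store twice, while the pointer-returning variant folds the 
presence check and the fetch into a single probe.

#include <cassert>
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;
struct Line { int data = 0; };

class Cache {
  public:
    bool isTagPresent(Addr a) const { return m_lines.count(a) != 0; }

    // Old style: the caller must already know the tag is present.
    Line& lookup(Addr a) {
        auto it = m_lines.find(a);   // second probe of the same tag
        assert(it != m_lines.end());
        return it->second;
    }

    // New style: one probe, nullptr when the tag is absent.
    Line* lookup_ptr(Addr a) {
        auto it = m_lines.find(a);
        return it == m_lines.end() ? nullptr : &it->second;
    }

    void allocate(Addr a) { m_lines[a] = Line{}; }

  private:
    std::unordered_map<Addr, Line> m_lines;
};

int main() {
    Cache c;
    c.allocate(0x40);

    // Old pattern: two probes per access.
    if (c.isTagPresent(0x40))
        c.lookup(0x40).data = 1;

    // New pattern: a single probe per access.
    if (Line* l = c.lookup_ptr(0x40))
        l->data = 2;
}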


Nilay





___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Nilay Vaish

Hi Brad

Is there a way to access the StateMachine object inside any of the AST class 
functions? I know the name of the machine can be accessed, but can the 
machine itself be accessed? I need one of the variables in the 
StateMachine object to know whether or not a TBETable exists in this 
machine.



Nilay

On Wed, 8 Dec 2010, Beckmann, Brad wrote:


Hi Nilay,

I think we can avoid handling pointers in the getState and setState functions if we also add bool 
functions is_cache_entry_valid and is_tbe_entry_valid that are implicitly 
defined in SLICC.  I don't think we should try to get rid of getState and setState since they often 
contain valuable, protocol-specific checks in them.  Instead for getState and setState, I believe 
we should simply replace the current isTagPresent calls with the new is_*_valid calls.

As far as changePermission() goes, your solution seems reasonable, but we may 
also want to consider just not changing that function at all.  
ChangePermission() doesn't actually use a cache entry within the .sm file, so 
is doesn't necessarily need to be changed.  Going back to breaking this work 
into smaller portions, that is definitely a portion I feel can be pushed to the 
end or removed entirely.

Brad


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Beckmann, Brad
Hi Nilay,

Yes, I believe a machine can be accessed within AST class functions, though I 
don't remember ever doing it myself.  Look at the generate() function in 
TypeFieldEnumAST.  Here you see that the machine (a.k.a StateMachine) is 
grabbed from the symbol table and then different StateMachine functions are 
called on it.  You can imagine adding a new function to StateMachine.py that 
returns whether the TBETable exists.

That seems like it should work to me, but let me know if it doesn't.

Brad



 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
 Behalf Of Nilay Vaish
 Sent: Thursday, December 09, 2010 5:24 PM
 To: M5 Developer List
 Subject: Re: [m5-dev] Implementation of findTagInSet
 
 Hi Brad
 
 Is there way to access the StateMachine object inside any of the AST
 class
 functions? I know the name of the machine can be accessed. But can the
 machine itself be accessed? I need one of the variables in the
 StateMachine object to know whether or not TBETable exists in this
 machine.
 
 
 Nilay
 
Re: [m5-dev] Implementation of findTagInSet

2010-12-09 Thread Nilay Vaish

It works perfectly. Thanks!

Nilay

On Thu, 9 Dec 2010, Beckmann, Brad wrote:


Hi Nilay,

Yes, I believe a machine can be accessed within AST class functions, though I 
don't remember ever doing it myself.  Look at the generate() function in 
TypeFieldEnumAST.  Here you see that the machine (a.k.a StateMachine) is 
grabbed from the symbol table and then different StateMachine functions are 
called on it.  You can imagine adding a new function to StateMachine.py that 
returns whether the TBETable exists.

That seems like it should work to me, but let me know if it doesn't.

Brad








___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Beckmann, Brad
Hi Nilay,

Breaking the changes into small portions is a good idea, but we first need to 
decide exactly what we are doing.  So far we've only thrown out some ideas.  We 
have yet to scope out a complete solution.  I think we've settled on passing 
some sort of reference to the cache and tbe entries, but exactly whether that 
is by reference variables or pointers isn't clear.  My initial preference is to 
use pointers in the generated code and set the pointers to NULL when a cache 
and/or tbe entry doesn't exist.  However, one thing I really want to strive for 
is to keep pointer manipulation out of the .sm files.  Writing SLICC code is 
hard enough and we don't want to burden the SLICC programmer with memory 
management as well.

So how about this plan?
  - Let's remove all the getCacheEntry functions from the slicc files.  I 
believe that almost all of these functions look exactly the same and it is easy 
enough for SLICC to just generate them instead.
  - Similarly, let's get rid of all the isCacheTagPresent functions as well.
  - Then let's replace all the getCacheEntry calls with an implicit SLICC 
supported variable called cache_entry and all the TBEs[addr*] calls with an 
implicit SLICC supported variable called tbe_entry.
 - Underneath, these variables can actually be implemented as local inlined 
functions that assert that the entries are valid and then return variables 
local to the state machine, set to the current cache and tbe entry.
 - The trigger function will implicitly set these variables (pointers 
underneath) to NULL or valid values, and the only way they can be reset is 
through explicit functions set_cache_entry, reset_cache_entry, 
set_tbe_entry, and reset_tbe_entry.  These functions would be called by the 
appropriate actions or possibly be merged with the existing check_allocate 
function (see the sketch below).
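
A rough C++ sketch of what the generated controller code might look like under 
this plan. The class and member names here are invented for illustration and 
are not actual SLICC output: the trigger path latches the current cache and TBE 
entry pointers, actions read them through checked accessors, and only the 
set_*/reset_* helpers may change them.

#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;
struct Entry { int state = 0; };
struct TBE   { int state = 0; };

class L1Controller {
  public:
    // trigger(): called with whatever entry/TBE the input-port logic found
    // (either may be NULL), before the transition's actions run.
    void trigger(Addr addr, Entry* entry, TBE* tbe) {
        m_addr = addr;
        m_cache_entry_ptr = entry;
        m_tbe_ptr = tbe;
        // ... doTransition(event, state, ...) would run the actions here ...
    }

    // What the implicit SLICC variables could expand to inside an action:
    // validity-checked accessors instead of raw pointers in the .sm code.
    Entry& cache_entry() { assert(m_cache_entry_ptr); return *m_cache_entry_ptr; }
    TBE&   tbe_entry()   { assert(m_tbe_ptr);         return *m_tbe_ptr; }

    // Allocation-style actions change the pointers only through these helpers.
    void set_cache_entry(Entry* e) { m_cache_entry_ptr = e; }
    void reset_cache_entry()       { m_cache_entry_ptr = nullptr; }
    void set_tbe_entry(TBE* t)     { m_tbe_ptr = t; }
    void reset_tbe_entry()         { m_tbe_ptr = nullptr; }

  private:
    Addr   m_addr = 0;
    Entry* m_cache_entry_ptr = nullptr;
    TBE*   m_tbe_ptr = nullptr;
};

int main() {
    L1Controller ctrl;
    Entry line;
    ctrl.trigger(0x80, &line, nullptr);   // hit in the cache, no TBE yet
    ctrl.cache_entry().state = 1;         // an action touching the entry

    TBE pending;
    ctrl.set_tbe_entry(&pending);         // e.g. an allocateTBE-style action
    ctrl.tbe_entry().state = 2;
    ctrl.reset_tbe_entry();               // e.g. a deallocateTBE-style action
}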

I think that will give us what we want, but I realize I've just proposed 
changing 100s if not 1000s of lines of SLICC code.  I hope that these changes 
are straightforward, but any change like that never really is.

Let's think it over some more and let me know if you want to discuss this in 
more detail over the phone.

Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Tuesday, December 07, 2010 5:21 PM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Brad,

Let's try to break the required changes into small portions. Given my feeble 
knowledge of Ruby, it would be for me to visualize what change is going to have 
what effect.

One question, should we use pointers to pass the cache entry around, or should 
we make use of reference variables? Currently lookup functions return 
references to cache entries.

Nilay


Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Steve Reinhardt
This sounds like a great direction to me... continuing in this vein, would
it be possible to factor out the protocol-specific implementations of
getState() and setState() entirely?  I'm thinking that each of these calls
involves a check to see if the block is in a TBE or not, followed by the
code to handle the case where it's not in a TBE but is in the cache, and if
there's a way to do the TBE check only once per access, that could save even
more.

In terms of keeping changes small, you should save this for after you do the
changes Brad suggests, and maybe it's actually not even a good idea, but I
wanted to plant the seed.

Steve



Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Nilay Vaish

Hi Brad,

A couple of observations:

a. If we make use of pointers, would we not need to handle them in the 
getState and setState functions?


b. changePermission() seems to be a problem. It would still perform a 
lookup because whether a CacheEntry is locked or not is maintained 
in the CacheMemory object and not with the entry itself. We can move that 
variable to be part of the AbstractCacheEntry, or we can combine it with 
the permission variable which is already there in the AbstractCacheEntry 
class. I think the lock is only used in the implementation of LL/SC 
instructions.


Nilay





Re: [m5-dev] Implementation of findTagInSet

2010-12-08 Thread Beckmann, Brad
Hi Nilay,

I think we can avoid handling pointers in the getState and setState functions 
if we also add bool functions is_cache_entry_valid and is_tbe_entry_valid 
that are implicitly defined in SLICC.  I don't think we should try to get rid 
of getState and setState, since they often contain valuable, protocol-specific 
checks.  Instead, for getState and setState, I believe we should simply 
replace the current isTagPresent calls with the new is_*_valid calls.
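
A tiny sketch of the is_*_valid idea, with illustrative stand-in types (Entry 
and TBE are not the real protocol structs): the SLICC-visible check compiles 
down to a NULL test on the pointer the transition already carries, rather than 
another isTagPresent() or TBE-table probe keyed by address.

#include <iostream>

struct Entry { int state = 0; };
struct TBE   { int state = 0; };

// What is_cache_entry_valid / is_tbe_entry_valid could compile down to.
inline bool is_cache_entry_valid(const Entry* e) { return e != nullptr; }
inline bool is_tbe_entry_valid(const TBE* t)     { return t != nullptr; }

int main() {
    Entry line;
    Entry* cache_entry = &line;   // as handed to getState by the trigger path
    TBE*   tbe         = nullptr;

    // getState's protocol-specific body keeps its structure; only the
    // presence checks change, roughly:
    //   before:  if the TBE table has addr ... else if cache.isTagPresent(addr) ...
    //   after :  if is_tbe_entry_valid(tbe) ... else if is_cache_entry_valid(cache_entry) ...
    std::cout << is_tbe_entry_valid(tbe) << " "
              << is_cache_entry_valid(cache_entry) << "\n";   // prints "0 1"
}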

As far as changePermission() goes, your solution seems reasonable, but we may 
also want to consider just not changing that function at all.  
changePermission() doesn't actually use a cache entry within the .sm file, so 
it doesn't necessarily need to be changed.  Going back to breaking this work 
into smaller portions, that is definitely a portion I feel can be pushed to the 
end or removed entirely.

Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Wednesday, December 08, 2010 11:53 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Hi Brad,

A couple of observations

a. If we make use of pointers, would we not need to handle them in getState and 
setState functions?

b. changePermission() seems to be a problem. It would still perform a lookup 
because the fact that a CacheEntry is a locked or not is maintained in the 
CacheMemory object and not with the entry itself. We can move that variable to 
be part of the AbstractCacheEntry or we can combine it with the permission 
variable which is already there in the AbstractCacheEntry class. I think lock 
is only used in the implementation of LL/SC instructions.

Nilay



Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Nilay Vaish
I have made changes to SLICC to support local reference variables. I think 
we should use reference variables in functions where back-to-back calls are 
made to lookup/getCacheEntry functions.
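
A tiny C++ illustration of the local-reference idea (getCacheEntry and the 
fields here are placeholders, not the .sm helper itself): instead of re-running 
the lookup for every field access, the result is captured once in a local 
reference.

#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t;
struct Entry { int state = 0; int dirty = 0; };

std::unordered_map<Addr, Entry> table;

Entry& getCacheEntry(Addr a) { return table[a]; }   // stand-in for the protocol helper

void withoutLocal(Addr a) {
    getCacheEntry(a).state = 1;   // lookup #1
    getCacheEntry(a).dirty = 1;   // lookup #2, same address
}

void withLocal(Addr a) {
    Entry& e = getCacheEntry(a);  // single lookup
    e.state = 1;
    e.dirty = 1;
}

int main() {
    withoutLocal(0x40);
    withLocal(0x80);
}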


Overall, I am still unclear on how we can handle this issue.

Nilay


On Tue, 30 Nov 2010, Nilay Vaish wrote:

Is it possible to have variables local to a function in .sm files. I am 
thinking of storing getCacheEntry()'s return value in a local variable.


Nilay

On Mon, 29 Nov 2010, Beckmann, Brad wrote:


Hi Nilay,

I don't think we want to replace the implicit Address parameter inside the 
state machines with the CacheEntry parameter, but we might want to 
supplement the state machine functions to include both.  I don't think we 
can replace the Address parameter because certain transitions within a 
state machine don't operate on a CacheEntry, but they do operate on an 
Address.  However, as we discussed last week, we might be able to pass the 
CacheEntry into the trigger function along with the Address, which is then 
implicitly included in all actions.  The key in my mind is that we want to 
maintain the current programming invariant that SLICC does not expose 
pointers, but underneath the generated code needs to manage that sometimes 
the CacheEntry pointer may equal NULL.  In particular, I would like to 
minimize any added complexity we put on the setState function.  I think we 
can make this work, but we need to think through the details, including how 
replacements are handled.


I have a few other things I need to take care of first, but I may be able 
to look into the details of how to make this work by the end of the week.


Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Beckmann, Brad
Hi Nilay,

Yes, this is not an easy issue to fix.  First, to answer your other question, I 
believe TBE stands for Transaction Buffer Entry, or something to that effect.  
As you suggested, we can pass the cache entry and possibly even the TBE 
entry into the trigger function.  Thus all actions will implicitly include 
these two parameters as inputs and not require continual lookups or even 
local variables.  However, I believe that to make this work we need to change 
the semantics for allocating and deallocating cache and TBE entries.  In 
particular, these operations should probably be handled by specialized 
operators (similar to trigger) that correctly manage the pointers underneath.

Does that make sense?  Let me know if you'd like to brainstorm more about this 
over a phone conversation.

Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Tuesday, December 07, 2010 12:16 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

I have made changes to SLICC to support local reference variables. I think we 
should reference variables in functions where back to back calls are made to 
lookup/getCacheEntry functions.

Overall, I am still unclear how can we handle this issue.

Nilay


On Tue, 30 Nov 2010, Nilay Vaish wrote:

 Is it possible to have variables local to a function in .sm files. I 
 am thinking of storing getCacheEntry()'s return value in a local variable.

 Nilay

 On Mon, 29 Nov 2010, Beckmann, Brad wrote:

 Hi Nilay,
 
 I don't think we want to replace the implicit Address parameter 
 inside the state machines with the CacheEntry parameter, but we might 
 want to supplement the state machine functions to include both.  I 
 don't think we can replace the Address parameter because certain 
 transitions within a state machine don't operate on a CacheEntry, but 
 they do operate on an Address.  However, as we discussed last week, 
 we might be able to pass the CacheEntry into the trigger function 
 along with the Address, which is then implicitly included in all 
 actions.  The key in my mind is that we want to maintain the current 
 programming invariant that SLICC does not expose pointers, but 
 underneath the generated code needs to manage that sometimes the 
 CacheEntry pointer may equal NULL.  In particular, I would like to 
 minimize any added complexity we put on the setState function.  I 
 think we can make this work, but we need to think through the details, 
 including how replacements are handled.
 
 I have a few other things I need to take care of first, but I may be 
 able to look into the details of how to make this work by the end of the 
 week.
 
 Brad


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-07 Thread Nilay Vaish

Brad,

Let's try to break the required changes into small portions. Given my 
feeble knowledge of Ruby, it would be for me to visualize what change is 
going to have what effect.


One question, should we use pointers to pass the cache entry around, or 
should we make use of reference variables? Currently lookup functions 
return references to cache entries.
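
For what it's worth, the trade-off in C++ terms (illustrative stand-in types only): 
a reference cannot express "entry not present", so it forces a separate presence 
check, whereas a pointer folds the check and the lookup into one call.

#include <unordered_map>

struct CacheEntry { int state = 0; };

struct Cache {
    std::unordered_map<long, CacheEntry> tags;

    // Reference version: the caller must already know the entry exists, so an
    // isTagPresent()-style call has to happen first.
    CacheEntry& lookupRef(long addr) { return tags.at(addr); }

    // Pointer version: "not present" is just nullptr, so the presence check
    // and the lookup collapse into a single hash probe.
    CacheEntry* lookupPtr(long addr) {
        auto it = tags.find(addr);
        return it == tags.end() ? nullptr : &it->second;
    }
};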


Nilay


On Tue, 7 Dec 2010, Beckmann, Brad wrote:


Hi Nilay,

Yes, this is not an easy issue to fix.  First to answer your other 
question, I believe TBE stands for Transaction Buffer Entry, or 
something to that effect.  As you suggested, we can pass in the cache 
entry and possibly even the TBE entry into the trigger function.  Thus 
all actions will implicitly include these two parameters as inputs and 
not require continual lookups or even local variables.  However, I 
believe to make this work we need to change the semantics for allocating 
and deallocating cache and TBE entries.  In particular, these operations 
probably should be handled by specialized operators (similar to trigger) 
that correctly manage the pointers underneath.


Does that make sense?  Let me know if you'd like to brainstorm more 
about this over a phone conversation.


Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Tuesday, December 07, 2010 12:16 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

I have made changes to SLICC to support local reference variables. I 
think we should use reference variables in functions where back-to-back 
calls are made to the lookup/getCacheEntry functions.


Overall, I am still unclear how we can handle this issue.

Nilay


On Tue, 30 Nov 2010, Nilay Vaish wrote:


Is it possible to have variables local to a function in .sm files. I
am thinking of storing getCacheEntry()'s return value in a local variable.

Nilay

On Mon, 29 Nov 2010, Beckmann, Brad wrote:


Hi Nilay,

I don't think we want to replace the implicit Address parameter inside 
the state machines with the CacheEntry parameter, but we might want to 
supplement the state machine functions to include both.  I don't think 
we can replace the Address parameter because certain transitions 
within a state machine don't operate on a CacheEntry, but they do 
operate on an Address.  However, as we discussed last week, we might 
be able to pass the CacheEntry into the trigger function along with 
the Address, which is then implicitly included in all actions.  The 
key in my mind is that we want to maintain the current programming 
invariant that SLICC does not expose pointers, but underneath the 
generated code needs to manage that sometimes the CacheEntry pointer 
may equal NULL.  In particular, I would like to minimize any added 
complexity we put on the setState function.  I think we can make this 
work, but we need to think through the details, including how 
replacements are handled.


I have a few other things I need to take care of first, but I may be
able to look into the details of how to make this work by the end of the week.

Brad

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-12-03 Thread Nilay Vaish
This is what I have thought of. Currently, the doTransition() function takes in 
the cache state of the address that is being supplied. This function 
further calls the setState() function, one of the functions that repeatedly call 
isTagPresent(). Instead, if we pass the cache state and the cache entry 
reference to doTransition(), then we will not have to make all of those 
isTagPresent() calls.


The problem with this is that before the caches are looked up, there is a 
structure called TBE which is looked up first. What does TBE stand for? 
We can pass references to a TBE entry and a cache entry. If the TBE entry 
has a valid state, then it is used. Otherwise the state of the cache entry is 
looked at and used if valid.
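
In other words, something like the following (illustrative C++ only; State, TBE and 
CacheEntry are stand-ins rather than the real Ruby classes, and pointers are used 
here simply so that "no entry" has a representation):

enum class State { NP, I, S, E, O, M };

struct TBE        { State state; };
struct CacheEntry { State state; };

// Prefer the TBE's (transient) state when a TBE exists for the address,
// otherwise fall back to the cache entry's (stable) state, otherwise report
// "not present".
State getState(const TBE* tbe, const CacheEntry* cache_entry) {
    if (tbe != nullptr)
        return tbe->state;
    if (cache_entry != nullptr)
        return cache_entry->state;
    return State::NP;
}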


Nilay

On Tue, 30 Nov 2010, Nilay Vaish wrote:

Is it possible to have variables local to a function in .sm files. I am 
thinking of storing getCacheEntry()'s return value in a local variable.


Nilay

On Mon, 29 Nov 2010, Beckmann, Brad wrote:


Hi Nilay,

I don't think we want to replace the implicit Address parameter inside the 
state machines with the CacheEntry parameter, but we might want to 
supplement the state machine functions to include both.  I don't think we 
can replace the Address parameter because certain transitions within a 
state machine don't operate on a CacheEntry, but they do operate on an 
Address.  However, as we discussed last week, we might be able to pass the 
CacheEntry into the trigger function along with the Address, which is then 
implicitly included in all actions.  The key in my mind is that we want to 
maintain the current programming invariant that SLICC does not expose 
pointers, but underneath the generated code needs to manage that sometimes 
the CacheEntry pointer may equal NULL.  In particular, I would like to 
minimize any added complexity we put on the setState function.  I think we 
can make this work, but we need to think through the details, including how 
replacements are handled.


I have a few other things I need to take care of first, but I may be able 
to look into the details of how to make this work by the end of the week.


Brad


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-30 Thread Nilay Vaish
Is it possible to have variables local to a function in .sm files? I 
am thinking of storing getCacheEntry()'s return value in a local 
variable.


Nilay

On Mon, 29 Nov 2010, Beckmann, Brad wrote:


Hi Nilay,

I don't think we want to replace the implicit Address parameter inside 
the state machines with the CacheEntry parameter, but we might want to 
supplement the state machine functions to include both.  I don't think 
we can replace the Address parameter because certain transitions within 
a state machine don't operate on a CacheEntry, but they do operate on an 
Address.  However, as we discussed last week, we might be able to pass 
the CacheEntry into the trigger function along with the Address, which 
is then implicitly included in all actions.  The key in my mind is that 
we want to maintain the current programming invariant that SLICC does 
not expose pointers, but underneath the generated code needs to manage 
that sometimes the CacheEntry pointer may equal NULL.  In particular, I 
would like to minimize any added complexity we put on the setState 
function.  I think we can make this work, but we need to think through 
the details, including how replacements are handled.


I have a few other things I need to take care of first, but I may be 
able to look into the details of how to make this work by the end of the 
week.


Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Saturday, November 27, 2010 11:40 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Is it not possible to redesign the functions to accept a CacheEntry as a 
parameter instead of an Address parameter?



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-29 Thread Beckmann, Brad
Hi Nilay,

I don't think we want to replace the implicit Address parameter inside the 
state machines with the CacheEntry parameter, but we might want to supplement 
the state machine functions to include both.  I don't think we can replace the 
Address parameter because certain transitions within a state machine don't 
operate on a CacheEntry, but they do operate on an Address.  However, as we 
discussed last week, we might be able to pass the CacheEntry into the trigger 
function along with the Address, which is then implicitly included in all 
actions.  The key in my mind is that we want to maintain the current 
programming invariant that SLICC does not expose pointers, but underneath the 
generated code needs to manage that sometimes the CacheEntry pointer may equal 
NULL.  In particular, I would like to minimize any added complexity we put on 
the setState function.  I think we can make this work, but we need to think 
through the details, including how replacements are handled.

I have a few other things I need to take care of first, but I may be able to 
look into the details of how to make this work by the end of the week.

Brad


-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of 
Nilay Vaish
Sent: Saturday, November 27, 2010 11:40 AM
To: M5 Developer List
Subject: Re: [m5-dev] Implementation of findTagInSet

Is it not possible to redesign the functions to accept a CacheEntry as a 
parameter instead of an Address parameter?


On Sat, 27 Nov 2010, Nilay Vaish wrote:

 I conducted an experiment to figure out how many calls are made to the hash 
 table to check if the given address exists in the cache. For the same setup 
 as before, fewer than 10% of the calls reach the hash table. That is, out of about 
 880,000,000 calls to the isTagPresent function, only about 81,000,000 actually go 
 and search the hash table.

 I think we should work towards removing some of the redundant calls. I have a 
 partial fix for some portion of the code. But again, it is not a design 
 change. I am unsure how to change the design of Ruby and/or Slicc to get rid 
 of these redundant calls.

 Brad, do you have something in mind on this?

 Thanks
 Nilay

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-27 Thread Nilay Vaish
I conducted an experiment to figure out how many calls are made to the 
hash table to check if the given address exists in the cache. For the same 
setup as before, fewer than 10% of the calls reach the hash table. That is, 
out of about 880,000,000 calls to the isTagPresent function, only about 
81,000,000 actually go and search the hash table.


I think we should work towards removing some of the redundant calls. I 
have a partial fix for some portion of the code. But again, it is not a 
design change. I am unsure how to change the design of Ruby and/or Slicc 
to get rid of these redundant calls.


Brad, do you have something in mind on this?

Thanks
Nilay

On Fri, 26 Nov 2010, Steve Reinhardt wrote:


Hi Nilay,

Good job, this is clearly progress... you've sped up isTagPresent by 2X and
the simulation overall by almost 10%.  That's nothing to sneeze at.  It's
sad that isTagPresent is still the top function though.  Can you do some
tracing or other experiments to get a feel for whether keeping the last N
tags instead of the last 1 (for some small value of N, like 2 or 3) would be
useful?  Just printing out a trace of calls to isTagPresent should be enough
to get a feeling for whether that's worth hacking in a test implementation.

Also, I see a lot of your patch has to do with removing const labels from
isTagPresent... this is exactly the scenario the 'mutable' keyword was
designed for; if you mark your m_mru_* fields as mutable, then you shouldn't
have to remove the const labels from any of the function calls.

Steve

On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish ni...@cs.wisc.edu wrote:


I profiled the un-modified and the modified m5 ten times (this time there
was no load on the machine). Here are the average results:

               % time   std. dev   actual time   std. dev
un-modified
isTagPresent    19.99       0.35         47.17       1.23
cumulative     100.00       0.00        235.91       3.37

modified
isTagPresent    10.35       0.28         21.22       0.57
cumulative     100.00       0.00        205.11       2.94

Below is the patch, though it may not apply cleanly to the current version of 
m5 since I have a few un-committed patches enqueued.


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-27 Thread Nilay Vaish
Is it not possible to redesign the functions to accept a CacheEntry as a 
parameter instead of an Address parameter?



On Sat, 27 Nov 2010, Nilay Vaish wrote:

I conducted an experiment to figure out how many calls are made to the hash 
table to check if the given address exists in the cache. For the same setup 
as before, fewer than 10% of the calls reach the hash table. That is, out of about 
880,000,000 calls to the isTagPresent function, only about 81,000,000 actually go 
and search the hash table.


I think we should work towards removing some of the redundant calls. I have a 
partial fix for some portion of the code. But again, it is not a design 
change. I am unsure how to change the design of Ruby and/or Slicc to get rid 
of these redundant calls.


Brad, do you have something in mind on this?

Thanks
Nilay


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-26 Thread Nilay Vaish
I profiled the un-modified and the modified m5 ten times (this time there 
was no load on the machine). Here are the average results:


               % time   std. dev   actual time   std. dev
un-modified
isTagPresent    19.99       0.35         47.17       1.23
cumulative     100.00       0.00        235.91       3.37

modified
isTagPresent    10.35       0.28         21.22       0.57
cumulative     100.00       0.00        205.11       2.94

Below is the patch, though it may not apply cleanly to the current version of 
m5 since I have a few un-committed patches enqueued.



# HG changeset patch
# Parent 7ac53378e03b5116c48e6076167de6a2a2e06158

diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
--- a/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 17:30:58 2010 -0600
@@ -84,6 +84,8 @@
             m_locked[i][j] = -1;
         }
     }
+
+    m_valid_mru_address = false;
 }

 CacheMemory::~CacheMemory()
@@ -135,15 +137,26 @@
 // Given a cache index: returns the index of the tag in a set.
 // returns -1 if the tag is not found.
 int
-CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
+CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
 {
     assert(tag == line_address(tag));
+
+    if (m_valid_mru_address && m_mru_address == tag)
+        return m_mru_tag_index;
+
     // search the set for the tags
+    m_valid_mru_address = true;
+    m_mru_address.setAddress(tag.getAddress());
+
     m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag);
     if (it != m_tag_index.end())
         if (m_cache[cacheSet][it->second]->m_Permission !=
             AccessPermission_NotPresent)
+        {
+            m_mru_tag_index = it->second;
             return it->second;
+        }
+
+    m_mru_tag_index = -1;
     return -1; // Not found
 }

@@ -215,7 +228,7 @@

 // tests to see if an address is present in the cache
 bool
-CacheMemory::isTagPresent(const Address& address) const
+CacheMemory::isTagPresent(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
@@ -276,6 +289,10 @@
     m_locked[cacheSet][i] = -1;
     m_tag_index[address] = i;

+    m_valid_mru_address = true;
+    m_mru_address.setAddress(address.getAddress());
+    m_mru_tag_index = i;
+
     m_replacementPolicy_ptr->
         touch(cacheSet, i, g_eventQueue_ptr->getTime());

@@ -300,6 +317,8 @@
             address);
         m_locked[cacheSet][loc] = -1;
         m_tag_index.erase(address);
+
+        m_valid_mru_address = false;
     }
 }

@@ -327,18 +346,18 @@
 }

 // looks an address up in the cache
-const AbstractCacheEntry&
-CacheMemory::lookup(const Address& address) const
+/*const AbstractCacheEntry&
+CacheMemory::lookup(const Address& address)
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
     int loc = findTagInSet(cacheSet, address);
     assert(loc != -1);
     return *m_cache[cacheSet][loc];
-}
+}*/

 AccessPermission
-CacheMemory::getPermission(const Address& address) const
+CacheMemory::getPermission(const Address& address)
 {
     assert(address == line_address(address));
     return lookup(address).m_Permission;
diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
--- a/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 13:23:51 2010 -0600
+++ b/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 17:30:58 2010 -0600
@@ -74,7 +74,7 @@
                      DataBlock* data_ptr);

     // tests to see if an address is present in the cache
-    bool isTagPresent(const Address& address) const;
+    bool isTagPresent(const Address& address);

     // Returns true if there is:
     //   a) a tag match on this address or there is
@@ -92,10 +92,10 @@

     // looks an address up in the cache
     AbstractCacheEntry& lookup(const Address& address);
-    const AbstractCacheEntry& lookup(const Address& address) const;
+    //const AbstractCacheEntry& lookup(const Address& address) const;

     // Get/Set permission of cache block
-    AccessPermission getPermission(const Address& address) const;
+    AccessPermission getPermission(const Address& address);
     void changePermission(const Address& address, AccessPermission new_perm);

     int getLatency() const { return m_latency; }
@@ -138,7 +138,7 @@

     // Given a cache tag: returns the index of the tag in a set.
     // returns -1 if the tag is not found.
-    int findTagInSet(Index line, const Address& tag) const;
+    int findTagInSet(Index line, const Address& tag);
     int findTagInSetIgnorePermissions(Index cacheSet,
                                       const Address& tag) const;

@@ -170,6 +170,10 @@
     int m_cache_num_set_bits;
     int m_cache_assoc;
     int m_start_index_bit;
+
+    Address m_mru_address;
+    int m_mru_tag_index;
+    bool m_valid_mru_address;
 };

 #endif // __MEM_RUBY_SYSTEM_CACHEMEMORY_HH__



On Thu, 

Re: [m5-dev] Implementation of findTagInSet

2010-11-26 Thread Steve Reinhardt
Hi Nilay,

Good job, this is clearly progress... you've sped up isTagPresent by 2X and
the simulation overall by almost 10%.  That's nothing to sneeze at.  It's
sad that isTagPresent is still the top function though.  Can you do some
tracing or other experiments to get a feel for whether keeping the last N
tags instead of the last 1 (for some small value of N, like 2 or 3) would be
useful?  Just printing out a trace of calls to isTagPresent should be enough
to get a feeling for whether that's worth hacking in a test implementation.

Also, I see a lot of your patch has to do with removing const labels from
isTagPresent... this is exactly the scenario the 'mutable' keyword was
designed for; if you mark your m_mru_* fields as mutable, then you shouldn't
have to remove the const labels from any of the function calls.

Steve

On Fri, Nov 26, 2010 at 9:37 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I profiled the un-modified and the modified m5 ten times (this time there
 was no load on the machine). Here are the average results:

                % time   std. dev   actual time   std. dev
 un-modified
 isTagPresent    19.99       0.35         47.17       1.23
 cumulative     100.00       0.00        235.91       3.37

 modified
 isTagPresent    10.35       0.28         21.22       0.57
 cumulative     100.00       0.00        205.11       2.94

 Below is the patch, though it may not apply cleanly to the current version of
 m5 since I have a few un-committed patches enqueued.


 # HG changeset patch
 # Parent 7ac53378e03b5116c48e6076167de6a2a2e06158

 diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.cc
 --- a/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 13:23:51 2010 -0600
 +++ b/src/mem/ruby/system/CacheMemory.cc	Thu Nov 25 17:30:58 2010 -0600
 @@ -84,6 +84,8 @@
              m_locked[i][j] = -1;
          }
      }
 +
 +    m_valid_mru_address = false;
  }

  CacheMemory::~CacheMemory()
 @@ -135,15 +137,26 @@
  // Given a cache index: returns the index of the tag in a set.
  // returns -1 if the tag is not found.
  int
 -CacheMemory::findTagInSet(Index cacheSet, const Address& tag) const
 +CacheMemory::findTagInSet(Index cacheSet, const Address& tag)
  {
      assert(tag == line_address(tag));
 +
 +    if (m_valid_mru_address && m_mru_address == tag)
 +        return m_mru_tag_index;
 +
      // search the set for the tags
 +    m_valid_mru_address = true;
 +    m_mru_address.setAddress(tag.getAddress());
 +
      m5::hash_map<Address, int>::const_iterator it = m_tag_index.find(tag);
      if (it != m_tag_index.end())
          if (m_cache[cacheSet][it->second]->m_Permission !=
              AccessPermission_NotPresent)
 +        {
 +            m_mru_tag_index = it->second;
              return it->second;
 +        }
 +
 +    m_mru_tag_index = -1;
      return -1; // Not found
  }

 @@ -215,7 +228,7 @@

  // tests to see if an address is present in the cache
  bool
 -CacheMemory::isTagPresent(const Address& address) const
 +CacheMemory::isTagPresent(const Address& address)
  {
      assert(address == line_address(address));
      Index cacheSet = addressToCacheSet(address);
 @@ -276,6 +289,10 @@
      m_locked[cacheSet][i] = -1;
      m_tag_index[address] = i;

 +    m_valid_mru_address = true;
 +    m_mru_address.setAddress(address.getAddress());
 +    m_mru_tag_index = i;
 +
      m_replacementPolicy_ptr->
          touch(cacheSet, i, g_eventQueue_ptr->getTime());

 @@ -300,6 +317,8 @@
              address);
          m_locked[cacheSet][loc] = -1;
          m_tag_index.erase(address);
 +
 +        m_valid_mru_address = false;
      }
  }

 @@ -327,18 +346,18 @@
  }

  // looks an address up in the cache
 -const AbstractCacheEntry&
 -CacheMemory::lookup(const Address& address) const
 +/*const AbstractCacheEntry&
 +CacheMemory::lookup(const Address& address)
  {
      assert(address == line_address(address));
      Index cacheSet = addressToCacheSet(address);
      int loc = findTagInSet(cacheSet, address);
      assert(loc != -1);
      return *m_cache[cacheSet][loc];
 -}
 +}*/

  AccessPermission
 -CacheMemory::getPermission(const Address& address) const
 +CacheMemory::getPermission(const Address& address)
  {
      assert(address == line_address(address));
      return lookup(address).m_Permission;
 diff -r 7ac53378e03b src/mem/ruby/system/CacheMemory.hh
 --- a/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 13:23:51 2010 -0600
 +++ b/src/mem/ruby/system/CacheMemory.hh	Thu Nov 25 17:30:58 2010 -0600
 @@ -74,7 +74,7 @@
                       DataBlock* data_ptr);

      // tests to see if an address is present in the cache
 -    bool isTagPresent(const Address& address) const;
 +    bool isTagPresent(const Address& address);

      // Returns true if there is:
      //   a) a tag match on this address or there is
 @@ -92,10 +92,10 @@

      // looks an address up in the cache
      AbstractCacheEntry& lookup(const Address& address);
 -    const AbstractCacheEntry& lookup(const Address& address) 

Re: [m5-dev] Implementation of findTagInSet

2010-11-25 Thread Nilay Vaish
Brad and I had a discussion on Tuesday. We are still thinking about how to 
resolve this issue.


As a stop-gap arrangement, I added a couple of variables to the 
CacheMemory class which track the last address for which a lookup was 
performed. I am posting the results from profiling before and after the 
change. I compiled m5 with the MOESI_hammer protocol and the simulation was 
allowed to run for 20,000,000,000 ticks. I would suggest not looking at 
the absolute time values, as they vary depending on the load on the 
machine.


Each sample counts as 0.01 seconds.
  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
18.27      61.32     61.32  888688475    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.97      81.36     20.04  219389124    0.00    0.00  Histogram::add(long long)
 2.99      91.39     10.03  204574578    0.00    0.00  CacheMemory::lookup(Address const&)
 2.56      99.97      8.58   12852725    0.00    0.00  MemoryControl::executeCycle()
 2.51     108.38      8.41   45887816    0.00    0.00  L1Cache_Controller::wakeup()



Each sample counts as 0.01 seconds.
  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
11.38      41.64     41.64  888688475    0.00    0.00  CacheMemory::isTagPresent(Address const&)
 5.99      63.55     21.91  219389124    0.00    0.00  Histogram::add(long long)
 2.90      74.16     10.61   45887816    0.00    0.00  L1Cache_Controller::wakeup()
 2.76      84.25     10.09   12852725    0.00    0.00  MemoryControl::executeCycle()
 2.49      93.36      9.11   34522950    0.00    0.00  BaseSimpleCPU::preExecute()



I can post the patch on the review board if this looks good.

--
Nilay



On Tue, 23 Nov 2010, Nilay Vaish wrote:


Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay


On Tue, 23 Nov 2010, Steve Reinhardt wrote:


Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know 
if

that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided 
by
building a software cache into isTagPresent(), by storing the last 
address
looked up along with a pointer to the block, then just checking on each 
call

to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.

Steve



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Steve Reinhardt
Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by
building a software cache into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.

Steve


On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I was looking at the MOESI hammer protocol. I think Steve's observation
 that extra tag lookups are going on in the cache is correct. In particular I
 noticed that in the getState() and setState() functions, first
 isTagPresent(address) is called and on the basis of the result (which is
 true or false), getCacheEntry(address) is called. Surprisingly, the
 getCacheEntry() function calls the isTagPresent() function again. These
 calls are in the file src/mem/protocol/MOESI_hammer-cache.sm

 Thanks
 Nilay


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Nilay Vaish

Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay


On Tue, 23 Nov 2010, Steve Reinhardt wrote:


Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by
building a software cache into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.

Steve


On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote:


I was looking at the MOESI hammer protocol. I think Steve's observation
that extra tag lookups are going on in the cache is correct. In particular I
noticed that in the getState() and setState() functions, first
isTagPresent(address) is called and on the basis of the result (which is
true or false), getCacheEntry(address) is called. Surprisingly, the
getCacheEntry() function calls the isTagPresent() function again. These
calls are in the file src/mem/protocol/MOESI_hammer-cache.sm

Thanks
Nilay





___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Steve Reinhardt
I'm not the guy to ask for that... but actually I doubt the protocol itself
matters that much, you just need to look at the code path that gets
exercised on an L1 cache hit and see where the calls are.  That part should
be almost if not entirely independent of the coherence protocol.

Steve

On Tue, Nov 16, 2010 at 7:13 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I profiled M5 using MOESI_CMP_directory and MOESI_CMP_token protocols.
 isTagPresent() dominates in both of these protocols as well. But the
 percentage of simulation time used is lesser (varies from 10% to 16%). I
 will take a look at the assembly code for the direct cache set indexing
 approach.

 In order to reduce the number of calls made to tag lookup, I would need to
 read about the protocol itself. Can you point some documentation on
 MOESI_hammer protocol?

 --
 Nilay



 On Fri, 12 Nov 2010, Steve Reinhardt wrote:

  On Fri, Nov 12, 2010 at 1:10 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

  I tried couple of ideas for improving the performance of the
 findTagInSet()
 function. These include having one hash table per cache set and replacing
 the hash table with a two dimensional array indexed using cache set,
 cache
 way. Neither of these ideas showed significant enough change in the
 percentage of the time taken by isTagPresent() function to come to a
 definite conclusion.

 I looked at the assembly code generated for the isTagPresent() function.
 Note
 that the compiler inlines the function findTagInSet(). The assembly
 code
 is about 100 lines long. Again, the find function for std::hash_map gets
 inlined. There is one division operation in the code and several load
 operations. If we were to assume that the hash function is able to keep
 the
 average occupancy of each bucket close to 1, then I think the time taken
 by
 the function would be determined by the time taken by the loads. It might
 be
 that a lot of loads end up missing in the cache.


 I'm a little surprised that the direct cache set indexing approach was not
 faster since I'd think that would be far less than 100 instructions, but
 you're right that issues like whether loads hit or miss in the cache will
 have a large impact.


  As far as reordering the tags is concerned, since the hash_map is not
 directly under our control, we will have to delete the tag entry in the
 hash
 table and insert it again to make sure that it is the first entry that is
 searched.


 The reordering only makes sense if we replace the hash table with a more
 conventional tag array.


  Right now I am profiling with coherence protocol as MOESI_hammer. I am
 thinking of profiling using a different protocol to make sure that it is
 not
 an artifact of the protocol in use.



 That sounds like a good idea.

 All in all, we would ideally like to both speed up individual calls and
 reduce the number of calls. IIRC, gprof indicated that findTagInSet() was
 called 4-5X more frequently than there were cache accesses, which makes no
 sense to me; it seems like a typical cache hit should only require a
 single
 tag lookup.

 That's another thing to keep in mind, is that typical programs really have
 very high cache hit rates, so another approach is to look at what happens
 in
 the process of servicing an L1 cache hit and optimize that path as much as
 possible.

 Steve

  ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Nilay Vaish



On Fri, 12 Nov 2010, Steve Reinhardt wrote:



Right now I am profiling with coherence protocol as MOESI_hammer. I am
thinking of profiling using a different protocol to make sure that it is not
an artifact of the protocol in use.



That sounds like a good idea.

All in all, we would ideally like to both speed up individual calls and
reduce the number of calls. IIRC, gprof indicated that findTagInSet() was
called 4-5X more frequently than there were cache accesses, which makes no
sense to me; it seems like a typical cache hit should only require a single
tag lookup.


Should this not be true when a multiprocessor system is being 
simulated? I am not aware of the configuration that ruby_fs.py makes use of.





That's another thing to keep in mind, is that typical programs really have
very high cache hit rates, so another approach is to look at what happens in
the process of servicing an L1 cache hit and optimize that path as much as
possible.

Steve



--
Nilay
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-16 Thread Nilay Vaish
I was looking at the MOESI hammer protocol. I think Steve's observation 
that extra tag lookups are going on in the cache is correct. In particular 
I noticed that in the getState() and setState() functions, first 
isTagPresent(address) is called and on the basis of the result (which is 
true or false), getCacheEntry(address) is called. Surprisingly, the 
getCacheEntry() function calls the isTagPresent() function again. These 
calls are in the file src/mem/protocol/MOESI_hammer-cache.sm


Thanks
Nilay

On Tue, 16 Nov 2010, Nilay Vaish wrote:

I profiled M5 using MOESI_CMP_directory and MOESI_CMP_token protocols. 
isTagPresent() dominates in both of these protocols as well. But the 
percentage of simulation time used is lesser (varies from 10% to 16%). I will 
take a look at the assembly code for the direct cache set indexing approach.


In order to reduce the number of calls made to tag lookup, I would need to 
read about the protocol itself. Can you point some documentation on 
MOESI_hammer protocol?


--
Nilay


On Fri, 12 Nov 2010, Steve Reinhardt wrote:


On Fri, Nov 12, 2010 at 1:10 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

I tried couple of ideas for improving the performance of the 
findTagInSet()

function. These include having one hash table per cache set and replacing
the hash table with a two dimensional array indexed using cache set, 
cache

way. Neither of these ideas showed significant enough change in the
percentage of the time taken by isTagPresent() function to come to a
definite conclusion.

I looked at the assembly code generated for the isTagPresent() function. Note
that the compiler inlines the function findTagInSet(). The assembly 
code

is about 100 lines long. Again, the find function for std::hash_map gets
inlined. There is one division operation in the code and several load
operations. If we were to assume that the hash function is able to keep 
the
average occupancy of each bucket close to 1, then I think the time taken 
by
the function would be determined by the time taken by the loads. It might 
be

that a lot of loads end up missing in the cache.



I'm a little surprised that the direct cache set indexing approach was not
faster since I'd think that would be far less than 100 instructions, but
you're right that issues like whether loads hit or miss in the cache will
have a large impact.



As far as reordering the tags is concerned, since the hash_map is not
directly under our control, we will have to delete the tag entry in the 
hash

table and insert it again to make sure that it is the first entry that is
searched.



The reordering only makes sense if we replace the hash table with a more
conventional tag array.



Right now I am profiling with coherence protocol as MOESI_hammer. I am
thinking of profiling using a different protocol to make sure that it is 
not

an artifact of the protocol in use.



That sounds like a good idea.

All in all, we would ideally like to both speed up individual calls and
reduce the number of calls. IIRC, gprof indicated that findTagInSet() was
called 4-5X more frequently than there were cache accesses, which makes no
sense to me; it seems like a typical cache hit should only require a single
tag lookup.

That's another thing to keep in mind, is that typical programs really have
very high cache hit rates, so another approach is to look at what happens 
in

the process of servicing an L1 cache hit and optimize that path as much as
possible.

Steve




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-12 Thread Nilay Vaish
I tried a couple of ideas for improving the performance of the 
findTagInSet() function. These include having one hash table per cache set 
and replacing the hash table with a two-dimensional array indexed using 
cache set and cache way. Neither of these ideas showed a significant enough 
change in the percentage of the time taken by the isTagPresent() function to 
come to a definite conclusion.


I looked at the assembly code generated for the isTagPresent() function. Note 
that the compiler inlines the function findTagInSet(). The assembly 
code is about 100 lines long. Again, the find function for std::hash_map 
gets inlined. There is one division operation in the code and several load 
operations. If we were to assume that the hash function is able to keep 
the average occupancy of each bucket close to 1, then I think the time 
taken by the function would be determined by the time taken by the loads. 
It might be that a lot of loads end up missing in the cache.


As far as reordering the tags is concerned, since the hash_map is not 
directly under our control, we will have to delete the tag entry in the 
hash table and insert it again to make sure that it is the first entry 
that is searched.


Right now I am profiling with coherence protocol as MOESI_hammer. I am 
thinking of profiling using a different protocol to make sure that it is 
not an artifact of the protocol in use.


--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:


If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-).  Doing both is even better.

At a high level, the process of looking something up in an N-way associative
cache should not take that many instructions if N is small (a shift and an
add to find the tag index, at most N loads and compares to match the tags).
If we reorder the tags to search the MRU block first then we will
probabilistically keep the average number of tags searched well below N.

Steve

On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish ni...@cs.wisc.edu wrote:


I had another look at the profile output. On the machine that I am using (a
3.2 GHz Pentium 4), each call to isTagPresent() take about 57 ns. Assuming
that the pipeline is functioning at is best, I think the number of uops
executed would be ~500. Is that too much for this function?

--
Nilay


On Fri, 5 Nov 2010, Nilay Vaish wrote:

 Do you know what hash function is in use? Seems to me that the default

hash function is to hash to self. May be we should test with a different
hash function.

--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:

 You can look at the call graph profile further down in the gprof output

to
figure out how much time is spent in functions that get called from
isTagPresent.  If it's not specifically calling out findTagInSet, it may
be
because it's inlined in isTagPresent.

Steve

On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I ran ALPHA_FS_MOESI_hammer using the following command --


./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

I don't know how the benchmark is picked in case none is specified.
Below
is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()


The output shows that about a fifth of the time is spent in the
isTagPresent() function.

bool
CacheMemory::isTagPresent(const Address& address) const
{
    assert(address == line_address(address));
    Index cacheSet = addressToCacheSet(address);
    int loc = findTagInSet(cacheSet, address);

    if (loc == -1) {
        // We didn't find the tag
        DPRINTF(RubyCache, "No tag match for address: %s\n", address);
        return false;
    }
    DPRINTF(RubyCache, "address: %s found\n", address);
    return true;
}

Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
and the DPRINTF() will not get compiled. The addressToCacheSet()
function
does some bitwise operations and some arithmetic operations. So it is
expected that it would not consume much time. So, most likely the
findTagInSet() function takes a major portion of the overall time
required
by the isTagPresent() function.

--
Nilay




 ___

m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-07 Thread Nilay Vaish
I went through the implementation of hash_map in C++. I realized that the 
number of buckets gets resized on the fly as the number of elements 
increases. This means that we would have more than num_of_cache_sets * 
num_ways buckets in the hash table.
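
A quick way to see that behaviour (using std::unordered_map as a stand-in for the 
older m5::hash_map; the exact growth points will differ, but the rehash-on-insert 
idea is the same):

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<long, int> m;
    std::size_t buckets = m.bucket_count();
    for (long i = 0; i < 100000; ++i) {
        m[i] = 0;
        if (m.bucket_count() != buckets) {       // a rehash just happened
            buckets = m.bucket_count();
            std::cout << m.size() << " elements -> "
                      << buckets << " buckets\n";
        }
    }
    return 0;
}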


--
Nilay

On Sat, 6 Nov 2010, Nilay Vaish wrote:

I am still digging into how the cache configuration is specified. Currently, 
the system has four caches - two 32 KB, 256 sets, 2-way associative caches, 
one 2 MB, 2048 sets, 16-way set associative cache, and one 4 MB, 16384 sets, 
4-way set associative cache. Each of these caches makes use of a hash table having 
only 193 buckets. This seems too low to me. Clearly, if the cache is being 
used to full capacity, lookups in the hash table will have to go through many 
entries before figuring out whether the address being searched for is in 
the cache or not.
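
(For scale, assuming those parameters and a fixed 193 buckets: the 4 MB cache alone 
can hold 16384 sets x 4 ways = 65536 lines, which would average roughly 65536 / 193, 
i.e. about 340 entries per bucket at full occupancy.)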


--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:


If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-).  Doing both is even better.

At a high level, the process of looking something up in an N-way 
associative

cache should not take that many instructions if N is small (a shift and an
add to find the tag index, at most N loads and compares to match the tags).
If we reorder the tags to search the MRU block first then we will
probabilistically keep the average number of tags searched well below N.

Steve

On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I had another look at the profile output. On the machine that I am using 
(a

3.2 GHz Pentium 4), each call to isTagPresent() take about 57 ns. Assuming
that the pipeline is functioning at is best, I think the number of uops
executed would be ~500. Is that too much for this function?

--
Nilay


On Fri, 5 Nov 2010, Nilay Vaish wrote:

 Do you know what hash function is in use? Seems to me that the default

hash function is to hash to self. May be we should test with a different
hash function.

--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:

 You can look at the call graph profile further down in the gprof output

to
figure out how much time is spent in functions that get called from
isTagPresent.  If it's not specifically calling out findTagInSet, it may
be
because it's inlined in isTagPresent.

Steve


 ___

m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish

I ran ALPHA_FS_MOESI_hammer using the following command --

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

I don't know how the benchmark is picked in case none is specified. Below 
is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()



The output shows that about a fifth of the time is spent in the 
isTagPresent() function.


bool
CacheMemory::isTagPresent(const Address& address) const
{
    assert(address == line_address(address));
    Index cacheSet = addressToCacheSet(address);
    int loc = findTagInSet(cacheSet, address);

    if (loc == -1) {
        // We didn't find the tag
        DPRINTF(RubyCache, "No tag match for address: %s\n", address);
        return false;
    }
    DPRINTF(RubyCache, "address: %s found\n", address);
    return true;
}

Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert() 
and the DPRINTF() will not get compiled. The addressToCacheSet() function 
does some bitwise operations and some arithmetic operations. So it is 
expected that it would not consume much time. So, most likely the 
findTagInSet() function takes a major portion of the overall time required 
by the isTagPresent() function.


--
Nilay

On Thu, 4 Nov 2010, Steve Reinhardt wrote:


You also have to build a binary that supports ruby, like
ALPHA_FS_MOESI_hammer.  If you can't get that to work, try
ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the
workload you run doesn't really matter that much as long as it's long enough
to get a meaningful profile.

Steve



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Steve Reinhardt
You can look at the call graph profile further down in the gprof output to
figure out how much time is spent in functions that get called from
isTagPresent.  If it's not specifically calling out findTagInSet, it may be
because it's inlined in isTagPresent.

Steve

On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I ran ALPHA_FS_MOESI_hammer using the following command --

 ./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

 I don't know how the benchmark is picked in case none is specified. Below
 is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()


 The output shows that about a fifth of the time is spent in the
 isTagPresent() function.

 bool
 CacheMemory::isTagPresent(const Address& address) const
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
     int loc = findTagInSet(cacheSet, address);

     if (loc == -1) {
         // We didn't find the tag
         DPRINTF(RubyCache, "No tag match for address: %s\n", address);
         return false;
     }
     DPRINTF(RubyCache, "address: %s found\n", address);
     return true;
 }

 Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
 and the DPRINTF() will not get compiled. The addressToCacheSet() function
 does some bitwise operations and some arithmetic operations. So it is
 expected that it would not consume much time. So, most likely the
 findTagInSet() function takes a major portion of the overall time required
 by the isTagPresent() function.

 --
 Nilay


 On Thu, 4 Nov 2010, Steve Reinhardt wrote:

  You also have to build a binary that supports ruby, like
 ALPHA_FS_MOESI_hammer.  If you can't get that to work, try
 ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the
 workload you run doesn't really matter that much as long as it's long
 enough
 to get a meaningful profile.

 Steve


 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish
Do you know what hash function is in use? Seems to me that the default 
hash function is to hash to self. May be we should test with a different 
hash function.


--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:


You can look at the call graph profile further down in the gprof output to
figure out how much time is spent in functions that get called from
isTagPresent.  If it's not specifically calling out findTagInSet, it may be
because it's inlined in isTagPresent.

Steve

On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote:


I ran ALPHA_FS_MOESI_hammer using the following command --

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

I don't know how the benchmark is picked in case none is specified. Below
is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()


The output shows that about a fifth of the time is spent in the
isTagPresent() function.

bool
CacheMemory::isTagPresent(const Address& address) const
{
    assert(address == line_address(address));
    Index cacheSet = addressToCacheSet(address);
    int loc = findTagInSet(cacheSet, address);

    if (loc == -1) {
        // We didn't find the tag
        DPRINTF(RubyCache, "No tag match for address: %s\n", address);
        return false;
    }
    DPRINTF(RubyCache, "address: %s found\n", address);
    return true;
}

Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
and the DPRINTF() will not get compiled. The addressToCacheSet() function
does some bitwise operations and some arithmetic operations. So it is
expected that it would not consume much time. So, most likely the
findTagInSet() function takes a major portion of the overall time required
by the isTagPresent() function.

--
Nilay



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Nilay Vaish
I had another look at the profile output. On the machine that I am using 
(a 3.2 GHz Pentium 4), each call to isTagPresent() takes about 57 ns. 
Assuming that the pipeline is functioning at its best, I think the number 
of uops executed would be ~500. Is that too much for this function?
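
(For reference: 57 ns at 3.2 GHz is roughly 180 cycles, and at a peak retirement 
rate of about three uops per cycle that works out to the ~500-uop ballpark assumed 
here.)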


--
Nilay

On Fri, 5 Nov 2010, Nilay Vaish wrote:

Do you know what hash function is in use? Seems to me that the default hash 
function is to hash to self. May be we should test with a different hash 
function.


--
Nilay

On Fri, 5 Nov 2010, Steve Reinhardt wrote:


You can look at the call graph profile further down in the gprof output to
figure out how much time is spent in functions that get called from
isTagPresent.  If it's not specifically calling out findTagInSet, it may be
because it's inlined in isTagPresent.

Steve

On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote:


I ran ALPHA_FS_MOESI_hammer using the following command --

./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

I don't know how the benchmark is picked in case none is specified. Below
is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()


The output shows that about a fifth of the time is spent in the
isTagPresent() function.

bool
CacheMemory::isTagPresent(const Address& address) const
{
    assert(address == line_address(address));
    Index cacheSet = addressToCacheSet(address);
    int loc = findTagInSet(cacheSet, address);

    if (loc == -1) {
        // We didn't find the tag
        DPRINTF(RubyCache, "No tag match for address: %s\n", address);
        return false;
    }
    DPRINTF(RubyCache, "address: %s found\n", address);
    return true;
}

Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
and the DPRINTF() will not get compiled. The addressToCacheSet() function
does some bitwise operations and some arithmetic operations. So it is
expected that it would not consume much time. So, most likely the
findTagInSet() function takes a major portion of the overall time required
by the isTagPresent() function.

--
Nilay





___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-05 Thread Steve Reinhardt
If that's where a significant amount of time is being spent, we need to
either call it less or make it run faster :-).  Doing both is even better.

At a high level, the process of looking something up in an N-way associative
cache should not take that many instructions if N is small (a shift and an
add to find the tag index, at most N loads and compares to match the tags).
If we reorder the tags to search the MRU block first then we will
probabilistically keep the average number of tags searched well below N.
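
A minimal sketch of that MRU-first search (illustrative only; the Way and CacheSet 
types below are stand-ins, not the Ruby CacheMemory interface):

#include <cstdint>
#include <vector>

struct Way { uint64_t tag = 0; bool valid = false; };

struct CacheSet {
    std::vector<Way> ways;   // the N ways of one set
    int mru = 0;             // way index that hit most recently

    // Check the MRU way first; on the common repeated-address case this costs
    // a single comparison instead of scanning all N ways.
    int findTag(uint64_t tag) {
        if (!ways.empty() && ways[mru].valid && ways[mru].tag == tag)
            return mru;
        for (int i = 0; i < static_cast<int>(ways.size()); ++i) {
            if (i != mru && ways[i].valid && ways[i].tag == tag) {
                mru = i;     // remember for the next lookup
                return i;
            }
        }
        return -1;           // miss
    }
};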

Steve

On Fri, Nov 5, 2010 at 9:27 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I had another look at the profile output. On the machine that I am using (a
 3.2 GHz Pentium 4), each call to isTagPresent() take about 57 ns. Assuming
 that the pipeline is functioning at is best, I think the number of uops
 executed would be ~500. Is that too much for this function?

 --
 Nilay


 On Fri, 5 Nov 2010, Nilay Vaish wrote:

  Do you know what hash function is in use? Seems to me that the default
 hash function is to hash to self. May be we should test with a different
 hash function.

 --
 Nilay

 On Fri, 5 Nov 2010, Steve Reinhardt wrote:

  You can look at the call graph profile further down in the gprof output
 to
 figure out how much time is spent in functions that get called from
 isTagPresent.  If it's not specifically calling out findTagInSet, it may
 be
 because it's inlined in isTagPresent.

 Steve

 On Fri, Nov 5, 2010 at 7:58 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

  I ran ALPHA_FS_MOESI_hammer using the following command --

 ./build/ALPHA_FS_MOESI_hammer/m5.prof ./configs/example/ruby_fs.py

 I don't know how the benchmark is picked in case none is specified.
 Below
 is the gprof output --


  %   cumulative    self               self    total
 time    seconds   seconds      calls  s/call  s/call  name
19.72      51.22     51.22  925285266    0.00    0.00  CacheMemory::isTagPresent(Address const&) const
 5.59      65.74     14.52  229035720    0.00    0.00  Histogram::add(long long)
 3.57      75.02      9.28  212664644    0.00    0.00  CacheMemory::lookup(Address const&)
 2.53      81.59      6.57   47830136    0.00    0.00  L1Cache_Controller::wakeup()


 The output shows that about a fifth of the time is spent in the
 isTagPresent() function.

 bool
 CacheMemory::isTagPresent(const Address& address) const
 {
     assert(address == line_address(address));
     Index cacheSet = addressToCacheSet(address);
     int loc = findTagInSet(cacheSet, address);

     if (loc == -1) {
         // We didn't find the tag
         DPRINTF(RubyCache, "No tag match for address: %s\n", address);
         return false;
     }
     DPRINTF(RubyCache, "address: %s found\n", address);
     return true;
 }

 Since m5.prof is compiled with -DNDEBUG and -DTRACING_ON=0, the assert()
 and the DPRINTF() will not get compiled. The addressToCacheSet()
 function
 does some bitwise operations and some arithmetic operations. So it is
 expected that it would not consume much time. So, most likely the
 findTagInSet() function takes a major portion of the overall time
 required
 by the isTagPresent() function.

 --
 Nilay



  ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-04 Thread Nilay Vaish
I tried running ruby_fs.py, below is the error message that I received. I 
don't think there is any documentation or mailing list discussion on how 
to run ruby_fs.py. To me it seems that some parameter relating to the DMA 
controller is missing from the command I tried out.


--
Nilay


./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1 --detailed 
--caches --l2cache -F 50

M5 Simulator System


Copyright (c) 2001-2008
The Regents of The University of Michigan
All Rights Reserved


M5 compiled Nov  3 2010 18:10:26
M5 revision 3b2f82286e5d 7724 default WarnPatch qtip tip
M5 started Nov  4 2010 08:50:01
M5 executing on scamorza.cs.wisc.edu
command line: ./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1 
--detailed --caches --l2cache -F 50

Error: could not create sytem for ruby protocol MI_example
Traceback (most recent call last):
  File "<string>", line 1, in ?
  File "/scratch/nilay/GEM5/sibling/src/python/m5/main.py", line 359, in main
    exec filecode in scope
  File "./configs/example/ruby_fs.py", line 117, in ?
    system._dma_devices)
  File "/scratch/nilay/GEM5/sibling/configs/ruby/Ruby.py", line 69, in create_system
    (cpu_sequencers, dir_cntrls, all_cntrls) = \
  File "<string>", line 0, in ?
  File "/scratch/nilay/GEM5/sibling/configs/ruby/MI_example.py", line 138, in create_system
    system.dma_cntrl.dma_sequencer.port = dma_device.dma
  File "/scratch/nilay/GEM5/sibling/src/python/m5/SimObject.py", line 586, in __getattr__
    raise AttributeError, "object '%s' has no attribute '%s'" \
AttributeError: object 'LinuxAlphaSystem' has no attribute 'dma_cntrl'


On Wed, 3 Nov 2010, Steve Reinhardt wrote:


Ah, the issue is that you're using the old M5 memory hierarchy and not
Ruby.  You need to run one of the Ruby versions, and use ruby_fs.py instead
of fs.py.

Steve


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-04 Thread Steve Reinhardt
You also have to build a binary that supports ruby, like
ALPHA_FS_MOESI_hammer.  If you can't get that to work, try
ALPHA_SE_MOESI_hammer and run one of the ALPHA_SE test workloads... the
workload you run doesn't really matter that much as long as it's long enough
to get a meaningful profile.

Steve

On Thu, Nov 4, 2010 at 7:17 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I tried running ruby_fs.py, below is the error message that I received. I
 don't think there is any documentation or mailing list discussion on how to
 run ruby_fs.py. To me it seems that some parameter relating to the DMA
 controller is missing from the command I tried out.

 --
 Nilay


 ./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1 --detailed
 --caches --l2cache -F 50

 M5 Simulator System

 Copyright (c) 2001-2008
 The Regents of The University of Michigan
 All Rights Reserved


 M5 compiled Nov  3 2010 18:10:26
 M5 revision 3b2f82286e5d 7724 default WarnPatch qtip tip
 M5 started Nov  4 2010 08:50:01
 M5 executing on scamorza.cs.wisc.edu
 command line: ./build/ALPHA_FS/m5.prof ./configs/example/ruby_fs.py -n 1
 --detailed --caches --l2cache -F 50
 Error: could not create sytem for ruby protocol MI_example
 Traceback (most recent call last):
   File "<string>", line 1, in ?
   File "/scratch/nilay/GEM5/sibling/src/python/m5/main.py", line 359, in main
     exec filecode in scope
   File "./configs/example/ruby_fs.py", line 117, in ?
     system._dma_devices)
   File "/scratch/nilay/GEM5/sibling/configs/ruby/Ruby.py", line 69, in create_system
     (cpu_sequencers, dir_cntrls, all_cntrls) = \
   File "<string>", line 0, in ?
   File "/scratch/nilay/GEM5/sibling/configs/ruby/MI_example.py", line 138, in create_system
     system.dma_cntrl.dma_sequencer.port = dma_device.dma
   File "/scratch/nilay/GEM5/sibling/src/python/m5/SimObject.py", line 586, in __getattr__
     raise AttributeError, "object '%s' has no attribute '%s'" \
 AttributeError: object 'LinuxAlphaSystem' has no attribute 'dma_cntrl'



 On Wed, 3 Nov 2010, Steve Reinhardt wrote:

  Ah, the issue is that you're using the old M5 memory hierarchy and not
 Ruby.  You need to run one of the Ruby versions, and use ruby_fs.py
 instead
 of fs.py.

 Steve


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Nilay Vaish
I profiled M5 but, surprisingly, I did not find any mention of the function 
findTagInSet() in the output obtained from gprof. Does it matter what 
coherence protocol is in use? I carried out the following steps --


1. Compiled m5.prof using
scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof

2. Ran blackscholes benchmark using the instructions specified in the 
technical report 'Running PARSEC v2.1 in the M5 Simulator' by Gebhart et 
al. Specifically I ran the following command --


./build/ALPHA_FS/m5.prof ./configs/example/fs.py -n 1 
--script=/scratch/nilay/GEM5/system/scripts/blackscholes.rcS --detailed 
--caches --l2cache -F 50


These are the contents of blackscholes.rcS

#!/bin/sh
# File to run the blackscholes benchmark
cd /parsec/install/bin 
/sbin/m5 switchcpu

/sbin/m5 dumpstats
/sbin/m5 resetstats
./blackscholes 64 /parsec/install/inputs/blackscholes/in_64K.txt 
/parsec/install/inputs/blackscholes/prices.txt
echo Done :D
/sbin/m5 exit
/sbin/m5 exit

Thanks
Nilay


On Tue, 2 Nov 2010, Steve Reinhardt wrote:


I just compiled m5.prof and ran it (forgot what workload I ran on it,
probably one of the parsec benchmarks; it probably doesn't matter a lot).
If you've never used gprof before, this is a great time to learn!

Steve

On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:


I am looking at possible performance optimizations in Ruby. As you can see
from the mail excerpt below, the function findTagInSet() consumes lots of
time. I am thinking of making the changes as suggested by Brad. I have
questions for m5-dev members, in particular for Derek and Steve. How did you
arrive at the conclusion that findTagInSet() is a problem? What benchmarks
and profiling tools did you use?

Thanks
Nilay

-- Forwarded message --
Date: Mon, 20 Sep 2010 22:57:39 -0500
From: Beckmann, Brad brad.beckm...@amd.com
To: 'Nilay Vaish' ni...@cs.wisc.edu
Cc: Daniel Gibson gib...@cs.wisc.edu
Subject: RE: Performane Optimizations in Ruby

== CacheMemory findTagInSet == Recently Steve mentioned to me that a huge
percentage of time was being spent in CacheMemory's findTagInSet function.
Right now that function uses a hashmap across the entire cache to map tags
to way ids.  I think Derek implemented this change recently in the hope of
improving performance, and it might have helped for small caches, but I
don't think it helps for larger caches.  There are a couple of possible
solutions: per-set hashmaps, or reordering the ways so that the MRU blocks
are at the lower ids and using a loop.  I think we should investigate both
solutions and see which is better.
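
For illustration, the per-set hashmap alternative might look roughly like
this (a sketch only; the TagDirectory class, member names, and container
choice are assumptions, not the actual CacheMemory code):

#include <cstdint>
#include <unordered_map>
#include <vector>

typedef uint64_t Addr;

// One small tag -> way map per set instead of a single hashmap spanning the
// whole cache, so each map holds at most N (the associativity) entries
// regardless of cache size.
class TagDirectory {
  public:
    explicit TagDirectory(int numSets) : m_sets(numSets) {}

    int findTagInSet(int set, Addr tag) const {
        const std::unordered_map<Addr, int> &m = m_sets[set];
        std::unordered_map<Addr, int>::const_iterator it = m.find(tag);
        return it == m.end() ? -1 : it->second;    // -1 means not present
    }

    void insertTag(int set, Addr tag, int way) { m_sets[set][tag] = way; }
    void eraseTag(int set, Addr tag)           { m_sets[set].erase(tag); }

  private:
    std::vector<std::unordered_map<Addr, int> > m_sets;
};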




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Steve Reinhardt
What was the gprof output?

On Wed, Nov 3, 2010 at 4:45 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I profiled M5 but surprisingly I did not find any mention of the function
 findTagInSet() in the output obtained from gprof. Does it matter what
 coherence protocol is in use? I carried out the following step -

 1. Compiled m5.prof using
 scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof

 2. Ran blackscholes benchmark using the instructions specified in the
 technical report 'Running PARSEC v2.1 in the M5 Simulator' by Gebhart et al.
 Specifically I ran the following command --

 ./build/ALPHA_FS/m5.prof ./configs/example/fs.py -n 1
 --script=/scratch/nilay/GEM5/system/scripts/blackscholes.rcS --detailed
 --caches --l2cache -F 50

 These are the contents of blackscholes.rcS

 #!/bin/sh
 # File to run the blackscholes benchmark
 cd /parsec/install/bin
 /sbin/m5 switchcpu
 /sbin/m5 dumpstats
 /sbin/m5 resetstats
 ./blackscholes 64 /parsec/install/inputs/blackscholes/in_64K.txt
 /parsec/install/inputs/blackscholes/prices.txt
 echo Done :D
 /sbin/m5 exit
 /sbin/m5 exit

 Thanks
 Nilay



 On Tue, 2 Nov 2010, Steve Reinhardt wrote:

  I just compiled m5.prof and ran it (forgot what workload I ran on it,
 probably one of the parsec benchmarks; it probably doesn't matter a lot).
 If you've never used gprof before, this is a great time to learn!

 Steve

 On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

  I am looking at possible performance optimizations in Ruby. As you can see
 from the mail excerpt below, the function findTagInSet() consumes lots of
 time. I am thinking of making the changes as suggested by Brad. I have
 questions for m5-dev members, in particular for Derek and Steve. How did you
 arrive at the conclusion that findTagInSet() is a problem? What benchmarks
 and profiling tools did you use?

 Thanks
 Nilay

 -- Forwarded message --
 Date: Mon, 20 Sep 2010 22:57:39 -0500
 From: Beckmann, Brad brad.beckm...@amd.com
 To: 'Nilay Vaish' ni...@cs.wisc.edu
 Cc: Daniel Gibson gib...@cs.wisc.edu
 Subject: RE: Performane Optimizations in Ruby

 == CacheMemory findTagInSet == Recently Steve mentioned to me that a huge
 percentage of time was being spent in CacheMemory's findTagInSet function.
 Right now that function uses a hashmap across the entire cache to map tags
 to way ids.  I think Derek implemented this change recently in the hope of
 improving performance, and it might have helped for small caches, but I
 don't think it helps for larger caches.  There are a couple of possible
 solutions: per-set hashmaps, or reordering the ways so that the MRU blocks
 are at the lower ids and using a loop.  I think we should investigate both
 solutions and see which is better.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Nilay
I ran m5.prof two times. Here are the top five functions --

  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call   s/call  name
 8.35      5.05      5.05   58969209    0.00     0.00  BaseSimpleCPU::preExecute()
 6.32      8.87      3.82   58975463    0.00     0.00  AtomicSimpleCPU::tick()
 4.69     11.71      2.84    5060904    0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
 3.60     13.89      2.18   84144079    0.00     0.00  CacheSet::findBlk(unsigned long long) const
 2.86     15.62      1.73   79820055    0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*&, int&, std::list<Packet*, std::allocator<Packet*> >&)


  %   cumulative   self              self     total
 time   seconds   seconds      calls  s/call   s/call  name
 6.89      4.15      4.15   58969209    0.00     0.00  BaseSimpleCPU::preExecute()
 6.78      8.23      4.08   58975463    0.00     0.00  AtomicSimpleCPU::tick()
 4.92     11.19      2.96    5060904    0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
 3.90     13.54      2.35   84144079    0.00     0.00  CacheSet::findBlk(unsigned long long) const
 3.34     15.55      2.01   79820055    0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*&, int&, std::list<Packet*, std::allocator<Packet*> >&)


--
Nilay

On Wed, November 3, 2010 9:56 pm, Steve Reinhardt wrote:
 What was the gprof output?

 On Wed, Nov 3, 2010 at 4:45 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I profiled M5 but surprisingly I did not find any mention of the
 function
 findTagInSet() in the output obtained from gprof. Does it matter what
 coherence protocol is in use? I carried out the following step -

 1. Compiled m5.prof using
 scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof

 2. Ran blackscholes benchmark using the instructions specified in the
 technical report 'Running PARSEC v2.1 in the M5 Simulator' by Gebhart et
 al.
 Specifically I ran the following command --

 ./build/ALPHA_FS/m5.prof ./configs/example/fs.py -n 1
 --script=/scratch/nilay/GEM5/system/scripts/blackscholes.rcS --detailed
 --caches --l2cache -F 50

 These are the contents of blackscholes.rcS

 #!/bin/sh
 # File to run the blackscholes benchmark
 cd /parsec/install/bin
 /sbin/m5 switchcpu
 /sbin/m5 dumpstats
 /sbin/m5 resetstats
 ./blackscholes 64 /parsec/install/inputs/blackscholes/in_64K.txt
 /parsec/install/inputs/blackscholes/prices.txt
 echo Done :D
 /sbin/m5 exit
 /sbin/m5 exit

 Thanks
 Nilay





___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-03 Thread Steve Reinhardt
Ah, the issue is that you're using the old M5 memory hierarchy and not
Ruby.  You need to run one of the Ruby versions, and use ruby_fs.py instead
of fs.py.

Steve

On Wed, Nov 3, 2010 at 9:00 PM, Nilay ni...@cs.wisc.edu wrote:

 I ran m5.prof two times. Here are the top five functions --

   %   cumulative   self              self     total
  time   seconds   seconds      calls  s/call   s/call  name
  8.35      5.05      5.05   58969209    0.00     0.00  BaseSimpleCPU::preExecute()
  6.32      8.87      3.82   58975463    0.00     0.00  AtomicSimpleCPU::tick()
  4.69     11.71      2.84    5060904    0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
  3.60     13.89      2.18   84144079    0.00     0.00  CacheSet::findBlk(unsigned long long) const
  2.86     15.62      1.73   79820055    0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*&, int&, std::list<Packet*, std::allocator<Packet*> >&)


   %   cumulative   self              self     total
  time   seconds   seconds      calls  s/call   s/call  name
  6.89      4.15      4.15   58969209    0.00     0.00  BaseSimpleCPU::preExecute()
  6.78      8.23      4.08   58975463    0.00     0.00  AtomicSimpleCPU::tick()
  4.92     11.19      2.96    5060904    0.00     0.00  FullO3CPU<O3CPUImpl>::tick()
  3.90     13.54      2.35   84144079    0.00     0.00  CacheSet::findBlk(unsigned long long) const
  3.34     15.55      2.01   79820055    0.00     0.00  Cache<LRU>::access(Packet*, CacheBlk*&, int&, std::list<Packet*, std::allocator<Packet*> >&)


 --
 Nilay

 On Wed, November 3, 2010 9:56 pm, Steve Reinhardt wrote:
  What was the gprof output?
 
  On Wed, Nov 3, 2010 at 4:45 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
 
  I profiled M5 but surprisingly I did not find any mention of the
  function
  findTagInSet() in the output obtained from gprof. Does it matter what
  coherence protocol is in use? I carried out the following step -
 
  1. Compiled m5.prof using
  scons -j 6 USE_MYSQL=False RUBY=True build/ALPHA_FS/m5.prof
 
  2. Ran blackscholes benchmark using the instructions specified in the
  technical report 'Running PARSEC v2.1 in the M5 Simulator' by Gebhart et
  al.
  Specifically I ran the following command --
 
  ./build/ALPHA_FS/m5.prof ./configs/example/fs.py -n 1
  --script=/scratch/nilay/GEM5/system/scripts/blackscholes.rcS --detailed
  --caches --l2cache -F 50
 
  These are the contents of blackscholes.rcS
 
  #!/bin/sh
  # File to run the blackscholes benchmark
  cd /parsec/install/bin
  /sbin/m5 switchcpu
  /sbin/m5 dumpstats
  /sbin/m5 resetstats
  ./blackscholes 64 /parsec/install/inputs/blackscholes/in_64K.txt
  /parsec/install/inputs/blackscholes/prices.txt
  echo Done :D
  /sbin/m5 exit
  /sbin/m5 exit
 
  Thanks
  Nilay
 
 
 



___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-02 Thread Steve Reinhardt
I just compiled m5.prof and ran it (forgot what workload I ran on it,
probably one of the parsec benchmarks; it probably doesn't matter a lot).
If you've never used gprof before, this is a great time to learn!

Steve

On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I am looking at possible performance optimizations in Ruby. As you can see
 from the mail excerpt below, the function findTagInSet() consumes lots of
 time. I am thinking of making the changes as suggested by Brad. I have
 questions for m5-dev members, in particular for Derek and Steve. How did you
 arrive at the conclusion that findTagInSet() is a problem? What benchmarks
 and profiling tools did you use?

 Thanks
 Nilay

 -- Forwarded message --
 Date: Mon, 20 Sep 2010 22:57:39 -0500
 From: Beckmann, Brad brad.beckm...@amd.com
 To: 'Nilay Vaish' ni...@cs.wisc.edu
 Cc: Daniel Gibson gib...@cs.wisc.edu
 Subject: RE: Performane Optimizations in Ruby

 == CacheMemory findTagInSet == Recently Steve mentioned to me that a huge
 percentage of time was being spent in CacheMemory's findTagInSet function.
 Right now that function uses a hashmap across the entire cache to map tags
 to way ids.  I think Derek implemented this change recently in the hope of
 improving performance, and it might have helped for small caches, but I
 don't think it helps for larger caches.  There are a couple of possible
 solutions: per-set hashmaps, or reordering the ways so that the MRU blocks
 are at the lower ids and using a loop.  I think we should investigate both
 solutions and see which is better.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev