Re: [m5-dev] Parallel M5

2008-06-30 Thread Ali Saidi

On Jun 30, 2008, at 5:15 PM, Steve Reinhardt wrote:

 On Mon, Jun 30, 2008 at 9:11 AM, Ali Saidi [EMAIL PROTECTED] wrote:
 The FastAlloc pools and StaticInst cache should clearly be  
 duplicated.

 Why would you want to duplicate the StaticInst cache?  It's a
 read-mostly structure so you'd only have to lock on a miss/insert, and
 having a larger shared capacity for a given memory footprint seems
 like a win to me.
Right now we use a std::map for the cache, so we could try
reader/writer locks. But we couldn't lock only on insert: an insert may
rebalance the tree under a concurrent reader and send the read off into
the weeds, unless we make some assumptions about the STL code that I
don't think we can make.
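
For reference, here is a minimal sketch of the reader/writer-lock variant. It assumes C++17's std::shared_mutex (which postdates this thread), and SharedDecodeCache and DecodedInst are hypothetical names standing in for the StaticInst cache, not M5 code. Lookups take the lock in shared mode and inserts take it exclusively, so a reader never walks the tree while an insert is modifying it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    std::string mnemonic;
};

// Sketch of one decode cache shared by all threads.
class SharedDecodeCache {
  public:
    // Returns true and fills 'out' on a hit. Readers hold the lock in
    // shared mode, so they overlap with each other but never with an insert.
    bool lookup(std::uint32_t machInst, DecodedInst &out) const {
        std::shared_lock<std::shared_mutex> guard(lock);
        auto it = cache.find(machInst);
        if (it == cache.end())
            return false;
        out = it->second;
        return true;
    }

    // Called on a miss. The exclusive lock keeps every reader out while
    // the map is modified.
    void insert(std::uint32_t machInst, const DecodedInst &inst) {
        assert(!inst.mnemonic.empty());  // sanity check for the sketch only
        std::unique_lock<std::shared_mutex> guard(lock);
        // emplace() keeps the existing entry if another thread inserted first.
        cache.emplace(machInst, inst);
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> guard(lock);
        return cache.size();
    }

  private:
    mutable std::shared_mutex lock;
    std::map<std::uint32_t, DecodedInst> cache;
};
```

The cost is that every lookup, including a hit, has to acquire the lock, which is the overhead the duplicated per-thread caches would avoid.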

Ali

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Parallel M5

2008-06-30 Thread nathan binkert
 OK, that makes more sense now.  Still seems like in the long term the
 right thing is to use a data structure that supports multiple readers
 with either per-bucket locks, a reader/writer lock, or some sort of
 non-blocking update (has x86 added a compare-and-swap yet?).

They've had cmpxchg since the 486 (and cmpxchg8b since the P5).  Intel's
Threading Building Blocks has concurrent_hash_map, but it's not really
free.  I wonder if
there is some good library we should use for this stuff.  Anyone out
there know?  Reinventing the wheel isn't actually all that fun.
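
For the non-blocking update mentioned above, here is a minimal sketch of a compare-and-swap retry loop. It assumes C++11's std::atomic (not available when this thread was written), and Node and PushOnlyList are hypothetical names. On x86, compilers typically implement compare_exchange_weak with a lock-prefixed cmpxchg.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node; the caller owns the storage.
struct Node {
    int value;
    Node *next;
};

// Insert-only singly linked list whose head is updated with compare-and-swap.
class PushOnlyList {
  public:
    void push(Node *node) {
        assert(node != nullptr);
        Node *oldHead = head.load(std::memory_order_relaxed);
        do {
            // Link the new node in front of the head we last observed.
            node->next = oldHead;
            // Publish it only if the head is still oldHead. On failure
            // oldHead is refreshed with the current head and we retry.
        } while (!head.compare_exchange_weak(oldHead, node,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }

    // Readers walk the list without taking any lock.
    int sum() const {
        int total = 0;
        for (Node *n = head.load(std::memory_order_acquire); n; n = n->next)
            total += n->value;
        return total;
    }

  private:
    std::atomic<Node *> head{nullptr};
};
```

Because nodes are never removed, this sketch sidesteps the ABA and memory-reclamation problems that a full concurrent hash map has to solve, which is part of why a library is attractive.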

 Nate


Re: [m5-dev] Parallel M5

2008-06-30 Thread Gabe Black
 non-blocking update (has x86 added a compare-and-swap yet?).

Yes. CMPXCHG on page 98 of volume 3 of the AMD manuals. It says it
supports the lock prefix, so I'm assuming it's not otherwise atomic.

Gabe



Re: [m5-dev] Parallel M5

2008-06-29 Thread Ali Saidi
I vote for (1) until it can be shown that it matters. A single pointer  
doesn't seem like a big deal, especially since most of the things we  
create and destroy frequently aren't SimObjects but other classes.

Ali

On Jun 29, 2008, at 12:23 PM, nathan binkert wrote:

 I'm nearly done with the first step of getting parallel M5 working.
 -- Add an EventQueue pointer to every SimObject and add
 schedule()/deschedule()/reschedule() functions to the Base SimObject
 to use that event queue pointer.
 -- Change all calls to event scheduling to use that EventQueue  
 pointer.
 -- Remove the schedule/deschedule/reschedule functions on the Event
 object.  Now, you must create an event and schedule it on an event
 queue.

 An example of this is something like this:

 old:
new LinkDelayEvent(this, packet, curTick + linkDelay);

 new:
Event *event = new LinkDelayEvent(this, packet);
this->schedule(event, curTick + linkDelay);
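
To make the shape of that change concrete, here is a minimal sketch of a SimObject that holds an EventQueue pointer and forwards schedule()/deschedule()/reschedule() to it. The class bodies are assumptions for illustration (a toy multimap-based queue and a hypothetical CountEvent), not the actual M5 implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Tick = std::uint64_t;

class Event {
  public:
    virtual ~Event() = default;
    virtual void process() = 0;
};

// Toy queue: pending events ordered by tick.
class EventQueue {
  public:
    void schedule(Event *event, Tick when) { pending.emplace(when, event); }

    void deschedule(Event *event) {
        for (auto it = pending.begin(); it != pending.end(); ++it) {
            if (it->second == event) {
                pending.erase(it);
                return;
            }
        }
    }

    void reschedule(Event *event, Tick when) {
        deschedule(event);
        schedule(event, when);
    }

    // Runs the earliest pending event and returns the tick it was set for.
    Tick serviceOne() {
        assert(!pending.empty());
        auto it = pending.begin();
        Tick when = it->first;
        Event *event = it->second;
        pending.erase(it);
        event->process();
        return when;
    }

    bool empty() const { return pending.empty(); }

  private:
    std::multimap<Tick, Event *> pending;
};

// Every SimObject knows its queue and schedules events through it.
class SimObject {
  public:
    explicit SimObject(EventQueue *q) : queue(q) {}
    void schedule(Event *event, Tick when) { queue->schedule(event, when); }
    void deschedule(Event *event) { queue->deschedule(event); }
    void reschedule(Event *event, Tick when) { queue->reschedule(event, when); }

  private:
    EventQueue *queue;
};

// Hypothetical event used only to exercise the sketch.
class CountEvent : public Event {
  public:
    explicit CountEvent(int &counter) : counter(counter) {}
    void process() override { ++counter; }

  private:
    int &counter;
};
```

With this layout the event itself never needs to know which queue it is on; the object that schedules it supplies the queue.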

 I'd like to remove the queue pointer from the event object, since there
 is only one use case where you've scheduled an event and don't know
 which queue it's on when you want to de/reschedule it: repeat events
 like the SimLoopExitEvent.

 Here are the options:
 1) Leave the queue pointer in the object
 2) Pass the queue pointer as a parameter to the process() function
 3) Record the queue pointer in just those objects that require it
 4) Create a new flag to the event called AutoRepeat, create a virtual
 function that can be called to determine the repeat interval, and add
 support for repeat in the event queue
 5) Create a thread local global variable called currentEventQueue.
 (I hate this idea)
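
As an illustration of option 3, here is a minimal sketch in which the base Event carries no queue pointer and only a repeating event records the queue it reschedules itself on. RepeatEvent, the toy EventQueue, and its curTick() accessor are hypothetical names for this sketch, not M5's classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Tick = std::uint64_t;

// Base event: no queue pointer, so ordinary events stay small.
class Event {
  public:
    virtual ~Event() = default;
    virtual void process() = 0;
};

// Toy queue that also tracks the tick of the event being serviced.
class EventQueue {
  public:
    void schedule(Event *event, Tick when) { pending.emplace(when, event); }
    bool empty() const { return pending.empty(); }
    Tick curTick() const { return now; }

    void serviceOne() {
        assert(!pending.empty());
        auto it = pending.begin();
        now = it->first;
        Event *event = it->second;
        pending.erase(it);   // remove first so process() may reschedule
        event->process();
    }

  private:
    Tick now = 0;
    std::multimap<Tick, Event *> pending;
};

// Option 3: only the event that needs its queue remembers it.
class RepeatEvent : public Event {
  public:
    RepeatEvent(EventQueue *q, Tick interval, int repeats)
        : queue(q), interval(interval), repeats(repeats) {}

    void process() override {
        ++fired;
        if (fired < repeats)
            queue->schedule(this, queue->curTick() + interval);
    }

    int fired = 0;  // number of times process() has run

  private:
    EventQueue *queue;
    Tick interval;
    int repeats;
};
```

Scheduling a RepeatEvent with interval 10 and 3 repeats at tick 5 fires it at ticks 5, 15, and 25, after which the queue drains.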

 I go back and forth as to the right thing to do.  I'd really like to
 avoid the queue pointer in all objects so we can keep events small.  It
 can easily be argued that I shouldn't keep that optimization unless I
 know that it will pay off, but the only way to know if it will pay off
 is to just do it.  I also basically hate #5, and it's at the bottom of
 my list.  One issue with it is that, because of the committed
 instruction queue, there can be more than one event queue in a given
 thread.

 Anyone have any opinions?

 BTW:  I'll write up a Parallelizing M5 wiki page in the next day or so
 that enumerates all of my plans.

  Nate


Re: [m5-dev] Parallel M5

2008-06-29 Thread nathan binkert
 I vote for (1) until it can be shown that it matters. A single pointer
 doesn't seem like a big deal, especially since most of the things we
 create and destroy frequently aren't SimObjects but other classes.
Showing that it matters is pretty hard unless you actually do it.  A
profile won't actually be very useful.  The idea here is that I'm
trying hard to make events fit within a cache line.
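
One thing that can be checked without running anything is the size itself. Below is a minimal sketch of a compile-time guard, assuming a 64-byte cache line and a made-up field set; the real Event has different members, so the numbers are only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tick = std::uint64_t;
class EventQueue;  // only pointers to it are stored here

// Assumption: 64-byte lines, as on typical x86 parts.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical layout for option 1: every event carries a queue pointer.
struct EventWithQueue {
    virtual ~EventWithQueue() = default;
    virtual void process() = 0;
    EventWithQueue *nextInBin = nullptr;
    Tick when = 0;
    EventQueue *queue = nullptr;
    std::int16_t priority = 0;
    std::uint16_t flags = 0;
};

// Same hypothetical layout for option 3: no queue pointer in the base.
struct EventWithoutQueue {
    virtual ~EventWithoutQueue() = default;
    virtual void process() = 0;
    EventWithoutQueue *nextInBin = nullptr;
    Tick when = 0;
    std::int16_t priority = 0;
    std::uint16_t flags = 0;
};

// The build fails if a later change pushes the base event past one line.
static_assert(sizeof(EventWithQueue) <= kCacheLineBytes,
              "option 1 base event no longer fits in a cache line");
static_assert(sizeof(EventWithoutQueue) <= kCacheLineBytes,
              "option 3 base event no longer fits in a cache line");
static_assert(sizeof(EventWithoutQueue) < sizeof(EventWithQueue),
              "dropping the queue pointer should shrink the event");
```

This only guards the base class. Whether a subclass with its own payload still fits, and whether that changes run time, is the part that needs the actual comparison.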

 I'm fine with #1 or #3... since you have to subclass Event to override
 process() anyway, just moving the queue pointer to the subclass for
 those that need it doesn't seem so bad to me.  #2 and #4 seem
 unnecessarily complex and/or pervasive.
Alright, I've more or less done #3 and I'll do a comparison of it to
#1 to see if it makes a difference.

As a note, #1 is a bit different than what we currently have because I
will no longer pass the queue pointer into the object constructor, but
rather will set the queue pointer when the event is actually
scheduled.  This is because some events will move between queues.
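
A minimal sketch of that variant of #1, where the queue pointer is set by schedule() and cleared by deschedule() instead of being passed to the constructor. The member names (currentQueue, queue()) and NullEvent are hypothetical, chosen for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Tick = std::uint64_t;
class EventQueue;

class Event {
  public:
    virtual ~Event() = default;
    virtual void process() = 0;

    // Null while the event is not scheduled anywhere.
    EventQueue *queue() const { return currentQueue; }

  private:
    friend class EventQueue;
    EventQueue *currentQueue = nullptr;  // set at schedule time, not in the constructor
};

class EventQueue {
  public:
    void schedule(Event *event, Tick when) {
        assert(event->currentQueue == nullptr);  // not already scheduled
        event->currentQueue = this;
        pending.emplace(when, event);
    }

    void deschedule(Event *event) {
        assert(event->currentQueue == this);     // must be on this queue
        for (auto it = pending.begin(); it != pending.end(); ++it) {
            if (it->second == event) {
                pending.erase(it);
                break;
            }
        }
        event->currentQueue = nullptr;
    }

  private:
    std::multimap<Tick, Event *> pending;
};

// Hypothetical do-nothing event used only to exercise the sketch.
struct NullEvent : Event {
    void process() override {}
};
```

An event descheduled from one queue can then be scheduled on another, and queue() always reports where it currently lives.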

  Nate


Re: [m5-dev] Parallel M5

2008-06-29 Thread Gabe Black
This is probably slightly off topic, but could you explain more 
specifically the synchronization event stuff you mention on the wiki 
page? It sounds interesting but I can't picture what you're describing.

Gabe
