Re: [m5-dev] Parallel M5
On Jun 30, 2008, at 5:15 PM, Steve Reinhardt wrote:
> On Mon, Jun 30, 2008 at 9:11 AM, Ali Saidi [EMAIL PROTECTED] wrote:
>> The FastAlloc pools and StaticInst cache should clearly be duplicated.
>
> Why would you want to duplicate the StaticInst cache? It's a read-mostly structure, so you'd only have to lock on a miss/insert, and having a larger shared capacity for a given memory footprint seems like a win to me.

Right now we use a std::map for the cache, so we could try reader/writer locks, but we couldn't lock only on insert: the insert may send a concurrent read off into the weeds unless we make some assumptions about the STL code that I don't think we can make.

Ali
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
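The reader/writer option mentioned above could look roughly like the following. This is only a sketch under assumptions: it uses C++17's std::shared_mutex (which postdates this thread), and DecodedInst and SharedDecodeCache are hypothetical stand-ins, not M5's StaticInst or its cache. Hits take the lock in shared mode; a miss takes it exclusively so that no reader is walking the tree while the insert rebalances it.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for a decoded instruction; not M5's StaticInst.
struct DecodedInst {
    std::uint32_t machInst;
};

class SharedDecodeCache {
  public:
    // Return the cached entry for machInst, creating it on a miss.
    std::shared_ptr<const DecodedInst> lookup(std::uint32_t machInst) {
        {
            // Shared mode: concurrent hits do not serialize on each other.
            std::shared_lock<std::shared_mutex> readLock(mutex_);
            auto it = cache_.find(machInst);
            if (it != cache_.end())
                return it->second;
        }
        // Exclusive mode: no reader can be inside the map during the insert.
        std::unique_lock<std::shared_mutex> writeLock(mutex_);
        // try_emplace leaves an existing entry alone, which covers the case
        // where another thread inserted the key between the two locks.
        auto result = cache_.try_emplace(machInst, nullptr);
        if (result.second)
            result.first->second =
                std::make_shared<DecodedInst>(DecodedInst{machInst});
        return result.first->second;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> readLock(mutex_);
        return cache_.size();
    }

  private:
    mutable std::shared_mutex mutex_;
    std::map<std::uint32_t, std::shared_ptr<const DecodedInst>> cache_;
};
```

Every lookup still pays for a lock acquisition, so whether this beats per-thread duplicate caches would have to be measured.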
Re: [m5-dev] Parallel M5
> OK, that makes more sense now. Still seems like in the long term the right thing is to use a data structure that supports multiple readers with either per-bucket locks, a reader/writer lock, or some sort of non-blocking update (has x86 added a compare-and-swap yet?).

They've had cmpxchg since p5 or p6. Intel's Threading Building Blocks has concurrent_hash_map, but it's not really free. I wonder if there is some good library we should use for this stuff. Anyone out there know? Reinventing the wheel isn't actually all that fun.

Nate
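For reference, the per-bucket-lock variant can be sketched in a few lines of standard C++. StripedMap and its bucket count are hypothetical names chosen for illustration; this is not TBB's concurrent_hash_map and not anything in M5. Each key hashes to one bucket, and only that bucket's mutex is held during an operation, so threads touching different buckets do not contend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical striped map: every bucket owns a lock and a private table.
template <typename Value, std::size_t NumBuckets = 16>
class StripedMap {
  public:
    // Insert only if the key is absent; returns true if it was inserted.
    bool insert(std::uint64_t key, const Value &value) {
        Bucket &b = bucketFor(key);
        std::lock_guard<std::mutex> guard(b.lock);
        return b.table.emplace(key, value).second;
    }

    // Return a copy of the value, or std::nullopt if the key is absent.
    std::optional<Value> find(std::uint64_t key) const {
        const Bucket &b = bucketFor(key);
        std::lock_guard<std::mutex> guard(b.lock);
        auto it = b.table.find(key);
        if (it == b.table.end())
            return std::nullopt;
        return it->second;
    }

  private:
    struct Bucket {
        mutable std::mutex lock;
        std::unordered_map<std::uint64_t, Value> table;
    };

    Bucket &bucketFor(std::uint64_t key) {
        return buckets_[std::hash<std::uint64_t>{}(key) % NumBuckets];
    }
    const Bucket &bucketFor(std::uint64_t key) const {
        return buckets_[std::hash<std::uint64_t>{}(key) % NumBuckets];
    }

    std::array<Bucket, NumBuckets> buckets_;
};
```

Readers of the same bucket still exclude each other here; swapping the per-bucket mutex for a reader/writer lock would combine the two ideas.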
Re: [m5-dev] Parallel M5
> non-blocking update (has x86 added a compare-and-swap yet?).

Yes. CMPXCHG is on page 98 of volume 3 of the AMD manuals. It says it supports the lock prefix, so I'm assuming it's not otherwise atomic.

Gabe
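In portable C++ the compare-and-swap is reachable through std::atomic (C++11, so newer than this thread), and on x86 compilers typically lower it to a locked CMPXCHG. The sketch below shows one possible non-blocking update: pushing onto a list whose nodes are never removed while it is in use. PushOnlyList and Node are hypothetical names, and this is an illustration of the technique rather than a proposal for the StaticInst cache.

```cpp
#include <atomic>

// Hypothetical list node; nodes are only freed by the list's destructor.
struct Node {
    int value;
    Node *next;
};

class PushOnlyList {
  public:
    PushOnlyList() = default;
    PushOnlyList(const PushOnlyList &) = delete;
    PushOnlyList &operator=(const PushOnlyList &) = delete;

    ~PushOnlyList() {
        Node *n = head_.load(std::memory_order_relaxed);
        while (n) {
            Node *next = n->next;
            delete n;
            n = next;
        }
    }

    // Lock-free insert at the head of the list.
    void push(int value) {
        Node *node = new Node{value, head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak writes the current head into
        // node->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Readers never take a lock; they see a consistent prefix of the list.
    bool contains(int value) const {
        for (Node *n = head_.load(std::memory_order_acquire); n; n = n->next) {
            if (n->value == value)
                return true;
        }
        return false;
    }

  private:
    std::atomic<Node *> head_{nullptr};
};
```

Removal is deliberately left out: freeing nodes while readers may hold pointers to them needs extra machinery that this sketch does not attempt.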
Re: [m5-dev] Parallel M5
I vote for (1) until it can be shown that it matters. A single pointer doesn't seem like a big deal, especially since most of the things we create and destroy frequently aren't SimObjects but other classes.

Ali

On Jun 29, 2008, at 12:23 PM, nathan binkert wrote:
> I'm nearly done with the first step of getting parallel M5 working.
>
> -- Add an EventQueue pointer to every SimObject and add schedule()/deschedule()/reschedule() functions to the base SimObject to use that event queue pointer.
> -- Change all calls to event scheduling to use that EventQueue pointer.
> -- Remove the schedule/deschedule/reschedule functions on the Event object.
>
> Now, you must create an event and schedule it on an event queue. An example of this is something like this:
>
> old:
>     new LinkDelayEvent(this, packet, curTick + linkDelay);
> new:
>     Event *event = new LinkDelayEvent(this, packet);
>     this->schedule(event, curTick + linkDelay);
>
> I'd like to remove the queue pointer from the event object since there is only one use case where you've scheduled an event and you don't know which queue it's on if you want to de/reschedule it. It's for repeat events like the SimLoopExitEvent. Here are the options:
>
> 1) Leave the queue pointer in the object
> 2) Pass the queue pointer as a parameter to the process() function
> 3) Record the queue pointer in just those objects that require it
> 4) Create a new flag to the event called AutoRepeat, create a virtual function that can be called to determine the repeat interval, and add support for repeat in the event queue
> 5) Create a thread local global variable called currentEventQueue. (I hate this idea)
>
> I go back and forth as to the right thing to do. I'd really like to avoid the queue pointer in all objects so we can keep events small, but I guess it can easily be argued that I shouldn't keep that optimization unless I know that it will pay off, but the only way to know if it will pay off is to just do it. I also basically hate #5 and it's on the bottom of my list.
>
> One issue is that because of the committed instruction queue, there can be more than one event queue in a given thread.
>
> Anyone have any opinions?
>
> BTW: I'll write up a Parallelizing M5 wiki page in the next day or so that enumerates all of my plans.
>
> Nate
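The scheduling change Nate describes can be sketched as a self-contained toy. This is not M5's code: the Event, EventQueue and SimObject classes below are stripped-down stand-ins that borrow the names from the message, RecordEvent is a hypothetical test event, and deschedule()/reschedule() are omitted. The point it shows is that the event itself carries no queue pointer; the owning SimObject forwards schedule() to the queue it was given.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using Tick = std::uint64_t;

// Stand-in event base class: no queue pointer, no scheduling methods.
class Event {
  public:
    virtual ~Event() = default;
    virtual void process() = 0;
};

// Stand-in queue: services events in tick order, FIFO among equal ticks.
class EventQueue {
  public:
    void schedule(Event *event, Tick when) {
        heap_.push(Entry{when, seq_++, event});
    }

    bool empty() const { return heap_.empty(); }

    // Pop and process the earliest event; returns the tick it ran at.
    // The caller must ensure the queue is not empty.
    Tick serviceOne() {
        Entry e = heap_.top();
        heap_.pop();
        e.event->process();
        return e.when;
    }

  private:
    struct Entry {
        Tick when;
        std::uint64_t seq;
        Event *event;
    };
    struct Later {
        bool operator()(const Entry &a, const Entry &b) const {
            return a.when != b.when ? a.when > b.when : a.seq > b.seq;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, Later> heap_;
    std::uint64_t seq_ = 0;
};

// Stand-in SimObject: holds the queue pointer and forwards schedule().
class SimObject {
  public:
    explicit SimObject(EventQueue *queue) : queue_(queue) {}
    void schedule(Event *event, Tick when) { queue_->schedule(event, when); }

  private:
    EventQueue *queue_;
};

// Hypothetical event that records its id when processed.
struct RecordEvent : Event {
    RecordEvent(std::vector<int> &log, int id) : log(log), id(id) {}
    void process() override { log.push_back(id); }
    std::vector<int> &log;
    int id;
};
```

In this toy the caller owns the events; how M5 manages event lifetime is not something the message specifies.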
Re: [m5-dev] Parallel M5
> I vote for (1) until it can be shown that it matters. A single pointer doesn't seem like a big deal, especially since most of the things we create and destroy frequently aren't SimObjects but other classes.

Showing that it matters is pretty hard unless you actually do it. A profile won't actually be very useful. The idea here is that I'm trying hard to make events fit within a cache line.

> I'm fine with #1 or #3... since you have to subclass Event to override process() anyway, just moving the queue pointer to the subclass for those that need it doesn't seem so bad to me. #2 and #4 seem unnecessarily complex and/or pervasive.

Alright, I've more or less done #3 and I'll do a comparison of it to #1 to see if it makes a difference. As a note, #1 is a bit different than what we currently have because I will no longer pass the queue pointer into the object constructor, but rather will set the queue pointer when the event is actually scheduled. This is because some events will move between queues.

Nate
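Option #3 can be illustrated with a small standalone sketch. All names here (BaseEvent, SimpleQueue, RepeatEvent) are hypothetical and the queue is a toy, not M5's EventQueue. The base event holds no queue pointer; only the repeating subclass, which has to reschedule itself from inside process(), records the queue it lives on.

```cpp
#include <cstdint>
#include <map>

using Tick = std::uint64_t;

// Hypothetical base event: nothing but the virtual interface.
class BaseEvent {
  public:
    virtual ~BaseEvent() = default;
    virtual void process() = 0;
};

// Toy queue that services events in tick order.
class SimpleQueue {
  public:
    void schedule(BaseEvent *event, Tick when) { pending_.emplace(when, event); }

    Tick curTick() const { return curTick_; }

    // Process the earliest pending event; returns false if none remain.
    bool serviceOne() {
        if (pending_.empty())
            return false;
        auto it = pending_.begin();
        curTick_ = it->first;
        BaseEvent *event = it->second;
        pending_.erase(it);
        event->process();
        return true;
    }

  private:
    std::multimap<Tick, BaseEvent *> pending_;
    Tick curTick_ = 0;
};

// Only this subclass pays for the queue pointer, because it is the one
// that must reschedule itself without being told which queue it is on.
class RepeatEvent : public BaseEvent {
  public:
    RepeatEvent(SimpleQueue *queue, Tick interval, int repeats)
        : queue_(queue), interval_(interval), repeats_(repeats) {}

    void process() override {
        ++fired_;
        if (fired_ < repeats_)
            queue_->schedule(this, queue_->curTick() + interval_);
    }

    int fired() const { return fired_; }

  private:
    SimpleQueue *queue_;
    Tick interval_;
    int repeats_;
    int fired_ = 0;
};

// The extra state lives in the subclass, so plain events stay smaller.
static_assert(sizeof(BaseEvent) < sizeof(RepeatEvent),
              "queue pointer is paid for only by repeating events");
```

If an event of this kind moves between queues, the stored pointer has to be updated at that point, which is the bookkeeping option #1 would instead do at schedule time.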
Re: [m5-dev] Parallel M5
This is probably slightly off topic, but could you explain more specifically the synchronization event stuff you mention on the wiki page? It sounds interesting but I can't picture what you're describing.

Gabe