Hi Jason,

based on the slides from the 2006 ISCA tutorial, trace simulation runs at ~1.5 MIPS, InOrder timing at 10 KIPS, and OoO timing at 3 KIPS (on whatever processor these measurements were taken back then). I believe the bulk of the slowdown comes from the different modes Simics itself is running in, but I don't have any evidence to support this claim. Unfortunately, none of us here has ever used the InOrder simulator.

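For a rough sense of scale, the rates quoted above can be turned into ratios and wall-clock estimates. This is only back-of-the-envelope arithmetic on the three figures from the slides; the function names and the 1e9-instruction workload are hypothetical, chosen for illustration. The figures are for the separate InOrder and OoO simulators, so they say nothing about the OoO simulator configured to model an in-order core.

```python
# Simulation rates quoted from the 2006 ISCA tutorial slides,
# in simulated instructions per second of wall-clock time.
RATES_IPS = {
    "trace": 1.5e6,   # ~1.5 MIPS
    "inorder": 10e3,  # 10 KIPS
    "ooo": 3e3,       # 3 KIPS
}


def speedup(fast: str, slow: str) -> float:
    """Ratio of the simulation rate of `fast` to that of `slow`."""
    return RATES_IPS[fast] / RATES_IPS[slow]


def hours_to_simulate(instructions: float, mode: str) -> float:
    """Wall-clock hours needed to simulate `instructions` in `mode`."""
    return instructions / RATES_IPS[mode] / 3600.0


if __name__ == "__main__":
    print(f"InOrder vs OoO:   {speedup('inorder', 'ooo'):.1f}x")
    print(f"Trace vs InOrder: {speedup('trace', 'inorder'):.0f}x")
    # Hypothetical workload size, for illustration only.
    n = 1e9
    for mode in RATES_IPS:
        print(f"{mode:8s} {hours_to_simulate(n, mode):8.1f} h")
```

At these rates the InOrder simulator is roughly 3.3x faster than OoO timing, and still about 150x slower than trace simulation.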
Regards,
Evangelos

On Mar 29, 2013, at 5:04 PM, Jason Zebchuk wrote:

> Hi Evangelos,
>
> There's a couple of reasons, but mostly it's because we want to see if we can
> improve the time it takes to explore ideas by using long-running timing
> simulations instead of the sampling methodology. At the moment, we tend to
> spend a lot of time working in functional simulation trying to see if
> something has potential, then if we want to measure the performance impact,
> we have to generate flexpoints and run timing simulation. We've consistently
> been frustrated by the need to develop models in the functional simulator and
> then port the same model to the timing simulator. In addition, the time
> required to generate the flexpoints also becomes a bit of a bottleneck,
> especially for the new cloudsuite workloads.
>
> So we've been thinking of using the in-order core so that long-running timing
> simulations would hopefully run fast enough that we could use them for early
> exploration of the performance potential of different ideas. The thought here
> being that the older InOrder simulator would be significantly faster than
> just putting the OoO simulator into in-order mode. Do you have a rough
> estimate of the kind of speedup you experience between in-order and
> out-of-order using the OoO simulator?
>
>
> Thanks,
>
> Jason
>
>
>
> On 2013-03-29 9:26 AM, Evangelos Vlachos wrote:
>> Hi Jason,
>>
>> is there a reason why you want to use the InOrder simulator? We discontinued
>> it (at least) since the last release. Even when I started using Flexus (6-7
>> years ago) the older students were suggesting I would use the OoO simulator
>> and configure it to model an InOrder core, just because the OoO codebase was
>> getting more attention. I believe we have been doing that ever since.
>>
>> Regards,
>> Evangelos
>>
>> On Mar 29, 2013, at 1:49 PM, Jason Zebchuk wrote:
>>
>>> We're using timing with an inorder core (InorderSimicsFeeder, Execute,
>>> IFetch, and BPWarm instead of uArch, FetchAddressGenerate, and uFetch,
>>> etc.).
>>>
>>> In the first case, we set it to stop after the first cycle and it actually
>>> ran for about 165 cycles or so until the first instruction for each core
>>> completed. We're simulating 16 cores with a scientific benchmark and most
>>> of the cores tried to fetch the same instruction on the first cycle
>>> resulting in a lot of queuing. I tracked the behavior in this case and it
>>> issued 1 instruction for each core and completed just after every
>>> instruction would have finished.
>>>
>>> In the second case, it was set to terminate after 15k cycles. Looking at
>>> timestamps, that took a couple of minutes. The next 5k cycles took about 2
>>> hours and it still hadn't stopped executing. Because it's so slow, I
>>> haven't tried to track down whether there are any memory requests that are
>>> delayed this long in the hierarchy or whether there's some other reason why
>>> it's still executing. From my experience, it's pretty rare for a memory
>>> request to take that long, especially considering that the in-order core
>>> should cause less contention than an out-of-order core.
>>>
>>> We did some debugging with gdb and it's definitely saving the statistics
>>> every cycle, which definitely creates a huge slowdown.
>>>
>>> It looks like it's getting stuck in the loop in
>>> nInorderSimicsFeeder::SimicsCycleManager::advanceCycles() in
>>> components/InorderSimicsFeeder/CycleManager.hpp. I would expect that trying
>>> to terminate the simulation should cause it to break out of this loop, but
>>> it looks like that's not happening.
>>>
>>>
>>> Jason
>>>
>>>
>>>
>>> On 2013-03-29 1:10 AM, Mahmood Naderan wrote:
>>>> Hi
>>>>
>>>> >It tried to terminate after the first cycle, but it looks like it kept
>>>> >executing for several cycles afterwards. It kept printing out the
>>>> >following messages:
>>>>
>>>> What is the end cycle? 1000?
>>>>
>>>>
>>>> >In one case, it executed 15k cycles very quickly, and then took a couple
>>>> >of hours executing another 5k cycles and it still hadn't stopped the
>>>> >simulation
>>>>
>>>> Are you sure this behavior is the result of saving stats every cycle?
>>>>
>>>> Are you using trace? Timing?
>>>>
>>>> --
>>>> Regards,
>>>> Mahmood
>>>>
>>>>
>>>>
>>>> From: Jason Zebchuk <[email protected]>
>>>> To: "[email protected]" <[email protected]>
>>>> Sent: Friday, March 29, 2013 5:11 AM
>>>> Subject: Inorder simulation not stopping gracefully
>>>>
>>>> Hi guys,
>>>>
>>>> We tried running a simulation using the inorder core instead of the
>>>> out-of-order core, and we ran into a little problem.
>>>>
>>>> We did:
>>>>
>>>> flexus.set "-magic-break:stop_cycle" "1"
>>>>
>>>> to stop after a single cycle. It tried to terminate after the first cycle,
>>>> but it looks like it kept executing for several cycles afterwards. It kept
>>>> printing out the following messages:
>>>>
>>>> <breakpoint_tracker.cpp:447> {1}- Reached target cycle. Ending simulation.
>>>> <flexus.cpp:717> {1}- Terminating simulation. Timestamp: 2013-Mar-28 20:02:51
>>>> <flexus.cpp:718> {1}- Saving final stats_db.
>>>>
>>>> This was repeated over and over (with the cycle number incrementing by one
>>>> each time) until the simulation eventually stopped.
>>>>
>>>> It looks like it's waiting for outstanding memory requests to terminate
>>>> before exiting the simulation. Is this the normal behavior with the
>>>> in-order core?
>>>>
>>>> The real problem is that each cycle it tries to save the statistics.
>>>> When we try running longer simulations, the statistics get rather large so
>>>> it advances very slowly. We also saw cases where it would continue running
>>>> for several hours after it should have terminated. In one case, it
>>>> executed 15k cycles very quickly, and then took a couple of hours
>>>> executing another 5k cycles and it still hadn't stopped the simulation.
>>>> I'm not sure if this is an issue with the memory hierarchy taking a long
>>>> time to complete all of the outstanding requests, or if there's some other
>>>> bug in this case.
>>>>
>>>> Any thoughts you might have would be useful.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Jason
>>>
>>
>
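
The symptom described in the thread (the termination message and a stats save repeated on every cycle after the target cycle, until outstanding requests drain) can be illustrated with a toy cycle loop. This is a minimal sketch and not Flexus code: `StatsDB`, `run_buggy`, `run_fixed`, and the one-request-per-cycle drain model are all hypothetical, and the actual cause inside `advanceCycles()` is not established in the thread.

```python
class StatsDB:
    """Hypothetical stand-in for a statistics database; only counts saves."""

    def __init__(self) -> None:
        self.saves = 0

    def save(self) -> None:
        self.saves += 1


def run_buggy(stop_cycle: int, drain_cycles: int, stats: StatsDB) -> int:
    """Re-triggers termination (and a stats save) on every cycle at or past
    the target until all outstanding requests have drained."""
    cycle = 0
    outstanding = drain_cycles  # assumed: one request retires per cycle
    while True:
        cycle += 1
        if cycle >= stop_cycle:
            stats.save()  # expensive save repeated each cycle
            if outstanding == 0:
                return cycle
            outstanding -= 1


def run_fixed(stop_cycle: int, drain_cycles: int, stats: StatsDB) -> int:
    """Latches a terminating flag, drains quietly, and saves exactly once."""
    cycle = 0
    outstanding = drain_cycles
    terminating = False
    while True:
        cycle += 1
        if cycle >= stop_cycle:
            terminating = True
        if terminating:
            if outstanding == 0:
                break
            outstanding -= 1
    stats.save()  # single final save after the loop exits
    return cycle


if __name__ == "__main__":
    for run in (run_buggy, run_fixed):
        db = StatsDB()
        end = run(stop_cycle=1, drain_cycles=164, stats=db)
        print(f"{run.__name__}: stopped at cycle {end}, {db.saves} save(s)")
```

With a target cycle of 1 and 164 cycles of draining, both variants stop at cycle 165, but the first saves the statistics 165 times and the second only once. Whether the simulator should drain outstanding requests at all or break out immediately is a separate design choice that the thread leaves open.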
