An HTML attachment was scrubbed...
URL: http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060328/088fde2e/attachment.html

From penglu01 at hotmail.com  Tue Mar 28 19:14:23 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Tue Mar 28 19:14:42 2006
Subject: [Simflex] Re: A question regarding to SPLASH2
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>
An HTML attachment was scrubbed...
URL: http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060329/1749fc84/attachment-0001.html

From penglu01 at hotmail.com  Wed Mar 29 13:01:00 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Wed Mar 29 13:01:11 2006
Subject: [Simflex] A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

An HTML attachment was scrubbed...
URL: http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060329/7d773a0e/attachment.html

From twenisch at ece.cmu.edu  Wed Mar 29 13:34:39 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Mar 29 13:34:49 2006
Subject: [Simflex] Re: A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Lu,

On Wed, 29 Mar 2006, lu peng wrote:

>
> Hi, Tom,
>
> Please see the attached statistics got from both an unmodified simflex 1.0.2
> inorder model and a simics trace model. The workload is a specjbb2000 with 1
> minute ramping up and 8 warehouses. I just added some counters in the related
> functions: the trace_snoop_operate() in SimicsTracer.hpp of the simflex and
> text_trace_instruction() in trace.c of the trace model. I used a function
> SIM_cpu_privilege_level() to detect the mode of an inst. For each cpu, I
> count the number of insts in either user mode or supervisory mode. In the
> following results, CPU X:(Y,Z) means that there are Y user insts and Z
> supervisory insts executed on CPU X.
>
> From the comparison, you can see:
>
> 1. There are differences for each processor for the same checkpoint. Why?

CMPFlex, UniFlex, and DSMFlex are all timing simulators, which means that each cache miss stalls the CPU. Because you are starting with empty caches, there will be many stalls from I-cache misses (and D-cache misses, but I-cache misses will dominate at first).
In the default configuration, these misses will incur hundreds of cycles of delay each. If a CPU enters a loop in OS code (e.g., waiting on an OS lock, or the idle loop), it will spin rapidly without incurring any I-cache or D-cache misses. User code, on the other hand, will run *very* slowly until the caches warm up (unless you happen to be at one of the barriers in ocean, where user code spins rapidly). Hence, empty caches lead to an unnatural imbalance in the user/system instruction mix.

If you want Flexus to see the same instruction stream as the Simics tracer, use TraceFlex. TraceFlex is much faster and does not perturb Simics timing: it observes Simics' IPC=1 execution just like the trace module does (which is why we call it TraceFlex). Over time (once the caches warm up, after maybe 20M instructions or so), I would expect the tracer and the in-order timing model to behave more similarly. However, they will never be identical.

> Part A: from unmodified Simflex
> <snip>
>
> CPU 0:(1242,192518),CPU 1:(18037,22244),CPU 2:(31095,56567),
> CPU 3:(53139,51389),CPU 4:(81963,75759),CPU 5:(108589,25584),
> CPU 6:(100769,10364),CPU 7:(69006,101735),
> totalInst:1000000
>
> *************
>
> Part B: from a Simics Trace model
> <snip>
>
> CPU 0:(106238,18727),CPU 1:(94573,30692),CPU 2:(79927,45034),
> CPU 3:(99406,25545),CPU 4:(93509,31446),CPU 5:(85256,39707),
> CPU 6:(122295,2680),CPU 7:(120788,4177),,totalInst:1000000
>
>

With the exception of CPU 0, these mixes do not strike me as atypical for a cold start. Note that in the trace module, each CPU has executed roughly 125K instructions (1M in total across the 8 CPUs). However, because of the timing effects of cache misses, the per-CPU totals in the SimFlex results vary from almost 200K down to as little as 40K. We have to ask why some CPUs are so much faster. The answer is that they are in a tight loop that doesn't incur misses.
If you turn the clock frequency up to 1000 MHz, you will probably see somewhat fewer system instructions, because the timer interrupt will come around less frequently in terms of cycles executed. However, the cache cold-start effect will still give you a heavy bias towards system instructions at first.

Hope that helps.

Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University

From penglu01 at hotmail.com  Thu Mar 30 13:19:13 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Thu Mar 30 13:23:53 2006
Subject: [Simflex] Re: A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

An HTML attachment was scrubbed...
URL: http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060330/8818dbd5/attachment.html
