An HTML attachment was scrubbed...
URL: 
http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060328/088fde2e/attachment.html
From penglu01 at hotmail.com  Tue Mar 28 19:14:23 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Tue Mar 28 19:14:42 2006
Subject: [Simflex] Re: A question regarding to SPLASH2
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

An HTML attachment was scrubbed...
URL: 
http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060329/1749fc84/attachment-0001.html
From penglu01 at hotmail.com  Wed Mar 29 13:01:00 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Wed Mar 29 13:01:11 2006
Subject: [Simflex] A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

An HTML attachment was scrubbed...
URL: 
http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060329/7d773a0e/attachment.html
From twenisch at ece.cmu.edu  Wed Mar 29 13:34:39 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Mar 29 13:34:49 2006
Subject: [Simflex] Re: A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>

Hi Lu,


On Wed, 29 Mar 2006, lu peng wrote:

> 
> Hi, Tom,
> 
> Please see the attached statistics from both an unmodified simflex 1.0.2
> inorder model and a simics trace model. The workload is specjbb2000 with
> 1 minute of ramp-up and 8 warehouses. I just added some counters in the
> related functions: trace_snoop_operate() in SimicsTracer.hpp of simflex
> and text_trace_instruction() in trace.c of the trace model. I used the
> function SIM_cpu_privilege_level() to detect the mode of an inst. For
> each cpu, I count the number of insts in either user mode or supervisor
> mode. In the following results, CPU X:(Y,Z) means that there are Y user
> insts and Z supervisor insts executed on CPU X.
> 
> From the comparison, you can see:
> 
> 1. There are differences for each processor for the same checkpoint. Why?

CMPFlex, UniFlex, and DSMFlex are all timing simulators, which means that 
each cache miss stalls the CPU.  Because you are starting with empty 
caches, there will be many stalls from I-cache misses (and D-cache 
misses, but I-cache misses will dominate at first).  In the default 
configuration, each miss incurs hundreds of cycles of delay.

If a CPU enters a loop in OS code (e.g., either waiting on an OS lock 
or the idle loop), it will spin rapidly and not incur misses - there will 
be neither I-cache nor D-cache misses.  On the other hand, user code will 
run *very* slowly until caches warm up (unless you happen to be at one of 
the barriers in ocean where user code spins rapidly).  Hence, empty caches 
will lead to an unnatural imbalance in the user/system instruction mix.
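
To make the effect concrete, here is a toy model (a sketch only, not how 
Flexus computes timing).  It assumes a base CPI of 1, that every miss 
stalls the CPU for the full penalty, and purely illustrative numbers: a 
200-cycle miss penalty and 50 misses per 1000 instructions for cold user 
code.  The function name and all parameters are hypothetical.

```python
def instructions_retired(cycles, misses_per_kilo_inst, miss_penalty_cycles):
    """Instructions an in-order CPU completes in a fixed cycle budget.

    Simplifying assumptions: base CPI of 1, and every miss stalls the
    CPU for the full penalty with no overlap.
    """
    cpi = 1.0 + misses_per_kilo_inst * miss_penalty_cycles / 1000
    return int(cycles / cpi)


CYCLES = 1_000_000      # same cycle budget for both CPUs
MISS_PENALTY = 200      # assumed: "hundreds of cycles" per miss

# CPU spinning in an OS loop: the loop fits in cache, so no misses.
spinning_os = instructions_retired(CYCLES, 0, MISS_PENALTY)

# CPU running user code on cold caches: assumed 50 misses per 1000 insts.
cold_user = instructions_retired(CYCLES, 50, MISS_PENALTY)

print("spinning OS loop:", spinning_os, "instructions")
print("cold user code:  ", cold_user, "instructions")
```

Under these assumed numbers the spinning CPU retires roughly eleven times 
as many instructions as the CPU running cold user code in the same number 
of cycles, which is the kind of skew toward system instructions described 
above.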

If you want Flexus to see the same instruction stream as the Simics 
tracer, use TraceFlex.  TraceFlex is much faster, and does not perturb 
Simics timing - it observes Simics' IPC=1 execution just like the trace 
module (which is why we call it TraceFlex).

Over time (once caches warm up, after maybe 20M instructions or so) I 
would expect the tracer and inorder timing to have more similar behavior. 
However, they will never be identical.

> Part A: from unmodified Simflex
> 
<snip>
> 
> CPU 0:(1242,192518),CPU 1:(18037,22244),CPU 2:(31095,56567),
> CPU 3:(53139,51389),CPU 4:(81963,75759),CPU 5:(108589,25584),
> CPU 6:(100769,10364),CPU 7:(69006,101735),
> totalInst:1000000
> 
> *************
> 
> Part B: from a Simics Trace model
> 
<snip>
> 
> CPU 0:(106238,18727),CPU 1:(94573,30692),CPU 2:(79927,45034),
> CPU 3:(99406,25545),CPU 4:(93509,31446),CPU 5:(85256,39707),
> CPU 6:(122295,2680),CPU 7:(120788,4177),
> totalInst:1000000
> 
>


With the exception of CPU 0, these mixes do not strike me as atypical for 
a cold start.  Note that in the trace module, every CPU has executed 
about 125K instructions (user plus system).  However, because of the 
timing effects of cache misses, the per-CPU totals in the SimFlex results 
vary from almost 200K to as few as 40K.  We have to ask why some CPUs are 
so much faster.  The answer is that they are in a tight loop that doesn't 
incur misses.
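
The per-CPU totals can be checked with a short script.  This is a sketch 
that pastes in the two result lines quoted above and assumes the 
"CPU X:(Y,Z)" format described in the original message; the function and 
variable names are my own.

```python
import re

# Result lines copied from the quoted message (Part A and Part B).
SIMFLEX = ("CPU 0:(1242,192518),CPU 1:(18037,22244),CPU 2:(31095,56567),"
           "CPU 3:(53139,51389),CPU 4:(81963,75759),CPU 5:(108589,25584),"
           "CPU 6:(100769,10364),CPU 7:(69006,101735),")
TRACE = ("CPU 0:(106238,18727),CPU 1:(94573,30692),CPU 2:(79927,45034),"
         "CPU 3:(99406,25545),CPU 4:(93509,31446),CPU 5:(85256,39707),"
         "CPU 6:(122295,2680),CPU 7:(120788,4177),")

ENTRY = re.compile(r"CPU\s*(\d+):\((\d+),(\d+)\)")


def per_cpu_stats(text):
    """Return {cpu: (user, system, total)} parsed from 'CPU X:(Y,Z)' entries."""
    return {int(cpu): (int(user), int(system), int(user) + int(system))
            for cpu, user, system in ENTRY.findall(text)}


simflex_stats = per_cpu_stats(SIMFLEX)
trace_stats = per_cpu_stats(TRACE)

for name, stats in (("SimFlex", simflex_stats), ("trace", trace_stats)):
    for cpu, (user, system, total) in sorted(stats.items()):
        print(f"{name:8s} CPU {cpu}: total={total:6d} "
              f"user={100 * user / total:5.1f}%")
```

The totals column is the quantity to compare: it is nearly constant 
across CPUs for the trace module and spreads widely for SimFlex.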

If you turn the clock frequency up to 1000 MHz, you will probably see 
somewhat fewer system instructions, because the timer interrupt will come 
around less frequently in terms of cycles.  However, the cache cold-start 
effect will still give you a heavy bias towards system instructions at 
first.

Hope that helps.

Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University
From penglu01 at hotmail.com  Thu Mar 30 13:19:13 2006
From: penglu01 at hotmail.com (lu peng)
List-Post: [email protected]
Date: Thu Mar 30 13:23:53 2006
Subject: [Simflex] Re: A quesion about CPU0's user mode insts
In-Reply-To: <[email protected]>
Message-ID: <[email protected]>

An HTML attachment was scrubbed...
URL: 
http://sos.ece.cmu.edu/pipermail/simflex/attachments/20060330/8818dbd5/attachment.html
