Hi Onur, thanks for your answer.
Just to make sure I understand correctly:
1) So the Idle cycles are not part of the benchmark, and we don't include
them in the UIPC calculation. But what about the Trap cycles? Traps are
for handling interrupts, right? Should we include them in the formula or not?
2) Okay, here is an example.
I attached the configuration file of the Flexus components. It's
basically the configuration of arm-corex-15 with one core and a 4MB L2.
The simulator is CMP.L2SharedNUCA.OoO. The workload is cloud9/1cpu. I
ran 800 flexpoints, 80 from each phase. Here are the cycle statistics.
NicoCluster1:/home/zhadji01/flexus-results/cloud9_1cpu_timing_4MBL2/cloud9_1cpu>stat-manager print sum | grep -i cycles
sys-L1d-Track:CyclesWithN
sys-L1d-Track:CyclesWithN:Coherence
sys-L1d-Track:CyclesWithN:NonCoherence
sys-cycles 40000000
sys-uarch-SpinCycles 4243
sys-uarch-TB:Idle:AccountedCycles 0
sys-uarch-TB:SkippedCycles 0
sys-uarch-TB:System:AccountedCycles 323648
sys-uarch-TB:Trap:AccountedCycles 2166076
sys-uarch-TB:User:AccountedCycles 37512199
sys-ufetch-MissCycles 2811494
NicoCluster1:/home/zhadji01/flexus-results/cloud9_1cpu_timing_4MBL2/cloud9_1cpu>echo "323648 + 2166076 + 37512199" | bc
40001923
You see that the total should be 40000000 but it is 40001923.
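
The comparison above can be reproduced with a short script. This is only a sketch: it assumes the statistics arrive as plain "<name> <integer>" lines like the stat-manager output shown here, and the helper names parse_stats and accounted_total are made up for illustration (they are not part of Flexus).

```python
# Cycle statistics copied from the stat-manager output in this message.
STATS = """\
sys-cycles 40000000
sys-uarch-TB:Idle:AccountedCycles 0
sys-uarch-TB:System:AccountedCycles 323648
sys-uarch-TB:Trap:AccountedCycles 2166076
sys-uarch-TB:User:AccountedCycles 37512199
"""


def parse_stats(text):
    """Return {name: value} for lines of the form '<name> <integer>'."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def accounted_total(stats):
    """Sum every sys-uarch-TB:<class>:AccountedCycles counter."""
    return sum(
        value
        for name, value in stats.items()
        if name.startswith("sys-uarch-TB:") and name.endswith(":AccountedCycles")
    )


stats = parse_stats(STATS)
total = accounted_total(stats)
excess = total - stats["sys-cycles"]
relative = 100.0 * excess / stats["sys-cycles"]

print(f"accounted={total} sys-cycles={stats['sys-cycles']} "
      f"excess={excess} ({relative:.4f}%)")
```

For the numbers in this run, the accounted cycles exceed sys-cycles by 1923, which is well under 0.01% of the total.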
And one more question. I noticed that the insn_count file, which
contains the number of instructions executed by Simics, also does not
match the instruction counts in the Flexus statistics. But I guess the
difference is due to the fact that Simics counts steps, and a Simics
step may contain many instructions, right?
Quoting Onur Koçberber <[email protected]>:
Hi Zacharias,
Here are the answers for your questions.
1) The formula is correct because the Idle counters count the time
when the processor is in the Idle loop. These cycles are not a part
of the benchmark and they do not contribute to anything useful, so we
don't include them in the UIPC calculation.
2) Can you give us more details about your simulation configuration?
Which simulator and benchmark do you use? You can also send some of
the debug outputs so that we can see how long you ran and how large
the difference is.
3) Yes (you listed 'User' counter twice, I assume you wanted to say
'Idle' for the second one).
4) I am not aware of a counter for the executed instructions.
However, you might use branch predictor information to find the
executed but not committed instructions.
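
The UIPC formula from answer 1 can be written out as a small sketch. The cycle counters below are the ones posted in this thread; the commit count is a hypothetical placeholder, since User:Commits:Busy was not posted, and the function name uipc is made up for illustration. The sketch only shows how the choice of denominator changes the result; it does not settle whether Trap cycles belong in it.

```python
# Cycle counters from the stat-manager output in this thread.
USER_CYCLES = 37512199
SYSTEM_CYCLES = 323648
TRAP_CYCLES = 2166076
IDLE_CYCLES = 0

# Hypothetical value: User:Commits:Busy was not posted in this thread.
USER_COMMITS_BUSY = 30_000_000


def uipc(user_commits_busy, *cycle_counters):
    """User instructions committed per cycle over the chosen cycle counters."""
    denominator = sum(cycle_counters)
    if denominator == 0:
        raise ValueError("no cycles in the selected counters")
    return user_commits_busy / denominator


# Denominator as described for postprocess.sh: User + System.
uipc_postprocess = uipc(USER_COMMITS_BUSY, USER_CYCLES, SYSTEM_CYCLES)

# Alternative raised in the question: also count Trap and Idle cycles.
uipc_all_cycles = uipc(USER_COMMITS_BUSY, USER_CYCLES, SYSTEM_CYCLES,
                       TRAP_CYCLES, IDLE_CYCLES)

print(f"User+System:           {uipc_postprocess:.4f}")
print(f"User+System+Trap+Idle: {uipc_all_cycles:.4f}")
```

In this run the Trap counter is about 5.4% of sys-cycles, so the two denominators give visibly different UIPC values for the same commit count.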
Regards,
Onur
On Sep 6, 2012, at 2:02 PM, [email protected] wrote:
Hi,
1) In the Flexus 4.1 distribution, postprocess.sh calculates UIPC
with this formula: User:Commits:Busy / (User:AccountedCycles +
System:AccountedCycles).
Shouldn't it be User:Commits:Busy / (User:AccountedCycles +
System:AccountedCycles + Idle:AccountedCycles +
Trap:AccountedCycles)?
2) I guess the sum of User:AccountedCycles, System:AccountedCycles,
Idle:AccountedCycles and Trap:AccountedCycles represents the
total execution cycles and it should be equal to flexpoints *
measured_region_cycles * cores. In all simulations I ran, these
numbers were not equal (but close).
And I'm sure that all flexpoints ran correctly. Why is this happening?
3) Is the total number of instructions committed equal to
User:Commits:Busy + System:Commits:Busy + Trap:Commits:Busy +
User:Commits:Busy ?
4) Does flexus have separate statistics for instructions committed
and executed?
thanks -Zacharias Hadjilambrou
flexus.set "-L1d:allow_evict_clean" "1" # (EvictClean) Cause the cache to evict clean blocks
flexus.set "-L1d:array_config" "STD:size=32768:assoc=2:repl=LRU" # (ArrayConfiguration) Configuration of cache array (STD:sets=1024:assoc=16:repl=LRU
flexus.set "-L1d:banks" "1" # (Banks) number of banks on the data and tag arrays
flexus.set "-L1d:bsize" "64" # (BlockSize) Block size
flexus.set "-L1d:bustime_data" "2" # (BusTime_Data) Bus transfer time - data
flexus.set "-L1d:bustime_nodata" "1" # (BusTime_NoData) Bus transfer time - no data
flexus.set "-L1d:cache_type" "InclusiveMESI:snoop_lru=false:AlwaysNAck=true" # (CacheType) Type of cache (InclusiveMOESI)
flexus.set "-L1d:cores" "1" # (Cores) Number of cores
flexus.set "-L1d:data_issue_lat" "1" # (DataIssueLatency) Minimum delay between issues to data pipeline
flexus.set "-L1d:data_lat" "2" # (DataLatency) Total latency of data pipeline
flexus.set "-L1d:dup_tag_issue_lat" "1" # (TagIssueLatency) Minimum delay between issues to tag pipeline
flexus.set "-L1d:eb" "8" # (EvictBufferSize) Number of Evict Buffer entries
flexus.set "-L1d:evict_on_snoop" "0" # (EvictOnSnoop) Send evictions on Snoop channel
flexus.set "-L1d:evict_writable_has_data" "0" # (EvictWritableHasData) Send data with EvictWritable messages
flexus.set "-L1d:fast_evict_clean" "0" # (FastEvictClean) Send clean evicts without reserving data bus
flexus.set "-L1d:gzip_flexpoints" "1" # (GZipFlexpoints) Compress flexpoints with gzip
flexus.set "-L1d:level" "eL1" # (CacheLevel) CacheLevel
flexus.set "-L1d:maf" "32" # (MAFSize) Number of MAF entries
flexus.set "-L1d:maf_targets" "0" # (MAFTargetsPerRequest) Number of MAF targets per request
flexus.set "-L1d:no_bus" "0" # (NoBus) No bus model (i.e., infinite BW, no latency)
flexus.set "-L1d:perfect_cache" "0" # (PerfectCache) Cache that always return hit
flexus.set "-L1d:ports" "2" # (Ports) Number of ports on data and tag arrays
flexus.set "-L1d:pre_queue_size" "4" # (PreQueueSizes) Size of input arbitration queues
flexus.set "-L1d:probe_fetchmiss" "0" # (ProbeFetchMiss) Probe hierarchy on Ifetch miss
flexus.set "-L1d:queue_size" "8" # (QueueSizes) Size of input and output queues
flexus.set "-L1d:snoops" "8" # (SnoopBufferSize) Number of Snoop Buffer entries
flexus.set "-L1d:tag_lat" "1" # (TagLatency) Total latency of tag pipeline
flexus.set "-L1d:text_flexpoints" "0" # (TextFlexpoints) Store flexpoints as text files (compatible with old FastCache component)
flexus.set "-L1d:trace_address" "0" # (TraceAddress) Address to initiate tracing
flexus.set "-L1d:use_reply_channel" "1" # (UseReplyChannel) Separate Reply and Snoop channels on BackSide
flexus.set "-L2:allow_evict_clean" "0" # (EvictClean) Cause the cache to evict clean blocks
flexus.set "-L2:array_config" "STD:total_sets=4096:assoc=16:repl=LRU" # (ArrayConfiguration) Configuration of cache array (STD:sets=1024:assoc=16:repl=LRU
flexus.set "-L2:bank_interleaving" "64" # (BankInterleaving) interleaving between directory banks (64 bytes)
flexus.set "-L2:banks" "1" # (Banks) number of directory banks in each group
flexus.set "-L2:bsize" "64" # (BlockSize) Block size
flexus.set "-L2:cache_eb_size" "16" # (CacheEvictBufferSize) Number of Evict Buffer entries for the cache
flexus.set "-L2:controller" "Default" # (ControllerType) Type of controller (Default or Detailed)
flexus.set "-L2:cores" "2" # (Cores) Number of cores
flexus.set "-L2:data_issue_lat" "1" # (DataIssueLatency) Minimum delay between issues to the data array
flexus.set "-L2:data_lat" "8" # (DataLatency) Total latency of data array lookup
flexus.set "-L2:dir_config" "" # (DirectoryConfig) Configuration of directory array (sets=1024:assoc=16)
flexus.set "-L2:dir_eb_size" "16" # (DirEvictBufferSize) Number of Evict Buffer entries for the directory
flexus.set "-L2:dir_issue_lat" "1" # (DirIssueLatency) Minimum delay between issues to the directory
flexus.set "-L2:dir_lat" "1" # (DirLatency) Total latency of directory lookup
flexus.set "-L2:dir_type" "inf" # (DirectoryType) Type of directory (infinite, std, region, etc.)
flexus.set "-L2:group_interleaving" "4096" # (GroupInterleaving) interleaving between directory bank groups (1024 bytes)
flexus.set "-L2:groups" "1" # (Groups) number of directory bank groups
flexus.set "-L2:level" "eL2" # (CacheLevel) CacheLevel
flexus.set "-L2:maf_size" "64" # (MAFSize) Number of MAF entries
flexus.set "-L2:policy" "NonInclusiveMESI" # (Policy) Coherence policy for higher caches (NonInclusiveMESI)
flexus.set "-L2:queue_size" "16" # (QueueSize) Size of input and output queues
flexus.set "-L2:tag_issue_lat" "1" # (TagIssueLatency) Minimum delay between issues to the tag array
flexus.set "-L2:tag_lat" "1" # (TagLatency) Total latency of tag array lookup
flexus.set "-decoder:dispatch" "3" # (DispatchWidth) Maximum dispatch per cycle
flexus.set "-decoder:fiq" "8" # (FIQSize) Fetch instruction queue size
flexus.set "-decoder:multithread" "0" # (Multithread) Enable multi-threaded execution
flexus.set "-fag:bpreds" "2" # (MaxBPred) Max branches predicted per cycle
flexus.set "-fag:faddrs" "3" # (MaxFetchAddress) Max fetch addresses generated per cycle
flexus.set "-fag:threads" "1" # (Threads) Number of threads under control of this FAG
flexus.set "-magic-break:ckpt_cycle" "0" # (CkptCycleInterval) # of cycles between checkpoints.
flexus.set "-magic-break:ckpt_cycle_name" "0" # (CkptCycleName) Base cycle # from which to build checkpoint names.
flexus.set "-magic-break:ckpt_iter" "0" # (CheckpointOnIteration) Checkpoint simulation when CPU 0 reaches each iteration.
flexus.set "-magic-break:ckpt_trans" "-1" # (CheckpointEveryXTransactions) Quiesce and save every X transactions. -1 disables
flexus.set "-magic-break:end_iter" "-1" # (TerminateOnIteration) Terminate simulation when CPU 0 reaches iteration. -1 disables
flexus.set "-magic-break:end_trans" "-1" # (TerminateOnTransaction) Terminate simulation after ## transactions. -1 disables
flexus.set "-magic-break:first_trans" "0" # (FirstTransactionIs) Transaction number for first transaction.
flexus.set "-magic-break:iter" "0" # (EnableIterationCounts) Enable Iteration Counts
flexus.set "-magic-break:min_cycle" "0" # (CycleMinimum) Minimum number of cycles to run when TerminateOnTransaction is enabled.
flexus.set "-magic-break:stats_trans" "1000" # (TransactionStatsInterval) Statistics interval on ## transactions. -1 disables
flexus.set "-magic-break:stop_cycle" "150000" # (StopCycle) Cycle on which to halt simulation.
flexus.set "-magic-break:stop_on_magic" "-1" # (TerminateOnMagicBreak) Terminate simulation on a specific magic breakpoint
flexus.set "-magic-break:trans" "1" # (EnableTransactionCounts) Enable Transaction Counts
flexus.set "-magic-break:trans_type" "0" # (TransactionType) Workload type. 0=TPCC/JBB 1=WEB
flexus.set "-memory-map:nodes" "1" # (NumNodes) Number of Nodes
flexus.set "-memory-map:page_map" "1" # (ReadPageMap) Load Page Map on start
flexus.set "-memory-map:pagesize" "8192" # (PageSize) Page size in bytes (used by statistics only)
flexus.set "-memory-map:round_robin" "1" # (RoundRobin) Use static round-robin page allocation
flexus.set "-memory-map:write_page_map" "1" # (CreatePageMap) Write page map as pages are created
flexus.set "-memory:UseFetchReply" "1" # (UseFetchReply) Send FetchReply in response to FetchReq (instead of MissReply)
flexus.set "-memory:max_requests" "128" # (MaxRequests) Maximum requests queued in loopback
flexus.set "-memory:time" "90" # (Delay) Access time
flexus.set "-net-mapper:Banks" "1" # (Banks) Number of banks
flexus.set "-net-mapper:Cores" "1" # (Cores) Number of cores
flexus.set "-net-mapper:DirInterleaving" "64" # (DirInterleaving) Interleaving between directories (in bytes)
flexus.set "-net-mapper:DirLocation" "Distributed" # (DirLocation) Directory location (Distributed|AtMemory)
flexus.set "-net-mapper:DirXORShift" "-1" # (DirXORShift) XOR high order bits after shifting this many bits when calculating directory index
flexus.set "-net-mapper:Directories" "1" # (Directories) Number of directories
flexus.set "-net-mapper:LocalDir" "0" # (LocalDir) Treate directory as always being local to the requester
flexus.set "-net-mapper:MemAcksNeedData" "1" # (MemAcksNeedData) When memory replies directly to requester, require data with final ack
flexus.set "-net-mapper:MemControllers" "1" # (MemControllers) Number of memory controllers
flexus.set "-net-mapper:MemInterleaving" "64" # (MemInterleaving) Interleaving between memory controllers (in bytes)
flexus.set "-net-mapper:MemLocation" "0" # (MemLocation) Memory controller locations (ex: '8,15,24,31,32,39,48,55')
flexus.set "-net-mapper:MemReplyToDir" "1" # (MemReplyToDir) Send memory replies to the directory (instead of original requester)
flexus.set "-net-mapper:MemXORShift" "-1" # (MemXORShift) XOR high order bits after shifting this many bits when calculating memory index
flexus.set "-net-mapper:TwoPhaseWB" "0" # (TwoPhaseWB) 2 Phase Write-Back sends NAcks to requester, not directory
flexus.set "-network:nodes" "3" # (NumNodes) Number of Nodes
flexus.set "-network:topology-file" "1x3-mesh.topology" # (NetworkTopologyFile) Network topology file
flexus.set "-network:virtual-channels" "3" # (VChannels) Number of virtual channels
flexus.set "-nic:recv-capacity" "4" # (RecvCapacity) Recv Queue Capacity
flexus.set "-nic:send-capacity" "4" # (SendCapacity) Send Queue Capacity
flexus.set "-nic:vc" "3" # (VChannels) Virtual channels
flexus.set "-uarch:break_on_resynch" "0" # (BreakOnResynchronize) Break on resynchronizer
flexus.set "-uarch:ckpt_threshold" "0" # (CheckpointThreshold) Number of instructions between checkpoints. 0 disables periodic checkpoints
flexus.set "-uarch:coherence" "64" # (CoherenceUnit) Coherence Unit
flexus.set "-uarch:consistency" "1" # (ConsistencyModel) Consistency Model
flexus.set "-uarch:fpAddOpLatency" "3" # (FpAddOpLatency) End-to-end latency of an FP ADD/SUB operation
flexus.set "-uarch:fpAddOpPipelineResetTime" "1" # (FpAddOpPipelineResetTime) Number of cycles required between subsequent FP ADD/SUB operations
flexus.set "-uarch:fpCmpOpLatency" "1" # (FpCmpOpLatency) End-to-end latency of an FP compare operation
flexus.set "-uarch:fpCmpOpPipelineResetTime" "1" # (FpCmpOpPipelineResetTime) Number of cycles required between subsequent FP compare operations
flexus.set "-uarch:fpCvtOpLatency" "4" # (FpCvtOpLatency) End-to-end latency of an FP convert operation
flexus.set "-uarch:fpCvtOpPipelineResetTime" "1" # (FpCvtOpPipelineResetTime) Number of cycles required between subsequent FP convert operations
flexus.set "-uarch:fpDivOpLatency" "6" # (FpDivOpLatency) End-to-end latency of an FP DIV operation
flexus.set "-uarch:fpDivOpPipelineResetTime" "5" # (FpDivOpPipelineResetTime) Number of cycles required between subsequent FP DIV operations
flexus.set "-uarch:fpMultOpLatency" "5" # (FpMultOpLatency) End-to-end latency of an FP MUL operation
flexus.set "-uarch:fpMultOpPipelineResetTime" "2" # (FpMultOpPipelineResetTime) Number of cycles required between subsequent FP MUL operations
flexus.set "-uarch:fpSqrtOpLatency" "6" # (FpSqrtOpLatency) End-to-end latency of an FP SQRT operation
flexus.set "-uarch:fpSqrtOpPipelineResetTime" "5" # (FpSqrtOpPipelineResetTime) Number of cycles required between subsequent FP SQRT operations
flexus.set "-uarch:in_order_execute" "0" # (InOrderExecute) Ensure that instructions execute in order
flexus.set "-uarch:in_order_memory" "0" # (InOrderMemory) Only allow ROB/SB head to issue to memory
flexus.set "-uarch:intAluOpLatency" "1" # (IntAluOpLatency) End-to-end latency of an integer ALU operation
flexus.set "-uarch:intAluOpPipelineResetTime" "1" # (IntAluOpPipelineResetTime) Number of cycles required between subsequent integer ALU operations
flexus.set "-uarch:intDivOpLatency" "16" # (IntDivOpLatency) End-to-end latency of an integer DIV operation
flexus.set "-uarch:intDivOpPipelineResetTime" "6" # (IntDivOpPipelineResetTime) Number of cycles required between subsequent integer DIV operations
flexus.set "-uarch:intMultOpLatency" "3" # (IntMultOpLatency) End-to-end latency of an integer MUL operation
flexus.set "-uarch:intMultOpPipelineResetTime" "1" # (IntMultOpPipelineResetTime) Number of cycles required between subsequent integer MUL operations
flexus.set "-uarch:memports" "2" # (MemoryPorts) Memory Ports
flexus.set "-uarch:multithread" "0" # (Multithread) Enable multi-threaded execution
flexus.set "-uarch:naw_bypass_sb" "0" # (NAWBypassSB) Allow Non-Allocating-Writes to bypass store-buffer
flexus.set "-uarch:naw_wait_at_sync" "0" # (NAWWaitAtSync) Force MEMBAR #Sync to wait for non-allocating writes to finish
flexus.set "-uarch:numFpAlu" "1" # (NumFpAlu) Number of FP ALUs
flexus.set "-uarch:numFpMult" "1" # (NumFpMult) Number of FP MUL/DIV units
flexus.set "-uarch:numIntAlu" "2" # (NumIntAlu) Number of integer ALUs
flexus.set "-uarch:numIntMult" "1" # (NumIntMult) Number of integer MUL/DIV units
flexus.set "-uarch:off-chip-se" "90" # (OffChipLatency) Off-Chip Side-Effect latency
flexus.set "-uarch:on-chip-se" "1" # (OnChipLatency) On-Chip Side-Effect latency
flexus.set "-uarch:prefetch_early" "0" # (PrefetchEarly) Issue store prefetch requests when address resolves
flexus.set "-uarch:retire" "3" # (RetireWidth) Retirement width
flexus.set "-uarch:rob" "60" # (ROBSize) Reorder buffer size
flexus.set "-uarch:sb" "16" # (SBSize) Store buffer size
flexus.set "-uarch:snoopports" "1" # (SnoopPorts) Snoop Ports
flexus.set "-uarch:spec_atomic_val" "0" # (SpeculateOnAtomicValue) Speculate on the Value of Atomics
flexus.set "-uarch:spec_atomic_val_perfect" "0" # (SpeculateOnAtomicValuePerfect) Use perfect atomic value prediction
flexus.set "-uarch:spec_ckpts" "0" # (SpeculativeCheckpoints) Number of checkpoints allowed. 0 for infinite
flexus.set "-uarch:spec_order" "0" # (SpeculativeOrder) Speculate on Memory Order
flexus.set "-uarch:spin_control" "1" # (SpinControl) Enable spin control
flexus.set "-uarch:storeprefetch" "16" # (StorePrefetches) Simultaneous store prefeteches
flexus.set "-uarch:validate-mmu" "0" # (ValidateMMU) Validate MMU after each instruction
flexus.set "-ufetch:associativity" "2" # (Associativity) ICache associativity
flexus.set "-ufetch:clean_evict" "1" # (CleanEvict) Enable eviction messages
flexus.set "-ufetch:evict_on_snoop" "0" # (EvictOnSnoop) Send evicts on Snoop Channel (otherwise use Request Channel)
flexus.set "-ufetch:faq" "24" # (FAQSize) Fetch address queue size
flexus.set "-ufetch:finst" "4" # (MaxFetchInstructions) Max instructions fetched per cycle
flexus.set "-ufetch:flines" "2" # (MaxFetchLines) Max i-cache lines fetched per cycle
flexus.set "-ufetch:iline" "64" # (ICacheLineSize) Icache line size in bytes
flexus.set "-ufetch:miss_queue_size" "4" # (MissQueueSize) Maximum size of the fetch miss queue
flexus.set "-ufetch:perfect" "0" # (PerfectICache) Use a perfect ICache
flexus.set "-ufetch:prefetch" "1" # (PrefetchEnabled) Enable Next-line Prefetcher
flexus.set "-ufetch:send_acks" "1" # (SendAcks) Send acknowledgements when we received data
flexus.set "-ufetch:size" "32768" # (Size) ICache size in bytes
flexus.set "-ufetch:threads" "1" # (Threads) Number of threads under control of this uFetch
flexus.set "-ufetch:use_reply_channel" "1" # (UseReplyChannel) Send replies on Reply Channel and only Evicts on Snoop Channel
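
As a sanity check on the "4MB L2" description earlier in the thread, the cache geometry can be derived from the array_config strings in this configuration. This is a rough sketch under assumptions that the thread does not confirm: that `size` is a capacity in bytes, that `total_sets` is the total number of sets, and that capacity = sets * associativity * block size. The helper names are made up for illustration.

```python
def parse_array_config(spec):
    """Split 'STD:key=value:...' into (kind, {key: value})."""
    kind, *fields = spec.split(":")
    params = dict(field.split("=", 1) for field in fields)
    return kind, params


def capacity_bytes(spec, block_size):
    """Capacity in bytes, assuming size is bytes and total_sets counts all sets."""
    _, params = parse_array_config(spec)
    if "size" in params:
        return int(params["size"])
    return int(params["total_sets"]) * int(params["assoc"]) * block_size


def set_count(spec, block_size):
    """Number of sets implied by capacity, associativity and block size."""
    _, params = parse_array_config(spec)
    return capacity_bytes(spec, block_size) // (int(params["assoc"]) * block_size)


# array_config and bsize values copied from the configuration above.
L1D = "STD:size=32768:assoc=2:repl=LRU"
L2 = "STD:total_sets=4096:assoc=16:repl=LRU"
BLOCK_SIZE = 64

for name, spec in (("L1d", L1D), ("L2", L2)):
    print(name, capacity_bytes(spec, BLOCK_SIZE), "bytes,",
          set_count(spec, BLOCK_SIZE), "sets")
```

Under these assumptions the L2 setting works out to 4096 * 16 * 64 bytes = 4 MiB, which agrees with the description in the message, and the L1d setting corresponds to 256 sets.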