Re: UIPC total accounted cycles and instructions

zhadji01 Tue, 02 Oct 2012 02:10:21 -0700

Hi Onur, thanks for your answer.
Just to make sure I understand correctly:

1) So the Idle cycles are not part of the benchmark, and we don't include them in the UIPC calculation. But what about the Trap cycles? Traps are for handling interrupts, right? Should we include them in the formula or not?

2) Okay, here is an example.

I attached the configuration file of the Flexus components. It's basically the configuration of arm-corex-15 with one core and a 4MB L2. The simulator is CMP.L2SharedNUCA.OoO. The workload is cloud9/1cpu. I ran 800 flexpoints, 80 from each phase. Here are the cycle statistics.

NicoCluster1:/home/zhadji01/flexus-results/cloud9_1cpu_timing_4MBL2/cloud9_1cpu>stat-manager print sum | grep -i cycles

   sys-L1d-Track:CyclesWithN
   sys-L1d-Track:CyclesWithN:Coherence
   sys-L1d-Track:CyclesWithN:NonCoherence
   sys-cycles                               40000000
   sys-uarch-SpinCycles                     4243
   sys-uarch-TB:Idle:AccountedCycles        0
   sys-uarch-TB:SkippedCycles               0
   sys-uarch-TB:System:AccountedCycles      323648
   sys-uarch-TB:Trap:AccountedCycles        2166076
   sys-uarch-TB:User:AccountedCycles        37512199
   sys-ufetch-MissCycles                    2811494

NicoCluster1:/home/zhadji01/flexus-results/cloud9_1cpu_timing_4MBL2/cloud9_1cpu>echo "323648 + 2166076 + 37512199" |bc

40001923
You see that the total should be 40000000 but it is 40001923.
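
To put the mismatch in proportion, here is a minimal Python sketch that repeats the `bc` sum and compares it with `sys-cycles`. All numbers are copied from the output above; the per-flexpoint figure simply divides by the 800 flexpoints mentioned earlier and is not a statement about where the extra cycles come from.

```python
# Counter values copied from the `stat-manager print sum` output above.
accounted = {
    "Idle": 0,
    "System": 323648,
    "Trap": 2166076,
    "User": 37512199,
}
sys_cycles = 40000000   # sys-cycles
flexpoints = 800        # number of flexpoints reported in this message

total_accounted = sum(accounted.values())
excess = total_accounted - sys_cycles

print(f"accounted total : {total_accounted}")
print(f"sys-cycles      : {sys_cycles}")
print(f"excess          : {excess} cycles "
      f"({100.0 * excess / sys_cycles:.4f}% of sys-cycles)")
print(f"excess per flexpoint: {excess / flexpoints:.2f} cycles")
```

The excess is 1923 cycles, roughly 0.005% of the total, or a little over two cycles per flexpoint on average.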

And one more question. I noticed that the insn_count file, which contains the number of instructions executed by Simics, also mismatches the instruction counts that the Flexus statistics show. But I guess the difference is due to the fact that Simics counts steps, and a Simics step may contain many instructions, right?


Quoting Onur Koçberber <[email protected]>:

Hi Zacharias,
Here are the answers for your questions.
1) The formula is correct because the Idle counters count the time when the processor is in the Idle loop. These cycles are not a part of the benchmark and they do not contribute to anything useful, so we don't include them in the UIPC calculation.
2) Can you give us more details about your simulation configuration? Which simulator and benchmark do you use? You can also send some of the debug outputs so that we can see how long you run and how much difference you have.
3) Yes (you listed 'User' counter twice, I assume you wanted to say 'Idle' for the second one).
4) I am not aware of a counter for the executed instructions. However, you might use branch predictor information to find the executed but not committed instructions.
Regards,
Onur

On Sep 6, 2012, at 2:02 PM, [email protected] wrote:
Hi,
1) In the Flexus 4.1 distribution, postprocess.sh calculates UIPC with this formula: User:Commits:Busy / (User:AccountedCycles + System:AccountedCycles). Shouldn't it be user:commits:busy / (User:AccountedCycles + System:AccountedCycles + Idle::AccountedCycles + Trap::AccountedCycles)?
2) I guess the sum of User:AccountedCycles, System:AccountedCycles, Idle::AccountedCycles and Trap::AccountedCycles represents the total execution cycles, and it should be equal to flexpoints * measured_region_cycles * cores. In all the simulations I ran, these numbers were not equal (but close).
And I'm sure that all flexpoints ran correctly. Why is this happening?
3) Is the total number of instructions committed equal to User:Commits:Busy + System:Commits:Busy + Trap:Commits:Busy + User:Commits:Busy ?
4) Does Flexus have separate statistics for instructions committed and executed?
thanks -Zacharias Hadjilambrou
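
For reference, the two UIPC formulas from question 1 can be compared with a short Python sketch. The accounted-cycle values are the ones posted elsewhere in this thread; the User:Commits:Busy value was never posted, so `USER_COMMITS_BUSY` below is a made-up placeholder and the absolute UIPC numbers mean nothing. Only the ratio between the two results depends on real data.

```python
# Accounted cycles per mode, copied from the stat-manager output in this thread.
ACCOUNTED_CYCLES = {"User": 37512199, "System": 323648, "Trap": 2166076, "Idle": 0}

# HYPOTHETICAL placeholder: User:Commits:Busy was not posted in the thread.
USER_COMMITS_BUSY = 20_000_000


def uipc(user_commits, cycles, modes):
    """User commits divided by the accounted cycles of the selected modes."""
    denominator = sum(cycles[mode] for mode in modes)
    if denominator == 0:
        raise ValueError("no accounted cycles in the selected modes")
    return user_commits / denominator


# Formula attributed to postprocess.sh in the message: User + System cycles only.
postprocess_style = uipc(USER_COMMITS_BUSY, ACCOUNTED_CYCLES, ("User", "System"))
# Alternative asked about in the message: all four accounted-cycle counters.
all_modes = uipc(USER_COMMITS_BUSY, ACCOUNTED_CYCLES,
                 ("User", "System", "Idle", "Trap"))

print(f"UIPC (User+System)           : {postprocess_style:.4f}")
print(f"UIPC (User+System+Idle+Trap) : {all_modes:.4f}")
print(f"relative drop                : {100 * (1 - all_modes / postprocess_style):.1f}%")
```

With these cycle counts, adding Trap and Idle cycles to the denominator lowers the reported UIPC by about 5.4%, whatever the commit count is.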

flexus.set "-L1d:allow_evict_clean"       "1"                  # (EvictClean) Cause the cache to evict clean blocks
flexus.set "-L1d:array_config"            "STD:size=32768:assoc=2:repl=LRU" # (ArrayConfiguration) Configuration of cache array (STD:sets=1024:assoc=16:repl=LRU
flexus.set "-L1d:banks"                   "1"                  # (Banks) number of banks on the data and tag arrays
flexus.set "-L1d:bsize"                   "64"                 # (BlockSize) Block size
flexus.set "-L1d:bustime_data"            "2"                  # (BusTime_Data) Bus transfer time - data
flexus.set "-L1d:bustime_nodata"          "1"                  # (BusTime_NoData) Bus transfer time - no data
flexus.set "-L1d:cache_type"              "InclusiveMESI:snoop_lru=false:AlwaysNAck=true" # (CacheType) Type of cache (InclusiveMOESI)
flexus.set "-L1d:cores"                   "1"                  # (Cores) Number of cores
flexus.set "-L1d:data_issue_lat"          "1"                  # (DataIssueLatency) Minimum delay between issues to data pipeline
flexus.set "-L1d:data_lat"                "2"                  # (DataLatency) Total latency of data pipeline
flexus.set "-L1d:dup_tag_issue_lat"       "1"                  # (TagIssueLatency) Minimum delay between issues to tag pipeline
flexus.set "-L1d:eb"                      "8"                  # (EvictBufferSize) Number of Evict Buffer entries
flexus.set "-L1d:evict_on_snoop"          "0"                  # (EvictOnSnoop) Send evictions on Snoop channel
flexus.set "-L1d:evict_writable_has_data" "0"                  # (EvictWritableHasData) Send data with EvictWritable messages
flexus.set "-L1d:fast_evict_clean"        "0"                  # (FastEvictClean) Send clean evicts without reserving data bus
flexus.set "-L1d:gzip_flexpoints"         "1"                  # (GZipFlexpoints) Compress flexpoints with gzip
flexus.set "-L1d:level"                   "eL1"                # (CacheLevel) CacheLevel
flexus.set "-L1d:maf"                     "32"                 # (MAFSize) Number of MAF entries
flexus.set "-L1d:maf_targets"             "0"                  # (MAFTargetsPerRequest) Number of MAF targets per request
flexus.set "-L1d:no_bus"                  "0"                  # (NoBus) No bus model (i.e., infinite BW, no latency)
flexus.set "-L1d:perfect_cache"           "0"                  # (PerfectCache) Cache that always return hit
flexus.set "-L1d:ports"                   "2"                  # (Ports) Number of ports on data and tag arrays
flexus.set "-L1d:pre_queue_size"          "4"                  # (PreQueueSizes) Size of input arbitration queues
flexus.set "-L1d:probe_fetchmiss"         "0"                  # (ProbeFetchMiss) Probe hierarchy on Ifetch miss
flexus.set "-L1d:queue_size"              "8"                  # (QueueSizes) Size of input and output queues
flexus.set "-L1d:snoops"                  "8"                  # (SnoopBufferSize) Number of Snoop Buffer entries
flexus.set "-L1d:tag_lat"                 "1"                  # (TagLatency) Total latency of tag pipeline
flexus.set "-L1d:text_flexpoints"         "0"                  # (TextFlexpoints) Store flexpoints as text files (compatible with old FastCache component)
flexus.set "-L1d:trace_address"           "0"                  # (TraceAddress) Address to initiate tracing
flexus.set "-L1d:use_reply_channel"       "1"                  # (UseReplyChannel) Separate Reply and Snoop channels on BackSide
flexus.set "-L2:allow_evict_clean"        "0"                  # (EvictClean) Cause the cache to evict clean blocks
flexus.set "-L2:array_config"             "STD:total_sets=4096:assoc=16:repl=LRU" # (ArrayConfiguration) Configuration of cache array (STD:sets=1024:assoc=16:repl=LRU
flexus.set "-L2:bank_interleaving"        "64"                 # (BankInterleaving) interleaving between directory banks (64 bytes)
flexus.set "-L2:banks"                    "1"                  # (Banks) number of directory banks in each group
flexus.set "-L2:bsize"                    "64"                 # (BlockSize) Block size
flexus.set "-L2:cache_eb_size"            "16"                 # (CacheEvictBufferSize) Number of Evict Buffer entries for the cache
flexus.set "-L2:controller"               "Default"            # (ControllerType) Type of controller (Default or Detailed)
flexus.set "-L2:cores"                    "2"                  # (Cores) Number of cores
flexus.set "-L2:data_issue_lat"           "1"                  # (DataIssueLatency) Minimum delay between issues to the data array
flexus.set "-L2:data_lat"                 "8"                  # (DataLatency) Total latency of data array lookup
flexus.set "-L2:dir_config"               ""                   # (DirectoryConfig) Configuration of directory array (sets=1024:assoc=16)
flexus.set "-L2:dir_eb_size"              "16"                 # (DirEvictBufferSize) Number of Evict Buffer entries for the directory
flexus.set "-L2:dir_issue_lat"            "1"                  # (DirIssueLatency) Minimum delay between issues to the directory
flexus.set "-L2:dir_lat"                  "1"                  # (DirLatency) Total latency of directory lookup
flexus.set "-L2:dir_type"                 "inf"                # (DirectoryType) Type of directory (infinite, std, region, etc.)
flexus.set "-L2:group_interleaving"       "4096"               # (GroupInterleaving) interleaving between directory bank groups (1024 bytes)
flexus.set "-L2:groups"                   "1"                  # (Groups) number of directory bank groups
flexus.set "-L2:level"                    "eL2"                # (CacheLevel) CacheLevel
flexus.set "-L2:maf_size"                 "64"                 # (MAFSize) Number of MAF entries
flexus.set "-L2:policy"                   "NonInclusiveMESI"   # (Policy) Coherence policy for higher caches (NonInclusiveMESI)
flexus.set "-L2:queue_size"               "16"                 # (QueueSize) Size of input and output queues
flexus.set "-L2:tag_issue_lat"            "1"                  # (TagIssueLatency) Minimum delay between issues to the tag array
flexus.set "-L2:tag_lat"                  "1"                  # (TagLatency) Total latency of tag array lookup
flexus.set "-decoder:dispatch"            "3"                  # (DispatchWidth) Maximum dispatch per cycle
flexus.set "-decoder:fiq"                 "8"                  # (FIQSize) Fetch instruction queue size
flexus.set "-decoder:multithread"         "0"                  # (Multithread) Enable multi-threaded execution
flexus.set "-fag:bpreds"                  "2"                  # (MaxBPred) Max branches predicted per cycle
flexus.set "-fag:faddrs"                  "3"                  # (MaxFetchAddress) Max fetch addresses generated per cycle
flexus.set "-fag:threads"                 "1"                  # (Threads) Number of threads under control of this FAG
flexus.set "-magic-break:ckpt_cycle"      "0"                  # (CkptCycleInterval) # of cycles between checkpoints.
flexus.set "-magic-break:ckpt_cycle_name" "0"                  # (CkptCycleName) Base cycle # from which to build checkpoint names.
flexus.set "-magic-break:ckpt_iter"       "0"                  # (CheckpointOnIteration) Checkpoint simulation when CPU 0 reaches each iteration.
flexus.set "-magic-break:ckpt_trans"      "-1"                 # (CheckpointEveryXTransactions) Quiesce and save every X transactions. -1 disables
flexus.set "-magic-break:end_iter"        "-1"                 # (TerminateOnIteration) Terminate simulation when CPU 0 reaches iteration.  -1 disables
flexus.set "-magic-break:end_trans"       "-1"                 # (TerminateOnTransaction) Terminate simulation after ## transactions.  -1 disables
flexus.set "-magic-break:first_trans"     "0"                  # (FirstTransactionIs) Transaction number for first transaction.
flexus.set "-magic-break:iter"            "0"                  # (EnableIterationCounts) Enable Iteration Counts
flexus.set "-magic-break:min_cycle"       "0"                  # (CycleMinimum) Minimum number of cycles to run when TerminateOnTransaction is enabled.
flexus.set "-magic-break:stats_trans"     "1000"               # (TransactionStatsInterval) Statistics interval on ## transactions.  -1 disables
flexus.set "-magic-break:stop_cycle"      "150000"             # (StopCycle) Cycle on which to halt simulation.
flexus.set "-magic-break:stop_on_magic"   "-1"                 # (TerminateOnMagicBreak) Terminate simulation on a specific magic breakpoint
flexus.set "-magic-break:trans"           "1"                  # (EnableTransactionCounts) Enable Transaction Counts
flexus.set "-magic-break:trans_type"      "0"                  # (TransactionType) Workload type.  0=TPCC/JBB  1=WEB
flexus.set "-memory-map:nodes"            "1"                  # (NumNodes) Number of Nodes
flexus.set "-memory-map:page_map"         "1"                  # (ReadPageMap) Load Page Map on start
flexus.set "-memory-map:pagesize"         "8192"               # (PageSize) Page size in bytes (used by statistics only)
flexus.set "-memory-map:round_robin"      "1"                  # (RoundRobin) Use static round-robin page allocation
flexus.set "-memory-map:write_page_map"   "1"                  # (CreatePageMap) Write page map as pages are created
flexus.set "-memory:UseFetchReply"        "1"                  # (UseFetchReply) Send FetchReply in response to FetchReq (instead of MissReply)
flexus.set "-memory:max_requests"         "128"                # (MaxRequests) Maximum requests queued in loopback
flexus.set "-memory:time"                 "90"                 # (Delay) Access time
flexus.set "-net-mapper:Banks"            "1"                  # (Banks) Number of banks
flexus.set "-net-mapper:Cores"            "1"                  # (Cores) Number of cores
flexus.set "-net-mapper:DirInterleaving"  "64"                 # (DirInterleaving) Interleaving between directories (in bytes)
flexus.set "-net-mapper:DirLocation"      "Distributed"        # (DirLocation) Directory location (Distributed|AtMemory)
flexus.set "-net-mapper:DirXORShift"      "-1"                 # (DirXORShift) XOR high order bits after shifting this many bits when calculating directory index
flexus.set "-net-mapper:Directories"      "1"                  # (Directories) Number of directories
flexus.set "-net-mapper:LocalDir"         "0"                  # (LocalDir) Treate directory as always being local to the requester
flexus.set "-net-mapper:MemAcksNeedData"  "1"                  # (MemAcksNeedData) When memory replies directly to requester, require data with final ack
flexus.set "-net-mapper:MemControllers"   "1"                  # (MemControllers) Number of memory controllers
flexus.set "-net-mapper:MemInterleaving"  "64"                 # (MemInterleaving) Interleaving between memory controllers (in bytes)
flexus.set "-net-mapper:MemLocation"      "0"                  # (MemLocation) Memory controller locations (ex: '8,15,24,31,32,39,48,55')
flexus.set "-net-mapper:MemReplyToDir"    "1"                  # (MemReplyToDir) Send memory replies to the directory (instead of original requester)
flexus.set "-net-mapper:MemXORShift"      "-1"                 # (MemXORShift) XOR high order bits after shifting this many bits when calculating memory index
flexus.set "-net-mapper:TwoPhaseWB"       "0"                  # (TwoPhaseWB) 2 Phase Write-Back sends NAcks to requester, not directory
flexus.set "-network:nodes"               "3"                  # (NumNodes) Number of Nodes
flexus.set "-network:topology-file"       "1x3-mesh.topology"  # (NetworkTopologyFile) Network topology file
flexus.set "-network:virtual-channels"    "3"                  # (VChannels) Number of virtual channels
flexus.set "-nic:recv-capacity"           "4"                  # (RecvCapacity) Recv Queue Capacity
flexus.set "-nic:send-capacity"           "4"                  # (SendCapacity) Send Queue Capacity
flexus.set "-nic:vc"                      "3"                  # (VChannels) Virtual channels
flexus.set "-uarch:break_on_resynch"      "0"                  # (BreakOnResynchronize) Break on resynchronizer
flexus.set "-uarch:ckpt_threshold"        "0"                  # (CheckpointThreshold) Number of instructions between checkpoints.  0 disables periodic checkpoints
flexus.set "-uarch:coherence"             "64"                 # (CoherenceUnit) Coherence Unit
flexus.set "-uarch:consistency"           "1"                  # (ConsistencyModel) Consistency Model
flexus.set "-uarch:fpAddOpLatency"        "3"                  # (FpAddOpLatency) End-to-end latency of an FP ADD/SUB operation
flexus.set "-uarch:fpAddOpPipelineResetTime" "1"                  # (FpAddOpPipelineResetTime) Number of cycles required between subsequent FP ADD/SUB operations
flexus.set "-uarch:fpCmpOpLatency"        "1"                  # (FpCmpOpLatency) End-to-end latency of an FP compare operation
flexus.set "-uarch:fpCmpOpPipelineResetTime" "1"                  # (FpCmpOpPipelineResetTime) Number of cycles required between subsequent FP compare operations
flexus.set "-uarch:fpCvtOpLatency"        "4"                  # (FpCvtOpLatency) End-to-end latency of an FP convert operation
flexus.set "-uarch:fpCvtOpPipelineResetTime" "1"                  # (FpCvtOpPipelineResetTime) Number of cycles required between subsequent FP convert operations
flexus.set "-uarch:fpDivOpLatency"        "6"                  # (FpDivOpLatency) End-to-end latency of an FP DIV operation
flexus.set "-uarch:fpDivOpPipelineResetTime" "5"                  # (FpDivOpPipelineResetTime) Number of cycles required between subsequent FP DIV operations
flexus.set "-uarch:fpMultOpLatency"       "5"                  # (FpMultOpLatency) End-to-end latency of an FP MUL operation
flexus.set "-uarch:fpMultOpPipelineResetTime" "2"                  # (FpMultOpPipelineResetTime) Number of cycles required between subsequent FP MUL operations
flexus.set "-uarch:fpSqrtOpLatency"       "6"                  # (FpSqrtOpLatency) End-to-end latency of an FP SQRT operation
flexus.set "-uarch:fpSqrtOpPipelineResetTime" "5"                  # (FpSqrtOpPipelineResetTime) Number of cycles required between subsequent FP SQRT operations
flexus.set "-uarch:in_order_execute"      "0"                  # (InOrderExecute) Ensure that instructions execute in order
flexus.set "-uarch:in_order_memory"       "0"                  # (InOrderMemory) Only allow ROB/SB head to issue to memory
flexus.set "-uarch:intAluOpLatency"       "1"                  # (IntAluOpLatency) End-to-end latency of an integer ALU operation
flexus.set "-uarch:intAluOpPipelineResetTime" "1"                  # (IntAluOpPipelineResetTime) Number of cycles required between subsequent integer ALU operations
flexus.set "-uarch:intDivOpLatency"       "16"                 # (IntDivOpLatency) End-to-end latency of an integer DIV operation
flexus.set "-uarch:intDivOpPipelineResetTime" "6"                  # (IntDivOpPipelineResetTime) Number of cycles required between subsequent integer DIV operations
flexus.set "-uarch:intMultOpLatency"      "3"                  # (IntMultOpLatency) End-to-end latency of an integer MUL operation
flexus.set "-uarch:intMultOpPipelineResetTime" "1"                  # (IntMultOpPipelineResetTime) Number of cycles required between subsequent integer MUL operations
flexus.set "-uarch:memports"              "2"                  # (MemoryPorts) Memory Ports
flexus.set "-uarch:multithread"           "0"                  # (Multithread) Enable multi-threaded execution
flexus.set "-uarch:naw_bypass_sb"         "0"                  # (NAWBypassSB) Allow Non-Allocating-Writes to bypass store-buffer
flexus.set "-uarch:naw_wait_at_sync"      "0"                  # (NAWWaitAtSync) Force MEMBAR #Sync to wait for non-allocating writes to finish
flexus.set "-uarch:numFpAlu"              "1"                  # (NumFpAlu) Number of FP ALUs
flexus.set "-uarch:numFpMult"             "1"                  # (NumFpMult) Number of FP MUL/DIV units
flexus.set "-uarch:numIntAlu"             "2"                  # (NumIntAlu) Number of integer ALUs
flexus.set "-uarch:numIntMult"            "1"                  # (NumIntMult) Number of integer MUL/DIV units
flexus.set "-uarch:off-chip-se"           "90"                 # (OffChipLatency) Off-Chip Side-Effect latency
flexus.set "-uarch:on-chip-se"            "1"                  # (OnChipLatency) On-Chip Side-Effect latency
flexus.set "-uarch:prefetch_early"        "0"                  # (PrefetchEarly) Issue store prefetch requests when address resolves
flexus.set "-uarch:retire"                "3"                  # (RetireWidth) Retirement width
flexus.set "-uarch:rob"                   "60"                 # (ROBSize) Reorder buffer size
flexus.set "-uarch:sb"                    "16"                 # (SBSize) Store buffer size
flexus.set "-uarch:snoopports"            "1"                  # (SnoopPorts) Snoop Ports
flexus.set "-uarch:spec_atomic_val"       "0"                  # (SpeculateOnAtomicValue) Speculate on the Value of Atomics
flexus.set "-uarch:spec_atomic_val_perfect" "0"                  # (SpeculateOnAtomicValuePerfect) Use perfect atomic value prediction
flexus.set "-uarch:spec_ckpts"            "0"                  # (SpeculativeCheckpoints) Number of checkpoints allowed.  0 for infinite
flexus.set "-uarch:spec_order"            "0"                  # (SpeculativeOrder) Speculate on Memory Order
flexus.set "-uarch:spin_control"          "1"                  # (SpinControl) Enable spin control
flexus.set "-uarch:storeprefetch"         "16"                 # (StorePrefetches) Simultaneous store prefeteches
flexus.set "-uarch:validate-mmu"          "0"                  # (ValidateMMU) Validate MMU after each instruction
flexus.set "-ufetch:associativity"        "2"                  # (Associativity) ICache associativity
flexus.set "-ufetch:clean_evict"          "1"                  # (CleanEvict) Enable eviction messages
flexus.set "-ufetch:evict_on_snoop"       "0"                  # (EvictOnSnoop) Send evicts on Snoop Channel (otherwise use Request Channel)
flexus.set "-ufetch:faq"                  "24"                 # (FAQSize) Fetch address queue size
flexus.set "-ufetch:finst"                "4"                  # (MaxFetchInstructions) Max instructions fetched per cycle
flexus.set "-ufetch:flines"               "2"                  # (MaxFetchLines) Max i-cache lines fetched per cycle
flexus.set "-ufetch:iline"                "64"                 # (ICacheLineSize) Icache line size in bytes
flexus.set "-ufetch:miss_queue_size"      "4"                  # (MissQueueSize) Maximum size of the fetch miss queue
flexus.set "-ufetch:perfect"              "0"                  # (PerfectICache) Use a perfect ICache
flexus.set "-ufetch:prefetch"             "1"                  # (PrefetchEnabled) Enable Next-line Prefetcher
flexus.set "-ufetch:send_acks"            "1"                  # (SendAcks) Send acknowledgements when we received data
flexus.set "-ufetch:size"                 "32768"              # (Size) ICache size in bytes
flexus.set "-ufetch:threads"              "1"                  # (Threads) Number of threads under control of this uFetch
flexus.set "-ufetch:use_reply_channel"    "1"                  # (UseReplyChannel) Send replies on Reply Channel and only Evicts on Snoop Channel
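
As a sanity check on the "4MB L2" mentioned in the message, the cache capacities can be derived from the `array_config` strings in the configuration above. This is only a sketch: it assumes that `size` is the capacity in bytes and that `total_sets` x `assoc` x block size gives the capacity, which is an interpretation of the parameter names, not something confirmed in this thread. The helper names are made up for illustration.

```python
def parse_array_config(spec):
    """Split e.g. 'STD:size=32768:assoc=2:repl=LRU' into (kind, {key: value})."""
    kind, *fields = spec.split(":")
    params = dict(field.split("=", 1) for field in fields)
    return kind, params


def capacity_bytes(spec, block_size):
    """Capacity in bytes, ASSUMING size is bytes and sets * ways * block otherwise."""
    _, params = parse_array_config(spec)
    if "size" in params:
        return int(params["size"])
    return int(params["total_sets"]) * int(params["assoc"]) * block_size


# Strings and block sizes copied from the -L1d and -L2 settings above.
l1d = capacity_bytes("STD:size=32768:assoc=2:repl=LRU", block_size=64)
l2 = capacity_bytes("STD:total_sets=4096:assoc=16:repl=LRU", block_size=64)

print(f"L1d: {l1d // 1024} KiB")
print(f"L2 : {l2 // (1024 * 1024)} MiB")
```

Under these assumptions the L1d comes out at 32 KiB and the L2 at 4 MiB, which matches the configuration described in the message.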
