Hi,
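Make sure your system has enough MSHRs. Out of the box, the L1 and L2 caches
are configured with only a few MSHR entries, which caps the number of
outstanding misses and therefore the achievable memory bandwidth.
Also, the stride prefetcher is not the best choice; you may want to try
something better. DCPT gives me better numbers.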
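
To see why the MSHR count matters, here is a rough back-of-the-envelope
sketch using Little's law (sustained bandwidth is roughly outstanding misses
times line size divided by miss latency). The numbers are assumptions for
illustration only: 4 MSHRs (what I believe the stock L1 in
configs/common/Caches.py uses, please check your tree), 64-byte lines, and a
guessed 100 ns miss latency. The function names are hypothetical helpers,
not gem5 APIs.

```python
import math


def mshr_bound_mb_s(mshrs, line_bytes, latency_ns):
    """Upper bound on bandwidth (MB/s) from Little's law.

    bytes in flight / latency: bytes per ns equals GB/s, so scale by 1000.
    """
    return mshrs * line_bytes * 1000 / latency_ns


def mshrs_needed(target_mb_s, line_bytes, latency_ns):
    """Smallest number of outstanding misses needed to reach target_mb_s."""
    return math.ceil(target_mb_s * latency_ns / (line_bytes * 1000))


if __name__ == "__main__":
    # Assumed values, not measured: 4 MSHRs, 64 B lines, 100 ns miss latency.
    print(mshr_bound_mb_s(4, 64, 100))
    # Outstanding misses needed to approach the 12800 MB/s peakBW in stats.txt.
    print(mshrs_needed(12800, 64, 100))
```

Under these assumptions a handful of MSHRs limits you to a few GB/s at best,
which is in the same range as the STREAM numbers below, so raising the MSHR
count (and letting a prefetcher keep more misses in flight) is the first
thing to try.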
On Fri, Apr 15, 2022 at 4:57 AM Zicong Wang via gem5-users <
gem5-users@gem5.org> wrote:
> Hi Jason,
>
> We are testing the memory bandwidth program STREAM (
> https://www.cs.virginia.edu/stream/), but the results show that the CPU
> cannot fully utilize the DDR bandwidth: the achieved bandwidth is quite
> low, about 1/10 of the peak bandwidth (peakBW in stats.txt). I tested
> the STREAM binary on my x86 computer and got near-peak bandwidth, so I
> believe the program is OK.
>
> I've seen the mailing list thread
> https://www.mail-archive.com/gem5-users@gem5.org/msg12965.html, and I
> think I've hit a similar problem. So I tried the suggestions proposed by
> Andreas, including *enabling the l1/l2 prefetcher* and *using the
> ARM detailed CPU*. Although these methods improve the bandwidth, the
> effect is limited. Besides, I've also tested the STREAM
> program in FS mode with the x86 O3/Minor/TimingSimple CPUs, and in SE
> mode with the ruby option, but all the results are similar and there is no
> essential difference.
>
> I guess this is a general problem when simulating with gem5. I'm wondering
> whether the result is expected, or whether there is something wrong with
> the system model.
>
> Two of the experimental results are attached for reference:
>
> *1. X86 O3CPU, SE mode, w/o l2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --mem-type=DDR3_1600_8x8 -c ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1099.0     0.014559     0.014559     0.014559
> Scale:           1089.7     0.014683     0.014683     0.014683
> Add:             1213.0     0.019786     0.019786     0.019786
> Triad:           1222.1     0.019639     0.019639     0.019639
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead      238807808    # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten   121179776    # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW        718.689026   # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW        364.688977   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW         12800.00     # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil        8.46         # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead    5.61         # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite   2.85         # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate    40.57        # Row buffer hit rate, read and write combined (Ratio)
>
>
> *2. X86 O3CPU, SE mode, w/ l2 prefetcher:*
>
> ./build/X86/gem5.opt --outdir=m5out-stream-l2hwp configs/example/se.py
> --cpu-type=O3CPU --caches --l1d_size=256kB --l1i_size=256kB --l2cache
> --l2_size=8MB --l2-hwp-typ=StridePrefetcher --mem-type=DDR3_1600_8x8 -c
> ../stream/stream
>
> *STREAM output:*
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            1703.9     0.009390     0.009390     0.009390
> Scale:           1718.6     0.009310     0.009310     0.009310
> Add:             2087.3     0.011498     0.011498     0.011498
> Triad:           2227.2     0.010776     0.010776     0.010776
> -------------------------------------------------------------
>
> *stats.txt (dram related):*
>
> system.mem_ctrls.dram.bytesRead      238811712    # Total bytes read (Byte)
> system.mem_ctrls.dram.bytesWritten   121179840    # Total bytes written (Byte)
> system.mem_ctrls.dram.avgRdBW        1014.129912  # Average DRAM read bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.avgWrBW        514.598298   # Average DRAM write bandwidth in MiBytes/s ((Byte/Second))
> system.mem_ctrls.dram.peakBW         12800.00     # Theoretical peak bandwidth in MiByte/s ((Byte/Second))
> system.mem_ctrls.dram.busUtil        11.94        # Data bus utilization in percentage (Ratio)
> system.mem_ctrls.dram.busUtilRead    7.92         # Data bus utilization in percentage for reads (Ratio)
> system.mem_ctrls.dram.busUtilWrite   4.02         # Data bus utilization in percentage for writes (Ratio)
> system.mem_ctrls.dram.pageHitRate    75.37        # Row buffer hit rate, read and write combined