[gem5-users] Re: Low memory bandwidth achieved with STREAM benchmark

2022-04-24 Thread Majid Jalili via gem5-users
Hi, Here is the configuration I usually use. A few settings are somewhat unrealistic, for example the large store queue (SQ) size, but I choose them to account for better features on a real machine that we do not have at the moment. The command would include: --cpu-type=DerivO3CPU

2022-04-23 Thread Jason Lowe-Power via gem5-users
Majid, These are all great suggestions! Do you have a configuration file that you would be willing to share? It would be a huge benefit to the community if we had some better default configurations in the "examples" for gem5 configuration files. We're also trying to use the new standard library

2022-04-22 Thread Majid Jalili via gem5-users
I think it is hard to reach real-machine levels of bandwidth (BW). Looking at your stats, though, I noticed that lsqFullEvents is high. You can make the CPU more aggressive: increasing the load/store queue sizes and the ROB depth are the minimal changes you can make. I usually do at least ROB
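
The check described above (looking for a high lsqFullEvents count) can be done with a few lines of Python. This is a minimal sketch, not a gem5 tool: it assumes the usual stats.txt layout of `name  value  # description`, and the sample excerpt, including the full stat paths and all the numbers, is made up for illustration. Exact stat names vary between gem5 versions, so the pattern may need adjusting.

```python
import re

# Hypothetical excerpt in the usual stats.txt layout:
#   <stat name>  <value>  # <description>
# The stat paths and values below are invented for illustration only.
SAMPLE_STATS = """\
system.cpu.numCycles                      1000000   # Number of cpu cycles simulated
system.cpu.iew.lsqFullEvents               250000   # Number of times the LSQ has become full
system.cpu.rename.ROBFullEvents              1200   # Number of times rename has blocked due to ROB full
"""


def find_stats(text, pattern):
    """Return {stat_name: value} for scalar stats whose name matches pattern."""
    found = {}
    for line in text.splitlines():
        # Drop the trailing "# description" part, then split name and value.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or not re.search(pattern, fields[0], re.IGNORECASE):
            continue
        try:
            found[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip non-numeric entries
    return found


full_events = find_stats(SAMPLE_STATS, r"FullEvents$")
cycles = find_stats(SAMPLE_STATS, r"numCycles$")["system.cpu.numCycles"]

for name, value in sorted(full_events.items()):
    # Events per simulated cycle gives a rough sense of how often the
    # structure fills up relative to the length of the run.
    print(f"{name}: {value:.0f} events, {value / cycles:.3f} per cycle")
```

To use it on a real run, read the text from the stats.txt file in the simulation's output directory instead of `SAMPLE_STATS`.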

2022-04-16 Thread 王子聪 via gem5-users
Hi Majid, Thanks for your suggestion! I checked the default number of MSHRs (in configs/common/Caches.py) and found that the defaults for L1 and L2 are 4 and 20, respectively. According to the PACT'18 paper "Cimple: Instruction and Memory Level Parallelism: A DSL for Uncovering ILP and MLP", it
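
To see why the MSHR count matters for bandwidth, Little's law gives a rough upper bound: a cache with N MSHRs can keep at most N misses in flight, so its sustained miss bandwidth is at most N × line size / miss latency. The sketch below applies that bound to the default counts quoted above (4 and 20). The 64-byte line size and the 100 ns miss latency are assumptions chosen for illustration, not values taken from this thread, and using the same latency for both levels is a simplification.

```python
def mshr_bandwidth_bound(num_mshrs, line_bytes, miss_latency_ns):
    """Upper bound on sustained miss bandwidth in GB/s (Little's law).

    At most num_mshrs misses can be outstanding at once, and each one
    returns line_bytes after miss_latency_ns. Bytes per nanosecond is
    numerically equal to GB/s (1e9 bytes per second).
    """
    return num_mshrs * line_bytes / miss_latency_ns


LINE_BYTES = 64          # assumed cache line size
MISS_LATENCY_NS = 100.0  # assumed average miss latency

# MSHR counts quoted in the message above for the default L1/L2 caches.
for level, mshrs in (("L1", 4), ("L2", 20)):
    bound = mshr_bandwidth_bound(mshrs, LINE_BYTES, MISS_LATENCY_NS)
    print(f"{level}: {mshrs} MSHRs -> at most {bound:.2f} GB/s")
```

Under these assumptions the 4-entry L1 caps a single core at a few GB/s regardless of how fast the DRAM model is, which is consistent with the advice in this thread to raise the MSHR counts before tuning anything else.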

2022-04-15 Thread Majid Jalili via gem5-users
Hi, Make sure your system has enough MSHRs; out of the box, the L1 and L2 caches are configured with only a few MSHR entries. Also, the stride prefetcher is not the best choice, so you may want to try something better: DCPT gives me better numbers. On Fri, Apr 15, 2022 at 4:57 AM Zicong Wang via gem5-users <gem5-users@gem5.org> wrote: