[gem5-users] Re: Performance tanking with memory intensive benchmark compared to real machine

Majid Jalili via gem5-users Sun, 01 Aug 2021 08:05:40 -0700

Matching the performance of a real machine is extremely difficult.
I have been trying to do this and usually cannot get very close. However,
here is my advice:
1) Use prefetchers, at least one per cache level. They can make a huge
difference! For example, for predictable access patterns (e.g., lbm in
SPEC), IPC can improve by a factor of 2 or 3.
2) Use a DDR4 memory model.
3) Make sure the ROB and LSQ sizes are also large.
4) Make sure your microkernel contains no print or cout statements;
they will affect the final results.
5) CPU frequency is also very important.
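
As a rough illustration, points 1, 2, 3 and 5 could look like the gem5 Python
config fragment below. This is a minimal sketch, not a complete script: no
ports or buses are connected, and it assumes gem5 v21-era class names
(DerivO3CPU, MemCtrl, DDR4_2400_8x8, StridePrefetcher), which may differ in
other releases. The class name PrefetchingL2 is hypothetical, and the
ROB/LQ/SQ sizes and the 4GHz clock are commonly cited Skylake / i7-6700K
figures, not values taken from the config under discussion.

```python
import m5
from m5.objects import *

system = System()

# 5) Clock: 4GHz is the i7-6700K base frequency (assumed target here).
system.clk_domain = SrcClockDomain(clock="4GHz",
                                   voltage_domain=VoltageDomain())
system.mem_ranges = [AddrRange("2GB")]

# 3) Out-of-order window: commonly cited Skylake sizes (assumption).
system.cpu = DerivO3CPU()
system.cpu.numROBEntries = 224
system.cpu.LQEntries = 72
system.cpu.SQEntries = 56


# 1) A cache with a prefetcher attached; repeat the idea at each level.
class PrefetchingL2(Cache):  # hypothetical name, for illustration only
    size = "256kB"
    assoc = 4
    tag_latency = 1
    data_latency = 1
    response_latency = 1
    mshrs = 256
    tgts_per_mshr = 16
    write_buffers = 256
    prefetcher = StridePrefetcher(degree=4)


# 2) DDR4-2400 memory model behind a memory controller.
system.mem_ctrl = MemCtrl()
system.mem_ctrl.dram = DDR4_2400_8x8()
system.mem_ctrl.dram.range = system.mem_ranges[0]
```

A stride prefetcher is only one choice; which prefetcher suits each level
depends on the workload, so treat the one shown here as a starting point.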


I did not check your config files, so you may have done all of these already :)
Majid





On Fri, Jul 30, 2021 at 3:16 PM Jared Nye via gem5-users <
gem5-users@gem5.org> wrote:

> Hello,
>
> I am running a simple single threaded memory benchmark that measures the
> time it takes to copy an array (https://github.com/BTone/cagbench). I run
> the benchmark in SE mode with only 1 thread (and 1 CPU) configured to match
> the setup used in gem5-Skylake (
> https://github.com/darchr/gem5-skylake-config) with 32 kB L1I and L1D
> cache, 256 kB L2 and 8 MB LLC.
>
> On a real Intel Skylake (i7 6700k), DDR4-2400:
> With an array size of 8 MB (total working set of 16 MB), the throughput is
> ~11,000 MB/s, and with an array size of 16 MB (total working set of 32 MB)
> the throughput is ~9,500 MB/s.
>
> In gem5 (darchr/gem5-skylake-config):
> With an array size of 8 MB (total working set of 16 MB), the throughput is
> ~6,000 MB/s. However, with an array size of 16 MB (total working set of
> 32 MB), the throughput drops to ~700 MB/s.
>
> The performance when the workload mostly fits in the cache hierarchy is
> reasonable, but ~700 MB/s seems far slower and does not seem commensurate
> with the real system.
>
> I think this has something to do with the memory system past the
> last-level cache, but I am having trouble determining what exactly the
> issue is.
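
One way to sanity-check the numbers above is Little's law: sustained
bandwidth is roughly (cache lines in flight) x (line size) / (memory
latency). The sketch below is only an estimate under stated assumptions: a
64-byte line, a 90 ns loaded DRAM latency (a guess, not a measurement),
1 MB = 10^6 bytes, and the reported throughput taken as equal to DRAM
traffic. The function names are made up for illustration.

```python
LINE_BYTES = 64            # cache line size (assumed)
ASSUMED_LATENCY_NS = 90.0  # assumed loaded DRAM latency, not measured


def outstanding_lines(throughput_mb_s, latency_ns, line_bytes=LINE_BYTES):
    """Little's law: average number of cache lines in flight needed to
    sustain the given throughput at the given memory latency."""
    bytes_per_s = throughput_mb_s * 1e6
    return bytes_per_s * (latency_ns * 1e-9) / line_bytes


def ddr_peak_mb_s(mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak of one DDR channel (64-bit data bus) in MB/s."""
    return mega_transfers_per_s * bus_bytes


if __name__ == "__main__":
    reported = [
        ("real, 16 MB working set", 11000),
        ("real, 32 MB working set", 9500),
        ("gem5, 16 MB working set", 6000),
        ("gem5, 32 MB working set", 700),
    ]
    for label, mb_s in reported:
        n = outstanding_lines(mb_s, ASSUMED_LATENCY_NS)
        print(f"{label}: ~{n:.1f} lines in flight")
    print(f"DDR4-2400 peak per channel: {ddr_peak_mb_s(2400)} MB/s")
```

Under these assumptions, ~700 MB/s corresponds to about one cache line in
flight at a time, while ~9,500 MB/s needs roughly 13. If that holds, it would
point at a lack of memory-level parallelism (for example, no prefetching or a
small out-of-order window) rather than at the DRAM model's peak bandwidth.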
>
> Just for reference, this is how I have the cache hierarchy configured (I
> reduced the tag/data/response latencies to rule out the caches as the
> issue):
>
> Both L1I and L1D caches:
>     size = '32kB'
>     assoc = 8
>     tag_latency = 1
>     data_latency = 1
>     response_latency = 1
>     mshrs = 128
>     tgts_per_mshr = 16
>     write_buffers = 56
>     demand_mshr_reserve = 96
>
> L2 Cache:
>     size = '256kB'
>     assoc = 4
>     tag_latency = 1
>     data_latency = 1
>     response_latency = 1
>     mshrs = 256
>     tgts_per_mshr = 16
>     write_buffers = 256
>
> L3 cache:
>     size = '8MB'
>     assoc = 16
>     tag_latency = 1
>     data_latency = 1
>     response_latency = 1
>     mshrs = 256
>     tgts_per_mshr = 20
>     write_buffers = 256
>     clusivity = 'mostly_excl'
>
> Any suggestions would be greatly appreciated.
> _______________________________________________
> gem5-users mailing list -- gem5-users@gem5.org
> To unsubscribe send an email to gem5-users-le...@gem5.org
