Hello Andrea Mondelli,

I'd like you to do a code review. Please visit

    https://gem5-review.googlesource.com/c/public/gem5/+/30035

to review the following change.


Change subject: mem: model data array bank in classic cache - revisited
......................................................................

mem: model data array bank in classic cache - revisited

Recently, I dug up an old patch originally proposed by Xiangyu Dong that was never integrated into gem5.
I adapted it to the current codebase. The original changelog is reproduced below:

The classic cache does not model data array banks, i.e., if a read/write is being
serviced by a cache bank, no other requests should be sent to that bank.
This patch models a multi-bank cache. Features include:
1. check that the bank interleave granularity is at least the cache line size
2. add CacheBank debug flag
3. Differentiate read and write latency
3a. read latency is named as read_latency
3b. write latency is named as write_latency
4. Add enable_bank_model, num_banks, bank_intlv_high_bit to the Python parameters
5. Enable the bank model via --l1-enable-bank, --l2-enable-bank, --l3-enable-bank
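To make the interleave arithmetic in the feature list concrete, here is a minimal Python sketch of how the bank index is derived from an address (the names mirror the bankIntlv* fields added in base.cc; the 64-byte line and 4-bank defaults are illustrative choices, not values from the patch):

```python
def bank_id(addr, blk_size=64, num_banks=4, intlv_high_bit=0):
    """Return the data-array bank an address maps to (sketch, not gem5 code)."""
    assert blk_size & (blk_size - 1) == 0, "line size must be a power of 2"
    assert num_banks & (num_banks - 1) == 0, "bank count must be a power of 2"
    intlv_bits = num_banks.bit_length() - 1        # log2 for a power of 2
    # 0 means "auto": align the interleave to cache-line granularity
    high_bit = intlv_high_bit or (blk_size.bit_length() - 1) + intlv_bits - 1
    low_bit = high_bit + 1 - intlv_bits
    assert (1 << low_bit) >= blk_size, "interleave granularity < line size"
    mask = ((1 << intlv_bits) - 1) << low_bit
    return (addr & mask) >> low_bit

# With 64 B lines and 4 banks, consecutive lines rotate across banks:
# 0x000 -> bank 0, 0x040 -> bank 1, 0x080 -> bank 2, 0x0C0 -> bank 3
```

Raising the interleave high bit above the automatic value coarsens the interleave, so several consecutive lines land in the same bank.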
Not modeled in this patch:
Due to the lack of a retry mechanism in the cache master port, an access from
the memory side will not be denied if the bank is in service. Instead, the bank
service time is extended. This is equivalent to an infinite write buffer for
cache fill operations.
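The busy/idle bookkeeping described above (banks are unmarked lazily the next time a port looks at them, and memory-side fills extend the busy window rather than being blocked) can be paraphrased as a small state machine. This is a Python sketch of the patch's CacheBank class, with `now` standing in for gem5's curTick():

```python
class CacheBank:
    """Occupancy-only model: tracks when a bank is busy, holds no data."""

    def __init__(self, name):
        self.bank_name = name
        self.in_service = False
        self.next_idle_tick = 0

    def mark_in_service(self, finish_tick):
        # A CPU-side access occupies the bank until finish_tick.
        assert not self.in_service
        self.in_service = True
        self.next_idle_tick = finish_tick

    def extend_service(self, extra_ticks):
        # Memory-side fills are never denied; the busy window just grows
        # (the "infinite write buffer" approximation in the changelog).
        assert self.in_service
        self.next_idle_tick += extra_ticks

    def check_and_unmark(self, now):
        # Cleared lazily on the next port activity rather than by an event,
        # matching the @todo note in the patch.
        if self.in_service and self.next_idle_tick <= now:
            self.in_service = False
```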

I remain available for any changes or improvements; it would be interesting to integrate this feature into the cache model.

Change-Id: I8ae601df924aadcee29f87326517e3157b9cb93c
---
M configs/common/CacheConfig.py
M configs/common/Caches.py
M configs/common/Options.py
M src/mem/cache/Cache.py
M src/mem/cache/SConscript
M src/mem/cache/base.cc
M src/mem/cache/base.hh
7 files changed, 246 insertions(+), 6 deletions(-)



diff --git a/configs/common/CacheConfig.py b/configs/common/CacheConfig.py
index 05c38e0..040fe0e 100644
--- a/configs/common/CacheConfig.py
+++ b/configs/common/CacheConfig.py
@@ -99,7 +99,10 @@
         # same clock as the CPUs.
         system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain,
                                    size=options.l2_size,
-                                   assoc=options.l2_assoc)
+                                   assoc=options.l2_assoc,
+                                   enable_bank_model=options.l2_enable_bank,
+                                   num_banks=options.l2_num_banks,
+                                   bank_intlv_high_bit=options.l2_intlv_bit)

         system.tol2bus = L2XBar(clk_domain = system.cpu_clk_domain)
         system.l2.cpu_side = system.tol2bus.master
@@ -119,9 +122,15 @@
     for i in range(options.num_cpus):
         if options.caches:
             icache = icache_class(size=options.l1i_size,
-                                  assoc=options.l1i_assoc)
+                                  assoc=options.l1i_assoc,
+                                  enable_bank_model=options.l1_enable_bank,
+                                  num_banks=options.l1_num_banks,
+                                  bank_intlv_high_bit=options.l1_intlv_bit)
             dcache = dcache_class(size=options.l1d_size,
-                                  assoc=options.l1d_assoc)
+                                  assoc=options.l1d_assoc,
+                                  enable_bank_model=options.l1_enable_bank,
+                                  num_banks=options.l1_num_banks,
+                                  bank_intlv_high_bit=options.l1_intlv_bit)

             # If we have a walker cache specified, instantiate two
             # instances here
diff --git a/configs/common/Caches.py b/configs/common/Caches.py
index 77213e8..10ab6ff 100644
--- a/configs/common/Caches.py
+++ b/configs/common/Caches.py
@@ -56,6 +56,7 @@
     response_latency = 2
     mshrs = 4
     tgts_per_mshr = 20
+    enable_bank_model = False

 class L1_ICache(L1Cache):
     is_read_only = True
@@ -73,6 +74,7 @@
     mshrs = 20
     tgts_per_mshr = 12
     write_buffers = 8
+    enable_bank_model = False

 class IOCache(Cache):
     assoc = 8
@@ -82,6 +84,7 @@
     mshrs = 20
     size = '1kB'
     tgts_per_mshr = 12
+    enable_bank_model = False

 class PageTableWalkerCache(Cache):
     assoc = 2
@@ -91,6 +94,7 @@
     mshrs = 10
     size = '1kB'
     tgts_per_mshr = 12
+    enable_bank_model = False

     # the x86 table walker actually writes to the table-walker cache
     if buildEnv['TARGET_ISA'] in ['x86', 'riscv']:
diff --git a/configs/common/Options.py b/configs/common/Options.py
index 3eff04b..d1dfeff 100644
--- a/configs/common/Options.py
+++ b/configs/common/Options.py
@@ -134,6 +134,24 @@
     parser.add_option("--l1i_assoc", type="int", default=2)
     parser.add_option("--l2_assoc", type="int", default=8)
     parser.add_option("--l3_assoc", type="int", default=16)
+    parser.add_option("--l1-enable-bank", action="store_true",
+                      help="Enable L1 bank model")
+    parser.add_option("--l2-enable-bank", action="store_true",
+                      help="Enable L2 bank model")
+    parser.add_option("--l3-enable-bank", action="store_true",
+                      help="Enable L3 bank model")
+    parser.add_option("--l1-num-banks", type="int", default=1,
+                      help="L1 bank count.")
+    parser.add_option("--l2-num-banks", type="int", default=1,
+                      help="L2 bank count.")
+    parser.add_option("--l3-num-banks", type="int", default=1,
+                      help="L3 bank count.")
+    parser.add_option("--l1-intlv-bit", type="int", default=0,
+                      help="L1 bank interleave highest bit.")
+    parser.add_option("--l2-intlv-bit", type="int", default=0,
+                      help="L2 bank interleave highest bit.")
+    parser.add_option("--l3-intlv-bit", type="int", default=0,
+                      help="L3 bank interleave highest bit.")
     parser.add_option("--cacheline_size", type="int", default=64)

     # Enable Ruby
diff --git a/src/mem/cache/Cache.py b/src/mem/cache/Cache.py
index 4f4e445..7d8ab6a 100644
--- a/src/mem/cache/Cache.py
+++ b/src/mem/cache/Cache.py
@@ -81,6 +81,13 @@
     data_latency = Param.Cycles("Data access latency")
     response_latency = Param.Cycles("Latency for the return path on a miss")

+    enable_bank_model = Param.Bool(True,
+        "knob to control if the bank model is used")
+    num_banks = Param.Unsigned(1, "Number of cache data array banks")
+    bank_intlv_high_bit = Param.Int(0,
+        "Cache data array bank interleave highest bit "
+        "(0=automatically aligned to cache line granularity)")
+
     warmup_percentage = Param.Percent(0,
         "Percentage of tags to be touched to warm up the cache")

diff --git a/src/mem/cache/SConscript b/src/mem/cache/SConscript
index f07a918..3d11ccc 100644
--- a/src/mem/cache/SConscript
+++ b/src/mem/cache/SConscript
@@ -44,6 +44,7 @@
 DebugFlag('CachePort')
 DebugFlag('CacheRepl')
 DebugFlag('CacheTags')
+DebugFlag('CacheBank')
 DebugFlag('CacheVerbose')
 DebugFlag('HWPrefetch')

diff --git a/src/mem/cache/base.cc b/src/mem/cache/base.cc
index 0187703..e22eac1 100644
--- a/src/mem/cache/base.cc
+++ b/src/mem/cache/base.cc
@@ -75,7 +75,7 @@

 BaseCache::BaseCache(const BaseCacheParams *p, unsigned blk_size)
     : ClockedObject(p),
-      cpuSidePort (p->name + ".cpu_side", this, "CpuSidePort"),
+      cpuSidePort(p->name + ".cpu_side", this, "CpuSidePort"),
       memSidePort(p->name + ".mem_side", this, "MemSidePort"),
       mshrQueue("MSHRs", p->mshrs, 0, p->demand_mshr_reserve), // see below
       writeBuffer("write buffer", p->write_buffers, p->mshrs), // see below
@@ -85,10 +85,17 @@
       writeAllocator(p->write_allocator),
       writebackClean(p->writeback_clean),
       tempBlockWriteback(nullptr),
-      writebackTempBlockAtomicEvent([this]{ writebackTempBlockAtomic(); },
+      writebackTempBlockAtomicEvent([this] { writebackTempBlockAtomic(); },
                                     name(), false,
                                     EventBase::Delayed_Writeback_Pri),
       blkSize(blk_size),
+      enableBankModel(p->enable_bank_model),
+      numBanks(p->num_banks),
+      bankIntlvBits(ceilLog2(p->num_banks)),
+      bankIntlvHighBit(p->bank_intlv_high_bit ? p->bank_intlv_high_bit :
+                        ceilLog2(blkSize) + bankIntlvBits - 1),
+      bankIntlvLowBit(bankIntlvHighBit + 1 - bankIntlvBits),
+      bankIntlvMask(((ULL(1) << bankIntlvBits) - 1) << bankIntlvLowBit),
       lookupLatency(p->tag_latency),
       dataLatency(p->data_latency),
       forwardLatency(p->tag_latency),
@@ -107,6 +114,25 @@
       system(p->system),
       stats(*this)
 {
+
+    if (!isPowerOf2(blkSize))
+        fatal("%s Cacheline size (%ld) is not a power of 2", name(), blkSize);
+
+    if (!isPowerOf2(numBanks))
+        fatal("%s number of banks (%ld) is not a power of 2", name(), numBanks);
+
+    auto granularity = ULL(1) << bankIntlvLowBit;
+    if (granularity < blkSize)
+        fatal("%s bank interleave granularity (%ld) smaller than line size "
+              "(%ld)", name(), granularity, blkSize);
+
+    for (unsigned i = 0; i < numBanks; i++)
+    {
+        banks.emplace_back(csprintf("%s.bank%d", p->name, i));
+        DPRINTF(CacheBank, "Bank %s created\n", banks.back().name());
+    }
+
     // the MSHR queue has no reserve entries as we check the MSHR
     // queue on every single allocation, whereas the write queue has
     // as many reserve entries as we have MSHRs, since every MSHR may
@@ -128,6 +154,33 @@
     delete tempBlock;
 }

+void BaseCache::CacheBank::markInService(Tick finishTick)
+{
+    assert(!inService);
+    nextIdleTick = finishTick;
+    DPRINTF(CacheBank, "In service until Tick %ld\n",
+            nextIdleTick);
+    inService = true;
+}
+
+void BaseCache::CacheBank::checkAndUnmarkInService()
+{
+    if (inService && nextIdleTick <= curTick())
+    {
+        DPRINTF(CacheBank, "Service done, become idle\n");
+        inService = false;
+    }
+}
+
+void BaseCache::CacheBank::extendService(Tick extraTick)
+{
+    assert(inService);
+    assert(nextIdleTick > curTick());
+    nextIdleTick += extraTick;
+    DPRINTF(CacheBank, "Extend service to Tick %ld\n",
+            nextIdleTick);
+}
+
 void
 BaseCache::CacheSlavePort::setBlocked()
 {
@@ -371,6 +424,15 @@
         }

         handleTimingReqHit(pkt, blk, request_time);
+
+        // Mark the corresponding bank in service
+        if (enableBankModel)
+        {
+            auto &bank = banks.at(getBankId(pkt->getAddr()));
+            bank.markInService(request_time);
+            DPRINTF(CacheBank, "bank %s set busy until tick: %ld\n",
+                    bank.name(), request_time);
+        }
     } else {
         handleTimingReqMiss(pkt, blk, forward_time, request_time);

@@ -471,6 +533,23 @@
         blk = handleFill(pkt, blk, writebacks, allocate);
         assert(blk != nullptr);
         ppFill->notify(pkt);
+
+        if (enableBankModel)
+        {
+            auto &bank = banks.at(getBankId(pkt->getAddr()));
+            // mark the corresponding bank in service
+            auto latency = sequentialAccess
+                               ? (lookupLatency + dataLatency)
+                               : std::max(lookupLatency, dataLatency);
+            if (bank.isBusy())
+            {
+                bank.extendService(latency * clockPeriod());
+            }
+            else
+            {
+                bank.markInService(clockEdge(latency));
+            }
+        }
     }

     if (blk && blk->isValid() && pkt->isClean() && !pkt->isInvalidate()) {
@@ -2299,6 +2378,7 @@
 bool
 BaseCache::CpuSidePort::tryTiming(PacketPtr pkt)
 {
+
     if (cache->system->bypassCaches() || pkt->isExpressSnoop()) {
         // always let express snoop packets through even if blocked
         return true;
@@ -2306,6 +2386,22 @@
         // either already committed to send a retry, or blocked
         mustSendRetry = true;
         return false;
+    } else if (cache->enableBankModel)
+    {
+        auto &bank = cache->banks.at(cache->getBankId(pkt->getAddr()));
+        if (bank.isBusy())
+        {
+            DPRINTF(CachePort,
+                    "Cache port %s denying new requests because the accessing "
+                    "bank is busy\n",
+                    name());
+            if (sendRetryEvent.scheduled())
+            {
+                owner.deschedule(sendRetryEvent);
+            }
+            owner.schedule(sendRetryEvent, bank.finishTick());
+            return false;
+        }
     }
     mustSendRetry = false;
     return true;
@@ -2316,6 +2412,17 @@
 {
     assert(pkt->isRequest());

+    // unmark bank in service
+    // NOTE: Ideally, the bank status should be updated immediately after the
+    // nextIdleTick expires, but we will need to create new events to do that.
+    // Instead, we only check-and-unmark the inService bit before we really
+    // want to know the bank status.
+    // @todo: we need to replace the bank_busy mark/unmark code into an
+    //        event-driven style
+    if (cache->enableBankModel)
+        for (auto &bank : cache->banks)
+            bank.checkAndUnmarkInService();
+
     if (cache->system->bypassCaches()) {
         // Just forward the packet if caches are disabled.
         // @todo This should really enqueue the packet rather
@@ -2376,6 +2483,17 @@
 bool
 BaseCache::MemSidePort::recvTimingResp(PacketPtr pkt)
 {
+    // unmark bank in service
+    // NOTE: Ideally, the bank status should be updated immediately after the
+    // nextIdleTick expires, but we will need to create new events to do that.
+    // Instead, we only check-and-unmark the inService bit before we really
+    // want to know the bank status.
+    // @todo: we need to replace the bank_busy mark/unmark code into an
+    //        event-driven style
+    if (cache->enableBankModel)
+        for (auto &bank : cache->banks)
+            bank.checkAndUnmarkInService();
+
     cache->recvTimingResp(pkt);
     return true;
 }
diff --git a/src/mem/cache/base.hh b/src/mem/cache/base.hh
index 3efc7c7..09c448e 100644
--- a/src/mem/cache/base.hh
+++ b/src/mem/cache/base.hh
@@ -55,6 +55,7 @@
 #include "base/trace.hh"
 #include "base/types.hh"
 #include "debug/Cache.hh"
+#include "debug/CacheBank.hh"
 #include "debug/CachePort.hh"
 #include "enums/Clusivity.hh"
 #include "mem/cache/cache_blk.hh"
@@ -266,15 +267,55 @@

         bool mustSendRetry;

+        EventFunctionWrapper sendRetryEvent;
+
       private:

         void processSendRetry();

-        EventFunctionWrapper sendRetryEvent;

     };

     /**
+    * Cache data array bank.
+    * Only models bank access contention, does not hold actual data
+    */
+    class CacheBank
+    {
+
+    private:
+        /** Descriptive name (for DPRINTF output) */
+        std::string bankName;
+
+        bool inService;
+
+        Tick nextIdleTick;
+
+    public:
+        /** Mark this cache bank in-service until finishTick */
+        void markInService(Tick finishTick);
+
+        /** Check and unmark this cache bank in-service if necessary */
+        void checkAndUnmarkInService();
+
+        /** Extend this cache bank's in-service time by extraTick */
+        void extendService(Tick extraTick);
+
+        CacheBank(const std::string &_name) : bankName(_name),
+                                              inService(false),
+                                              nextIdleTick(0)
+        {
+        }
+
+        bool isBusy() const { return inService; }
+
+        Tick finishTick() const { return nextIdleTick; }
+
+        /** Return bank name (for DPRINTF). */
+        const std::string name() const { return bankName; }
+    };
+
+    /**
      * The CPU-side port extends the base cache slave port with access
      * functions for functional, atomic and timing requests.
      */
@@ -310,6 +351,9 @@

   protected:

+    /** Data array banks */
+    std::vector<CacheBank> banks;
+
     /** Miss status registers */
     MSHRQueue mshrQueue;

@@ -839,6 +883,36 @@
     const unsigned blkSize;

     /**
+     * The knob to turn on/off cache data array bank model
+     */
+    const bool enableBankModel;
+
+    /**
+     * The number of cache data array banks.
+     */
+    const unsigned numBanks;
+
+    /**
+     * The number of cache data array bank interleave bits
+     */
+    const unsigned bankIntlvBits;
+
+    /**
+     * Cache data array bank interleave high bit
+     */
+    const unsigned bankIntlvHighBit;
+
+    /**
+     * Cache data array bank interleave low bit
+     */
+    const unsigned bankIntlvLowBit;
+
+    /**
+     * Cache data array bank interleave mask
+     */
+    const Addr bankIntlvMask;
+
+    /**
      * The latency of tag lookup of a cache. It occurs when there is
      * an access to the cache.
      */
@@ -1092,6 +1166,15 @@
         return blkSize;
     }

+    /**
+     * Return bank ID according to interleave bits
+     */
+    unsigned
+    getBankId(Addr addr) const
+    {
+        return (addr & bankIntlvMask) >> bankIntlvLowBit;
+    }
+
     const AddrRangeList &getAddrRanges() const { return addrRanges; }

     MSHR *allocateMissBuffer(PacketPtr pkt, Tick time, bool sched_send = true)

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/30035
To unsubscribe, or for help writing mail filters, visit https://gem5-review.googlesource.com/settings

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: I8ae601df924aadcee29f87326517e3157b9cb93c
Gerrit-Change-Number: 30035
Gerrit-PatchSet: 1
Gerrit-Owner: Andrea Mondelli <andrea.monde...@huawei.com>
Gerrit-Reviewer: Andrea Mondelli <andrea.monde...@huawei.com>
Gerrit-MessageType: newchange
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org
To unsubscribe send an email to gem5-dev-le...@gem5.org
