Hello Timothy Hayes,

I'd like you to do a code review. Please visit

    https://gem5-review.googlesource.com/c/public/gem5/+/30319

to review the following change.


Change subject: mem-ruby: HTM mem implementation
......................................................................

mem-ruby: HTM mem implementation

This patch augments the MESI_Three_Level Ruby protocol with hardware
transactional memory support.

The HTM implementation relies on buffering of speculative memory updates.
The core notifies the L0 cache controller that a new transaction has
started and the controller in turn places itself in transactional state
(htmTransactionalState := true).

When operating in transactional state, the usual MESI protocol changes
slightly. Lines loaded or stored are marked as part of a transaction's
read and write set respectively. If there is an invalidation request to
a cache line in the read/write set, the transaction is marked as failed.
Similarly, if there is a read request by another core to a speculatively
written cache line, i.e. in the write set, the transaction is marked as
failed. If failed, all subsequent loads and stores from the core are
made benign, i.e. turned into NOPs at the cache controller, and responses
are marked to indicate that the transactional state has failed. When the
core receives these marked responses, it generates an HtmFailureFault
with the reason for the transaction failure. Servicing this fault does
two things:

(a) Restores the architectural checkpoint
(b) Sends an HTM abort signal to the cache controller

The restoration includes all registers in the checkpoint as well as the
program counter of the instruction before the transaction started.
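The conflict-detection rules above can be sketched in plain C++; all
names here (CacheLine, L0Controller, and their members) are illustrative
stand-ins for the SLICC state described in this patch, not the actual
gem5 identifiers:

```cpp
#include <cassert>

// Hypothetical sketch only: these names are illustrative, not the
// gem5/SLICC identifiers used in the patch.
struct CacheLine {
    bool inHtmReadSet = false;   // line was read inside the transaction
    bool inHtmWriteSet = false;  // line was speculatively written
};

struct L0Controller {
    bool transactional = false;  // cf. htmTransactionalState
    bool failed = false;         // cf. htmFailed

    // An invalidation request hits a line in the read or write set.
    void onRemoteInvalidate(const CacheLine &line) {
        if (transactional && (line.inHtmReadSet || line.inHtmWriteSet))
            failed = true;
    }

    // A read by another core hits a speculatively written line.
    void onRemoteRead(const CacheLine &line) {
        if (transactional && line.inHtmWriteSet)
            failed = true;
    }

    // Returns false when the access is turned into a benign NOP whose
    // response is marked to indicate the failed transactional state.
    bool serviceLoad(CacheLine &line) {
        if (transactional) {
            if (failed)
                return false;
            line.inHtmReadSet = true;  // track the read set
        }
        return true;
    }
};
```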

The abort signal is sent to the L0 cache controller, which resets the
failed transactional state: it clears the transactional read and write
sets, invalidates any speculatively written cache lines, and exits the
transactional state so that the MESI protocol operates as usual.

Alternatively, if the instructions within a transaction complete without
triggering an HtmFailureFault, the transaction can be committed. The core
is responsible for notifying the cache controller that the transaction
is complete and the cache controller makes all speculative writes
visible to the rest of the system and exits the transactional state.

Notifying the cache controller is done through HtmCmd requests, which
are a subtype of Load requests.
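The lifecycle of the four HtmCmd requests (start, cancel, abort, commit)
can be summarized with a small C++ sketch; the HtmController type and
its methods are hypothetical, condensing the controller behaviour
described above rather than quoting the gem5 code:

```cpp
#include <cassert>

// Hypothetical sketch of the HtmCmd handling at the L0 controller.
struct HtmController {
    bool transactional = false;  // cf. htmTransactionalState
    bool failed = false;         // cf. htmFailed
    int uid = 0;                 // cf. htmUid, used for sanity checks

    // HTM_start: enter transactional state under a new uid.
    void start(int newUid) {
        transactional = true;
        failed = false;
        uid = newUid;
    }
    // HTM_cancel: mark the transaction failed without aborting it yet.
    void cancel() { failed = true; }
    // HTM_abort: drop speculative writes, clear the read/write sets,
    // and exit transactional state so MESI operates as usual again.
    void abort() {
        transactional = false;
        failed = false;
    }
    // HTM_commit: make speculative writes visible; only legal when the
    // transaction has not failed.
    bool commit() {
        if (failed)
            return false;
        transactional = false;
        return true;
    }
};
```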

KUDOS:
The code is based on a previous pull request by Pradip Vallathol who
developed HTM and TSX support in gem5 as part of his master's thesis:

http://reviews.gem5.org/r/2308/index.html

JIRA: https://gem5.atlassian.net/browse/GEM5-587

Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Signed-off-by: Giacomo Travaglini <giacomo.travagl...@arm.com>
---
M src/mem/ruby/protocol/MESI_Three_Level-L0cache.sm
M src/mem/ruby/protocol/MESI_Three_Level-L1cache.sm
M src/mem/ruby/protocol/MESI_Three_Level-msg.sm
M src/mem/ruby/protocol/RubySlicc_Exports.sm
M src/mem/ruby/protocol/RubySlicc_Types.sm
M src/mem/ruby/slicc_interface/AbstractCacheEntry.cc
M src/mem/ruby/slicc_interface/AbstractCacheEntry.hh
M src/mem/ruby/slicc_interface/RubyRequest.hh
M src/mem/ruby/slicc_interface/RubySlicc_Util.hh
M src/mem/ruby/structures/CacheMemory.cc
M src/mem/ruby/structures/CacheMemory.hh
M src/mem/ruby/system/RubyPort.cc
M src/mem/ruby/system/RubyPort.hh
M src/mem/ruby/system/Sequencer.cc
M src/mem/ruby/system/Sequencer.hh
15 files changed, 1,193 insertions(+), 65 deletions(-)



diff --git a/src/mem/ruby/protocol/MESI_Three_Level-L0cache.sm b/src/mem/ruby/protocol/MESI_Three_Level-L0cache.sm
index 4de4a29..4b8f71d 100644
--- a/src/mem/ruby/protocol/MESI_Three_Level-L0cache.sm
+++ b/src/mem/ruby/protocol/MESI_Three_Level-L0cache.sm
@@ -61,6 +61,12 @@
    // Request Buffer for prefetches
    MessageBuffer * prefetchQueue;
 {
+  // hardware transactional memory
+  bool htmTransactionalState, default="false";
+  bool htmFailed, default="false";
+  int htmUid, default=0;
+  HtmFailedInCacheReason htmFailedRc, default=HtmFailedInCacheReason_NO_FAIL;
+
   // STATES
state_declaration(State, desc="Cache states", default="L0Cache_State_I") {
     // Base states
@@ -141,6 +147,15 @@
     PF_Ifetch,       desc="Instruction fetch request from prefetcher";
     PF_Store,        desc="Exclusive load request from prefetcher";
    PF_Bad_Addr,     desc="Throw away prefetch request due to bad address generation";
+
+    // hardware transactional memory
+    HTM_abort,        desc="Abort HTM transaction and rollback cache to pre-transactional state";
+    HTM_start,        desc="Place cache in HTM transactional state";
+    HTM_commit,       desc="Commit speculative loads/stores and place cache in normal state";
+    HTM_cancel,       desc="Fail HTM transaction explicitly without aborting";
+    HTM_notifyCMD,    desc="Notify core via HTM CMD that HTM transaction has failed";
+    HTM_notifyLD,     desc="Notify core via LD that HTM transaction has failed";
+    HTM_notifyST,     desc="Notify core via ST that HTM transaction has failed";
   }

   // TYPES
@@ -151,6 +166,19 @@
     DataBlock DataBlk,       desc="data for the block";
     bool Dirty, default="false",   desc="data is dirty";
    bool isPrefetched, default="false", desc="Set if this block was prefetched";
+
+    // hardware transactional memory
+    // read/write set state
+    void setInHtmReadSet(bool), external="yes";
+    void setInHtmWriteSet(bool), external="yes";
+    bool getInHtmReadSet(), external="yes";
+    bool getInHtmWriteSet(), external="yes";
+
+    // override invalidateEntry
+    void invalidateEntry() {
+      CacheState := State:I;
+      Dirty := false;
+    }
   }

   // TBE fields
@@ -455,9 +483,102 @@
     if (mandatoryQueue_in.isReady(clockEdge())) {
       peek(mandatoryQueue_in, RubyRequest, block_on="LineAddress") {

-        // Check for data access to blocks in I-cache and ifetchs to blocks in D-cache
+        // hardware transactional memory support begins here

-        if (in_msg.Type == RubyRequestType:IFETCH) {
+        // If this cache controller is in a transactional state/mode,
+        // ensure that its failure status is something recognisable.
+        if (htmFailed) {
+          assert(htmFailedRc == HtmFailedInCacheReason:FAIL_SELF ||
+            htmFailedRc == HtmFailedInCacheReason:FAIL_REMOTE ||
+            htmFailedRc == HtmFailedInCacheReason:FAIL_OTHER);
+        }
+
+        // HTM_start commands set a new htmUid
+        // This is used for debugging and sanity checks
+        if (in_msg.Type == RubyRequestType:HTM_start) {
+            assert (htmUid != in_msg.htm_transaction_uid);
+            htmUid := in_msg.htm_transaction_uid;
+        }
+
+        // If the incoming memory request was generated within a transaction,
+        // ensure that the request's htmUid matches the htmUid of this
+        // cache controller. A mismatch here is fatal and implies there was
+        // a reordering that should never have taken place.
+        if (in_msg.htm_from_transaction &&
+            (htmUid != in_msg.htm_transaction_uid)) {
+          DPRINTF(HtmMem,
+            "mandatoryQueue_in: (%u) 0x%lx mismatch between cache htmUid=%u and message htmUid=%u\n",
+            in_msg.Type, in_msg.LineAddress, htmUid, in_msg.htm_transaction_uid);
+        }
+
+        // special/rare case which hopefully won't occur
+        if (htmFailed && in_msg.Type == RubyRequestType:HTM_start) {
+          error("cannot handle this special HTM case yet");
+        }
+
+        // The transaction is to be aborted--
+        // Aborting a transaction returns the cache to a non-transactional
+        // state/mode, resets the read/write sets, and invalidates any
+        // speculatively written lines.
+        if (in_msg.Type == RubyRequestType:HTM_abort) {
+          Entry cache_entry := static_cast(Entry, "pointer", Dcache.getNullEntry());
+          TBE tbe := TBEs.getNullEntry();
+          trigger(Event:HTM_abort, in_msg.LineAddress, cache_entry, tbe);
+        }
+        // The transaction has failed but not yet aborted--
+        // case 1:
+        // If memory request is transactional but the transaction has failed,
+        // it is necessary to inform the CPU of the failure.
+        // case 2:
+        // If load/store memory request is transactional and cache is not
+        // in transactional state, it's likely that the transaction aborted
+        // and Ruby is still receiving scheduled memory operations.
+        // The solution is to make these requests benign.
+        else if ((in_msg.htm_from_transaction && htmFailed) ||
+                 (in_msg.htm_from_transaction &&
+                  !isHtmCmdRequest(in_msg.Type) && !htmTransactionalState)) {
+          if (isHtmCmdRequest(in_msg.Type)) {
+            Entry cache_entry := static_cast(Entry, "pointer", Dcache.getNullEntry());
+            TBE tbe := TBEs.getNullEntry();
+            trigger(Event:HTM_notifyCMD, in_msg.LineAddress, cache_entry, tbe);
+          } else if (isDataReadRequest(in_msg.Type)) {
+            Entry cache_entry := getDCacheEntry(in_msg.LineAddress);
+            TBE tbe := TBEs[in_msg.LineAddress];
+            trigger(Event:HTM_notifyLD, in_msg.LineAddress, cache_entry, tbe);
+          } else if (isWriteRequest(in_msg.Type)) {
+            Entry cache_entry := getDCacheEntry(in_msg.LineAddress);
+            TBE tbe := TBEs[in_msg.LineAddress];
+            trigger(Event:HTM_notifyST, in_msg.LineAddress, cache_entry, tbe);
+          } else {
+            error("unknown message type");
+          }
+        }
+        // The transaction has not failed and this is
+        // one of three HTM commands--
+        // (1) start a transaction
+        // (2) commit a transaction
+        // (3) cancel/fail a transaction (but don't yet abort it)
+        else if (isHtmCmdRequest(in_msg.Type) &&
+                 in_msg.Type != RubyRequestType:HTM_abort) {
+          Entry cache_entry := static_cast(Entry, "pointer", Dcache.getNullEntry());
+          TBE tbe := TBEs.getNullEntry();
+          if (in_msg.Type == RubyRequestType:HTM_start) {
+            DPRINTF(HtmMem,
+              "mandatoryQueue_in: Starting htm transaction htmUid=%u\n",
+              htmUid);
+            trigger(Event:HTM_start, in_msg.LineAddress, cache_entry, tbe);
+          } else if (in_msg.Type == RubyRequestType:HTM_commit) {
+            DPRINTF(HtmMem,
+              "mandatoryQueue_in: Committing transaction htmUid=%d\n",
+              htmUid);
+            trigger(Event:HTM_commit, in_msg.LineAddress, cache_entry, tbe);
+          } else if (in_msg.Type == RubyRequestType:HTM_cancel) {
+            DPRINTF(HtmMem,
+              "mandatoryQueue_in: Cancelling transaction htmUid=%d\n",
+              htmUid);
+            trigger(Event:HTM_cancel, in_msg.LineAddress, cache_entry, tbe);
+          }
+        }
+        // end: hardware transactional memory
+        else if (in_msg.Type == RubyRequestType:IFETCH) {
+          // Check for data access to blocks in I-cache and ifetchs to blocks in D-cache
           // ** INSTRUCTION ACCESS ***

           Entry Icache_entry := getICacheEntry(in_msg.LineAddress);
@@ -593,17 +714,31 @@
   }

   action(f_sendDataToL1, "f", desc="Send data to the L1 cache") {
-    enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
-      assert(is_valid(cache_entry));
-      out_msg.addr := address;
-      out_msg.Class := CoherenceClass:INV_DATA;
-      out_msg.DataBlk := cache_entry.DataBlk;
-      out_msg.Dirty := cache_entry.Dirty;
-      out_msg.Sender := machineID;
-      out_msg.Dest := createMachineID(MachineType:L1Cache, version);
-      out_msg.MessageSize := MessageSizeType:Writeback_Data;
+    // hardware transactional memory
+    // Cannot write speculative data to L1 cache
+    if (cache_entry.getInHtmWriteSet()) {
+      // If in HTM write set then send NAK to L1
+      enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
+        assert(is_valid(cache_entry));
+        out_msg.addr := address;
+        out_msg.Class := CoherenceClass:NAK;
+        out_msg.Sender := machineID;
+        out_msg.Dest := createMachineID(MachineType:L1Cache, version);
+        out_msg.MessageSize := MessageSizeType:Response_Control;
+      }
+    } else {
+      enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
+        assert(is_valid(cache_entry));
+        out_msg.addr := address;
+        out_msg.Class := CoherenceClass:INV_DATA;
+        out_msg.DataBlk := cache_entry.DataBlk;
+        out_msg.Dirty := cache_entry.Dirty;
+        out_msg.Sender := machineID;
+        out_msg.Dest := createMachineID(MachineType:L1Cache, version);
+        out_msg.MessageSize := MessageSizeType:Writeback_Data;
+      }
+      cache_entry.Dirty := false;
     }
-    cache_entry.Dirty := false;
   }

   action(fi_sendInvAck, "fi", desc="Send data to the L1 cache") {
@@ -625,7 +760,7 @@
     }
   }

-  action(g_issuePUTX, "g", desc="Relinquish line to the L1 cache") {
+  action(g_issuePUTE, "\ge", desc="Relinquish line to the L1 cache") {
     enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
       assert(is_valid(cache_entry));
       out_msg.addr := address;
@@ -633,12 +768,21 @@
       out_msg.Dirty := cache_entry.Dirty;
       out_msg.Sender:= machineID;
       out_msg.Dest := createMachineID(MachineType:L1Cache, version);
+      out_msg.MessageSize := MessageSizeType:Writeback_Control;
+    }
+  }

-      if (cache_entry.Dirty) {
+  action(g_issuePUTM, "\gm", desc="Send modified line to the L1 cache") {
+    if (!cache_entry.getInHtmWriteSet()) {
+      enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
+        assert(is_valid(cache_entry));
+        out_msg.addr := address;
+        out_msg.Class := CoherenceClass:PUTX;
+        out_msg.Dirty := cache_entry.Dirty;
+        out_msg.Sender:= machineID;
+        out_msg.Dest := createMachineID(MachineType:L1Cache, version);
         out_msg.MessageSize := MessageSizeType:Writeback_Data;
         out_msg.DataBlk := cache_entry.DataBlk;
-      } else {
-        out_msg.MessageSize := MessageSizeType:Writeback_Control;
       }
     }
   }
@@ -867,6 +1011,173 @@
     stall_and_wait(optionalQueue_in, address);
   }

+  // hardware transactional memory
+
+  action(hars_htmAddToReadSet, "\hars", desc="add to HTM read set") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      if (htmTransactionalState && in_msg.htm_from_transaction) {
+        assert(!htmFailed);
+        if (!cache_entry.getInHtmReadSet()) {
+          DPRINTF(HtmMem,
+              "Adding 0x%lx to transactional read set htmUid=%u.\n",
+              address, htmUid);
+          cache_entry.setInHtmReadSet(true);
+        }
+      }
+    }
+  }
+
+  action(haws_htmAddToWriteSet, "\haws", desc="add to HTM write set") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      if (htmTransactionalState && in_msg.htm_from_transaction) {
+        assert(!htmFailed);
+        assert(!((cache_entry.getInHtmWriteSet() == false) &&
+          (cache_entry.CacheState == State:IM)));
+        assert(!((cache_entry.getInHtmWriteSet() == false) &&
+          (cache_entry.CacheState == State:SM)));
+        // ON DEMAND write-back
+        // if modified and not in write set,
+        // write back and retain M state
+        if((cache_entry.CacheState == State:M) &&
+           !cache_entry.getInHtmWriteSet()) {
+          // code copied from issuePUTX
+          enqueue(requestNetwork_out, CoherenceMsg, response_latency) {
+            assert(is_valid(cache_entry));
+            out_msg.addr := address;
+            out_msg.Class := CoherenceClass:PUTX_COPY;
+            out_msg.DataBlk := cache_entry.DataBlk;
+            out_msg.Dirty := cache_entry.Dirty;
+            out_msg.Sender:= machineID;
+            out_msg.Dest := createMachineID(MachineType:L1Cache, version);
+            out_msg.MessageSize := MessageSizeType:Writeback_Data;
+          }
+        }
+        if (!cache_entry.getInHtmWriteSet()) {
+          DPRINTF(HtmMem,
+              "Adding 0x%lx to transactional write set htmUid=%u.\n",
+              address, htmUid);
+          cache_entry.setInHtmWriteSet(true);
+        }
+      }
+    }
+  }
+
+  action(hfts_htmFailTransactionSize, "\hfts^",
+  desc="Fail transaction due to cache associativity/capacity conflict") {
+    if (htmTransactionalState &&
+        (cache_entry.getInHtmReadSet() || cache_entry.getInHtmWriteSet())) {
+      DPRINTF(HtmMem,
+          "Failure of a transaction due to cache associativity/capacity: rs=%s, ws=%s, addr=0x%lx, htmUid=%u\n",
+          cache_entry.getInHtmReadSet(), cache_entry.getInHtmWriteSet(),
+          address, htmUid);
+      htmFailed := true;
+      htmFailedRc := HtmFailedInCacheReason:FAIL_SELF;
+    }
+  }
+
+  action(hftm_htmFailTransactionMem, "\hftm^",
+  desc="Fail transaction due to memory conflict") {
+    if (htmTransactionalState &&
+        (cache_entry.getInHtmReadSet() || cache_entry.getInHtmWriteSet())) {
+      DPRINTF(HtmMem,
+          "Failure of a transaction due to memory conflict: rs=%s, ws=%s, addr=0x%lx, htmUid=%u\n",
+          cache_entry.getInHtmReadSet(), cache_entry.getInHtmWriteSet(),
+          address, htmUid);
+      htmFailed := true;
+      htmFailedRc := HtmFailedInCacheReason:FAIL_REMOTE;
+    }
+  }
+
+  action(hvu_htmVerifyUid, "\hvu",
+  desc="Ensure cache htmUid is equivalent to message htmUid") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      if (htmUid != in_msg.htm_transaction_uid) {
+        DPRINTF(HtmMem, "cache's htmUid=%u and request's htmUid=%u\n",
+            htmUid, in_msg.htm_transaction_uid);
+        error("mismatch between cache's htmUid and request's htmUid");
+      }
+    }
+  }
+
+  action(hcs_htmCommandSucceed, "\hcs",
+  desc="Notify sequencer HTM command succeeded") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      assert(is_invalid(cache_entry) && is_invalid(tbe));
+      DPRINTF(RubySlicc, "htm command successful\n");
+      sequencer.htmCallback(in_msg.LineAddress,
+        HtmCallbackMode:HTM_CMD, HtmFailedInCacheReason:NO_FAIL);
+    }
+  }
+
+  action(hcs_htmCommandFail, "\hcf",
+  desc="Notify sequencer HTM command failed") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      assert(is_invalid(cache_entry) && is_invalid(tbe));
+      DPRINTF(RubySlicc, "htm command failure\n");
+      sequencer.htmCallback(in_msg.LineAddress,
+        HtmCallbackMode:HTM_CMD, htmFailedRc);
+    }
+  }
+
+  action(hcs_htmLoadFail, "\hlf",
+  desc="Notify sequencer HTM transactional load failed") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      DPRINTF(RubySlicc, "htm transactional load failure\n");
+      sequencer.htmCallback(in_msg.LineAddress,
+        HtmCallbackMode:LD_FAIL, htmFailedRc);
+    }
+  }
+
+  action(hcs_htmStoreFail, "\hsf",
+  desc="Notify sequencer HTM transactional store failed") {
+    peek(mandatoryQueue_in, RubyRequest) {
+      DPRINTF(RubySlicc, "htm transactional store failure\n");
+      sequencer.htmCallback(in_msg.LineAddress,
+        HtmCallbackMode:ST_FAIL, htmFailedRc);
+    }
+  }
+
+  action(hat_htmAbortTransaction, "\hat",
+  desc="Abort HTM transaction and rollback cache to pre-transactional state") {
+    assert(is_invalid(cache_entry) && is_invalid(tbe));
+    assert (htmTransactionalState);
+    Dcache.htmAbortTransaction();
+    htmTransactionalState := false;
+    htmFailed := false;
+    sequencer.llscClearLocalMonitor();
+    DPRINTF(RubySlicc, "Aborted htm transaction\n");
+  }
+
+  action(hst_htmStartTransaction, "\hst",
+  desc="Place cache in HTM transactional state") {
+    assert(is_invalid(cache_entry) && is_invalid(tbe));
+    assert (!htmTransactionalState);
+    htmTransactionalState := true;
+    htmFailedRc := HtmFailedInCacheReason:NO_FAIL;
+    sequencer.llscClearLocalMonitor();
+    DPRINTF(RubySlicc, "Started htm transaction\n");
+  }
+
+  action(hct_htmCommitTransaction, "\hct",
+  desc="Commit speculative loads/stores and place cache in normal state") {
+    assert(is_invalid(cache_entry) && is_invalid(tbe));
+    assert (htmTransactionalState);
+    assert (!htmFailed);
+    Dcache.htmCommitTransaction();
+    sequencer.llscClearLocalMonitor();
+    htmTransactionalState := false;
+    DPRINTF(RubySlicc, "Committed htm transaction\n");
+  }
+
+  action(hcnt_htmCancelTransaction, "\hcnt",
+  desc="Fail HTM transaction explicitly without aborting") {
+    assert(is_invalid(cache_entry) && is_invalid(tbe));
+    assert (htmTransactionalState);
+    htmFailed := true;
+    htmFailedRc := HtmFailedInCacheReason:FAIL_OTHER;
+    DPRINTF(RubySlicc, "Cancelled htm transaction\n");
+  }
+
   //*****************************************************
   // TRANSITIONS
   //*****************************************************
@@ -880,6 +1191,7 @@
   transition(I, Load, IS) {
     oo_allocateDCacheBlock;
     i_allocateTBE;
+    hars_htmAddToReadSet;
     a_issueGETS;
     uu_profileDataMiss;
     po_observeMiss;
@@ -898,19 +1210,42 @@
   transition(I, Store, IM) {
     oo_allocateDCacheBlock;
     i_allocateTBE;
+    haws_htmAddToWriteSet;
     b_issueGETX;
     uu_profileDataMiss;
     po_observeMiss;
     k_popMandatoryQueue;
   }

-  transition({I, IS, IM, Inst_IS}, {InvOwn, InvElse}) {
+  transition({I, Inst_IS}, {InvOwn, InvElse}) {
     forward_eviction_to_cpu;
     fi_sendInvAck;
     l_popRequestQueue;
   }

-  transition(SM, {InvOwn, InvElse}, IM) {
+  transition({IS, IM}, InvOwn) {
+    hfts_htmFailTransactionSize;
+    forward_eviction_to_cpu;
+    fi_sendInvAck;
+    l_popRequestQueue;
+  }
+
+  transition({IS, IM}, InvElse) {
+    hftm_htmFailTransactionMem;
+    forward_eviction_to_cpu;
+    fi_sendInvAck;
+    l_popRequestQueue;
+  }
+
+  transition(SM, InvOwn, IM) {
+    hfts_htmFailTransactionSize;
+    forward_eviction_to_cpu;
+    fi_sendInvAck;
+    l_popRequestQueue;
+  }
+
+  transition(SM, InvElse, IM) {
+    hftm_htmFailTransactionMem;
     forward_eviction_to_cpu;
     fi_sendInvAck;
     l_popRequestQueue;
@@ -918,6 +1253,7 @@

   // Transitions from Shared
   transition({S,E,M}, Load) {
+    hars_htmAddToReadSet;
     h_load_hit;
     uu_profileDataHit;
     pph_observePfHit;
@@ -933,17 +1269,28 @@

   transition(S, Store, SM) {
     i_allocateTBE;
+    haws_htmAddToWriteSet;
     c_issueUPGRADE;
     uu_profileDataMiss;
     k_popMandatoryQueue;
   }

   transition(S, {L0_Replacement,PF_L0_Replacement}, I) {
+    hfts_htmFailTransactionSize;
     forward_eviction_to_cpu;
     ff_deallocateCacheBlock;
   }

-  transition(S, {InvOwn, InvElse}, I) {
+  transition(S, InvOwn, I) {
+    hfts_htmFailTransactionSize;
+    forward_eviction_to_cpu;
+    fi_sendInvAck;
+    ff_deallocateCacheBlock;
+    l_popRequestQueue;
+  }
+
+  transition(S, InvElse, I) {
+    hftm_htmFailTransactionMem;
     forward_eviction_to_cpu;
     fi_sendInvAck;
     ff_deallocateCacheBlock;
@@ -952,6 +1299,7 @@

   // Transitions from Exclusive
   transition({E,M}, Store, M) {
+    haws_htmAddToWriteSet;
     hh_store_hit;
     uu_profileDataHit;
     pph_observePfHit;
@@ -959,12 +1307,23 @@
   }

   transition(E, {L0_Replacement,PF_L0_Replacement}, I) {
+    hfts_htmFailTransactionSize;
     forward_eviction_to_cpu;
-    g_issuePUTX;
+    g_issuePUTE;
     ff_deallocateCacheBlock;
   }

-  transition(E, {InvOwn, InvElse, Fwd_GETX}, I) {
+  transition(E, {InvElse, Fwd_GETX}, I) {
+    hftm_htmFailTransactionMem;
+    // don't send data
+    forward_eviction_to_cpu;
+    fi_sendInvAck;
+    ff_deallocateCacheBlock;
+    l_popRequestQueue;
+  }
+
+  transition(E, InvOwn, I) {
+    hfts_htmFailTransactionSize;
     // don't send data
     forward_eviction_to_cpu;
     fi_sendInvAck;
@@ -979,12 +1338,22 @@

   // Transitions from Modified
   transition(M, {L0_Replacement,PF_L0_Replacement}, I) {
+    hfts_htmFailTransactionSize;
     forward_eviction_to_cpu;
-    g_issuePUTX;
+    g_issuePUTM;
     ff_deallocateCacheBlock;
   }

-  transition(M, {InvOwn, InvElse, Fwd_GETX}, I) {
+  transition(M, InvOwn, I) {
+    hfts_htmFailTransactionSize;
+    forward_eviction_to_cpu;
+    f_sendDataToL1;
+    ff_deallocateCacheBlock;
+    l_popRequestQueue;
+  }
+
+  transition(M, {InvElse, Fwd_GETX}, I) {
+    hftm_htmFailTransactionMem;
     forward_eviction_to_cpu;
     f_sendDataToL1;
     ff_deallocateCacheBlock;
@@ -992,6 +1361,7 @@
   }

   transition(M, {Fwd_GETS, Fwd_GET_INSTR}, S) {
+    hftm_htmFailTransactionMem;
     f_sendDataToL1;
     l_popRequestQueue;
   }
@@ -1013,6 +1383,7 @@
   }

   transition(IS, Data_Stale, I) {
+    hftm_htmFailTransactionMem;
     u_writeDataToCache;
     forward_eviction_to_cpu;
     hx_load_hit;
@@ -1090,6 +1461,7 @@
   }

   transition(PF_IS, Load, IS) {
+    hars_htmAddToReadSet;
     uu_profileDataMiss;
     ppm_observePfMiss;
     k_popMandatoryQueue;
@@ -1116,6 +1488,7 @@
   }

   transition(PF_IE, Store, IM) {
+    haws_htmAddToWriteSet;
     uu_profileDataMiss;
     ppm_observePfMiss;
     k_popMandatoryQueue;
@@ -1178,4 +1551,56 @@
   transition(I, PF_Bad_Addr) {
     pq_popPrefetchQueue;
   }
+
+  // hardware transactional memory
+
+  transition(I, HTM_abort) {
+    hvu_htmVerifyUid;
+    hat_htmAbortTransaction;
+    hcs_htmCommandSucceed;
+    k_popMandatoryQueue;
+  }
+
+  transition(I, HTM_start) {
+    hvu_htmVerifyUid;
+    hst_htmStartTransaction;
+    hcs_htmCommandSucceed;
+    k_popMandatoryQueue;
+  }
+
+  transition(I, HTM_commit) {
+    hvu_htmVerifyUid;
+    hct_htmCommitTransaction;
+    hcs_htmCommandSucceed;
+    k_popMandatoryQueue;
+  }
+
+  transition(I, HTM_cancel) {
+    hvu_htmVerifyUid;
+    hcnt_htmCancelTransaction;
+    hcs_htmCommandSucceed;
+    k_popMandatoryQueue;
+  }
+
+  transition(I, HTM_notifyCMD) {
+    hvu_htmVerifyUid;
+    hcs_htmCommandFail;
+    k_popMandatoryQueue;
+  }
+
+  transition({I,S,E,M,IS,IM,SM,PF_IS,PF_IE}, HTM_notifyLD) {
+    hvu_htmVerifyUid;
+    hcs_htmLoadFail;
+    k_popMandatoryQueue;
+  }
+
+  transition({I,S,E,M,IS,IM,SM,PF_IS,PF_IE}, HTM_notifyST) {
+    hvu_htmVerifyUid;
+    hcs_htmStoreFail;
+    k_popMandatoryQueue;
+  }
+
+  transition(I, {L0_Replacement,PF_L0_Replacement}) {
+    ff_deallocateCacheBlock;
+  }
 }
diff --git a/src/mem/ruby/protocol/MESI_Three_Level-L1cache.sm b/src/mem/ruby/protocol/MESI_Three_Level-L1cache.sm
index 0f5a7ac..7344ca1 100644
--- a/src/mem/ruby/protocol/MESI_Three_Level-L1cache.sm
+++ b/src/mem/ruby/protocol/MESI_Three_Level-L1cache.sm
@@ -130,6 +130,14 @@
     Ack_all,      desc="Last ack for processor";

     WB_Ack,        desc="Ack for replacement";
+
+    // hardware transactional memory
+    L0_DataCopy,     desc="Data Block from L0. Should remain in M state.";
+
+    // L0 cache received the invalidation message and has
+    // sent a NAK (because of htm abort) saying that the data
+    // in L1 is the latest value.
+    L0_DataNak,      desc="L0 received INV message, specifies its data is also stale";
   }

   // TYPES
@@ -361,6 +369,10 @@

         if(in_msg.Class == CoherenceClass:INV_DATA) {
             trigger(Event:L0_DataAck, in_msg.addr, cache_entry, tbe);
+        }  else if (in_msg.Class == CoherenceClass:NAK) {
+              trigger(Event:L0_DataNak, in_msg.addr, cache_entry, tbe);
+        }  else if (in_msg.Class == CoherenceClass:PUTX_COPY) {
+              trigger(Event:L0_DataCopy, in_msg.addr, cache_entry, tbe);
         }  else if (in_msg.Class == CoherenceClass:INV_ACK) {
             trigger(Event:L0_Ack, in_msg.addr, cache_entry, tbe);
         }  else {
@@ -808,18 +820,6 @@
     k_popL0RequestQueue;
   }

-  transition(EE, Load, E) {
-    hh_xdata_to_l0;
-    uu_profileHit;
-    k_popL0RequestQueue;
-  }
-
-  transition(MM, Load, M) {
-    hh_xdata_to_l0;
-    uu_profileHit;
-    k_popL0RequestQueue;
-  }
-
   transition({S,SS}, Store, SM) {
     i_allocateTBE;
     c_issueUPGRADE;
@@ -1034,7 +1034,7 @@
     kd_wakeUpDependents;
   }

-  transition(SM, L0_Invalidate_Else, SM_IL0) {
+  transition(SM, {Inv,L0_Invalidate_Else}, SM_IL0) {
     forward_eviction_to_L0_else;
   }

@@ -1093,4 +1093,55 @@
   transition({S_IL0, M_IL0, E_IL0, MM_IL0}, {Inv, Fwd_GETX, Fwd_GETS}) {
     z2_stallAndWaitL2Queue;
   }
+
+  // hardware transactional memory
+
+  // If a transaction has aborted, the L0 could re-request
+  // data which is in E or EE state in L1.
+  transition({EE,E}, Load, E) {
+    hh_xdata_to_l0;
+    uu_profileHit;
+    k_popL0RequestQueue;
+  }
+
+  // If a transaction has aborted, the L0 could re-request
+  // data which is in M or MM state in L1.
+  transition({MM,M}, Load, M) {
+    hh_xdata_to_l0;
+    uu_profileHit;
+    k_popL0RequestQueue;
+  }
+
+  // If a transaction has aborted, the L0 could re-request
+  // data which is in M state in L1.
+  transition({E,M}, Store, M) {
+    hh_xdata_to_l0;
+    uu_profileHit;
+    k_popL0RequestQueue;
+  }
+
+  // A transaction may have tried to modify a cache block in M state with
+  // non-speculative (pre-transactional) data. This needs to be copied
+  // to the L1 before any further modifications occur at the L0.
+  transition({M,E}, L0_DataCopy, M) {
+    u_writeDataFromL0Request;
+    k_popL0RequestQueue;
+  }
+
+  transition({M_IL0, E_IL0}, L0_DataCopy, M_IL0) {
+    u_writeDataFromL0Request;
+    k_popL0RequestQueue;
+  }
+
+  // A NAK from the L0 means that the L0 invalidated its
+  // modified line (due to an abort) so it is therefore necessary
+  // to use the L1's correct version instead
+  transition({M_IL0, E_IL0}, L0_DataNak, MM) {
+    k_popL0RequestQueue;
+    kd_wakeUpDependents;
+  }
+
+  transition(I, L1_Replacement) {
+    ff_deallocateCacheBlock;
+  }
 }
diff --git a/src/mem/ruby/protocol/MESI_Three_Level-msg.sm b/src/mem/ruby/protocol/MESI_Three_Level-msg.sm
index e738b8a..a16e374 100644
--- a/src/mem/ruby/protocol/MESI_Three_Level-msg.sm
+++ b/src/mem/ruby/protocol/MESI_Three_Level-msg.sm
@@ -48,6 +48,7 @@
   INV_OWN,   desc="Invalidate (own)";
   INV_ELSE,  desc="Invalidate (else)";
   PUTX,      desc="Replacement message";
+  PUTX_COPY, desc="Data block to be copied in L1. L0 will still be in M state";

   WB_ACK,    desc="Writeback ack";

@@ -59,6 +60,7 @@
   DATA, desc="Data block for L1 cache in S state";
   DATA_EXCLUSIVE, desc="Data block for L1 cache in M/E state";
   ACK, desc="Generic invalidate ack";
+  NAK, desc="Used by L0 to tell L1 that it cannot provide the latest value";

   // This is a special case in which the L1 cache lost permissions to the
   // shared block before it got the data. So the L0 cache can use the data
diff --git a/src/mem/ruby/protocol/RubySlicc_Exports.sm b/src/mem/ruby/protocol/RubySlicc_Exports.sm
index 08d30cf..03320a6 100644
--- a/src/mem/ruby/protocol/RubySlicc_Exports.sm
+++ b/src/mem/ruby/protocol/RubySlicc_Exports.sm
@@ -187,6 +187,31 @@
   Release,           desc="Release operation";
   Acquire,           desc="Acquire opertion";
   AcquireRelease,    desc="Acquire and Release opertion";
+  HTM_start,         desc="hardware memory transaction: begin";
+  HTM_commit,        desc="hardware memory transaction: commit";
+  HTM_cancel,        desc="hardware memory transaction: cancel";
+  HTM_abort,         desc="hardware memory transaction: abort";
+}
+
+bool isWriteRequest(RubyRequestType type);
+bool isDataReadRequest(RubyRequestType type);
+bool isReadRequest(RubyRequestType type);
+bool isHtmCmdRequest(RubyRequestType type);
+
+// hardware transactional memory
+RubyRequestType htmCmdToRubyRequestType(Packet *pkt);
+
+enumeration(HtmCallbackMode, desc="...", default="HtmCallbackMode_NULL") {
+  HTM_CMD,          desc="htm command";
+  LD_FAIL,          desc="htm transaction failed - inform via read";
+  ST_FAIL,          desc="htm transaction failed - inform via write";
+}
+
+enumeration(HtmFailedInCacheReason, desc="...", default="HtmFailedInCacheReason_NO_FAIL") {
+  NO_FAIL,          desc="no failure in cache";
+  FAIL_SELF,        desc="failed due to local cache's replacement policy";
+  FAIL_REMOTE,      desc="failed due to remote invalidation";
+  FAIL_OTHER,       desc="failed due to other circumstances";
 }

enumeration(SequencerRequestType, desc="...", default="SequencerRequestType_NULL") {
diff --git a/src/mem/ruby/protocol/RubySlicc_Types.sm b/src/mem/ruby/protocol/RubySlicc_Types.sm
index b59cf97..d860392 100644
--- a/src/mem/ruby/protocol/RubySlicc_Types.sm
+++ b/src/mem/ruby/protocol/RubySlicc_Types.sm
@@ -132,10 +132,14 @@
   // ll/sc support
   void writeCallbackScFail(Addr, DataBlock);
   bool llscCheckMonitor(Addr);
+  void llscClearLocalMonitor();

   void evictionCallback(Addr);
   void recordRequestType(SequencerRequestType);
   bool checkResourceAvailable(CacheResourceType, Addr);
+
+  // hardware transactional memory
+  void htmCallback(Addr, HtmCallbackMode, HtmFailedInCacheReason);
 }

 structure (GPUCoalescer, external = "yes") {
@@ -189,6 +193,8 @@
   HSAScope scope,            desc="HSA scope";
   HSASegment segment,        desc="HSA segment";
   PacketPtr pkt,             desc="Packet associated with this request";
+  bool htm_from_transaction, desc="Memory request originates within an HTM transaction";
+  int htm_transaction_uid,   desc="Used to identify the unique HTM transaction that produced this request";
 }

 structure(AbstractCacheEntry, primitive="yes", external = "yes") {
@@ -222,6 +228,10 @@
   void recordRequestType(CacheRequestType, Addr);
   bool checkResourceAvailable(CacheResourceType, Addr);

+  // hardware transactional memory
+  void htmCommitTransaction();
+  void htmAbortTransaction();
+
   int getCacheSize();
   int getNumBlocks();
   Addr getAddressAtIdx(int);
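Reviewer note on the interface above: the new `htmCommitTransaction()`/`htmAbortTransaction()` hooks and the per-request transaction fields together let the controller track a transaction's footprint per line. A minimal sketch of the marking step (the `Line` type, `Access` enum, and `markHtmFootprint` are hypothetical stand-ins, not the gem5 API):

```cpp
#include <cassert>

// Hypothetical stand-in for a cache line carrying the new HTM flags;
// in gem5 these live on AbstractCacheEntry.
struct Line
{
    bool inHtmReadSet = false;
    bool inHtmWriteSet = false;
};

enum class Access { Load, Store };

// While the controller is in transactional state, every load adds the
// line to the read set and every store to the write set; outside a
// transaction the flags are untouched.
void
markHtmFootprint(Line &line, Access type, bool inTransaction)
{
    if (!inTransaction)
        return;
    if (type == Access::Load)
        line.inHtmReadSet = true;
    else
        line.inHtmWriteSet = true;
}
```

An invalidation that hits a line with either flag set is what marks the transaction as failed in the protocol.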
diff --git a/src/mem/ruby/slicc_interface/AbstractCacheEntry.cc b/src/mem/ruby/slicc_interface/AbstractCacheEntry.cc
index b98425d..ff68f2e 100644
--- a/src/mem/ruby/slicc_interface/AbstractCacheEntry.cc
+++ b/src/mem/ruby/slicc_interface/AbstractCacheEntry.cc
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2019 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
  * All rights reserved.
  *
@@ -37,6 +49,8 @@
     m_Address = 0;
     m_locked = -1;
     m_last_touch_tick = 0;
+    htm_in_read_set = false;
+    htm_in_write_set = false;
 }

 AbstractCacheEntry::~AbstractCacheEntry()
@@ -81,3 +95,27 @@
             m_Address, m_locked, context);
     return m_locked == context;
 }
+
+void
+AbstractCacheEntry::setInHtmReadSet(bool val)
+{
+    htm_in_read_set = val;
+}
+
+void
+AbstractCacheEntry::setInHtmWriteSet(bool val)
+{
+    htm_in_write_set = val;
+}
+
+bool
+AbstractCacheEntry::getInHtmReadSet() const
+{
+    return htm_in_read_set;
+}
+
+bool
+AbstractCacheEntry::getInHtmWriteSet() const
+{
+    return htm_in_write_set;
+}
diff --git a/src/mem/ruby/slicc_interface/AbstractCacheEntry.hh b/src/mem/ruby/slicc_interface/AbstractCacheEntry.hh
index 056486c..d03e67e 100644
--- a/src/mem/ruby/slicc_interface/AbstractCacheEntry.hh
+++ b/src/mem/ruby/slicc_interface/AbstractCacheEntry.hh
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2019 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
  * All rights reserved.
  *
@@ -90,6 +102,18 @@

     // Set the last access Tick.
     void setLastAccess(Tick tick) { m_last_touch_tick = tick; }
+
+    // hardware transactional memory
+    void setInHtmReadSet(bool val);
+    void setInHtmWriteSet(bool val);
+    bool getInHtmReadSet() const;
+    bool getInHtmWriteSet() const;
+    virtual void invalidateEntry() {}
+
+  private:
+    // hardware transactional memory
+    bool htm_in_read_set;
+    bool htm_in_write_set;
 };

 inline std::ostream&
diff --git a/src/mem/ruby/slicc_interface/RubyRequest.hh b/src/mem/ruby/slicc_interface/RubyRequest.hh
index 68b11f5..6461721 100644
--- a/src/mem/ruby/slicc_interface/RubyRequest.hh
+++ b/src/mem/ruby/slicc_interface/RubyRequest.hh
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2019 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 2009 Mark D. Hill and David A. Wood
  * All rights reserved.
  *
@@ -58,6 +70,9 @@
     WriteMask m_writeMask;
     DataBlock m_WTData;
     int m_wfid;
+    bool m_htm_from_transaction;
+    uint64_t m_htm_transaction_uid;

     RubyRequest(Tick curTime, uint64_t _paddr, uint8_t* _data, int _len,
         uint64_t _pc, RubyRequestType _type, RubyAccessMode _access_mode,
@@ -72,7 +87,9 @@
           m_Prefetch(_pb),
           data(_data),
           m_pkt(_pkt),
-          m_contextId(_core_id)
+          m_contextId(_core_id),
+          m_htm_from_transaction(false),
+          m_htm_transaction_uid(0)
     {
         m_LineAddress = makeLineAddress(m_PhysicalAddress);
     }
@@ -95,7 +112,9 @@
           m_contextId(_core_id),
           m_writeMask(_wm_size,_wm_mask),
           m_WTData(_Data),
-          m_wfid(_proc_id)
+          m_wfid(_proc_id),
+          m_htm_from_transaction(false),
+          m_htm_transaction_uid(0)
     {
         m_LineAddress = makeLineAddress(m_PhysicalAddress);
     }
@@ -119,7 +138,9 @@
           m_contextId(_core_id),
           m_writeMask(_wm_size,_wm_mask,_atomicOps),
           m_WTData(_Data),
-          m_wfid(_proc_id)
+          m_wfid(_proc_id),
+          m_htm_from_transaction(false),
+          m_htm_transaction_uid(0)
     {
         m_LineAddress = makeLineAddress(m_PhysicalAddress);
     }
diff --git a/src/mem/ruby/slicc_interface/RubySlicc_Util.hh b/src/mem/ruby/slicc_interface/RubySlicc_Util.hh
index e3d4f0b..9e1af4d 100644
--- a/src/mem/ruby/slicc_interface/RubySlicc_Util.hh
+++ b/src/mem/ruby/slicc_interface/RubySlicc_Util.hh
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2019 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 1999-2008 Mark D. Hill and David A. Wood
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved.
@@ -85,6 +97,75 @@
   return 1024;
 }

+inline bool
+isWriteRequest(RubyRequestType type)
+{
+    if ((type == RubyRequestType_ST) ||
+        (type == RubyRequestType_ATOMIC) ||
+        (type == RubyRequestType_RMW_Read) ||
+        (type == RubyRequestType_RMW_Write) ||
+        (type == RubyRequestType_Store_Conditional) ||
+        (type == RubyRequestType_Locked_RMW_Read) ||
+        (type == RubyRequestType_Locked_RMW_Write) ||
+        (type == RubyRequestType_FLUSH)) {
+            return true;
+    } else {
+            return false;
+    }
+}
+
+inline bool
+isDataReadRequest(RubyRequestType type)
+{
+    if ((type == RubyRequestType_LD) ||
+        (type == RubyRequestType_Load_Linked)) {
+            return true;
+    } else {
+            return false;
+    }
+}
+
+inline bool
+isReadRequest(RubyRequestType type)
+{
+    if (isDataReadRequest(type) ||
+        (type == RubyRequestType_IFETCH)) {
+            return true;
+    } else {
+            return false;
+    }
+}
+
+inline bool
+isHtmCmdRequest(RubyRequestType type)
+{
+    if ((type == RubyRequestType_HTM_start)  ||
+        (type == RubyRequestType_HTM_commit) ||
+        (type == RubyRequestType_HTM_cancel) ||
+        (type == RubyRequestType_HTM_abort)) {
+            return true;
+    } else {
+            return false;
+    }
+}
+
+inline RubyRequestType
+htmCmdToRubyRequestType(const Packet *pkt)
+{
+    if (pkt->req->isHTMStart()) {
+        return RubyRequestType_HTM_start;
+    } else if (pkt->req->isHTMCommit()) {
+        return RubyRequestType_HTM_commit;
+    } else if (pkt->req->isHTMCancel()) {
+        return RubyRequestType_HTM_cancel;
+    } else if (pkt->req->isHTMAbort()) {
+        return RubyRequestType_HTM_abort;
+    } else {
+        panic("invalid ruby packet type\n");
+    }
+}
+
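The predicate helpers added above are plain type classifiers. A self-contained sketch with a reduced stand-in enum (names shortened; this is not the generated RubyRequestType) shows the intended classification:

```cpp
#include <cassert>

// Reduced stand-in for RubyRequestType, just large enough to exercise
// the classification helpers introduced in RubySlicc_Util.hh.
enum class ReqType
{
    LD, ST, IFETCH, Load_Linked,
    HTM_start, HTM_commit, HTM_cancel, HTM_abort
};

// Mirrors isHtmCmdRequest(): true for the four HTM command types.
bool
isHtmCmd(ReqType t)
{
    return t == ReqType::HTM_start || t == ReqType::HTM_commit ||
           t == ReqType::HTM_cancel || t == ReqType::HTM_abort;
}

// Mirrors isDataReadRequest(): plain loads and load-linked only.
bool
isDataRead(ReqType t)
{
    return t == ReqType::LD || t == ReqType::Load_Linked;
}

// Mirrors isReadRequest(): data reads plus instruction fetches.
bool
isRead(ReqType t)
{
    return isDataRead(t) || t == ReqType::IFETCH;
}
```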
 /**
  * This function accepts an address, a data block and a packet. If the address
  * range for the data block contains the address which the packet needs to
diff --git a/src/mem/ruby/structures/CacheMemory.cc b/src/mem/ruby/structures/CacheMemory.cc
index b734308..494d5d8 100644
--- a/src/mem/ruby/structures/CacheMemory.cc
+++ b/src/mem/ruby/structures/CacheMemory.cc
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2019 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 1999-2012 Mark D. Hill and David A. Wood
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved.
@@ -31,6 +43,7 @@

 #include "base/intmath.hh"
 #include "base/logging.hh"
+#include "debug/HtmMem.hh"
 #include "debug/RubyCache.hh"
 #include "debug/RubyCacheTrace.hh"
 #include "debug/RubyResourceStalls.hh"
@@ -502,6 +515,23 @@
     m_cache[cacheSet][loc]->clearLocked();
 }

+void
+CacheMemory::clearLockedAll(int context)
+{
+    // iterate through every set and way to get a cache line
+    for (auto i = m_cache.begin(); i != m_cache.end(); ++i) {
+        std::vector<AbstractCacheEntry*> set = *i;
+        for (auto j = set.begin(); j != set.end(); ++j) {
+            AbstractCacheEntry *line = *j;
+            if (line && line->isLocked(context)) {
+                DPRINTF(RubyCache, "Clear Lock for addr: %#x\n",
+                    line->m_Address);
+                line->clearLocked();
+            }
+        }
+    }
+}
+
 bool
 CacheMemory::isLocked(Addr address, int context)
 {
@@ -603,6 +633,34 @@
         .desc("number of stalls caused by data array")
         .flags(Stats::nozero)
         ;
+
+    HtmTransCommitReadSet
+        .init(8)
+        .name(name() + ".htm_transaction_committed_read_set")
+        .desc("read set size of a committed transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
+
+    HtmTransCommitWriteSet
+        .init(8)
+        .name(name() + ".htm_transaction_committed_write_set")
+        .desc("write set size of a committed transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
+
+    HtmTransAbortReadSet
+        .init(8)
+        .name(name() + ".htm_transaction_aborted_read_set")
+        .desc("read set size of an aborted transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
+
+    HtmTransAbortWriteSet
+        .init(8)
+        .name(name() + ".htm_transaction_aborted_write_set")
+        .desc("write set size of an aborted transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
 }

 // assumption: SLICC generated files will only call this function
@@ -680,3 +738,69 @@
 {
   return (m_cache[cache_set][loc]->m_Permission != AccessPermission_Busy);
 }
+
+/* hardware transactional memory */
+
+void
+CacheMemory::htmAbortTransaction()
+{
+    uint64_t htmReadSetSize = 0;
+    uint64_t htmWriteSetSize = 0;
+
+    // iterate through every set and way to get a cache line
+    for (auto i = m_cache.begin(); i != m_cache.end(); ++i) {
+        std::vector<AbstractCacheEntry*> set = *i;
+
+        for (auto j = set.begin(); j != set.end(); ++j) {
+            AbstractCacheEntry *line = *j;
+
+            if (line != nullptr) {
+                htmReadSetSize += (line->getInHtmReadSet() ? 1 : 0);
+                htmWriteSetSize += (line->getInHtmWriteSet() ? 1 : 0);
+                if (line->getInHtmWriteSet()) {
+                    line->invalidateEntry();
+                }
+                line->setInHtmWriteSet(false);
+                line->setInHtmReadSet(false);
+                line->clearLocked();
+            }
+        }
+    }
+
+    HtmTransAbortReadSet.sample(htmReadSetSize);
+    HtmTransAbortWriteSet.sample(htmWriteSetSize);
+    DPRINTF(HtmMem, "htmAbortTransaction: read set=%u write set=%u\n",
+        htmReadSetSize, htmWriteSetSize);
+}
+
+void
+CacheMemory::htmCommitTransaction()
+{
+    uint64_t htmReadSetSize = 0;
+    uint64_t htmWriteSetSize = 0;
+
+    // iterate through every set and way to get a cache line
+    for (auto i = m_cache.begin(); i != m_cache.end(); ++i) {
+        std::vector<AbstractCacheEntry*> set = *i;
+
+        for (auto j = set.begin(); j != set.end(); ++j) {
+            AbstractCacheEntry *line = *j;
+            if (line != nullptr) {
+                htmReadSetSize += (line->getInHtmReadSet() ? 1 : 0);
+                htmWriteSetSize += (line->getInHtmWriteSet() ? 1 : 0);
+                line->setInHtmWriteSet(false);
+                line->setInHtmReadSet(false);
+                line->clearLocked();
+            }
+        }
+    }
+
+    HtmTransCommitReadSet.sample(htmReadSetSize);
+    HtmTransCommitWriteSet.sample(htmWriteSetSize);
+    DPRINTF(HtmMem, "htmCommitTransaction: read set=%u write set=%u\n",
+        htmReadSetSize, htmWriteSetSize);
+}
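Both walks above visit every set and way, tally the read and write sets for the histograms, and, on abort only, invalidate speculatively written lines. A condensed sketch of that walk over a flat line array (types and names are illustrative, not the CacheMemory API):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative cache line: validity plus the two HTM footprint flags.
struct Entry
{
    bool valid = true;
    bool inReadSet = false;
    bool inWriteSet = false;
};

// Count the transaction footprint, discard speculative data on abort,
// and clear both sets on every line. Returns {readSetSize, writeSetSize}.
std::pair<uint64_t, uint64_t>
htmEndTransaction(std::vector<Entry> &cache, bool abort)
{
    uint64_t reads = 0, writes = 0;
    for (auto &e : cache) {
        reads += e.inReadSet ? 1 : 0;
        writes += e.inWriteSet ? 1 : 0;
        if (abort && e.inWriteSet)
            e.valid = false;   // speculative data must not become visible
        e.inReadSet = false;
        e.inWriteSet = false;
    }
    return {reads, writes};
}
```

Commit is the same walk with `abort == false`: the footprint counters feed the histograms and the flags are cleared, but written lines stay valid.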
diff --git a/src/mem/ruby/structures/CacheMemory.hh b/src/mem/ruby/structures/CacheMemory.hh
index fc2c2c8..042aaea 100644
--- a/src/mem/ruby/structures/CacheMemory.hh
+++ b/src/mem/ruby/structures/CacheMemory.hh
@@ -121,6 +121,7 @@
     // provided by the AbstractCacheEntry class.
     void setLocked (Addr addr, int context);
     void clearLocked (Addr addr);
+    void clearLockedAll (int context);
     bool isLocked (Addr addr, int context);

     // Print cache contents
@@ -131,6 +132,10 @@
     bool checkResourceAvailable(CacheResourceType res, Addr addr);
     void recordRequestType(CacheRequestType requestType, Addr addr);

+    // hardware transactional memory
+    void htmAbortTransaction();
+    void htmCommitTransaction();
+
   public:
     Stats::Scalar m_demand_hits;
     Stats::Scalar m_demand_misses;
@@ -150,6 +155,12 @@
     Stats::Scalar numTagArrayStalls;
     Stats::Scalar numDataArrayStalls;

+    // hardware transactional memory
+    Stats::Histogram HtmTransCommitReadSet;
+    Stats::Histogram HtmTransCommitWriteSet;
+    Stats::Histogram HtmTransAbortReadSet;
+    Stats::Histogram HtmTransAbortWriteSet;
+
     int getCacheSize() const { return m_cache_size; }
     int getCacheAssoc() const { return m_cache_assoc; }
     int getNumBlocks() const { return m_cache_num_sets * m_cache_assoc; }
diff --git a/src/mem/ruby/system/RubyPort.cc b/src/mem/ruby/system/RubyPort.cc
index 7632bbb..45c39c9 100644
--- a/src/mem/ruby/system/RubyPort.cc
+++ b/src/mem/ruby/system/RubyPort.cc
@@ -44,6 +44,7 @@
 #include "cpu/testers/rubytest/RubyTester.hh"
 #include "debug/Config.hh"
 #include "debug/Drain.hh"
+#include "debug/HtmMem.hh"
 #include "debug/Ruby.hh"
 #include "mem/ruby/protocol/AccessPermission.hh"
 #include "mem/ruby/slicc_interface/AbstractController.hh"
@@ -169,6 +170,7 @@
 {
     // got a response from a device
     assert(pkt->isResponse());
+    assert(!pkt->htmTransactionFailedInCache());

     // First we must retrieve the request port from the sender State
     RubyPort::SenderState *senderState =
@@ -253,6 +255,7 @@
     // pio port.
     if (pkt->cmd != MemCmd::MemFenceReq) {
         if (!isPhysMemAddress(pkt->getAddr())) {
+            assert(!pkt->req->isHTMCmd());
             assert(ruby_port->memMasterPort.isConnected());
             DPRINTF(RubyPort, "Request address %#x assumed to be a "
                     "pio address\n", pkt->getAddr());
@@ -387,38 +390,38 @@
         // of data.
         rs->getPhysMem()->functionalAccess(pkt);
     } else {
-        bool accessSucceeded = false;
-        bool needsResponse = pkt->needsResponse();
+        bool access_succeeded = false;
+        bool needs_response = pkt->needsResponse();

         // Do the functional access on ruby memory
         if (pkt->isRead()) {
-            accessSucceeded = rs->functionalRead(pkt);
+            access_succeeded = rs->functionalRead(pkt);
         } else if (pkt->isWrite()) {
-            accessSucceeded = rs->functionalWrite(pkt);
+            access_succeeded = rs->functionalWrite(pkt);
         } else {
             panic("Unsupported functional command %s\n", pkt->cmdString());
         }

         // Unless the requester explicitly said otherwise, generate an error if
         // the functional request failed
-        if (!accessSucceeded && !pkt->suppressFuncError()) {
+        if (!access_succeeded && !pkt->suppressFuncError()) {
             fatal("Ruby functional %s failed for address %#x\n",
                   pkt->isWrite() ? "write" : "read", pkt->getAddr());
         }

         // turn packet around to go back to requester if response expected
-        if (needsResponse) {
+        if (needs_response) {
             // The pkt is already turned into a reponse if the directory
             // forwarded the request to the memory controller (see
             // AbstractController::functionalMemoryWrite and
             // AbstractMemory::functionalAccess)
             if (!pkt->isResponse())
                 pkt->makeResponse();
-            pkt->setFunctionalResponseStatus(accessSucceeded);
+            pkt->setFunctionalResponseStatus(access_succeeded);
         }

         DPRINTF(RubyPort, "Functional access %s!\n",
-                accessSucceeded ? "successful":"failed");
+                access_succeeded ? "successful":"failed");
     }
 }

@@ -508,7 +511,7 @@
 void
 RubyPort::MemSlavePort::hitCallback(PacketPtr pkt)
 {
-    bool needsResponse = pkt->needsResponse();
+    bool needs_response = pkt->needsResponse();

     // Unless specified at configuraiton, all responses except failed SC
     // and Flush operations access M5 physical memory.
@@ -544,21 +547,21 @@

     if (pkt->req->isKernel()) {
         accessPhysMem = false;
-        needsResponse = true;
+        needs_response = true;
     }

-    DPRINTF(RubyPort, "Hit callback needs response %d\n", needsResponse);
+    DPRINTF(RubyPort, "Hit callback needs response %d\n", needs_response);

     RubyPort *ruby_port = static_cast<RubyPort *>(&owner);
     RubySystem *rs = ruby_port->m_ruby_system;
     if (accessPhysMem) {
         rs->getPhysMem()->access(pkt);
-    } else if (needsResponse) {
+    } else if (needs_response) {
         pkt->makeResponse();
     }

     // turn packet around to go back to requester if response expected
-    if (needsResponse) {
+    if (needs_response) {
         DPRINTF(RubyPort, "Sending packet back over port\n");
         // Send a response in the same cycle. There is no need to delay the
         // response because the response latency is already incurred in the
@@ -594,6 +597,24 @@
     return ruby_port->system->isMemAddr(addr);
 }

+HtmCacheFailure
+RubyPort::MemSlavePort::htmRetCodeConversion(
+    const HtmFailedInCacheReason ruby_ret_code)
+{
+    switch (ruby_ret_code) {
+      case HtmFailedInCacheReason_NO_FAIL:
+        return HtmCacheFailure::NO_FAIL;
+      case HtmFailedInCacheReason_FAIL_SELF:
+        return HtmCacheFailure::FAIL_SELF;
+      case HtmFailedInCacheReason_FAIL_REMOTE:
+        return HtmCacheFailure::FAIL_REMOTE;
+      case HtmFailedInCacheReason_FAIL_OTHER:
+        return HtmCacheFailure::FAIL_OTHER;
+      default:
+        panic("Invalid htm return code\n");
+    }
+}
+
 void
 RubyPort::ruby_eviction_callback(Addr address)
 {
@@ -627,7 +648,6 @@
     }
 }

-
 int
 RubyPort::functionalWrite(Packet *func_pkt)
 {
@@ -638,4 +658,45 @@
         }
     }
     return num_written;
-}
\ No newline at end of file
+}
+
+void
+RubyPort::rubyHtmCallback(PacketPtr pkt,
+                          const HtmFailedInCacheReason htm_return_code)
+{
+    // The packet has not yet been turned into a response
+    assert(pkt->isRequest());
+
+    // First retrieve the request port from the sender State
+    RubyPort::SenderState *senderState =
+        safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
+    MemSlavePort *port = senderState->port;
+    assert(port != NULL);
+    delete senderState;
+
+    port->htmCallback(pkt, htm_return_code);
+
+    trySendRetries();
+}
+
+void
+RubyPort::MemSlavePort::htmCallback(
+    PacketPtr pkt,
+    const HtmFailedInCacheReason htm_return_code)
+{
+    DPRINTF(RubyPort, "HTM callback: start=%d, commit=%d, "
+                      "cancel=%d, rc=%d\n",
+            pkt->req->isHTMStart(), pkt->req->isHTMCommit(),
+            pkt->req->isHTMCancel(), htm_return_code);
+
+    // turn packet around to go back to requester if response expected
+    if (pkt->needsResponse()) {
+        DPRINTF(RubyPort, "Sending packet back over port\n");
+        pkt->makeHtmTransactionalReqResponse(
+            htmRetCodeConversion(htm_return_code));
+        RubyPort *rp = static_cast<RubyPort *>(&owner);
+        schedTimingResp(pkt, curTick()+rp->m_ruby_system->clockPeriod());
+    } else {
+        delete pkt;
+    }
+}
diff --git a/src/mem/ruby/system/RubyPort.hh b/src/mem/ruby/system/RubyPort.hh
index b14e707..da23869 100644
--- a/src/mem/ruby/system/RubyPort.hh
+++ b/src/mem/ruby/system/RubyPort.hh
@@ -45,8 +45,10 @@
 #include <cassert>
 #include <string>

+#include "mem/htm.hh"
 #include "mem/ruby/common/MachineID.hh"
 #include "mem/ruby/network/MessageBuffer.hh"
+#include "mem/ruby/protocol/HtmFailedInCacheReason.hh"
 #include "mem/ruby/protocol/RequestStatus.hh"
 #include "mem/ruby/system/RubySystem.hh"
 #include "mem/tport.hh"
@@ -85,6 +87,7 @@
                      PortID id, bool _no_retry_on_stall);
         void hitCallback(PacketPtr pkt);
         void evictionCallback(Addr address);
+        void htmCallback(PacketPtr, const HtmFailedInCacheReason);

       protected:
         bool recvTimingReq(PacketPtr pkt);
@@ -100,6 +103,17 @@

       private:
         bool isPhysMemAddress(Addr addr) const;
+
+        /**
+         * Htm return code conversion
+         *
+         * This helper is a hack meant to convert the autogenerated ruby
+         * enum (HtmFailedInCacheReason) to the manually defined one
+         * (HtmCacheFailure). This is needed since the cpu code would
+         * otherwise have to include the ruby generated headers in order
+         * to handle the htm return code.
+         */
+        HtmCacheFailure htmRetCodeConversion(const HtmFailedInCacheReason rc);
     };

     class PioMasterPort : public QueuedMasterPort
@@ -173,6 +187,7 @@
     void ruby_hit_callback(PacketPtr pkt);
     void testDrainComplete();
     void ruby_eviction_callback(Addr address);
+    void rubyHtmCallback(PacketPtr pkt, const HtmFailedInCacheReason fail_r);

     /**
      * Called by the PIO port when receiving a timing response.
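The conversion helper documented in the header comment above is a one-to-one enum mapping, done so CPU code never has to include Ruby-generated headers. A minimal sketch with local stand-in enums (the real types are the SLICC-generated HtmFailedInCacheReason and HtmCacheFailure from mem/htm.hh):

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for the SLICC-generated failure reason enum.
enum class RubyReason { NO_FAIL, FAIL_SELF, FAIL_REMOTE, FAIL_OTHER };
// Stand-in for the hand-written, core-facing enum.
enum class CoreReason { NO_FAIL, FAIL_SELF, FAIL_REMOTE, FAIL_OTHER };

// One-to-one mapping from the Ruby-side code to the core-side code;
// anything outside the known range is a hard error.
CoreReason
convertHtmReason(RubyReason r)
{
    switch (r) {
      case RubyReason::NO_FAIL:     return CoreReason::NO_FAIL;
      case RubyReason::FAIL_SELF:   return CoreReason::FAIL_SELF;
      case RubyReason::FAIL_REMOTE: return CoreReason::FAIL_REMOTE;
      case RubyReason::FAIL_OTHER:  return CoreReason::FAIL_OTHER;
    }
    throw std::invalid_argument("invalid htm return code");
}
```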
diff --git a/src/mem/ruby/system/Sequencer.cc b/src/mem/ruby/system/Sequencer.cc
index aa134f4..643adf2 100644
--- a/src/mem/ruby/system/Sequencer.cc
+++ b/src/mem/ruby/system/Sequencer.cc
@@ -45,6 +45,7 @@
 #include "base/logging.hh"
 #include "base/str.hh"
 #include "cpu/testers/rubytest/RubyTester.hh"
+#include "debug/HtmMem.hh"
 #include "debug/LLSC.hh"
 #include "debug/MemoryAccess.hh"
 #include "debug/ProtocolTrace.hh"
@@ -55,6 +56,7 @@
 #include "mem/ruby/protocol/PrefetchBit.hh"
 #include "mem/ruby/protocol/RubyAccessMode.hh"
 #include "mem/ruby/slicc_interface/RubyRequest.hh"
+#include "mem/ruby/slicc_interface/RubySlicc_Util.hh"
 #include "mem/ruby/system/RubySystem.hh"
 #include "sim/system.hh"

@@ -70,6 +72,10 @@
     : RubyPort(p), m_IncompleteTimes(MachineType_NUM),
       deadlockCheckEvent([this]{ wakeup(); }, "Sequencer deadlock check")
 {
+    // hardware transactional memory
+    m_htmstart_tick = 0;
+    m_htmstart_instruction = 0;
+
     m_outstanding_count = 0;

     m_instCache_ptr = p->icache;
@@ -149,6 +155,12 @@
 }

 void
+Sequencer::llscClearLocalMonitor()
+{
+    m_dataCache_ptr->clearLockedAll(m_version);
+}
+
+void
 Sequencer::wakeup()
 {
     assert(drainState() != DrainState::Draining);
@@ -175,6 +187,28 @@
         total_outstanding += table_entry.second.size();
     }

+    // hardware transactional memory commands
+    std::deque<SequencerRequest*>::iterator htm =
+      m_htmCmdRequestTable.begin();
+    std::deque<SequencerRequest*>::iterator htm_end =
+      m_htmCmdRequestTable.end();
+    for (; htm != htm_end; ++htm) {
+        SequencerRequest* request = *htm;
+        if (current_time - request->issue_time < m_deadlock_threshold)
+            continue;
+
+        panic("Possible Deadlock detected. Aborting!\n"
+              "version: %d m_htmCmdRequestTable: %d "
+              "current time: %u issue_time: %d difference: %d\n",
+              m_version, m_htmCmdRequestTable.size(),
+              current_time * clockPeriod(),
+              request->issue_time * clockPeriod(),
+              (current_time * clockPeriod()) -
+              (request->issue_time * clockPeriod()));
+    }
+
+    total_outstanding += m_htmCmdRequestTable.size();
+
     assert(m_outstanding_count == total_outstanding);

     if (m_outstanding_count > 0) {
@@ -239,15 +273,32 @@
         schedule(deadlockCheckEvent, clockEdge(m_deadlock_threshold));
     }

-    Addr line_addr = makeLineAddress(pkt->getAddr());
-    // Check if there is any outstanding request for the same cache line.
-    auto &seq_req_list = m_RequestTable[line_addr];
-    // Create a default entry
-    seq_req_list.emplace_back(pkt, primary_type, secondary_type, curCycle());
-    m_outstanding_count++;
+    if (isHtmCmdRequest(primary_type)) {
+        // For the moment, allow just one HTM cmd into the cache controller.
+        // Later this can be adjusted for optimization, e.g.
+        // back-to-back HTM_starts.
+        if ((m_htmCmdRequestTable.size() > 0) && !pkt->req->isHTMAbort())
+            return RequestStatus_BufferFull;

-    if (seq_req_list.size() > 1) {
-        return RequestStatus_Aliased;
+        // insert request into HtmCmd queue
+        SequencerRequest* htmReq =
+            new SequencerRequest(pkt, primary_type, secondary_type,
+                curCycle());
+        assert(htmReq);
+        m_htmCmdRequestTable.push_back(htmReq);
+        m_outstanding_count++;
+    } else {
+        Addr line_addr = makeLineAddress(pkt->getAddr());
+        // Check if there is any outstanding request for the same cache line.
+        auto &seq_req_list = m_RequestTable[line_addr];
+        // Create a default entry
+        seq_req_list.emplace_back(pkt, primary_type,
+            secondary_type, curCycle());
+        m_outstanding_count++;
+
+        if (seq_req_list.size() > 1) {
+            return RequestStatus_Aliased;
+        }
     }

     m_outstandReqHist.sample(m_outstanding_count);
@@ -560,16 +611,122 @@
     }
 }

+void
+Sequencer::htmCallback(Addr address,
+                       const HtmCallbackMode mode,
+                       const HtmFailedInCacheReason htm_return_code)
+{
+    // mode=0: HTM command
+    // mode=1: transaction failed - inform via LD
+    // mode=2: transaction failed - inform via ST
+
+    if (mode == HtmCallbackMode_HTM_CMD) {
+        SequencerRequest* request = nullptr;
+
+        assert(m_htmCmdRequestTable.size() > 0);
+
+        request = m_htmCmdRequestTable.front();
+        m_htmCmdRequestTable.pop_front();
+        markRemoved();
+
+        assert(isHtmCmdRequest(request->m_type));
+
+        PacketPtr pkt = request->pkt;
+        delete request;
+
+        // valid responses have zero as the payload
+        uint8_t* dataptr = pkt->getPtr<uint8_t>();
+        memset(dataptr, 0, pkt->getSize());
+        *dataptr = (uint8_t) htm_return_code;
+
+        // record stats
+        if (htm_return_code == HtmFailedInCacheReason_NO_FAIL) {
+            if (pkt->req->isHTMStart()) {
+                m_htmstart_tick = pkt->req->time();
+                m_htmstart_instruction = pkt->req->getInstCount();
+                DPRINTF(HtmMem, "htmStart - htmUid=%u\n",
+                        pkt->getHtmTransactionUid());
+            } else if (pkt->req->isHTMCommit()) {
+                Tick transaction_ticks = pkt->req->time() - m_htmstart_tick;
+                Cycles transaction_cycles = ticksToCycles(transaction_ticks);
+                m_htm_transaction_cycles.sample(transaction_cycles);
+                m_htmstart_tick = 0;
+                Counter transaction_instructions =
+                    pkt->req->getInstCount() - m_htmstart_instruction;
+                m_htm_transaction_instructions.sample(
+                  transaction_instructions);
+                m_htmstart_instruction = 0;
+                DPRINTF(HtmMem, "htmCommit - htmUid=%u\n",
+                        pkt->getHtmTransactionUid());
+            } else if (pkt->req->isHTMAbort()) {
+                HtmFailureFaultCause cause = pkt->req->getHtmAbortCause();
+                assert(cause != HtmFailureFaultCause_INVALID);
+                assert(cause < HtmFailureFaultCause_NumCauses);
+                m_htm_transaction_abort_cause[cause]++;
+                DPRINTF(HtmMem, "htmAbort - reason=%s - htmUid=%u\n",
+                        HtmFailureFaultCauseStrings[cause],
+                        pkt->getHtmTransactionUid());
+            }
+        } else {
+            DPRINTF(HtmMem, "HTM_CMD: fail - htmUid=%u\n",
+                pkt->getHtmTransactionUid());
+        }
+
+        rubyHtmCallback(pkt, htm_return_code);
+        testDrainComplete();
+    } else if (mode == HtmCallbackMode_LD_FAIL ||
+               mode == HtmCallbackMode_ST_FAIL) {
+        // transaction failed
+        assert(address == makeLineAddress(address));
+        assert(m_RequestTable.find(address) != m_RequestTable.end());
+
+        auto &seq_req_list = m_RequestTable[address];
+        while (!seq_req_list.empty()) {
+            SequencerRequest &request = seq_req_list.front();
+
+            PacketPtr pkt = request.pkt;
+            markRemoved();
+
+            // TODO - atomics
+
+            // store conditionals should indicate failure
+            if (request.m_type == RubyRequestType_Store_Conditional) {
+                pkt->req->setExtraData(0);
+            }
+
+            DPRINTF(HtmMem, "%s_FAIL: size=%d - "
+                            "addr=0x%lx - htmUid=%d\n",
+                            (mode == HtmCallbackMode_LD_FAIL) ? "LD" : "ST",
+                            pkt->getSize(),
+                            address, pkt->getHtmTransactionUid());
+
+            rubyHtmCallback(pkt, htm_return_code);
+            testDrainComplete();
+            pkt = nullptr;
+            seq_req_list.pop_front();
+        }
+        // free all outstanding requests corresponding to this address
+        if (seq_req_list.empty()) {
+            m_RequestTable.erase(address);
+        }
+    } else {
+        panic("unrecognised HTM callback mode\n");
+    }
+}
+
 bool
 Sequencer::empty() const
 {
-    return m_RequestTable.empty();
+    return m_RequestTable.empty() && m_htmCmdRequestTable.empty();
 }

 RequestStatus
 Sequencer::makeRequest(PacketPtr pkt)
 {
-    if (m_outstanding_count >= m_max_outstanding_requests) {
+    // HTM abort signals must be allowed to reach the Sequencer
+    // the same cycle they are issued. They cannot be retried.
+    if ((m_outstanding_count >= m_max_outstanding_requests) &&
+        !pkt->req->isHTMAbort()) {
         return RequestStatus_BufferFull;
     }

@@ -629,7 +786,10 @@
             //
             primary_type = secondary_type = RubyRequestType_ST;
         } else if (pkt->isRead()) {
-            if (pkt->req->isInstFetch()) {
+            // hardware transactional memory commands
+            if (pkt->req->isHTMCmd()) {
+                primary_type = secondary_type = htmCmdToRubyRequestType(pkt);
+            } else if (pkt->req->isInstFetch()) {
                 primary_type = secondary_type = RubyRequestType_IFETCH;
             } else {
                 bool storeCheck = false;
@@ -706,6 +866,14 @@
             printAddress(msg->getPhysicalAddress()),
             RubyRequestType_to_string(secondary_type));

+    // hardware transactional memory
+    // If the request originates in a transaction,
+    // then mark the Ruby message as such.
+    if (pkt->isHtmTransactional()) {
+        msg->m_htm_from_transaction = true;
+        msg->m_htm_transaction_uid = pkt->getHtmTransactionUid();
+    }
+
     Tick latency = cyclesToTicks(
             m_controller->mandatoryQueueLatency(secondary_type));
     assert(latency > 0);
@@ -729,12 +897,28 @@
     return out;
 }

+template <class VALUE>
+std::ostream &
+operator<<(ostream &out, const std::deque<VALUE> &queue)
+{
+    auto i = queue.begin();
+    auto end = queue.end();
+
+    out << "[";
+    for (; i != end; ++i)
+        out << " " << *i;
+    out << " ]";
+
+    return out;
+}
+
 void
 Sequencer::print(ostream& out) const
 {
     out << "[Sequencer: " << m_version
         << ", outstanding requests: " << m_outstanding_count
         << ", request table: " << m_RequestTable
+        << ", htm cmd request table: " << m_htmCmdRequestTable
         << "]";
 }

@@ -756,6 +940,32 @@
 {
     RubyPort::regStats();

+    // hardware transactional memory
+    m_htm_transaction_cycles
+        .init(10)
+        .name(name() + ".htm_transaction_cycles")
+        .desc("number of cycles spent in an outer transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
+    m_htm_transaction_instructions
+        .init(10)
+        .name(name() + ".htm_transaction_instructions")
+        .desc("number of instructions spent in an outer transaction")
+        .flags(Stats::pdf | Stats::dist | Stats::nozero | Stats::nonan)
+        ;
+    m_htm_transaction_abort_cause
+        .init(HtmFailureFaultCause_NumCauses)
+        .name(name() + ".htm_transaction_abort_cause")
+        .desc("cause of htm transaction abort")
+        .flags(Stats::total | Stats::pdf | Stats::dist | Stats::nozero)
+        ;
+
+    for (unsigned i = 0; i < HtmFailureFaultCause_NumCauses; ++i) {
+        m_htm_transaction_abort_cause.subname(
+            i,
+            HtmFailureFaultCauseStrings[i]);
+    }
+
     // These statistical variables are not for display.
     // The profiler will collate these across different
     // sequencers and display those collated statistics.
diff --git a/src/mem/ruby/system/Sequencer.hh b/src/mem/ruby/system/Sequencer.hh
index ebca568..1346970 100644
--- a/src/mem/ruby/system/Sequencer.hh
+++ b/src/mem/ruby/system/Sequencer.hh
@@ -46,6 +46,8 @@
 #include <unordered_map>

 #include "mem/ruby/common/Address.hh"
+#include "mem/ruby/protocol/HtmCallbackMode.hh"
+#include "mem/ruby/protocol/HtmFailedInCacheReason.hh"
 #include "mem/ruby/protocol/MachineType.hh"
 #include "mem/ruby/protocol/RubyRequestType.hh"
 #include "mem/ruby/protocol/SequencerRequestType.hh"
@@ -113,6 +115,12 @@
                       const Cycles forwardRequestTime = Cycles(0),
                       const Cycles firstResponseTime = Cycles(0));

+    // callback to acknowledge HTM requests and
+    // notify cpu core when htm transaction fails in cache
+    void htmCallback(Addr,
+                     const HtmCallbackMode,
+                     const HtmFailedInCacheReason);
+
     RequestStatus makeRequest(PacketPtr pkt) override;
     bool empty() const;
     int outstandingCount() const override { return m_outstanding_count; }
@@ -202,6 +210,10 @@
     Sequencer& operator=(const Sequencer& obj);

   private:
+    // hardware transactional memory
+    Tick m_htmstart_tick;
+    Counter m_htmstart_instruction;
+
     int m_max_outstanding_requests;
     Cycles m_deadlock_threshold;

@@ -218,10 +230,21 @@
     // RequestTable contains both read and write requests, handles aliasing
     std::unordered_map<Addr, std::list<SequencerRequest>> m_RequestTable;

+    // Request queue for hardware transactional memory commands.
+    // These requests do not carry an address, so a deque is used
+    // instead of an address-indexed map.
+    std::deque<SequencerRequest*> m_htmCmdRequestTable;
+
     // Global outstanding request count, across all request tables
     int m_outstanding_count;
     bool m_deadlock_check_scheduled;

+    //! Histogram of cycle latencies of HTM transactions
+    Stats::Histogram m_htm_transaction_cycles;
+    //! Histogram of instruction lengths of HTM transactions
+    Stats::Histogram m_htm_transaction_instructions;
+    //! Causes for HTM transaction aborts
+    Stats::Vector m_htm_transaction_abort_cause;
+
     int m_coreId;

     bool m_runningGarnetStandalone;
@@ -294,6 +317,13 @@
      * @return a boolean indicating if the line address was found.
      */
     bool llscCheckMonitor(const Addr);
+
+
+    /**
+     * Removes all addresses from the local monitor.
+     * This is independent of this Sequencer object's version id.
+     */
+    void llscClearLocalMonitor();
 };

 inline std::ostream&

--
To view, visit https://gem5-review.googlesource.com/c/public/gem5/+/30319

Gerrit-Project: public/gem5
Gerrit-Branch: develop
Gerrit-Change-Id: Icc328df93363486e923b8bd54f4d77741d8f5650
Gerrit-Change-Number: 30319
Gerrit-PatchSet: 1
Gerrit-Owner: Giacomo Travaglini <giacomo.travagl...@arm.com>
Gerrit-Reviewer: Timothy Hayes <timothy.ha...@arm.com>
Gerrit-MessageType: newchange
_______________________________________________
gem5-dev mailing list -- gem5-dev@gem5.org