Re: [gem5-dev] Improved regression categorisation

2014-12-23 Thread Andreas Hansson via gem5-dev
Hi Steve,

The 00.hello tests are below 10 seconds and have too high SNR to even make
it into my report :-), so yes you are right in that they are included in
the ‘short’ regressions.

This is definitely an intermediate step, but in any case we benefit from
having a more sensible classification.

Thanks for the feedback.

Andreas

On 22/12/2014 21:21, Steve Reinhardt via gem5-dev <gem5-dev@gem5.org>
wrote:

Sounds reasonable to me.  I'm not too particular about the naming.

I am surprised that even the o3 hello world tests wouldn't be < 180
seconds though.  It would be nice to have the quick/short/zippy/whatever
test category exercise o3 at least a little bit.

As far as composing regression paths, I agree it's awkward, but in general
I use the util/regress script to run batches of tests, then just copy/paste
the ones that fail if I need to re-run them individually.

Of course, all this should still be considered merely stopgap until we get
a better test system.

Steve



On Mon, Dec 22, 2014 at 12:45 PM, Gabe Black via gem5-dev
<gem5-dev@gem5.org> wrote:

 I mean quick, medium, slow, not quick, medium, fast.

 On Mon, Dec 22, 2014 at 12:44 PM, Gabe Black <gabebl...@google.com>
 wrote:

  I complained about those names a long time ago, and I still think they
  aren't very good. quick and long aren't really on the same scale, to
  start with. Something can be quick (a rate) and still take a long time.
  Medium is very generic and so isn't on a different axis, but since the
  others aren't lined up it's not as clear as it could be. I would suggest
  either:
 
  short, medium, long
 
  or
 
  quick, medium, fast
 
  Preferably the first. We have another collection of options the second
  would collide with, namely fast, opt, debug, etc.
 
  If somebody new came along and saw there were fast/quick and opt/long
  regressions, it wouldn't be obvious what that meant. I also think it's
  not easy to compose one of those regression paths since I can never
  remember what all the parts are or what order they go in and it's not
  documented anywhere obvious. That's a separate problem though.
 
  Gabe
 
  On Mon, Dec 22, 2014 at 2:39 AM, Andreas Hansson via gem5-dev
  <gem5-dev@gem5.org> wrote:
 
  Hi all,
 
  At the moment we run roughly 120 regressions, and divide them into quick
  and long somewhat arbitrarily. Anyone doing active development and using
  quick as their “quick” way of checking that nothing is broken has to wait
  more than 10 minutes for some of these regressions to finish, which seems
  a bit of a stretch. It turns out the actual regression run times follow an
  exponential distribution, ranging from a few seconds up to 10k seconds
  (almost 3 hours). I propose we also start using medium (mentioned in a few
  places), and use a slightly more structured approach in dividing them up
  into quick, medium and long.
 
  Here is what I propose:
 
  Quick – anything below 180 seconds, resulting in roughly 40 regressions
  across all ISAs. The turnaround for a quick regression run for NULL,
  ALPHA, ARM and X86 (what I would deem the minimum to run) should thus be
  below 5 minutes of wall-clock time. Note that there are plenty of
  configurations not covered by this (o3, realview64 etc).
 
  Medium – anything above 180 seconds, but below 1800 seconds, also
  resulting in roughly 40 regressions.
 
  Long – anything above 1800 seconds.
 
  With this split, quick could be used as part of any development, to get
  an indication that everything is ok. For a sensible coverage before
  posting any patch, quick and medium should do the job. The cronjobs we
  have running at the moment could thus do 'quick,medium' for the daily one,
  and 'quick,medium,long' for the weekly one.
 
  Thoughts? Ideas? Additional comments?
 
  Thanks,
 
  Andreas
 
 
  -- IMPORTANT NOTICE: The contents of this email and any attachments are
  confidential and may also be privileged. If you are not the intended
  recipient, please notify the sender immediately and do not disclose the
  contents to any other person, use it for any purpose, or store or copy
  the information in any medium. Thank you.
 
  ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
  Registered in England & Wales, Company No: 2557590
  ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
  Registered in England & Wales, Company No: 2548782
  ___
  gem5-dev mailing list
  gem5-dev@gem5.org
  http://m5sim.org/mailman/listinfo/gem5-dev
 
 
 


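The split proposed in the thread above (quick below 180 seconds, medium between 180 and 1800 seconds, long above 1800 seconds) can be sketched in a few lines of Python. This is only an illustration and not part of the gem5 regression scripts: the function names are made up, the run times in the example are invented rather than measured, and the treatment of a run time of exactly 180 or 1800 seconds is an assumption, since the proposal leaves both boundaries open.

```python
# Thresholds in seconds, taken from the proposal in the thread.
QUICK_LIMIT_S = 180
MEDIUM_LIMIT_S = 1800


def categorise(runtime_s):
    """Return 'quick', 'medium' or 'long' for a run time in seconds.

    Assumption: exactly 180 s counts as medium and exactly 1800 s as
    long; the proposal does not say which side the boundaries fall on.
    """
    if runtime_s < QUICK_LIMIT_S:
        return "quick"
    if runtime_s < MEDIUM_LIMIT_S:
        return "medium"
    return "long"


def split_by_category(runtimes):
    """Group a {test_name: runtime_s} mapping into the three categories."""
    groups = {"quick": [], "medium": [], "long": []}
    for name, runtime_s in sorted(runtimes.items()):
        groups[categorise(runtime_s)].append(name)
    return groups


# Hypothetical test names and run times, for illustration only.
example = {"test_a": 8, "test_b": 900, "test_c": 9500}
groups = split_by_category(example)
```

With such a mapping in hand, a daily job would run the union of the quick and medium groups and a weekly job all three, as suggested in the proposal.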

[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2014-12-23 Thread Cron Daemon via gem5-dev
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-atomic passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-atomic passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/inorder-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/o3-timing passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/minor-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/minor-timing passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest-filter passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-dram-ctrl passed.
* build/ALPHA/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-simple-mem passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/simple-atomic passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/o3-timing passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/inorder-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/o3-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing passed.
* 
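Every path in the cron output above has the same ten slash-separated parts, which is the ordering that is hard to remember when composing a regression target by hand. The Python sketch below splits such a path into named fields and puts it back together. The field names (build_target, variant, category, mode, test, isa, os, config) are labels chosen here for illustration and are not taken from the gem5 test scripts; in particular the "os" slot also holds values such as eio and none in the listing above.

```python
from collections import namedtuple

# Field names are illustrative labels, not gem5 terminology.
RegressionPath = namedtuple(
    "RegressionPath",
    ["build_target", "variant", "category", "mode",
     "test", "isa", "os", "config"],
)


def parse_regression_path(path):
    """Split 'build/<target>/tests/<variant>/<category>/...' into fields."""
    parts = path.strip().split("/")
    if len(parts) != 10 or parts[0] != "build" or parts[2] != "tests":
        raise ValueError("unexpected regression path: %r" % path)
    # parts[1] is the build target; parts[3:] are the remaining 7 fields.
    return RegressionPath(parts[1], *parts[3:])


def compose_regression_path(p):
    """Inverse of parse_regression_path."""
    return "/".join(["build", p.build_target, "tests", p.variant,
                     p.category, p.mode, p.test, p.isa, p.os, p.config])


sample = "build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-atomic"
parsed = parse_regression_path(sample)
```

Swapping a single field, for example parsed._replace(variant="fast"), and recomposing gives the target for the same test under a different binary variant.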

Re: [gem5-dev] Review Request 2511: dev: cirrus: Add a simplified device model for the cirrus graphics device.

2014-12-23 Thread Andreas Hansson via gem5-dev

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2511/#review5708
---

Ship it!


Looks fine. Could you mark the issues that are fixed as fixed (or dropped for 
that matter)?

Thanks.

I am still not sure if I like the USE_KVM better, or perhaps having a NullKvm 
object.

- Andreas Hansson


On Dec. 23, 2014, 1:04 a.m., Gabe Black wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2511/
 ---
 
 (Updated Dec. 23, 2014, 1:04 a.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10608:7c8363f44c5b
 ---
 dev: cirrus: Add a simplified device model for the cirrus graphics device.
 
 All control register accesses are dropped on the floor. If used with KVM, the
 frame buffer is set up as a memory like region to keep performance from
 tanking. If a VNC server is configured, the buffer is marked dirty once every
 simulated 100ms.
 
 
 Diffs
 -
 
   src/dev/Cirrus.py PRE-CREATION 
   src/dev/SConscript b3ea7444f4f020332d1c6fe8635aa81f719a 
   src/dev/cirrus.hh PRE-CREATION 
   src/dev/cirrus.cc PRE-CREATION 
 
 Diff: http://reviews.gem5.org/r/2511/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Gabe Black
 




[gem5-dev] changeset in gem5: arm: Add support for filtering in the PMU

2014-12-23 Thread Andreas Sandberg via gem5-dev
changeset ae5582819481 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ae5582819481
description:
arm: Add support for filtering in the PMU

This patch adds support for filtering events in the PMU. In order to
do so, it updates the ISADevice base class to forward an ISA pointer
to ISA devices. This enables such devices to access the MiscReg file
to determine the current execution level.

diffstat:

 src/arch/arm/isa.cc        |   3 ++
 src/arch/arm/isa_device.cc |  13
 src/arch/arm/isa_device.hh |   9 +++-
 src/arch/arm/pmu.cc        |  49 +++--
 src/arch/arm/pmu.hh        |  13 +++-
 5 files changed, 78 insertions(+), 9 deletions(-)

diffs (196 lines):

diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa.cc
--- a/src/arch/arm/isa.cc   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -139,6 +139,9 @@
 if (!pmu)
 pmu = dummyDevice;
 
+// Give all ISA devices a pointer to this ISA
+pmu->setISA(this);
+
 system = dynamic_cast<ArmSystem *>(p->system);
 DPRINTFN("ISA system set to: %p %p\n", system, p->system);
 
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa_device.cc
--- a/src/arch/arm/isa_device.cc   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa_device.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -44,6 +44,19 @@
 namespace ArmISA
 {
 
+BaseISADevice::BaseISADevice()
+: isa(nullptr)
+{
+}
+
+void
+BaseISADevice::setISA(ISA *_isa)
+{
+assert(_isa);
+
+isa = _isa;
+}
+
 void
 DummyISADevice::setMiscReg(int misc_reg, MiscReg val)
 {
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa_device.hh
--- a/src/arch/arm/isa_device.hh   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa_device.hh   Tue Dec 23 09:31:17 2014 -0500
@@ -46,6 +46,8 @@
 namespace ArmISA
 {
 
+class ISA;
+
 /**
  * Base class for devices that use the MiscReg interfaces.
  *
@@ -56,9 +58,11 @@
 class BaseISADevice
 {
   public:
-BaseISADevice() {}
+BaseISADevice();
 virtual ~BaseISADevice() {}
 
+virtual void setISA(ISA *isa);
+
 /**
  * Write to a system register belonging to this device.
  *
@@ -74,6 +78,9 @@
  * @return Register value.
  */
 virtual MiscReg readMiscReg(int misc_reg) = 0;
+
+  protected:
+ISA *isa;
 };
 
 /**
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/pmu.cc
--- a/src/arch/arm/pmu.cc   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/pmu.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -41,6 +41,8 @@
 
 #include "arch/arm/pmu.hh"
 
+#include "arch/arm/isa.hh"
+#include "arch/arm/utility.hh"
 #include "base/trace.hh"
 #include "cpu/base.hh"
 #include "debug/Checkpoint.hh"
@@ -350,12 +352,44 @@
 }
 }
 
+bool
+PMU::isFiltered(const CounterState &ctr) const
+{
+assert(isa);
+
+const PMEVTYPER_t filter(ctr.filter);
+const SCR scr(isa->readMiscRegNoEffect(MISCREG_SCR));
+const CPSR cpsr(isa->readMiscRegNoEffect(MISCREG_CPSR));
+const ExceptionLevel el(opModeToEL((OperatingMode)(uint8_t)cpsr.mode));
+const bool secure(inSecureState(scr, cpsr));
+
+switch (el) {
+  case EL0:
+return secure ? filter.u : (filter.u != filter.nsu);
+
+  case EL1:
+return secure ? filter.p : (filter.p != filter.nsk);
+
+  case EL2:
+return !filter.nsh;
+
+  case EL3:
+return filter.p != filter.m;
+
+  default:
+panic("Unexpected execution level in PMU::isFiltered.\n");
+}
+}
+
 void
 PMU::handleEvent(CounterId id, uint64_t delta)
 {
 CounterState &ctr(getCounter(id));
 const bool overflowed(reg_pmovsr & (1 << id));
 
+if (isFiltered(ctr))
+return;
+
 // Handle the count every 64 cycles mode
 if (id == PMCCNTR && reg_pmcr.d) {
 clock_remainder += delta;
@@ -434,9 +468,8 @@
 return 0;
 
 const CounterState &cs(getCounter(id));
-PMEVTYPER_t type(0);
+PMEVTYPER_t type(cs.filter);
 
-// TODO: Re-create filtering settings from counter state
 type.evtCount = cs.eventId;
 
 return type;
@@ -453,12 +486,14 @@
 }
 
 CounterState &ctr(getCounter(id));
-// TODO: Handle filtering (both for general purpose counters and
-// the cycle counter)
+const EventTypeId old_event_id(ctr.eventId);
 
-// If PMCCNTR Register, do not change event type. PMCCNTR can count
-// processor cycles only.
-if (id != PMCCNTR) {
+ctr.filter = val;
+
+// If PMCCNTR Register, do not change event type. PMCCNTR can
+// count processor cycles only. If we change the event type, we
+// need to update the probes the counter is using.
+if (id != PMCCNTR && old_event_id != val.evtCount) {
 ctr.eventId = val.evtCount;
 updateCounter(reg_pmselr.sel, ctr);
 }
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/pmu.hh
--- a/src/arch/arm/pmu.hh   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/pmu.hh   
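The filtering rule added in PMU::isFiltered above can be restated as a small truth-table function. The Python sketch below mirrors the switch statement from the diff using the same filter bit names (u, p, nsu, nsk, nsh, m); the Filter tuple, the EL constants and the function name are stand-ins invented for this illustration and are not gem5 types. As in the diff, a return value of True means the event is not counted.

```python
from collections import namedtuple

# Stand-in for the PMEVTYPER filter bits read in the diff; not a gem5 type.
Filter = namedtuple("Filter", ["u", "p", "nsu", "nsk", "nsh", "m"])

EL0, EL1, EL2, EL3 = range(4)


def is_filtered(el, secure, f):
    """Mirror of the switch in PMU::isFiltered: True means 'do not count'."""
    if el == EL0:
        return f.u if secure else (f.u != f.nsu)
    if el == EL1:
        return f.p if secure else (f.p != f.nsk)
    if el == EL2:
        return not f.nsh
    if el == EL3:
        return f.p != f.m
    raise ValueError("unexpected exception level: %r" % (el,))


# With every filter bit clear, EL0/EL1/EL3 events are counted, EL2 is not.
no_filter = Filter(u=False, p=False, nsu=False, nsk=False, nsh=False, m=False)
# Setting u alone filters EL0 in both states; adding nsu re-enables
# counting in the non-secure state only.
user_off = no_filter._replace(u=True)
user_off_secure_only = no_filter._replace(u=True, nsu=True)
```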

[gem5-dev] changeset in gem5: arm: Clean up and document decoder API

2014-12-23 Thread Andreas Sandberg via gem5-dev
changeset 5fae03bd840a in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5fae03bd840a
description:
arm: Clean up and document decoder API

This changeset adds more documentation to the ArmISA::Decoder class
and restructures it slightly to make API groups more obvious.

diffstat:

 src/arch/arm/decoder.cc |   52 +++-
 src/arch/arm/decoder.hh |  197 +++
 2 files changed, 162 insertions(+), 87 deletions(-)

diffs (truncated from 302 to 300 lines):

diff -r ae5582819481 -r 5fae03bd840a src/arch/arm/decoder.cc
--- a/src/arch/arm/decoder.cc   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/decoder.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2012-2013 ARM Limited
+ * Copyright (c) 2012-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -51,6 +51,23 @@
 
 GenericISA::BasicDecodeCache Decoder::defaultCache;
 
+Decoder::Decoder()
+: data(0), fpscrLen(0), fpscrStride(0)
+{
+reset();
+}
+
+void
+Decoder::reset()
+{
+bigThumb = false;
+offset = 0;
+emi = 0;
+instDone = false;
+outOfBytes = true;
+foundIt = false;
+}
+
 void
 Decoder::process()
 {
@@ -118,8 +135,15 @@
 }
 }
 
-//Use this to give data to the decoder. This should be used
-//when there is control flow.
+void
+Decoder::consumeBytes(int numBytes)
+{
+offset += numBytes;
+assert(offset <= sizeof(MachInst));
+if (offset == sizeof(MachInst))
+outOfBytes = true;
+}
+
 void
 Decoder::moreBytes(const PCState &pc, Addr fetchPC, MachInst inst)
 {
@@ -134,4 +158,26 @@
 process();
 }
 
+StaticInstPtr
+Decoder::decode(ArmISA::PCState &pc)
+{
+if (!instDone)
+return NULL;
+
+const int inst_size((!emi.thumb || emi.bigThumb) ? 4 : 2);
+ExtMachInst this_emi(emi);
+
+pc.npc(pc.pc() + inst_size);
+if (foundIt)
+pc.nextItstate(itBits);
+this_emi.itstate = pc.itstate();
+pc.size(inst_size);
+
+emi = 0;
+instDone = false;
+foundIt = false;
+
+return decode(this_emi, pc.instAddr());
 }
+
+}
diff -r ae5582819481 -r 5fae03bd840a src/arch/arm/decoder.hh
--- a/src/arch/arm/decoder.hh   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/decoder.hh   Tue Dec 23 09:31:17 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013 ARM Limited
+ * Copyright (c) 2013-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -70,100 +70,129 @@
 int fpscrLen;
 int fpscrStride;
 
-  public:
-void reset()
+/// A cache of decoded instruction objects.
+static GenericISA::BasicDecodeCache defaultCache;
+
+/**
+ * Pre-decode an instruction from the current state of the
+ * decoder.
+ */
+void process();
+
+/**
+ * Consume bytes by moving the offset into the data word and
+ * sanity check the results.
+ */
+void consumeBytes(int numBytes);
+
+  public: // Decoder API
+Decoder();
+
+/** Reset the decoder's internal state. */
+void reset();
+
+/**
+ * Can the decoder accept more data?
+ *
+ * A CPU model uses this method to determine if the decoder can
+ * accept more data. Note that an instruction can be ready (see
+ * instReady()) even if this method returns true.
+ */
+bool needMoreBytes() const { return outOfBytes; }
+
+/**
+ * Is an instruction ready to be decoded?
+ *
+ * CPU models call this method to determine if decode() will
+ * return a new instruction on the next call. It typically only
+ * returns false if the decoder hasn't received enough data to
+ * decode a full instruction.
+ */
+bool instReady() const { return instDone; }
+
+/**
+ * Feed data to the decoder.
+ *
+ * A CPU model uses this interface to load instruction data into
+ * the decoder. Once enough data has been loaded (check with
+ * instReady()), a decoded instruction can be retrieved using
+ * decode(ArmISA::PCState).
+ *
+ * This method is intended to support both fixed-length and
+ * variable-length instructions. Instruction data is fetched in
+ * MachInst blocks (which correspond to the size of a typical
+ * instruction). The method might need to be called multiple times
+ * if the instruction spans multiple blocks, in that case
+ * needMoreBytes() will return true and instReady() will return
+ * false.
+ *
+ * The fetchPC parameter is used to indicate where in memory the
+ * instruction was fetched from. This should be the same
+ * address as the pc. If fetching multiple blocks, it indicates
+ * where subsequent blocks are fetched from (pc + n *
+ * sizeof(MachInst)).
+ *
+ * @param pc Instruction pointer that we are decoding.
+ * @param fetchPC The address this chunk was fetched from.
+ * @param inst Raw 
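Two pieces of the decoder changeset above are small enough to model directly: the offset bookkeeping in Decoder::consumeBytes and the instruction-size choice in Decoder::decode. The Python sketch below restates only those two pieces. It assumes sizeof(MachInst) is 4 bytes, starts from the state just after a fresh block has been fed, and uses class and function names invented for this illustration; process() and moreBytes() are not modelled because their bodies are not shown in the diff.

```python
# Assumption: one MachInst fetch block is 4 bytes wide.
MACH_INST_SIZE = 4


def inst_size(thumb, big_thumb):
    """Size selection from Decoder::decode: 2 bytes only for 16-bit Thumb."""
    return 4 if (not thumb or big_thumb) else 2


class OffsetTracker:
    """Models the offset/outOfBytes bookkeeping of Decoder::consumeBytes."""

    def __init__(self):
        # State as if a fresh block had just been handed to the decoder.
        self.offset = 0
        self.out_of_bytes = False

    def consume_bytes(self, num_bytes):
        self.offset += num_bytes
        # Same sanity check as the assert in the diff.
        assert self.offset <= MACH_INST_SIZE
        if self.offset == MACH_INST_SIZE:
            self.out_of_bytes = True


# Two 16-bit Thumb instructions fit in one block; the block is exhausted
# only after the second one has been consumed.
tracker = OffsetTracker()
tracker.consume_bytes(inst_size(thumb=True, big_thumb=False))
after_first = tracker.out_of_bytes
tracker.consume_bytes(inst_size(thumb=True, big_thumb=False))
after_second = tracker.out_of_bytes
```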

[gem5-dev] changeset in gem5: config: Add --memchecker option

2014-12-23 Thread Marco Elver via gem5-dev
changeset 9d0aef7a9b2e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=9d0aef7a9b2e
description:
config: Add --memchecker option

This patch adds the --memchecker option, to denote that a MemChecker
should be instantiated for the system. The exact usage of the MemChecker
depends on the system configuration.

For now CacheConfig.py makes use of the option, adding MemCheckerMonitor
instances between CPUs and D-Caches.

Note, however, that currently this only provides limited checking on a
running system; other parts of the system, such as I/O devices are not
monitored, and may cause warnings to be issued by the monitor.

diffstat:

 configs/common/CacheConfig.py |  25 +
 configs/common/Options.py     |   2 ++
 2 files changed, 27 insertions(+), 0 deletions(-)

diffs (61 lines):

diff -r 6332c9d471a8 -r 9d0aef7a9b2e configs/common/CacheConfig.py
--- a/configs/common/CacheConfig.py Tue Dec 23 09:31:17 2014 -0500
+++ b/configs/common/CacheConfig.py Tue Dec 23 09:31:18 2014 -0500
@@ -76,6 +76,9 @@
         system.l2.cpu_side = system.tol2bus.master
         system.l2.mem_side = system.membus.slave
 
+    if options.memchecker:
+        system.memchecker = MemChecker()
+
     for i in xrange(options.num_cpus):
         if options.caches:
             icache = icache_class(size=options.l1i_size,
@@ -83,6 +86,21 @@
             dcache = dcache_class(size=options.l1d_size,
                                   assoc=options.l1d_assoc)
 
+            if options.memchecker:
+                dcache_mon = MemCheckerMonitor(warn_only=True)
+                dcache_real = dcache
+
+                # Do not pass the memchecker into the constructor of
+                # MemCheckerMonitor, as it would create a copy; we require
+                # exactly one MemChecker instance.
+                dcache_mon.memchecker = system.memchecker
+
+                # Connect monitor
+                dcache_mon.mem_side = dcache.cpu_side
+
+                # Let CPU connect to monitors
+                dcache = dcache_mon
+
             # When connecting the caches, the clock is also inherited
             # from the CPU in question
             if buildEnv['TARGET_ISA'] == 'x86':
@@ -91,6 +109,13 @@
                                                       PageTableWalkerCache())
             else:
                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
+
+            if options.memchecker:
+                # The mem_side ports of the caches haven't been connected yet.
+                # Make sure connectAllPorts connects the right objects.
+                system.cpu[i].dcache = dcache_real
+                system.cpu[i].dcache_mon = dcache_mon
+
         system.cpu[i].createInterruptController()
         if options.l2cache:
             system.cpu[i].connectAllPorts(system.tol2bus, system.membus)
diff -r 6332c9d471a8 -r 9d0aef7a9b2e configs/common/Options.py
--- a/configs/common/Options.py Tue Dec 23 09:31:17 2014 -0500
+++ b/configs/common/Options.py Tue Dec 23 09:31:18 2014 -0500
@@ -97,6 +97,8 @@
     parser.add_option("-l", "--lpae", action="store_true")
     parser.add_option("-V", "--virtualisation", action="store_true")
 
+    parser.add_option("--memchecker", action="store_true")
+
     # Cache Options
     parser.add_option("--caches", action="store_true")
     parser.add_option("--l2cache", action="store_true")

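The add_option calls in the Options.py hunk above follow the interface of Python's standard optparse module. As a rough, stand-alone illustration of how the new flag behaves, the sketch below registers three of the same options on a plain OptionParser and inspects the result. It is not the gem5 option setup itself: build_parser and parse are helper names made up here, and only options that appear in the diff are registered.

```python
from optparse import OptionParser


def build_parser():
    """Register a few of the options seen in the Options.py hunk."""
    parser = OptionParser()
    parser.add_option("--memchecker", action="store_true")
    parser.add_option("--caches", action="store_true")
    parser.add_option("--l2cache", action="store_true")
    return parser


def parse(argv):
    """Parse an explicit argument list and return the options object."""
    options, _args = build_parser().parse_args(argv)
    return options


with_checker = parse(["--caches", "--memchecker"])
without_checker = parse(["--caches"])
```

Because no default is given, options.memchecker is None rather than False when the flag is absent, which is still falsy in tests such as "if options.memchecker:" in the CacheConfig.py hunk.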

[gem5-dev] changeset in gem5: mem: Add MemChecker and MemCheckerMonitor

2014-12-23 Thread Marco Elver via gem5-dev
changeset 6332c9d471a8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6332c9d471a8
description:
mem: Add MemChecker and MemCheckerMonitor

This patch adds the MemChecker and MemCheckerMonitor classes. While
MemChecker can be integrated anywhere in the system and is independent,
the most convenient usage is through the MemCheckerMonitor -- this
however, puts limitations on where the MemChecker is able to observe
read/write transactions.

diffstat:

 src/mem/MemChecker.py          |   58
 src/mem/SConscript             |    7 +
 src/mem/mem_checker.cc         |  343
 src/mem/mem_checker.hh         |  568 +
 src/mem/mem_checker_monitor.cc |  374 ++
 src/mem/mem_checker_monitor.hh |  240 +
 6 files changed, 1590 insertions(+), 0 deletions(-)

diffs (truncated from 1624 to 300 lines):

diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/MemChecker.py
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/MemChecker.py Tue Dec 23 09:31:17 2014 -0500
@@ -0,0 +1,58 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved.
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder.  You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# Authors: Marco Elver
+
+from MemObject import MemObject
+from m5.SimObject import SimObject
+from m5.params import *
+from m5.proxy import *
+
+class MemChecker(SimObject):
+    type = 'MemChecker'
+    cxx_header = "mem/mem_checker.hh"
+
+class MemCheckerMonitor(MemObject):
+    type = 'MemCheckerMonitor'
+    cxx_header = "mem/mem_checker_monitor.hh"
+
+    # one port in each direction
+    master = MasterPort("Master port")
+    slave = SlavePort("Slave port")
+    cpu_side = SlavePort("Alias for slave")
+    mem_side = MasterPort("Alias for master")
+    warn_only = Param.Bool(False, "Warn about violations only")
+    memchecker = Param.MemChecker("Instance shared with other monitors")
+
diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/SConscript
--- a/src/mem/SConscript   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/mem/SConscript   Tue Dec 23 09:31:17 2014 -0500
@@ -79,6 +79,10 @@
 Source('dramsim2_wrapper.cc')
 Source('dramsim2.cc')
 
+SimObject('MemChecker.py')
+Source('mem_checker.cc')
+Source('mem_checker_monitor.cc')
+
 DebugFlag('AddrRanges')
 DebugFlag('BaseXBar')
 DebugFlag('CoherentXBar')
@@ -99,3 +103,6 @@
 DebugFlag('PacketQueue')
 
 DebugFlag("DRAMSim2")
+
+DebugFlag("MemChecker")
+DebugFlag("MemCheckerMonitor")
diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/mem_checker.cc
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/mem_checker.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -0,0 +1,343 @@
+/*
+ * Copyright (c) 2014 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware 

[gem5-dev] changeset in gem5: mem: Add a stack distance calculator

2014-12-23 Thread Kanishk Sugand via gem5-dev
changeset da37aec3ed1a in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=da37aec3ed1a
description:
mem: Add a stack distance calculator

This patch adds a stand-alone stack distance calculator. The stack
distance calculator is a passive SimObject that observes the addresses
passed to it. It calculates stack distances (LRU Distances) of
incoming addresses based on the partial sum hierarchy tree algorithm
described by Almasi et al. http://doi.acm.org/10.1145/773039.773043.

For each transaction a hashtable look-up is performed. At every
non-unique transaction the tree is traversed from the leaf at the
returned index to the root, the old node is deleted from the tree, and
the sums (to the right) are collected and decremented. The collected
sum represents the stack distance of the found node. At every unique
transaction the stack distance is returned as
numeric_limits<uint64>::max().

In addition to the basic stack distance calculation, a feature to mark
an old node in the tree is added. This is useful if it is required to
see the reuse pattern. For example, Writebacks to the lower level
(e.g. membus from L2), can be marked instead of being removed from the
stack (isMarked flag of Node set to True). And then later if this same
address is accessed (by L1), the value of the isMarked flag would be
True. This gives some insight on how the Writeback policy of the
lower level affects the read/write accesses in an application.

Debugging is enabled by setting the verify flag to true. Debugging is
implemented using a dummy stack that behaves in a naive way, using STL
vectors. Note that this has a large impact on run time.
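To make the notion of stack distance concrete, the following Python sketch computes it the naive way, in the spirit of the verification stack mentioned above. It is not the partial sum hierarchy tree algorithm the patch implements, and it is not a translation of the gem5 code. The class name, the INFINITE constant (standing in for the maximum uint64 value returned for unique accesses) and the convention that an immediate re-access has distance 0 are all assumptions made for this illustration.

```python
# Stands in for the maximum 64-bit value returned for unique accesses.
INFINITE = 2**64 - 1


class NaiveStackDist:
    """LRU stack kept as a plain list; the most recent address is last."""

    def __init__(self):
        self.stack = []

    def access(self, addr):
        """Return the stack distance of addr and move it to the top."""
        if addr in self.stack:
            idx = self.stack.index(addr)
            # Number of distinct other addresses touched since the last
            # access to addr (assumed convention: immediate reuse is 0).
            dist = len(self.stack) - 1 - idx
            del self.stack[idx]
        else:
            dist = INFINITE
        self.stack.append(addr)
        return dist


calc = NaiveStackDist()
distances = [calc.access(a) for a in ["a", "b", "c", "a", "a", "b"]]
```

Each access costs time linear in the number of distinct addresses seen so far, which is consistent with the note above that the verification mode has a large impact on run time.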

diffstat:

 src/mem/SConscript |4 +-
 src/mem/StackDistCalc.py   |   54 +++
 src/mem/stack_dist_calc.cc |  670 +
 src/mem/stack_dist_calc.hh |  454 ++
 4 files changed, 1181 insertions(+), 1 deletions(-)

diffs (truncated from 1218 to 300 lines):

diff -r 9d0aef7a9b2e -r da37aec3ed1a src/mem/SConscript
--- a/src/mem/SConscript   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/SConscript   Tue Dec 23 09:31:18 2014 -0500
@@ -44,6 +44,7 @@
 SimObject('ExternalSlave.py')
 SimObject('MemObject.py')
 SimObject('SimpleMemory.py')
+SimObject('StackDistCalc.py')
 SimObject('XBar.py')
 
 Source('abstract_mem.cc')
@@ -64,6 +65,7 @@
 Source('physical.cc')
 Source('simple_mem.cc')
 Source('snoop_filter.cc')
+Source('stack_dist_calc.cc')
 Source('tport.cc')
 Source('xbar.cc')
 
@@ -101,7 +103,7 @@
 DebugFlag('MMU')
 DebugFlag('MemoryAccess')
 DebugFlag('PacketQueue')
-
+DebugFlag('StackDist')
 DebugFlag("DRAMSim2")
 
 DebugFlag("MemChecker")
diff -r 9d0aef7a9b2e -r da37aec3ed1a src/mem/StackDistCalc.py
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/StackDistCalc.py  Tue Dec 23 09:31:18 2014 -0500
@@ -0,0 +1,54 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved.
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder.  You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, 

[gem5-dev] changeset in gem5: arm: Raise an alignment fault if a PC has ill...

2014-12-23 Thread Andreas Sandberg via gem5-dev
changeset 3bba9f2d0c7d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=3bba9f2d0c7d
description:
arm: Raise an alignment fault if a PC has illegal alignment

We currently don't handle unaligned PCs correctly. There is one check
for unaligned PCs in the TLB when running in aarch64 mode, but this
check does not cover cases where the CPU does not do a TLB lookup when
decoding an instruction (e.g., a branch stays within the same cache
line). Additionally, the Decoder class sometimes throws an assertion
for unaligned PCs which breaks speculation.

This changeset introduces a decoder fault bit field in the ExtMachInst
structure. This field can be used to signal a decoder failure. If set,
the decoder generates an internal gem5fault instruction instead of a
normal instruction. This instruction in turn either panics (fault
type PANIC), returns a PCAlignmentFault (fault type UNALIGNED,
aarch64) or a PrefetchAbort (fault type UNALIGNED, aarch32).
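
As a rough illustration of the behaviour described above, the alignment check and the fault selection could be modelled as follows. The masks (0x1 for Thumb, 0x3 otherwise) come from the decoder.cc hunk in this changeset; the function names and the string return values are hypothetical.

```python
def decoder_fault(pc, thumb):
    """Classify a PC as aligned or not, mirroring the mask used in the decoder."""
    mask = 0x1 if thumb else 0x3   # Thumb needs 2-byte, otherwise 4-byte alignment
    return "UNALIGNED" if pc & mask else "OK"


def resulting_action(fault, aarch64):
    """Map a decoder fault type to what the pseudo instruction does."""
    if fault == "PANIC":
        return "panic"
    if fault == "UNALIGNED":
        return "PCAlignmentFault" if aarch64 else "PrefetchAbort"
    return "decode normally"
```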

The patch causes minor changes to the realview64 regressions, and a
stats bump will follow.

diffstat:

 src/arch/arm/SConscript  |1 +
 src/arch/arm/decoder.cc  |6 +-
 src/arch/arm/insts/pseudo.cc |  101 +++
 src/arch/arm/insts/pseudo.hh |   61 +
 src/arch/arm/isa/bitfields.isa   |2 +
 src/arch/arm/isa/decoder/decoder.isa |   20 +++---
 src/arch/arm/isa/formats/formats.isa |3 +
 src/arch/arm/isa/formats/pseudo.isa  |   44 +++
 src/arch/arm/isa/includes.isa|1 +
 src/arch/arm/tlb.cc  |5 -
 src/arch/arm/types.hh|   14 -
 11 files changed, 242 insertions(+), 16 deletions(-)

diffs (truncated from 360 to 300 lines):

diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/SConscript
--- a/src/arch/arm/SConscript   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/SConscript   Tue Dec 23 09:31:17 2014 -0500
@@ -57,6 +57,7 @@
 Source('insts/misc.cc')
 Source('insts/misc64.cc')
 Source('insts/pred_inst.cc')
+Source('insts/pseudo.cc')
 Source('insts/static_inst.cc')
 Source('insts/vfp.cc')
 Source('insts/fplib.cc')
diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/decoder.cc
--- a/src/arch/arm/decoder.cc   Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/decoder.cc   Tue Dec 23 09:31:17 2014 -0500
@@ -139,7 +139,7 @@
 Decoder::consumeBytes(int numBytes)
 {
 offset += numBytes;
-assert(offset <= sizeof(MachInst));
+assert(offset <= sizeof(MachInst) || emi.decoderFault);
 if (offset == sizeof(MachInst))
 outOfBytes = true;
 }
@@ -154,6 +154,10 @@
 emi.fpscrLen = fpscrLen;
 emi.fpscrStride = fpscrStride;
 
+const Addr alignment(pc.thumb() ? 0x1 : 0x3);
+emi.decoderFault = static_cast<uint8_t>(
+pc.instAddr() & alignment ? DecoderFault::UNALIGNED : DecoderFault::OK);
+
 outOfBytes = false;
 process();
 }
diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/insts/pseudo.cc
--- /dev/null   Thu Jan 01 00:00:00 1970 +0000
+++ b/src/arch/arm/insts/pseudo.cc  Tue Dec 23 09:31:17 2014 -0500
@@ -0,0 +1,101 @@
+/*
+ * Copyright (c) 2014 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met: redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer;
+ * redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution;
+ * neither the name of the copyright holders nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR 

[gem5-dev] changeset in gem5: config: Add options to take/resume from SimPo...

2014-12-23 Thread Dam Sunwoo via gem5-dev
changeset 427f988fe6e5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=427f988fe6e5
description:
config: Add options to take/resume from SimPoint checkpoints

More documentation at http://gem5.org/Simpoints

Steps to profile, generate, and use SimPoints with gem5:

1. To profile workload and generate SimPoint BBV file, use the
following option:

--simpoint-profile --simpoint-interval <interval length>

Requires single Atomic CPU and fastmem.
<interval length> is in number of instructions.

2. Generate SimPoint analysis using SimPoint 3.2 from UCSD.
(SimPoint 3.2 not included with this flow.)

3. To take gem5 checkpoints based on SimPoint analysis, use the
following option:

--take-simpoint-checkpoints=<simpoint file path>,<weight file path>,<interval length>,<warmup length>

<simpoint file> and <weight file> are generated by the SimPoint analysis
tool from UCSD. SimPoint 3.2 format expected. <interval length> and
<warmup length> are in number of instructions.

4. To resume from gem5 SimPoint checkpoints, use the following option:

--restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint checkpoint path>

<N> is (SimPoint index + 1). E.g., "-r 1" will resume from SimPoint #0.
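
The restore step can be sketched in standalone Python as below. The directory-name pattern is copied from the Simulation.py hunk in this changeset; the function name, the returned dictionary and the example directory names are hypothetical, and the sketch only mirrors the sorted-list, 1-based selection shown in the diff.

```python
import re

# Directory-name pattern copied from the Simulation.py hunk in this changeset.
CPT_RE = re.compile(r"cpt\.simpoint_(\d+)_inst_(\d+)"
                    r"_weight_([\d\.e\-]+)_interval_(\d+)_warmup_(\d+)")


def select_simpoint_checkpoint(dir_names, cpt_num):
    """Pick the cpt_num-th (1-based) SimPoint checkpoint and decode its name."""
    cpts = sorted(d for d in dir_names if CPT_RE.match(d))
    if cpt_num < 1 or cpt_num > len(cpts):
        raise ValueError("Checkpoint %d not found" % cpt_num)
    name = cpts[cpt_num - 1]
    m = CPT_RE.match(name)
    interval = int(m.group(4))
    warmup = int(m.group(5))
    return {
        "dir": name,
        "index": int(m.group(1)),
        "start_inst": int(m.group(2)),
        "weight": float(m.group(3)),
        "interval": interval,
        "warmup": warmup,
        # Instruction counts at which warmup ends and the interval ends.
        "simpoint_start_insts": [warmup, warmup + interval],
    }
```

Because the list is sorted as strings, the mapping from N to SimPoint index relies on the directory names sorting in index order.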

diffstat:

 configs/common/Options.py|5 +
 configs/common/Simulation.py |  180 ++-
 configs/example/fs.py|9 ++
 3 files changed, 192 insertions(+), 2 deletions(-)

diffs (257 lines):

diff -r b3ea7444f466 -r 427f988fe6e5 configs/common/Options.py
--- a/configs/common/Options.py Mon Dec 22 16:49:24 2014 -0800
+++ b/configs/common/Options.py Tue Dec 23 09:31:17 2014 -0500
@@ -150,6 +150,11 @@
   help="Enable basic block profiling for SimPoints")
 parser.add_option("--simpoint-interval", type="int", default=1000,
   help="SimPoint interval in num of instructions")
+parser.add_option("--take-simpoint-checkpoints", action="store", type="string",
+help="<simpoint file>,<weight file>,<interval-length>,<warmup-length>")
+parser.add_option("--restore-simpoint-checkpoint", action="store_true",
+help="restore from a simpoint checkpoint taken with " +
+"--take-simpoint-checkpoints")
 
 # Checkpointing options
 ###Note that performing checkpointing via python script files will override
diff -r b3ea7444f466 -r 427f988fe6e5 configs/common/Simulation.py
--- a/configs/common/Simulation.py  Mon Dec 22 16:49:24 2014 -0800
+++ b/configs/common/Simulation.py  Tue Dec 23 09:31:17 2014 -0500
@@ -140,9 +140,46 @@
 checkpoint_dir = joinpath(cptdir, "cpt.%s.%s" % (options.bench, inst))
 if not exists(checkpoint_dir):
 fatal("Unable to find checkpoint directory %s", checkpoint_dir)
+
+elif options.restore_simpoint_checkpoint:
+# Restore from SimPoint checkpoints
+# Assumes that the checkpoint dir names are formatted as follows:
+dirs = listdir(cptdir)
+expr = re.compile('cpt\.simpoint_(\d+)_inst_(\d+)' +
+'_weight_([\d\.e\-]+)_interval_(\d+)_warmup_(\d+)')
+cpts = []
+for dir in dirs:
+match = expr.match(dir)
+if match:
+cpts.append(dir)
+cpts.sort()
+
+cpt_num = options.checkpoint_restore
+if cpt_num > len(cpts):
+fatal('Checkpoint %d not found', cpt_num)
+checkpoint_dir = joinpath(cptdir, cpts[cpt_num - 1])
+match = expr.match(cpts[cpt_num - 1])
+if match:
+index = int(match.group(1))
+start_inst = int(match.group(2))
+weight_inst = float(match.group(3))
+interval_length = int(match.group(4))
+warmup_length = int(match.group(5))
+print "Resuming from", checkpoint_dir
+simpoint_start_insts = []
+simpoint_start_insts.append(warmup_length)
+simpoint_start_insts.append(warmup_length + interval_length)
+testsys.cpu[0].simpoint_start_insts = simpoint_start_insts
+if testsys.switch_cpus != None:
+testsys.switch_cpus[0].simpoint_start_insts = simpoint_start_insts
+
+print "Resuming from SimPoint",
+print "#%d, start_inst:%d, weight:%f, interval:%d, warmup:%d" % \
+(index, start_inst, weight_inst, interval_length, warmup_length)
+
 else:
 dirs = listdir(cptdir)
-expr = re.compile('cpt\.([0-9]*)')
+expr = re.compile('cpt\.([0-9]+)')
 cpts = []
 for dir in dirs:
 match = expr.match(dir)
@@ -239,6 +276,131 @@
 
 return exit_event
 
+# Set up environment for taking SimPoint checkpoints
+# Expecting SimPoint files generated by SimPoint 3.2
+def parseSimpointAnalysisFile(options, testsys):
+import re
+
+simpoint_filename, weight_filename, interval_length, warmup_length = \
+   

[gem5-dev] changeset in gem5: mem: Ensure DRAM controller is idle when in a...

2014-12-23 Thread Andreas Hansson via gem5-dev
changeset 6dd27a0e0d23 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6dd27a0e0d23
description:
mem: Ensure DRAM controller is idle when in atomic mode

This patch addresses an issue seen with the KVM CPU where the refresh
events scheduled by the DRAM controller force the simulator to switch
out of the KVM mode, thus killing performance.

The current patch works around the fact that we currently have no
proper API to inform a SimObject of the mode switches. Instead we rely
on drainResume being called after any switch, and cache the previous
mode locally to be able to decide on appropriate actions.
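
The idea of caching the previous mode and acting on the transition can be sketched as a small state machine. This is a simplified model with hypothetical class and attribute names; it tracks only whether refresh events are scheduled and is not the DRAMCtrl implementation.

```python
class RefreshModeTracker:
    """Toy model: remember the last seen memory mode and react to switches."""

    def __init__(self):
        self.is_timing_mode = False      # cached mode from the last call
        self.refresh_scheduled = False   # stands in for the refresh events

    def startup(self, system_is_timing):
        self.is_timing_mode = system_is_timing
        if system_is_timing:
            # Only kick off refresh when timing accuracy is requested.
            self.refresh_scheduled = True

    def drain_resume(self, system_is_timing):
        if not self.is_timing_mode and system_is_timing:
            # Switched into timing mode: behave as on startup.
            self.startup(system_is_timing)
        elif self.is_timing_mode and not system_is_timing:
            # Switched out of timing mode: stop refresh events.
            self.refresh_scheduled = False
        self.is_timing_mode = system_is_timing
```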

The switcheroo regression requires a minor stats bump as a result.

diffstat:

 src/mem/dram_ctrl.cc |  56 +++
 src/mem/dram_ctrl.hh |  15 -
 2 files changed, 56 insertions(+), 15 deletions(-)

diffs (128 lines):

diff -r bb665366cc00 -r 6dd27a0e0d23 src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
@@ -57,7 +57,7 @@
 
 DRAMCtrl::DRAMCtrl(const DRAMCtrlParams* p) :
 AbstractMemory(p),
-port(name() + ".port", *this),
+port(name() + ".port", *this), isTimingMode(false),
 retryRdReq(false), retryWrReq(false),
 busState(READ),
 nextReqEvent(this), respondEvent(this),
@@ -239,20 +239,25 @@
 void
 DRAMCtrl::startup()
 {
-// timestamp offset should be in clock cycles for DRAMPower
-timeStampOffset = divCeil(curTick(), tCK);
+// remember the memory system mode of operation
+isTimingMode = system()->isTimingMode();
 
-// update the start tick for the precharge accounting to the
-// current tick
-for (auto r : ranks) {
-r->startup(curTick() + tREFI - tRP);
+if (isTimingMode) {
+// timestamp offset should be in clock cycles for DRAMPower
+timeStampOffset = divCeil(curTick(), tCK);
+
+// update the start tick for the precharge accounting to the
+// current tick
+for (auto r : ranks) {
+r->startup(curTick() + tREFI - tRP);
+}
+
+// shift the bus busy time sufficiently far ahead that we never
+// have to worry about negative values when computing the time for
+// the next request, this will add an insignificant bubble at the
+// start of simulation
+busBusyUntil = curTick() + tRP + tRCD + tCL;
 }
-
-// shift the bus busy time sufficiently far ahead that we never
-// have to worry about negative values when computing the time for
-// the next request, this will add an insignificant bubble at the
-// start of simulation
-busBusyUntil = curTick() + tRP + tRCD + tCL;
 }
 
 Tick
@@ -1555,6 +1560,12 @@
 }
 
 void
+DRAMCtrl::Rank::suspend()
+{
+deschedule(refreshEvent);
+}
+
+void
 DRAMCtrl::Rank::checkDrainDone()
 {
 // if this rank was waiting to drain it is now able to proceed to
@@ -2197,6 +2208,25 @@
 return count;
 }
 
+void
+DRAMCtrl::drainResume()
+{
+if (!isTimingMode && system()->isTimingMode()) {
+// if we switched to timing mode, kick things into action,
+// and behave as if we restored from a checkpoint
+startup();
+} else if (isTimingMode && !system()->isTimingMode()) {
+// if we switch from timing mode, stop the refresh events to
+// not cause issues with KVM
+for (auto r : ranks) {
+r->suspend();
+}
+}
+
+// update the mode
+isTimingMode = system()->isTimingMode();
+}
+
 DRAMCtrl::MemoryPort::MemoryPort(const std::string& name, DRAMCtrl& _memory)
 : QueuedSlavePort(name, _memory, queue), queue(_memory, *this),
   memory(_memory)
diff -r bb665366cc00 -r 6dd27a0e0d23 src/mem/dram_ctrl.hh
--- a/src/mem/dram_ctrl.hh  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.hh  Tue Dec 23 09:31:18 2014 -0500
@@ -121,6 +121,11 @@
 MemoryPort port;
 
 /**
+ * Remember if the memory system is in timing mode
+ */
+bool isTimingMode;
+
+/**
  * Remember if we have to retry a request when available.
  */
 bool retryRdReq;
@@ -340,6 +345,11 @@
 void startup(Tick ref_tick);
 
 /**
+ * Stop the refresh events.
+ */
+void suspend();
+
+/**
  * Check if the current rank is available for scheduling.
  *
  * @param Return true if the rank is idle from a refresh point of view
@@ -855,8 +865,9 @@
 virtual BaseSlavePort& getSlavePort(const std::string& if_name,
 PortID idx = InvalidPortID);
 
-virtual void init();
-virtual void startup();
+virtual void init() M5_ATTR_OVERRIDE;
+virtual void startup() M5_ATTR_OVERRIDE;
+virtual void drainResume() M5_ATTR_OVERRIDE;
 
   protected:
 

[gem5-dev] changeset in gem5: mem: Rework the structuring of the prefetchers

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset b9646f4546ad in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b9646f4546ad
description:
mem: Rework the structuring of the prefetchers

Re-organizes the prefetcher class structure. Previously the
BasePrefetcher forced multiple assumptions on the prefetchers that
inherited from it. This patch makes the BasePrefetcher class truly
representative of base functionality. For example, the base class no
longer enforces FIFO order. Instead, prefetchers with FIFO requests
(like the existing stride and tagged prefetchers) now inherit from a
new QueuedPrefetcher base class.

Finally, the stride-based prefetcher now assumes a customizable lookup
table (sets/ways) rather than the previous fully associative structure.

diffstat:

 src/mem/cache/cache_impl.hh  |   10 +-
 src/mem/cache/prefetch/Prefetcher.py |   62 ---
 src/mem/cache/prefetch/SConscript|1 +
 src/mem/cache/prefetch/base.cc   |  258 --
 src/mem/cache/prefetch/base.hh   |  139 +-
 src/mem/cache/prefetch/queued.cc |  213 
 src/mem/cache/prefetch/queued.hh |  108 ++
 src/mem/cache/prefetch/stride.cc |  205 +++---
 src/mem/cache/prefetch/stride.hh |   55 +++---
 src/mem/cache/prefetch/tagged.cc |   16 +-
 src/mem/cache/prefetch/tagged.hh |   19 +-
 11 files changed, 599 insertions(+), 487 deletions(-)

diffs (truncated from 1401 to 300 lines):

diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -535,7 +535,7 @@
 bool satisfied = access(pkt, blk, lat, writebacks);
 
 // track time of availability of next prefetch, if any
-Tick next_pf_time = 0;
+Tick next_pf_time = MaxTick;
 
 bool needsResponse = pkt->needsResponse();
 
@@ -548,7 +548,7 @@
 
 // Don't notify on SWPrefetch
 if (!pkt->cmd.isSWPrefetch())
-next_pf_time = prefetcher->notify(pkt, time);
+next_pf_time = prefetcher->notify(pkt);
 }
 
 if (needsResponse) {
@@ -648,7 +648,7 @@
 if (prefetcher) {
 // Don't notify on SWPrefetch
 if (!pkt->cmd.isSWPrefetch())
-next_pf_time = prefetcher->notify(pkt, time);
+next_pf_time = prefetcher->notify(pkt);
 }
 }
 } else {
@@ -688,12 +688,12 @@
 if (prefetcher) {
 // Don't notify on SWPrefetch
 if (!pkt->cmd.isSWPrefetch())
-next_pf_time = prefetcher->notify(pkt, time);
+next_pf_time = prefetcher->notify(pkt);
 }
 }
 }
 
-if (next_pf_time != 0)
+if (next_pf_time != MaxTick)
 requestMemSideBus(Request_PF, std::max(time, next_pf_time));
 
 // copy writebacks to write buffer
diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/prefetch/Prefetcher.py
--- a/src/mem/cache/prefetch/Prefetcher.py  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/prefetch/Prefetcher.py  Tue Dec 23 09:31:18 2014 -0500
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 ARM Limited
+# Copyright (c) 2012, 2014 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -37,6 +37,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Authors: Ron Dreslinski
+#  Mitch Hayenga
 
 from ClockedObject import ClockedObject
 from m5.params import *
@@ -46,39 +47,46 @@
 type = 'BasePrefetcher'
 abstract = True
 cxx_header = "mem/cache/prefetch/base.hh"
-size = Param.Int(100,
- "Number of entries in the hardware prefetch queue")
-cross_pages = Param.Bool(False,
- "Allow prefetches to cross virtual page boundaries")
-serial_squash = Param.Bool(False,
- "Squash prefetches with a later time on a subsequent miss")
-degree = Param.Int(1,
- "Degree of the prefetch depth")
-latency = Param.Cycles('1', "Latency of the prefetcher")
-use_master_id = Param.Bool(True,
- "Use the master id to separate calculations of prefetches")
-data_accesses_only = Param.Bool(False,
- "Only prefetch on data not on instruction accesses")
-on_miss_only = Param.Bool(False,
- "Only prefetch on miss (as opposed to always)")
-on_read_only = Param.Bool(False,
- "Only prefetch on read requests (write requests ignored)")
-on_prefetch = Param.Bool(True,
- "Let lower cache prefetcher train on prefetch requests")
-inst_tagged = Param.Bool(True,
- "Perform a tagged prefetch for instruction fetches always")
 sys = Param.System(Parent.any, "System this prefetcher belongs to")
 
-class 

[gem5-dev] changeset in gem5: mem: Add rank-wise refresh to the DRAM contro...

2014-12-23 Thread Omar Naji via gem5-dev
changeset bb665366cc00 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=bb665366cc00
description:
mem: Add rank-wise refresh to the DRAM controller

This patch adds rank-wise refresh to the controller, as opposed to the
channel-wide refresh currently in place. In essence each rank can be
refreshed independently, and for this to be possible the controller
is extended with a state machine per rank.

Without this patch the data bus is always idle during a refresh, as
all the ranks are refreshing at the same time. With the rank-wise
refresh it is possible to use one rank while another one is
refreshing, and thus the data bus can be kept busy.

The patch introduces a Rank class to encapsulate the state per rank,
and also shifts all the relevant banks, activation tracking etc to the
rank. The arbitration is also updated to consider the state of the rank.

diffstat:

 src/mem/dram_ctrl.cc |  717 --
 src/mem/dram_ctrl.hh |  360 -
 2 files changed, 637 insertions(+), 440 deletions(-)

diffs (truncated from 1621 to 300 lines):

diff -r 471d390943f0 -r bb665366cc00 src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
@@ -40,6 +40,7 @@
  * Authors: Andreas Hansson
  *  Ani Udipi
  *  Neha Agarwal
+ *  Omar Naji
  */
 
 #include "base/bitfield.hh"
@@ -59,8 +60,7 @@
 port(name() + ".port", *this),
 retryRdReq(false), retryWrReq(false),
 busState(READ),
-nextReqEvent(this), respondEvent(this), activateEvent(this),
-prechargeEvent(this), refreshEvent(this), powerEvent(this),
+nextReqEvent(this), respondEvent(this),
 drainManager(NULL),
 deviceSize(p->device_size),
 deviceBusWidth(p->device_bus_width), burstLength(p->burst_length),
@@ -89,32 +89,19 @@
 maxAccessesPerRow(p->max_accesses_per_row),
 frontendLatency(p->static_frontend_latency),
 backendLatency(p->static_backend_latency),
-busBusyUntil(0), refreshDueAt(0), refreshState(REF_IDLE),
-pwrStateTrans(PWR_IDLE), pwrState(PWR_IDLE), prevArrival(0),
-nextReqTime(0), pwrStateTick(0), numBanksActive(0),
-activeRank(0), timeStampOffset(0)
+busBusyUntil(0), prevArrival(0),
+nextReqTime(0), activeRank(0), timeStampOffset(0)
 {
-// create the bank states based on the dimensions of the ranks and
-// banks
-banks.resize(ranksPerChannel);
+for (int i = 0; i < ranksPerChannel; i++) {
+Rank* rank = new Rank(*this, p);
+ranks.push_back(rank);
 
-//create list of drampower objects. For each rank 1 drampower instance.
-for (int i = 0; i < ranksPerChannel; i++) {
-DRAMPower drampower = DRAMPower(p, false);
-rankPower.emplace_back(drampower);
-}
+rank->actTicks.resize(activationLimit, 0);
+rank->banks.resize(banksPerRank);
+rank->rank = i;
 
-actTicks.resize(ranksPerChannel);
-for (size_t c = 0; c < ranksPerChannel; ++c) {
-banks[c].resize(banksPerRank);
-actTicks[c].resize(activationLimit, 0);
-}
-
-// set the bank indices
-for (int r = 0; r < ranksPerChannel; r++) {
 for (int b = 0; b < banksPerRank; b++) {
-banks[r][b].rank = r;
-banks[r][b].bank = b;
+rank->banks[b].bank = b;
 // GDDR addressing of banks to BG is linear.
 // Here we assume that all DRAM generations address bank groups as
 // follows:
@@ -126,10 +113,10 @@
 //banks 1,5,9,13  are in bank group 1
 //banks 2,6,10,14 are in bank group 2
 //banks 3,7,11,15 are in bank group 3
-banks[r][b].bankgr = b % bankGroupsPerRank;
+rank->banks[b].bankgr = b % bankGroupsPerRank;
 } else {
 // No bank groups; simply assign to bank number
-banks[r][b].bankgr = b;
+rank->banks[b].bankgr = b;
 }
 }
 }
@@ -254,19 +241,18 @@
 {
 // timestamp offset should be in clock cycles for DRAMPower
 timeStampOffset = divCeil(curTick(), tCK);
+
 // update the start tick for the precharge accounting to the
 // current tick
-pwrStateTick = curTick();
+for (auto r : ranks) {
+r->startup(curTick() + tREFI - tRP);
+}
 
 // shift the bus busy time sufficiently far ahead that we never
 // have to worry about negative values when computing the time for
 // the next request, this will add an insignificant bubble at the
 // start of simulation
 busBusyUntil = curTick() + tRP + tRCD + tCL;
-
-// kick off the refresh, and give ourselves enough time to
-// precharge
-schedule(refreshEvent, curTick() + tREFI - tRP);
 }
 
 Tick
@@ -411,7 +397,7 @@
 // later
 uint16_t 

[gem5-dev] changeset in gem5: mem: Fix a bug in the DRAM controller arbitra...

2014-12-23 Thread Omar Naji via gem5-dev
changeset 471d390943f0 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=471d390943f0
description:
mem: Fix a bug in the DRAM controller arbitration

Fix a minor issue that affects multi-rank systems.

diffstat:

 src/mem/dram_ctrl.cc |  12 +---
 1 files changed, 9 insertions(+), 3 deletions(-)

diffs (29 lines):

diff -r 6d4da9dc90a1 -r 471d390943f0 src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
@@ -1477,7 +1477,13 @@
 // Offset by tRCD to correlate with ACT timing variables
 Tick min_cmd_at = busBusyUntil - tCL - tRCD;
 
-// Prioritize same rank accesses that can issue B2B
+// if we have multiple ranks and all
+// waiting packets are accessing a rank which was previously active
+// then bank_mask_same_rank will be set to a value while bank_mask will
+// remain 0. In this case, the function should return the value of
+// bank_mask_same_rank.
+// else if waiting packets access a rank which was previously active and
+// other ranks, prioritize same rank accesses that can issue B2B
 // Only optimize for same ranks when the command type
 // does not change; do not want to unnecessarily incur tWTR
 //
@@ -1485,8 +1491,8 @@
 // 1) Commands that access the same rank as previous burst
 //and can prep the bank seamlessly.
 // 2) Commands (any rank) with earliest bank prep
-if (!switched_cmd_type && same_rank_match &&
-min_act_at_same_rank <= min_cmd_at) {
+if ((bank_mask == 0) || (!switched_cmd_type && same_rank_match &&
+min_act_at_same_rank <= min_cmd_at)) {
 bank_mask = bank_mask_same_rank;
 }
 
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: arm: Add stats to table walker

2014-12-23 Thread Curtis Dunham via gem5-dev
changeset b7bc5b1084a4 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b7bc5b1084a4
description:
arm: Add stats to table walker

This patch adds table walker stats for:
- Walk events
- Instruction vs Data
- Page size histogram
- Wait time and service time histograms
- Pending requests histogram (per cycle) - measures dist. of L
  (p(1..) = how often busy, p(0) = how often idle)
- Squashes, before starting and after completion
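
The per-cycle pending-requests histogram can be thought of as a time-weighted distribution: each time the number of pending walks changes, the elapsed time is charged to the previous level. A minimal sketch of that bookkeeping follows; the class and method names are hypothetical and only loosely modelled on the pendingChange() calls in the diff.

```python
from collections import defaultdict


class PendingTracker:
    """Accumulate how long the walker spent at each pending-request count."""

    def __init__(self, now=0):
        self.pending = 0
        self.last_change = now
        self.ticks_at_level = defaultdict(int)

    def change(self, now, new_pending):
        # Charge the time since the last change to the old level.
        self.ticks_at_level[self.pending] += now - self.last_change
        self.pending = new_pending
        self.last_change = now

    def distribution(self):
        total = sum(self.ticks_at_level.values())
        return {lvl: t / total for lvl, t in self.ticks_at_level.items()}
```

With this bookkeeping, p(0) is the idle fraction and the sum of p(1..) is the busy fraction.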

diffstat:

 src/arch/arm/table_walker.cc |  186 ++-
 src/arch/arm/table_walker.hh |   31 +++
 src/dev/dma_device.cc|   12 ++-
 src/dev/dma_device.hh|4 +-
 4 files changed, 225 insertions(+), 8 deletions(-)

diffs (truncated from 476 to 300 lines):

diff -r 74834c49fbbe -r b7bc5b1084a4 src/arch/arm/table_walker.cc
--- a/src/arch/arm/table_walker.cc  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/arch/arm/table_walker.cc  Tue Dec 23 09:31:18 2014 -0500
@@ -60,6 +60,8 @@
   stage2Mmu(NULL), isStage2(p->is_stage2), tlb(NULL),
   currState(NULL), pending(false), masterId(p->sys->getMasterId(name())),
   numSquashable(p->num_squash_per_cycle),
+  pendingReqs(0),
+  pendingChangeTick(curTick()),
   doL1DescEvent(this), doL2DescEvent(this),
   doL0LongDescEvent(this), doL1LongDescEvent(this), doL2LongDescEvent(this),
   doL3LongDescEvent(this),
@@ -151,6 +153,7 @@
 if (params()->sys->isTimingMode() && currState) {
 delete currState;
 currState = NULL;
+pendingChange();
 }
 }
 
@@ -170,6 +173,8 @@
   bool secure, TLB::ArmTranslationType tranType)
 {
 assert(!(_functional && _timing));
+++statWalks;
+
 WalkerState *savedCurrState = NULL;
 
 if (!currState && !_functional) {
@@ -196,10 +201,13 @@
 // this fault to re-execute the faulting instruction which should clean
 // up everything.
 if (currState->vaddr_tainted == _req->getVaddr()) {
+++statSquashedBefore;
 return std::make_shared<ReExec>();
 }
 }
+pendingChange();
 
+currState->startTime = curTick();
 currState->tc = _tc;
 currState->aarch64 = opModeIs64(currOpMode(_tc));
 currState->el = currEL(_tc);
@@ -261,6 +269,8 @@
 currState->isFetch = (currState->mode == TLB::Execute);
 currState->isWrite = (currState->mode == TLB::Write);
 
+statRequestOrigin[REQUESTED][currState->isFetch]++;
+
 // We only do a second stage of translation if we're not secure, or in
 // hyp mode, the second stage MMU is enabled, and this table walker
 // instance is the first stage.
@@ -280,6 +290,10 @@
 currState->userTable = true;
 currState->xnTable = false;
 currState->pxnTable = false;
+
+++statWalksLongDescriptor;
+} else {
+++statWalksShortDescriptor;
 }
 
 if (!currState->timing) {
@@ -303,8 +317,10 @@
 if (pending || pendingQueue.size()) {
 pendingQueue.push_back(currState);
 currState = NULL;
+pendingChange();
 } else {
 pending = true;
+pendingChange();
 if (currState->aarch64)
 return processWalkAArch64();
 else if (long_desc_format)
@@ -321,6 +337,7 @@
 {
 assert(!currState);
 assert(pendingQueue.size());
+pendingChange();
 currState = pendingQueue.front();
 
 ExceptionLevel target_el = EL0;
@@ -372,6 +389,7 @@
    (currState->transState->squashed() || te)) {
 pendingQueue.pop_front();
 num_squashed++;
+statSquashedBefore++;
 
 DPRINTF(TLB, "Squashing table walk for address %#x\n",
   currState->vaddr_tainted);
@@ -383,6 +401,7 @@
 currState->req, currState->tc, currState->mode);
 } else {
 // translate the request now that we know it will work
+statWalkServiceTime.sample(curTick() - currState->startTime);
 tlb->translateTiming(currState->req, currState->tc,
 currState->transState, currState->mode);
 
@@ -402,8 +421,9 @@
 currState = NULL;
 }
 }
+pendingChange();
 
-// if we've still got pending translations schedule more work
+// if we still have pending translations, schedule more work
 nextWalk(tc);
 currState = NULL;
 }
@@ -420,6 +440,8 @@
 currState->vaddr_tainted, currState->ttbcr, mbits(currState->vaddr, 31,
   32 - currState->ttbcr.n));
 
+statWalkWaitTime.sample(curTick() - currState->startTime);
+
 if (currState->ttbcr.n == 0 || !mbits(currState->vaddr, 31,
   32 - currState->ttbcr.n)) {
 DPRINTF(TLB, " - Selecting TTBR0\n");
@@ -511,6 +533,8 @@
 DPRINTF(TLB, "Beginning table walk for address %#x, TTBCR: %#x\n",
 currState->vaddr_tainted, currState->ttbcr);
 
+

[gem5-dev] changeset in gem5: mem: Fix event scheduling issue for prefetches

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 00965520c9f5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=00965520c9f5
description:
mem: Fix event scheduling issue for prefetches

The cache's MemSidePacketQueue schedules a sendEvent based upon
nextMSHRReadyTime(), which is the time when the next MSHR is ready or
whenever a future prefetch is ready.  However, a prefetch being ready
does not guarantee that it can obtain an MSHR.  So, when all MSHRs are
full, the simulation ends up unnecessarily scheduling a sendEvent every
picosecond until an MSHR is finally freed and the prefetch can happen.

This patch fixes this by not signaling the prefetch ready time if the
prefetch could not be generated.  The event is rescheduled as soon as an
MSHR becomes available.
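
The effect of the fix on the next-ready computation can be sketched as follows. This is a simplified stand-in with hypothetical names, using plain integers for ticks; it only mirrors the guard added in the second hunk of the diff.

```python
def next_ready_time(mshr_ready, write_buffer_ready, prefetch_ready, can_prefetch):
    """Earliest tick at which the memory-side queue has something to send.

    prefetch_ready is None when no prefetcher is attached (assumption for
    this sketch); can_prefetch is False when no MSHR is free for a prefetch.
    """
    next_ready = min(mshr_ready, write_buffer_ready)
    # Only let a ready prefetch pull the time forward if it could actually
    # obtain an MSHR; otherwise the event would fire with nothing to do.
    if prefetch_ready is not None and can_prefetch:
        next_ready = min(next_ready, prefetch_ready)
    return next_ready
```

For example, with MSHRs full and a prefetch ready at tick 10, the result stays at the next MSHR ready time instead of 10.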

diffstat:

 src/mem/cache/cache_impl.hh |  13 -
 1 files changed, 12 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 97aa1ee1c2d9 -r 00965520c9f5 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1197,6 +1197,15 @@
 if (wasFull && !mq->isFull()) {
 clearBlocked((BlockedCause)mq->index);
 }
+
+// Request the bus for a prefetch if this deallocation freed enough
+// MSHRs for a prefetch to take place
+if (prefetcher && mq == &mshrQueue && mshrQueue.canPrefetch()) {
+Tick next_pf_time = std::max(prefetcher->nextPrefetchReadyTime(),
+ curTick());
+if (next_pf_time != MaxTick)
+requestMemSideBus(Request_PF, next_pf_time);
+}
 }
 
 // copy writebacks to write buffer
@@ -1955,7 +1964,9 @@
 Tick nextReady = std::min(mshrQueue.nextMSHRReadyTime(),
   writeBuffer.nextMSHRReadyTime());
 
-if (prefetcher) {
+// Don't signal prefetch ready time if no MSHRs available
+// Will signal once enough MSHRs are deallocated
+if (prefetcher && mshrQueue.canPrefetch()) {
 nextReady = std::min(nextReady,
  prefetcher->nextPrefetchReadyTime());
 }


[gem5-dev] changeset in gem5: tests: Add a regression for the stack distanc...

2014-12-23 Thread Andreas Hansson via gem5-dev
changeset 6d4da9dc90a1 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6d4da9dc90a1
description:
tests: Add a regression for the stack distance calculator

Re-use the existing traffic generator regression, and enable the stack
distance calculation in the comm monitor, along with the verification
stack.

The traffic generator config is also tuned to not increase the
run-time too much (and actually have some address re-use).

diffstat:

 tests/configs/tgen-simple-mem.py   |6 +-
 tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt |  312 +
 tests/quick/se/70.tgen/tgen-simple-mem.cfg |   16 +-
 3 files changed, 195 insertions(+), 139 deletions(-)

diffs (truncated from 452 to 300 lines):

diff -r cd8aae15f89a -r 6d4da9dc90a1 tests/configs/tgen-simple-mem.py
--- a/tests/configs/tgen-simple-mem.py  Tue Dec 23 09:31:18 2014 -0500
+++ b/tests/configs/tgen-simple-mem.py  Tue Dec 23 09:31:18 2014 -0500
@@ -54,9 +54,11 @@
 voltage_domain =
 VoltageDomain()))
 
-# add a communication monitor, and also trace all the packets
+# add a communication monitor, and also trace all the packets and
+# calculate and verify stack distance
 system.monitor = CommMonitor(trace_file = "monitor.ptrc.gz",
-                             trace_enable = True)
+                             trace_enable = True,
+                             stack_dist_calc = StackDistCalc(verify = True))
 
 # connect the traffic generator to the bus via a communication monitor
 system.cpu.port = system.monitor.slave
diff -r cd8aae15f89a -r 6d4da9dc90a1 tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt
--- a/tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt  Tue Dec 23 09:31:18 2014 -0500
+++ b/tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt  Tue Dec 23 09:31:18 2014 -0500
@@ -4,37 +4,88 @@
 sim_ticks 1000   # Number of ticks simulated
 final_tick   1000   # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
 sim_freq 1   # Frequency of simulated ticks
-host_tick_rate 11160095249   # Simulator tick rate (ticks/s)
-host_mem_usage 262112   # Number of bytes of host memory used
-host_seconds 8.96   # Real time elapsed on the host
+host_tick_rate 31050955853   # Simulator tick rate (ticks/s)
+host_mem_usage 209576   # Number of bytes of host memory used
+host_seconds 3.22   # Real time elapsed on the host
 system.clk_domain.voltage_domain.voltage 1   # Voltage in Volts
 system.clk_domain.clock  1000   # Clock period in ticks
 system.physmem.bytes_read::cpu 64   # Number of bytes read from this memory
 system.physmem.bytes_read::total   64   # Number of bytes read from this memory
-system.physmem.bytes_written::cpu   213329152   # Number of bytes written to this memory
-system.physmem.bytes_written::total 213329152   # Number of bytes written to this memory
+system.physmem.bytes_written::cpu  853312   # Number of bytes written to this memory
+system.physmem.bytes_written::total 853312   # Number of bytes written to this memory
 system.physmem.num_reads::cpu   1   # Number of read requests responded to by this memory
 system.physmem.num_reads::total 1   # Number of read requests responded to by this memory
-system.physmem.num_writes::cpu 268   # Number of write requests responded to by this memory
-system.physmem.num_writes::total  268   # Number of write requests responded to by this memory
+system.physmem.num_writes::cpu  1   # Number of write requests responded to by this memory
+system.physmem.num_writes::total 1   # Number of write requests responded to by this memory
 system.physmem.bw_read::cpu   640   # Total read bandwidth from this memory (bytes/s)
 system.physmem.bw_read::total 640 

[gem5-dev] changeset in gem5: mem: Add parameter to reserve MSHR entries fo...

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 0b969a35781f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b969a35781f
description:
mem: Add parameter to reserve MSHR entries for demand access

Adds a new parameter that reserves some number of MSHR entries for demand
accesses.  This helps prevent prefetchers from taking all MSHRs and forcing
demand requests from the CPU to stall.
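
To make the reservation rule concrete, here is a small Python model of the
check this changeset adds as canPrefetch(). It is a sketch, not gem5 code:
the class name MSHRQueueModel is made up for illustration, and only the
arithmetic shown in the diff (capacity of num_entries + reserve - 1, and the
comparison against numReserve + demandReserve) is reproduced.

```python
class MSHRQueueModel:
    """Hypothetical stand-in that mirrors only the arithmetic in the patch."""

    def __init__(self, num_entries, reserve, demand_reserve):
        # Same capacity computation as the constructor in the diff.
        self.num_entries = num_entries + reserve - 1
        self.num_reserve = reserve
        self.demand_reserve = demand_reserve
        self.allocated = 0

    def can_prefetch(self):
        # A prefetch may only take an entry while enough entries remain
        # for the general reserve plus the demand reserve.
        limit = self.num_entries - (self.num_reserve + self.demand_reserve)
        return self.allocated < limit


queue = MSHRQueueModel(num_entries=4, reserve=4, demand_reserve=1)
```

Raising demand_reserve lowers the number of allocated entries at which
prefetching stops, which is the knob the new parameter exposes.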

diffstat:

 src/mem/cache/BaseCache.py  |   1 +
 src/mem/cache/base.cc   |   4 ++--
 src/mem/cache/cache_impl.hh |   2 +-
 src/mem/cache/mshr_queue.cc |   8 +---
 src/mem/cache/mshr_queue.hh |  19 ++-
 5 files changed, 27 insertions(+), 7 deletions(-)

diffs (101 lines):

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/BaseCache.py
--- a/src/mem/cache/BaseCache.pyTue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/BaseCache.pyTue Dec 23 09:31:18 2014 -0500
@@ -54,6 +54,7 @@
     max_miss_count = Param.Counter(0,
         "number of misses to handle before calling exit")
     mshrs = Param.Int("number of MSHRs (max outstanding requests)")
+    demand_mshr_reserve = Param.Int(1, "mshrs to reserve for demand access")
     size = Param.MemorySize("capacity in bytes")
     forward_snoops = Param.Bool(True,
         "forward snoops from mem side to cpu side")
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/base.cc
--- a/src/mem/cache/base.cc Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/base.cc Tue Dec 23 09:31:18 2014 -0500
@@ -68,8 +68,8 @@
 BaseCache::BaseCache(const Params *p)
     : MemObject(p),
       cpuSidePort(nullptr), memSidePort(nullptr),
-      mshrQueue("MSHRs", p->mshrs, 4, MSHRQueue_MSHRs),
-      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000,
+      mshrQueue("MSHRs", p->mshrs, 4, p->demand_mshr_reserve, MSHRQueue_MSHRs),
+      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000, 0,
                   MSHRQueue_WriteBuffer),
       blkSize(p->system->cacheLineSize()),
       hitLatency(p->hit_latency),
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1841,7 +1841,7 @@
 
     // fall through... no pending requests.  Try a prefetch.
     assert(!miss_mshr && !write_mshr);
-    if (prefetcher && !mshrQueue.isFull()) {
+    if (prefetcher && mshrQueue.canPrefetch()) {
         // If we have a miss queue slot, we can try a prefetch
         PacketPtr pkt = prefetcher->getPacket();
         if (pkt) {
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.cc   Tue Dec 23 09:31:18 2014 -0500
@@ -52,10 +52,12 @@
 using namespace std;
 
 MSHRQueue::MSHRQueue(const std::string &_label,
-                     int num_entries, int reserve, int _index)
+                     int num_entries, int reserve, int demand_reserve,
+                     int _index)
     : label(_label), numEntries(num_entries + reserve - 1),
-      numReserve(reserve), registers(numEntries),
-      drainManager(NULL), allocated(0), inServiceEntries(0), index(_index)
+      numReserve(reserve), demandReserve(demand_reserve),
+      registers(numEntries), drainManager(NULL), allocated(0),
+      inServiceEntries(0), index(_index)
 {
     for (int i = 0; i < numEntries; ++i) {
         registers[i].queue = this;
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -77,6 +77,12 @@
      */
     const int numReserve;
 
+    /**
+     * The number of entries to reserve for future demand accesses.
+     * Prevent prefetcher from taking all mshr entries
+     */
+    const int demandReserve;
+
     /**  MSHR storage. */
     std::vector<MSHR> registers;
     /** Holds pointers to all allocated entries. */
@@ -106,9 +112,11 @@
      * @param num_entrys The number of entries in this queue.
      * @param reserve The minimum number of entries needed to satisfy
      * any access.
+     * @param demand_reserve The minimum number of entries needed to satisfy
+     * demand accesses.
      */
     MSHRQueue(const std::string &_label, int num_entries, int reserve,
-              int index);
+              int demand_reserve, int index);
 
     /**
      * Find the first MSHR that matches the provided address.
@@ -218,6 +226,15 @@
     }
 
     /**
+     * Returns true if sufficient mshrs for prefetch.
+     * @return True if sufficient mshrs for prefetch.
+     */
+    bool canPrefetch() const
+    {
+        return (allocated < numEntries - (numReserve + demandReserve));
+    }
+
+    /**
      * Returns the MSHR at the head of the readyList.
      * @return The next request to service.
      */

[gem5-dev] changeset in gem5: mem: Hide WriteInvalidate requests from prefe...

2014-12-23 Thread Curtis Dunham via gem5-dev
changeset 7982e539d003 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=7982e539d003
description:
mem: Hide WriteInvalidate requests from prefetchers

Without this tweak, a prefetcher will happily prefetch data that will
promptly be invalidated and overwritten by a WriteInvalidate.
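
The filtering described above can be sketched as a plain predicate. This is
an illustrative Python model only: observes_access and its keyword arguments
are hypothetical names, it covers just the checks visible in the diff below,
and the final "return True" stands in for the remaining checks (such as the
onMiss handling) that the diff does not show in full.

```python
def observes_access(fetch, read, inv,
                    on_data=True, on_read=True, on_write=True):
    """Decide whether the prefetcher should be told about an access."""
    if not fetch and not on_data:
        return False
    if not fetch and read and not on_read:
        return False
    if not fetch and not read and not on_write:
        return False
    # New in this changeset: a write that also invalidates is hidden,
    # since data prefetched for it would be overwritten straight away.
    if not fetch and not read and inv:
        return False
    return True  # placeholder for the checks not modelled here
```

An ordinary write is still observed; only the invalidating write is
filtered out.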

diffstat:

 src/mem/cache/prefetch/base.cc |  4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diffs (21 lines):

diff -r 00965520c9f5 -r 7982e539d003 src/mem/cache/prefetch/base.cc
--- a/src/mem/cache/prefetch/base.ccTue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/prefetch/base.ccTue Dec 23 09:31:19 2014 -0500
@@ -83,7 +83,8 @@
 {
     Addr addr = pkt->getAddr();
     bool fetch = pkt->req->isInstFetch();
-    bool read= pkt->isRead();
+    bool read = pkt->isRead();
+    bool inv = pkt->isInvalidate();
     bool is_secure = pkt->isSecure();
 
     if (pkt->req->isUncacheable()) return false;
@@ -91,6 +92,7 @@
     if (!fetch && !onData) return false;
     if (!fetch && read && !onRead) return false;
     if (!fetch && !read && !onWrite) return false;
+    if (!fetch && !read && inv) return false;
 
     if (onMiss) {
         return !inCache(addr, is_secure) &&


[gem5-dev] changeset in gem5: config: Expose the DRAM ranks as a command-li...

2014-12-23 Thread Andreas Hansson via gem5-dev
changeset 74834c49fbbe in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=74834c49fbbe
description:
config: Expose the DRAM ranks as a command-line option

This patch gives the user direct influence over the number of DRAM
ranks to make it easier to tune the memory density without affecting
the bandwidth (previously the only means of scaling the device count
was through the number of channels).

The patch also adds some basic sanity checks to ensure that the number
of ranks is a power of two (since we rely on bit slices in the address
decoding).
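
For readers wondering why the power-of-two restriction matters, the
following Python sketch shows the usual bit-trick for the check and how a
rank count maps to a number of address bits. The helper names are
hypothetical and the code is not taken from gem5; it only illustrates why a
count such as 3 cannot be expressed as a clean bit slice.

```python
def is_power_of_2(n):
    """True for 1, 2, 4, 8, ...; a power of two has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def rank_bits(ranks_per_channel):
    """Number of address bits needed to select a rank."""
    if not is_power_of_2(ranks_per_channel):
        raise ValueError(
            "DRAM rank count of %d is not allowed, must be a power of two"
            % ranks_per_channel)
    return ranks_per_channel.bit_length() - 1
```

With four ranks, two address bits select the rank; with three ranks no whole
number of bits covers the range exactly, hence the sanity check.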

diffstat:

 configs/common/MemConfig.py |  12 +---
 configs/common/Options.py   |   2 ++
 src/mem/dram_ctrl.cc|   5 +
 3 files changed, 16 insertions(+), 3 deletions(-)

diffs (49 lines):

diff -r 6dd27a0e0d23 -r 74834c49fbbe configs/common/MemConfig.py
--- a/configs/common/MemConfig.py   Tue Dec 23 09:31:18 2014 -0500
+++ b/configs/common/MemConfig.py   Tue Dec 23 09:31:18 2014 -0500
@@ -197,9 +197,15 @@
     # address mapping in the case of a DRAM
     for r in system.mem_ranges:
         for i in xrange(nbr_mem_ctrls):
-            mem_ctrls.append(create_mem_ctrl(cls, r, i, nbr_mem_ctrls,
-                                             intlv_bits,
-                                             system.cache_line_size.value))
+            mem_ctrl = create_mem_ctrl(cls, r, i, nbr_mem_ctrls, intlv_bits,
+                                       system.cache_line_size.value)
+            # Set the number of ranks based on the command-line
+            # options if it was explicitly set
+            if issubclass(cls, m5.objects.DRAMCtrl) and \
+                    options.mem_ranks:
+                mem_ctrl.ranks_per_channel = options.mem_ranks
+
+            mem_ctrls.append(mem_ctrl)
 
     system.mem_ctrls = mem_ctrls
 
 
diff -r 6dd27a0e0d23 -r 74834c49fbbe configs/common/Options.py
--- a/configs/common/Options.py Tue Dec 23 09:31:18 2014 -0500
+++ b/configs/common/Options.py Tue Dec 23 09:31:18 2014 -0500
@@ -90,6 +90,8 @@
                       help = "type of memory to use")
     parser.add_option("--mem-channels", type="int", default=1,
                       help = "number of memory channels")
+    parser.add_option("--mem-ranks", type="int", default=None,
+                      help = "number of memory ranks per channel")
     parser.add_option("--mem-size", action="store", type="string",
                       default="512MB",
                       help="Specify the physical memory size (single memory)")
diff -r 6dd27a0e0d23 -r 74834c49fbbe src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc  Tue Dec 23 09:31:18 2014 -0500
@@ -92,6 +92,11 @@
     busBusyUntil(0), prevArrival(0),
     nextReqTime(0), activeRank(0), timeStampOffset(0)
 {
+    // sanity check the ranks since we rely on bit slicing for the
+    // address decoding
+    fatal_if(!isPowerOf2(ranksPerChannel), "DRAM rank count of %d is not "
+             "allowed, must be a power of two\n", ranksPerChannel);
+
     for (int i = 0; i < ranksPerChannel; i++) {
         Rank* rank = new Rank(*this, p);
         ranks.push_back(rank);


[gem5-dev] changeset in gem5: stats: Bump stats for decoder, TLB, prefetche...

2014-12-23 Thread Andreas Hansson via gem5-dev
changeset c9b7e0c69f88 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=c9b7e0c69f88
description:
stats: Bump stats for decoder, TLB, prefetcher and DRAM changes

Changes due to speculative execution of an unaligned PC, introduction
of TLB stats, changes and re-work of the prefetcher, and the
introduction of rank-wise refresh in the DRAM controller.

diffstat:

 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-minor/stats.txt |  1247 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3-dual/stats.txt |  3653 ++--
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3/stats.txt |  1992 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-switcheroo-full/stats.txt |  2785 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor-dual/stats.txt |  3991 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor/stats.txt |  1574 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-checker/stats.txt |  2203 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-dual/stats.txt |  5682
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3/stats.txt |  2121 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-full/stats.txt |  3156 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-o3/stats.txt |  3625 +++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-timing/stats.txt |  2139 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor-dual/stats.txt |  4572 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor/stats.txt |  1853 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-checker/stats.txt |  2471 ++-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-dual/stats.txt |  6110 +
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3/stats.txt |  2381 ++-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-full/stats.txt |  3954 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-o3/stats.txt |  4031 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-timing/stats.txt |  3032 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt |  2513 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/stats.txt |  1848 +-
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt |  2995 ++--
 tests/long/se/10.mcf/ref/arm/linux/minor-timing/stats.txt |   243 +-
 tests/long/se/10.mcf/ref/arm/linux/o3-timing/stats.txt |  1661 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-atomic/stats.txt |   102 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-timing/stats.txt |   364 +-
 tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt |   605 +-
 tests/long/se/20.parser/ref/alpha/tru64/minor-timing/stats.txt |   959 +-
 tests/long/se/20.parser/ref/arm/linux/minor-timing/stats.txt |   999 +-
 tests/long/se/20.parser/ref/arm/linux/o3-timing/stats.txt |  1814 +-
 tests/long/se/20.parser/ref/arm/linux/simple-atomic/stats.txt |   102 +-
 tests/long/se/20.parser/ref/arm/linux/simple-timing/stats.txt |   356 +-
 tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt |  1704 +-
 tests/long/se/30.eon/ref/alpha/tru64/minor-timing/stats.txt |   275 +-
 tests/long/se/30.eon/ref/alpha/tru64/o3-timing/stats.txt |   815 +-
 tests/long/se/30.eon/ref/arm/linux/minor-timing/stats.txt |   349 +-
 tests/long/se/30.eon/ref/arm/linux/o3-timing/stats.txt |  1565 +-
 tests/long/se/30.eon/ref/arm/linux/simple-atomic/stats.txt |   102 +-
 tests/long/se/30.eon/ref/arm/linux/simple-timing/stats.txt |   366 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/minor-timing/stats.txt |   373 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/o3-timing/stats.txt |  1423 +-
 tests/long/se/40.perlbmk/ref/arm/linux/minor-timing/stats.txt |   515 +-
 tests/long/se/40.perlbmk/ref/arm/linux/o3-timing/stats.txt |  1792 +-
 

[gem5-dev] changeset in gem5: mem: Change prefetcher to use random_mt

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 63edd4a1243f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=63edd4a1243f
description:
mem: Change prefetcher to use random_mt

Prefetchers previously used rand() to generate random numbers.
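
One plausible benefit of drawing from a single seeded generator (an
assumption here; the commit message does not state its motivation) is that
victim selection becomes repeatable from run to run. The Python sketch below
uses the standard library's random.Random as a stand-in for a shared
simulator generator; pick_victim_way is a hypothetical helper, and the
inclusive range [0, assoc - 1] matches the bounds passed in the diff below.

```python
import random


def pick_victim_way(rng, assoc):
    """Pick a way uniformly from the inclusive range [0, assoc - 1]."""
    return rng.randint(0, assoc - 1)


# Two generators with the same seed produce the same victim sequence.
rng_a = random.Random(1234)
rng_b = random.Random(1234)
ways_a = [pick_victim_way(rng_a, 4) for _ in range(8)]
ways_b = [pick_victim_way(rng_b, 4) for _ in range(8)]
```

Asking the generator for a bounded integer also avoids the explicit modulo
that the old rand() call needed.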

diffstat:

 src/mem/cache/prefetch/stride.cc |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diffs (20 lines):

diff -r 7982e539d003 -r 63edd4a1243f src/mem/cache/prefetch/stride.cc
--- a/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
+++ b/src/mem/cache/prefetch/stride.cc  Tue Dec 23 09:31:19 2014 -0500
@@ -46,6 +46,7 @@
  * Stride Prefetcher template instantiations.
  */
 
+#include "base/random.hh"
 #include "debug/HWPrefetch.hh"
 #include "mem/cache/prefetch/stride.hh"
 
@@ -176,7 +177,7 @@
 {
     // Rand replacement for now
     int set = pcHash(pc);
-    int way = rand() % pcTableAssoc;
+    int way = random_mt.random<int>(0, pcTableAssoc - 1);
 
     DPRINTF(HWPrefetch, "Victimizing lookup table[%d][%d].\n", set, way);
     return pcTable[master_id][set][way];


[gem5-dev] changeset in gem5: mem: Fix bug relating to writebacks and prefe...

2014-12-23 Thread Mitch Hayenga via gem5-dev
changeset 97aa1ee1c2d9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=97aa1ee1c2d9
description:
mem: Fix bug relating to writebacks and prefetches

Previously the code commented about an unhandled case where it might be
possible for a writeback to arrive after a prefetch was generated but
before it was sent to the memory system.  I hit that case.  Luckily
the prefetchSquash() logic already in the code handles dropping prefetch
requests in certain circumstances.
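
The race described above can be illustrated with a toy model. This Python
sketch is not gem5 code: issue_prefetch is a hypothetical helper and the
cache is reduced to a set of block addresses. It shows the decision the
patch makes, namely dropping the prefetch (instead of asserting) when the
block has appeared in the cache between allocation and send.

```python
def issue_prefetch(cache_blocks, addr, squashed_by_upper_cache=False):
    """Return True if the prefetch is sent, False if it is dropped."""
    block_present = addr in cache_blocks
    # Drop the prefetch if an upper cache squashed it, or if a writeback
    # brought the block in after the prefetch was queued.
    if squashed_by_upper_cache or block_present:
        return False
    return True


cache_blocks = set()
cache_blocks.add(0x40)  # a writeback lands while the prefetch is queued
```

A prefetch to the block that just arrived is dropped, while a prefetch to
any other block still goes out.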

diffstat:

 src/mem/cache/cache_impl.hh |  12 
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r b9646f4546ad -r 97aa1ee1c2d9 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh   Tue Dec 23 09:31:18 2014 -0500
@@ -1892,12 +1892,6 @@
     BlkType *blk = tags->findBlock(mshr->addr, mshr->isSecure);
 
     if (tgt_pkt->cmd == MemCmd::HardPFReq) {
-        // It might be possible for a writeback to arrive between
-        // the time the prefetch is placed in the MSHRs and when
-        // it's selected to send... if so, this assert will catch
-        // that, and then we'll have to figure out what to do.
-        assert(blk == NULL);
-
         // We need to check the caches above us to verify that
         // they don't have a copy of this block in the dirty state
         // at the moment. Without this check we could get a stale
@@ -1909,8 +1903,10 @@
         cpuSidePort->sendTimingSnoopReq(&snoop_pkt);
 
         // Check to see if the prefetch was squashed by an upper cache
-        if (snoop_pkt.prefetchSquashed()) {
-            DPRINTF(Cache, "Prefetch squashed by upper cache.  "
+        // Or if a writeback arrived between the time the prefetch was
+        // placed in the MSHRs and when it was selected to send.
+        if (snoop_pkt.prefetchSquashed() || blk != NULL) {
+            DPRINTF(Cache, "Prefetch squashed by cache.  "
                     "Deallocating mshr target %#x.\n", mshr->addr);
 
             // Deallocate the mshr target


Re: [gem5-dev] Improved regression categorisation

2014-12-23 Thread Steve Reinhardt via gem5-dev
Thanks for the clarification, Andreas.  Yes, it's a good step; thanks for
doing it.

Steve

On Tue, Dec 23, 2014 at 12:55 AM, Andreas Hansson via gem5-dev 
gem5-dev@gem5.org wrote:

 Hi Steve,

 The 00.hello tests are below 10 seconds and have too high SNR to even make
 it into my report :-), so yes you are right in that they are included in
 the ‘short’ regressions.

 This is definitely an intermediate step, but in any case we benefit from
 having a more sensible classification.

 Thanks for the feedback.

 Andreas

 On 22/12/2014 21:21, Steve Reinhardt via gem5-dev gem5-dev@gem5.org
 wrote:

 Sounds reasonable to me.  I'm not too particular about the naming.
 
 I am surprised that even the o3 hello world tests wouldn't be < 180
 seconds though.  It would be nice to have the quick/short/zippy/whatever
 test category exercise o3 at least a little bit.
 
 As far as composing regression paths, I agree it's awkward, but in general
 I use the util/regress script to run batches of tests, then just copy/paste
 the ones that fail if I need to re-run them individually.
 
 Of course, all this should still be considered merely stopgap until we get
 a better test system.
 
 Steve
 
 
 
 On Mon, Dec 22, 2014 at 12:45 PM, Gabe Black via gem5-dev
 gem5-dev@gem5.org
  wrote:
 
  I mean quick, medium, slow, not quick, medium, fast.
 
  On Mon, Dec 22, 2014 at 12:44 PM, Gabe Black gabebl...@google.com
 wrote:
 
   I complained about those names a long time ago, and I still think they
   aren't very good. quick and long aren't really on the same scale, to
   start with. Something can be quick (a rate) and still take a long time.
   Medium is very generic and so isn't on a different axis, but since the
   others aren't lined up it's not as clear as it could be. I would suggest
   either:
  
   short, medium, long
  
   or
  
   quick, medium, fast
  
   Preferably the first. We have another collection of options the second
   would collide with, namely fast, opt, debug, etc.
  
   If somebody new came along and saw there were fast/quick and opt/long
   regressions, it wouldn't be obvious what that meant. I also think it's not
   easy to compose one of those regression paths since I can never remember
   what all the parts are or what order they go in and it's not documented
   anywhere obvious. That's a separate problem though.
  
   Gabe
  
   On Mon, Dec 22, 2014 at 2:39 AM, Andreas Hansson via gem5-dev 
   gem5-dev@gem5.org wrote:
  
   Hi all,
  
   At the moment we run roughly 120 regressions, and divide them into quick
   and long somewhat arbitrarily. Anyone doing active development and using
   quick as their “quick” way of checking that nothing is broken has to wait
   more than 10 minutes for some of these regressions to finish, which seems a
   bit of a stretch. It turns out the actual regression run times follow an
   exponential distribution, ranging from a few seconds up to 10k seconds
   (almost 3 hours). I propose we also start using medium (mentioned in a few
   places), and use a slightly more structured approach in dividing them up
   into quick, medium and long.
  
   Here is what I propose:
  
   Quick – anything below 180 seconds, resulting in roughly 40 regressions
   across all ISAs. The turn around for a quick regression run for NULL,
   ALPHA, ARM and X86 (what I would deem the minimum to run) should thus be
   below 5 minutes of wall-clock time. Note that there are plenty
   configurations not covered by this (o3, realview64 etc).
  
   Medium – anything above 180 seconds, but below 1800 seconds, also
   resulting in roughly 40 regressions.
  
   Long – anything > 1800 seconds.
  
   With this split, quick could be used as part of any development, to get
   an indication that everything is ok. For a sensible coverage before posting
   any patch, quick and medium should do the job. The cronjobs we have running
   at the moment could thus do 'quick,medium' for the daily one, and
   'quick,medium,long' for the weekly one.
  
   Thoughts? Ideas? Additional comments?
  
   Thanks,
  
   Andreas
  
  
   ___
   gem5-dev mailing list
   gem5-dev@gem5.org
   http://m5sim.org/mailman/listinfo/gem5-dev
  
  
  
  ___
  gem5-dev mailing list
  gem5-dev@gem5.org
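
For reference, the run-time thresholds proposed in the quoted message can be
written down as a small Python function. This is a sketch for illustration
only and not part of util/regress; the function name is made up, and the
handling of a test that takes exactly 180 or exactly 1800 seconds is an
assumption, since the proposal does not say which side those fall on.

```python
def categorise(run_time_seconds):
    """Map a regression's run time to the proposed category."""
    if run_time_seconds < 180:
        return "quick"
    # Assumption: exactly 180 s counts as medium, exactly 1800 s as long.
    if run_time_seconds < 1800:
        return "medium"
    return "long"
```

A 10-second hello-world test lands in quick, while a 10k-second run lands in
long.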
  

Re: [gem5-dev] Review Request 2591: x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.

2014-12-23 Thread Steve Reinhardt via gem5-dev


 On Dec. 22, 2014, 9:19 p.m., Steve Reinhardt wrote:
  Fine with me, assuming that our implementations of those features are 
  indeed complete.
 
 Gabe Black wrote:
 They aren't, but I think the bits that are missing will trigger warnings.

OK, good enough.  Thanks.


- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2591/#review5706
---


On Dec. 22, 2014, 4:38 p.m., Gabe Black wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2591/
 ---
 
 (Updated Dec. 22, 2014, 4:38 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10608:fc26fb9c80b9
 ---
 x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.
 
 These are for the monitor/mwait instructions, SSSE3, and XSAVE.
 
 
 Diffs
 -
 
   src/arch/x86/cpuid.cc a0cb57e1c072965dcdd51465beff37b264b41424 
 
 Diff: http://reviews.gem5.org/r/2591/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Gabe Black
 




[gem5-dev] Review Request 2593: syscall emulation: Return correct writev value

2014-12-23 Thread Joel Hestness via gem5-dev

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2593/
---

Review request for Default.


Repository: gem5


Description
---

Changeset 10629:0de378f6af0e
---
syscall emulation: Return correct writev value

According to Linux man pages, if writev is successful, it returns the total
number of bytes written. Otherwise, it returns an error code. Instead of
returning 0, return the result from the actual call to writev in the system
call.
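
To illustrate the behaviour described above, the sketch below uses Python's
os.writev on the host (available on Unix-like systems) and simply passes its
result back to the caller. The wrapper name emulated_writev is hypothetical
and this is not the gem5 implementation; it only shows the convention that a
successful writev reports the total number of bytes written rather than 0.

```python
import os


def emulated_writev(fd, buffers):
    """Forward the host's writev result (total bytes written) to the caller."""
    return os.writev(fd, buffers)
```

A caller that keeps retrying until all of its bytes are reported as written
would plausibly never make progress if it were always told 0, which is
consistent with the infinite loop mentioned under Testing below.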


Diffs
-

  src/sim/syscall_emul.hh c9b7e0c69f88673c79c4a033d4425cc1bba00a6d 

Diff: http://reviews.gem5.org/r/2593/diff/


Testing
---

Fixes infinite loop output printing in Delaunay Mesh Refinement benchmark
(LonestarGPU), which uses ofstream to buffer output to file.


Thanks,

Joel Hestness



Re: [gem5-dev] Review Request 2593: syscall emulation: Return correct writev value

2014-12-23 Thread Andreas Hansson via gem5-dev

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2593/#review5711
---


I think we settled on syscall_emul

- Andreas Hansson


On Dec. 23, 2014, 2:51 p.m., Joel Hestness wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2593/
 ---
 
 (Updated Dec. 23, 2014, 2:51 p.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 Changeset 10629:0de378f6af0e
 ---
 syscall emulation: Return correct writev value
 
 According to Linux man pages, if writev is successful, it returns the total
 number of bytes written. Otherwise, it returns an error code. Instead of
 returning 0, return the result from the actual call to writev in the system
 call.
 
 
 Diffs
 -
 
   src/sim/syscall_emul.hh c9b7e0c69f88673c79c4a033d4425cc1bba00a6d 
 
 Diff: http://reviews.gem5.org/r/2593/diff/
 
 
 Testing
 ---
 
 Fixes infinite loop output printing in Delaunay Mesh Refinement benchmark
 (LonestarGPU), which uses ofstream to buffer output to file.
 
 
 Thanks,
 
 Joel Hestness
 

