Re: [gem5-dev] Improved regression categorisation
Hi Steve,

The 00.hello tests take less than 10 seconds and have too high SNR to even make it into my report :-), so yes, you are right that they are included in the 'short' regressions.

This is definitely an intermediate step, but in any case we benefit from having a more sensible classification.

Thanks for the feedback.

Andreas

On 22/12/2014 21:21, Steve Reinhardt via gem5-dev <gem5-dev@gem5.org> wrote:

Sounds reasonable to me. I'm not too particular about the naming.

I am surprised that even the o3 hello world tests wouldn't be under 180 seconds though. It would be nice to have the quick/short/zippy/whatever test category exercise o3 at least a little bit.

As far as composing regression paths, I agree it's awkward, but in general I use the util/regress script to run batches of tests, then just copy/paste the ones that fail if I need to re-run them individually.

Of course, all this should still be considered merely a stopgap until we get a better test system.

Steve

On Mon, Dec 22, 2014 at 12:45 PM, Gabe Black via gem5-dev <gem5-dev@gem5.org> wrote:

I mean quick, medium, slow, not quick, medium, fast.

On Mon, Dec 22, 2014 at 12:44 PM, Gabe Black <gabebl...@google.com> wrote:

I complained about those names a long time ago, and I still think they aren't very good. quick and long aren't really on the same scale, to start with. Something can be quick (a rate) and still take a long time. Medium is very generic and so isn't on a different axis, but since the others aren't lined up it's not as clear as it could be. I would suggest either:

short, medium, long
or
quick, medium, fast

Preferably the first. We have another collection of options the second would collide with, namely fast, opt, debug, etc. If somebody new came along and saw there were fast/quick and opt/long regressions, it wouldn't be obvious what that meant.

I also think it's not easy to compose one of those regression paths since I can never remember what all the parts are or what order they go in, and it's not documented anywhere obvious. That's a separate problem though.

Gabe

On Mon, Dec 22, 2014 at 2:39 AM, Andreas Hansson via gem5-dev <gem5-dev@gem5.org> wrote:

Hi all,

At the moment we run roughly 120 regressions, and divide them into quick and long somewhat arbitrarily. Anyone doing active development and using quick as their "quick" way of checking that nothing is broken has to wait more than 10 minutes for some of these regressions to finish, which seems a bit of a stretch.

It turns out the actual regression run times follow an exponential distribution, ranging from a few seconds up to 10k seconds (almost 3 hours). I propose we also start using medium (mentioned in a few places), and use a slightly more structured approach in dividing them up into quick, medium and long. Here is what I propose:

Quick: anything below 180 seconds, resulting in roughly 40 regressions across all ISAs. The turnaround for a quick regression run for NULL, ALPHA, ARM and X86 (what I would deem the minimum to run) should thus be below 5 minutes of wall-clock time. Note that there are plenty of configurations not covered by this (o3, realview64 etc).

Medium: anything above 180 seconds, but below 1800 seconds, also resulting in roughly 40 regressions.

Long: anything above 1800 seconds.

With this split, quick could be used as part of any development, to get an indication that everything is ok. For a sensible coverage before posting any patch, quick and medium should do the job. The cronjobs we have running at the moment could thus do 'quick,medium' for the daily one, and 'quick,medium,long' for the weekly one.

Thoughts? Ideas? Additional comments?

Thanks,

Andreas

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
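The proposed split can be written down as a small classifier. The sketch below is illustrative only: the function name `categorise`, the sample run times, and the treatment of a run time that lands exactly on a threshold are assumptions, not part of the gem5 regression scripts.

```python
# Illustrative sketch of the proposed thresholds; not part of gem5.
QUICK_LIMIT_S = 180     # below this: quick
MEDIUM_LIMIT_S = 1800   # at or above QUICK_LIMIT_S and below this: medium


def categorise(run_time_s):
    """Return 'quick', 'medium' or 'long' for a run time in seconds.

    Assumption: a run time exactly on a threshold falls into the slower
    category; the proposal does not say which side the boundary is on.
    """
    if run_time_s < QUICK_LIMIT_S:
        return "quick"
    if run_time_s < MEDIUM_LIMIT_S:
        return "medium"
    return "long"


# Hypothetical run times in seconds, chosen only to exercise each bucket.
sample_run_times = {"test-a": 8, "test-b": 240, "test-c": 10000}
categories = {name: categorise(t) for name, t in sample_run_times.items()}
```

A daily cron job in this scheme would run every test whose category is quick or medium, and the weekly job would run all three.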
[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-atomic passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-atomic passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/inorder-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/o3-timing passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA/tests/opt/quick/se/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA/tests/opt/quick/se/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/minor-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA/tests/opt/quick/se/00.hello/alpha/tru64/minor-timing passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest-filter passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-dram-ctrl passed.
* build/ALPHA/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA/tests/opt/quick/fs/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-simple-mem passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA/tests/opt/quick/fs/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/simple-atomic passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/o3-timing passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/inorder-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/o3-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing passed.
*
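Every entry in the report above appears to follow one fixed layout, which is the kind of regression path that is awkward to compose by hand. The sketch below splits such a path into named parts. The field names are labels inferred from the paths shown in this report, and `parse_regression_path` is a hypothetical helper; neither is taken from gem5's own scripts or documentation.

```python
from collections import namedtuple

# Field names are assumed labels for illustration, inferred from the paths
# in the report; gem5 itself may name these parts differently.
RegressionPath = namedtuple(
    "RegressionPath",
    ["build_dir", "variant", "category", "mode", "test", "isa", "os", "config"],
)


def parse_regression_path(path):
    """Split build/<dir>/tests/<variant>/<category>/<mode>/<test>/<isa>/<os>/<config>."""
    parts = path.split("/")
    if len(parts) != 10 or parts[0] != "build" or parts[2] != "tests":
        raise ValueError("unexpected regression path: %r" % path)
    # parts[1] is the build directory; parts[3:] are the remaining 7 fields.
    return RegressionPath(parts[1], *parts[3:])


example = parse_regression_path(
    "build/ALPHA/tests/opt/quick/se/00.hello/alpha/linux/simple-atomic"
)
```

With the parts named, re-running a single failing test becomes a matter of filling in eight fields rather than remembering their order.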
Re: [gem5-dev] Review Request 2511: dev: cirrus: Add a simplified device model for the cirrus graphics device.
---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2511/#review5708
---

Ship it!

Looks fine. Could you mark the issues that are fixed as fixed (or dropped for that matter)? Thanks.

I am still not sure if I like the USE_KVM better, or perhaps having a NullKvm object.

- Andreas Hansson

On Dec. 23, 2014, 1:04 a.m., Gabe Black wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2511/
---

(Updated Dec. 23, 2014, 1:04 a.m.)

Review request for Default.

Repository: gem5

Description
---

Changeset 10608:7c8363f44c5b
---
dev: cirrus: Add a simplified device model for the cirrus graphics device.

All control register accesses are dropped on the floor. If used with KVM, the frame buffer is set up as a memory like region to keep performance from tanking. If a VNC server is configured, the buffer is marked dirty once every simulated 100ms.

Diffs
---

src/dev/Cirrus.py PRE-CREATION
src/dev/SConscript b3ea7444f4f020332d1c6fe8635aa81f719a
src/dev/cirrus.hh PRE-CREATION
src/dev/cirrus.cc PRE-CREATION

Diff: http://reviews.gem5.org/r/2511/diff/

Testing
---

Thanks,

Gabe Black
[gem5-dev] changeset in gem5: arm: Add support for filtering in the PMU
changeset ae5582819481 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ae5582819481
description:
arm: Add support for filtering in the PMU

This patch adds support for filtering events in the PMU. In order to do so, it updates the ISADevice base class to forward an ISA pointer to ISA devices. This enables such devices to access the MiscReg file to determine the current execution level.

diffstat:

 src/arch/arm/isa.cc        |   3 ++
 src/arch/arm/isa_device.cc |  13
 src/arch/arm/isa_device.hh |   9 +++-
 src/arch/arm/pmu.cc        |  49 +++--
 src/arch/arm/pmu.hh        |  13 +++-
 5 files changed, 78 insertions(+), 9 deletions(-)

diffs (196 lines):

diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa.cc
--- a/src/arch/arm/isa.cc	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa.cc	Tue Dec 23 09:31:17 2014 -0500
@@ -139,6 +139,9 @@
     if (!pmu)
         pmu = dummyDevice;
 
+    // Give all ISA devices a pointer to this ISA
+    pmu->setISA(this);
+
     system = dynamic_cast<ArmSystem *>(p->system);
     DPRINTFN("ISA system set to: %p %p\n", system, p->system);
 
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa_device.cc
--- a/src/arch/arm/isa_device.cc	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa_device.cc	Tue Dec 23 09:31:17 2014 -0500
@@ -44,6 +44,19 @@
 namespace ArmISA
 {
 
+BaseISADevice::BaseISADevice()
+    : isa(nullptr)
+{
+}
+
+void
+BaseISADevice::setISA(ISA *_isa)
+{
+    assert(_isa);
+
+    isa = _isa;
+}
+
 void
 DummyISADevice::setMiscReg(int misc_reg, MiscReg val)
 {
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/isa_device.hh
--- a/src/arch/arm/isa_device.hh	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/isa_device.hh	Tue Dec 23 09:31:17 2014 -0500
@@ -46,6 +46,8 @@
 namespace ArmISA
 {
 
+class ISA;
+
 /**
  * Base class for devices that use the MiscReg interfaces.
  *
@@ -56,9 +58,11 @@
 class BaseISADevice
 {
   public:
-    BaseISADevice() {}
+    BaseISADevice();
     virtual ~BaseISADevice() {}
 
+    virtual void setISA(ISA *isa);
+
     /**
      * Write to a system register belonging to this device.
      *
@@ -74,6 +78,9 @@
      * @return Register value.
      */
     virtual MiscReg readMiscReg(int misc_reg) = 0;
+
+  protected:
+    ISA *isa;
 };
 
 /**
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/pmu.cc
--- a/src/arch/arm/pmu.cc	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/pmu.cc	Tue Dec 23 09:31:17 2014 -0500
@@ -41,6 +41,8 @@
 
 #include "arch/arm/pmu.hh"
 
+#include "arch/arm/isa.hh"
+#include "arch/arm/utility.hh"
 #include "base/trace.hh"
 #include "cpu/base.hh"
 #include "debug/Checkpoint.hh"
@@ -350,12 +352,44 @@
     }
 }
 
+bool
+PMU::isFiltered(const CounterState &ctr) const
+{
+    assert(isa);
+
+    const PMEVTYPER_t filter(ctr.filter);
+    const SCR scr(isa->readMiscRegNoEffect(MISCREG_SCR));
+    const CPSR cpsr(isa->readMiscRegNoEffect(MISCREG_CPSR));
+    const ExceptionLevel el(opModeToEL((OperatingMode)(uint8_t)cpsr.mode));
+    const bool secure(inSecureState(scr, cpsr));
+
+    switch (el) {
+      case EL0:
+        return secure ? filter.u : (filter.u != filter.nsu);
+
+      case EL1:
+        return secure ? filter.p : (filter.p != filter.nsk);
+
+      case EL2:
+        return !filter.nsh;
+
+      case EL3:
+        return filter.p != filter.m;
+
+      default:
+        panic("Unexpected execution level in PMU::isFiltered.\n");
+    }
+}
+
 void
 PMU::handleEvent(CounterId id, uint64_t delta)
 {
     CounterState &ctr(getCounter(id));
     const bool overflowed(reg_pmovsr & (1 << id));
 
+    if (isFiltered(ctr))
+        return;
+
     // Handle the count every 64 cycles mode
     if (id == PMCCNTR && reg_pmcr.d) {
         clock_remainder += delta;
@@ -434,9 +468,8 @@
         return 0;
 
     const CounterState &cs(getCounter(id));
-    PMEVTYPER_t type(0);
+    PMEVTYPER_t type(cs.filter);
 
-    // TODO: Re-create filtering settings from counter state
     type.evtCount = cs.eventId;
 
     return type;
@@ -453,12 +486,14 @@
     }
 
     CounterState &ctr(getCounter(id));
-    // TODO: Handle filtering (both for general purpose counters and
-    // the cycle counter)
+    const EventTypeId old_event_id(ctr.eventId);
 
-    // If PMCCNTR Register, do not change event type. PMCCNTR can count
-    // processor cycles only.
-    if (id != PMCCNTR) {
+    ctr.filter = val;
+
+    // If PMCCNTR Register, do not change event type. PMCCNTR can
+    // count processor cycles only. If we change the event type, we
+    // need to update the probes the counter is using.
+    if (id != PMCCNTR && old_event_id != val.evtCount) {
         ctr.eventId = val.evtCount;
         updateCounter(reg_pmselr.sel, ctr);
     }
diff -r 427f988fe6e5 -r ae5582819481 src/arch/arm/pmu.hh
--- a/src/arch/arm/pmu.hh	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/pmu.hh
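The decision made by PMU::isFiltered() in the patch above can be restated as a small truth function. The following is a Python model of that switch statement only, written for illustration: the `Filter` tuple and the `is_filtered` signature are assumptions standing in for PMEVTYPER_t and the ISA state lookup, and the model says nothing about how the bits are meant to be programmed.

```python
from collections import namedtuple

# Assumed stand-in for the PMEVTYPER filter bits read in the patch.
Filter = namedtuple("Filter", ["p", "u", "nsk", "nsu", "nsh", "m"])


def is_filtered(el, secure, f):
    """Return True if an event should be dropped, mirroring the C++ switch.

    el     -- exception level as an integer 0..3
    secure -- True if the core is in the secure state
    f      -- Filter tuple of boolean filter bits
    """
    if el == 0:
        return f.u if secure else (f.u != f.nsu)
    if el == 1:
        return f.p if secure else (f.p != f.nsk)
    if el == 2:
        return not f.nsh
    if el == 3:
        return f.p != f.m
    # The C++ code panics here; the model raises instead.
    raise ValueError("unexpected execution level: %r" % (el,))


# Hypothetical filter setting: drop privileged events in secure state only.
example_filter = Filter(p=True, u=False, nsk=True, nsu=False, nsh=False, m=False)
```

Note how the non-secure cases compare two bits rather than testing one: with both `p` and `nsk` set, non-secure EL1 events are counted again.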
[gem5-dev] changeset in gem5: arm: Clean up and document decoder API
changeset 5fae03bd840a in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=5fae03bd840a
description:
arm: Clean up and document decoder API

This changeset adds more documentation to the ArmISA::Decoder class and restructures it slightly to make API groups more obvious.

diffstat:

 src/arch/arm/decoder.cc |   52 +++-
 src/arch/arm/decoder.hh |  197 +++
 2 files changed, 162 insertions(+), 87 deletions(-)

diffs (truncated from 302 to 300 lines):

diff -r ae5582819481 -r 5fae03bd840a src/arch/arm/decoder.cc
--- a/src/arch/arm/decoder.cc	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/decoder.cc	Tue Dec 23 09:31:17 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2012-2013 ARM Limited
+ * Copyright (c) 2012-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -51,6 +51,23 @@
 
 GenericISA::BasicDecodeCache Decoder::defaultCache;
 
+Decoder::Decoder()
+    : data(0), fpscrLen(0), fpscrStride(0)
+{
+    reset();
+}
+
+void
+Decoder::reset()
+{
+    bigThumb = false;
+    offset = 0;
+    emi = 0;
+    instDone = false;
+    outOfBytes = true;
+    foundIt = false;
+}
+
 void
 Decoder::process()
 {
@@ -118,8 +135,15 @@
     }
 }
 
-//Use this to give data to the decoder. This should be used
-//when there is control flow.
+void
+Decoder::consumeBytes(int numBytes)
+{
+    offset += numBytes;
+    assert(offset <= sizeof(MachInst));
+    if (offset == sizeof(MachInst))
+        outOfBytes = true;
+}
+
 void
 Decoder::moreBytes(const PCState &pc, Addr fetchPC, MachInst inst)
 {
@@ -134,4 +158,26 @@
     process();
 }
 
+StaticInstPtr
+Decoder::decode(ArmISA::PCState &pc)
+{
+    if (!instDone)
+        return NULL;
+
+    const int inst_size((!emi.thumb || emi.bigThumb) ? 4 : 2);
+    ExtMachInst this_emi(emi);
+
+    pc.npc(pc.pc() + inst_size);
+    if (foundIt)
+        pc.nextItstate(itBits);
+    this_emi.itstate = pc.itstate();
+    pc.size(inst_size);
+
+    emi = 0;
+    instDone = false;
+    foundIt = false;
+
+    return decode(this_emi, pc.instAddr());
 }
+
+}
diff -r ae5582819481 -r 5fae03bd840a src/arch/arm/decoder.hh
--- a/src/arch/arm/decoder.hh	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/arch/arm/decoder.hh	Tue Dec 23 09:31:17 2014 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013 ARM Limited
+ * Copyright (c) 2013-2014 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -70,100 +70,129 @@
     int fpscrLen;
     int fpscrStride;
 
-  public:
-    void reset()
+    /// A cache of decoded instruction objects.
+    static GenericISA::BasicDecodeCache defaultCache;
+
+    /**
+     * Pre-decode an instruction from the current state of the
+     * decoder.
+     */
+    void process();
+
+    /**
+     * Consume bytes by moving the offset into the data word and
+     * sanity check the results.
+     */
+    void consumeBytes(int numBytes);
+
+  public: // Decoder API
+    Decoder();
+
+    /** Reset the decoders internal state. */
+    void reset();
+
+    /**
+     * Can the decoder accept more data?
+     *
+     * A CPU model uses this method to determine if the decoder can
+     * accept more data. Note that an instruction can be ready (see
+     * instReady() even if this method returns true.
+     */
+    bool needMoreBytes() const { return outOfBytes; }
+
+    /**
+     * Is an instruction ready to be decoded?
+     *
+     * CPU models call this method to determine if decode() will
+     * return a new instruction on the next call. It typically only
+     * returns false if the decoder hasn't received enough data to
+     * decode a full instruction.
+     */
+    bool instReady() const { return instDone; }
+
+    /**
+     * Feed data to the decoder.
+     *
+     * A CPU model uses this interface to load instruction data into
+     * the decoder. Once enough data has been loaded (check with
+     * instReady()), a decoded instruction can be retrieved using
+     * decode(ArmISA::PCState).
+     *
+     * This method is intended to support both fixed-length and
+     * variable-length instructions. Instruction data is fetch in
+     * MachInst blocks (which correspond to the size of a typical
+     * insturction). The method might need to be called multiple times
+     * if the instruction spans multiple blocks, in that case
+     * needMoreBytes() will return true and instReady() will return
+     * false.
+     *
+     * The fetchPC parameter is used to indicate where in memory the
+     * instruction was fetched from. This is should be the same
+     * address as the pc. If fetching multiple blocks, it indicates
+     * where subsequent blocks are fetched from (pc + n *
+     * sizeof(MachInst)).
+     *
+     * @param pc Instruction pointer that we are decoding.
+     * @param fetchPC The address this chunk was fetched from.
+     * @param inst Raw
[gem5-dev] changeset in gem5: config: Add --memchecker option
changeset 9d0aef7a9b2e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=9d0aef7a9b2e
description:
config: Add --memchecker option

This patch adds the --memchecker option, to denote that a MemChecker should be instantiated for the system. The exact usage of the MemChecker depends on the system configuration.

For now CacheConfig.py makes use of the option, adding MemCheckerMonitor instances between CPUs and D-Caches.

Note, however, that currently this only provides limited checking on a running system; other parts of the system, such as I/O devices are not monitored, and may cause warnings to be issued by the monitor.

diffstat:

 configs/common/CacheConfig.py |  25 +
 configs/common/Options.py     |   2 ++
 2 files changed, 27 insertions(+), 0 deletions(-)

diffs (61 lines):

diff -r 6332c9d471a8 -r 9d0aef7a9b2e configs/common/CacheConfig.py
--- a/configs/common/CacheConfig.py	Tue Dec 23 09:31:17 2014 -0500
+++ b/configs/common/CacheConfig.py	Tue Dec 23 09:31:18 2014 -0500
@@ -76,6 +76,9 @@
         system.l2.cpu_side = system.tol2bus.master
         system.l2.mem_side = system.membus.slave
 
+    if options.memchecker:
+        system.memchecker = MemChecker()
+
     for i in xrange(options.num_cpus):
         if options.caches:
             icache = icache_class(size=options.l1i_size,
@@ -83,6 +86,21 @@
             dcache = dcache_class(size=options.l1d_size,
                                   assoc=options.l1d_assoc)
 
+            if options.memchecker:
+                dcache_mon = MemCheckerMonitor(warn_only=True)
+                dcache_real = dcache
+
+                # Do not pass the memchecker into the constructor of
+                # MemCheckerMonitor, as it would create a copy; we require
+                # exactly one MemChecker instance.
+                dcache_mon.memchecker = system.memchecker
+
+                # Connect monitor
+                dcache_mon.mem_side = dcache.cpu_side
+
+                # Let CPU connect to monitors
+                dcache = dcache_mon
+
             # When connecting the caches, the clock is also inherited
             # from the CPU in question
             if buildEnv['TARGET_ISA'] == 'x86':
@@ -91,6 +109,13 @@
                                                       PageTableWalkerCache())
             else:
                 system.cpu[i].addPrivateSplitL1Caches(icache, dcache)
+
+            if options.memchecker:
+                # The mem_side ports of the caches haven't been connected yet.
+                # Make sure connectAllPorts connects the right objects.
+                system.cpu[i].dcache = dcache_real
+                system.cpu[i].dcache_mon = dcache_mon
+
         system.cpu[i].createInterruptController()
         if options.l2cache:
             system.cpu[i].connectAllPorts(system.tol2bus, system.membus)
diff -r 6332c9d471a8 -r 9d0aef7a9b2e configs/common/Options.py
--- a/configs/common/Options.py	Tue Dec 23 09:31:17 2014 -0500
+++ b/configs/common/Options.py	Tue Dec 23 09:31:18 2014 -0500
@@ -97,6 +97,8 @@
     parser.add_option("-l", "--lpae", action="store_true")
     parser.add_option("-V", "--virtualisation", action="store_true")
 
+    parser.add_option("--memchecker", action="store_true")
+
     # Cache Options
     parser.add_option("--caches", action="store_true")
     parser.add_option("--l2cache", action="store_true")
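The option added in Options.py is a plain optparse boolean flag. As a minimal stand-alone sketch of how such a flag behaves, assuming nothing from gem5 beyond the two option names (`build_parser` is a hypothetical helper, and no gem5 objects are created):

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with only the two flags relevant to this sketch."""
    parser = OptionParser()
    parser.add_option("--caches", action="store_true")
    parser.add_option("--memchecker", action="store_true")
    return parser


# Flag given on the command line: the attribute becomes True.
options, _ = build_parser().parse_args(["--caches", "--memchecker"])

# Flag omitted: with no explicit default, optparse leaves it as None,
# which is still falsy in a test such as `if options.memchecker:`.
defaults, _ = build_parser().parse_args([])
```

Because the absent flag is None rather than False, code that checks the option should rely on truthiness, as CacheConfig.py does, and not on an identity comparison with False.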
[gem5-dev] changeset in gem5: mem: Add MemChecker and MemCheckerMonitor
changeset 6332c9d471a8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6332c9d471a8
description:
mem: Add MemChecker and MemCheckerMonitor

This patch adds the MemChecker and MemCheckerMonitor classes. While MemChecker can be integrated anywhere in the system and is independent, the most convenient usage is through the MemCheckerMonitor -- this however, puts limitations on where the MemChecker is able to observe read/write transactions.

diffstat:

 src/mem/MemChecker.py          |   58
 src/mem/SConscript             |    7 +
 src/mem/mem_checker.cc         |  343
 src/mem/mem_checker.hh         |  568 +
 src/mem/mem_checker_monitor.cc |  374 ++
 src/mem/mem_checker_monitor.hh |  240 +
 6 files changed, 1590 insertions(+), 0 deletions(-)

diffs (truncated from 1624 to 300 lines):

diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/MemChecker.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/MemChecker.py	Tue Dec 23 09:31:17 2014 -0500
@@ -0,0 +1,58 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved.
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder. You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# Authors: Marco Elver
+
+from MemObject import MemObject
+from m5.SimObject import SimObject
+from m5.params import *
+from m5.proxy import *
+
+class MemChecker(SimObject):
+    type = 'MemChecker'
+    cxx_header = "mem/mem_checker.hh"
+
+class MemCheckerMonitor(MemObject):
+    type = 'MemCheckerMonitor'
+    cxx_header = "mem/mem_checker_monitor.hh"
+
+    # one port in each direction
+    master = MasterPort("Master port")
+    slave = SlavePort("Slave port")
+    cpu_side = SlavePort("Alias for slave")
+    mem_side = MasterPort("Alias for master")
+    warn_only = Param.Bool(False, "Warn about violations only")
+    memchecker = Param.MemChecker("Instance shared with other monitors")
+
diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/SConscript
--- a/src/mem/SConscript	Tue Dec 23 09:31:17 2014 -0500
+++ b/src/mem/SConscript	Tue Dec 23 09:31:17 2014 -0500
@@ -79,6 +79,10 @@
     Source('dramsim2_wrapper.cc')
     Source('dramsim2.cc')
 
+SimObject('MemChecker.py')
+Source('mem_checker.cc')
+Source('mem_checker_monitor.cc')
+
 DebugFlag('AddrRanges')
 DebugFlag('BaseXBar')
 DebugFlag('CoherentXBar')
@@ -99,3 +103,6 @@
 DebugFlag('PacketQueue')
 
 DebugFlag("DRAMSim2")
+
+DebugFlag("MemChecker")
+DebugFlag("MemCheckerMonitor")
diff -r 3bba9f2d0c7d -r 6332c9d471a8 src/mem/mem_checker.cc
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/mem_checker.cc	Tue Dec 23 09:31:17 2014 -0500
@@ -0,0 +1,343 @@
+/*
+ * Copyright (c) 2014 ARM Limited
+ * All rights reserved
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware
[gem5-dev] changeset in gem5: mem: Add a stack distance calculator
changeset da37aec3ed1a in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=da37aec3ed1a
description:
mem: Add a stack distance calculator

This patch adds a stand-alone stack distance calculator. The stack distance calculator is a passive SimObject that observes the addresses passed to it. It calculates stack distances (LRU distances) of incoming addresses based on the partial sum hierarchy tree algorithm described by Almasi et al. (http://doi.acm.org/10.1145/773039.773043).

For each transaction a hash table look-up is performed. At every non-unique transaction the tree is traversed from the leaf at the returned index to the root, the old node is deleted from the tree, and the sums (to the right) are collected and decremented. The collected sum represents the stack distance of the found node. At every unique transaction the stack distance is returned as numeric_limits<uint64_t>::max().

In addition to the basic stack distance calculation, a feature to mark an old node in the tree is added. This is useful if it is required to see the reuse pattern. For example, writebacks to the lower level (e.g. membus from L2) can be marked instead of being removed from the stack (isMarked flag of Node set to true). If the same address is later accessed (by L1), the value of the isMarked flag will be true. This gives some insight into how the writeback policy of the lower level affects the read/write accesses in an application.

Debugging is enabled by setting the verify flag to true. Debugging is implemented using a dummy stack that behaves in a naive way, using STL vectors. Note that this has a large impact on run time.
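To make the quantity being computed concrete, here is a naive stack distance calculation in the spirit of the dummy stack used for verification. It is a sketch, not the gem5 implementation and not the tree algorithm: the function name and the convention that the distance counts only the distinct addresses touched since the previous access (excluding the address itself) are assumptions, and conventions differ between tools.

```python
# Value reported for a first-time (unique) access; mirrors the use of the
# maximum 64-bit unsigned value described in the commit message.
INFINITY = 2**64 - 1


def stack_distances(addresses):
    """Return the LRU stack distance of each access, in order.

    Naive O(n) work per access: the list plays the role of the LRU stack,
    with the least recently used address first and the most recent last.
    """
    stack = []
    distances = []
    for addr in addresses:
        if addr in stack:
            index = stack.index(addr)
            # Distinct addresses accessed since addr was last touched.
            distances.append(len(stack) - 1 - index)
            del stack[index]
        else:
            distances.append(INFINITY)
        stack.append(addr)
    return distances


example = stack_distances(["a", "b", "c", "a", "b", "b"])
```

The linear search and deletion per access are what make this approach slow on long traces, which is the cost the tree-based algorithm is meant to avoid.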
diffstat:

 src/mem/SConscript         |    4 +-
 src/mem/StackDistCalc.py   |   54 +++
 src/mem/stack_dist_calc.cc |  670 +
 src/mem/stack_dist_calc.hh |  454 ++
 4 files changed, 1181 insertions(+), 1 deletions(-)

diffs (truncated from 1218 to 300 lines):

diff -r 9d0aef7a9b2e -r da37aec3ed1a src/mem/SConscript
--- a/src/mem/SConscript	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/SConscript	Tue Dec 23 09:31:18 2014 -0500
@@ -44,6 +44,7 @@
 SimObject('ExternalSlave.py')
 SimObject('MemObject.py')
 SimObject('SimpleMemory.py')
+SimObject('StackDistCalc.py')
 SimObject('XBar.py')
 
 Source('abstract_mem.cc')
@@ -64,6 +65,7 @@
 Source('physical.cc')
 Source('simple_mem.cc')
 Source('snoop_filter.cc')
+Source('stack_dist_calc.cc')
 Source('tport.cc')
 Source('xbar.cc')
 
@@ -101,7 +103,7 @@
 DebugFlag('MMU')
 DebugFlag('MemoryAccess')
 DebugFlag('PacketQueue')
-
+DebugFlag('StackDist')
 DebugFlag("DRAMSim2")
 
 DebugFlag("MemChecker")
diff -r 9d0aef7a9b2e -r da37aec3ed1a src/mem/StackDistCalc.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/src/mem/StackDistCalc.py	Tue Dec 23 09:31:18 2014 -0500
@@ -0,0 +1,54 @@
+# Copyright (c) 2014 ARM Limited
+# All rights reserved.
+#
+# The license below extends only to copyright in the software and shall
+# not be construed as granting a license to any other intellectual
+# property including but not limited to intellectual property relating
+# to a hardware implementation of the functionality of the software
+# licensed hereunder. You may use the software subject to the license
+# terms below provided that you ensure that this notice is replicated
+# unmodified and in its entirety in all distributions of the software,
+# modified or unmodified, in source code or in binary form.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met: redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer;
+# redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution;
+# neither the name of the copyright holders nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT,
[gem5-dev] changeset in gem5: arm: Raise an alignment fault if a PC has ill...
changeset 3bba9f2d0c7d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=3bba9f2d0c7d
description:
    arm: Raise an alignment fault if a PC has illegal alignment

    We currently don't handle unaligned PCs correctly. There is one check
    for unaligned PCs in the TLB when running in aarch64 mode, but this
    check does not cover cases where the CPU does not do a TLB lookup when
    decoding an instruction (e.g., a branch stays within the same cache
    line). Additionally, the Decoder class sometimes throws an assertion
    for unaligned PCs, which breaks speculation.

    This changeset introduces a decoder fault bit field in the
    ExtMachInst structure. This field can be used to signal a decoder
    failure. If set, the decoder generates an internal gem5fault
    instruction instead of a normal instruction. This instruction in turn
    either panics (fault type PANIC), returns a PCAlignmentFault (fault
    type UNALIGNED, aarch64) or returns a PrefetchAbort (fault type
    UNALIGNED, aarch32).

    The patch causes minor changes to the realview64 regressions, and a
    stats bump will follow.
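The alignment check and the fault selection described in this changeset can be modelled roughly as follows. This is a minimal Python sketch, not gem5 code: the masks (0x1 for Thumb, 0x3 otherwise) are taken from the decoder.cc hunk of the changeset, while the function names, the enum values and the returned strings are made up for illustration.

```python
from enum import Enum


class DecoderFault(Enum):
    # Names follow the changeset description; the numeric values are assumptions.
    OK = 0
    UNALIGNED = 1
    PANIC = 2


def check_pc_alignment(inst_addr, thumb):
    """Flag a PC whose low bits are set where the instruction set forbids it."""
    # Thumb instructions are 2-byte aligned, all others 4-byte aligned.
    mask = 0x1 if thumb else 0x3
    return DecoderFault.UNALIGNED if inst_addr & mask else DecoderFault.OK


def fault_action(fault, aarch64):
    """Map a decoder fault to the behaviour listed in the description."""
    if fault is DecoderFault.OK:
        return "decode normally"
    if fault is DecoderFault.PANIC:
        return "panic"
    # UNALIGNED: the reported fault depends on the execution state.
    return "PCAlignmentFault" if aarch64 else "PrefetchAbort"
```

The point of carrying the fault in the decoded instruction, rather than asserting in the decoder, is that a speculatively fetched unaligned PC only causes a fault if the instruction is actually executed.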
diffstat: src/arch/arm/SConscript |1 + src/arch/arm/decoder.cc |6 +- src/arch/arm/insts/pseudo.cc | 101 +++ src/arch/arm/insts/pseudo.hh | 61 + src/arch/arm/isa/bitfields.isa |2 + src/arch/arm/isa/decoder/decoder.isa | 20 +++--- src/arch/arm/isa/formats/formats.isa |3 + src/arch/arm/isa/formats/pseudo.isa | 44 +++ src/arch/arm/isa/includes.isa|1 + src/arch/arm/tlb.cc |5 - src/arch/arm/types.hh| 14 - 11 files changed, 242 insertions(+), 16 deletions(-) diffs (truncated from 360 to 300 lines): diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/SConscript --- a/src/arch/arm/SConscript Tue Dec 23 09:31:17 2014 -0500 +++ b/src/arch/arm/SConscript Tue Dec 23 09:31:17 2014 -0500 @@ -57,6 +57,7 @@ Source('insts/misc.cc') Source('insts/misc64.cc') Source('insts/pred_inst.cc') +Source('insts/pseudo.cc') Source('insts/static_inst.cc') Source('insts/vfp.cc') Source('insts/fplib.cc') diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/decoder.cc --- a/src/arch/arm/decoder.cc Tue Dec 23 09:31:17 2014 -0500 +++ b/src/arch/arm/decoder.cc Tue Dec 23 09:31:17 2014 -0500 @@ -139,7 +139,7 @@ Decoder::consumeBytes(int numBytes) { offset += numBytes; -assert(offset = sizeof(MachInst)); +assert(offset = sizeof(MachInst) || emi.decoderFault); if (offset == sizeof(MachInst)) outOfBytes = true; } @@ -154,6 +154,10 @@ emi.fpscrLen = fpscrLen; emi.fpscrStride = fpscrStride; +const Addr alignment(pc.thumb() ? 0x1 : 0x3); +emi.decoderFault = static_castuint8_t( +pc.instAddr() alignment ? 
DecoderFault::UNALIGNED : DecoderFault::OK); + outOfBytes = false; process(); } diff -r 5fae03bd840a -r 3bba9f2d0c7d src/arch/arm/insts/pseudo.cc --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/src/arch/arm/insts/pseudo.cc Tue Dec 23 09:31:17 2014 -0500 @@ -0,0 +1,101 @@ +/* + * Copyright (c) 2014 ARM Limited + * All rights reserved + * + * The license below extends only to copyright in the software and shall + * not be construed as granting a license to any other intellectual + * property including but not limited to intellectual property relating + * to a hardware implementation of the functionality of the software + * licensed hereunder. You may use the software subject to the license + * terms below provided that you ensure that this notice is replicated + * unmodified and in its entirety in all distributions of the software, + * modified or unmodified, in source code or in binary form. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are + * met: redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer; + * redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution; + * neither the name of the copyright holders nor the names of its + * contributors may be used to endorse or promote products derived from + * this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * AS IS AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR
[gem5-dev] changeset in gem5: config: Add options to take/resume from SimPo...
changeset 427f988fe6e5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=427f988fe6e5
description:
    config: Add options to take/resume from SimPoint checkpoints

    More documentation at http://gem5.org/Simpoints

    Steps to profile, generate, and use SimPoints with gem5:

    1. To profile the workload and generate a SimPoint BBV file, use the
       following options:

       --simpoint-profile --simpoint-interval <interval length>

       Requires a single Atomic CPU and fastmem.
       <interval length> is in number of instructions.

    2. Generate the SimPoint analysis using SimPoint 3.2 from UCSD.
       (SimPoint 3.2 is not included with this flow.)

    3. To take gem5 checkpoints based on the SimPoint analysis, use the
       following option:

       --take-simpoint-checkpoints=<simpoint file path>,<weight file path>,<interval length>,<warmup length>

       <simpoint file> and <weight file> are generated by the SimPoint
       analysis tool from UCSD. SimPoint 3.2 format is expected.
       <interval length> and <warmup length> are in number of
       instructions.

    4. To resume from gem5 SimPoint checkpoints, use the following
       options:

       --restore-simpoint-checkpoint -r <N> --checkpoint-dir <simpoint checkpoint path>

       <N> is (SimPoint index + 1). E.g., "-r 1" will resume from
       SimPoint #0.
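How "-r <N>" in step 4 selects a checkpoint can be sketched as below. The regular expression is the one added to configs/common/Simulation.py by this changeset; everything else (the function name, the returned dictionary, the lower-bound check on N and the directory names used in the usage note) is an assumption made for illustration.

```python
import re

# Directory-name pattern for SimPoint checkpoints, as in the changeset.
CPT_RE = re.compile(r'cpt\.simpoint_(\d+)_inst_(\d+)'
                    r'_weight_([\d\.e\-]+)_interval_(\d+)_warmup_(\d+)')


def pick_simpoint_checkpoint(dir_names, cpt_num):
    """Return the fields of the cpt_num-th (1-based) SimPoint checkpoint."""
    # Keep only directories that follow the naming convention, then sort.
    cpts = sorted(d for d in dir_names if CPT_RE.match(d))
    if cpt_num < 1 or cpt_num > len(cpts):
        raise ValueError("Checkpoint %d not found" % cpt_num)

    name = cpts[cpt_num - 1]
    match = CPT_RE.match(name)
    warmup = int(match.group(5))
    interval = int(match.group(4))
    return {
        "dir": name,
        "index": int(match.group(1)),
        "start_inst": int(match.group(2)),
        "weight": float(match.group(3)),
        "interval": interval,
        "warmup": warmup,
        # Instruction counts at which warmup ends and the interval ends.
        "start_insts": [warmup, warmup + interval],
    }
```

With two hypothetical directories named cpt.simpoint_00_inst_1000000_weight_0.25_interval_1000000_warmup_0 and cpt.simpoint_01_inst_5000000_weight_0.75_interval_1000000_warmup_0, calling pick_simpoint_checkpoint(dirs, 1) selects SimPoint #0, which matches the "N is (SimPoint index + 1)" rule.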
diffstat: configs/common/Options.py|5 + configs/common/Simulation.py | 180 ++- configs/example/fs.py|9 ++ 3 files changed, 192 insertions(+), 2 deletions(-) diffs (257 lines): diff -r b3ea7444f466 -r 427f988fe6e5 configs/common/Options.py --- a/configs/common/Options.py Mon Dec 22 16:49:24 2014 -0800 +++ b/configs/common/Options.py Tue Dec 23 09:31:17 2014 -0500 @@ -150,6 +150,11 @@ help=Enable basic block profiling for SimPoints) parser.add_option(--simpoint-interval, type=int, default=1000, help=SimPoint interval in num of instructions) +parser.add_option(--take-simpoint-checkpoints, action=store, type=string, +help=simpoint file,weight file,interval-length,warmup-length) +parser.add_option(--restore-simpoint-checkpoint, action=store_true, +help=restore from a simpoint checkpoint taken with + + --take-simpoint-checkpoints) # Checkpointing options ###Note that performing checkpointing via python script files will override diff -r b3ea7444f466 -r 427f988fe6e5 configs/common/Simulation.py --- a/configs/common/Simulation.py Mon Dec 22 16:49:24 2014 -0800 +++ b/configs/common/Simulation.py Tue Dec 23 09:31:17 2014 -0500 @@ -140,9 +140,46 @@ checkpoint_dir = joinpath(cptdir, cpt.%s.%s % (options.bench, inst)) if not exists(checkpoint_dir): fatal(Unable to find checkpoint directory %s, checkpoint_dir) + +elif options.restore_simpoint_checkpoint: +# Restore from SimPoint checkpoints +# Assumes that the checkpoint dir names are formatted as follows: +dirs = listdir(cptdir) +expr = re.compile('cpt\.simpoint_(\d+)_inst_(\d+)' + +'_weight_([\d\.e\-]+)_interval_(\d+)_warmup_(\d+)') +cpts = [] +for dir in dirs: +match = expr.match(dir) +if match: +cpts.append(dir) +cpts.sort() + +cpt_num = options.checkpoint_restore +if cpt_num len(cpts): +fatal('Checkpoint %d not found', cpt_num) +checkpoint_dir = joinpath(cptdir, cpts[cpt_num - 1]) +match = expr.match(cpts[cpt_num - 1]) +if match: +index = int(match.group(1)) +start_inst = int(match.group(2)) +weight_inst = 
float(match.group(3)) +interval_length = int(match.group(4)) +warmup_length = int(match.group(5)) +print Resuming from, checkpoint_dir +simpoint_start_insts = [] +simpoint_start_insts.append(warmup_length) +simpoint_start_insts.append(warmup_length + interval_length) +testsys.cpu[0].simpoint_start_insts = simpoint_start_insts +if testsys.switch_cpus != None: +testsys.switch_cpus[0].simpoint_start_insts = simpoint_start_insts + +print Resuming from SimPoint, +print #%d, start_inst:%d, weight:%f, interval:%d, warmup:%d % \ +(index, start_inst, weight_inst, interval_length, warmup_length) + else: dirs = listdir(cptdir) -expr = re.compile('cpt\.([0-9]*)') +expr = re.compile('cpt\.([0-9]+)') cpts = [] for dir in dirs: match = expr.match(dir) @@ -239,6 +276,131 @@ return exit_event +# Set up environment for taking SimPoint checkpoints +# Expecting SimPoint files generated by SimPoint 3.2 +def parseSimpointAnalysisFile(options, testsys): +import re + +simpoint_filename, weight_filename, interval_length, warmup_length = \ +
[gem5-dev] changeset in gem5: mem: Ensure DRAM controller is idle when in a...
changeset 6dd27a0e0d23 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=6dd27a0e0d23
description:
    mem: Ensure DRAM controller is idle when in atomic mode

    This patch addresses an issue seen with the KVM CPU where the refresh
    events scheduled by the DRAM controller force the simulator to switch
    out of the KVM mode, thus killing performance.

    The current patch works around the fact that we currently have no
    proper API to inform a SimObject of the mode switches. Instead we rely
    on drainResume being called after any switch, and cache the previous
    mode locally to be able to decide on appropriate actions.

    The switcheroo regressions require a minor stats bump as a result.

diffstat:

 src/mem/dram_ctrl.cc |  56 +++
 src/mem/dram_ctrl.hh |  15 -
 2 files changed, 56 insertions(+), 15 deletions(-)

diffs (128 lines):

diff -r bb665366cc00 -r 6dd27a0e0d23 src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -57,7 +57,7 @@
 
 DRAMCtrl::DRAMCtrl(const DRAMCtrlParams* p) :
     AbstractMemory(p),
-    port(name() + ".port", *this),
+    port(name() + ".port", *this), isTimingMode(false),
     retryRdReq(false), retryWrReq(false),
     busState(READ),
     nextReqEvent(this), respondEvent(this),
@@ -239,20 +239,25 @@
 void
 DRAMCtrl::startup()
 {
-    // timestamp offset should be in clock cycles for DRAMPower
-    timeStampOffset = divCeil(curTick(), tCK);
+    // remember the memory system mode of operation
+    isTimingMode = system()->isTimingMode();
 
-    // update the start tick for the precharge accounting to the
-    // current tick
-    for (auto r : ranks) {
-        r->startup(curTick() + tREFI - tRP);
+    if (isTimingMode) {
+        // timestamp offset should be in clock cycles for DRAMPower
+        timeStampOffset = divCeil(curTick(), tCK);
+
+        // update the start tick for the precharge accounting to the
+        // current tick
+        for (auto r : ranks) {
+            r->startup(curTick() + tREFI - tRP);
+        }
+
+        // shift the bus busy time sufficiently far ahead that we never
+        // have to worry about negative values when computing the time for
+        // the next request, this will add an insignificant bubble at the
+        // start of simulation
+        busBusyUntil = curTick() + tRP + tRCD + tCL;
     }
-
-    // shift the bus busy time sufficiently far ahead that we never
-    // have to worry about negative values when computing the time for
-    // the next request, this will add an insignificant bubble at the
-    // start of simulation
-    busBusyUntil = curTick() + tRP + tRCD + tCL;
 }
 
 Tick
@@ -1555,6 +1560,12 @@
 }
 
 void
+DRAMCtrl::Rank::suspend()
+{
+    deschedule(refreshEvent);
+}
+
+void
 DRAMCtrl::Rank::checkDrainDone()
 {
     // if this rank was waiting to drain it is now able to proceed to
@@ -2197,6 +2208,25 @@
     return count;
 }
 
+void
+DRAMCtrl::drainResume()
+{
+    if (!isTimingMode && system()->isTimingMode()) {
+        // if we switched to timing mode, kick things into action,
+        // and behave as if we restored from a checkpoint
+        startup();
+    } else if (isTimingMode && !system()->isTimingMode()) {
+        // if we switch from timing mode, stop the refresh events to
+        // not cause issues with KVM
+        for (auto r : ranks) {
+            r->suspend();
+        }
+    }
+
+    // update the mode
+    isTimingMode = system()->isTimingMode();
+}
+
 DRAMCtrl::MemoryPort::MemoryPort(const std::string& name, DRAMCtrl& _memory)
     : QueuedSlavePort(name, &_memory, queue), queue(_memory, *this),
       memory(_memory)
diff -r bb665366cc00 -r 6dd27a0e0d23 src/mem/dram_ctrl.hh
--- a/src/mem/dram_ctrl.hh	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.hh	Tue Dec 23 09:31:18 2014 -0500
@@ -121,6 +121,11 @@
     MemoryPort port;
 
     /**
+     * Remeber if the memory system is in timing mode
+     */
+    bool isTimingMode;
+
+    /**
      * Remember if we have to retry a request when available.
      */
     bool retryRdReq;
@@ -340,6 +345,11 @@
         void startup(Tick ref_tick);
 
         /**
+         * Stop the refresh events.
+         */
+        void suspend();
+
+        /**
          * Check if the current rank is available for scheduling.
          *
          * @param Return true if the rank is idle from a refresh point of view
@@ -855,8 +865,9 @@
     virtual BaseSlavePort& getSlavePort(const std::string& if_name,
                                         PortID idx = InvalidPortID);
 
-    virtual void init();
-    virtual void startup();
+    virtual void init() M5_ATTR_OVERRIDE;
+    virtual void startup() M5_ATTR_OVERRIDE;
+    virtual void drainResume() M5_ATTR_OVERRIDE;
 
   protected:
 
___ gem5-dev mailing
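The mode tracking in changeset 6dd27a0e0d23 boils down to caching the previous mode and comparing it with the current one whenever drainResume is called. The following is a toy Python model of that idea only; the class, its attributes and the log strings are hypothetical, and the single refresh_scheduled flag stands in for the per-rank refresh events of the real controller.

```python
class DramCtrlModel:
    """Toy model of a controller that only refreshes in timing mode."""

    def __init__(self, system_is_timing):
        # system_is_timing: callable returning the system's current mode.
        self._system_is_timing = system_is_timing
        self.is_timing_mode = False
        self.refresh_scheduled = False
        self.log = []

    def startup(self):
        # Remember the mode, and only kick off refresh in timing mode.
        self.is_timing_mode = self._system_is_timing()
        if self.is_timing_mode:
            self.refresh_scheduled = True
            self.log.append("startup: refresh scheduled")

    def drain_resume(self):
        now_timing = self._system_is_timing()
        if not self.is_timing_mode and now_timing:
            # Switched into timing mode: behave like a fresh start.
            self.startup()
        elif self.is_timing_mode and not now_timing:
            # Switched out of timing mode: stop refreshing so that no
            # events force the simulator out of KVM.
            self.refresh_scheduled = False
            self.log.append("suspend: refresh descheduled")
        # Cache the mode for the next comparison.
        self.is_timing_mode = now_timing
```

Comparing against a cached mode is what lets the controller react to a switch without any dedicated notification API, at the cost of relying on drainResume being called after every switch.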
[gem5-dev] changeset in gem5: mem: Rework the structuring of the prefetchers
changeset b9646f4546ad in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=b9646f4546ad description: mem: Rework the structuring of the prefetchers Re-organizes the prefetcher class structure. Previously the BasePrefetcher forced multiple assumptions on the prefetchers that inherited from it. This patch makes the BasePrefetcher class truly representative of base functionality. For example, the base class no longer enforces FIFO order. Instead, prefetchers with FIFO requests (like the existing stride and tagged prefetchers) now inherit from a new QueuedPrefetcher base class. Finally, the stride-based prefetcher now assumes a custimizable lookup table (sets/ways) rather than the previous fully associative structure. diffstat: src/mem/cache/cache_impl.hh | 10 +- src/mem/cache/prefetch/Prefetcher.py | 62 --- src/mem/cache/prefetch/SConscript|1 + src/mem/cache/prefetch/base.cc | 258 -- src/mem/cache/prefetch/base.hh | 139 +- src/mem/cache/prefetch/queued.cc | 213 src/mem/cache/prefetch/queued.hh | 108 ++ src/mem/cache/prefetch/stride.cc | 205 +++--- src/mem/cache/prefetch/stride.hh | 55 +++--- src/mem/cache/prefetch/tagged.cc | 16 +- src/mem/cache/prefetch/tagged.hh | 19 +- 11 files changed, 599 insertions(+), 487 deletions(-) diffs (truncated from 1401 to 300 lines): diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/cache_impl.hh --- a/src/mem/cache/cache_impl.hh Tue Dec 23 09:31:18 2014 -0500 +++ b/src/mem/cache/cache_impl.hh Tue Dec 23 09:31:18 2014 -0500 @@ -535,7 +535,7 @@ bool satisfied = access(pkt, blk, lat, writebacks); // track time of availability of next prefetch, if any -Tick next_pf_time = 0; +Tick next_pf_time = MaxTick; bool needsResponse = pkt-needsResponse(); @@ -548,7 +548,7 @@ // Don't notify on SWPrefetch if (!pkt-cmd.isSWPrefetch()) -next_pf_time = prefetcher-notify(pkt, time); +next_pf_time = prefetcher-notify(pkt); } if (needsResponse) { @@ -648,7 +648,7 @@ if (prefetcher) { // Don't notify on SWPrefetch if 
(!pkt-cmd.isSWPrefetch()) -next_pf_time = prefetcher-notify(pkt, time); +next_pf_time = prefetcher-notify(pkt); } } } else { @@ -688,12 +688,12 @@ if (prefetcher) { // Don't notify on SWPrefetch if (!pkt-cmd.isSWPrefetch()) -next_pf_time = prefetcher-notify(pkt, time); +next_pf_time = prefetcher-notify(pkt); } } } -if (next_pf_time != 0) +if (next_pf_time != MaxTick) requestMemSideBus(Request_PF, std::max(time, next_pf_time)); // copy writebacks to write buffer diff -r 0b969a35781f -r b9646f4546ad src/mem/cache/prefetch/Prefetcher.py --- a/src/mem/cache/prefetch/Prefetcher.py Tue Dec 23 09:31:18 2014 -0500 +++ b/src/mem/cache/prefetch/Prefetcher.py Tue Dec 23 09:31:18 2014 -0500 @@ -1,4 +1,4 @@ -# Copyright (c) 2012 ARM Limited +# Copyright (c) 2012, 2014 ARM Limited # All rights reserved. # # The license below extends only to copyright in the software and shall @@ -37,6 +37,7 @@ # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # # Authors: Ron Dreslinski +# Mitch Hayenga from ClockedObject import ClockedObject from m5.params import * @@ -46,39 +47,46 @@ type = 'BasePrefetcher' abstract = True cxx_header = mem/cache/prefetch/base.hh -size = Param.Int(100, - Number of entries in the hardware prefetch queue) -cross_pages = Param.Bool(False, - Allow prefetches to cross virtual page boundaries) -serial_squash = Param.Bool(False, - Squash prefetches with a later time on a subsequent miss) -degree = Param.Int(1, - Degree of the prefetch depth) -latency = Param.Cycles('1', Latency of the prefetcher) -use_master_id = Param.Bool(True, - Use the master id to separate calculations of prefetches) -data_accesses_only = Param.Bool(False, - Only prefetch on data not on instruction accesses) -on_miss_only = Param.Bool(False, - Only prefetch on miss (as opposed to always)) -on_read_only = Param.Bool(False, - Only prefetch on read requests (write requests ignored)) -on_prefetch = Param.Bool(True, - Let lower cache prefetcher train on prefetch requests) 
-inst_tagged = Param.Bool(True, - Perform a tagged prefetch for instruction fetches always) sys = Param.System(Parent.any, System this prefetcher belongs to) -class
[gem5-dev] changeset in gem5: mem: Add rank-wise refresh to the DRAM contro...
changeset bb665366cc00 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=bb665366cc00 description: mem: Add rank-wise refresh to the DRAM controller This patch adds rank-wise refresh to the controller, as opposed to the channel-wide refresh currently in place. In essence each rank can be refreshed independently, and for this to be possible the controller is extended with a state machine per rank. Without this patch the data bus is always idle during a refresh, as all the ranks are refreshing at the same time. With the rank-wise refresh it is possible to use one rank while another one is refreshing, and thus the data bus can be kept busy. The patch introduces a Rank class to encapsulate the state per rank, and also shifts all the relevant banks, activation tracking etc to the rank. The arbitration is also updated to consider the state of the rank. diffstat: src/mem/dram_ctrl.cc | 717 -- src/mem/dram_ctrl.hh | 360 - 2 files changed, 637 insertions(+), 440 deletions(-) diffs (truncated from 1621 to 300 lines): diff -r 471d390943f0 -r bb665366cc00 src/mem/dram_ctrl.cc --- a/src/mem/dram_ctrl.cc Tue Dec 23 09:31:18 2014 -0500 +++ b/src/mem/dram_ctrl.cc Tue Dec 23 09:31:18 2014 -0500 @@ -40,6 +40,7 @@ * Authors: Andreas Hansson * Ani Udipi * Neha Agarwal + * Omar Naji */ #include base/bitfield.hh @@ -59,8 +60,7 @@ port(name() + .port, *this), retryRdReq(false), retryWrReq(false), busState(READ), -nextReqEvent(this), respondEvent(this), activateEvent(this), -prechargeEvent(this), refreshEvent(this), powerEvent(this), +nextReqEvent(this), respondEvent(this), drainManager(NULL), deviceSize(p-device_size), deviceBusWidth(p-device_bus_width), burstLength(p-burst_length), @@ -89,32 +89,19 @@ maxAccessesPerRow(p-max_accesses_per_row), frontendLatency(p-static_frontend_latency), backendLatency(p-static_backend_latency), -busBusyUntil(0), refreshDueAt(0), refreshState(REF_IDLE), -pwrStateTrans(PWR_IDLE), pwrState(PWR_IDLE), prevArrival(0), -nextReqTime(0), 
pwrStateTick(0), numBanksActive(0), -activeRank(0), timeStampOffset(0) +busBusyUntil(0), prevArrival(0), +nextReqTime(0), activeRank(0), timeStampOffset(0) { -// create the bank states based on the dimensions of the ranks and -// banks -banks.resize(ranksPerChannel); +for (int i = 0; i ranksPerChannel; i++) { +Rank* rank = new Rank(*this, p); +ranks.push_back(rank); -//create list of drampower objects. For each rank 1 drampower instance. -for (int i = 0; i ranksPerChannel; i++) { -DRAMPower drampower = DRAMPower(p, false); -rankPower.emplace_back(drampower); -} +rank-actTicks.resize(activationLimit, 0); +rank-banks.resize(banksPerRank); +rank-rank = i; -actTicks.resize(ranksPerChannel); -for (size_t c = 0; c ranksPerChannel; ++c) { -banks[c].resize(banksPerRank); -actTicks[c].resize(activationLimit, 0); -} - -// set the bank indices -for (int r = 0; r ranksPerChannel; r++) { for (int b = 0; b banksPerRank; b++) { -banks[r][b].rank = r; -banks[r][b].bank = b; +rank-banks[b].bank = b; // GDDR addressing of banks to BG is linear. 
// Here we assume that all DRAM generations address bank groups as // follows: @@ -126,10 +113,10 @@ //banks 1,5,9,13 are in bank group 1 //banks 2,6,10,14 are in bank group 2 //banks 3,7,11,15 are in bank group 3 -banks[r][b].bankgr = b % bankGroupsPerRank; +rank-banks[b].bankgr = b % bankGroupsPerRank; } else { // No bank groups; simply assign to bank number -banks[r][b].bankgr = b; +rank-banks[b].bankgr = b; } } } @@ -254,19 +241,18 @@ { // timestamp offset should be in clock cycles for DRAMPower timeStampOffset = divCeil(curTick(), tCK); + // update the start tick for the precharge accounting to the // current tick -pwrStateTick = curTick(); +for (auto r : ranks) { +r-startup(curTick() + tREFI - tRP); +} // shift the bus busy time sufficiently far ahead that we never // have to worry about negative values when computing the time for // the next request, this will add an insignificant bubble at the // start of simulation busBusyUntil = curTick() + tRP + tRCD + tCL; - -// kick off the refresh, and give ourselves enough time to -// precharge -schedule(refreshEvent, curTick() + tREFI - tRP); } Tick @@ -411,7 +397,7 @@ // later uint16_t
[gem5-dev] changeset in gem5: mem: Fix a bug in the DRAM controller arbitra...
changeset 471d390943f0 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=471d390943f0
description:
    mem: Fix a bug in the DRAM controller arbitration

    Fix a minor issue that affects multi-rank systems.

diffstat:

 src/mem/dram_ctrl.cc |  12 +++++++++---
 1 files changed, 9 insertions(+), 3 deletions(-)

diffs (29 lines):

diff -r 6d4da9dc90a1 -r 471d390943f0 src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -1477,7 +1477,13 @@
     // Offset by tRCD to correlate with ACT timing variables
     Tick min_cmd_at = busBusyUntil - tCL - tRCD;
 
-    // Prioritize same rank accesses that can issue B2B
+    // if we have multiple ranks and all
+    // waiting packets are accessing a rank which was previously active
+    // then bank_mask_same_rank will be set to a value while bank_mask will
+    // remain 0. In this case, the function should return the value of
+    // bank_mask_same_rank.
+    // else if waiting packets access a rank which was previously active and
+    // other ranks, prioritize same rank accesses that can issue B2B
     // Only optimize for same ranks when the command type
     // does not change; do not want to unnecessarily incur tWTR
     //
@@ -1485,8 +1491,8 @@
     // 1) Commands that access the same rank as previous burst
     //    and can prep the bank seamlessly.
     // 2) Commands (any rank) with earliest bank prep
-    if (!switched_cmd_type && same_rank_match &&
-        min_act_at_same_rank <= min_cmd_at) {
+    if ((bank_mask == 0) || (!switched_cmd_type && same_rank_match &&
+        min_act_at_same_rank <= min_cmd_at)) {
         bank_mask = bank_mask_same_rank;
     }
 
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
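The effect of the arbitration fix in changeset 471d390943f0 is easiest to see with the selection reduced to a pure function. This Python sketch mirrors the patched condition only; the function name and the idea of passing everything in as arguments are assumptions, since the surrounding gem5 code computes these values internally.

```python
def select_bank_mask(bank_mask, bank_mask_same_rank, switched_cmd_type,
                     same_rank_match, min_act_at_same_rank, min_cmd_at):
    """Pick the bank mask to return, following the patched condition."""
    # Case 1 (the fix): every waiting packet targets the previously
    # active rank, so only the same-rank mask was populated.
    # Case 2 (pre-existing): same-rank accesses can issue back to back
    # and the command type did not change.
    if bank_mask == 0 or (not switched_cmd_type and same_rank_match
                          and min_act_at_same_rank <= min_cmd_at):
        return bank_mask_same_rank
    return bank_mask
```

Without the first clause, a multi-rank system where all queued packets hit the previously active rank and the command type had just switched would get an empty mask back, even though banks were available.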
[gem5-dev] changeset in gem5: arm: Add stats to table walker
changeset b7bc5b1084a4 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b7bc5b1084a4
description:
    arm: Add stats to table walker

    This patch adds table walker stats for:
    - Walk events
    - Instruction vs Data
    - Page size histogram
    - Wait time and service time histograms
    - Pending requests histogram (per cycle)
        - measures dist. of L (p(1..) = how often busy, p(0) = how often idle)
    - Squashes, before starting and after completion

diffstat:

 src/arch/arm/table_walker.cc |  186 ++-
 src/arch/arm/table_walker.hh |   31 +++
 src/dev/dma_device.cc        |   12 ++-
 src/dev/dma_device.hh        |    4 +-
 4 files changed, 225 insertions(+), 8 deletions(-)

diffs (truncated from 476 to 300 lines):

diff -r 74834c49fbbe -r b7bc5b1084a4 src/arch/arm/table_walker.cc
--- a/src/arch/arm/table_walker.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/arch/arm/table_walker.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -60,6 +60,8 @@
       stage2Mmu(NULL), isStage2(p->is_stage2), tlb(NULL),
       currState(NULL), pending(false), masterId(p->sys->getMasterId(name())),
       numSquashable(p->num_squash_per_cycle),
+      pendingReqs(0),
+      pendingChangeTick(curTick()),
       doL1DescEvent(this), doL2DescEvent(this),
       doL0LongDescEvent(this), doL1LongDescEvent(this), doL2LongDescEvent(this),
       doL3LongDescEvent(this),
@@ -151,6 +153,7 @@
     if (params()->sys->isTimingMode() && currState) {
         delete currState;
         currState = NULL;
+        pendingChange();
     }
 }
 
@@ -170,6 +173,8 @@
                   bool secure, TLB::ArmTranslationType tranType)
 {
     assert(!(_functional && _timing));
+    ++statWalks;
+
     WalkerState *savedCurrState = NULL;
 
     if (!currState && !_functional) {
@@ -196,10 +201,13 @@
             // this fault to re-execute the faulting instruction which should clean
             // up everything.
if (currState-vaddr_tainted == _req-getVaddr()) { +++statSquashedBefore; return std::make_sharedReExec(); } } +pendingChange(); +currState-startTime = curTick(); currState-tc = _tc; currState-aarch64 = opModeIs64(currOpMode(_tc)); currState-el = currEL(_tc); @@ -261,6 +269,8 @@ currState-isFetch = (currState-mode == TLB::Execute); currState-isWrite = (currState-mode == TLB::Write); +statRequestOrigin[REQUESTED][currState-isFetch]++; + // We only do a second stage of translation if we're not secure, or in // hyp mode, the second stage MMU is enabled, and this table walker // instance is the first stage. @@ -280,6 +290,10 @@ currState-userTable = true; currState-xnTable = false; currState-pxnTable = false; + +++statWalksLongDescriptor; +} else { +++statWalksShortDescriptor; } if (!currState-timing) { @@ -303,8 +317,10 @@ if (pending || pendingQueue.size()) { pendingQueue.push_back(currState); currState = NULL; +pendingChange(); } else { pending = true; +pendingChange(); if (currState-aarch64) return processWalkAArch64(); else if (long_desc_format) @@ -321,6 +337,7 @@ { assert(!currState); assert(pendingQueue.size()); +pendingChange(); currState = pendingQueue.front(); ExceptionLevel target_el = EL0; @@ -372,6 +389,7 @@ (currState-transState-squashed() || te)) { pendingQueue.pop_front(); num_squashed++; +statSquashedBefore++; DPRINTF(TLB, Squashing table walk for address %#x\n, currState-vaddr_tainted); @@ -383,6 +401,7 @@ currState-req, currState-tc, currState-mode); } else { // translate the request now that we know it will work +statWalkServiceTime.sample(curTick() - currState-startTime); tlb-translateTiming(currState-req, currState-tc, currState-transState, currState-mode); @@ -402,8 +421,9 @@ currState = NULL; } } +pendingChange(); -// if we've still got pending translations schedule more work +// if we still have pending translations, schedule more work nextWalk(tc); currState = NULL; } @@ -420,6 +440,8 @@ currState-vaddr_tainted, currState-ttbcr, 
mbits(currState-vaddr, 31, 32 - currState-ttbcr.n)); +statWalkWaitTime.sample(curTick() - currState-startTime); + if (currState-ttbcr.n == 0 || !mbits(currState-vaddr, 31, 32 - currState-ttbcr.n)) { DPRINTF(TLB, - Selecting TTBR0\n); @@ -511,6 +533,8 @@ DPRINTF(TLB, Beginning table walk for address %#x, TTBCR: %#x\n, currState-vaddr_tainted, currState-ttbcr); +
[gem5-dev] changeset in gem5: mem: Fix event scheduling issue for prefetches
changeset 00965520c9f5 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=00965520c9f5
description:
    mem: Fix event scheduling issue for prefetches

    The cache's MemSidePacketQueue schedules a sendEvent based upon
    nextMSHRReadyTime(), which is the time when the next MSHR is ready or
    whenever a future prefetch is ready. However, a prefetch being ready
    does not guarantee that it can obtain an MSHR. So, when all MSHRs are
    full, the simulation ends up unnecessarily scheduling a sendEvent
    every picosecond until an MSHR is finally freed and the prefetch can
    happen.

    This patch fixes this by not signaling the prefetch ready time if the
    prefetch could not be generated. The event is rescheduled as soon as
    an MSHR becomes available.

diffstat:

 src/mem/cache/cache_impl.hh |  13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diffs (30 lines):

diff -r 97aa1ee1c2d9 -r 00965520c9f5 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
@@ -1197,6 +1197,15 @@
         if (wasFull && !mq->isFull()) {
             clearBlocked((BlockedCause)mq->index);
         }
+
+        // Request the bus for a prefetch if this deallocation freed enough
+        // MSHRs for a prefetch to take place
+        if (prefetcher && mq == &mshrQueue && mshrQueue.canPrefetch()) {
+            Tick next_pf_time = std::max(prefetcher->nextPrefetchReadyTime(),
+                                         curTick());
+            if (next_pf_time != MaxTick)
+                requestMemSideBus(Request_PF, next_pf_time);
+        }
     }
 
     // copy writebacks to write buffer
@@ -1955,7 +1964,9 @@
     Tick nextReady = std::min(mshrQueue.nextMSHRReadyTime(),
                               writeBuffer.nextMSHRReadyTime());
 
-    if (prefetcher) {
+    // Don't signal prefetch ready time if no MSHRs available
+    // Will signal once enoguh MSHRs are deallocated
+    if (prefetcher && mshrQueue.canPrefetch()) {
         nextReady = std::min(nextReady,
                              prefetcher->nextPrefetchReadyTime());
     }
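The scheduling change in changeset 00965520c9f5 can be summarised as "a ready prefetch only counts if an MSHR is available for it". Below is a small Python sketch of that rule; the function name, the use of None for "no prefetcher" and the MAX_TICK constant are illustrative assumptions rather than gem5 definitions.

```python
# Stand-in for a "never" tick value; an assumption made for this sketch.
MAX_TICK = 2**64 - 1


def next_ready_time(mshr_ready, write_buffer_ready, prefetch_ready,
                    can_prefetch):
    """Earliest tick at which the cache has something to send."""
    next_ready = min(mshr_ready, write_buffer_ready)
    # Only let the prefetcher pull the time forward when an MSHR could
    # actually be allocated for the prefetch. prefetch_ready is None
    # when there is no prefetcher at all.
    if prefetch_ready is not None and can_prefetch:
        next_ready = min(next_ready, prefetch_ready)
    return next_ready
```

With all MSHRs occupied, next_ready_time(MAX_TICK, MAX_TICK, 100, False) stays at MAX_TICK instead of returning 100, which is what avoids the repeated, pointless sendEvent scheduling described in the commit message.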
[gem5-dev] changeset in gem5: tests: Add a regression for the stack distanc...
changeset 6d4da9dc90a1 in /z/repo/gem5 details: http://repo.gem5.org/gem5?cmd=changeset;node=6d4da9dc90a1 description: tests: Add a regression for the stack distance calculator Re-use the existing traffic generator regression, and enable the stack distance calculation in the comm monitor, along with the verification stack. The traffic generator config is also tuned to not increase the run-time too much (and actually have some address re-use). diffstat: tests/configs/tgen-simple-mem.py |6 +- tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt | 312 + tests/quick/se/70.tgen/tgen-simple-mem.cfg | 16 +- 3 files changed, 195 insertions(+), 139 deletions(-) diffs (truncated from 452 to 300 lines): diff -r cd8aae15f89a -r 6d4da9dc90a1 tests/configs/tgen-simple-mem.py --- a/tests/configs/tgen-simple-mem.py Tue Dec 23 09:31:18 2014 -0500 +++ b/tests/configs/tgen-simple-mem.py Tue Dec 23 09:31:18 2014 -0500 @@ -54,9 +54,11 @@ voltage_domain = VoltageDomain())) -# add a communication monitor, and also trace all the packets +# add a communication monitor, and also trace all the packets and +# calculate and verify stack distance system.monitor = CommMonitor(trace_file = monitor.ptrc.gz, - trace_enable = True) + trace_enable = True, + stack_dist_calc = StackDistCalc(verify = True)) # connect the traffic generator to the bus via a communication monitor system.cpu.port = system.monitor.slave diff -r cd8aae15f89a -r 6d4da9dc90a1 tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txt --- a/tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txtTue Dec 23 09:31:18 2014 -0500 +++ b/tests/quick/se/70.tgen/ref/null/none/tgen-simple-mem/stats.txtTue Dec 23 09:31:18 2014 -0500 @@ -4,37 +4,88 @@ sim_ticks1000 # Number of ticks simulated final_tick 1000 # Number of ticks from beginning of simulation (restored from checkpoints and never reset) sim_freq 1 # Frequency of simulated ticks -host_tick_rate11160095249 # Simulator tick rate (ticks/s) -host_mem_usage 262112 
# Number of bytes of host memory used -host_seconds 8.96 # Real time elapsed on the host +host_tick_rate31050955853 # Simulator tick rate (ticks/s) +host_mem_usage 209576 # Number of bytes of host memory used +host_seconds 3.22 # Real time elapsed on the host system.clk_domain.voltage_domain.voltage1 # Voltage in Volts system.clk_domain.clock 1000 # Clock period in ticks system.physmem.bytes_read::cpu 64 # Number of bytes read from this memory system.physmem.bytes_read::total 64 # Number of bytes read from this memory -system.physmem.bytes_written::cpu 213329152 # Number of bytes written to this memory -system.physmem.bytes_written::total 213329152 # Number of bytes written to this memory +system.physmem.bytes_written::cpu 853312 # Number of bytes written to this memory +system.physmem.bytes_written::total853312 # Number of bytes written to this memory system.physmem.num_reads::cpu 1 # Number of read requests responded to by this memory system.physmem.num_reads::total 1 # Number of read requests responded to by this memory -system.physmem.num_writes::cpu268 # Number of write requests responded to by this memory -system.physmem.num_writes::total 268 # Number of write requests responded to by this memory +system.physmem.num_writes::cpu 1 # Number of write requests responded to by this memory +system.physmem.num_writes::total1 # Number of write requests responded to by this memory system.physmem.bw_read::cpu 640 # Total read bandwidth from this memory (bytes/s) system.physmem.bw_read::total 640
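For readers unfamiliar with the metric being regression-tested in changeset 6d4da9dc90a1: the stack distance of an access is commonly defined as the number of distinct addresses touched since the previous access to the same address, and is infinite for a first access. The naive Python version below illustrates that definition with a plain list; it is an assumption-laden sketch and says nothing about how the gem5 StackDistCalc or its verification stack is implemented.

```python
def stack_distances(addresses):
    """Return the stack distance of every access in the trace."""
    stack = []  # least recently used first, most recently used last
    distances = []
    for addr in addresses:
        if addr in stack:
            pos = stack.index(addr)
            # Everything above addr was touched since its last access.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            # Never seen before: infinite distance by convention.
            distances.append(float("inf"))
        stack.append(addr)
    return distances
```

This takes time linear in the number of distinct addresses per access, which is fine for checking results on a short trace but not for long simulations.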
[gem5-dev] changeset in gem5: mem: Add parameter to reserve MSHR entries fo...
changeset 0b969a35781f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=0b969a35781f
description:
	mem: Add parameter to reserve MSHR entries for demand access

	Adds a new parameter that reserves some number of MSHR entries for
	demand accesses. This helps prevent prefetchers from taking all
	MSHRs, forcing demand requests from the CPU to stall.

diffstat:

 src/mem/cache/BaseCache.py  |   1 +
 src/mem/cache/base.cc       |   4 ++--
 src/mem/cache/cache_impl.hh |   2 +-
 src/mem/cache/mshr_queue.cc |   8 +++++---
 src/mem/cache/mshr_queue.hh |  19 ++++++++++++++++++-
 5 files changed, 27 insertions(+), 7 deletions(-)

diffs (101 lines):

diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/BaseCache.py
--- a/src/mem/cache/BaseCache.py	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/BaseCache.py	Tue Dec 23 09:31:18 2014 -0500
@@ -54,6 +54,7 @@
     max_miss_count = Param.Counter(0,
         "number of misses to handle before calling exit")
     mshrs = Param.Int("number of MSHRs (max outstanding requests)")
+    demand_mshr_reserve = Param.Int(1, "mshrs to reserve for demand access")
     size = Param.MemorySize("capacity in bytes")
     forward_snoops = Param.Bool(True,
         "forward snoops from mem side to cpu side")
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/base.cc
--- a/src/mem/cache/base.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/base.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -68,8 +68,8 @@
 BaseCache::BaseCache(const Params *p)
     : MemObject(p),
       cpuSidePort(nullptr), memSidePort(nullptr),
-      mshrQueue("MSHRs", p->mshrs, 4, MSHRQueue_MSHRs),
-      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000,
+      mshrQueue("MSHRs", p->mshrs, 4, p->demand_mshr_reserve, MSHRQueue_MSHRs),
+      writeBuffer("write buffer", p->write_buffers, p->mshrs+1000, 0,
                   MSHRQueue_WriteBuffer),
       blkSize(p->system->cacheLineSize()),
       hitLatency(p->hit_latency),
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
@@ -1841,7 +1841,7 @@
     // fall through... no pending requests. Try a prefetch.
     assert(!miss_mshr && !write_mshr);
-    if (prefetcher && !mshrQueue.isFull()) {
+    if (prefetcher && mshrQueue.canPrefetch()) {
         // If we have a miss queue slot, we can try a prefetch
         PacketPtr pkt = prefetcher->getPacket();
         if (pkt) {
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.cc
--- a/src/mem/cache/mshr_queue.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -52,10 +52,12 @@
 using namespace std;

 MSHRQueue::MSHRQueue(const std::string &_label,
-                     int num_entries, int reserve, int _index)
+                     int num_entries, int reserve, int demand_reserve,
+                     int _index)
     : label(_label), numEntries(num_entries + reserve - 1),
-      numReserve(reserve), registers(numEntries),
-      drainManager(NULL), allocated(0), inServiceEntries(0), index(_index)
+      numReserve(reserve), demandReserve(demand_reserve),
+      registers(numEntries), drainManager(NULL), allocated(0),
+      inServiceEntries(0), index(_index)
 {
     for (int i = 0; i < numEntries; ++i) {
         registers[i].queue = this;
diff -r b7bc5b1084a4 -r 0b969a35781f src/mem/cache/mshr_queue.hh
--- a/src/mem/cache/mshr_queue.hh	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/mshr_queue.hh	Tue Dec 23 09:31:18 2014 -0500
@@ -77,6 +77,12 @@
      */
     const int numReserve;

+    /**
+     * The number of entries to reserve for future demand accesses.
+     * Prevent prefetcher from taking all mshr entries
+     */
+    const int demandReserve;
+
     /** MSHR storage. */
     std::vector<MSHR> registers;
     /** Holds pointers to all allocated entries. */
@@ -106,9 +112,11 @@
      * @param num_entrys The number of entries in this queue.
      * @param reserve The minimum number of entries needed to satisfy
      * any access.
+     * @param demand_reserve The minimum number of entries needed to satisfy
+     * demand accesses.
      */
     MSHRQueue(const std::string &_label, int num_entries, int reserve,
-              int index);
+              int demand_reserve, int index);

     /**
      * Find the first MSHR that matches the provided address.
@@ -218,6 +226,15 @@
     }

     /**
+     * Returns true if sufficient mshrs for prefetch.
+     * @return True if sufficient mshrs for prefetch.
+     */
+    bool canPrefetch() const
+    {
+        return (allocated < numEntries - (numReserve + demandReserve));
+    }
+
+    /**
      * Returns the MSHR at the head of the readyList.
      * @return The next request to service.
      */
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
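The effect of the demand reserve on prefetch allocation can be sketched outside the simulator. What follows is a toy Python model, not gem5 code: the class `MSHRQueueModel` and the helper `max_prefetch_allocations` are hypothetical names introduced here, and the model copies only two things from the patch, the queue sizing (`num_entries + reserve - 1`) and the comparison in `canPrefetch()`.

```python
class MSHRQueueModel:
    """Toy occupancy model of an MSHR queue (illustrative, not gem5 code)."""

    def __init__(self, num_entries, reserve, demand_reserve):
        # Same sizing as the constructor in the patch:
        # numEntries(num_entries + reserve - 1)
        self.num_entries = num_entries + reserve - 1
        self.num_reserve = reserve
        self.demand_reserve = demand_reserve
        self.allocated = 0

    def can_prefetch(self):
        # Same comparison as canPrefetch() in the patch.
        limit = self.num_entries - (self.num_reserve + self.demand_reserve)
        return self.allocated < limit

    def allocate(self):
        self.allocated += 1


def max_prefetch_allocations(num_entries, reserve, demand_reserve):
    """Count how many entries prefetches alone can take before being refused."""
    queue = MSHRQueueModel(num_entries, reserve, demand_reserve)
    count = 0
    while queue.can_prefetch():
        queue.allocate()
        count += 1
    return count


if __name__ == "__main__":
    # mshrs = 4, reserve = 4 (the value BaseCache passes for the MSHR queue)
    for demand_reserve in (0, 1, 2):
        print(demand_reserve, max_prefetch_allocations(4, 4, demand_reserve))
```

In this model, with `mshrs = 4` and the reserve of 4 that `BaseCache` passes, prefetches can occupy at most three entries with no demand reserve and at most two with the default `demand_mshr_reserve` of 1, so each reserved entry removes one slot from what the prefetcher may take.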
[gem5-dev] changeset in gem5: mem: Hide WriteInvalidate requests from prefe...
changeset 7982e539d003 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=7982e539d003
description:
	mem: Hide WriteInvalidate requests from prefetchers

	Without this tweak, a prefetcher will happily prefetch data that will
	promptly be invalidated and overwritten by a WriteInvalidate.

diffstat:

 src/mem/cache/prefetch/base.cc |  4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diffs (21 lines):

diff -r 00965520c9f5 -r 7982e539d003 src/mem/cache/prefetch/base.cc
--- a/src/mem/cache/prefetch/base.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/prefetch/base.cc	Tue Dec 23 09:31:19 2014 -0500
@@ -83,7 +83,8 @@
 {
     Addr addr = pkt->getAddr();
     bool fetch = pkt->req->isInstFetch();
-    bool read= pkt->isRead();
+    bool read = pkt->isRead();
+    bool inv = pkt->isInvalidate();
     bool is_secure = pkt->isSecure();

     if (pkt->req->isUncacheable()) return false;
@@ -91,6 +92,7 @@
     if (!fetch && !onData) return false;
     if (!fetch && read && !onRead) return false;
     if (!fetch && !read && !onWrite) return false;
+    if (!fetch && !read && inv) return false;

     if (onMiss) {
         return !inCache(addr, is_secure)
_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
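The filter chain touched by this patch can be read as a single predicate. The Python sketch below restates only the checks visible in the diff; the function name `prefetcher_observes` and its keyword arguments are hypothetical, and the `onMiss` / `inCache` stage that follows in the real code is left out.

```python
def prefetcher_observes(fetch, read, inv, uncacheable,
                        on_data=True, on_read=True, on_write=True):
    """Return True if an access passes the filters shown in the diff.

    Illustrative restatement only; the argument names are assumptions
    and the later onMiss / inCache check is not modelled.
    """
    if uncacheable:
        return False
    if not fetch and not on_data:
        return False
    if not fetch and read and not on_read:
        return False
    if not fetch and not read and not on_write:
        return False
    # Check added by the patch: a data access that is not a read and
    # that invalidates (a WriteInvalidate) is hidden from the prefetcher.
    if not fetch and not read and inv:
        return False
    return True


if __name__ == "__main__":
    print(prefetcher_observes(fetch=False, read=False, inv=False,
                              uncacheable=False))  # ordinary write
    print(prefetcher_observes(fetch=False, read=False, inv=True,
                              uncacheable=False))  # WriteInvalidate
```

An ordinary write still reaches the prefetcher when `on_write` is set, while the same access with the invalidate flag is filtered out, which is the behaviour the commit message describes.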
[gem5-dev] changeset in gem5: config: Expose the DRAM ranks as a command-li...
changeset 74834c49fbbe in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=74834c49fbbe
description:
	config: Expose the DRAM ranks as a command-line option

	This patch gives the user direct influence over the number of DRAM
	ranks to make it easier to tune the memory density without affecting
	the bandwidth (previously the only means of scaling the device count
	was through the number of channels).

	The patch also adds some basic sanity checks to ensure that the
	number of ranks is a power of two (since we rely on bit slices in the
	address decoding).

diffstat:

 configs/common/MemConfig.py |  12 +++++++++---
 configs/common/Options.py   |   2 ++
 src/mem/dram_ctrl.cc        |   5 +++++
 3 files changed, 16 insertions(+), 3 deletions(-)

diffs (49 lines):

diff -r 6dd27a0e0d23 -r 74834c49fbbe configs/common/MemConfig.py
--- a/configs/common/MemConfig.py	Tue Dec 23 09:31:18 2014 -0500
+++ b/configs/common/MemConfig.py	Tue Dec 23 09:31:18 2014 -0500
@@ -197,9 +197,15 @@
     # address mapping in the case of a DRAM
     for r in system.mem_ranges:
         for i in xrange(nbr_mem_ctrls):
-            mem_ctrls.append(create_mem_ctrl(cls, r, i, nbr_mem_ctrls,
-                                             intlv_bits,
-                                             system.cache_line_size.value))
+            mem_ctrl = create_mem_ctrl(cls, r, i, nbr_mem_ctrls, intlv_bits,
+                                       system.cache_line_size.value)
+            # Set the number of ranks based on the command-line
+            # options if it was explicitly set
+            if issubclass(cls, m5.objects.DRAMCtrl) and \
+                    options.mem_ranks:
+                mem_ctrl.ranks_per_channel = options.mem_ranks
+
+            mem_ctrls.append(mem_ctrl)

     system.mem_ctrls = mem_ctrls
diff -r 6dd27a0e0d23 -r 74834c49fbbe configs/common/Options.py
--- a/configs/common/Options.py	Tue Dec 23 09:31:18 2014 -0500
+++ b/configs/common/Options.py	Tue Dec 23 09:31:18 2014 -0500
@@ -90,6 +90,8 @@
                       help = "type of memory to use")
     parser.add_option("--mem-channels", type="int", default=1,
                       help = "number of memory channels")
+    parser.add_option("--mem-ranks", type="int", default=None,
+                      help = "number of memory ranks per channel")
     parser.add_option("--mem-size", action="store", type="string",
                       default="512MB",
                       help="Specify the physical memory size (single memory)")
diff -r 6dd27a0e0d23 -r 74834c49fbbe src/mem/dram_ctrl.cc
--- a/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/dram_ctrl.cc	Tue Dec 23 09:31:18 2014 -0500
@@ -92,6 +92,11 @@
     busBusyUntil(0), prevArrival(0),
     nextReqTime(0), activeRank(0), timeStampOffset(0)
 {
+    // sanity check the ranks since we rely on bit slicing for the
+    // address decoding
+    fatal_if(!isPowerOf2(ranksPerChannel), "DRAM rank count of %d is not "
+             "allowed, must be a power of two\n", ranksPerChannel);
+
     for (int i = 0; i < ranksPerChannel; i++) {
         Rank* rank = new Rank(*this, p);
         ranks.push_back(rank);
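The reason the rank count must be a power of two is that a bit slice of the address can only select among 2^k values. A small Python sketch of that idea follows; the helpers `is_power_of_2`, `rank_bits` and `decode_rank`, and the choice of `low_bit`, are assumptions made for illustration and do not reproduce the actual address mapping in `dram_ctrl.cc`.

```python
def is_power_of_2(n):
    """True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def rank_bits(ranks_per_channel):
    """Number of address bits needed to select a rank."""
    if not is_power_of_2(ranks_per_channel):
        raise ValueError("DRAM rank count of %d is not allowed, "
                         "must be a power of two" % ranks_per_channel)
    return ranks_per_channel.bit_length() - 1


def decode_rank(addr, low_bit, ranks_per_channel):
    """Extract the rank index as a bit slice of addr starting at low_bit.

    low_bit is a hypothetical position chosen by the caller; the real
    controller derives it from its address-mapping scheme.
    """
    nbits = rank_bits(ranks_per_channel)
    mask = (1 << nbits) - 1
    return (addr >> low_bit) & mask


if __name__ == "__main__":
    print(rank_bits(4))                  # two bits select one of four ranks
    print(decode_rank(0b11010000, 4, 4))
```

With three ranks, for instance, a two-bit slice would produce the index 3 for some addresses even though no such rank exists, which is why the sketch (like the `fatal_if` in the patch) rejects such a count up front.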
[gem5-dev] changeset in gem5: stats: Bump stats for decoder, TLB, prefetche...
changeset c9b7e0c69f88 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=c9b7e0c69f88
description:
	stats: Bump stats for decoder, TLB, prefetcher and DRAM changes

	Changes due to speculative execution of an unaligned PC, introduction
	of TLB stats, changes and re-work of the prefetcher, and the
	introduction of rank-wise refresh in the DRAM controller.

diffstat:

 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-minor/stats.txt | 1247 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3-dual/stats.txt | 3653 ++--
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3/stats.txt | 1992 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-switcheroo-full/stats.txt | 2785 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor-dual/stats.txt | 3991 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor/stats.txt | 1574 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-checker/stats.txt | 2203 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-dual/stats.txt | 5682
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3/stats.txt | 2121 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-full/stats.txt | 3156 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-o3/stats.txt | 3625 +++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-timing/stats.txt | 2139 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor-dual/stats.txt | 4572 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor/stats.txt | 1853 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-checker/stats.txt | 2471 ++-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-dual/stats.txt | 6110 +
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3/stats.txt | 2381 ++-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-full/stats.txt | 3954 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-o3/stats.txt | 4031 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-timing/stats.txt | 3032 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt | 2513 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-simple-timing-ruby-MESI_Two_Level/stats.txt | 1848 +-
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt | 2995 ++--
 tests/long/se/10.mcf/ref/arm/linux/minor-timing/stats.txt | 243 +-
 tests/long/se/10.mcf/ref/arm/linux/o3-timing/stats.txt | 1661 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-atomic/stats.txt | 102 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-timing/stats.txt | 364 +-
 tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt | 605 +-
 tests/long/se/20.parser/ref/alpha/tru64/minor-timing/stats.txt | 959 +-
 tests/long/se/20.parser/ref/arm/linux/minor-timing/stats.txt | 999 +-
 tests/long/se/20.parser/ref/arm/linux/o3-timing/stats.txt | 1814 +-
 tests/long/se/20.parser/ref/arm/linux/simple-atomic/stats.txt | 102 +-
 tests/long/se/20.parser/ref/arm/linux/simple-timing/stats.txt | 356 +-
 tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt | 1704 +-
 tests/long/se/30.eon/ref/alpha/tru64/minor-timing/stats.txt | 275 +-
 tests/long/se/30.eon/ref/alpha/tru64/o3-timing/stats.txt | 815 +-
 tests/long/se/30.eon/ref/arm/linux/minor-timing/stats.txt | 349 +-
 tests/long/se/30.eon/ref/arm/linux/o3-timing/stats.txt | 1565 +-
 tests/long/se/30.eon/ref/arm/linux/simple-atomic/stats.txt | 102 +-
 tests/long/se/30.eon/ref/arm/linux/simple-timing/stats.txt | 366 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/minor-timing/stats.txt | 373 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/o3-timing/stats.txt | 1423 +-
 tests/long/se/40.perlbmk/ref/arm/linux/minor-timing/stats.txt | 515 +-
 tests/long/se/40.perlbmk/ref/arm/linux/o3-timing/stats.txt | 1792 +-
[gem5-dev] changeset in gem5: mem: Change prefetcher to use random_mt
changeset 63edd4a1243f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=63edd4a1243f
description:
	mem: Change prefetcher to use random_mt

	Prefetchers previously used rand() to generate random numbers.

diffstat:

 src/mem/cache/prefetch/stride.cc |  3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diffs (20 lines):

diff -r 7982e539d003 -r 63edd4a1243f src/mem/cache/prefetch/stride.cc
--- a/src/mem/cache/prefetch/stride.cc	Tue Dec 23 09:31:19 2014 -0500
+++ b/src/mem/cache/prefetch/stride.cc	Tue Dec 23 09:31:19 2014 -0500
@@ -46,6 +46,7 @@
  * Stride Prefetcher template instantiations.
  */

+#include "base/random.hh"
 #include "debug/HWPrefetch.hh"
 #include "mem/cache/prefetch/stride.hh"

@@ -176,7 +177,7 @@
 {
     // Rand replacement for now
     int set = pcHash(pc);
-    int way = rand() % pcTableAssoc;
+    int way = random_mt.random<int>(0, pcTableAssoc - 1);

     DPRINTF(HWPrefetch, "Victimizing lookup table[%d][%d].\n", set, way);
     return pcTable[master_id][set][way];
[gem5-dev] changeset in gem5: mem: Fix bug relating to writebacks and prefe...
changeset 97aa1ee1c2d9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=97aa1ee1c2d9
description:
	mem: Fix bug relating to writebacks and prefetches

	Previously the code commented about an unhandled case where it might
	be possible for a writeback to arrive after a prefetch was generated
	but before it was sent to the memory system. I hit that case.
	Luckily the prefetchSquash() logic already in the code handles
	dropping prefetch requests in certain circumstances.

diffstat:

 src/mem/cache/cache_impl.hh |  12 ++++--------
 1 files changed, 4 insertions(+), 8 deletions(-)

diffs (29 lines):

diff -r b9646f4546ad -r 97aa1ee1c2d9 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
+++ b/src/mem/cache/cache_impl.hh	Tue Dec 23 09:31:18 2014 -0500
@@ -1892,12 +1892,6 @@
         BlkType *blk = tags->findBlock(mshr->addr, mshr->isSecure);

         if (tgt_pkt->cmd == MemCmd::HardPFReq) {
-            // It might be possible for a writeback to arrive between
-            // the time the prefetch is placed in the MSHRs and when
-            // it's selected to send... if so, this assert will catch
-            // that, and then we'll have to figure out what to do.
-            assert(blk == NULL);
-
             // We need to check the caches above us to verify that
             // they don't have a copy of this block in the dirty state
             // at the moment. Without this check we could get a stale
@@ -1909,8 +1903,10 @@
             cpuSidePort->sendTimingSnoopReq(&snoop_pkt);

             // Check to see if the prefetch was squashed by an upper cache
-            if (snoop_pkt.prefetchSquashed()) {
-                DPRINTF(Cache, "Prefetch squashed by upper cache. "
+            // Or if a writeback arrived between the time the prefetch was
+            // placed in the MSHRs and when it was selected to send.
+            if (snoop_pkt.prefetchSquashed() || blk != NULL) {
+                DPRINTF(Cache, "Prefetch squashed by cache. "
                         "Deallocating mshr target %#x.\n", mshr->addr);

                 // Deallocate the mshr target
Re: [gem5-dev] Improved regression categorisation
Thanks for the clarification, Andreas. Yes, it's a good step; thanks for doing it.

Steve

On Tue, Dec 23, 2014 at 12:55 AM, Andreas Hansson via gem5-dev <gem5-dev@gem5.org> wrote:

> Hi Steve,
>
> The 00.hello tests are below 10 seconds and have too high SNR to even
> make it into my report :-), so yes you are right in that they are
> included in the ‘short’ regressions.
>
> This is definitely an intermediate step, but in any case we benefit from
> having a more sensible classification.
>
> Thanks for the feedback.
>
> Andreas
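The split proposed at the start of this thread comes down to two run-time thresholds. A minimal Python sketch of such a classifier is given here; the function names and the sample test names are made up for illustration, and how a run time of exactly 180 or exactly 1800 seconds should be treated is an assumption, since the proposal does not say.

```python
QUICK_LIMIT_S = 180    # quick: anything below 180 seconds
MEDIUM_LIMIT_S = 1800  # medium: from 180 up to, but below, 1800 seconds


def categorise(run_time_s):
    """Map a regression run time in seconds to quick, medium or long.

    Assumption: the boundary values themselves (180 and 1800) fall into
    the slower category.
    """
    if run_time_s < QUICK_LIMIT_S:
        return "quick"
    if run_time_s < MEDIUM_LIMIT_S:
        return "medium"
    return "long"


def split_regressions(run_times):
    """Group regression names by category, given a name -> seconds mapping."""
    buckets = {"quick": [], "medium": [], "long": []}
    for name, seconds in sorted(run_times.items()):
        buckets[categorise(seconds)].append(name)
    return buckets


if __name__ == "__main__":
    # Hypothetical run times, not measured data.
    sample = {"hello-atomic": 8.0, "mcf-timing": 650.0, "linux-boot-o3": 9800.0}
    print(split_regressions(sample))
```

A daily cron job running 'quick,medium' would then cover the first two buckets, and the weekly one all three.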
Re: [gem5-dev] Review Request 2591: x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.
> On Dec. 22, 2014, 9:19 p.m., Steve Reinhardt wrote:
> > Fine with me, assuming that our implementations of those features are indeed complete.
>
> Gabe Black wrote:
>     They aren't, but I think the bits that are missing will trigger warnings.

OK, good enough. Thanks.

- Steve

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2591/#review5706
-----------------------------------------------------------

On Dec. 22, 2014, 4:38 p.m., Gabe Black wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2591/
> -----------------------------------------------------------
>
> (Updated Dec. 22, 2014, 4:38 p.m.)
>
> Review request for Default.
>
> Repository: gem5
>
> Description
> -------
>
> Changeset 10608:fc26fb9c80b9
> ---------------------------
> x86: Enable three bits in the FamilyModelStepping ECX CPUID bitfield.
>
> These are for the monitor/mwait instructions, SSSE3, and XSAVE.
>
> Diffs
> -----
>
>   src/arch/x86/cpuid.cc a0cb57e1c072965dcdd51465beff37b264b41424
>
> Diff: http://reviews.gem5.org/r/2591/diff/
>
> Testing
> -------
>
> Thanks,
>
> Gabe Black
[gem5-dev] Review Request 2593: syscall emulation: Return correct writev value
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2593/
-----------------------------------------------------------

Review request for Default.

Repository: gem5

Description
-------

Changeset 10629:0de378f6af0e
---------------------------
syscall emulation: Return correct writev value

According to Linux man pages, if writev is successful, it returns the total
number of bytes written. Otherwise, it returns an error code. Instead of
returning 0, return the result from the actual call to writev in the system
call.

Diffs
-----

  src/sim/syscall_emul.hh c9b7e0c69f88673c79c4a033d4425cc1bba00a6d

Diff: http://reviews.gem5.org/r/2593/diff/

Testing
-------

Fixes an infinite loop when printing output in the Delaunay Mesh Refinement
benchmark (LonestarGPU), which uses ofstream to buffer output to file.

Thanks,

Joel Hestness
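Why a writev that always reports 0 can hang a caller is easier to see with a loop that depends on the return value. The Python sketch below uses the real `os.writev` call (available on Unix); the helper `write_all` is a hypothetical stand-in for what a buffered stream does internally and is not taken from gem5 or from the benchmark.

```python
import os


def write_all(fd, buffers):
    """Write every buffer to fd, advancing by the byte count writev returns.

    Illustrative helper: a caller like this only makes progress if the
    return value is the number of bytes actually written.
    """
    pending = [memoryview(b) for b in buffers if len(b) > 0]
    total = 0
    while pending:
        written = os.writev(fd, pending)
        if written <= 0:
            # A return value of 0 would otherwise repeat this loop forever.
            raise OSError("writev made no progress")
        total += written
        # Drop buffers that were written completely ...
        while pending and written >= len(pending[0]):
            written -= len(pending[0])
            pending.pop(0)
        # ... and trim one that was written only in part.
        if pending and written:
            pending[0] = pending[0][written:]
    return total


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(write_all(write_fd, [b"hello ", b"world\n"]))
    os.close(write_fd)
    print(os.read(read_fd, 64))
    os.close(read_fd)
```

The sketch raises an error when no progress is made; a caller without that guard would simply retry the same data indefinitely, which matches the symptom described under Testing.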
Re: [gem5-dev] Review Request 2593: syscall emulation: Return correct writev value
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2593/#review5711
-----------------------------------------------------------

I think we settled on syscall_emul

- Andreas Hansson

On Dec. 23, 2014, 2:51 p.m., Joel Hestness wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2593/
> -----------------------------------------------------------
>
> (Updated Dec. 23, 2014, 2:51 p.m.)
>
> Review request for Default.
>
> Repository: gem5
>
> Description
> -------
>
> Changeset 10629:0de378f6af0e
> ---------------------------
> syscall emulation: Return correct writev value
>
> According to Linux man pages, if writev is successful, it returns the total
> number of bytes written. Otherwise, it returns an error code. Instead of
> returning 0, return the result from the actual call to writev in the system
> call.
>
> Diffs
> -----
>
>   src/sim/syscall_emul.hh c9b7e0c69f88673c79c4a033d4425cc1bba00a6d
>
> Diff: http://reviews.gem5.org/r/2593/diff/
>
> Testing
> -------
>
> Fixes infinite loop output printing in Delaunay Mesh Refinement benchmark
> (LonestarGPU), which uses ofstream to buffer output to file.
>
> Thanks,
>
> Joel Hestness