Re: [gem5-dev] Review Request: Enabled instruction fetch pipelining.
I'm working on fixing this patch as it doesn't apply cleanly to the current code. It also fails to pipeline fetch for the corner case when the pipeline uses all the fetch bandwidth and you have reached the end of a cache block. It should start the fetch for the next cycle but won't. Probably not a huge deal for the default O3 that is 8-wide fetch/issue, but for anything reasonable/smaller this patch won't work. I'm looking into this without breaking anything else in the fetch stage (right now I'm hitting an assert in buildInst()).

Geoff

On Wed, May 25, 2011 at 12:58 AM, Gabe Black gbl...@eecs.umich.edu wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/718/#review1260
---

Have you run the regressions for the various ISAs with this patch? Have you tried the applicable ISAs with fetch pipelines deeper than the default (one stage?). The fetch code is subjected to a lot of corner cases and would likely be easy to break in subtle ways, so we need to be really careful.

Also, have you considered making this an external component to the CPU? O3 is already very complicated, so if it could make sense to compartmentalize this as another component that would help.

- Gabe

On 2011-05-24 12:01:29, Lisa Hsu wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/718/
---

(Updated 2011-05-24 12:01:29)

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert.

Summary
---
Enabled instruction fetch pipelining.

This patch is from one of our co-ops who has since finished her term, Yasuko Watanabe. I don't personally know much about it. In the end, I'll push in her name. Thanks.

Diffs
-
src/cpu/o3/fetch.hh 54a65799e4c1
src/cpu/o3/fetch_impl.hh 54a65799e4c1

Diff: http://reviews.m5sim.org/r/718/diff

Testing
---

Thanks,
Lisa

___
gem5-dev mailing list
gem5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/gem5-dev
[gem5-dev] changeset in m5: O3: Fix issue with interrupts/faults occuring i...
changeset 13ac7b9939ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=13ac7b9939ef
description:
O3: Fix issue with interrupts/faults occuring in the middle of a macro-op

This patch fixes two problems with the O3 cpu model. The first is an issue with an instruction fetch causing a fault on the next address while the current macro-op is being issued. This happens when the micro-ops exceed the fetch bandwidth and then, on the next cycle, the fetch stage attempts to issue a request to the next line while it still has micro-ops to issue. If the next line faults, a fault is attached to a micro-op in the currently executing macro-op rather than a nop from the next instruction block. This leads to an instruction incorrectly faulting on fetch when it had no reason to fault.

A similar problem occurs with interrupts. When an interrupt occurs the fetch stage nominally stops issuing instructions immediately. This is incorrect in the case of a macro-op as the current location might not be interruptible.

diffstat:

 src/arch/arm/faults.cc   |   2 ++
 src/cpu/o3/fetch.hh      |   3 +++
 src/cpu/o3/fetch_impl.hh |  28 +++++++++++++++++++++-------
 3 files changed, 26 insertions(+), 7 deletions(-)

diffs (98 lines):

diff -r 1eaa1fbd2212 -r 13ac7b9939ef src/arch/arm/faults.cc
--- a/src/arch/arm/faults.cc  Sat May 21 00:40:57 2011 -0400
+++ b/src/arch/arm/faults.cc  Mon May 23 10:40:18 2011 -0500
@@ -219,6 +219,8 @@
     fsr.ext = 0;
     tc->setMiscReg(T::FsrIndex, fsr);
     tc->setMiscReg(T::FarIndex, faultAddr);
+
+    DPRINTF(Faults, "Abort Fault fsr=%#x faultAddr=%#x\n", fsr, faultAddr);
 }

 void
diff -r 1eaa1fbd2212 -r 13ac7b9939ef src/cpu/o3/fetch.hh
--- a/src/cpu/o3/fetch.hh  Sat May 21 00:40:57 2011 -0400
+++ b/src/cpu/o3/fetch.hh  Mon May 23 10:40:18 2011 -0500
@@ -403,6 +403,9 @@

     StaticInstPtr macroop[Impl::MaxThreads];

+    /** Can the fetch stage redirect from an interrupt on this instruction? */
+    bool delayedCommit[Impl::MaxThreads];
+
     /** Memory request used to access cache. */
     RequestPtr memReq[Impl::MaxThreads];

diff -r 1eaa1fbd2212 -r 13ac7b9939ef src/cpu/o3/fetch_impl.hh
--- a/src/cpu/o3/fetch_impl.hh  Sat May 21 00:40:57 2011 -0400
+++ b/src/cpu/o3/fetch_impl.hh  Mon May 23 10:40:18 2011 -0500
@@ -346,6 +346,7 @@
         pc[tid] = cpu->pcState(tid);
         fetchOffset[tid] = 0;
         macroop[tid] = NULL;
+        delayedCommit[tid] = false;
     }

     for (ThreadID tid = 0; tid < numThreads; tid++) {
@@ -1070,6 +1071,9 @@
     assert(numInst < fetchWidth);
     toDecode->insts[toDecode->size++] = instruction;

+    // Keep track of if we can take an interrupt at this boundary
+    delayedCommit[tid] = instruction->isDelayedCommit();
+
     return instruction;
 }

@@ -1112,8 +1116,11 @@
     // Align the fetch PC so its at the start of a cache block.
     Addr block_PC = icacheBlockAlignPC(fetchAddr);

-    // Unless buffer already got the block, fetch it from icache.
-    if (!(cacheDataValid[tid] && block_PC == cacheDataPC[tid]) && !inRom) {
+    // If buffer is no longer valid or fetchAddr has moved to point
+    // to the next cache block, AND we have no remaining ucode
+    // from a macro-op, then start fetch from icache.
+    if (!(cacheDataValid[tid] && block_PC == cacheDataPC[tid])
+        && !inRom && !macroop[tid]) {
         DPRINTF(Fetch, "[tid:%i]: Attempting to translate and read "
                 "instruction, starting at PC %s.\n", tid, thisPC);

@@ -1126,7 +1133,11 @@
         else
             ++fetchMiscStallCycles;
         return;
-    } else if (checkInterrupt(thisPC.instAddr()) || isSwitchedOut()) {
+    } else if ((checkInterrupt(thisPC.instAddr()) && !delayedCommit[tid])
+               || isSwitchedOut()) {
+        // Stall CPU if an interrupt is posted and we're not issuing
+        // an delayed commit micro-op currently (delayed commit instructions
+        // are not interruptable by interrupts, only faults)
         ++fetchMiscStallCycles;
         return;
     }
@@ -1184,9 +1195,11 @@
     unsigned blkOffset = (fetchAddr - cacheDataPC[tid]) / instSize;

     // Loop through instruction memory from the cache.
-    while (blkOffset < numInsts &&
-           numInst < fetchWidth &&
-           !predictedBranch) {
+    // Keep issuing while we have not reached the end of the block or a
+    // macroop is active and fetchWidth is available and branch is not
+    // predicted taken
+    while ((blkOffset < numInsts || curMacroop) &&
+           numInst < fetchWidth && !predictedBranch) {

         // If we need to process more memory, do it now.
         if (!(curMacroop || inRom) && !predecoder.extMachInstReady()) {
@@ -1232,7 +1245,8 @@
                 pcOffset = 0;
             }
         }
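
The two conditions changed by this patch can be read as small boolean predicates. What follows is a minimal sketch of that logic only, not the gem5 code: `FetchState`, `mustStallFetch` and `keepFetching` are hypothetical names introduced here for illustration, and the flags merely stand in for the real checkInterrupt(), isSwitchedOut() and macro-op state.

```cpp
#include <cassert>

// Hypothetical, simplified view of the per-thread state the fetch stage
// consults. None of these names exist in gem5.
struct FetchState {
    bool interruptPending; // stands in for checkInterrupt(pc)
    bool delayedCommit;    // last fetched micro-op was a delayed-commit op
    bool switchedOut;      // stands in for isSwitchedOut()
};

// Fetch stalls for an interrupt only when the last micro-op was not a
// delayed-commit one, i.e. when we are at an interruptible boundary.
bool mustStallFetch(const FetchState &s)
{
    return (s.interruptPending && !s.delayedCommit) || s.switchedOut;
}

// Keep issuing while there is data left in the block OR a macro-op still
// has micro-ops, and fetch width remains and no branch is predicted taken.
bool keepFetching(unsigned blkOffset, unsigned numInsts, bool macroopActive,
                  unsigned numInst, unsigned fetchWidth, bool predictedBranch)
{
    assert(fetchWidth > 0);
    return (blkOffset < numInsts || macroopActive) &&
           numInst < fetchWidth && !predictedBranch;
}
```

With these predicates, a pending interrupt in the middle of a delayed-commit micro-op sequence does not stall fetch, and a macro-op that runs past the end of the cache block keeps issuing its remaining micro-ops.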
[gem5-dev] changeset in m5: O3: Fix issue w/wbOutstading being decremented ...
changeset 6173b87e7652 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=6173b87e7652
description:
O3: Fix issue w/wbOutstading being decremented multiple times on blocked cache.

If a split load fails on a blocked cache wbOutstanding can be decremented twice if the first part of the split load succeeds and the second part fails. Condition the decrementing on not having completed the first part of the load.

diffstat:

 src/cpu/o3/iew.hh      |  2 +-
 src/cpu/o3/iew_impl.hh |  4 +++-
 src/cpu/o3/lsq_unit.hh |  7 ++++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

diffs (43 lines):

diff -r 13ac7b9939ef -r 6173b87e7652 src/cpu/o3/iew.hh
--- a/src/cpu/o3/iew.hh  Mon May 23 10:40:18 2011 -0500
+++ b/src/cpu/o3/iew.hh  Mon May 23 10:40:19 2011 -0500
@@ -228,7 +228,7 @@
     {
         if (++wbOutstanding == wbMax)
             ableToIssue = false;
-        DPRINTF(IEW, "wbOutstanding: %i\n", wbOutstanding);
+        DPRINTF(IEW, "wbOutstanding: %i [sn:%lli]\n", wbOutstanding, sn);
         assert(wbOutstanding <= wbMax);
 #ifdef DEBUG
         wbList.insert(sn);
diff -r 13ac7b9939ef -r 6173b87e7652 src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hh  Mon May 23 10:40:18 2011 -0500
+++ b/src/cpu/o3/iew_impl.hh  Mon May 23 10:40:19 2011 -0500
@@ -1221,7 +1221,9 @@

         // Check if the instruction is squashed; if so then skip it
         if (inst->isSquashed()) {
-            DPRINTF(IEW, "Execute: Instruction was squashed.\n");
+            DPRINTF(IEW, "Execute: Instruction was squashed. PC: %s, [tid:%i]"
+                         " [sn:%i]\n", inst->pcState(), inst->threadNumber,
+                         inst->seqNum);

             // Consider this instruction executed so that commit can go
             // ahead and retire the instruction.
diff -r 13ac7b9939ef -r 6173b87e7652 src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hh  Mon May 23 10:40:18 2011 -0500
+++ b/src/cpu/o3/lsq_unit.hh  Mon May 23 10:40:19 2011 -0500
@@ -804,7 +804,12 @@

         ++lsqCacheBlocked;

-        iewStage->decrWb(load_inst->seqNum);
+        // If the first part of a split access succeeds, then let the LSQ
+        // handle the decrWb when completeDataAccess is called upon return
+        // of the requested first part of data
+        if (!completedFirst)
+            iewStage->decrWb(load_inst->seqNum);
+
         // There's an older load that's already going to squash.
         if (isLoadBlocked && blockedLoadSeqNum < load_inst->seqNum)
             return NoFault;

___
gem5-dev mailing list
gem5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/gem5-dev
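
The double decrement described above can be illustrated with a toy counter. This is a sketch under stated assumptions, not the LSQ code: `WbCounter`, `onBlockedCache` and `onFirstHalfReturned` are hypothetical names, and the model ignores retries and everything else the real load/store queue does.

```cpp
#include <cassert>

// Hypothetical stand-in for the IEW stage's wbOutstanding bookkeeping.
struct WbCounter {
    int outstanding = 0;
    void incr() { ++outstanding; }
    void decr()
    {
        assert(outstanding > 0); // a double decrement would trip this
        --outstanding;
    }
};

// Called when the cache is blocked while issuing (part of) a load.
// completedFirst is true if the first half of a split load was already
// accepted by the cache; in that case the decrement is left to the
// completion path, as the patch comment describes.
void onBlockedCache(WbCounter &wb, bool completedFirst)
{
    if (!completedFirst)
        wb.decr();
}

// Models the later return of the accepted first half (the role the patch
// comment assigns to completeDataAccess).
void onFirstHalfReturned(WbCounter &wb)
{
    wb.decr();
}
```

For a split load whose first half succeeded, the counter now drops once, when the first half returns, instead of once at the blocked second half and again at completion.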
[m5-dev] changeset in m5: O3: Fix an issue with a load branch instructi...
changeset 3c1296738e34 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3c1296738e34
description:
O3: Fix an issue with a load branch instruction and mem dep squashing

Instructions that load an address and are control instructions can execute down the wrong path if they were predicted correctly and then instructions following them are squashed. If an instruction is a memory and control op use the predicted address for the next PC instead of just advancing the PC. Without this change NPC is used for the next instruction, but predPC is used to verify that the branch was successful so the wrong path is silently executed.

diffstat:

 src/cpu/o3/iew_impl.hh |  12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diffs (22 lines):

diff -r d57afdcf38f5 -r 3c1296738e34 src/cpu/o3/iew_impl.hh
--- a/src/cpu/o3/iew_impl.hh  Thu May 12 11:19:35 2011 -0700
+++ b/src/cpu/o3/iew_impl.hh  Fri May 13 17:27:00 2011 -0500
@@ -485,8 +485,16 @@
             inst->seqNum < toCommit->squashedSeqNum[tid]) {
         toCommit->squash[tid] = true;
         toCommit->squashedSeqNum[tid] = inst->seqNum;
-        TheISA::PCState pc = inst->pcState();
-        TheISA::advancePC(pc, inst->staticInst);
+        TheISA::PCState pc;
+        if (inst->isMemRef() && inst->isIndirectCtrl()) {
+            // If an operation is a control operation as well as a memory
+            // reference we need to use the predicted PC, not the PC+N
+            // This instruction will verify misprediction based on predPC
+            pc = inst->readPredTarg();
+        } else {
+            pc = inst->pcState();
+            TheISA::advancePC(pc, inst->staticInst);
+        }
         toCommit->pc[tid] = pc;
         toCommit->mispredictInst[tid] = NULL;

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
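
The redirect choice made by this change can be sketched in isolation. The code below is an illustration only: `InstInfo` and `squashRedirectPC` are hypothetical names, PCs are modelled as plain integers rather than TheISA::PCState, and "advance" is reduced to adding the instruction size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flattened view of the fields the squash path looks at.
struct InstInfo {
    std::uint64_t pc;         // address of the squashing instruction
    std::uint64_t size;       // instruction size in bytes
    std::uint64_t predTarget; // target predicted at fetch time
    bool isMemRef;
    bool isIndirectCtrl;
};

// PC that fetch is redirected to after a memory-order squash.
std::uint64_t squashRedirectPC(const InstInfo &inst)
{
    assert(inst.size > 0);
    if (inst.isMemRef && inst.isIndirectCtrl) {
        // The branch half has not resolved yet, so resume at the
        // predicted target; the instruction later checks itself
        // against that same prediction.
        return inst.predTarget;
    }
    // Ordinary case: resume at the next sequential instruction.
    return inst.pc + inst.size;
}
```

The point of the sketch is that the redirect PC and the PC later used to verify the prediction agree for a combined load/branch instruction.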
[m5-dev] TBH/TBB ARM instructions should potentially be split into 2 micro-ops?
I've run into a buggy interaction for the ARM ISA between a TBH (or TBB) instruction and a dependent memory operation (that gets squashed) in the O3 model, leading to erroneous behavior when diffed against the Atomic model. The TBH instruction is a table-based branch that has to index into memory to calculate its branch destination, so it is both a branch and a memory op. The buggy behavior is as follows:

1) Fetch a TBH, predict branch destination
2) Begin fetching from predicted PC (which happens to be correct in my buggy run)
3) Issue younger dependent memory op to LSQ and send request to cache ahead of TBH, which is waiting on register operands
4) Issue TBH to LSQ to read memory for branch destination
5) Memory violation detection with younger instruction and squash for memory ordering
   --- This squash then calls squashDueToMemOrder(...), which redirects the PC of Fetch to a stale PC value stored in the TBH dyn-inst object, as it hasn't yet calculated its true PC
6) Start fetching down wrong path
7) TBH completes, but since the branch part was predicted correctly, no additional squash happens in checkMisprediction (which it may not even check due to the already outstanding squash)

I see two ways to fix this: either hack up the O3 model to handle this case of a fused memory-op and branch instruction (recheck to squash when the TBH finally resolves, for the special case of squashing dependent memory ops causing the fetch to screw up the branch), or split the instruction into 2 micro-ops (the load and then a dependent branch). Which one do people think would be the better option? I'm currently leaning toward micro-coding the instruction.

Thanks,
Geoff Blake

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
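
The second option, splitting TBH into a load followed by a dependent branch, can be sketched as two plain functions. This is only an illustration of the idea and not gem5 microcode: `tbhLoadUop` and `tbhBranchUop` are hypothetical names, the branch table is modelled as a vector of halfwords, and the target follows the architectural TBH rule of branching forward by twice the table entry from the Thumb PC value (instruction address + 4).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Micro-op 1 (hypothetical): a pure memory op that reads the halfword
// table entry. It is the only part that touches the LSQ.
std::uint16_t tbhLoadUop(const std::vector<std::uint16_t> &table,
                         std::size_t index)
{
    assert(index < table.size());
    return table[index];
}

// Micro-op 2 (hypothetical): a pure branch that depends only on the
// register result of micro-op 1.
std::uint32_t tbhBranchUop(std::uint32_t instAddr, std::uint16_t entry)
{
    const std::uint32_t pcValue = instAddr + 4; // Thumb PC read value
    return pcValue + 2u * entry;
}
```

With the split, a memory-order squash on the load half no longer needs a PC from an unresolved branch, since the branch half is a separate, younger micro-op that is squashed and refetched along with everything after the load.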
[m5-dev] changeset in m5: Fix bug in MDT BITMAP to allow more than 2GB of...
changeset 6b05deee0ca3 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=6b05deee0ca3
description:
Fix bug in MDT BITMAP to allow more than 2GB of memory.

Signed-off by Ali Saidi sa...@eecs.umich.edu

diffstat:

 system/alpha/console/console.c |  19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diffs (70 lines):

diff -r 8d92e9995321 -r 6b05deee0ca3 system/alpha/console/console.c
--- a/system/alpha/console/console.c  Wed Aug 16 15:51:06 2006 -0400
+++ b/system/alpha/console/console.c  Fri Oct 19 16:44:02 2007 -0400
@@ -60,7 +60,7 @@
 #include <sys/types.h>

 #define CONSOLE
-#include "alpha_access.h"
+#include "access.h"
 #include "cserve.h"
 #include "rpb.h"

@@ -89,7 +89,6 @@
 #define KPTE(x) ((ulong)((((ulong)(x)) << 32) | 0x1101))

 #define HWRPB_PAGES 16
-#define MDT_BITMAP_PAGES 4

 #define NUM_KERNEL_THIRD (4)

@@ -403,10 +402,12 @@
     unsigned char *mdt_bitmap;
     long *lp1, *lp2, sum;
     int i, cl;
-    int kern_first_page;
-    int mem_size = m5Conf.mem_size;
+    ulong kern_first_page;
+    ulong mem_size = m5Conf.mem_size;

-    int mem_pages = mem_size / PAGE_SIZE, cons_pages;
+    ulong mem_pages = mem_size / PAGE_SIZE, cons_pages;
+    ulong mdt_bitmap_pages = mem_pages / (PAGE_SIZE*8);
+
     ulong kernel_bytes, ksp, kernel_end, *unix_kernel_stack, bss,
         ksp_bottom, ksp_top;
     struct rpb_ctb *rpb_ctb;
@@ -443,7 +444,7 @@

     rpb = (struct rpb *)unix_boot_alloc(HWRPB_PAGES);

-    mdt_bitmap = (unsigned char *)unix_boot_alloc(MDT_BITMAP_PAGES);
+    mdt_bitmap = (unsigned char *)unix_boot_alloc(mdt_bitmap_pages);
     first = (ulong *)unix_boot_alloc(1);
     second = (ulong *)unix_boot_alloc(1);
     third_rpb = (ulong *)unix_boot_alloc(1);
@@ -503,13 +504,13 @@
         third_rpb[i] = KPTE(PFN(rpb) + i);

     /* Map the MDT bitmap table */
-    for (i = 0; i < MDT_BITMAP_PAGES; i++) {
+    for (i = 0; i < mdt_bitmap_pages; i++) {
         third_rpb[HWRPB_PAGES + i] = KPTE(PFN(mdt_bitmap) + i);
     }

     /* Protect the PAL pages */
     for (i = 1; i < PFN(first); i++)
-        third_rpb[HWRPB_PAGES + MDT_BITMAP_PAGES + i] = KPTE(i);
+        third_rpb[HWRPB_PAGES + mdt_bitmap_pages + i] = KPTE(i);

     /* Set up third_kernel after it's loaded, when we know where it is */
     kern_first_page = (KSEG_TO_PHYS(m5Conf.kernStart)/PAGE_SIZE);
@@ -678,7 +679,7 @@
     rpb_mdt->rpb_checksum = sum;

     /* XXX should checksum the cluster descriptors */
-    bzero((char *)mdt_bitmap, MDT_BITMAP_PAGES * PAGE_SIZE);
+    bzero((char *)mdt_bitmap, mdt_bitmap_pages * PAGE_SIZE);
     for (i = 0; i < mem_pages/8; i++)
         ((unsigned char *)mdt_bitmap)[i] = 0xff;

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
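
The arithmetic behind the 2GB limit can be worked through in a few lines. The sketch below assumes 8 KB pages (the usual Alpha page size; the value of PAGE_SIZE in console.c is not shown in the diff) and one bitmap bit per physical page. The function names are hypothetical, and the rounded-up variant is included only for comparison; it is not what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 8 KB pages, as is usual on Alpha.
constexpr std::uint64_t kPageSize = 8192;

// Bytes of physical memory that a given number of bitmap pages can
// describe when each bit stands for one page.
constexpr std::uint64_t bytesCoveredBy(std::uint64_t bitmapPages)
{
    return bitmapPages * kPageSize * 8 * kPageSize;
}

// Same expression as the patch: mem_pages / (PAGE_SIZE*8), truncating.
std::uint64_t bitmapPagesAsInPatch(std::uint64_t memBytes)
{
    assert(memBytes % kPageSize == 0);
    const std::uint64_t memPages = memBytes / kPageSize;
    return memPages / (kPageSize * 8);
}

// Comparison only (not in the patch): round up so that small memories
// still get at least one bitmap page.
std::uint64_t bitmapPagesRoundedUp(std::uint64_t memBytes)
{
    assert(memBytes % kPageSize == 0);
    const std::uint64_t memPages = memBytes / kPageSize;
    const std::uint64_t pagesPerBitmapPage = kPageSize * 8;
    return (memPages + pagesPerBitmapPage - 1) / pagesPerBitmapPage;
}
```

Under these assumptions the old fixed value of 4 bitmap pages covers exactly 2^31 bytes, which matches the 2GB limit in the commit message, and 2^31 is also one more than a 32-bit signed int can hold, which fits the int to ulong change. The truncating division yields 8 pages for 4 GB but 0 pages for memories below 512 MB, so whether small configurations are affected would need checking against the real PAGE_SIZE and unix_boot_alloc() behaviour.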
[m5-dev] Memory corruption in m5 dev repository when using --trace-flags=ExecEnable
I stumbled upon what appears to be a memory corruption bug in the current M5 repository. If on the command line I enter:

% ./build/ALPHA_FS/m5.opt --trace-flags=ExecEnable --trace-start=14000 fs.py -b <benchmark> -t -n <cpus> <more parameters>

the simulator will error with a segmentation fault, or occasionally an assert, not long after starting to trace instructions. I have run this through gdb with m5.debug and see the same errors. The problem is that the stack trace showing the cause of the seg fault or assert changes depending on the inputs to the simulator, so I have not been able to pinpoint this bug, which appears to be a subtle memory corruption somewhere in the code.

This error does not happen for other trace flags such as the Cache trace flag. It appears linked solely to the instruction tracing mechanism.

Has anybody else seen this bug? I'm using an up-to-date repository I pulled from m5sim.org this morning.

Thanks,
Geoff

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] syscall tracer
What exactly are you trying to do with a syscall tracer, Gabe? I thought your original problem was GLIBC doing some bizarre pointer encryption/decryption and getting it wrong, leading to a segmentation fault?

To help find that seg fault, I'd suggest going into the kernel and placing m5_exit() calls in arch/x86/mm/fault.c in do_page_fault(), where the kernel sends a SIGSEGV to user code. That'll help track down when it happens the first time, and reduce the cruft that happens after the program halts, like printing Segmentation Fault to the serial port.

I'm not sure a syscall tracer will help with finding the segfault. I have a feeling it's all in glibc and some weird corner case in the M5 implementation of the ISA that is causing the bug. This version of glibc causing the fault does work on real hardware, correct?

Geoff

From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Steve Reinhardt
Sent: Friday, January 30, 2009 10:12 AM
To: M5 Developer List
Subject: Re: [m5-dev] syscall tracer

I've been following your thoughts but haven't replied because I don't have any particular ideas beyond what you've said. The main things I would say are to build off the existing SE-mode data structures and make the tracing work with SE mode as well, but you've already covered those. You'll see that there is some cheesy SE-mode tracing that simply prints the first 3 args as ints (or something like that), which was basically as much effort as I was willing to put into it. It does sound very useful though.

One idea that just came to mind is that it might be worth looking at the strace source to see if there are any ideas there you can use. Better to look at the BSD one rather than the Linux one of course in case there is code that you want to re-use directly.

Steve

On Thu, Jan 29, 2009 at 11:54 PM, Gabe Black gbl...@eecs.umich.edu wrote:

Anybody? I was thinking one option would be to extend SyscallDesc to have a gatherArgs() function and a describe() function. describe() would just generate a string which would be like disassembly but for syscalls. Then, every syscall would have a nice line in SE with the syscall traceflag, but it would also automatically be available in FS for my tracer. gatherArgs would just populate member variables (for instance) with the syscall arguments so they aren't pulled in for both the description and the actual syscall. It wouldn't be necessary, especially considering that syscalls aren't a very performance sensitive operation for us.

Gabe

Gabe Black wrote:

Actually the timer goes off and the UART gets checked manually, and everything has passed through by that point so execution continues. I think there's supposed to be an interrupt or something for when the UART finishes, so there may be an issue with that never showing up.

Gabe Black wrote:

As a vote in favor of the usefulness of something like this, I think I've identified -a- problem with it. There's a close system call that's called on file descriptor 0 which is connected to the UART. The kernel starts waiting for the buffer to drain, but it never does for some reason and it just wakes up every now and then to give it another shot. I don't know if this has anything to do with the segfault, but I'd guess this is partially from me implementing all interrupts like they're edge triggered rather than level triggered like the UART apparently expects. If the UART starts driving its interrupt line while they're disabled for some reason, that will get lost and some count could end up out of balance. I'm going to be working on my somewhat hacky interrupt wiring scheme to make it less hacky and something I'm willing to push, and in the process I'll probably try to fix this too.

To the folks with more kernel experience than me, does that sound like a reasonable theory? Is there something else that might make it wait forever? It seems to think it's got 7 characters in the buffer which seems like a very small number compared to how much output it's generated.

Gabe

Gabe Black wrote:

Unfortunately, decreasing the TLB size to one was a red herring. With only one entry, if an instruction or an access spans pages (which takes amazingly long to happen), the TLB thrashes back and forth in that one entry and never gets anywhere.

Now what I'm trying to do to get a better handle on the flow of the program is to implement a tracer, like the one you get with the Exec traceflag, but that prints out the parameters and return value of system calls. I have a simple version of this hacked in already, but there are probably four things that prevent it from working as well as it could. Three of those are mapping syscall numbers to names, knowing how many arguments there are, and knowing which are string pointers so the string can be gathered with functional accesses. The fourth is identifying when you're entering or exiting a
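
The gatherArgs()/describe() split proposed in this thread could look roughly like the following. This is a self-contained sketch and not the M5 SyscallDesc class: `SyscallTraceDesc` and its members are hypothetical, arguments are taken from a plain vector instead of thread context registers, and string-pointer arguments (which would need functional memory accesses) are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical descriptor carrying what a tracer needs: the syscall
// name, how many arguments it takes, and the gathered argument values.
struct SyscallTraceDesc {
    std::string name;
    unsigned numArgs;
    std::vector<std::uint64_t> args;

    // Copy the first numArgs argument registers once, so the description
    // and the syscall body can both use them.
    void gatherArgs(const std::vector<std::uint64_t> &argRegs)
    {
        assert(argRegs.size() >= numArgs);
        args.assign(argRegs.begin(), argRegs.begin() + numArgs);
    }

    // Produce a disassembly-like line, e.g. "close(0x0)".
    std::string describe() const
    {
        std::ostringstream out;
        out << name << '(';
        for (std::size_t i = 0; i < args.size(); ++i) {
            if (i != 0)
                out << ", ";
            out << "0x" << std::hex << args[i];
        }
        out << ')';
        return out.str();
    }
};
```

A tracer in either SE or FS mode would call gatherArgs() at syscall entry and print describe(); the per-syscall argument count is what removes the need for the "first 3 args as ints" fallback mentioned by Steve.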
Re: [m5-dev] SLOOOOOOOOOOW IDE controller
I'm pretty sure in the alpha linux code, that they've added the quiesce() pseudo-inst to just skip past any busy wait loops. They've done this for the cpu_idle() loop as well in Alpha.

-----Original Message-----
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Gabe Black
Sent: Friday, December 19, 2008 3:50 AM
To: M5 Developer List
Subject: [m5-dev] SLOOW IDE controller

I finally have the IDE controller sort of working (yay!), but apparently there's a built in 3 millisecond busy loop delay before the device is recognized as ready to go. In general, did you need to do anything special to make the controller start up and work in a reasonable amount of real time for Alpha?

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
RE: [m5-dev] tracing data for stores in simple CPU
I believe if you turn on ExecResult in the trace-flags option, it will show data, at least it does for me.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gabe Black
Sent: Friday, May 02, 2008 5:32 AM
To: M5 Developer List
Subject: [m5-dev] tracing data for stores in simple CPU

Is there any reason the data (the D in traces) doesn't get set in simple CPU? O3 does which I would imagine is less trivial than simple CPU. I'd hacked that to work at one point and it seemed to work without any issues, other than maybe faulting accesses had junk data. It can be pretty useful information to have and it's a pain that it doesn't get printed.

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev