[m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
scons: *** Source `tests/quick/02.insttest/ref/sparc/linux/simple-timing/stats.txt' not found, needed by target `build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing/status'. * build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed. * build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed. * build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed. * build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed. * build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed. * build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed. * build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed. * build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed. * build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed. * build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed. * build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed. * build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed. * build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed. 
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed. * build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed. * build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed. * build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed. * build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed. * build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed. * build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed. * build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed. * build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed. * build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed. * build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed. * build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed. * build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed. * build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed. * build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed. * build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed. * build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed. * build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed. 
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed. * build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed. * build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed. * build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed. * build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed. * build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed. * build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed. * build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-timing-mp passed. * build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed. * build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed. *
Re: [m5-dev] Review Request: O3: Tighten memory order violation checking to 16 bytes.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/ --- (Updated 2011-03-30 08:41:48.614227) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. Diffs (updated) - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/o3/O3CPU.py d54b7775a6b0 src/cpu/o3/lsq_unit.hh d54b7775a6b0 src/cpu/o3/lsq_unit_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/520/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
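The arithmetic behind the summary above can be checked with a small sketch. It is illustrative only: `blockBytes` and `sameBlock` are hypothetical helpers, not functions from lsq_unit_impl.hh, and the only assumption is that the check compares addresses after a right shift.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical helper: bytes covered by one checking block for a given shift.
constexpr Addr blockBytes(unsigned shift) { return Addr{1} << shift; }

// Hypothetical helper: true when both addresses fall into the same block,
// i.e. when a shift-based check would treat them as a possible conflict.
inline bool
sameBlock(Addr a, Addr b, unsigned shift)
{
    assert(shift < 64);
    return (a >> shift) == (b >> shift);
}
```

With a shift of 8, addresses 0x1000 and 0x1080 land in the same 256-byte block and would be flagged; with a shift of 4 they fall into different 16-byte blocks and would not.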
Re: [m5-dev] Review Request: O3: Tighten memory order violation checking to 16 bytes.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/#review1033 --- I think the updated patch addresses all of your issues, Gabe. I tested it with an opt binary and one problem jumped out in x86 for 20.parser, an assert: m5.opt: build/X86_SE/arch/x86/emulenv.cc:49: void X86ISA::EmulEnv::doModRM(const X86ISA::ExtMachInst): Assertion `machInst.modRM.mod != 3' failed. It looks like the assert shouldn't be there and is hit during some misspeculation. - Ali On 2011-03-30 08:41:48, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/ --- (Updated 2011-03-30 08:41:48) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/o3/O3CPU.py d54b7775a6b0 src/cpu/o3/lsq_unit.hh d54b7775a6b0 src/cpu/o3/lsq_unit_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/520/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: cpu: split o3-specific parts out of BaseDynInst
On 2011-03-29 10:02:01, Gabe Black wrote: I agree with the sentiment of this change, but I think you moved too much into the O3 class. There's functionality (pointed out below) that you'll need to support in InOrder to be compliant with the interface instructions expect from CPUs, specifically delayed translation and oddly shaped/sized memory accesses with readBytes/writeBytes. You'll have to support those to run all the ISAs, as would any other CPU using a dyninst in the future. The implementations in the base dyninst are pretty generic, although feel free to point out why they might not work with InOrder. Gabe, I think I agree with your comments. The intent with making the merge is to support some of the features necessary for ISAs like ARM and x86. But I had some reservations about keeping the translation and the unaligned memory access code in the BaseDynInst class, because in the InOrder model that stuff is handled separately in the CacheUnit resource. It's done in a somewhat similar fashion to how the LSQ works in O3. However, there are issues, say, for split accesses: whereas in the O3 model you try to make both requests on the same cycle (and fail if you don't), InOrder splits that up into separate requests to the cache, allowing for overlap of the split request in high contention scenarios. The separate TLB translation is also done so that if the TLB is blocked/unavailable/etc. then you are not having to wait for 2 mshrs or 2 tlb-bandwidth slots to be available. With that said, I've been looking at the CacheUnit and LSQ implementations and now think that there is no reason that the DynInst can't make the request for a write and then the actual receiving object (LSQ or CacheUnit) can buffer the requests until the cache becomes available. The final trick, so to speak, is for the receiving memory object to be able to tell when all translations are done and also whether the load/store was sent to memory successfully. 
I think the support I need to implement this is there though, so I'm going to update this patch with the generic translation and read/writeBytes support back in the Base class. If there are any problems getting that to work for InOrder, then I'll bring them up at that point. On 2011-03-29 10:02:01, Gabe Black wrote: src/cpu/base_dyn_inst.hh, line 262 http://reviews.m5sim.org/r/529/diff/1/?file=10611#file10611line262 This stuff looks old and I'm guessing should be deleted one way or the other. There's a slightly different way that InOrder handles this Result structure so I had planned to revisit this and merge it in after I merged inorder into this style of DynInst object. - Korey --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/529/#review1029 --- On 2011-03-01 13:49:24, Korey Sewell wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/529/ --- (Updated 2011-03-01 13:49:24) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- cpu: split o3-specific parts out of BaseDynInst The bigger picture goal is that I want to get the InorderDynInst class derived from the BaseDynInst, since there is no need to replicate a lot of useful code already defined in BaseDynInst (e.g. microcode identification, etc.) and Inorder can take advantage of common code that handles microcode and other features that other ISAs need. But to do this, there are a lot of o3-specific things that are in BaseDynInst, that I pushed to O3DynInst in this patch. Book-keeping variables that handle the IQ, LSQ, ROB are unnecessary in the base class but generic variables that will work across CPUs (IsSquashed, IsCompleted, etc.) are kept in the base class. The upside is more consistency across the simple models (branch prediction and instruction identification are all in one common place). 
I really wanted to define pure virtual functions for read/write (to memory) and the setInt/FloatRegOperand, but virtual functions in a templated class are a no-no and I couldn't get around that (suggestions?). Also, I'd rather not use the this-> pointer all over the place to access member variables of the templated Base class, but it had to be done. Other than those quirks, simulator functionality should stay the same, as the O3 model always references the O3DynInst pointer and the InOrder model doesn't currently make use of the base dyn inst class (but it will be easier to derive from now...). Diffs - src/cpu/base_dyn_inst.hh cf1afc88070f src/cpu/base_dyn_inst_impl.hh cf1afc88070f
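On the two quirks mentioned at the end of the summary above, a small sketch may help. As far as standard C++ goes, a class template may declare virtual (including pure virtual) member functions; what the language disallows is a virtual member function template. The `this->` qualification is needed because members of a base class that depends on a template parameter are not found by unqualified lookup. The class names below are hypothetical stand-ins, not the real BaseDynInst or O3DynInst, and whether virtual dispatch is acceptable on this path is a separate design question the sketch does not settle.

```cpp
#include <cassert>

// Hypothetical stand-in for a templated base instruction class.
template <class Impl>
class BaseInstSketch
{
  public:
    virtual ~BaseInstSketch() = default;

    // Pure virtual member of a class template: permitted by the language.
    virtual int readOperand(int idx) const = 0;

    void setSquashed() { squashed = true; }

  protected:
    // Generic status flag kept in the base class.
    bool squashed = false;
};

// Hypothetical policy type used only to instantiate the templates.
struct O3ImplSketch {};

template <class Impl>
class DerivedInstSketch : public BaseInstSketch<Impl>
{
  public:
    // Overrides the pure virtual declared in the templated base.
    int readOperand(int idx) const override { return idx * 2; }

    // 'squashed' lives in a dependent base class, so it has to be
    // named through this-> (or BaseInstSketch<Impl>::squashed).
    bool isSquashed() const { return this->squashed; }
};
```

A `using BaseInstSketch<Impl>::squashed;` declaration in the derived class is another way to avoid repeating `this->` at every use.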
[m5-dev] Review Request: CPU: Remove references to memory copy operations
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- CPU: Remove references to memory copy operations Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/inorder/inorder_dyn_inst.hh d54b7775a6b0 src/cpu/o3/commit_impl.hh d54b7775a6b0 src/cpu/ozone/lw_back_end_impl.hh d54b7775a6b0 src/cpu/static_inst.hh d54b7775a6b0 src/cpu/thread_state.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/612/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
We've had multiple discussions on this (search the list archives for m5-stable and you should find them). We had some debate about how frequently m5-stable should be updated, and how long we want a changeset to mature in m5 before we consider promoting it to m5-stable, but I think we found some values everyone was content with last time... something like every 6 months we'll update m5-stable to the last working revision more than 1 month old, or something like that. (Or maybe it was 3/1, or 6/2, I forgot.) But as Ali says, the catch in automating this process is identifying the last working revision... we could use the regression tests to help narrow that down, but there are a lot of bugs that get pushed that aren't caught by the regression tester, so I wouldn't want to rely solely on that. If we had a better bug-tracking system so we could record facts like "changeset Y fixes a bug introduced in changeset X" then we could automatically exclude changesets between X and Y, but we don't have that. Steve On Tue, Mar 29, 2011 at 6:38 PM, Ali Saidi sa...@umich.edu wrote: You could do that, but there is no guarantee you'd pick a non-broken version to push. We wouldn't want to push anything from the last week with all the compilation issues. Ali On Mar 29, 2011, at 6:19 PM, Korey Sewell wrote: I'd prefer to see us just start updating m5-stable more regularly so it can fulfill its original purpose. We keep discussing this but never actually follow through. Is this any harder than just setting up a cronjob to push whatever is in m5-dev to m5-stable once per month (?) - Korey ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
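The selection rule Steve describes above (newest revision older than some minimum age, passing regressions, and outside any known "introduced in X, fixed in Y" range) could be sketched roughly as follows. Everything here is hypothetical: the record layout, the linear revision numbers, and the treatment of a bad range as [X, Y) are assumptions for illustration, and no such tooling exists according to the thread.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical record for one changeset in a linear history.
struct ChangesetSketch
{
    int rev;                // position in the history, larger is newer
    int ageDays;            // days since the changeset was pushed
    bool regressionsPassed; // result of the regression run at this revision
};

// Hypothetical bug record: bug introduced in 'first', fixed in 'second'.
// Revisions in [first, second) are treated as known-broken.
using BadRange = std::pair<int, int>;

inline bool
inBadRange(int rev, const std::vector<BadRange> &bad)
{
    for (const auto &r : bad) {
        if (rev >= r.first && rev < r.second)
            return true;
    }
    return false;
}

// Returns the newest revision that is more than minAgeDays old, passed
// regressions, and is not in a known-bad range; -1 if none qualifies.
inline int
pickStableCandidate(const std::vector<ChangesetSketch> &history,
                    const std::vector<BadRange> &bad, int minAgeDays)
{
    int best = -1;
    for (const auto &cs : history) {
        if (cs.ageDays <= minAgeDays || !cs.regressionsPassed)
            continue;
        if (inBadRange(cs.rev, bad))
            continue;
        if (cs.rev > best)
            best = cs.rev;
    }
    return best;
}
```

As the thread points out, passing regressions is only a partial signal, so a rule like this would narrow the candidates rather than guarantee a working revision.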
[m5-dev] Review Request: ARM: Tag appropriate instructions as IsReturn
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Tag appropriate instructions as IsReturn Diffs - src/arch/arm/isa/insts/branch.isa d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/ldr.isa d54b7775a6b0 src/arch/arm/isa/insts/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/branch.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/614/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] Review Request: ARM: Cleanup implementation of ITSTATE and put important code in PCState.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Cleanup implementation of ITSTATE and put important code in PCState. Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. Diffs - src/arch/alpha/predecoder.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/macromem.isa d54b7775a6b0 src/arch/arm/isa/insts/misc.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 src/arch/arm/isa/templates/macromem.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/misc.isa d54b7775a6b0 src/arch/arm/isa/templates/neon.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/predecoder.hh d54b7775a6b0 src/arch/arm/predecoder.cc d54b7775a6b0 src/arch/arm/types.hh d54b7775a6b0 src/arch/mips/predecoder.hh d54b7775a6b0 src/arch/power/predecoder.hh d54b7775a6b0 src/arch/sparc/predecoder.hh d54b7775a6b0 src/arch/x86/predecoder.hh d54b7775a6b0 src/cpu/o3/fetch_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/616/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] Review Request: ARM: Fix table walk going on while ASID changes error
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix table walk going on while ASID changes error Diffs - src/arch/arm/faults.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/table_walker.cc d54b7775a6b0 src/arch/arm/tlb.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/617/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: O3: Tighten memory order violation checking to 16 bytes.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/#review1034 --- I can't verify 100% that the code in your new function is correct, but I don't see anything obviously wrong. I really like that you consolidated the same code in two places down to the one. There's one issue which is pointed out below. src/cpu/base_dyn_inst.hh http://reviews.m5sim.org/r/520/#comment1405 This comment is inaccurate. It's really the largest address that's part of the request, which is the effective address plus the size and then minus one. Also, this feels like a temporary variable promoted to too large of a scope and/or permanence. size seems like it would be more generally useful, it would be more immediately obvious what it is, and you can go from one to the other easily like you are elsewhere in this change. - Gabe On 2011-03-30 08:41:48, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/ --- (Updated 2011-03-30 08:41:48) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/o3/O3CPU.py d54b7775a6b0 src/cpu/o3/lsq_unit.hh d54b7775a6b0 src/cpu/o3/lsq_unit_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/520/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
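Gabe's point in the review above, that storing the size is more general and the largest address can be derived from it, can be illustrated with a short sketch. The struct and function names are hypothetical and do not correspond to actual BaseDynInst members.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical record of one memory access: keep the size and derive
// the last byte address on demand instead of storing it.
struct MemAccessSketch
{
    Addr effAddr;   // first byte touched by the request
    unsigned size;  // number of bytes, assumed non-zero

    // Largest address that is part of the request:
    // effective address plus size, minus one.
    Addr lastByteAddr() const
    {
        assert(size > 0);
        return effAddr + size - 1;
    }
};

// True when the two accesses share at least one byte.
inline bool
bytesOverlap(const MemAccessSketch &a, const MemAccessSketch &b)
{
    return a.effAddr <= b.lastByteAddr() && b.effAddr <= a.lastByteAddr();
}
```

An 8-byte access at 0x1000 has a last byte address of 0x1007, so it overlaps a 1-byte access at 0x1007 but not an access starting at 0x1008.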
Re: [m5-dev] Review Request: CPU: Remove references to memory copy operations
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/#review1035 --- Looks good to me. - Gabe On 2011-03-30 08:58:56, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/ --- (Updated 2011-03-30 08:58:56) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- CPU: Remove references to memory copy operations Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/inorder/inorder_dyn_inst.hh d54b7775a6b0 src/cpu/o3/commit_impl.hh d54b7775a6b0 src/cpu/ozone/lw_back_end_impl.hh d54b7775a6b0 src/cpu/static_inst.hh d54b7775a6b0 src/cpu/thread_state.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/612/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: CPU: Remove references to memory copy operations
This does interact with Korey's change a bit, so hopefully we don't end up stepping on each other too much. Since somebody's going to have to update a patch anyway, I'll look at whether that result stuff in the dyninst can go away too. Gabe On 03/30/11 12:09, Gabe Black wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/#review1035 --- Looks good to me. - Gabe On 2011-03-30 08:58:56, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/ --- (Updated 2011-03-30 08:58:56) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- CPU: Remove references to memory copy operations Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/inorder/inorder_dyn_inst.hh d54b7775a6b0 src/cpu/o3/commit_impl.hh d54b7775a6b0 src/cpu/ozone/lw_back_end_impl.hh d54b7775a6b0 src/cpu/static_inst.hh d54b7775a6b0 src/cpu/thread_state.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/612/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Tag appropriate instructions as IsReturn
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/#review1036 --- One thing that should change is that isBranch passes through the ISA description code and fills in the template with the same value every time. If it's always the same (or could be harmlessly) then it should just be in the template. Note that this is different from the predicate test because it doesn't use any operands which -do- have to pass through the parser, just a local variable and a constant which have no special requirements. The other rasPop component may be the same, but it was less obvious where all that was being used. - Gabe On 2011-03-30 09:02:35, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/ --- (Updated 2011-03-30 09:02:35) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Tag appropriate instructions as IsReturn Diffs - src/arch/arm/isa/insts/branch.isa d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/ldr.isa d54b7775a6b0 src/arch/arm/isa/insts/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/branch.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/614/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix table walk going on while ASID changes error
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/#review1037 --- src/arch/arm/faults.cc http://reviews.m5sim.org/r/617/#comment1408 The Faults trace flag can be useful during boot to see where things go haywire since early on there shouldn't be any, at least in ISAs with hardware TLB miss handlers. Perhaps you should make this and any other artificial faults use FaultsVerbose or similar so they get ignored unless you really wanted to see them. src/arch/arm/table_walker.cc http://reviews.m5sim.org/r/617/#comment1406 It's not part of this change, but the brackets are messed up on this line. src/arch/arm/table_walker.cc http://reviews.m5sim.org/r/617/#comment1407 This panic doesn't do anything any more. - Gabe On 2011-03-30 09:05:28, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/ --- (Updated 2011-03-30 09:05:28) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix table walk going on while ASID changes error Diffs - src/arch/arm/faults.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/table_walker.cc d54b7775a6b0 src/arch/arm/tlb.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/617/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Cleanup implementation of ITSTATE and put important code in PCState.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/#review1038 --- This is a great change. I saw one style mistake, and also I think some code you moved could be simplified further. src/arch/arm/types.hh http://reviews.m5sim.org/r/616/#comment1409 The type should be on its own line. src/arch/arm/types.hh http://reviews.m5sim.org/r/616/#comment1410 You could add new fields to the ITSTATE bitunion that would make this easier. cond and mask could be SubBitUnions which can be treated as values on their own or have internal bitfields (syntactically, they still have access to everything). This would then look more like: it.cond.bottom = it.mask.top; it.mask = it.mask << 1; if (it.mask == 0) it.cond = 0; You could use _itstate directly as well which would save a few more lines. - Gabe On 2011-03-30 09:05:10, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/ --- (Updated 2011-03-30 09:05:10) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Cleanup implementation of ITSTATE and put important code in PCState. Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. 
Diffs - src/arch/alpha/predecoder.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/macromem.isa d54b7775a6b0 src/arch/arm/isa/insts/misc.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 src/arch/arm/isa/templates/macromem.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/misc.isa d54b7775a6b0 src/arch/arm/isa/templates/neon.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/predecoder.hh d54b7775a6b0 src/arch/arm/predecoder.cc d54b7775a6b0 src/arch/arm/types.hh d54b7775a6b0 src/arch/mips/predecoder.hh d54b7775a6b0 src/arch/power/predecoder.hh d54b7775a6b0 src/arch/sparc/predecoder.hh d54b7775a6b0 src/arch/x86/predecoder.hh d54b7775a6b0 src/cpu/o3/fetch_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/616/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
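The ITSTATE update suggested in the review above (copy the top mask bit into the bottom condition bit, shift the mask left by one, and clear the condition once the mask is empty) can be written out in plain C++ as a rough sketch. It deliberately avoids the BitUnion/SubBitUnion machinery; the struct below is a hypothetical stand-in, and the layout (a 4-bit cond and a 4-bit mask) is an assumption based on the fields named in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ITSTATE fields named in the review.
struct ItStateSketch
{
    std::uint8_t cond; // 4-bit condition field
    std::uint8_t mask; // 4-bit mask field
};

// Advances the state by one instruction, mirroring:
//   it.cond.bottom = it.mask.top;
//   it.mask = it.mask << 1;
//   if (it.mask == 0) it.cond = 0;
inline ItStateSketch
advanceItState(ItStateSketch it)
{
    assert(it.cond <= 0xF && it.mask <= 0xF);
    const std::uint8_t maskTop = (it.mask >> 3) & 1;
    // Replace the lowest condition bit with the top mask bit.
    it.cond = static_cast<std::uint8_t>((it.cond & 0xE) | maskTop);
    // Shift the mask left and keep it within four bits.
    it.mask = static_cast<std::uint8_t>((it.mask << 1) & 0xF);
    if (it.mask == 0)
        it.cond = 0;
    return it;
}
```

Starting from cond 0x0 and mask 0xC, one step gives cond 0x1 and mask 0x8, and the next step empties the mask and resets cond to 0.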
Re: [m5-dev] Review Request: CPU: Remove references to memory copy operations
They're used in Checker and Ozone (and orthogonally overloaded in InOrder it looks like). Checker and Ozone really need to be either updated and disposed of, but until that happens I suppose it doesn't make sense to make it worse -and- complicate these other two patches by getting rid of the Results bits now. Gabe On 03/30/11 12:11, Gabe Black wrote: This does interact with Korey's change a bit, so hopefully we don't end up stepping on each other too much. Since somebody's going to have to update a patch anyway, I'll look at whether that result stuff in the dyninst can go away too. Gabe On 03/30/11 12:09, Gabe Black wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/#review1035 --- Looks good to me. - Gabe On 2011-03-30 08:58:56, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/ --- (Updated 2011-03-30 08:58:56) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- CPU: Remove references to memory copy operations Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/inorder/inorder_dyn_inst.hh d54b7775a6b0 src/cpu/o3/commit_impl.hh d54b7775a6b0 src/cpu/ozone/lw_back_end_impl.hh d54b7775a6b0 src/cpu/static_inst.hh d54b7775a6b0 src/cpu/thread_state.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/612/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: CPU: Remove references to memory copy operations
updated -or- disposed of. One is sufficient :-). Gabe On 03/30/11 13:13, Gabe Black wrote: They're used in Checker and Ozone (and orthogonally overloaded in InOrder it looks like). Checker and Ozone really need to be either updated and disposed of, but until that happens I suppose it doesn't make sense to make it worse -and- complicate these other two patches by getting rid of the Results bits now. Gabe On 03/30/11 12:11, Gabe Black wrote: This does interact with Korey's change a bit, so hopefully we don't end up stepping on each other too much. Since somebody's going to have to update a patch anyway, I'll look at whether that result stuff in the dyninst can go away too. Gabe On 03/30/11 12:09, Gabe Black wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/#review1035 --- Looks good to me. - Gabe On 2011-03-30 08:58:56, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/612/ --- (Updated 2011-03-30 08:58:56) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- CPU: Remove references to memory copy operations Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/inorder/inorder_dyn_inst.hh d54b7775a6b0 src/cpu/o3/commit_impl.hh d54b7775a6b0 src/cpu/ozone/lw_back_end_impl.hh d54b7775a6b0 src/cpu/static_inst.hh d54b7775a6b0 src/cpu/thread_state.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/612/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: O3: Tighten memory order violation checking to 16 bytes.
On 2011-03-30 09:08:07, Gabe Black wrote: src/cpu/base_dyn_inst.hh, line 246 http://reviews.m5sim.org/r/520/diff/2/?file=11291#file11291line246 This comment is inaccurate. It's really the largest address that's part of the request, which is the effective address plus the size and then minus one. Also, this feels like a temporary variable promoted to too large of a scope and/or permanence. size seems like it would be more generally useful, it would be more immediately obvious what it is, and you can go from one to the other easily like you are elsewhere in this change. I'll change it to size. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/#review1034 --- On 2011-03-30 08:41:48, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/520/ --- (Updated 2011-03-30 08:41:48) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- O3: Tighten memory order violation checking to 16 bytes. The comment in the code suggests that the checking granularity should be 16 bytes, however in reality the shift by 8 is 256 bytes which seems much larger than required. Diffs - src/cpu/base_dyn_inst.hh d54b7775a6b0 src/cpu/o3/O3CPU.py d54b7775a6b0 src/cpu/o3/lsq_unit.hh d54b7775a6b0 src/cpu/o3/lsq_unit_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/520/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Tag appropriate instructions as IsReturn
On 2011-03-30 09:22:13, Gabe Black wrote: One thing that should change is that isBranch passes through the ISA description code and fills in the template with the same value every time. If it's always the same (or could be harmlessly) then it should just be in the template. Note that this is different from the predicate test because it doesn't use any operands which -do- have to pass through the parser, just a local variable and a constant which have no special requirements. The other rasPop component may be the same, but it was less obvious where all that was being used. I don't get what you mean. isBranch depends on the instruction and the destination register. I don't see any benefit from creating a new template which would be a duplicate of the non-isbranch case with this the branch case added. Seems like that is way messier. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/#review1036 --- On 2011-03-30 09:02:35, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/ --- (Updated 2011-03-30 09:02:35) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Tag appropriate instructions as IsReturn Diffs - src/arch/arm/isa/insts/branch.isa d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/ldr.isa d54b7775a6b0 src/arch/arm/isa/insts/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/branch.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/614/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix table walk going on while ASID changes error
On 2011-03-30 09:29:31, Gabe Black wrote: src/arch/arm/table_walker.cc, line 128 http://reviews.m5sim.org/r/617/diff/1/?file=11356#file11356line128 This panic doesn't do anything any more. it does still catch some cases. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/#review1037 --- On 2011-03-30 09:05:28, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/ --- (Updated 2011-03-30 09:05:28) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix table walk going on while ASID changes error Diffs - src/arch/arm/faults.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/table_walker.cc d54b7775a6b0 src/arch/arm/tlb.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/617/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix table walk going on while ASID changes error
On 2011-03-30 09:29:31, Gabe Black wrote: src/arch/arm/faults.cc, line 233 http://reviews.m5sim.org/r/617/diff/1/?file=11355#file11355line233 The Faults trace flag can be useful during boot to see where things go haywire since early on there shouldn't be any, at least in ISAs with hardware TLB miss handlers. Perhaps you should make this and any other artificial faults use FaultsVerbose or similar so they get ignored unless you really wanted to see them. It's extraordinarily rare that this occurs. The number of things that have to occur are numerous. You have to be running with O3, execute a branch instruction, predict the branch as taken, that prediction has to have an entry in the BTB, the BTB entry has to miss in the TLB, a table walk has to occur to satisfy the miss, and MISCREG_CONTEXIDR has to be written while all this happens. At boot this is never going to happen because the context isn't going to change. I'm inclined to leave it as is. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/#review1037 --- On 2011-03-30 09:05:28, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/ --- (Updated 2011-03-30 09:05:28) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix table walk going on while ASID changes error Diffs - src/arch/arm/faults.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/table_walker.cc d54b7775a6b0 src/arch/arm/tlb.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/617/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Cleanup implementation of ITSTATE and put important code in PCState.
On 2011-03-30 09:53:54, Gabe Black wrote: src/arch/arm/types.hh, line 350 http://reviews.m5sim.org/r/616/diff/1/?file=11348#file11348line350 You could add new fields to the ITSTATE bitunion that would make this easier. cond and mask could be SubBitUnions which can be treated as values on their own or have internal bitfields (syntactically, they still have access to everything). This would then look more like: it.cond.bottom = it.mask.top; it.mask = it.mask << 1; if (it.mask == 0) it.cond = 0; You could use _itstate directly as well which would save a few more lines. I'll see about adding this in another change... I want to get this known working code committed. It was very painful. On 2011-03-30 09:53:54, Gabe Black wrote: src/arch/arm/types.hh, line 322 http://reviews.m5sim.org/r/616/diff/1/?file=11348#file11348line322 The type should be on its own line. ok. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/#review1038 --- On 2011-03-30 09:05:10, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/ --- (Updated 2011-03-30 09:05:10) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Cleanup implementation of ITSTATE and put important code in PCState. Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. 
Diffs - src/arch/alpha/predecoder.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/macromem.isa d54b7775a6b0 src/arch/arm/isa/insts/misc.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 src/arch/arm/isa/templates/macromem.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/misc.isa d54b7775a6b0 src/arch/arm/isa/templates/neon.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/predecoder.hh d54b7775a6b0 src/arch/arm/predecoder.cc d54b7775a6b0 src/arch/arm/types.hh d54b7775a6b0 src/arch/mips/predecoder.hh d54b7775a6b0 src/arch/power/predecoder.hh d54b7775a6b0 src/arch/sparc/predecoder.hh d54b7775a6b0 src/arch/x86/predecoder.hh d54b7775a6b0 src/cpu/o3/fetch_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/616/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
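The ITSTATE update discussed in this review can be sketched with plain bit operations. This is only an illustration, not the code in types.hh: it uses a raw 8-bit value instead of the BitUnion machinery, all function names are invented, and it assumes the mask update is a left shift by one, as in the ARM architecture's ITAdvance pseudocode. The second function mirrors the field-style suggestion (cond in bits 7:4, mask in bits 3:0), and the third checks that both forms agree for every 8-bit value.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: if the low three bits are zero the IT block is finished
// and ITSTATE is cleared, otherwise bits 4:0 shift left by one and
// bits 7:5 are left alone.
inline std::uint8_t itAdvanceRef(std::uint8_t it)
{
    if ((it & 0x07) == 0)
        return 0;
    return static_cast<std::uint8_t>((it & 0xE0) | ((it << 1) & 0x1F));
}

// Field-style form following the review comment, with cond taken as
// bits 7:4 and mask as bits 3:0 (an assumption about the field layout).
inline std::uint8_t itAdvanceFields(std::uint8_t it)
{
    unsigned cond = (it >> 4) & 0xF;
    unsigned mask = it & 0xF;
    cond = (cond & 0xE) | ((mask >> 3) & 1);  // it.cond.bottom = it.mask.top
    mask = (mask << 1) & 0xF;                 // it.mask = it.mask << 1
    if (mask == 0)
        cond = 0;                             // IT block is over
    return static_cast<std::uint8_t>((cond << 4) | mask);
}

// Exhaustively compares the two forms over all 256 ITSTATE values.
inline bool itAdvanceAgree()
{
    for (unsigned v = 0; v < 256; ++v) {
        const std::uint8_t it = static_cast<std::uint8_t>(v);
        if (itAdvanceRef(it) != itAdvanceFields(it))
            return false;
    }
    return true;
}
```

Because the two forms agree on every input, the choice between them is about readability rather than behavior.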
Re: [m5-dev] Review Request: ARM: Tag appropriate instructions as IsReturn
On 2011-03-30 09:22:13, Gabe Black wrote: One thing that should change is that isBranch passes through the ISA description code and fills in the template with the same value every time. If it's always the same (or could be harmlessly) then it should just be in the template. Note that this is different from the predicate test because it doesn't use any operands which -do- have to pass through the parser, just a local variable and a constant which have no special requirements. The other rasPop component may be the same, but it was less obvious where all that was being used. Ali Saidi wrote: I don't get what you mean. isBranch depends on the instruction and the destination register. I don't see any benefit from creating a new template which would be a duplicate of the non-isbranch case with this the branch case added. Seems like that is way messier. There's only one template that uses isBranch, right? And it's always either 0 or dest == INTREG_PC, right? So why not just hard code it to dest == INTREG_PC? It's at construction time which is less important performance wise, and if you're writing to the PC it's a branch. I don't think you should make a new template, but if you can pull stuff out of the ISA desc with all else being equal then that would be a good idea. - Gabe --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/#review1036 --- On 2011-03-30 09:02:35, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/614/ --- (Updated 2011-03-30 09:02:35) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. 
Summary --- ARM: Tag appropriate instructions as IsReturn Diffs - src/arch/arm/isa/insts/branch.isa d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/ldr.isa d54b7775a6b0 src/arch/arm/isa/insts/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/branch.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/614/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix table walk going on while ASID changes error
On 2011-03-30 09:29:31, Gabe Black wrote: src/arch/arm/table_walker.cc, line 128 http://reviews.m5sim.org/r/617/diff/1/?file=11356#file11356line128 This panic doesn't do anything any more. Ali Saidi wrote: it does still catch some cases. Oh, yeah. Duh :-P. On 2011-03-30 09:29:31, Gabe Black wrote: src/arch/arm/faults.cc, line 233 http://reviews.m5sim.org/r/617/diff/1/?file=11355#file11355line233 The Faults trace flag can be useful during boot to see where things go haywire since early on there shouldn't be any, at least in ISAs with hardware TLB miss handlers. Perhaps you should make this and any other artificial faults use FaultsVerbose or similar so they get ignored unless you really wanted to see them. Ali Saidi wrote: It's extraordinarily rare that this occurs. The number of things that have to occur are numerous. You have to be running with O3, execute a branch instruction, predict the branch as taken, that prediction has to have an entry in the BTB, the BTB entry has to miss in the TLB, a table walk has to occur to satisfy the miss, and MISCREG_CONTEXIDR has to be written while all this happens. At boot this is never going to happen because the context isn't going to change. I'm inclined to leave it as is. That's fair. Go ahead then. - Gabe --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/#review1037 --- On 2011-03-30 09:05:28, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/617/ --- (Updated 2011-03-30 09:05:28) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. 
Summary --- ARM: Fix table walk going on while ASID changes error Diffs - src/arch/arm/faults.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/table_walker.cc d54b7775a6b0 src/arch/arm/tlb.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/617/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Cleanup implementation of ITSTATE and put important code in PCState.
On 2011-03-30 09:53:54, Gabe Black wrote: src/arch/arm/types.hh, line 350 http://reviews.m5sim.org/r/616/diff/1/?file=11348#file11348line350 You could add new fields to the ITSTATE bitunion that would make this easier. cond and mask could be SubBitUnions which can be treated as values on their own or have internal bitfields (syntactically, they still have access to everything). This would then look more like: it.cond.bottom = it.mask.top; it.mask = it.mask << 1; if (it.mask == 0) it.cond = 0; You could use _itstate directly as well which would save a few more lines. Ali Saidi wrote: I'll see about adding this in another change... I want to get this known working code committed. It was very painful. No problem. - Gabe --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/#review1038 --- On 2011-03-30 09:05:10, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/616/ --- (Updated 2011-03-30 09:05:10) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Cleanup implementation of ITSTATE and put important code in PCState. Consolidate all code to handle ITSTATE in the PCState object rather than touching a variety of structures/objects. 
Diffs - src/arch/alpha/predecoder.hh d54b7775a6b0 src/arch/arm/faults.cc d54b7775a6b0 src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/isa/insts/data.isa d54b7775a6b0 src/arch/arm/isa/insts/macromem.isa d54b7775a6b0 src/arch/arm/isa/insts/misc.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 src/arch/arm/isa/templates/macromem.isa d54b7775a6b0 src/arch/arm/isa/templates/mem.isa d54b7775a6b0 src/arch/arm/isa/templates/misc.isa d54b7775a6b0 src/arch/arm/isa/templates/neon.isa d54b7775a6b0 src/arch/arm/isa/templates/pred.isa d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/predecoder.hh d54b7775a6b0 src/arch/arm/predecoder.cc d54b7775a6b0 src/arch/arm/types.hh d54b7775a6b0 src/arch/mips/predecoder.hh d54b7775a6b0 src/arch/power/predecoder.hh d54b7775a6b0 src/arch/sparc/predecoder.hh d54b7775a6b0 src/arch/x86/predecoder.hh d54b7775a6b0 src/cpu/o3/fetch_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/616/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] Ruby Optimization Opportunity?
Hi all, I had noticed that Ruby was running a little slower than the old M5 memory system and decided to run gprof on it to see if there was anything obvious holding things up. For 2-, 4-, and 8-core ALPHA_FS_MOESI_CMP_directory SimpleCPU runs of the Fft benchmark, MemoryControl::executeCycle contributes nearly 30% of the runtime (26% to 29% in the profiles below). Looking at the comments for that code, I see this:

// executeCycle: This function is called once per memory clock cycle

I'm not familiar with this Memory Controller code, but it would seem that some type of optimization that does not require this to run every memory cycle would speed things up a good bit. So if someone has the time or the need to do some Ruby optimization work (I know Nilay did some previously), then I think this would be a good place to start. Some of the gprof output is below:

= 2 core =
time (%)  name
29.17     MemoryControl::executeCycle()
 4.19     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.52     PerfectSwitch::wakeup()
 3.47     Set::Set(Set const)
 3.46     RubyEventQueueNode::process()

= 4 core =
time (%)  name
27.49     MemoryControl::executeCycle()
 4.01     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.66     PerfectSwitch::wakeup()
 3.59     Set::Set(Set const)
 3.50     RubyEventQueueNode::process()

= 8 core =
time (%)  name
26.09     MemoryControl::executeCycle()
 4.12     Set::Set(Set const)
 3.91     PerfectSwitch::wakeup()
 3.88     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.41     RubyEventQueueNode::process()

-- - Korey ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
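One possible shape for the optimization suggested above is to run the controller only on cycles where it has queued work, instead of once per memory clock. The sketch below is a toy model under that assumption. `ToyMemoryControl` and `simulate` are invented names and do not correspond to Ruby's MemoryControl interface, and a real controller would also have to account for bank timing and refresh.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using Cycle = std::uint64_t;

// Toy stand-in for a memory controller; purely illustrative.
class ToyMemoryControl
{
  public:
    unsigned calls = 0;  // number of executeCycle() invocations

    // Queues a request. Returns true if the controller was idle, meaning
    // the caller has to schedule a wakeup for it.
    bool enqueue(int request)
    {
        const bool wasIdle = pending.empty();
        pending.push_back(request);
        return wasIdle;
    }

    // Services at most one request. Returns true if more work remains,
    // i.e. another wakeup is needed on the next cycle.
    bool executeCycle()
    {
        ++calls;
        if (!pending.empty())
            pending.pop_front();
        return !pending.empty();
    }

  private:
    std::deque<int> pending;
};

// Drives the toy for `cycles` cycles with one request arriving every
// `period` cycles and returns how often executeCycle() ran. The loop
// stands in for an event queue; a real one would skip idle cycles.
inline unsigned simulate(Cycle cycles, Cycle period, bool pollEveryCycle)
{
    assert(period > 0);
    ToyMemoryControl mc;
    bool wakeupScheduled = false;
    for (Cycle now = 0; now < cycles; ++now) {
        if (pollEveryCycle || wakeupScheduled)
            wakeupScheduled = mc.executeCycle();
        if (now % period == 0 && mc.enqueue(static_cast<int>(now)))
            wakeupScheduled = true;
    }
    return mc.calls;
}
```

In this toy, 1000 cycles with a request every 100 cycles cost 1000 calls when polling and 10 calls when the controller is only woken on demand. Whether the real controller can be restructured this way depends on state that this sketch leaves out.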
[m5-dev] Review Request: ARM: Fix m5op parameters bug.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/618/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix m5op parameters bug. All the m5op parameters are 64 bits, but we were only sending 32 bits; and the static register indexes were incorrectly specified. Diffs - src/arch/arm/isa/insts/m5ops.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/618/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
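To illustrate the kind of bug the summary describes, the sketch below contrasts passing only 32 bits of a 64-bit parameter with reassembling it from two 32-bit halves. The low-half-first register pairing is an assumption made for the example, and the function names are invented; the actual operand definitions live in the .isa files listed above.

```cpp
#include <cassert>
#include <cstdint>

// Rebuilds a 64-bit parameter from two 32-bit register values
// (assumed layout: `lo` holds bits 31:0, `hi` holds bits 63:32).
inline std::uint64_t joinHalves(std::uint32_t lo, std::uint32_t hi)
{
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// What the receiver sees if only one 32-bit register is passed along:
// the upper half of the parameter is silently dropped.
inline std::uint64_t lowHalfOnly(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}
```

For a parameter such as 0x1DEADBEEF, the truncated path yields 0xDEADBEEF, while joining both halves recovers the full value.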
[m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix m5op parameters bug.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/618/#review1047 --- LGTM - Gabe On 2011-03-30 14:53:40, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/618/ --- (Updated 2011-03-30 14:53:40) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix m5op parameters bug. All the m5op parameters are 64 bits, but we were only sending 32 bits; and the static register indexes were incorrectly specified. Diffs - src/arch/arm/isa/insts/m5ops.isa d54b7775a6b0 src/arch/arm/isa/operands.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/618/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix bug in MicroLdrNeon templates for initiateAcc().
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/619/#review1048 --- Looks mostly good. src/arch/arm/isa/templates/mem.isa http://reviews.m5sim.org/r/619/#comment1419 There's an extra blank line here. - Gabe On 2011-03-30 14:53:58, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/619/ --- (Updated 2011-03-30 14:53:58) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix bug in MicroLdrNeon templates for initiateAcc(). Diffs - src/arch/arm/isa/templates/mem.isa d54b7775a6b0 Diff: http://reviews.m5sim.org/r/619/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/#review1049 --- I'm not sure this is right yet. Won't it only copy the USR registers now and leave out all the other modes? Also, is there anything wrong with reading the CPSR, changing the mode, and then writing it back? src/arch/arm/isa.cc http://reviews.m5sim.org/r/620/#comment1420 Random blank line. - Gabe On 2011-03-30 14:55:05, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- (Updated 2011-03-30 14:55:05) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
On 2011-03-30 15:38:14, Gabe Black wrote: I'm not sure this is right yet. Won't it only copy the USR registers now and leave out all the other modes? Also, is there anything wrong with reading the CPSR, changing the mode, and then writing it back? No, NumIntRegs is all the registers in the system, not just the user ones. However, if some other register mapping is in effect, the mapping hides the user registers so they can't be accessed. Starting in user mode solves the problem. As far as the CPSR goes, I only want the updateRegMap() functionality, so the no effect versions can't be used and the effect version can change other things (e.g. if you were in thumb mode and the cpsr mode was written, the pcstate would be updated). This seems much cleaner. - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/#review1049 --- On 2011-03-30 14:55:05, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- (Updated 2011-03-30 14:55:05) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. 
Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
On 2011-03-30 15:38:14, Gabe Black wrote: I'm not sure this is right yet. Won't it only copy the USR registers now and leave out all the other modes? Also, is there anything wrong with reading the CPSR, changing the mode, and then writing it back? Ali Saidi wrote: No, NumIntRegs is all the registers in the system, not just the user ones. However, if some other register mapping is in effect, the mapping hides the user registers so they can't be accessed. Starting in user mode solves the problem. As far as the CPSR goes, I only want the updateRegMap() functionality, so the no effect versions can't be used and the effect version can change other things (E.g. if you were in thumb mode and cpsr mode was written you the pcstate would be updated). This seems much cleaner. Ah, ok, so everything that isn't put through the map is flattened to the same thing. That seems a little fragile since if flattening changes it'll break, and it relies on USR mode being identically mapped. I'm not sure exactly what to do about it, though. - Gabe --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/#review1049 --- On 2011-03-30 14:55:05, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- (Updated 2011-03-30 14:55:05) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. 
Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
On 2011-03-30 15:38:14, Gabe Black wrote: I'm not sure this is right yet. Won't it only copy the USR registers now and leave out all the other modes? Also, is there anything wrong with reading the CPSR, changing the mode, and then writing it back? Ali Saidi wrote: No, NumIntRegs is all the registers in the system, not just the user ones. However, if some other register mapping is in effect, the mapping hides the user registers so they can't be accessed. Starting in user mode solves the problem. As far as the CPSR goes, I only want the updateRegMap() functionality, so the no effect versions can't be used and the effect version can change other things (E.g. if you were in thumb mode and cpsr mode was written you the pcstate would be updated). This seems much cleaner. Gabe Black wrote: Ah, ok, so everything that isn't put through the map is flattened to the same thing. That seems a little fragile since if flattening changes it'll break, and it relies on USR mode being identically mapped. I'm not sure exactly what to do about it, though. I can't come up with a better solution, but you wrote the initial code that operated that way ;). - Ali --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/#review1049 --- On 2011-03-30 14:55:05, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- (Updated 2011-03-30 14:55:05) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. 
The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works.
On 2011-03-30 15:38:14, Gabe Black wrote: I'm not sure this is right yet. Won't it only copy the USR registers now and leave out all the other modes? Also, is there anything wrong with reading the CPSR, changing the mode, and then writing it back? Ali Saidi wrote: No, NumIntRegs is all the registers in the system, not just the user ones. However, if some other register mapping is in effect, the mapping hides the user registers so they can't be accessed. Starting in user mode solves the problem. As far as the CPSR goes, I only want the updateRegMap() functionality, so the no effect versions can't be used and the effect version can change other things (E.g. if you were in thumb mode and cpsr mode was written you the pcstate would be updated). This seems much cleaner. Gabe Black wrote: Ah, ok, so everything that isn't put through the map is flattened to the same thing. That seems a little fragile since if flattening changes it'll break, and it relies on USR mode being identically mapped. I'm not sure exactly what to do about it, though. Ali Saidi wrote: I can't come up with a better solution, but you wrote the initial code that operated that way ;). Honestly I probably copied and pasted that or left it from the original implementation, and since I usually don't pay much attention to the checkpointing/CPU switching stuff just left it that way while I got the other parts working. So you could say it's my fault but not my design. - Gabe --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/#review1049 --- On 2011-03-30 14:55:05, Ali Saidi wrote: --- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/620/ --- (Updated 2011-03-30 14:55:05) Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and Nathan Binkert. Summary --- ARM: Fix checkpoint restoration into O3 CPU and the way O3 switchCpu works. 
This change fixes a small bug in the arm copyRegs() code where some registers wouldn't be copied if the processor was in a mode other than MODE_USER. Additionally, this change simplifies the way the O3 switchCpu code works by utilizing TheISA::copyRegs() to copy the required context information rather than the adhoc copying that goes on in the CPU model. The current code makes assumptions about the visibility of int and float registers that aren't true for all architectures in FS mode. Diffs - src/arch/arm/isa.cc d54b7775a6b0 src/arch/arm/miscregs.hh d54b7775a6b0 src/arch/arm/utility.cc d54b7775a6b0 src/cpu/o3/thread_context_impl.hh d54b7775a6b0 Diff: http://reviews.m5sim.org/r/620/diff Testing --- Thanks, Ali ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: Ruby: Add support for functional accesses
--- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/611/ --- (Updated 2011-03-30 16:19:26.551926) Review request for Default. Summary --- Ruby: Add support for functional accesses This patch is meant for aiding discussions on implementation of functional access support in Ruby. Diffs (updated) - configs/ruby/MESI_CMP_directory.py d54b7775a6b0 configs/ruby/Ruby.py d54b7775a6b0 src/mem/ruby/network/Network.cc d54b7775a6b0 src/mem/ruby/network/Network.py d54b7775a6b0 src/mem/ruby/profiler/Profiler.cc d54b7775a6b0 src/mem/ruby/profiler/Profiler.py d54b7775a6b0 src/mem/ruby/recorder/Tracer.cc d54b7775a6b0 src/mem/ruby/recorder/Tracer.py d54b7775a6b0 src/mem/ruby/system/AbstractMemory.hh PRE-CREATION src/mem/ruby/system/AbstractMemory.cc PRE-CREATION src/mem/ruby/system/Cache.py d54b7775a6b0 src/mem/ruby/system/CacheMemory.hh d54b7775a6b0 src/mem/ruby/system/CacheMemory.cc d54b7775a6b0 src/mem/ruby/system/DirectoryMemory.hh d54b7775a6b0 src/mem/ruby/system/DirectoryMemory.cc d54b7775a6b0 src/mem/ruby/system/DirectoryMemory.py d54b7775a6b0 src/mem/ruby/system/RubyPort.hh d54b7775a6b0 src/mem/ruby/system/RubyPort.cc d54b7775a6b0 src/mem/ruby/system/RubySystem.py d54b7775a6b0 src/mem/ruby/system/SConscript d54b7775a6b0 src/mem/ruby/system/Sequencer.py d54b7775a6b0 src/mem/ruby/system/System.hh d54b7775a6b0 src/mem/ruby/system/System.cc d54b7775a6b0 Diff: http://reviews.m5sim.org/r/611/diff Testing --- Thanks, Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: Ruby: Add support for functional accesses
On Tue, 29 Mar 2011, Nilay Vaish wrote: Brad, I have posted on the review board my current implementation for supporting functional accesses in Ruby. This is untested and is mainly meant for furthering the discussions. I have some questions for you -- 1. How do we inform the other end of RubyPort's M5 Port about whether or not functional access was successful? 2. What's the role of directory memory in functional accesses? 3. If none of the caches have the block pertaining to the address of the access, then read accesses should be satisfied from the physical memory. Write accesses should always go to physical memory as well. How can physical memory be accessed from RubyPort? -- Nilay Brad, I have made some changes to the patch. I have updated it on the review board. I have added a call to sendFunctional() so as to send the response. I have also added call to sendFunctional() on the physical memory port of ruby port, so that the physical memory would also get updated. You had mentioned that we would unhook M5 memory and use Ruby to supply the data. How do we do this? And the second question from the previous mail still remains unanswered. Thanks Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
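The read/write rule described in question 3 above can be sketched as a toy model. This is only an illustration, not the patch under review: the class name, the use of plain dicts for caches and physical memory, and the "first cache that holds the block wins" policy are all assumptions made for the example. A real implementation would have to consider the coherence state of each copy.

```python
class FunctionalAccessModel:
    """Toy model (hypothetical): each cache is a dict of address -> data."""

    def __init__(self, caches, memory):
        self.caches = caches    # list of dicts, one per cache
        self.memory = memory    # dict standing in for physical memory

    def functional_read(self, addr):
        # Prefer a cached copy; fall back to physical memory if no cache has it.
        for cache in self.caches:
            if addr in cache:
                return cache[addr]
        return self.memory.get(addr)

    def functional_write(self, addr, data):
        # Update every cached copy, and always update physical memory too.
        for cache in self.caches:
            if addr in cache:
                cache[addr] = data
        self.memory[addr] = data
```

The sketch returns None when neither a cache nor memory holds the address, which is one possible way to report an unsuccessful functional access back to the caller (question 1 above); the thread itself leaves that open.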
Re: [m5-dev] Running Ruby w/32 Cores
I think you forgot the attachments :P. Sometimes, if ProtocolTrace isn't enough for me to find a problem, I turn on RubySlicc and RubyGenerated as well. RubySlicc is the DPRINTFs within the actual protocol *.sm files, and RubyGenerated are inside of the generated code that you will only see in the build directory. Lisa On Tue, Mar 29, 2011 at 10:15 AM, Korey Sewell ksew...@umich.edu wrote: Thanks for the response Brad. The 1st trace has 1 L2 and the 2nd has 1 L2 (i had a typo in the original email). For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace). Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it. My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise. In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that. Thanks! -Korey On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Korey, I believe both of these issues should be easy to solve once we have a protocol trace leading up to the error. If you could create such a trace and send it to the list, that would be great. Just zero in on the offending address. Thanks, Brad -Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell Sent: Tuesday, March 29, 2011 8:11 AM To: M5 Developer List Subject: [m5-dev] Running Ruby w/32 Cores Hi All, I'm still having a bit of trouble running Ruby with 32+ cores. I am experimenting w/configs varying the l2-caches. The runs seems to generate various errors in the SLICC. Has anybody seen these or have any insight to how to start solving these type of issues (posted below)? 
The command line and errors are as follows:

(1) 32 Cores and 32 L2s
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0. Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279: assert failure, PID: 5990
press return to continue.
Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM @ cycle 174537500 [doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc, line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2

Please let me know if you do...Thanks!
-- - Korey
___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Ruby Store/Coalescing Buffer Implementation for TimingSimpleCPU
Malek, TimingSimpleCPUs are in-order CPU models and only do one instruction at a time, so coalescing at the CPU won't make any sense, since there will be nothing to coalesce. Unless you want to do your coalescing further down the memory hierarchy where you might have multiple accesses from different CPUs meeting at a shared cache (which sounds like it's not the situation you are targeting), then you'll probably have to use the O3 CPU. However, I'm not totally sure what the state of using Ruby with O3 is right now. Can anyone speak to that? Lisa

On Sun, Mar 27, 2011 at 6:19 PM, Malek Musleh malek.mus...@gmail.com wrote:

Hello, I am interested in implementing a storebuffer (coalescing buffer) for Ruby's Memory Model in M5/GEM5 for use in my current research. I wish to be able to coalesce speculative stores + non-speculative stores to the same cache line and then flush them to the cache during certain acquire/release constructs. I see that there was an existing directory called storebuffer, but it was removed not too long ago. Reading the associated thread on the mailing list, it seems that it was removed because it is not in use (given that O3 is not yet functional with Ruby), nor was it ever actually used in the original GEM implementation. Here is the link to that thread: http://www.mail-archive.com/m5-dev@m5sim.org/msg10575.html

In further reading of that thread, I see that there is/was general consensus that the Ruby Store Buffer will be merged with M5 O3's LSQ. For my research, O3 CPU Model is not a requirement, although storebuffers tend to be used typically only in O3 execution. For what I need to do, my specific question is as follows: A) Would it be better/easier to implement a new Buffer (similar to the MessageBuffer class) from the Ruby Side or B) actually reuse M5's existing O3's LSQ buffer in the Timing CPU Model.
I think that A) might be the easier method to go for the following reasons:
1) It seems that the Sequencer class already has functionality to support coalescing stores to the same cache line (in reading the previous storebuffer thread)
2) This would make the coalescing buffer CPU Model independent
3) Avoid having to change the Timing CPU Code which may make it more likely to mess up how the CPU Model handles other memory related things (ISA-Dependent Memory references, split data requests, prefetching, etc).
4) Allows me to make it a Ruby Only change on the Ruby Code side of things as opposed to the M5 side of things.

However, my hesitation with this approach is because
1) the way the Sequencer operates, it is the interface between the CPU Core and the Ruby Memory Model (converting M5 requests to Ruby Requests and what not), so 'logically' I guess it might make more sense to implement the store buffer before Ruby sees the store requests, and just have the sequencer do its thing with the coalescing?
2) The conclusion of the previous storebuffer thread was that work is currently?/will be done implementing the store buffer on the M5 side of things.

If I go with Approach A), I know I would have to change which message buffer L1 uses to communicate with L2, such that instead of sending stores through the L2 Request Buffer, I would send it as follows:
L1 -> Coalescing Buffer -> L2 Request Network Buffer -> L2
instead of
L1 -> L2 Request Network Buffer -> L2
But I am not sure how exactly I would go about this if I want to add this coalescing buffer to sit between the CPU Core and L1 as well? Could those familiar with Ruby comment on my thoughts/offer suggestions? Thanks Malek
___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
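For readers following the thread, the coalescing behaviour Malek describes (merge stores to the same cache line, then flush at an acquire/release point) can be sketched in a few lines. This is a standalone illustration, not Ruby code: the class name, the 64-byte line size, and the byte-map representation are assumptions chosen for the example, and it does not model speculation or the MessageBuffer interface.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


class CoalescingBuffer:
    """Hypothetical store buffer that merges stores per cache line."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.lines = {}  # line address -> {byte offset within line: value}

    def store(self, addr, data):
        # Merge each written byte into the entry for its cache line;
        # a later store to the same byte overwrites the earlier one.
        for i, byte in enumerate(data):
            byte_addr = addr + i
            line_addr = byte_addr - (byte_addr % self.line_size)
            self.lines.setdefault(line_addr, {})[byte_addr - line_addr] = byte

    def flush(self):
        # Called at an acquire/release point: drain one merged write per line.
        drained = sorted(self.lines.items())
        self.lines = {}
        return drained
```

Two stores to addresses 0x100 and 0x102 end up in a single entry for line 0x100, so only one write per line is sent onward when flush() is called.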
Re: [m5-dev] Ruby Optimization Opportunity?
Korey, I do not have the FftBase32 benchmark. Is it possible for you to run the simulation with one of the following benchmarks -- IScsiInitiator, IScsiTarget, MutexTest, NetperfMaerts, NetperfStream, NetperfStreamNT, NetperfStreamUdp, NetperfUdpLocal, Nfs, NfsTcp, Nhfsstone, Ping, PovrayAutumn, PovrayBench, SurgeSpecweb, SurgeStandard, ValAccDelay, ValAccDelay2, ValCtxLat, ValMemLat, ValMemLat2MB, ValMemLat8MB, ValStream, ValStreamCopy, ValStreamScale, ValSysLat, ValTlbLat, Validation, bnAn Which of these do you think would closely resemble FFT? Nilay

On Wed, 30 Mar 2011, Korey Sewell wrote:

Hi all, I had noticed that Ruby was running a little slower than the old M5 memory system and decided to run gprof on it to see if there was anything obvious holding things up. For 2, 4, and 8 core ALPHA_FS_MOESI_CMP_directory, SimpleCPU runs for the Fft benchmark, it seems that MemoryControl::executeCycle contributes nearly 30% of the runtime. Looking at the comments for that code, I see this:

// executeCycle: This function is called once per memory clock cycle

I'm not familiar with this Memory Controller code, but it would seem that some type of optimization not requiring this to be run every memory cycle would speed things up a good bit. So if someone has the time or the need to do some Ruby optimization work (I know Nilay did some previously), then I think this will be a good place to start...
I post some of the gprof output below:

= 2 core =
time (%)  name
29.17     MemoryControl::executeCycle()
 4.19     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.52     PerfectSwitch::wakeup()
 3.47     Set::Set(Set const)
 3.46     RubyEventQueueNode::process()

= 4 core =
time (%)  name
27.49     MemoryControl::executeCycle()
 4.01     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.66     PerfectSwitch::wakeup()
 3.59     Set::Set(Set const)
 3.50     RubyEventQueueNode::process()

= 8 core =
time (%)  name
26.09     MemoryControl::executeCycle()
 4.12     Set::Set(Set const)
 3.91     PerfectSwitch::wakeup()
 3.88     RubyEventQueue::scheduleEventAbsolute(Consumer*, long long)
 3.41     RubyEventQueueNode::process()

-- - Korey
___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
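To make the suggested optimization concrete, here is a toy comparison between a controller that is invoked once per cycle and one that is only woken when a request arrives. It is a sketch under strong assumptions: the controller serves one request at a time with a fixed latency, arrival times are non-negative integer cycles, and the function names are invented for the example. It says nothing about how MemoryControl is actually structured.

```python
from collections import deque


def run_polled(arrivals, latency):
    """Invoke the controller every cycle, whether or not there is work."""
    pending = sorted(arrivals)
    queue = deque()
    done, calls, cycle, nxt, busy_until = [], 0, 0, 0, 0
    while len(done) < len(pending):
        calls += 1  # one call per simulated cycle
        while nxt < len(pending) and pending[nxt] <= cycle:
            queue.append(pending[nxt])
            nxt += 1
        if queue and cycle >= busy_until:
            queue.popleft()
            busy_until = cycle + latency
            done.append(busy_until)  # completion time of this request
        cycle += 1
    return done, calls


def run_event_driven(arrivals, latency):
    """Invoke the controller only when a request arrives."""
    done, calls, busy_until = [], 0, 0
    for arrival in sorted(arrivals):
        calls += 1  # one call per request
        start = max(arrival, busy_until)
        busy_until = start + latency
        done.append(busy_until)
    return done, calls
```

For arrivals at cycles 0, 1 and 100 with a latency of 10, both versions report the same completion times, but the polled version is called once for every cycle up to the last arrival while the event-driven one is called once per request.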
Re: [m5-dev] Ruby Question about MESI_CMP and implications of a Blocking Protocol
Hi Malek, I think the term blocking is confusing because it's really an overloaded term. There's a distinction between blocking CPUs and blocking memory systems. As you say below, a TimingCPU is a blocking CPU and only has one outstanding instruction going at a time. However, since GEMS is a coherence modeling infrastructure, obviously you take best advantage of it by running a CMP simulation or a multisocket simulation, where there are multiple CPUs. A blocking *protocol* is a protocol that essentially blocks on actions to an address that is in a transient state, e.g. in the process of writing back and not in a stable state like M/O/E/S/I. So, you could have a blocking protocol with a non-blocking CPU, for example, because they are talking about blocking totally different things (insts vs. addresses), e.g. the CPU has multiple memory operations outstanding, all being satisfied by the memory system, but once two CPUs try to do things to a *single* address that are in conflict, then operations on that address across the system become serialized. Even if you have blocking CPUs and a blocking protocol, you can still have multiple requests (up to N for N CPUs) flowing through Ruby, and the protocol will not block unless there is an issue with two of the CPUs accessing the same address in a conflicting way. I hope that helps and makes sense. Lisa

On Mon, Mar 21, 2011 at 1:21 PM, Malek Musleh malek.mus...@gmail.com wrote:

Hi, 1) I was wondering if the MESI_CMP protocol is currently implemented as a 'blocking' protocol, similar to how the MOESI_CMP version is. I see that on this link on the GEMS page that it indicates that the MOESI_CMP one is blocking, but doesn't indicate anything about the MESI_CMP version.
http://www.cs.wisc.edu/gems/doc/gems-wiki/moin.cgi/Protocols?action=fullsearch&context=180&value=blocking&titlesearch=Titles

In the MOESI_CMP Version there are coherenceResponseType Messages such as 'Unblock' / 'Exclusive_Unblock' which seem to enforce the blocking aspect, and I also see these types in the MESI version, but just implementing them does not necessarily enforce blocking in all possible situations.

2) Even if the MESI version is non-blocking, because Ruby only currently works with the Timing CPU Model, only one request can be issued to the memory model at a time anyway (and I believe the CPU stalls anyway until that request COMPLETES), but generally is it possible to have multiple requests outstanding/in progress in the Ruby memory model even though Timing CPU is Blocking? For example, can Core 0 do a STORE X to L1 as L1 does a writeback of Data: Y to L2? I suspect no, that in a blocking Cache Coherence Protocol I cannot do that, but just wanted to confirm. Malek
___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
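The distinction in Lisa's reply, that a blocking protocol serializes requests per address rather than per CPU, can be illustrated with a small model. This is a hypothetical sketch, not SLICC or Ruby code: the class and method names are invented, and "transient" here simply means a transaction on that address has started and has not yet been unblocked.

```python
from collections import defaultdict, deque


class BlockingDirectory:
    """Hypothetical directory that blocks per address, not per CPU."""

    def __init__(self):
        self.transient = set()             # addresses with a transaction in flight
        self.stalled = defaultdict(deque)  # address -> CPUs waiting on it

    def request(self, cpu, addr):
        # A request to an address in a transient state must wait.
        if addr in self.transient:
            self.stalled[addr].append(cpu)
            return False
        self.transient.add(addr)
        return True

    def unblock(self, addr):
        # The in-flight transaction finished; admit the next waiter, if any.
        self.transient.discard(addr)
        if self.stalled[addr]:
            cpu = self.stalled[addr].popleft()
            self.transient.add(addr)
            return cpu
        return None
```

In this model, requests from different CPUs to different addresses are all admitted at once, while a second request to an address that is still in flight waits until unblock() is called for that address.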
Re: [m5-dev] Review Request: Ruby: Add support for functional accesses
Hi Nilay, Thanks for posting a new patch. I will review it as soon as I can...hopefully tonight. Brad -Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Nilay Vaish Sent: Wednesday, March 30, 2011 4:32 PM To: Default Subject: Re: [m5-dev] Review Request: Ruby: Add support for functional accesses On Tue, 29 Mar 2011, Nilay Vaish wrote: Brad, I have posted on the review board my current implementation for supporting functional accesses in Ruby. This is untested and is mainly meant for furthering the discussions. I have some questions for you -- 1. How do we inform the other end of RubyPort's M5 Port about whether or not functional access was successful? 2. What's the role of directory memory in functional accesses? 3. If none of the caches have the block pertaining to the address of the access, then read accesses should be satisfied from the physical memory. Write accesses should always go to physical memory as well. How can physical memory be accessed from RubyPort? -- Nilay Brad, I have made some changes to the patch. I have updated it on the review board. I have added a call to sendFunctional() so as to send the response. I have also added call to sendFunctional() on the physical memory port of ruby port, so that the physical memory would also get updated. You had mentioned that we would unhook M5 memory and use Ruby to supply the data. How do we do this? And the second question from the previous mail still remains unanswered. Thanks Nilay ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Running Ruby w/32 Cores
Hi Korey, For the first trace, it looks like the L2 cache is either miscounting the number of valid L1 copies, or there is an error with the ack arithmetic. We are going to need a bit more information to figure out where the exact problem is. Could you apply the attached patch and reply with the new protocol trace? Thanks. For the second trace, you should be able to get the offending address by simply attaching GDB to the aborted process. Without knowing which address to zero in on, it is the proverbial finding a needle in a haystack. Thanks, Brad -Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell Sent: Tuesday, March 29, 2011 10:15 AM To: M5 Developer List Subject: Re: [m5-dev] Running Ruby w/32 Cores Thanks for the response Brad. The 1st trace has 1 L2 and the 2nd has 1 L2 (i had a typo in the original email). For each trace, I attach the stdout/stderr (*.out) and then the protocol trace (*.prottrace). Also, in the 1st trace, the offending address is clear and I isolate that in the protocol trace file provided. However, in the 2nd trace, it's unclear (currently) which access caused it to fail so I took the whole protocol trace file and gzip'd it. My current lack of expertise in SLICC limits me a bit, but I'd like to be more helpful in debugging so if there is anything that I can look into (or run) on my end to expedite the process, please advise. In the interim, I'll try to locate the exact address that's breaking trace 2 and then hopefully repost that. Thanks! -Korey On Tue, Mar 29, 2011 at 12:02 PM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Korey, I believe both of these issues should be easy to solve once we have a protocol trace leading up to the error. If you could create such a trace and send it to the list, that would be great. Just zero in on the offending address. 
Thanks, Brad

-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell
Sent: Tuesday, March 29, 2011 8:11 AM
To: M5 Developer List
Subject: [m5-dev] Running Ruby w/32 Cores

Hi All, I'm still having a bit of trouble running Ruby with 32+ cores. I am experimenting w/configs varying the l2-caches. The runs seems to generate various errors in the SLICC. Has anybody seen these or have any insight to how to start solving these type of issues (posted below)?

The command line and errors are as follows:

(1) 32 Cores and 32 L2s
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
info: Entering event queue @ 0. Starting simulation...
Runtime Error at MOESI_CMP_directory-dir.sm:155, Ruby Time: 38279: assert failure, PID: 5990
press return to continue.
Program aborted at cycle 19139500
Aborted

(2) 32 Cores and 1 L2
build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b FftBase32 -n 32 --num-dirs=32 --num-l2caches=32
...
fatal: Invalid transition system.l1_cntrl0 time: 349075 addr: [0x16180, line 0x16180] event: Ack state: MM @ cycle 174537500 [doTransitionWorker:build/ALPHA_FS_MOESI_CMP_directory/mem/protocol/L1Cache_Transitions.cc, line 477]
Memory Usage: 2316756 KBytes
For more information see: http://www.m5sim.org/fatal/23f196b2

Please let me know if you do...Thanks!
-- - Korey
___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev