[m5-dev] Cron m5t...@zizzer /z/m5/regression/do-regression quick
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/ARM_SE/tests/fast/quick/00.hello/arm/linux/simple-atomic passed.
* build/X86_SE/tests/fast/quick/00.hello/x86/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/o3-timing-mp passed.
Re: [m5-dev] X86 FS regression
I see that the bridge and cache are in parallel like you're describing. The culprit seems to be this line:

configs/example/fs.py: test_sys.bridge.filter_ranges_a=[AddrRange(0, Addr.max)]

where the bridge is being told explicitly not to let anything through from the IO side to the memory side. That should be fairly straightforward to poke a hole in for the necessary ranges.

The corresponding line for the other direction (below) brings up another question. What happens if the bridge doesn't disallow something from going across and something else wants to respond to that address? The bridge isn't set to ignore APIC messages implementing IPIs between CPUs, but those seem to be going between CPUs and not out into the IO system. Are we just getting lucky? The same thing would seem to apply to any other memory-side object that isn't in the address range 0-mem_size.

configs/example/fs.py: test_sys.bridge.filter_ranges_b=[AddrRange(mem_size)]

Gabe

Steve Reinhardt wrote:
I believe the I/O cache is normally paired with a bridge that lets things flow in the other direction. It's really just designed to handle accesses to cacheable space from devices on the I/O bus without requiring each device to have a cache. It's possible we've never had a situation before where I/O devices issue accesses to uncacheable non-memory locations on the CPU side of the I/O cache, in which case I would not be terribly surprised if that didn't quite work.

Steve

On Mon, Nov 22, 2010 at 11:59 AM, Gabe Black gbl...@eecs.umich.edu wrote:
The cache claims to support all addresses on the CPU side (or so say the comments), but no addresses on the memory side. Messages going from the IO interrupt controller get to the IO bus but then don't know where to go, since the IO cache hides the fact that the CPU interrupt controller wants to receive messages on that address range. I also don't know whether the cache can handle messages passing through that originate from the memory side, but I didn't look into that.

Gabe

Ali Saidi wrote:
Something has to maintain I/O coherency, and that something looks a whole lot like a couple-line cache. Why is having a cache there any issue? They should pass right through the cache.

Ali

On Nov 22, 2010, at 4:42 AM, Gabe Black wrote:
Hmm. It looks like this IO cache is only added when there are caches in the system (a fix for some coherency something? I sort of remember that discussion.), and that it wouldn't propagate to the IO bus the fact that the CPU's local APIC wanted to receive interrupt messages passed over the memory system. I don't know the intricacies of why the IO cache was necessary, or what problems passing requests back up through the cache might cause, but this is a serious issue for x86 and any other ISA that wants to move to a message-based interrupt scheme. I suppose the interrupt objects could be connected all the way out onto the IO bus itself, bypassing that cache, but I'm not sure how realistic that is.

Gabe Black wrote:
For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
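[Editor's sketch] Gabe's "poke a hole" idea could look roughly like the fragment below. This assumes filter_ranges_a lists the ranges the bridge *blocks* (as the fs.py line above implies), and the APIC base and size here are the conventional x86 local APIC MMIO values used purely for illustration; the actual fix would take the range from the x86 interrupt code, not hard-code it.

```python
# Hypothetical config fragment, not the committed fix. Assumes
# filter_ranges_a enumerates blocked ranges, so "poking a hole" means
# splitting the block-everything range around the interrupt window.
from m5.objects import AddrRange, Addr   # as imported by fs.py

apic_base = 0xFEE00000    # conventional x86 local APIC base (assumption)
apic_size = 0x100000      # placeholder window size (assumption)

# Block everything from the IO side except the interrupt range.
test_sys.bridge.filter_ranges_a = [
    AddrRange(0, Addr(apic_base - 1)),
    AddrRange(Addr(apic_base + apic_size), Addr.max),
]
```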
[m5-dev] changeset in m5: X86: Obey the PCD (cache disable) bit in the pa...
changeset 8e8fa2f28f2e in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=8e8fa2f28f2e
description:
    X86: Obey the PCD (cache disable) bit in the page tables.

diffstat:

 src/arch/x86/tlb.cc |  2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diffs (12 lines):

diff -r 6246338ac1e9 -r 8e8fa2f28f2e src/arch/x86/tlb.cc
--- a/src/arch/x86/tlb.cc	Mon Nov 22 05:49:03 2010 -0500
+++ b/src/arch/x86/tlb.cc	Tue Nov 23 06:10:17 2010 -0500
@@ -653,6 +653,8 @@
             Addr paddr = entry->paddr | (vaddr & (entry->size - 1));
             DPRINTF(TLB, "Translated %#x -> %#x.\n", vaddr, paddr);
             req->setPaddr(paddr);
+            if (entry->uncacheable)
+                req->setFlags(Request::UNCACHEABLE);
         } else {
             //Use the address which already has segmentation applied.
             DPRINTF(TLB, "Paging disabled.\n");
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] X86 FS regression
I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the page tables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons it's all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache, since they're more data-ey, that keeps the icache read-only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration is going on in the BaseCPU.py SimObject python file and not a configuration file, but I could be convinced there's a reason. Even if this isn't really a fix or the right thing to do, I'd still like to try it temporarily, at least to see if it corrects the problem I'm seeing.

Gabe

Ali Saidi wrote:
I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a TLB miss and it hasn't fallen over yet.

Ali

On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com wrote:
Yea, I just got around to reading this thread, and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read/write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s, but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't; this just isn't a configuration we test, so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester, but the requester won't recognize that and things will break.

Steve

On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu wrote:
What happens if an entry is in the L1 but not the L2?

Gabe

Ali Saidi wrote:
Between the L1 and L2 caches seems like a good place to me. The caches can cache page table entries; otherwise a TLB miss would be even more expensive than it is. The L1 isn't normally used for such things since it would get polluted (look at why SPARC has a "load 128 bits from L2, do not allocate into L1" instruction).

Ali

On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
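[Editor's sketch] The "bus below the CPU" arrangement Gabe proposes might be wired up roughly as follows in an fs.py-style config. The class and port names follow M5's Python configs of the era but are assumptions, not a committed change; in particular, whether a plain Bus can sit between a CPU and its L1 without further changes is exactly the open question in this thread.

```python
# Hypothetical config fragment: a small local bus in front of the L1
# dcache so the table walkers share it with the CPU's data port,
# instead of connecting below the L1s. Port names are assumptions.
from m5.objects import Bus

cpu.dcache_bus = Bus()                     # local bus above the L1 dcache
cpu.dcache_port = cpu.dcache_bus.port      # CPU data accesses go onto the bus
cpu.itb.walker.port = cpu.dcache_bus.port  # both walkers use the dcache path
cpu.dtb.walker.port = cpu.dcache_bus.port
dcache.cpu_side = cpu.dcache_bus.port      # downstream side feeds the L1 dcache
```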
[m5-dev] changeset in m5: X86: Loosen an assert for x86 and connect the A...
changeset 865e37d507c7 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=865e37d507c7
description:
    X86: Loosen an assert for x86 and connect the APIC ports when caches are used.

diffstat:

 src/cpu/BaseCPU.py |  4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diffs (21 lines):

diff -r 8e8fa2f28f2e -r 865e37d507c7 src/cpu/BaseCPU.py
--- a/src/cpu/BaseCPU.py	Tue Nov 23 06:10:17 2010 -0500
+++ b/src/cpu/BaseCPU.py	Tue Nov 23 06:11:50 2010 -0500
@@ -167,7 +167,7 @@
             exec('self.%s = bus.port' % p)

     def addPrivateSplitL1Caches(self, ic, dc):
-        assert(len(self._mem_ports) < 6)
+        assert(len(self._mem_ports) < 8)
         self.icache = ic
         self.dcache = dc
         self.icache_port = ic.cpu_side
@@ -176,6 +176,8 @@
         if buildEnv['FULL_SYSTEM']:
             if buildEnv['TARGET_ISA'] in ['x86', 'arm']:
                 self._mem_ports += [itb.walker.port, dtb.walker.port]
+            if buildEnv['TARGET_ISA'] == 'x86':
+                self._mem_ports += [interrupts.pio, interrupts.int_port]

     def addTwoLevelCacheHierarchy(self, ic, dc, l2c):
         self.addPrivateSplitL1Caches(ic, dc)
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.
---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/323/
---

Review request for Default.

Summary
---
Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

Diffs
---
  configs/example/fs.py (865e37d507c7)

Diff: http://reviews.m5sim.org/r/323/diff

Testing
---

Thanks,
Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.
This seems to get APIC messages back to the CPU, but I really don't know if it's the right way to do this. I have the feeling there are forces at work in this code that I don't fully appreciate.

Gabe

Gabe Black wrote:
This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/323/

Review request for Default. By Gabe Black.

Description: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

Diffs:
* configs/example/fs.py (865e37d507c7)

View Diff: http://reviews.m5sim.org/r/323/diff/
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] X86 FS regression
Where are you connecting the table walker? If it's between the L1 and L2, my guess is that it will work. If it's connected to the memory bus, then yes, memory is just responding without the help of a cache, and that could be the reason.

Ali

On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black gbl...@eecs.umich.edu wrote:
I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the page tables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons it's all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache, since they're more data-ey, that keeps the icache read-only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration is going on in the BaseCPU.py SimObject python file and not a configuration file, but I could be convinced there's a reason. Even if this isn't really a fix or the right thing to do, I'd still like to try it temporarily, at least to see if it corrects the problem I'm seeing.

Gabe

Ali Saidi wrote:
I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a TLB miss and it hasn't fallen over yet.

Ali

On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com wrote:
Yea, I just got around to reading this thread, and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read/write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s, but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't; this just isn't a configuration we test, so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester, but the requester won't recognize that and things will break.

Steve

On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu wrote:
What happens if an entry is in the L1 but not the L2?

Gabe

Ali Saidi wrote:
Between the L1 and L2 caches seems like a good place to me. The caches can cache page table entries; otherwise a TLB miss would be even more expensive than it is. The L1 isn't normally used for such things since it would get polluted (look at why SPARC has a "load 128 bits from L2, do not allocate into L1" instruction).

Ali

On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] X86 FS regression
IIRC, the filter works in conjunction with the address range autodetection stuff, so in order for a memory request to go across the bridge, the targeted address must lie on the other side *and* not be filtered out. I expect this explains why IPIs aren't going across.

Thinking about it, I'm not sure why the I/O cache doesn't let uncached accesses through from the I/O side to the memory side, assuming the target exists on the memory side. CPU caches certainly let uncached accesses through, and it's the same cache module in both cases. Hmm, looking at fs.py, I think this line may be as much of a culprit as the others:

test_sys.iocache = IOCache(addr_range=mem_size)

I believe the address range exclusions are necessary to avoid an infinite loop between the iocache and the bridge in the address range autodetection algorithm, but perhaps the ranges are set up a little too conservatively, so that uncacheable addresses have no way through. I don't think it matters whether you open up the range in the iocache or in the bridge to let them through, as long as (1) you only do one and not the other, and (2) it's selective enough that it doesn't include any PCI addresses that might result in a loop.

Steve

On Tue, Nov 23, 2010 at 12:17 AM, Gabe Black gbl...@eecs.umich.edu wrote:
I see that the bridge and cache are in parallel like you're describing. The culprit seems to be this line:

configs/example/fs.py: test_sys.bridge.filter_ranges_a=[AddrRange(0, Addr.max)]

where the bridge is being told explicitly not to let anything through from the IO side to the memory side. That should be fairly straightforward to poke a hole in for the necessary ranges. The corresponding line for the other direction (below) brings up another question. What happens if the bridge doesn't disallow something from going across and something else wants to respond to that address? The bridge isn't set to ignore APIC messages implementing IPIs between CPUs, but those seem to be going between CPUs and not out into the IO system. Are we just getting lucky? The same thing would seem to apply to any other memory-side object that isn't in the address range 0-mem_size.

configs/example/fs.py: test_sys.bridge.filter_ranges_b=[AddrRange(mem_size)]

Gabe

Steve Reinhardt wrote:
I believe the I/O cache is normally paired with a bridge that lets things flow in the other direction. It's really just designed to handle accesses to cacheable space from devices on the I/O bus without requiring each device to have a cache. It's possible we've never had a situation before where I/O devices issue accesses to uncacheable non-memory locations on the CPU side of the I/O cache, in which case I would not be terribly surprised if that didn't quite work.

Steve

On Mon, Nov 22, 2010 at 11:59 AM, Gabe Black gbl...@eecs.umich.edu wrote:
The cache claims to support all addresses on the CPU side (or so say the comments), but no addresses on the memory side. Messages going from the IO interrupt controller get to the IO bus but then don't know where to go, since the IO cache hides the fact that the CPU interrupt controller wants to receive messages on that address range. I also don't know whether the cache can handle messages passing through that originate from the memory side, but I didn't look into that.

Gabe

Ali Saidi wrote:
Something has to maintain I/O coherency, and that something looks a whole lot like a couple-line cache. Why is having a cache there any issue? They should pass right through the cache.

Ali

On Nov 22, 2010, at 4:42 AM, Gabe Black wrote:
Hmm. It looks like this IO cache is only added when there are caches in the system (a fix for some coherency something? I sort of remember that discussion.), and that it wouldn't propagate to the IO bus the fact that the CPU's local APIC wanted to receive interrupt messages passed over the memory system. I don't know the intricacies of why the IO cache was necessary, or what problems passing requests back up through the cache might cause, but this is a serious issue for x86 and any other ISA that wants to move to a message-based interrupt scheme. I suppose the interrupt objects could be connected all the way out onto the IO bus itself, bypassing that cache, but I'm not sure how realistic that is.

Gabe Black wrote:
For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to
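[Editor's sketch] Steve's forwarding rule in the message above — a request crosses the bridge only if the target lies on the far side (address range autodetection) *and* it isn't filtered out — can be modeled in a few lines of plain Python. This is an illustration of the logic only, not M5's actual Bridge code; the range representation and function names are invented.

```python
# Toy model of the bridge's forwarding decision. Ranges are half-open
# (start, end) tuples; M5's real AddrRange/Bridge code is more involved.

def in_ranges(addr, ranges):
    """True if addr falls inside any (start, end) range."""
    return any(start <= addr < end for start, end in ranges)

def bridge_forwards(addr, far_side_ranges, filter_ranges):
    """A request crosses only if a far-side device claims the address
    AND no filter range blocks it."""
    return in_ranges(addr, far_side_ranges) and not in_ranges(addr, filter_ranges)

mem_size = 0x8000000
far_side = [(0, mem_size)]      # memory side owns [0, mem_size)
block_all = [(0, 2**64)]        # like filter_ranges_a = [AddrRange(0, Addr.max)]

print(bridge_forwards(0x1000, far_side, block_all))  # blocked by the filter
print(bridge_forwards(0x1000, far_side, []))         # crosses once the hole is poked
```

This also shows why IPIs "get lucky": an address nobody on the far side claims fails the first condition regardless of the filter.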
Re: [m5-dev] X86 FS regression
The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. You're just statistically more likely to get away with it in the former case because the L1 is smaller.

Steve

On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi sa...@umich.edu wrote:
Where are you connecting the table walker? If it's between the L1 and L2, my guess is that it will work. If it's connected to the memory bus, then yes, memory is just responding without the help of a cache, and that could be the reason.

Ali

On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black gbl...@eecs.umich.edu wrote:
I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the page tables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons it's all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache, since they're more data-ey, that keeps the icache read-only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration is going on in the BaseCPU.py SimObject python file and not a configuration file, but I could be convinced there's a reason. Even if this isn't really a fix or the right thing to do, I'd still like to try it temporarily, at least to see if it corrects the problem I'm seeing.

Gabe

Ali Saidi wrote:
I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a TLB miss and it hasn't fallen over yet.

Ali

On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com wrote:
Yea, I just got around to reading this thread, and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read/write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s, but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't; this just isn't a configuration we test, so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester, but the requester won't recognize that and things will break.

Steve

On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu wrote:
What happens if an entry is in the L1 but not the L2?

Gabe

Ali Saidi wrote:
Between the L1 and L2 caches seems like a good place to me. The caches can cache page table entries; otherwise a TLB miss would be even more expensive than it is. The L1 isn't normally used for such things since it would get polluted (look at why SPARC has a "load 128 bits from L2, do not allocate into L1" instruction).

Ali

On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.

Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] X86 FS regression
I definitely agree that putting a bus between the CPU and L1 and plugging the table walker in there is the best way to figure out whether this is really the problem (and I expect it is). I'm not sure if it's the long-term right answer or not. We also need to consider how this works with Ruby.

Steve

On Tue, Nov 23, 2010 at 3:29 AM, Gabe Black gbl...@eecs.umich.edu wrote:

I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the page tables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons it's all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a potential issue there, I'm inclined to suspect the actual page table entry is still in the L1 and hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the connections that need to go to the L1? I'm assuming they'd all go into the dcache, since they're more data-ey, that keeps the icache read-only (ignoring SMC issues), and the dcache is probably servicing lower bandwidth normally. It also seems a little strange that this type of configuration goes on in the BaseCPU.py SimObject python file and not a configuration file, but I could be convinced there's a reason. Even if this isn't really a fix or the right thing to do, I'd still like to try it temporarily, at least to see if it corrects the problem I'm seeing.

Gabe

Ali Saidi wrote:

I haven't seen any strange behavior yet. That isn't to say it's not going to cause an issue in the future, but we've taken many a TLB miss and it hasn't fallen over yet.

Ali

On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com wrote:

Yea, I just got around to reading this thread, and that was the point I was going to make... the L1 cache effectively serves as a translator between the CPU's word-size read/write requests and the coherent block-level requests that get snooped. If you attach a CPU-like device (such as the table walker) directly to an L2, the CPU-like accesses that go to the L2 will get sent to the L1s, but I'm not sure they'll be handled correctly. Not that they fundamentally couldn't; this just isn't a configuration we test, so it's likely that there are problems... for example, the L1 may try to hand ownership to the requester, but the requester won't recognize that and things will break.

Steve

On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu wrote:

What happens if an entry is in the L1 but not the L2?

Gabe

Ali Saidi wrote:

Between the L1 and L2 caches seems like a good place to me. The caches can cache page table entries; otherwise a TLB miss would be even more expensive than it is. The L1 isn't normally used for such things since it would get polluted (look at why SPARC has a "load 128 bits from L2, do not allocate into L1" instruction).

Ali

On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:

For anybody waiting for an x86 FS regression (yes, I know, you can all hardly wait, but don't let this spoil your Thanksgiving): I'm getting closer to having it working, but I've discovered some issues with the mechanisms behind the --caches flag with fs.py and x86. I'm surprised I never thought to try it before. It also brings up some questions about where the table walkers should be hooked up in x86 and ARM. Currently it's after the L1, if any, but before the L2, if any, which seems wrong to me. Also, caches don't seem to propagate requests upwards to the CPUs, which may or may not be an issue. I'm still looking into that.

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
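A minimal sketch of the "bus below the CPU" configuration Gabe asks about. This is illustrative only: the object and port names (`dcache_bus`, `walker.port`, `dcache.cpu_side`) are assumptions, not the actual attributes in BaseCPU.py, and the real wiring would live in the m5 python config:

```python
# Hypothetical sketch: put a small bus between the CPU/table walker and
# the single L1 dcache, so walker requests enter the cache hierarchy
# through the L1's cpu-side port like any other CPU access.
# All names below are illustrative, not the real m5 API.
from m5.objects import Bus


def connect_walker_through_l1(cpu, walker):
    # Bus sitting below the CPU, above the L1 dcache.
    cpu.dcache_bus = Bus()
    # Both the CPU's data port and the walker's port feed the bus...
    cpu.dcache_port = cpu.dcache_bus.port
    walker.port = cpu.dcache_bus.port
    # ...and the bus drives the L1 dcache's cpu-side port, so the L1
    # still translates word-size requests into coherent block requests.
    cpu.dcache.cpu_side = cpu.dcache_bus.port
```

This keeps the icache read-only and routes walker traffic into the dcache, matching the layout Gabe proposes.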
Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.
My initial reaction is that even if this works, this can't possibly be the best way to do it... where do APIC messages live in the address space? How does 'Addr.max 4' let them through? Did you really think this change didn't need a comment? ;-)

On Tue, Nov 23, 2010 at 3:39 AM, Gabe Black gbl...@eecs.umich.edu wrote:

This seems to get APIC messages back to the CPU, but I really don't know if it's the right way to do this. I have the feeling there are forces at work in this code I don't fully appreciate.

Gabe

Gabe Black wrote:

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/323/

Review request for Default. By Gabe Black.

Description: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

Diffs

* configs/example/fs.py (865e37d507c7)

View Diff: http://reviews.m5sim.org/r/323/diff/
Re: [m5-dev] X86 FS regression
Does it? Shouldn't the l2 receive the request, ask for the block and end up snooping the l1s?

Ali

On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt wrote:

The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. You're just statistically more likely to get away with it in the former case because the L1 is smaller.

Steve

On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi wrote:

Where are you connecting the table walker? If it's between the l1 and l2 my guess is that it will work. If it is to the memory bus, yes, memory is just responding without the help of a cache and this could be the reason.

Ali

On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black wrote:

I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the pagetables is causing a page fault. ...
Re: [m5-dev] X86 FS regression
No, when the L2 receives a request it assumes the L1s above it have already been snooped, which is true since the request came in on the bus that the L1s snoop. The issue is that caches don't necessarily behave correctly when non-cache-block requests come in through their mem-side (snoop) port and not through their cpu-side (request) port. I'm guessing this could be made to work, I'd just be very surprised if it does right now, since the caches weren't designed to deal with this case and aren't tested this way.

Steve

On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi sa...@umich.edu wrote:

Does it? Shouldn't the l2 receive the request, ask for the block and end up snooping the l1s?

Ali

On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt ste...@gmail.com wrote:

The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. ...
Re: [m5-dev] X86 FS regression
And even though I do think it could be made to work, I'm not sure it would be easy or a good idea. There are a lot of corner cases to worry about, especially for writes, since you'd have to actually buffer the write data somewhere as opposed to just remembering that so-and-so has requested an exclusive copy. Actually, as I think about it, that might be the case that's breaking now... if the L1 has an exclusive copy and then it snoops a write (and not a read-exclusive), I'm guessing it will just invalidate its copy, losing the modifications. I wouldn't be terribly surprised if reads are working OK (the L1 should snoop those and respond if it's the owner), and of course it's all OK if the L1 doesn't have a copy of the block. So maybe there is a relatively easy way to make this work, but figuring out whether that's true and then testing it is still a non-trivial amount of effort.

Steve

On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt ste...@gmail.com wrote:

No, when the L2 receives a request it assumes the L1s above it have already been snooped, which is true since the request came in on the bus that the L1s snoop. ...
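The write hazard Steve describes can be illustrated with a toy model (plain Python, not m5 code; the class and states are mine). An L1 holding a Modified copy that naively invalidates on a snooped plain write drops its dirty data without a writeback, so the CPU's modification is lost:

```python
# Toy model of the snooped-write hazard: a cache in Modified state that
# simply invalidates on a snooped non-block write loses the dirty data,
# because memory was never updated with it.

class ToyL1:
    def __init__(self):
        self.state = "I"   # MOESI-style: M/O/E/S/I (only M and I used here)
        self.data = None

    def cpu_write(self, data):
        # A CPU store leaves the block dirty (Modified) in the L1.
        self.state, self.data = "M", data

    def snoop_plain_write(self, memory, addr, new_word):
        # Naive snoop response: invalidate without writing back.
        # The dirty block is dropped, so the CPU's store disappears.
        self.state, self.data = "I", None
        # The snooped write only deposits its own word in memory.
        memory[addr] = new_word


memory = {0x40: "stale"}
l1 = ToyL1()
l1.cpu_write("cpu-dirty-value")               # dirty in L1, memory still stale
l1.snoop_plain_write(memory, 0x40, "walker-word")
print(l1.state, memory[0x40])                 # → I walker-word
# "cpu-dirty-value" is now nowhere: neither in the L1 nor in memory.
```

A correct response would either write the dirty block back before invalidating or merge the snooped word into the owned copy, which is exactly the corner-case handling Steve says isn't tested.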
Re: [m5-dev] X86 FS regression
So what is the relatively good way to make this work in the short term? A bus? What about the slightly better version? I suppose a small cache might be ok and probably somewhat realistic.

Thanks,
Ali

On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt wrote:

And even though I do think it could be made to work, I'm not sure it would be easy or a good idea. There are a lot of corner cases to worry about, especially for writes, since you'd have to actually buffer the write data somewhere as opposed to just remembering that so-and-so has requested an exclusive copy. ...
Re: [m5-dev] X86 FS regression
I think the two easy (python-only) solutions are sharing the existing L1 via a bus and tacking on a small L1 to the walker. Which one is more realistic would depend on what you're trying to model.

Steve

On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi sa...@umich.edu wrote:

So what is the relatively good way to make this work in the short term? A bus? What about the slightly better version? I suppose a small cache might be ok and probably somewhat realistic. ...
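The second python-only option Steve mentions, giving the walker its own small cache, might be sketched like this (illustrative only: `BaseCache` parameters and the port names `cpu_side`/`mem_side` here are assumptions about the m5 config API, not the real attributes):

```python
# Hypothetical sketch: tack a small private cache onto the table walker,
# so its requests reach the bus below as proper coherent block requests
# instead of raw word-size accesses. Names/parameters are illustrative.
from m5.objects import BaseCache


def attach_walker_cache(walker, l2_bus):
    # A tiny L1-style cache dedicated to page-table walks.
    walker_cache = BaseCache(size='1kB', assoc=2, latency='1ns')
    walker.port = walker_cache.cpu_side
    # The walker cache snoops and is snooped like any other L1 on the bus.
    walker_cache.mem_side = l2_bus.port
    return walker_cache
```

Compared with sharing the L1 dcache through a bus, this avoids polluting the dcache with page-table entries (the concern behind SPARC's "load from L2, do not allocate into L1" instruction mentioned earlier in the thread).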
Re: [m5-dev] Implementation of findTagInSet
Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access; I don't know if that's just an API change or if that's something that needs to be folded into SLICCer. (BTW, what is the status of SLICCer? Is anyone working on it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by building a software cache into isTagPresent(): store the last address looked up along with a pointer to the block, then on each call check whether we're looking up the same address as last time, and if so return the same pointer without resorting to the hash table. I hope that doesn't lead to any coherence problems with the block changing out from under this cached copy... if so, perhaps an additional block check is required on hits.

Steve

On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

I was looking at the MOESI hammer protocol. I think Steve's observation that extra tag lookups are going on in the cache is correct. In particular, I noticed that in the getState() and setState() functions, first isTagPresent(address) is called, and on the basis of the result (which is true or false), getCacheEntry(address) is called. Surprisingly, the getCacheEntry() function calls isTagPresent() again. These calls are in the file src/mem/protocol/MOESI_hammer-cache.sm

Thanks
Nilay
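The one-entry "software cache" Steve suggests is straightforward to sketch. This is a Python illustration of the idea, not the actual C++ CacheMemory code; the class and method names are mine:

```python
# One-entry memoization of the last tag lookup, as suggested for
# isTagPresent(): repeated lookups of the same address (the pattern in
# getState()/setState()/getCacheEntry()) skip the hash table entirely.

class TagStore:
    def __init__(self):
        self.blocks = {}          # address -> block: the real hash table
        self._last_addr = None    # one-entry lookup memo
        self._last_block = None

    def lookup(self, addr):
        # Fast path: same address as the previous call.
        if addr == self._last_addr:
            return self._last_block
        # Slow path: consult the hash table and remember the result
        # (including misses, memoized as None).
        block = self.blocks.get(addr)
        self._last_addr, self._last_block = addr, block
        return block

    def invalidate(self, addr):
        # The memo must be cleared whenever the block changes out from
        # under it -- the coherence concern Steve raises.
        self.blocks.pop(addr, None)
        if addr == self._last_addr:
            self._last_addr = self._last_block = None
```

The `invalidate()` hook shows where the "additional block check on hits" worry comes in: any path that replaces or deallocates a block must also reset the memo, or the cached pointer goes stale.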
Re: [m5-dev] Implementation of findTagInSet
Brad and I will be having a discussion today on how to resolve this issue.

-- Nilay

On Tue, 23 Nov 2010, Steve Reinhardt wrote:

Thanks for tracking that down; that confirms my suspicions. I think the long-term answer is that the system needs to be reworked to avoid having to do multiple tag lookups for a single access ...
Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.
Looks good to me. I don't see the problem with the print function either. (Ali, it's for an IP, not the netmask.) I can nitpick (and this is bikeshedding), but I'm not sure that I agree that inheritance (as opposed to containment) from IPAddress is the right thing to do (on the C++ side).

Nate

On Mon, Nov 22, 2010 at 2:52 AM, Gabe Black gbl...@eecs.umich.edu wrote:

So are we all ok with this now?

Gabe

Gabe Black wrote:

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/316/

Review request for Default. By Gabe Black. Updated 2010-11-20 23:40:22.294085

Description (updated): Params: Add parameter types for IP addresses in various forms. The new parameter forms are:

* An IP address in the format a.b.c.d, where a-d are decimal values from 0 to 255.

* An IP address with netmask, which is an IP followed by /n, where n is a netmask length in bits from decimal 0 to 32, or by /e.f.g.h, where e-h are decimal values from 0 to 255 and the mask is all 1 bits followed by all 0 bits when represented in binary. These can also be specified as an integral IP and netmask passed in separately.

* An IP address with port, which is an IP followed by :p, where p is a port index from decimal 0 to 65535. These can also be specified as an integral IP and port value passed in separately.

Diffs (updated)

* src/base/inet.hh (bf5377d8f5c1)
* src/base/inet.cc (bf5377d8f5c1)
* src/python/m5/params.py (bf5377d8f5c1)
* src/python/m5/util/convert.py (bf5377d8f5c1)
* src/python/swig/inet.i (bf5377d8f5c1)

View Diff: http://reviews.m5sim.org/r/316/diff/
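The string forms in the description can be sketched as plain Python conversions. These helpers are illustrative, written from the description above; they are not the actual code in src/python/m5/util/convert.py:

```python
# Illustrative parsers for the three IP parameter forms described in the
# review request: "a.b.c.d", "a.b.c.d/n" or "a.b.c.d/e.f.g.h", and
# "a.b.c.d:p". Function names are mine, not the patch's.

def ip_to_int(s):
    parts = [int(x) for x in s.split('.')]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError('bad IP %r' % s)
    return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]

def netmask_len(mask_str):
    # Either a bit count "n" (0-32), or dotted form "e.f.g.h" which must
    # be all 1 bits followed by all 0 bits in binary.
    if '.' not in mask_str:
        n = int(mask_str)
        if not 0 <= n <= 32:
            raise ValueError('bad netmask length %r' % mask_str)
        return n
    mask = ip_to_int(mask_str)
    # A contiguous mask of length n equals ((1 << n) - 1) << (32 - n).
    for n in range(33):
        if mask == ((1 << n) - 1) << (32 - n) & 0xffffffff:
            return n
    raise ValueError('non-contiguous netmask %r' % mask_str)

def parse_ip_with_port(s):
    ip_str, port_str = s.split(':')
    port = int(port_str)
    if not 0 <= port <= 65535:
        raise ValueError('bad port %r' % port_str)
    return ip_to_int(ip_str), port
```

For example, `netmask_len('255.255.255.0')` yields 24, while '255.0.255.0' is rejected as non-contiguous, matching the "all 1 bits followed by all 0 bits" rule in the description.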
Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.
On Tue, 23 Nov 2010 11:52:22 -0800, nathan binkert n...@binkert.org wrote:

Looks good to me. I don't see the problem with the print function either. (Ali, it's for an IP, not the netmask.) ...

Look at what the code is doing to get the values: left shifting and converting to a byte? It's going to print all 0s.

Ali
Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.
Ali Saidi wrote:

On Tue, 23 Nov 2010 11:52:22 -0800, nathan binkert n...@binkert.org wrote:

Looks good to me. I don't see the problem with the print function either. ...

Look at what the code is doing to get the values: left shifting and converting to a byte? It's going to print all 0s.

Ali

Oh yeah, you're right, those shifts look like they're going the wrong way. I'll fix that.

Gabe
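The shift-direction bug Ali spotted is easy to reconstruct in miniature (this is an illustration of the mistake, not the actual m5 code): extracting the octets of a 32-bit IP for printing requires right shifts; left-shifting before masking leaves zeros in the masked byte.

```python
# Reconstruction of the print bug: octets must be extracted with right
# shifts. Left-shifting moves zeros into the low byte, so masking with
# 0xff yields 0 for every octet except the last.

def ip_str_buggy(ip):
    # Wrong direction: (ip << 24) & 0xff is always 0, etc.
    return '.'.join(str((ip << s) & 0xff) for s in (24, 16, 8, 0))

def ip_str_fixed(ip):
    # Right shifts bring each octet down into the low byte before masking.
    return '.'.join(str((ip >> s) & 0xff) for s in (24, 16, 8, 0))

print(ip_str_buggy(0x0a000001))  # → 0.0.0.1  (the "all 0s" Ali predicted)
print(ip_str_fixed(0x0a000001))  # → 10.0.0.1
```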
Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.
A number of prefixes can be stuck into the top nibble of a physical address to put it into a partition set aside for a certain purpose. This is something I'm doing in M5 that isn't directly analogous to a real system, but I suppose it would be similar to extra signals on the bus for the same purpose. The CPU can only generate so many physical address lines (fewer than 64), so there shouldn't be any collision. The partition with prefix 0 is normal memory, devices, etc., so they don't have to be treated specially, and one of the others is for the APICs to talk to each other. And yes, a comment would be a good idea. I didn't want to put on all the trimmings if this was a dead end.

Gabe

Steve Reinhardt wrote:

My initial reaction is: even if this works, this can't possibly be the best way to do it... Where do APIC messages live in the address space? How does 'Addr.max >> 4' let them through? Did you really think this change didn't need a comment? ;-)

On Tue, Nov 23, 2010 at 3:39 AM, Gabe Black gbl...@eecs.umich.edu wrote:

This seems to get APIC messages back to the CPU, but I really don't know if it's the right way to do this. I have the feeling there are forces at work in this code I don't fully appreciate.

Gabe

Gabe Black wrote:

This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/323/

Review request for Default.
By Gabe Black.

Description

Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

Diffs

* configs/example/fs.py (865e37d507c7)

View Diff: http://reviews.m5sim.org/r/323/diff/
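Gabe's top-nibble partitioning scheme can be sketched as follows. The prefix assignments here are illustrative, not M5's actual ones; the point is that since the CPU never generates addresses using the top bits, tagged regions cannot collide with normal memory.

```python
# Sketch of partitioning a physical address space by its top nibble, as
# described above. PREFIX_* values are hypothetical, not M5's assignments.

ADDR_BITS = 64
PREFIX_SHIFT = ADDR_BITS - 4   # top nibble selects the partition

PREFIX_NORMAL = 0x0            # regular memory, devices, etc.
PREFIX_APIC = 0x1              # hypothetical: inter-APIC messages

def to_partition(addr, prefix):
    """Tag an address with a partition prefix in its top nibble."""
    # real physical addresses use fewer than 60 bits, so no collision
    assert addr < (1 << PREFIX_SHIFT), "address collides with prefix bits"
    return (prefix << PREFIX_SHIFT) | addr

def partition_of(addr):
    """Recover the partition prefix from a (possibly tagged) address."""
    return addr >> PREFIX_SHIFT

tagged = to_partition(0xfee00000, PREFIX_APIC)
assert partition_of(tagged) == PREFIX_APIC
assert partition_of(0xfee00000) == PREFIX_NORMAL  # untagged: prefix 0
```

An untagged address falls in the prefix-0 partition automatically, which is why normal memory and devices need no special treatment.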
Re: [m5-dev] X86 FS regression
Of these, I think the walker cache sounds better for two reasons. First, it avoids the L1 pollution Ali was talking about, and second, a new bus would add mostly inert stuff on the way to memory and would involve looking up which port to use even though it'd always be the same one. I'll give that a try.

Gabe

Steve Reinhardt wrote:

I think the two easy (Python-only) solutions are sharing the existing L1 via a bus and tacking on a small L1 to the walker. Which one is more realistic would depend on what you're trying to model.

Steve

On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi sa...@umich.edu wrote:

So what is the relatively good way to make this work in the short term? A bus? What about the slightly better version? I suppose a small cache might be ok and probably somewhat realistic.

Thanks,
Ali

On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt ste...@gmail.com wrote:

And even though I do think it could be made to work, I'm not sure it would be easy or a good idea. There are a lot of corner cases to worry about, especially for writes, since you'd have to actually buffer the write data somewhere, as opposed to just remembering that so-and-so has requested an exclusive copy. Actually, as I think about it, that might be the case that's breaking now... if the L1 has an exclusive copy and then it snoops a write (and not a read-exclusive), I'm guessing it will just invalidate its copy, losing the modifications. I wouldn't be terribly surprised if reads are working OK (the L1 should snoop those and respond if it's the owner), and of course it's all OK if the L1 doesn't have a copy of the block. So maybe there is a relatively easy way to make this work, but figuring out whether that's true and then testing it is still a non-trivial amount of effort.

Steve

On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt ste...@gmail.com wrote:

No, when the L2 receives a request it assumes the L1s above it have already been snooped, which is true since the request came in on the bus that the L1s snoop. The issue is that caches don't necessarily behave correctly when non-cache-block requests come in through their mem-side (snoop) port and not through their cpu-side (request) port. I'm guessing this could be made to work; I'd just be very surprised if it does right now, since the caches weren't designed to deal with this case and aren't tested this way.

Steve

On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi sa...@umich.edu wrote:

Does it? Shouldn't the L2 receive the request, ask for the block, and end up snooping the L1s?

Ali

On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt ste...@gmail.com wrote:

The point is that connecting between the L1 and L2 induces the same problems wrt the L1 that connecting directly to memory induces wrt the whole cache hierarchy. You're just statistically more likely to get away with it in the former case because the L1 is smaller.

Steve

On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi sa...@umich.edu wrote:

Where are you connecting the table walker? If it's between the L1 and L2, my guess is that it will work. If it is to the memory bus, yes, memory is just responding without the help of a cache and this could be the reason.

Ali

On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black gbl...@eecs.umich.edu wrote:

I think I may have just now. I've fixed a few issues, and am now getting to the point where something that should be in the pagetables is causing a page fault. I found where the table walker is walking the tables for this particular access, and the last-level entry is all 0s. There could be a number of reasons this is all 0s, but since the main difference other than timing between this and a working configuration is the presence of caches, and we've identified a
[m5-dev] changeset in m5: Params: Add parameter types for IP addresses in...
changeset 369f90d32e2e in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=369f90d32e2e
description:
	Params: Add parameter types for IP addresses in various forms.

	New parameter forms are:

	IP address in the format a.b.c.d, where a-d are from decimal 0 to 255.

	IP address with netmask, which is an IP followed by /n, where n is a netmask length in bits from decimal 0 to 32, or by /e.f.g.h, where e-h are from decimal 0 to 255 and which is all 1 bits followed by all 0 bits when represented in binary. These can also be specified as an integral IP and netmask passed in separately.

	IP address with port, which is an IP followed by :p, where p is a port index from decimal 0 to 65535. These can also be specified as an integral IP and port value passed in separately.

diffstat:

 src/base/inet.cc              |   67 ++++++++
 src/base/inet.hh              |   59 +++++++
 src/python/m5/params.py       |  158 ++++++++++++++
 src/python/m5/util/convert.py |   49 ++++++
 src/python/swig/inet.i        |   19 ++
 5 files changed, 352 insertions(+), 0 deletions(-)

diffs (truncated from 405 to 300 lines):

diff -r 865e37d507c7 -r 369f90d32e2e src/base/inet.cc
--- a/src/base/inet.cc	Tue Nov 23 06:11:50 2010 -0500
+++ b/src/base/inet.cc	Tue Nov 23 15:54:43 2010 -0500
@@ -117,6 +117,73 @@
     return stream;
 }
 
+string
+IpAddress::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpAddress &left, const IpAddress &right)
+{
+    return left.ip() == right.ip();
+}
+
+ostream &
+operator<<(ostream &stream, const IpAddress &ia)
+{
+    uint32_t ip = ia.ip();
+    ccprintf(stream, "%x.%x.%x.%x",
+             (uint8_t)(ip << 0), (uint8_t)(ip << 8),
+             (uint8_t)(ip << 16), (uint8_t)(ip << 24));
+    return stream;
+}
+
+string
+IpNetmask::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpNetmask &left, const IpNetmask &right)
+{
+    return (left.ip() == right.ip()) &&
+        (left.netmask() == right.netmask());
+}
+
+ostream &
+operator<<(ostream &stream, const IpNetmask &in)
+{
+    ccprintf(stream, "%s/%d", (const IpAddress &)in, in.netmask());
+    return stream;
+}
+
+string
+IpWithPort::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpWithPort &left, const IpWithPort &right)
+{
+    return (left.ip() == right.ip()) && (left.port() == right.port());
+}
+
+ostream &
+operator<<(ostream &stream, const IpWithPort &iwp)
+{
+    ccprintf(stream, "%s:%d", (const IpAddress &)iwp, iwp.port());
+    return stream;
+}
+
 uint16_t
 cksum(const IpPtr &ptr)
 {
diff -r 865e37d507c7 -r 369f90d32e2e src/base/inet.hh
--- a/src/base/inet.hh	Tue Nov 23 06:11:50 2010 -0500
+++ b/src/base/inet.hh	Tue Nov 23 15:54:43 2010 -0500
@@ -147,6 +147,65 @@
 /*
  * IP Stuff
  */
+struct IpAddress
+{
+  protected:
+    uint32_t _ip;
+
+  public:
+    IpAddress() : _ip(0)
+    {}
+    IpAddress(const uint32_t __ip) : _ip(__ip)
+    {}
+
+    uint32_t ip() const { return _ip; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpAddress &ia);
+bool operator==(const IpAddress &left, const IpAddress &right);
+
+struct IpNetmask : public IpAddress
+{
+  protected:
+    uint8_t _netmask;
+
+  public:
+    IpNetmask() : IpAddress(), _netmask(0)
+    {}
+    IpNetmask(const uint32_t __ip, const uint8_t __netmask) :
+        IpAddress(__ip), _netmask(__netmask)
+    {}
+
+    uint8_t netmask() const { return _netmask; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpNetmask &in);
+bool operator==(const IpNetmask &left, const IpNetmask &right);
+
+struct IpWithPort : public IpAddress
+{
+  protected:
+    uint16_t _port;
+
+  public:
+    IpWithPort() : IpAddress(), _port(0)
+    {}
+    IpWithPort(const uint32_t __ip, const uint16_t __port) :
+        IpAddress(__ip), _port(__port)
+    {}
+
+    uint8_t port() const { return _port; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpWithPort &iwp);
+bool operator==(const IpWithPort &left, const IpWithPort &right);
+
 struct IpOpt;
 struct IpHdr : public ip_hdr
 {
diff -r 865e37d507c7 -r 369f90d32e2e src/python/m5/params.py
--- a/src/python/m5/params.py	Tue Nov 23 06:11:50 2010 -0500
+++ b/src/python/m5/params.py	Tue Nov 23 15:54:43 2010 -0500
@@ -675,6 +675,163 @@
     def ini_str(self):
         return self.value
 
+# When initializing an IpAddress, pass in an existing IpAddress, a string of
+# the form "a.b.c.d", or an integer representing an IP.
+class IpAddress(ParamValue):
+    cxx_type = 'Net::IpAddress'
+
+    @classmethod
+    def cxx_predecls(cls, code):
+        code('#include "base/inet.hh"')
+
+    @classmethod
+    def swig_predecls(cls, code):
Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.
Look at what the code is doing to get the values: left shifting and converting to a byte? It's going to print all 0s.

Oh yeah, you're right, those shifts look like they're going the wrong way. I'll fix that.

heh, duh.
[m5-dev] changeset in m5: Copyright: Add AMD copyright to the param chang...
changeset 6a7207241112 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=6a7207241112
description:
	Copyright: Add AMD copyright to the param changes I just made.

diffstat:

 src/base/inet.cc              |  2 ++
 src/base/inet.hh              |  2 ++
 src/python/m5/params.py       |  1 +
 src/python/m5/util/convert.py |  2 ++
 src/python/swig/inet.i        |  2 ++
 5 files changed, 9 insertions(+), 0 deletions(-)

diffs (82 lines):

diff -r 369f90d32e2e -r 6a7207241112 src/base/inet.cc
--- a/src/base/inet.cc	Tue Nov 23 15:54:43 2010 -0500
+++ b/src/base/inet.cc	Tue Nov 23 17:08:41 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2002-2005 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -26,6 +27,7 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Authors: Nathan Binkert
+ *          Gabe Black
  */
 
 #include <cstdio>
diff -r 369f90d32e2e -r 6a7207241112 src/base/inet.hh
--- a/src/base/inet.hh	Tue Nov 23 15:54:43 2010 -0500
+++ b/src/base/inet.hh	Tue Nov 23 17:08:41 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2002-2005 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -27,6 +28,7 @@
  *
  * Authors: Nathan Binkert
  *          Steve Reinhardt
+ *          Gabe Black
  */
 
 #ifndef __BASE_INET_HH__
diff -r 369f90d32e2e -r 6a7207241112 src/python/m5/params.py
--- a/src/python/m5/params.py	Tue Nov 23 15:54:43 2010 -0500
+++ b/src/python/m5/params.py	Tue Nov 23 17:08:41 2010 -0500
@@ -27,6 +27,7 @@
 #
 # Authors: Steve Reinhardt
 #          Nathan Binkert
+#          Gabe Black
 #
 #
diff -r 369f90d32e2e -r 6a7207241112 src/python/m5/util/convert.py
--- a/src/python/m5/util/convert.py	Tue Nov 23 15:54:43 2010 -0500
+++ b/src/python/m5/util/convert.py	Tue Nov 23 17:08:41 2010 -0500
@@ -1,4 +1,5 @@
 # Copyright (c) 2005 The Regents of The University of Michigan
+# Copyright (c) 2010 Advanced Micro Devices, Inc.
 # All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -25,6 +26,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Authors: Nathan Binkert
+#          Gabe Black
 
 # metric prefixes
 exa = 1.0e18
diff -r 369f90d32e2e -r 6a7207241112 src/python/swig/inet.i
--- a/src/python/swig/inet.i	Tue Nov 23 15:54:43 2010 -0500
+++ b/src/python/swig/inet.i	Tue Nov 23 17:08:41 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2006 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
@@ -26,6 +27,7 @@
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * Authors: Nathan Binkert
+ *          Gabe Black
 */
 
 %{