[m5-dev] Cron m5t...@zizzer /z/m5/regression/do-regression quick

2010-11-23 Thread Cron Daemon
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/ARM_SE/tests/fast/quick/00.hello/arm/linux/simple-atomic passed.
* build/X86_SE/tests/fast/quick/00.hello/x86/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/o3-timing-mp passed.

Re: [m5-dev] X86 FS regression

2010-11-23 Thread Gabe Black
I see that the bridge and cache are in parallel like you're describing.
The culprit seems to be this line:

configs/example/fs.py:test_sys.bridge.filter_ranges_a=[AddrRange(0,
Addr.max)]

where the bridge is being told explicitly not to let anything through
from the IO side to the memory side. That should be fairly
straightforward to poke a hole in for the necessary ranges. The
corresponding line for the other direction (below) brings up another
question. What happens if the bridge doesn't disallow something to go
across and something else wants to respond to an address? The bridge
isn't set to ignore APIC messages implementing IPIs between CPUs, but
those seem to be going between CPUs and not out into the IO system. Are
we just getting lucky? This same thing would seem to apply to any other
memory side object that isn't in the address range 0-mem_size.

configs/example/fs.py:   
test_sys.bridge.filter_ranges_b=[AddrRange(mem_size)]
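The "hole-poking" described above can be sketched in standalone Python (this is not gem5 code; the APIC window base and size below are illustrative assumptions, not values from fs.py). The idea is to split the blanket filter range in two so the interrupt-message window is no longer filtered:

```python
# Standalone sketch of "poking a hole" in a bridge filter range.
# Ranges are half-open (start, end) tuples; APIC_BASE/APIC_SIZE are
# assumed values for illustration only.
APIC_BASE = 0xFEE00000      # assumed local-APIC message window base
APIC_SIZE = 0x100000        # assumed window size
ADDR_MAX = 2**64

def poke_hole(filter_range, hole):
    """Split one filtered (start, end) range around a hole, returning
    the pieces that remain filtered."""
    (fs, fe), (hs, he) = filter_range, hole
    pieces = []
    if fs < hs:
        pieces.append((fs, min(fe, hs)))
    if he < fe:
        pieces.append((max(fs, he), fe))
    return pieces

# Replace the blanket [0, Addr.max) filter with two pieces around the hole.
filter_ranges_a = poke_hole((0, ADDR_MAX), (APIC_BASE, APIC_BASE + APIC_SIZE))

def filtered(addr, ranges):
    return any(start <= addr < end for start, end in ranges)

print(filtered(APIC_BASE, filter_ranges_a))   # False: now passes the bridge
print(filtered(0x1000, filter_ranges_a))      # True: still filtered
```

The same splitting would apply to whichever direction's filter list needs the exception.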

Gabe

Steve Reinhardt wrote:
 I believe the I/O cache is normally paired with a bridge that lets
 things flow in the other direction.  It's really just designed to
 handle accesses to cacheable space from devices on the I/O bus without
 requiring each device to have a cache.  It's possible we've never had
 a situation before where I/O devices issue accesses to uncacheable
 non-memory locations on the CPU side of the I/O cache, in which case I
 would not be terribly surprised if that didn't quite work.

 Steve

  On Mon, Nov 22, 2010 at 11:59 AM, Gabe Black gbl...@eecs.umich.edu wrote:

 The cache claims to support all addresses on the CPU side (or so says
 the comments), but no addresses on the memory side. Messages going
 from
 the IO interrupt controller get to the IO bus but then don't know
 where
 to go since the IO cache hides the fact that the CPU interrupt
 controller wants to receive messages on that address range. I also
 don't
 know if the cache can handle messages passing through originating from
 the memory side, but I didn't look into that.

 Gabe

 Ali Saidi wrote:
  Something has to maintain i/o coherency, and that something looks
 a whole lot like a couple-line cache. Why is having a cache there
 an issue? They should pass right through the cache.
 
  Ali
 
 
 
  On Nov 22, 2010, at 4:42 AM, Gabe Black wrote:
 
 
  Hmm. It looks like this IO cache is only added when there are
 caches in
  the system (a fix for some coherency something? I sort of
 remember that
  discussion.) and that wouldn't propagate to the IO bus the fact
 that the
  CPU's local APIC wanted to receive interrupt messages passed
 over the
  memory system. I don't know the intricacies of why the IO cache was
  necessary, or what problems passing requests back up through
 the cache
  might cause, but this is a serious issue for x86 and any other
 ISA that
  wants to move to a message based interrupt scheme. I suppose the
  interrupt objects could be connected all the way out onto the
 IO bus
  itself, bypassing that cache, but I'm not sure how realistic
 that is.
 
  Gabe Black wrote:
 
 For anybody waiting for an x86 FS regression (yes, I know,
 you can
  all hardly wait, but don't let this spoil your Thanksgiving)
 I'm getting
  closer to having it working, but I've discovered some issues
 with the
  mechanisms behind the --caches flag with fs.py and x86. I'm
 surprised I
  never thought to try it before. It also brings up some
 questions about
  where the table walkers should be hooked up in x86 and ARM.
 Currently
  it's after the L1, if any, but before the L2, if any, which
 seems wrong
  to me. Also caches don't seem to propagate requests upwards to
 the CPUs
  which may or may not be an issue. I'm still looking into that.
 
  Gabe
  ___
  m5-dev mailing list
  m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 
 

[m5-dev] changeset in m5: X86: Obey the PCD (cache disable) bit in the pa...

2010-11-23 Thread Gabe Black
changeset 8e8fa2f28f2e in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=8e8fa2f28f2e
description:
X86: Obey the PCD (cache disable) bit in the page tables.

diffstat:

 src/arch/x86/tlb.cc |  2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diffs (12 lines):

diff -r 6246338ac1e9 -r 8e8fa2f28f2e src/arch/x86/tlb.cc
--- a/src/arch/x86/tlb.cc   Mon Nov 22 05:49:03 2010 -0500
+++ b/src/arch/x86/tlb.cc   Tue Nov 23 06:10:17 2010 -0500
@@ -653,6 +653,8 @@
         Addr paddr = entry->paddr | (vaddr & (entry->size - 1));
         DPRINTF(TLB, "Translated %#x -> %#x.\n", vaddr, paddr);
         req->setPaddr(paddr);
+        if (entry->uncacheable)
+            req->setFlags(Request::UNCACHEABLE);
     } else {
         //Use the address which already has segmentation applied.
         DPRINTF(TLB, "Paging disabled.\n");
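For context on the bit this changeset obeys: in x86 page-table entries, PCD (Page-level Cache Disable) is bit 4 and PWT (Page-level Write-Through) is bit 3. A standalone sketch of the derivation (the helper name is illustrative, not m5 code):

```python
# x86 page-table entries carry cacheability control bits; PCD is bit 4.
PTE_PWT = 1 << 3  # Page-level Write-Through
PTE_PCD = 1 << 4  # Page-level Cache Disable

def pcd_set(pte):
    """Return True if the entry marks its page cache-disabled."""
    return bool(pte & PTE_PCD)

print(pcd_set(0x10))  # True: PCD bit set
print(pcd_set(0x07))  # False: present/writable/user, but cacheable
```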
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Gabe Black
I think I may have just now. I've fixed a few issues, and am now getting
to the point where something that should be in the pagetables is causing
a page fault. I found where the table walker is walking the tables for
this particular access, and the last level entry is all 0s. There could
be a number of reasons this is all 0s, but since the main difference
other than timing between this and a working configuration is the
presence of caches and we've identified a potential issue there, I'm
inclined to suspect the actual page table entry is still in the L1 and
hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the
connections that need to go to the L1? I'm assuming they'd all go into
the dcache since they're more data-ey and that keeps the icache read
only (ignoring SMC issues), and the dcache is probably servicing lower
bandwidth normally. It also seems a little strange that this type of
configuration is going on in the BaseCPU.py SimObject python file and
not a configuration file, but I could be convinced there's a reason.
Even if this isn't really a fix or the right thing to do, I'd still
like to try it temporarily at least to see if it corrects the problem
I'm seeing.
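The arrangement proposed above, a bus below the CPU feeding the dcache, might look roughly like this in gem5-style config Python. This is only a sketch under assumptions: the class and port names are illustrative, and the real wiring lives in BaseCPU.py rather than a config file.

```python
# Hypothetical config fragment (illustrative names, not real m5 API):
# interpose a small bus between the CPU-side requestors (table walkers,
# interrupt ports) and the L1 dcache, so their accesses are looked up
# in the L1 instead of bypassing it.
cpu_side_bus = Bus()

# The CPU data port and the "CPU-like" helper ports share the bus...
cpu.dcache_port = cpu_side_bus.port
cpu.itb.walker.port = cpu_side_bus.port
cpu.dtb.walker.port = cpu_side_bus.port

# ...and the bus feeds the L1 dcache, keeping the icache read-only.
dcache.cpu_side = cpu_side_bus.port
```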

Gabe

Ali Saidi wrote:

 I haven't seen any strange behavior yet. That isn't to say it's not
 going to cause an issue in the future, but we've taken many a tlb miss
 and it hasn't fallen over yet.

 Ali

 On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com
 wrote:

 Yea, I just got around to reading this thread and that was the point
 I was going to make... the L1 cache effectively serves as a
  translator between the CPU's word-size read & write requests and the
 coherent block-level requests that get snooped.  If you attach a
 CPU-like device (such as the table walker) directly to an L2, the
 CPU-like accesses that go to the L2 will get sent to the L1s but I'm
 not sure they'll be handled correctly.  Not that they fundamentally
 couldn't, this just isn't a configuration we test so it's likely that
 there are problems... for example, the L1 may try to hand ownership
 to the requester but the requester won't recognize that and things
 will break.

 Steve

  On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu wrote:

 What happens if an entry is in the L1 but not the L2?

 Gabe

 Ali Saidi wrote:
   Between the l1 and l2 caches seems like a good place to me. The
  caches can cache page table entries, otherwise a tlb miss would
  be even more expensive than it is. The l1 isn't normally used for
  such things since it would get polluted (look at why sparc has a
  "load 128 bits from l2, do not allocate into l1" instruction).
 
  Ali
 
  On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
 
 
 For anybody waiting for an x86 FS regression (yes, I know,
 you can
  all hardly wait, but don't let this spoil your Thanksgiving)
 I'm getting
  closer to having it working, but I've discovered some issues
 with the
  mechanisms behind the --caches flag with fs.py and x86. I'm
 surprised I
  never thought to try it before. It also brings up some
 questions about
  where the table walkers should be hooked up in x86 and ARM.
 Currently
  it's after the L1, if any, but before the L2, if any, which
 seems wrong
  to me. Also caches don't seem to propagate requests upwards to
 the CPUs
  which may or may not be an issue. I'm still looking into that.
 
  Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] changeset in m5: X86: Loosen an assert for x86 and connect the A...

2010-11-23 Thread Gabe Black
changeset 865e37d507c7 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=865e37d507c7
description:
X86: Loosen an assert for x86 and connect the APIC ports when caches 
are used.

diffstat:

 src/cpu/BaseCPU.py |  4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diffs (21 lines):

diff -r 8e8fa2f28f2e -r 865e37d507c7 src/cpu/BaseCPU.py
--- a/src/cpu/BaseCPU.py    Tue Nov 23 06:10:17 2010 -0500
+++ b/src/cpu/BaseCPU.py    Tue Nov 23 06:11:50 2010 -0500
@@ -167,7 +167,7 @@
             exec('self.%s = bus.port' % p)
 
     def addPrivateSplitL1Caches(self, ic, dc):
-        assert(len(self._mem_ports) < 6)
+        assert(len(self._mem_ports) < 8)
         self.icache = ic
         self.dcache = dc
         self.icache_port = ic.cpu_side
@@ -176,6 +176,8 @@
         if buildEnv['FULL_SYSTEM']:
             if buildEnv['TARGET_ISA'] in ['x86', 'arm']:
                 self._mem_ports += [itb.walker.port, dtb.walker.port]
+            if buildEnv['TARGET_ISA'] == 'x86':
+                self._mem_ports += [interrupts.pio, interrupts.int_port]
 
     def addTwoLevelCacheHierarchy(self, ic, dc, l2c):
         self.addPrivateSplitL1Caches(ic, dc)
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

2010-11-23 Thread Gabe Black

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/323/
---

Review request for Default.


Summary
---

Mem,X86: Make the IO bridge pass APIC messages back towards the CPU.


Diffs
-

  configs/example/fs.py 865e37d507c7 

Diff: http://reviews.m5sim.org/r/323/diff


Testing
---


Thanks,

Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

2010-11-23 Thread Gabe Black
This seems to get APIC messages back to the CPU, but I really don't know
if it's the right way to do this. I have the feeling there are forces at
work in this code I don't fully appreciate.

Gabe

Gabe Black wrote:
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.m5sim.org/r/323/


 Review request for Default.
 By Gabe Black.


   Description

 Mem,X86: Make the IO bridge pass APIC messages back towards the CPU.


   Diffs

 * configs/example/fs.py (865e37d507c7)

 View Diff http://reviews.m5sim.org/r/323/diff/

 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Ali Saidi


Where are you connecting the table walker? If it's between the l1 and
l2, my guess is that it will work. If it's connected to the memory bus,
then yes, memory is just responding without the help of a cache, and
this could be the reason.


Ali


On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black gbl...@eecs.umich.edu 
wrote:
I think I may have just now. I've fixed a few issues, and am now getting
to the point where something that should be in the pagetables is causing
a page fault. I found where the table walker is walking the tables for
this particular access, and the last level entry is all 0s. There could
be a number of reasons this is all 0s, but since the main difference
other than timing between this and a working configuration is the
presence of caches and we've identified a potential issue there, I'm
inclined to suspect the actual page table entry is still in the L1 and
hasn't been evicted out to memory yet.

To fix this, is the best solution to add a bus below the CPU for all the
connections that need to go to the L1? I'm assuming they'd all go into
the dcache since they're more data-ey and that keeps the icache read
only (ignoring SMC issues), and the dcache is probably servicing lower
bandwidth normally. It also seems a little strange that this type of
configuration is going on in the BaseCPU.py SimObject python file and
not a configuration file, but I could be convinced there's a reason.
Even if this isn't really a fix or the right thing to do, I'd still
like to try it temporarily at least to see if it corrects the problem
I'm seeing.

Gabe


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
IIRC, the filter works in conjunction with the address range autodetection
stuff, so in order for a memory request to go across the bridge, the
targeted address must lie on the other side *and* not be filtered out.  I
expect this explains why IPIs aren't going across.

Thinking about it, I'm not sure why the I/O cache doesn't let uncached
accesses through from the I/O side to the memory side, assuming the target
exists on the memory side.  CPU caches certainly let uncached accesses
through, and it's the same cache module in both cases.  Hmm, looking at
fs.py, I think this line may be as much of a culprit as the others:

test_sys.iocache = IOCache(addr_range=mem_size)

I believe the address range exclusions are necessary to avoid an infinite
loop between the iocache and the bridge in the address range autodetection
algorithm, but perhaps the ranges are set up a little too conservatively so
that uncacheable addresses have no way through.  I don't think it matters
whether you open up the range in the iocache or in the bridge to let them
through, as long as (1) you only do one and not the other and (2) it's
selective enough that it doesn't include any PCI addresses that might result
in a loop.
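The two-condition forwarding rule described above, the target must lie on the other side *and* not be filtered out, can be modeled in a few lines of standalone Python (a toy sketch, not gem5 code; the memory size is an assumed value):

```python
# Toy model of the bridge's forwarding decision: a request crosses only
# if (1) address-range autodetection says the target is on the other
# side and (2) no filter range covers the address. Ranges are half-open
# (start, end) tuples.
def in_ranges(addr, ranges):
    return any(start <= addr < end for start, end in ranges)

def bridge_forwards(addr, other_side_ranges, filter_ranges):
    return in_ranges(addr, other_side_ranges) and not in_ranges(addr, filter_ranges)

MEM_SIZE = 0x8000000  # assumed 128 MB of memory on the CPU side

# IO -> memory direction: memory claims [0, MEM_SIZE); with the blanket
# filter from fs.py, nothing gets through.
print(bridge_forwards(0x1000, [(0, MEM_SIZE)], [(0, 2**64)]))  # False

# Without the blanket filter the same request would cross.
print(bridge_forwards(0x1000, [(0, MEM_SIZE)], []))            # True
```

This also illustrates why opening the range in either the iocache or the bridge (but not both) is enough to let uncacheable accesses through without creating a loop.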

Steve

On Tue, Nov 23, 2010 at 12:17 AM, Gabe Black gbl...@eecs.umich.edu wrote:

 I see that the bridge and cache are in parallel like you're describing.
 The culprit seems to be this line:

 configs/example/fs.py:test_sys.bridge.filter_ranges_a=[AddrRange(0,
 Addr.max)]

 where the bridge is being told explicitly not to let anything through
 from the IO side to the memory side. That should be fairly
 straightforward to poke a hole in for the necessary ranges. The
 corresponding line for the other direction (below) brings up another
 question. What happens if the bridge doesn't disallow something to go
 across and something else wants to respond to an address? The bridge
 isn't set to ignore APIC messages implementing IPIs between CPUs, but
 those seem to be going between CPUs and not out into the IO system. Are
 we just getting lucky? This same thing would seem to apply to any other
 memory side object that isn't in the address range 0-mem_size.

 configs/example/fs.py:
 test_sys.bridge.filter_ranges_b=[AddrRange(mem_size)]

 Gabe


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
The point is that connecting between the L1 and L2 induces the same problems
wrt the L1 that connecting directly to memory induces wrt the whole cache
hierarchy.  You're just statistically more likely to get away with it in the
former case because the L1 is smaller.

Steve

On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi sa...@umich.edu wrote:


 Where are you connecting the table walker? If it's between the l1 and l2 my
 guess is that it will work. if it is to the memory bus, yes, memory is just
 responding without the help of a cache and this could be the reason.

 Ali




Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
I definitely agree that putting a bus between the CPU and L1 and plugging
the table walker in there is the best way to figure out if this is really
the problem (and I expect it is).

I'm not sure if it's the long-term right answer or not.  We also need to
consider how this works with Ruby.

Steve

On Tue, Nov 23, 2010 at 3:29 AM, Gabe Black gbl...@eecs.umich.edu wrote:

 I think I may have just now. I've fixed a few issues, and am now getting
 to the point where something that should be in the pagetables is causing
 a page fault. I found where the table walker is walking the tables for
 this particular access, and the last level entry is all 0s. There could
 be a number of reasons this is all 0s, but since the main difference
 other than timing between this and a working configuration is the
 presence of caches and we've identified a potential issue there, I'm
 inclined to suspect the actual page table entry is still in the L1 and
 hasn't been evicted out to memory yet.

 To fix this, is the best solution to add a bus below the CPU for all the
 connections that need to go to the L1? I'm assuming they'd all go into
 the dcache since they're more data-ey and that keeps the icache read
 only (ignoring SMC issues), and the dcache is probably servicing lower
 bandwidth normally. It also seems a little strange that this type of
 configuration is going on in the BaseCPU.py SimObject python file and
 not a configuration file, but I could be convinced there's a reason.
 Even if this isn't really a fix or the right thing to do, I'd still
 like to try it temporarily at least to see if it corrects the problem
 I'm seeing.

 Gabe

 Ali Saidi wrote:
 
  I haven't seen any strange behavior yet. That isn't to say it's not
  going to cause an issue in the future, but we've taken many a tlb miss
  and it hasn't fallen over yet.
 
  Ali
 
  On Mon, 22 Nov 2010 13:08:13 -0800, Steve Reinhardt ste...@gmail.com
  wrote:
 
  Yea, I just got around to reading this thread and that was the point
  I was going to make... the L1 cache effectively serves as a
   translator between the CPU's word-size read & write requests and the
  coherent block-level requests that get snooped.  If you attach a
  CPU-like device (such as the table walker) directly to an L2, the
  CPU-like accesses that go to the L2 will get sent to the L1s but I'm
  not sure they'll be handled correctly.  Not that they fundamentally
  couldn't, this just isn't a configuration we test so it's likely that
  there are problems... for example, the L1 may try to hand ownership
  to the requester but the requester won't recognize that and things
  will break.
 
  Steve
 
  On Mon, Nov 22, 2010 at 12:00 PM, Gabe Black gbl...@eecs.umich.edu
  mailto:gbl...@eecs.umich.edu wrote:
 
  What happens if an entry is in the L1 but not the L2?
 
  Gabe
 
  Ali Saidi wrote:
   Between the l1 and l2 caches seems like a good place to me. The
  caches can cache page table entries, otherwise a tlb miss would
   be even more expensive than it is. The l1 isn't normally used for
  such things since it would get polluted (look why sparc has a
  load 128bits from l2, do not allocate into l1 instruction).
  
   Ali
  
   On Nov 22, 2010, at 4:27 AM, Gabe Black wrote:
  
  
  For anybody waiting for an x86 FS regression (yes, I know,
  you can
   all hardly wait, but don't let this spoil your Thanksgiving)
  I'm getting
   closer to having it working, but I've discovered some issues
  with the
   mechanisms behind the --caches flag with fs.py and x86. I'm
  surprised I
   never thought to try it before. It also brings up some
  questions about
   where the table walkers should be hooked up in x86 and ARM.
  Currently
   it's after the L1, if any, but before the L2, if any, which
  seems wrong
   to me. Also caches don't seem to propagate requests upwards to
  the CPUs
   which may or may not be an issue. I'm still looking into that.
  
   Gabe
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

2010-11-23 Thread Steve Reinhardt
My initial reaction is even if this works, this can't possibly be the best
way to do it... where do APIC messages live in the address space?  How does
'Addr.max  4' let them through?  Did you really think this change didn't
need a comment? ;-)

On Tue, Nov 23, 2010 at 3:39 AM, Gabe Black gbl...@eecs.umich.edu wrote:

 This seems to get APIC messages back to the CPU, but I really don't know
 if it's the right way to do this. I have the feeling there are forces at
 work in this code I don't fully appreciate.

 Gabe

 Gabe Black wrote:
  This is an automatically generated e-mail. To reply, visit:
  http://reviews.m5sim.org/r/323/
 
 
  Review request for Default.
  By Gabe Black.
 
 
Description
 
  Mem,X86: Make the IO bridge pass APIC messages back towards the CPU.
 
 
Diffs
 
  * configs/example/fs.py (865e37d507c7)
 
  View Diff http://reviews.m5sim.org/r/323/diff/
 
  
 
 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Ali Saidi


Does it? Shouldn't the l2 receive the request, ask for the block and end up
snooping the l1s?

Ali

On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt wrote:

 The point is that connecting between the L1 and L2 induces the same
 problems wrt the L1 that connecting directly to memory induces wrt the
 whole cache hierarchy. You're just statistically more likely to get away
 with it in the former case because the L1 is smaller.

 Steve
Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
No, when the L2 receives a request it assumes the L1s above it have already
been snooped, which is true since the request came in on the bus that the
L1s snoop.  The issue is that caches don't necessarily behave correctly when
non-cache-block requests come in through their mem-side (snoop) port and not
through their cpu-side (request) port.  I'm guessing this could be made to
work, I'd just be very surprised if it does right now, since the caches
weren't designed to deal with this case and aren't tested this way.

Steve

On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi sa...@umich.edu wrote:

 Does it? Shouldn't the l2 receive the request, ask for the block and end up
 snooping the l1s?



 Ali






Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
And even though I do think it could be made to work, I'm not sure it would
be easy or a good idea.  There are a lot of corner cases to worry about,
especially for writes, since you'd have to actually buffer the write data
somewhere as opposed to just remembering that so-and-so has requested an
exclusive copy.

Actually as I think about it, that might be the case that's breaking now...
if the L1 has an exclusive copy and then it snoops a write (and not a
read-exclusive), I'm guessing it will just invalidate its copy, losing the
modifications.  I wouldn't be terribly surprised if reads are working OK
(the L1 should snoop those and respond if it's the owner), and of course
it's all OK if the L1 doesn't have a copy of the block.

So maybe there is a relatively easy way to make this work, but figuring out
whether that's true and then testing it is still a non-trivial amount of
effort.

Steve
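
[To make the suspected failure mode concrete, here is a toy Python model, not
m5 code: an L1 holding a dirty block answers a snooped sub-block write with a
plain invalidation, and the rest of its modifications are silently lost.]

```python
# Toy sketch of the failure hypothesized above (illustrative only).
memory = {0x1000: [1, 2, 3, 4]}          # one 4-word block, stale in memory

class ToyL1:
    def __init__(self):
        self.dirty = {}                   # addr -> list of words (Modified)

    def cpu_store(self, addr, word, val):
        blk = self.dirty.setdefault(addr, list(memory[addr]))
        blk[word] = val                   # block is now Modified in the L1

    def snoop_write(self, addr, word, val):
        # Buggy handling: drop the dirty copy instead of merging the write
        # (or writing the block back first).
        self.dirty.pop(addr, None)
        memory[addr][word] = val          # only the snooped word hits memory

    def load(self, addr, word):
        # Reads are fine: the owner snoops and supplies its dirty copy.
        blk = self.dirty.get(addr)
        return blk[word] if blk else memory[addr][word]

l1 = ToyL1()
l1.cpu_store(0x1000, 0, 42)   # CPU updates word 0 of a page-table block
l1.snoop_write(0x1000, 1, 7)  # table walker's write snooped by the L1
stale = l1.load(0x1000, 0)    # word 0 reads as 1 again: the 42 was lost
```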


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Ali Saidi


So what is the relatively good way to make this work in the short term? A
bus? What about the slightly better version? I suppose a small cache might
be ok and probably somewhat realistic.

Thanks,

Ali


Re: [m5-dev] X86 FS regression

2010-11-23 Thread Steve Reinhardt
I think the two easy (python-only) solutions are sharing the existing L1 via
a bus and tacking on a small L1 to the walker.  Which one is more realistic
would depend on what you're trying to model.

Steve
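
[The second option, a small private cache for the walker, might look roughly
like this m5-style sketch; the class and parameter names are assumptions and
the sizing is arbitrary.]

```python
# Hypothetical m5-style config fragment: give the table walker its own
# small cache instead of sharing the CPU's L1 dcache over a bus.
# Names and parameters are illustrative.
system.walker_cache = BaseCache(size='1kB', assoc=2, mshrs=4,
                                tgts_per_mshr=8)
system.cpu.dtb.walker.port = system.walker_cache.cpu_side

# Sitting on the same bus as the L1s, the walker cache participates in
# normal block-level snooping, so page-table blocks stay coherent with
# CPU stores.
system.walker_cache.mem_side = system.l2_bus.port
```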

On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi sa...@umich.edu wrote:

 So what is the relatively good way to make this work in the short term? A
 bus? What about the slightly better version? I suppose a small cache might
 be ok and probably somewhat realistic.




Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Steve Reinhardt
Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by
building a software cache into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.
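
[The one-entry memo described above could look roughly like this; a generic
Python sketch of the idea, whereas the real code would be C++ in Ruby's cache
structures.]

```python
# Sketch of a one-entry "software cache" for isTagPresent(): remember the
# last address looked up and its block, so back-to-back lookups of the
# same address skip the hash table.
class TagStore:
    def __init__(self):
        self.table = {}                  # addr -> block (the hash table)
        self.last_addr = None            # memo of the previous lookup
        self.last_blk = None
        self.probes = 0                  # counts real hash-table probes

    def is_tag_present(self, addr):
        if addr == self.last_addr:       # answered from the memo
            return self.last_blk is not None
        self.probes += 1                 # fall back to the hash table
        self.last_addr = addr
        self.last_blk = self.table.get(addr)
        return self.last_blk is not None

    def invalidate(self, addr):
        # Any change to the underlying table must also drop the memo,
        # or the cached pointer goes stale (the coherence worry above).
        self.table.pop(addr, None)
        if addr == self.last_addr:
            self.last_addr = self.last_blk = None

store = TagStore()
store.table[0x40] = "blk"
a = store.is_tag_present(0x40)   # probes the table
b = store.is_tag_present(0x40)   # answered from the memo, no probe
```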

Steve


On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote:

 I was looking at the MOESI hammer protocol. I think Steve's observation
 that extra tag lookups are going on in the cache is correct. In particular I
 noticed that in the getState() and setState() functions, first
 isTagPresent(address) is called and on the basis of the result (which is
 true or false), getCacheEntry(address) is called. Surprisingly, the
 getCacheEntry() function calls the isTagPresent() function again. These
 calls are in the file src/mem/protocol/MOESI_hammer-cache.sm

 Thanks
 Nilay


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-23 Thread Nilay Vaish

Brad and I will be having a discussion today on how to resolve this issue.

--
Nilay


On Tue, 23 Nov 2010, Steve Reinhardt wrote:


Thanks for tracking that down; that confirms my suspicions.

I think the long-term answer is that the system needs to be reworked to
avoid having to do multiple tag lookups for a single access; I don't know if
that's just an API change or if that's something that needs to be folded
into SLICCer.  (BTW, what is the status of SLICCer?  Is anyone working on
it, or likely to work on it again?)

In the short term, it's possible that some of the overhead can be avoided by
building a software cache into isTagPresent(), by storing the last address
looked up along with a pointer to the block, then just checking on each call
to see if we're looking up the same address as last time and if so just
returning the same pointer before resorting to the hash table.  I hope that
doesn't lead to any coherence problems with the block changing out from
under this cached copy... if so, perhaps an additional block check is
required on hits.

Steve
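
The single-entry lookup cache proposed above could be sketched roughly as
follows. This is a hypothetical illustration, not the actual Ruby CacheMemory
interface; the names (TagStore, Block) are invented. The coherence concern
raised above is handled here by invalidating the cached pointer whenever a
block is inserted or removed:

```cpp
#include <cstdint>
#include <unordered_map>

struct Block { int state; };

// One-entry "software cache" in front of the tag hash table: remember the
// last address looked up and the block it mapped to, and check that before
// falling back to the hash table.
class TagStore
{
  public:
    Block *lookup(uint64_t addr)
    {
        if (addr == lastAddr)       // fast path: repeated lookup
            return lastBlock;
        auto it = tags.find(addr);  // slow path: hash table
        Block *blk = (it == tags.end()) ? nullptr : &it->second;
        lastAddr = addr;
        lastBlock = blk;
        return blk;
    }

    void insert(uint64_t addr, Block blk)
    {
        tags[addr] = blk;
        invalidate();               // cached result may be stale now
    }

    void erase(uint64_t addr)
    {
        tags.erase(addr);
        invalidate();               // cached pointer may dangle otherwise
    }

  private:
    void invalidate() { lastAddr = ~0ULL; lastBlock = nullptr; }

    std::unordered_map<uint64_t, Block> tags;
    uint64_t lastAddr = ~0ULL;
    Block *lastBlock = nullptr;
};
```

With something like this, the back-to-back isTagPresent()/getCacheEntry()
calls in getState() and setState() would hit the fast path instead of hashing
the same address twice.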


On Tue, Nov 16, 2010 at 3:17 PM, Nilay Vaish ni...@cs.wisc.edu wrote:


I was looking at the MOESI hammer protocol. I think Steve's observation
that extra tag lookups are going on in the cache is correct. In particular I
noticed that in the getState() and setState() functions, first
isTagPresent(address) is called and on the basis of the result (which is
true or false), getCacheEntry(address) is called. Surprisingly, the
getCacheEntry() function calls the isTagPresent() function again. These
calls are in the file src/mem/protocol/MOESI_hammer-cache.sm.

Thanks
Nilay







Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.

2010-11-23 Thread nathan binkert
Looks good to me.  I don't see the problem with the print function
either.  (Ali, it's for an IP, not the netmask).  I can nitpick (and
this is bikeshedding), but I'm not sure that I agree that inheritance
(as opposed to containment) from IPAddress is the right thing to do
(on the C++ side).

  Nate

On Mon, Nov 22, 2010 at 2:52 AM, Gabe Black gbl...@eecs.umich.edu wrote:
 So are we all ok with this now?

 Gabe

 Gabe Black wrote:
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.m5sim.org/r/316/


 Review request for Default.
 By Gabe Black.

 /Updated 2010-11-20 23:40:22.294085/


   Description (updated)

 Params: Add parameter types for IP addresses in various forms.

 New parameter forms are:
 IP address in the format a.b.c.d where a-d are from decimal 0 to 255.
 IP address with netmask which is an IP followed by /n where n is a netmask
 length in bits from decimal 0 to 32 or by /e.f.g.h where e-h are from
 decimal 0 to 255 and which is all 1 bits followed by all 0 bits when
 represented in binary. These can also be specified as an integral IP and
 netmask passed in separately.
 IP address with port which is an IP followed by :p where p is a port index
 from decimal 0 to 65535. These can also be specified as an integral IP and
 port value passed in separately.
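
For the netmask form, the "all 1 bits followed by all 0 bits" constraint can
be checked with a standard bit trick. A small C++ sketch (hypothetical helper
names; the patch itself does this parsing in Python, in convert.py):

```cpp
#include <cstdint>

// A 32-bit netmask is valid iff it is some run of 1 bits followed only by
// 0 bits. Then ~mask has the form 0...01...1, which is exactly the
// condition (~mask & (~mask + 1)) == 0.
static bool
validNetmask(uint32_t mask)
{
    uint32_t inv = ~mask;
    return (inv & (inv + 1)) == 0;
}

// Length in bits of a valid netmask (the "/n" form, 0 to 32).
static int
maskLength(uint32_t mask)
{
    int len = 0;
    while (mask & 0x80000000U) {
        ++len;
        mask <<= 1;
    }
    return len;
}
```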


   Diffs (updated)

     * src/base/inet.hh (bf5377d8f5c1)
     * src/base/inet.cc (bf5377d8f5c1)
     * src/python/m5/params.py (bf5377d8f5c1)
     * src/python/m5/util/convert.py (bf5377d8f5c1)
     * src/python/swig/inet.i (bf5377d8f5c1)

 View Diff http://reviews.m5sim.org/r/316/diff/

 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev





Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.

2010-11-23 Thread Ali Saidi



On Tue, 23 Nov 2010 11:52:22 -0800, nathan binkert n...@binkert.org 
wrote:

Looks good to me.  I don't see the problem with the print function
either.  (Ali, it's for an IP, not the netmask).  I can nitpick (and
this is bikeshedding), but I'm not sure that I agree that inheritance
(as opposed to containment) from IPAddress is the right thing to do
(on the C++ side).
Look at what the code is doing to get the values? left shifting and 
converting to a byte? It's going to print all 0s.


Ali



Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.

2010-11-23 Thread Gabe Black
Ali Saidi wrote:


 On Tue, 23 Nov 2010 11:52:22 -0800, nathan binkert n...@binkert.org
 wrote:
 Looks good to me.  I don't see the problem with the print function
 either.  (Ali, it's for an IP, not the netmask).  I can nitpick (and
 this is bikeshedding), but I'm not sure that I agree that inheritance
 (as opposed to containment) from IPAddress is the right thing to do
 (on the C++ side).
 Look at what the code is doing to get the values? left shifting and
 converting to a byte? It's going to print all 0s.

 Ali

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev

Oh yeah, you're right, those shifts look like they're going the wrong
way. I'll fix that.

Gabe
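
For reference, the fix under discussion: to split a 32-bit IP into bytes for
printing you shift right and then truncate; shifting left and casting to
uint8_t zeroes every byte but the lowest. A minimal sketch (illustrative only,
printed in decimal rather than the %x format the patch used):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Correct byte extraction: shift the wanted byte down into the low 8 bits,
// then mask. The buggy version, (uint8_t)(ip << 8) and so on, truncates to
// the low byte of the *shifted* value, which is always 0.
static std::string
ipToString(uint32_t ip)
{
    char buf[16]; // "255.255.255.255" plus NUL
    std::snprintf(buf, sizeof(buf), "%u.%u.%u.%u",
                  (ip >> 24) & 0xff, (ip >> 16) & 0xff,
                  (ip >> 8) & 0xff,  ip & 0xff);
    return buf;
}
```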



Re: [m5-dev] Review Request: Mem, X86: Make the IO bridge pass APIC messages back towards the CPU.

2010-11-23 Thread Gabe Black
A number of prefixes can be stuck into the top nibble of a physical
address to put it into a partition set aside for a certain purpose. This
is something I'm doing in M5 that isn't directly analogous to a real
system, but I suppose it would be similar to extra signals on the bus
for the same purpose. The CPU can only generate so many physical address
lines (less than 64) so there shouldn't be any collision. The partition
with prefix 0 is normal memory, devices, etc. so they don't have to be
treated specially, and one of the others is for the APICs to talk to
each other. And yes, a comment would be a good idea. I didn't want to
put on all the trimmings if this was a dead end.

Gabe
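
The top-nibble partitioning described above can be sketched like this
(hypothetical constants and helpers; M5's actual encoding may differ). Prefix
0 is the normal memory/device partition, so ordinary addresses need no special
treatment:

```cpp
#include <cstdint>

// Tag a physical address with a 4-bit partition prefix in the top nibble.
// The CPU drives fewer physical address lines than that, so tagged and
// untagged addresses cannot collide.
static const int PhysAddrBits = 64;

static uint64_t
tagAddr(uint64_t addr, uint8_t prefix)
{
    return addr | ((uint64_t)(prefix & 0xf) << (PhysAddrBits - 4));
}

static uint8_t
prefixOf(uint64_t addr)
{
    return addr >> (PhysAddrBits - 4);
}

static uint64_t
stripPrefix(uint64_t addr)
{
    return addr & (~0ULL >> 4);
}
```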

Steve Reinhardt wrote:
 My initial reaction is even if this works, this can't possibly be the
 best way to do it... where do APIC messages live in the address
 space?  How does 'Addr.max >> 4' let them through?  Did you really
 think this change didn't need a comment? ;-)

 On Tue, Nov 23, 2010 at 3:39 AM, Gabe Black gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu wrote:

 This seems to get APIC messages back to the CPU, but I really
 don't know
 if it's the right way to do this. I have the feeling there are
 forces at
 work in this code I don't fully appreciate.

 Gabe

 Gabe Black wrote:
  This is an automatically generated e-mail. To reply, visit:
  http://reviews.m5sim.org/r/323/
 
 
  Review request for Default.
  By Gabe Black.
 
 
Description
 
  Mem,X86: Make the IO bridge pass APIC messages back towards the CPU.
 
 
Diffs
 
  * configs/example/fs.py (865e37d507c7)
 
  View Diff http://reviews.m5sim.org/r/323/diff/
 
 
 
 
  ___
  m5-dev mailing list
  m5-dev@m5sim.org mailto:m5-dev@m5sim.org
  http://m5sim.org/mailman/listinfo/m5-dev
 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org mailto:m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev


 

 ___
 m5-dev mailing list
 m5-dev@m5sim.org
 http://m5sim.org/mailman/listinfo/m5-dev
   



Re: [m5-dev] X86 FS regression

2010-11-23 Thread Gabe Black
Of these, I think the walker cache sounds better for two reasons. First,
it avoids the L1 pollution Ali was talking about, and second, a new bus
would add mostly inert stuff on the way to memory and would involve
looking up which port to use even though it'd always be the same one.
I'll give that a try.

Gabe

Steve Reinhardt wrote:
 I think the two easy (python-only) solutions are sharing the existing
 L1 via a bus and tacking on a small L1 to the walker.  Which one is
 more realistic would depend on what you're trying to model.

 Steve

 On Tue, Nov 23, 2010 at 8:23 AM, Ali Saidi sa...@umich.edu
 mailto:sa...@umich.edu wrote:

 So what is the relatively good way to make this work in the short
 term? A bus? What about the slightly better version? I suppose a
 small cache might be ok and probably somewhat realistic.

  

 Thanks,

 Ali

  

  

 On Tue, 23 Nov 2010 08:15:01 -0800, Steve Reinhardt
 ste...@gmail.com mailto:ste...@gmail.com wrote:

 And even though I do think it could be made to work, I'm not sure
 it would be easy or a good idea.  There are a lot of corner cases
 to worry about, especially for writes, since you'd have to
 actually buffer the write data somewhere as opposed to just
 remembering that so-and-so has requested an exclusive copy.

 Actually as I think about it, that might be the case that's
 breaking now... if the L1 has an exclusive copy and then it
 snoops a write (and not a read-exclusive), I'm guessing it will
 just invalidate its copy, losing the modifications.  I wouldn't
 be terribly surprised if reads are working OK (the L1 should
 snoop those and respond if it's the owner), and of course it's
 all OK if the L1 doesn't have a copy of the block.

 So maybe there is a relatively easy way to make this work, but
 figuring out whether that's true and then testing it is still a
 non-trivial amount of effort.

 Steve

 On Tue, Nov 23, 2010 at 7:57 AM, Steve Reinhardt
 ste...@gmail.com mailto:ste...@gmail.com wrote:

 No, when the L2 receives a request it assumes the L1s above
 it have already been snooped, which is true since the request
 came in on the bus that the L1s snoop.  The issue is that
 caches don't necessarily behave correctly when
 non-cache-block requests come in through their mem-side
 (snoop) port and not through their cpu-side (request) port. 
 I'm guessing this could be made to work, I'd just be very
 surprised if it does right now, since the caches weren't
 designed to deal with this case and aren't tested this way.

 Steve


 On Tue, Nov 23, 2010 at 7:50 AM, Ali Saidi sa...@umich.edu
 mailto:sa...@umich.edu wrote:

 Does it? Shouldn't the l2 receive the request, ask for
 the block and end up snooping the l1s?

  

 Ali

  

  

 On Tue, 23 Nov 2010 07:30:00 -0800, Steve Reinhardt
 ste...@gmail.com mailto:ste...@gmail.com wrote:

 The point is that connecting between the L1 and L2
 induces the same problems wrt the L1 that connecting
 directly to memory induces wrt the whole cache
 hierarchy.  You're just statistically more likely to
 get away with it in the former case because the L1 is
 smaller.

 Steve

 On Tue, Nov 23, 2010 at 7:16 AM, Ali Saidi
 sa...@umich.edu mailto:sa...@umich.edu wrote:


 Where are you connecting the table walker? If
 it's between the l1 and l2 my guess is that it
 will work. if it is to the memory bus, yes,
 memory is just responding without the help of a
 cache and this could be the reason.

 Ali



 On Tue, 23 Nov 2010 06:29:20 -0500, Gabe Black
 gbl...@eecs.umich.edu
 mailto:gbl...@eecs.umich.edu wrote:

 I think I may have just now. I've fixed a few
 issues, and am now getting
 to the point where something that should be
 in the pagetables is causing
 a page fault. I found where the table walker
 is walking the tables for
 this particular access, and the last level
 entry is all 0s. There could
 be a number of reasons this is all 0s, but
 since the main difference
 other than timing between this and a working
 configuration is the
 presence of caches and we've identified a
 

[m5-dev] changeset in m5: Params: Add parameter types for IP addresses in...

2010-11-23 Thread Gabe Black
changeset 369f90d32e2e in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=369f90d32e2e
description:
Params: Add parameter types for IP addresses in various forms.

New parameter forms are:
IP address in the format a.b.c.d where a-d are from decimal 0 to 255.
IP address with netmask which is an IP followed by /n where n is a 
netmask
length in bits from decimal 0 to 32 or by /e.f.g.h where e-h are from
decimal 0 to 255 and which is all 1 bits followed by all 0 bits when
represented in binary. These can also be specified as an integral IP and
netmask passed in separately.
IP address with port which is an IP followed by :p where p is a port 
index
from decimal 0 to 65535. These can also be specified as an integral IP 
and
port value passed in separately.

diffstat:

 src/base/inet.cc              |   67 +
 src/base/inet.hh              |   59 +++
 src/python/m5/params.py       |  158 ++
 src/python/m5/util/convert.py |   49 +
 src/python/swig/inet.i        |   19 +
 5 files changed, 352 insertions(+), 0 deletions(-)

diffs (truncated from 405 to 300 lines):

diff -r 865e37d507c7 -r 369f90d32e2e src/base/inet.cc
--- a/src/base/inet.cc  Tue Nov 23 06:11:50 2010 -0500
+++ b/src/base/inet.cc  Tue Nov 23 15:54:43 2010 -0500
@@ -117,6 +117,73 @@
     return stream;
 }
 
+string
+IpAddress::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpAddress &left, const IpAddress &right)
+{
+    return left.ip() == right.ip();
+}
+
+ostream &
+operator<<(ostream &stream, const IpAddress &ia)
+{
+    uint32_t ip = ia.ip();
+    ccprintf(stream, "%x.%x.%x.%x",
+            (uint8_t)(ip << 0),  (uint8_t)(ip << 8),
+            (uint8_t)(ip << 16), (uint8_t)(ip << 24));
+    return stream;
+}
+
+string
+IpNetmask::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpNetmask &left, const IpNetmask &right)
+{
+    return (left.ip() == right.ip()) &&
+        (left.netmask() == right.netmask());
+}
+
+ostream &
+operator<<(ostream &stream, const IpNetmask &in)
+{
+    ccprintf(stream, "%s/%d", (const IpAddress &)in, in.netmask());
+    return stream;
+}
+
+string
+IpWithPort::string() const
+{
+    stringstream stream;
+    stream << *this;
+    return stream.str();
+}
+
+bool
+operator==(const IpWithPort &left, const IpWithPort &right)
+{
+    return (left.ip() == right.ip()) && (left.port() == right.port());
+}
+
+ostream &
+operator<<(ostream &stream, const IpWithPort &iwp)
+{
+    ccprintf(stream, "%s:%d", (const IpAddress &)iwp, iwp.port());
+    return stream;
+}
+
 uint16_t
 cksum(const IpPtr &ptr)
 {
diff -r 865e37d507c7 -r 369f90d32e2e src/base/inet.hh
--- a/src/base/inet.hh  Tue Nov 23 06:11:50 2010 -0500
+++ b/src/base/inet.hh  Tue Nov 23 15:54:43 2010 -0500
@@ -147,6 +147,65 @@
 /*
  * IP Stuff
  */
+struct IpAddress
+{
+  protected:
+    uint32_t _ip;
+
+  public:
+    IpAddress() : _ip(0)
+    {}
+    IpAddress(const uint32_t __ip) : _ip(__ip)
+    {}
+
+    uint32_t ip() const { return _ip; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpAddress &ia);
+bool operator==(const IpAddress &left, const IpAddress &right);
+
+struct IpNetmask : public IpAddress
+{
+  protected:
+    uint8_t _netmask;
+
+  public:
+    IpNetmask() : IpAddress(), _netmask(0)
+    {}
+    IpNetmask(const uint32_t __ip, const uint8_t __netmask) :
+        IpAddress(__ip), _netmask(__netmask)
+    {}
+
+    uint8_t netmask() const { return _netmask; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpNetmask &in);
+bool operator==(const IpNetmask &left, const IpNetmask &right);
+
+struct IpWithPort : public IpAddress
+{
+  protected:
+    uint16_t _port;
+
+  public:
+    IpWithPort() : IpAddress(), _port(0)
+    {}
+    IpWithPort(const uint32_t __ip, const uint16_t __port) :
+        IpAddress(__ip), _port(__port)
+    {}
+
+    uint8_t port() const { return _port; }
+
+    std::string string() const;
+};
+
+std::ostream &operator<<(std::ostream &stream, const IpWithPort &iwp);
+bool operator==(const IpWithPort &left, const IpWithPort &right);
+
 struct IpOpt;
 struct IpHdr : public ip_hdr
 {
diff -r 865e37d507c7 -r 369f90d32e2e src/python/m5/params.py
--- a/src/python/m5/params.py   Tue Nov 23 06:11:50 2010 -0500
+++ b/src/python/m5/params.py   Tue Nov 23 15:54:43 2010 -0500
@@ -675,6 +675,163 @@
     def ini_str(self):
         return self.value
 
+# When initializing an IpAddress, pass in an existing IpAddress, a string of
+# the form "a.b.c.d", or an integer representing an IP.
+class IpAddress(ParamValue):
+    cxx_type = 'Net::IpAddress'
+
+    @classmethod
+    def cxx_predecls(cls, code):
+        code('#include "base/inet.hh"')
+
+    @classmethod
+    def swig_predecls(cls, code):
+

Re: [m5-dev] Review Request: Params: Add parameter types for IP addresses in various forms.

2010-11-23 Thread nathan binkert
 Look at what the code is doing to get the values? left shifting and
 converting to a byte? It's going to print all 0s.

 Oh yeah, you're right, those shifts look like they're going the wrong
 way. I'll fix that.

heh, duh.


[m5-dev] changeset in m5: Copyright: Add AMD copyright to the param chang...

2010-11-23 Thread Gabe Black
changeset 6a7207241112 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=6a7207241112
description:
Copyright: Add AMD copyright to the param changes I just made.

diffstat:

 src/base/inet.cc              |  2 ++
 src/base/inet.hh              |  2 ++
 src/python/m5/params.py       |  1 +
 src/python/m5/util/convert.py |  2 ++
 src/python/swig/inet.i        |  2 ++
 5 files changed, 9 insertions(+), 0 deletions(-)

diffs (82 lines):

diff -r 369f90d32e2e -r 6a7207241112 src/base/inet.cc
--- a/src/base/inet.cc  Tue Nov 23 15:54:43 2010 -0500
+++ b/src/base/inet.cc  Tue Nov 23 17:08:41 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2002-2005 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -26,6 +27,7 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Authors: Nathan Binkert
+ *  Gabe Black
  */
 
 #include <cstdio>
diff -r 369f90d32e2e -r 6a7207241112 src/base/inet.hh
--- a/src/base/inet.hh  Tue Nov 23 15:54:43 2010 -0500
+++ b/src/base/inet.hh  Tue Nov 23 17:08:41 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2002-2005 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -27,6 +28,7 @@
  *
  * Authors: Nathan Binkert
  *  Steve Reinhardt
+ *  Gabe Black
  */
 
 #ifndef __BASE_INET_HH__
diff -r 369f90d32e2e -r 6a7207241112 src/python/m5/params.py
--- a/src/python/m5/params.py   Tue Nov 23 15:54:43 2010 -0500
+++ b/src/python/m5/params.py   Tue Nov 23 17:08:41 2010 -0500
@@ -27,6 +27,7 @@
 #
 # Authors: Steve Reinhardt
 #  Nathan Binkert
+#  Gabe Black
 
 #
 #
diff -r 369f90d32e2e -r 6a7207241112 src/python/m5/util/convert.py
--- a/src/python/m5/util/convert.py Tue Nov 23 15:54:43 2010 -0500
+++ b/src/python/m5/util/convert.py Tue Nov 23 17:08:41 2010 -0500
@@ -1,4 +1,5 @@
 # Copyright (c) 2005 The Regents of The University of Michigan
+# Copyright (c) 2010 Advanced Micro Devices, Inc.
 # All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -25,6 +26,7 @@
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #
 # Authors: Nathan Binkert
+#  Gabe Black
 
 # metric prefixes
 exa  = 1.0e18
diff -r 369f90d32e2e -r 6a7207241112 src/python/swig/inet.i
--- a/src/python/swig/inet.i  Tue Nov 23 06:11:50 2010 -0500
+++ b/src/python/swig/inet.i  Tue Nov 23 15:54:43 2010 -0500
@@ -1,5 +1,6 @@
 /*
  * Copyright (c) 2006 The Regents of The University of Michigan
+ * Copyright (c) 2010 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -26,6 +27,7 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Authors: Nathan Binkert
+ *  Gabe Black
  */
 
 %{