[m5-dev] Cron m5t...@zizzer /z/m5/regression/do-regression quick

2010-11-02 Thread Cron Daemon
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.

[m5-dev] Implementation of findTagInSet

2010-11-02 Thread Nilay Vaish
I am looking at possible performance optimizations in Ruby. As you can 
grasp from the mail excerpt below, the function findTagInSet() consumes 
a lot of time. I am thinking of making the changes suggested by Brad. I 
have questions for the m5-dev members, in particular for Derek and Steve: 
how did you arrive at the conclusion that findTagInSet() is a problem? 
What benchmarks and profiling tools did you use?


Thanks
Nilay

-- Forwarded message --
Date: Mon, 20 Sep 2010 22:57:39 -0500
From: Beckmann, Brad brad.beckm...@amd.com
To: 'Nilay Vaish' ni...@cs.wisc.edu
Cc: Daniel Gibson gib...@cs.wisc.edu
Subject: RE: Performance Optimizations in Ruby

== CacheMemory findTagInSet == Recently Steve mentioned to me that a huge 
percentage of time was being spent in CacheMemory's findTagInSet function. 
Right now that function uses a hashmap across the entire cache to map tags 
to way ids.  I think Derek recently implemented this change in hopes of 
improving performance, and it might have for small caches, but I don't think 
it helps for larger caches.  There are a couple of possible solutions: 
per-set hashmaps, or reordering the ways so that the MRU blocks are at the 
lower ids and using a loop.  I think we should investigate both solutions 
and see which is better.
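
Brad's two alternatives can be sketched as a small toy. This is purely illustrative, not Ruby's actual CacheMemory code; the class and method names below are invented for the example.

```python
# Toy sketch of the two lookup strategies discussed above: a per-set
# hashmap vs. an MRU-ordered linear scan. Not the actual Ruby code.
class CacheSet:
    """One set of a set-associative cache, ways kept in MRU order."""

    def __init__(self):
        self.tags = []          # index 0 = most recently used way
        self.tag_to_way = {}    # per-set hashmap alternative

    def insert(self, tag):
        """Install a block as MRU and rebuild the per-set map."""
        self.tags.insert(0, tag)
        self.tag_to_way = {t: i for i, t in enumerate(self.tags)}

    def find_tag_linear(self, tag):
        # MRU-ordered scan: hot blocks are found in the first few
        # probes, so the average loop trip count stays small.
        for way, t in enumerate(self.tags):
            if t == tag:
                return way
        return -1

    def find_tag_hash(self, tag):
        # Per-set hashmap: O(1) expected lookup, but it pays hashing
        # overhead on every probe and on every reorder.
        return self.tag_to_way.get(tag, -1)
```

Which variant wins presumably depends on associativity and access locality, which is why Brad suggests measuring both.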

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Implementation of findTagInSet

2010-11-02 Thread Steve Reinhardt
I just compiled m5.prof and ran it (I forget what workload I ran on it;
probably one of the PARSEC benchmarks, and it probably doesn't matter a lot).
If you've never used gprof before, this is a great time to learn!

Steve



[m5-dev] build_dir has been deprecated

2010-11-02 Thread Gabe Black
 I went to build ALPHA_FS just now, and I must have upgraded scons as
part of my most recent system update, because now I get a bunch of the
following warnings.

scons: warning: The build_dir keyword has been deprecated; use the
variant_dir keyword instead.
File "/home/gblack/m5/repos/m5/build/ALPHA_FS/SConscript", line 251, in
<module>

Things still seem to work, but do we want to change build_dir to
variant_dir to clean that up? Will that break compatibility with an old
version of scons that we still want to support?

Gabe
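
One low-risk way to handle this, sketched below on the assumption that we want a single build to work on both old and new scons, is to pick the keyword at runtime based on what the installed version understands. The helper name here is invented for illustration; it is not in the m5 tree.

```python
# Hypothetical compatibility shim: build the keyword dict for an
# SConscript() call, using variant_dir on new scons and falling back
# to the deprecated build_dir on old versions. Illustrative only.
def sconscript_kwargs(path, supports_variant_dir, **extra):
    """Choose the directory keyword the installed scons understands
    and merge in any other SConscript() keyword arguments."""
    key = 'variant_dir' if supports_variant_dir else 'build_dir'
    kwargs = {key: path}
    kwargs.update(extra)
    return kwargs
```

A simple version check (e.g. on scons's reported version) could drive the `supports_variant_dir` flag, keeping the warning away on new installs without dropping old ones.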


Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.

2010-11-02 Thread Steve Reinhardt
Do you mean (1) or (2)?  I thought that with (1) the stats would not change.

My bias would be (2), but (1) seems livable enough.  In either case it would
be nice to put in a warn_once() if we don't already have one so it's obvious
that SW prefetches are being ignored.

Steve

On Sun, Oct 31, 2010 at 9:45 AM, Ali Saidi sa...@umich.edu wrote:

 Any input? Otherwise I'm going with (1) and have new stats to go with it.

 Ali

 On Oct 27, 2010, at 12:02 AM, Ali Saidi wrote:

  Hmmm... three emails when one should have sufficed. There are three options:
  1. Make them actual no-ops (e.g. stop marking them as mem refs, data
 prefetch, etc.). The instruction count will stay the same here. The
 functionality will stay the same. The instructions will be further away from
 working -- not that I think anyone will make them work in the future.
  2. Leave them in their half-baked memop state, where they're memops that
 never call read() and don't write back anything, so the instruction count is
 different since the inst count gets incremented after the op completes. This
 is what I currently have.
  3. Make them actually work. I've tried to muck with this without success
 for a while now.
 
  Ali
 
 
 
  On Oct 26, 2010, at 11:58 PM, Ali Saidi wrote:
 
  The other portion of this is that when I try to make them act like loads,
 but not actually write a register, I break the o3 cpu in ways that 4 hours
 of effort has not been able to explain.
 
  Ali
 
  On Oct 26, 2010, at 10:42 PM, Ali Saidi wrote:
 
  The count gets smaller because they don't actually access memory:
 they never complete, and therefore they never increment the instruction
 count.
 
  Ali
 
  On Oct 26, 2010, at 9:53 PM, Steve Reinhardt wrote:
 
  I vote for updating the stats... it's really wrong that we ignored
 them previously.
 
  On Tue, Oct 26, 2010 at 5:47 PM, Ali Saidi sa...@umich.edu wrote:
  Ok. So next question. With the CPU model treating prefetches as normal
 memory instructions, the # of instructions changes for the timing simple cpu
 because the inst count stat is incremented in completeAccess(). So, one
 option is to update the stats to reflect the new count. The other option
 would be to stop marking the prefetch instructions as memory ops, in which
 case they would just execute as nops. Any thoughts?
 
  Ali
 
 
 
 
 
  On Oct 24, 2010, at 12:14 AM, Steve Reinhardt wrote:
 
  No, we've lived with Alpha prefetches the way they are for long enough
 now that I don't see where fixing them buys us that much.
 
  Steve
 
  On Sat, Oct 23, 2010 at 6:13 PM, Ali Saidi sa...@umich.edu wrote:
  Sounds good to me. I'll take a look at what I need to do to implement
 it.  Any arguments with the Alpha prefetch instructions staying nops?
 
  Ali
 
  On Oct 22, 2010, at 6:52 AM, Steve Reinhardt wrote:
 
  On Tue, Oct 19, 2010 at 11:14 PM, Ali Saidi sa...@umich.edu
 wrote:
 
  I think the prefetch should be sent to the TLB unconditionally,
 and then if the prefetch faults, the CPU should toss the instruction rather
 than the TLB returning no fault and the CPU, I guess, checking whether the
 PA is set?
 
  I agree that we should override the fault in the CPU. Are we
 violently agreeing?
 
  OK, it's becoming a little clearer to me now.  I think we're
 agreeing
  that the TLB should be oblivious to whether an access is a prefetch
 or
  not, so that's a start.
 
  The general picture I'd like to see is that once a prefetch returns
  from the TLB, the CPU does something like:
 
   if (inst->fault == NoFault) {
       access the cache
   } else if (inst->isPrefetch()) {
       maybe set a flag if necessary
       inst->fault = NoFault;
   }
 
  ...so basically everywhere else down the pipeline where we check
 for
  faults we don't have to explicitly except prefetches from normal
 fault
  handling.
 
  If there are points past this one where we really care to know if a
  prefetch accessed the cache or not, then maybe we need a flag to
  remember that (sort of a dynamic version of the NO_ACCESS static
  flag), but I don't know if that's really necessary or not.  Clearly
 if
  the cache access doesn't happen right there, then we can add the
 flag
  and use it later to decide whether to access the cache.
 
  Anyway, this is the flavor I was going for... any issues with it?
 
  Steve
 
 
 
 
 
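
Steve's pseudocode above can be fleshed out into a runnable sketch. This is a toy model of the proposed control flow, not gem5 code; the class and field names are invented stand-ins.

```python
# Toy model of the scheme Steve describes: after translation, a faulting
# software prefetch has its fault cleared and its cache access dropped,
# while every other fault propagates down the pipeline untouched.
NO_FAULT = None

class Inst:
    def __init__(self, is_prefetch=False, fault=NO_FAULT):
        self.is_prefetch = is_prefetch
        self.fault = fault
        self.accessed_cache = False

def post_translate(inst):
    """Run once the TLB returns; the TLB itself stays oblivious to
    whether the access was a prefetch."""
    if inst.fault is NO_FAULT:
        inst.accessed_cache = True   # normal path: access the cache
    elif inst.is_prefetch:
        inst.fault = NO_FAULT        # squash the fault, skip the access
    # non-prefetch faults are left set for normal fault handling
```

The point of structuring it this way is the one Steve makes: every later fault check in the pipeline sees NoFault for prefetches and needs no special case.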
 



Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.

2010-11-02 Thread Ali Saidi


Unfortunately, the stats change in all cases. For (1) the instructions
no longer have IsMemRef set, which means num_refs changes for all CPUs,
and the change causes some minor changes in the O3. With (2) they're
half-baked: the models call initiateAcc() but it doesn't actually
initiate the access, so completeAcc() is never called, and thus they
aren't counted as part of the instruction count. (2) isn't ideal, since
half-calling initiateAcc() might lead to some problems down the road.

I'll post a diff today.

Ali
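
The counting skew Ali describes can be reduced to a toy model: the instruction counter is bumped in the completion callback, so an op whose initiation never actually starts an access is never counted. The method names echo the gem5 calls under discussion, but the model below is illustrative only.

```python
# Toy model of the stats skew under option (2): a "half-baked" prefetch
# calls initiateAcc() but never starts an access, so completeAcc() --
# where the instruction count is incremented -- never runs for it.
class ToyTimingCPU:
    def __init__(self):
        self.num_insts = 0

    def initiate_acc(self, starts_access):
        # Half-baked prefetches pass starts_access=False: no access is
        # issued, so no completion event will ever fire for them.
        if starts_access:
            self.complete_acc()

    def complete_acc(self):
        self.num_insts += 1  # count is bumped only on completion
```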


Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.

2010-11-02 Thread Ali Saidi


There is a NO_ACCESS reference that is implemented in the simple CPU,
but not in the O3 CPU. Ultimately, it's a couple of lines of code
difference between (1) and (2). I've also written the ISA description
for (3), so I'm even up for revisiting it after we get prefetches
working with x86 or ARM in O3 and seeing if it's fixed.

Ali

On Tue, 2 Nov 2010 13:12:17 -0700, Steve Reinhardt wrote:

Aren't there other NO_ACCESS references (in other ISAs) that call
initiateAcc() but not completeAcc()? If so, then that by itself doesn't
seem like justification to avoid solution (2). If not, then I suppose I
agree with you.

Steve
