Re: [gem5-dev] Debug Flags

2011-06-01 Thread Gabriel Michael Black
So, I think part of the confusion is that there are two names now,  
debug flags and trace flags, but they're different views of the same  
mechanism (yes? no?) It seems like the --trace* options are like the  
--debug* options, except their intended use is a subset of --debug*,  
specifically DPRINTFs. What about returning the DPRINTF ones to  
--trace-flags, etc., and introducing a separate parallel set of  
options and namespace for the debug stuff? There's some macro or  
something to check if trace flags are turned on, and that encourages  
their use as debug flags (although I think that use is minimal in the  
current code). We could introduce a new DEBUG_ON() macro (or a better  
name) and optionally eliminate the trace oriented one or make it  
internal to DPRINTFs only. I can think of some valid uses for keeping
it, like the blocks of DPRINTFs Ali added recently, but it blurs the
line and could add to the confusion.


Having two parallel systems, even though they're a bit redundant
where they overlap, introduces a clear conceptual separation between
the two, I think. Then it's clear what trace flags are for and when
to use them, and also what debug flags are for and when to use them.


We really have two different ideas budding off from each other
(controlling tracing and debug features), and partially bundling
them together while partially distinguishing them leads to
confusion. The mental model is different from the way you have to
control things, and trying to reconcile the two views makes the system
hard to reason about.


Gabe

Quoting nathan binkert n...@binkert.org:


Ok, there has been a lot of confusion about debug flags and trace
flags.  I changed the way the flags stuff worked from a compile
perspective which required me to make changes throughout the tree, so
I took the opportunity to rename the trace flags to debug flags.  The
idea behind the change was that the flags can be used for things other
than tracing (I use them for breakpoints) and there is only one
namespace, so I just renamed it to debug (people did review that
change).

So, I renamed --trace-flags to --debug-flags and --trace-flags-help to
--debug-flags-help.  --trace-start, --trace-file, and --trace-ignore
stayed the same because those only affect the tracing portion of the
debugging stuff.  I never renamed the TraceFlags SCons option to
DebugFlags.

So, how do we clear up the confusion?  Should I just fix the SCons
thing and people will just learn?  Should I change the name back?
(There are a ton of places where this would change).

Anyone care?

  Nate
___
gem5-dev mailing list
gem5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/gem5-dev






Re: [gem5-dev] Review Request: Misc: Remove the URL from warnings, fatals, panics, etc.

2011-05-28 Thread Gabriel Michael Black

ping

Quoting Gabe Black gbl...@eecs.umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/719/
---

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

Misc: Remove the URL from warnings, fatals, panics, etc.


Diffs
-

  src/base/misc.cc dda2a88eb7c4
  src/python/m5/util/__init__.py dda2a88eb7c4

Diff: http://reviews.m5sim.org/r/719/diff


Testing
---


Thanks,

Gabe








Re: [gem5-dev] [m5-dev] src/dest detection in the ISA descriptions

2011-05-28 Thread Gabriel Michael Black

ping

Quoting Gabe Black gbl...@eecs.umich.edu:


Ping...

On 05/05/11 10:38, Steve Reinhardt wrote:

On Wed, May 4, 2011 at 2:25 PM, Gabe Black gbl...@eecs.umich.edu wrote:


Did that make sense?


I see how that could work... I think I was more puzzled by how you would
figure out that

for (int i = 0; i < 7; i++)
    Dest.bytes[i] = Source1.bytes[i] + Source2.bytes[i];

overwrote all of Dest, but

for (int i = 0; i < 4; i++)
    Dest.bytes[i] = Source1.bytes[i] + Source2.bytes[i];

wouldn't... but looking back I see now that you'd expect to need manual
annotations in at least one of those cases.



Do you think you'll be able to review those patches
soonish?


I'll try... thanks for the reminder, that definitely increases the
probability :-).

Steve
___
m5-dev mailing list
m5-...@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev








Re: [gem5-dev] Auto-generated error/warning URLs

2011-05-23 Thread Gabriel Michael Black

According to this:

http://m5sim.org/Special:AllPages

only one exists, created by Ali last August. I'm all for getting rid of
those links and I think I've even suggested it a time or two in the  
past.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


You know the "For more information see:
http://www.m5sim.org/warn/3a2134f6" URLs?  How many of these are
actually used?  They seem mostly like noise to me.  If we're not using
them, can we get rid of them?  Seems like it's just as intuitive to
search for the warning/error string on the wiki (or in the mailing
list).

Thoughts?

Steve







Re: [m5-dev] Review Request: Trace: Allow printing ASIDs and selectively tracing based on user/kernel code.

2011-05-04 Thread Gabriel Michael Black
X86 doesn't have an ASID hardware feature to the best of my knowledge,  
except related to the virtualization extensions when working with  
guest memory spaces. Would it make sense to use root page table  
pointers for this? I don't know specifically how page tables are  
managed in Linux, so I don't know if that would actually work. The  
root page table pointer is in the CR3 register.


Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/678/#review1207
---



src/cpu/exetrace.cc
http://reviews.m5sim.org/r/678/#comment1657

Maybe put these ifs together with an or? It's not hugely better  
since the body is so short, but it's something to consider.



- Gabe


On 2011-05-04 18:42:30, Ali Saidi wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/678/
---

(Updated 2011-05-04 18:42:30)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

Trace: Allow printing ASIDs and selectively tracing based on  
user/kernel code.


Debug flags are ExecUser, ExecKernel, and ExecAsid. ExecUser and
ExecKernel are set by default when Exec is specified.  Use minus
sign with ExecUser or ExecKernel to remove user or kernel tracing
respectively.
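
As a rough illustration of the flag semantics described in this summary (not the option-parsing code in the tree), the following Python sketch resolves a flag spec into a set of enabled flags. The comma-separated spec string, the `COMPOUND_FLAGS` table, and the `resolve_debug_flags` helper are assumptions made for the example; only the names Exec, ExecUser, ExecKernel, and ExecAsid come from the summary.

```python
# Hypothetical compound-flag table. Only the Exec* names come from the review
# summary; the real Exec flag may expand to more members than shown here.
COMPOUND_FLAGS = {
    "Exec": {"ExecUser", "ExecKernel"},
}


def resolve_debug_flags(spec):
    """Turn a spec such as "Exec,-ExecUser" into the set of enabled flags."""
    enabled = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        remove = token.startswith("-")
        name = token[1:] if remove else token
        # A compound flag expands to its members; a plain flag is just itself.
        members = COMPOUND_FLAGS.get(name, {name})
        if remove:
            enabled -= members
        else:
            enabled |= members
    return enabled


# Kernel-only tracing: turn on Exec, then subtract the user half.
print(sorted(resolve_debug_flags("Exec,-ExecUser")))
```

Tokens are applied left to right, so a minus-prefixed flag only removes what an earlier token enabled.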


Diffs
-

  src/arch/alpha/utility.hh 5a9a639ce16f
  src/arch/alpha/utility.cc 5a9a639ce16f
  src/arch/arm/utility.hh 5a9a639ce16f
  src/arch/mips/utility.hh 5a9a639ce16f
  src/arch/power/utility.hh 5a9a639ce16f
  src/arch/sparc/utility.hh 5a9a639ce16f
  src/arch/x86/utility.hh 5a9a639ce16f
  src/cpu/SConscript 5a9a639ce16f
  src/cpu/exetrace.cc 5a9a639ce16f

Diff: http://reviews.m5sim.org/r/678/diff


Testing
---


Thanks,

Ali










Re: [m5-dev] Review Request: ARM: Add vfpv3 support to native trace.

2011-05-02 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:





On 2011-05-02 16:42:25, Gabe Black wrote:
 util/statetrace/arch/arm/tracechild.cc, line 79
 http://reviews.m5sim.org/r/669/diff/1/?file=12215#file12215line79

 I don't know how easy this would be to accommodate, but  
you're going to be sending a bunch of extra zeros for int regs that  
aren't 64 bits wide. Can you make it so you send full 64 bit values  
only when the source is actually 64 bits wide?


Ali Saidi wrote:
There is no reason to worry about sending 4 bytes down the  
wire. The speed issue isn't about sending 4 bytes, it's all about  
having to put a breakpoint after every instruction.


Gabe Black wrote:
The reason I implemented sending diffs of the state instead of  
sending the whole state is that what you send over the wire -does-  
matter, significantly. Breakpoints are slow too, but that doesn't  
mean everything else is irrelevant.


I fully appreciate that it wouldn't be a good idea to send 1KB of  
data over the wire, but we're well past bike shedding arguing about  
4 bytes. There is no reason to add complexity and control flow to  
try and avoid it.


I don't think it's necessary, although I also don't think it would be  
that hard.


Also, I don't like our tendency to characterize disagreement as bike  
shedding. If I don't agree, I don't agree. I'm not obligated to change  
my mind or just stand by while you do whatever you want. This is my  
code and by the convention we've stated on several occasions that  
means it's my decision. You're bike shedding by continuing to argue  
after I've made my decision.






On 2011-05-02 16:42:25, Gabe Black wrote:
 util/statetrace/arch/arm/tracechild.cc, line 104
 http://reviews.m5sim.org/r/669/diff/1/?file=12215#file12215line104

 The idea is to verify that you're not falling off of uregs.  
Maybe you could do something more flexible like  
sizeof(myregs.uregs) / sizeof (myregs.uregs[0]).


Ali Saidi wrote:
uregs is never going to get smaller than it is now and I don't  
see a reason to come up with a crazy assert to try and prove that  
it isn't.


Gabe Black wrote:
uregs changing size is irrelevant. That formula will be exactly  
right all the time and doesn't depend on the coincidence that the  
CPSR is last.


Only if you assume that the integer registers are sent first. Since
it seems like you're unwilling to make any assumptions about the
code, perhaps we should somehow verify that?





Depending on the constants in an enum or the members of a data  
structure being in a particular order for all of time is dangerous and  
in this case pointless.


I don't appreciate your sarcasm or you treating my attempts to protect  
the quality of my code like some sort of unreasonable hassle. Do you  
think I like taking my time to reverse engineer and review all this  
code somebody else wrote, and then having to fight over why it's  
wrong? It's an enormous waste of energy, and there are so many better  
uses for my time. And yet here we are. Again.



On 2011-05-02 16:42:25, Gabe Black wrote:
 util/statetrace/arch/arm/tracechild.cc, line 111
 http://reviews.m5sim.org/r/669/diff/1/?file=12215#file12215line111

 The same comment applies as in getRegs, except that you have  
to deal with an offset. It would probably be a good idea to define  
something in the enum to mark the start of the FP regs. You can  
move the assert to after the if and subtract out the offset right  
before indexing fpregs.


Ali Saidi wrote:
There are clearly 32 float registers defined in the enum and in  
the struct. The assert just verifies that we're actually accessing  
a floating point register when we should be. We don't need to
verify the structure size; it's correct by construction.


Gabe Black wrote:
I wrote the original assert and know what it's for, verifying  
the index and not the structure. Again, it assumes F0 is the first  
FP reg which is arbitrary.


No it's not! F0 is the first floating point register. 0 is the first
whole number so it's first. Would you rather some ugliness of
START_FP, F0 = START_FP? Why are we arguing about code that is
correct by inspection, has been extensively tested, and works?!





Yes, put that in my code. Why are we arguing about what you want to do  
to my code? How could anything be wrong with code that works right  
this moment? Why shouldn't I just let you do whatever you want because  
it's expedient?



On 2011-05-02 16:42:25, Gabe Black wrote:
 util/statetrace/arch/arm/tracechild.cc, line 129
 http://reviews.m5sim.org/r/669/diff/1/?file=12215#file12215line129

 Just because libc would use a macro doesn't mean we have to.  
You should replace this with a constant of the appropriate type.


Ali Saidi wrote:
I disagree... This will transition nicely as soon as libc gets
its act together.


Gabe Black wrote:
The fact that gcc uses macros is an unfortunate historical  
artifact, not a valid design decision. The 

Re: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2011-04-29 Thread Gabriel Michael Black
I deleted the build directory but I'll let it rerun naturally tonight.  
If somebody wants to rerun them manually go ahead.


Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:

I can't reproduce these scons errors and they don't seem to happen  
from a clean build.  Can we blow away the current build directory on  
zizzer and re-run the regression tester?  I would do it myself, but  
I don't have access to zizzer.


Thanks,

Brad



-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org]
On Behalf Of Cron Daemon
Sent: Friday, April 29, 2011 12:17 AM
To: m5-dev@m5sim.org
Subject: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression
quick

scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_RubyNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_BaseGarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_Topology_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_RubySystem_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_GarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_SimpleNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE/python/m5/internal/param_GarnetNetwork_d_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_RubyNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_BaseGarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_Topology_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_RubySystem_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_GarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_SimpleNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_hammer/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_hammer/python/m5/internal/param_GarnetNetwork_d_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_RubyNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_BaseGarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_Topology_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_RubySystem_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_GarnetNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_SimpleNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MESI_CMP_directory/python/m5/internal/param_GarnetNetwork_d_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_CMP_directory/python/m5/internal/param_RubyNetwork_wrap.cc'.
scons: *** Implicit dependency `build/ALPHA_SE_MOESI_CMP_directory/params/ExtLink.hh' not found,
needed by target `build/ALPHA_SE_MOESI_CMP_directory/python/m5/internal/param_BaseGarnetNetwork_wrap.cc'.
scons: *** Implicit dependency
`build/ALPHA_SE_MOESI_CMP_directory/params/ExtLink.hh' not found,
needed by target

Re: [m5-dev] Code Reviewing

2011-04-27 Thread Gabriel Michael Black
That sounds reasonable. With too many reviews it gets harder to get to  
all of them, and some obscure things may languish with no reviews  
because only one person is comfortable with that code. Reviews are  
generally a really good thing but they have some overhead. If the
benefit doesn't exceed that overhead, they aren't worth it in that case.


Gabe

Quoting nathan binkert n...@binkert.org:


Hi Everyone,

We don't have an official policy on code reviews, but I think we're
being a bit pedantic with them.  While I definitely want us to err on
the side of having code review if the author has any doubt, I think it
is completely unnecessary to have reviews on things like changing
comments and text in strings.  Similarly, obvious bug fixes (though
this is one of those subjective things that the author has to
consider) need not be reviewed.

What do you all think?  What is our policy?  Am I crazy? Should we
review everything?

   Nate






Re: [m5-dev] what scons can do

2011-04-21 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:


Anyway, it seems like it would be useful to be able to have multiple
binaries that can be built by scons, specifically the utility stuff and
unit tests. That way we could avoid having a hodge podge of small build
systems which are either isolated or not in not quite the right ways. I
know some of Nate's recent changes suggested this was going to get
easier. Could you quickly summarize what that's all about, Nate?

What changes are you thinking of?  I'm not sure that anything I've
done makes this easier or harder.  I think it's just a matter of
creating SConscript files that the SConstruct file sources.


A mechanism where I can say binary foo needs file bar, and then when I
tell it to build foo it builds it with bar, and bar doesn't get mixed
in to other things. I think your new widget that was like mercurial
patch guards would help with that, right? You could have a guard for
each extra target.





Also, I was thinking about how to handle the dependencies/generated
files/custom language issue a little while back, and what I kept coming
back to were schemes where scons would use a cache of dependency
information which it would regenerate if any of the input files which
determined outputs and/or dependencies changed. The problem is that
scons would need to run once and possibly regenerate its cache, and then
run again to actually run. Is this sort of multi-pass setup possible
somehow without major hacks?

I've looked into this in depth and I think that SCons just sucks for
this sort of thing.  I've seen a few different proposals for dealing
with this, but they all seem to suck.  Basically SCons builds the
dependence graph up front and then walks it.  It doesn't seem to be
able to build it on the fly.  (WAF is different in this regard.)  That
said, there are hacks out there to get around this, but I haven't
managed to get them to work and I'm not sure if they're fragile or
not.


That's unfortunate.




When you run for the first time, scons would see that foo.isa.dep
doesn't exist. During its build phase, it would run foo.isa through the
system and see that it generated foo_exec.cc and bar_exec.cc and put
that into foo.isa.dep (as actual SConscript type code, or flat data,
or...). When scons ran the second time, it would read in foo.isa.dep and
extract the dependencies from it and build that into the graph. It
wouldn't construct foo.isa.dep again since all its inputs were the same,
and it would still capture all those dependencies. This time around, the
larger binary would see that it depended on foo_exec.cc and bar_exec.cc
and that those depend on foo.isa.dep (as a convenient aggregation point
of all *.isa files involved). If foo.isa changed later, foo.isa.dep
would be out of date and have to be regenerated, and then foo_exec.cc
and bar_exec.cc, and then the main binary.
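
The caching idea in the paragraph above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not SCons code: `fingerprint`, `refresh_dep_cache`, and the `discover_outputs` callable are hypothetical names, and the cache is kept as an in-memory dict rather than a real foo.isa.dep file.

```python
import hashlib


def fingerprint(text):
    """Content hash used to decide whether the cached dependency data is stale."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh_dep_cache(isa_text, cache, discover_outputs):
    """Return (cache, rebuilt).

    cache is None on the first run, otherwise the dict returned by an earlier
    call. discover_outputs is a hypothetical callable that runs the
    description and reports which files it generates.
    """
    key = fingerprint(isa_text)
    if cache is not None and cache["input"] == key:
        return cache, False  # input unchanged: reuse the recorded outputs
    outputs = sorted(discover_outputs(isa_text))
    return {"input": key, "outputs": outputs}, True


def fake_discover(text):
    # Stand-in for running foo.isa through the parser; names echo the example.
    return ["foo_exec.cc", "bar_exec.cc"]


cache, rebuilt = refresh_dep_cache("contents of foo.isa", None, fake_discover)
print(rebuilt, cache["outputs"])
```

The sketch only covers the staleness check. It does not address the harder part raised in the thread, which is getting the build tool to read the refreshed cache in the same invocation.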

Running SCons twice in the way you suggest would be FAR slower than
just running SLICC and the ISA parser twice the way we do.  Also,
notice that when SLICC is run twice, the modes are very different.
The first time it is run, it does parse the files, but only to figure
out what the dependencies are.  The second time it runs, it is run in
the mode to actually generate the files.  We parse all files twice
anyway to get the dependency information (scanners parse things like
.cc and .hh files to figure out dependencies with #includes, though
they basically just use regexes to do it).  I think you should do this
with the ISA parser.



I'm not sure how SLICC is structured, but it might be hard to get out  
file information without running the whole description, depending on  
how the multi-file output thing is implemented. I wrote an email about  
this a while ago, but basically it depends on if the output files are  
more like a list or a program. #include files are like a list where  
you can scan for them and know what they are without caring about any  
of the other text (or at least are sort of like that). The multi-file
output may be a lot more programmable and would be like determining  
what files a program is going to generate at run time. You could build  
in a dry run sort of mode that would just figure that out, but ISA  
descriptions tend to be pretty complicated and that would be a  
maintenance headache. It might still work out, though, since the  
descriptions tend to be relatively quick. That's compared to building  
the ginormous files they produce which can be S.L.O.W., especially if  
you start swapping.


Gabe


Re: [m5-dev] what scons can do

2011-04-21 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:


A mechanism where I can say binary foo needs file bar, and then when I tell
it to build foo it builds it with bar, and bar doesn't get mixed in to
other things. I think your new widget that was like mercurial patch guards
would help with that, right? You could have a guard for each extra target.

Ah, I see.  I personally wouldn't use any of that.  Don't use Source
or PySource or that sort of thing.  Just set up normal looking
SConscript files with their own targets and such.  The
Source/PySource/etc. stuff is really about building the m5 binary and
dealing with all of the variations that we have.


Ok.




I'm not sure how SLICC is structured, but it might be hard to get out file
information without running the whole description, depending on how the
multi-file output thing is implemented.

In SLICC, we parse the entire file using the ply grammar and generate
the AST in both phases.  In one phase, you walk the AST simply to
figure out which files will be generated.  In the other, you actually
do the generation.  I'm suggesting that you do that as well, the files
aren't that long and I bet that parsing the ISA desc files and
generating the AST takes a second or so.
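
A toy version of the two-phase approach described above might look like the Python sketch below. The node classes and function names are invented for the illustration and are not SLICC's; the point is only that both phases share one AST and one walk, and differ in what they do at each output node.

```python
class OutputNode:
    """Toy AST node standing in for 'emit this text into that file'."""

    def __init__(self, filename, text):
        self.filename = filename
        self.text = text


class BlockNode:
    """Toy AST node that just groups children."""

    def __init__(self, children):
        self.children = children


def walk(node):
    """Yield every OutputNode at or below node, in source order."""
    if isinstance(node, OutputNode):
        yield node
    else:
        for child in node.children:
            yield from walk(child)


def list_outputs(ast):
    """Phase 1: cheap pass that only reports which files would be written."""
    return sorted({n.filename for n in walk(ast)})


def generate(ast):
    """Phase 2: build the contents of each file (kept in memory here)."""
    files = {}
    for n in walk(ast):
        files.setdefault(n.filename, []).append(n.text)
    return {name: "".join(parts) for name, parts in files.items()}


ast = BlockNode([
    OutputNode("decoder.hh", "// decls\n"),
    BlockNode([
        OutputNode("decoder.cc", "// defs\n"),
        OutputNode("decoder.hh", "// more decls\n"),
    ]),
])
print(list_outputs(ast))
```

This only works when the set of output files can be read off the tree without executing anything, which is the assumption the reply below questions for the ISA descriptions.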



That doesn't really fit with how the ISA files work. They get broken  
into an AST, but that gets consumed as it goes, and it has a lot of  
anonymous python in it that just gets executed somehow. I want to move  
more into the python, so the AST will be less and less useful. I'm not  
sure whether the new support will be with explicit language support or  
part of the embedded python. I would prefer the latter, but if it's
harder to use that way language support may be best. Ultimately I want  
to make the ISA stuff a part of regular python scripts instead of the  
python being embedded in the ISA stuff, so in the longer term it will  
probably be best to have that mechanism in regular python.


Gabe


Re: [m5-dev] what scons can do

2011-04-21 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:


That doesn't really fit with how the ISA files work. They get broken into an
AST, but that gets consumed as it goes,

Does it have to be?


Making it not work that way would likely be very painful. The parser  
part is finicky (like they all are in any language) and we have lots  
and lots of very intricate code built on top of it in the form of the  
descriptions themselves.





and it has a lot of anonymous python
in it that just gets executed somehow. I want to move more into the python,
so the AST will be less and less useful.

Does the AST not contain enough information to know what files are
being generated?  The anonymous python itself creates files?  That
sounds crazy.


This may not be quite right, but off hand here is a summary of the isa  
parser's inputs and outputs. Going in, the parser starts with  
main.isa. There are ##includes (there are two #s on purpose) which  
bring in other .isa files. The parser reads in all those files by  
following the ##includes, stitches them together into one huge string,  
and then crunches through it all. As that's being processed, the  
description can read in other files that have, for instance, microcode  
in them. This already hits basically the problem we're talking about
since the files involved are determined by execution, not static landmarks
like ##include. I solve this problem by manually listing all  
microcode files in the SConscript. It's a nasty hack, but it avoids  
not rebuilding when microcode changes which is even more annoying.
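
For the statically scannable part of that input side, following the ##include directives, a minimal Python sketch could look like this. It is not the ISA parser's code: the `stitch` function, its `read` callback, and the quoted-filename form of the directive are assumptions for the example.

```python
import re

# Assumed directive form: ##include "name.isa" on a line of its own.
INCLUDE_RE = re.compile(r'^[ \t]*##include[ \t]+"([^"]+)"[ \t]*$', re.MULTILINE)


def stitch(name, read, seen=None, active=()):
    """Return (text, files) for the description rooted at name.

    text has every ##include replaced by the included file's expanded text;
    files lists each file visited, in the order it was first read. read is a
    callable mapping a file name to its contents.
    """
    if name in active:
        raise ValueError("circular ##include of %s" % name)
    if seen is None:
        seen = []
    if name not in seen:
        seen.append(name)

    def expand(match):
        text, _ = stitch(match.group(1), read, seen, active + (name,))
        return text

    return INCLUDE_RE.sub(expand, read(name)), seen


sources = {
    "main.isa": 'def top\n##include "ints.isa"\n##include "fp.isa"\n',
    "ints.isa": "def ints\n",
    "fp.isa": '##include "util.isa"\ndef fp\n',
    "util.isa": "def util\n",
}
text, files = stitch("main.isa", sources.__getitem__)
print(files)
```

The returned file list is the kind of thing a build-system scanner could use as a dependency list. It cannot see files that the description opens while it executes, such as the microcode files mentioned above, which is why those end up listed by hand.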


On the output side, the parser generates two files which implement the  
decoder, decoder.hh and decoder.cc. It also outputs one file for each  
CPU model involved that implements the exec (and related) functions.  
These are called something_something_exec.cc I think.


The problem is that for x86 for sure, but also now for ARM and likely  
for any other ISA with a lot of complexity and/or fidelity, those  
output files get to be very, very large. It's easy to run out of RAM,
especially if scons tries to build more than one at a time or if
you're on a smaller machine. Then the build grinds to a halt, as does
everything else. Often the only solutions are to wait until it
finishes or you die (whichever comes first), or to reboot the machine
and try again with more conservative settings.


What this mechanism would do would be to allow you to put different  
portions of the output into different files which would be compiled  
independently. Then scons compiling three things at once is equivalent  
to three normal files at once, not a million lines of code all at once.


To do that, you have to decide how to split things up so they still  
build. You could try hacking things up in an automated way, but that  
would likely either be overly restrictive, ineffective, incorrect, or  
all three. My plan is to expose the idea of different files to the ISA  
description author so that they can choose to put all the, say,  
floating point loads and stores together along with their utility  
functions. These may not all be defined in the same place or even in  
the same directory since there are ordering constraints in python as  
well as in the resulting C++. It might be that you define output files  
in a fixed place (def output floatMem, for instance) and then refer to  
them later when it comes time to put C++ someplace. That might make  
the most sense. It could also be that you have batches of similarly  
named output clusters (these will likely involve more than one file at  
a time, like a .cc and a .hh) and you'd want to generate them all  
programmatically. I'm not sure exactly what it would look like to
select an output file either. You might want to just put down markers
that say, essentially, "henceforth output goes in floatMem". Or pass
an output cluster name into the outputting function (whatever that  
looks like).
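
To make the options in the paragraph above concrete, here is one possible shape in Python, covering up-front declaration, the marker style, and passing the cluster name explicitly. Everything in it is hypothetical: the `OutputClusters` class, its method names, and the `intAlu` cluster are invented for the sketch, and only `floatMem` comes from the text.

```python
class OutputClusters:
    """Hypothetical registry of named output clusters (a .hh/.cc pair each)."""

    def __init__(self):
        self._clusters = {}
        self._current = None

    def declare(self, name):
        """Declare a cluster up front so the set of files is known early."""
        if name in self._clusters:
            raise ValueError("cluster %r already declared" % name)
        self._clusters[name] = {"hh": [], "cc": []}

    def select(self, name):
        """Marker style: henceforth output goes in the named cluster."""
        if name not in self._clusters:
            raise KeyError("undeclared cluster %r" % name)
        self._current = name

    def emit(self, kind, code, cluster=None):
        """Append code to a cluster; an explicit name overrides the marker."""
        name = cluster if cluster is not None else self._current
        if name is None:
            raise RuntimeError("no output cluster selected")
        self._clusters[name][kind].append(code)

    def filenames(self):
        """Files the build has to know about, without generating anything."""
        return sorted("%s.%s" % (name, kind)
                      for name in self._clusters for kind in ("hh", "cc"))

    def contents(self, name, kind):
        return "".join(self._clusters[name][kind])


out = OutputClusters()
out.declare("floatMem")
out.declare("intAlu")                          # hypothetical second cluster
out.select("floatMem")
out.emit("cc", "// fp load\n")                 # goes to floatMem via the marker
out.emit("cc", "// add\n", cluster="intAlu")   # explicit cluster name
print(out.filenames())
```

Requiring `declare` before use is a design choice made for the sketch. It keeps the list of output files available before any code is emitted, which is what a build-system scan would need.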


It might work best in the near to mid term to put in static, ISA  
language defined declarations of output files which would be feasible  
to scan. In the mid to long term, though, I want to move away from  
having a custom language and move towards having the same machinery  
(extended and parameterized more) exposed as a module or something  
inside regular python scripts. Maybe something ala scons's SConscripts  
which are regular python that run in an armature, sort of.


I would be hesitant to make the ISA descriptions open and write to  
files themselves directly, but primarily because that would be  
cumbersome and error prone. I don't think we should design it out,  
though, unless it's just too evil to support.


Gabe


Re: [m5-dev] another compilation error in the tree (!!)

2011-04-20 Thread Gabriel Michael Black
Combining FS and SE is a lot like the heterogeneous ISA stuff in that  
it's something I'd really like to see happen but also where there are  
definitely some steps in the middle. The detail work of combining FS  
and SE will be easier, I think, but it will take more conceptual  
level/architectural changes.


Gabe

Quoting nathan binkert n...@binkert.org:


I'm certain that I compiled the whole tree with debug, opt, and fast,
in addition to running regressions.  I pretty much do that every time.
 What I must not have done was compile all of the different coherence
protocols though (I didn't realize that more were run in regressions
since I last updated my script).  Why don't we make opt the default
for util/regress if that's the right thing to do?  Of course, then I
may forget to compile fast and have bugs due to that.  Running both
opt and fast on regress every time would fix it, but the number of
variants that we need to compile is getting ridiculous.  We need to
figure out a way to compile multiple coherence protocols into a single
binary and probably compile SE and FS into the same binary.

  Nate


This has to do with DPRINTFs in Ruby code, so I think it's related to
Nate's recent changes.  Note that the regressions pass because they
run with m5.fast which compiles out DPRINTFs.  I'm pretty sure I've
said this before, but I think we should run regressions with opt and
not fast (at least some of the time).  Certainly people should run
regressions with opt and not just fast before they commit, since this
also gives the opportunity to catch run-time assertions.

Steve


% util/regress -k --variant=debug
[...]
 [     CXX] ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_L1Cache_request_type_to_event.cc -> .do
build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_L1Cache_request_type_to_event.cc: In member function 'L2Cache_Event L2Cache_Controller::L2Cache_L1Cache_request_type_to_event(CoherenceRequestType, Address, MachineID, L2Cache_Entry*)':
build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_L1Cache_request_type_to_event.cc:32: error: 'RubySlicc' is not a member of 'Debug'
scons: *** [build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_L1Cache_request_type_to_event.do] Error 1
 [     CXX] ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_addSharer.cc -> .do
build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_addSharer.cc: In member function 'void L2Cache_Controller::L2Cache_addSharer(Address, MachineID, L2Cache_Entry*)':
build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_addSharer.cc:23: error: 'RubySlicc' is not a member of 'Debug'
scons: *** [build/ALPHA_SE_MESI_CMP_directory/mem/protocol/L2Cache_addSharer.do] Error 1
scons: `build/ALPHA_SE_MOESI_CMP_directory/m5.debug' is up to date.
 [     CXX] ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_averageLatencyEstimate.cc -> .do
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_averageLatencyEstimate.cc: In member function 'int L1Cache_Controller::L1Cache_averageLatencyEstimate()':
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_averageLatencyEstimate.cc:9: error: 'RubySlicc' is not a member of 'Debug'
scons: *** [build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_averageLatencyEstimate.do] Error 1
 [     CXX] ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_updateAverageLatencyEstimate.cc -> .do
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_updateAverageLatencyEstimate.cc: In member function 'void L1Cache_Controller::L1Cache_updateAverageLatencyEstimate(int)':
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_updateAverageLatencyEstimate.cc:9: error: 'RubySlicc' is not a member of 'Debug'
scons: *** [build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L1Cache_updateAverageLatencyEstimate.do] Error 1
 [     CXX] ALPHA_SE_MOESI_CMP_token/mem/protocol/L2Cache_convertToGenericType.cc -> .do
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L2Cache_convertToGenericType.cc: In member function 'GenericRequestType L2Cache_Controller::L2Cache_convertToGenericType(CoherenceRequestType)':
build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L2Cache_convertToGenericType.cc:15: error: 'RubySlicc' is not a member of 'Debug'
scons: *** [build/ALPHA_SE_MOESI_CMP_token/mem/protocol/L2Cache_convertToGenericType.do] Error 1
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] what scons can do

2011-04-19 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Tue, Apr 19, 2011 at 3:13 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:


Caching the list of generated SLICC files sounds like a good idea to
me.  I'm not sure this would require recursive scons invocations,
since we manage to build the list dynamically already without that.  I
wouldn't call it a cache of dependency information though, since
scons already has one of those; this is really just a cache of
generated filenames, right?


How would you be able to do it all in one shot? The tricky part is that you
actually have to build the .dep in the build phase, but you need it in the
dependency tree generating phase. Since you can't go backwards like that in
one invocation (as far as I know or can imagine) then you'd need two rounds.
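
To make the two-round idea a bit more concrete, here is a minimal sketch in plain Python. It is not SCons code and not what the tree does today: the cache file name, list_generated_files() and the stand-in generator fake_slicc_outputs() are all made up for illustration. The only point is that the first round has to run the generator to learn the file names, and later rounds can read the list back before building anything.

```python
import json
import os
import tempfile


def fake_slicc_outputs(protocol):
    # Hypothetical stand-in for running SLICC: pretend every protocol
    # emits one controller source per cache level.
    return sorted(f"{protocol}/{name}.cc" for name in ("L1Cache", "L2Cache"))


def list_generated_files(protocol, cache_path, generator=fake_slicc_outputs):
    """Return (files, came_from_cache) for one protocol."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
        if protocol in cache:
            # Later rounds: the names are known before anything is built.
            return cache[protocol], True
    # First round: run the generator and remember what it produced.
    files = generator(protocol)
    cache[protocol] = files
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return files, False


cache_path = os.path.join(tempfile.mkdtemp(), "generated_files.json")
first, first_cached = list_generated_files("MESI_CMP_directory", cache_path)
second, second_cached = list_generated_files("MESI_CMP_directory", cache_path)
```

Note that this sketch never invalidates the cache; a real version would presumably have to notice when the protocol's input files change.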


OK, I see, maybe it is inherently another level beyond what we
currently do.  I'm not the scons expert...


Me neither... Anybody else?




As far as what it would be caching I think it's largely a semantic
difference. You could consider it a cache of generated files which are used
to set up the dependencies.


It's just terminology... not that it's wildly inaccurate to call it
dependency info, since it is info that is eventually used to
determine dependencies, just that there's already a dependency
caching feature in scons that's totally different
(http://www.scons.org/doc/2.0.1/HTML/scons-user.html#AEN1148), and in
general when the scons docs talk about dependencies they mean "what are
the files that this file depends on", not "what are the files that depend
on this file".  Thus it would be less confusing if you avoided
referring to the info that you're discussing here as "dependency info"
and used a different term like "generated file info".  That's all.



Yes, I see why that could be confusing. I won't call it that any more  
(unless I forget).


Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Stats Bug

2011-04-18 Thread Gabriel Michael Black
My first reaction is let's fix it, but I don't really understand the  
problem or the impact of changing things. Anything serious?


Gabe

Quoting nathan binkert n...@binkert.org:


I'm trying to get my python stats stuff committed and I found a bug in
the classic cache stats.  Look in src/mem/cache/base.cc.  The
VectorStats have several different lengths: _numCpus + 1, _numCpus,
or maxThreadsPerCPU.

The fact that this works in the current stats package is lucky.  I can
be bug compatible, but I think we should fix this instead.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: includes: fix up code after sorting

2011-04-15 Thread Gabriel Michael Black

I don't see how it avoids depending on transitive includes...

Quoting nathan binkert n...@binkert.org:


I noticed a clever way to fix this in the google style guide that we
could adopt if we choose.
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Names_and_Order_of_Includes


Cute... seems like a nice little tweak.  I assume your script can do
this automatically?  :-)


Honestly, I think it could with only a little effort, assuming that we
like the idea.  We're generally pretty good about having .cc files
with a matching .hh, though _impl.hh might take some extra tweaking.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] changeset in m5: includes: fix up code after sorting

2011-04-15 Thread Gabriel Michael Black
Ah, ok, that makes more sense. I didn't expand the little blurb so it  
looked like foo.cc's foo.hh would be last and not first. But in the  
example code it's first which would make that work.


Gabe

Quoting nathan binkert n...@binkert.org:


I don't see how it avoids depending on transitive includes...


It's not perfect, but with it, every (or almost every) include file is
included first in at least one file.  Since the file is first there
will have been no other includes from other files, and it will only
compile correctly if its own includes are correct.
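
As a rough illustration of the reordering step, here is a small Python sketch. It is not the actual include-sorting script; the function name and the regular expression are assumptions made up for this example, and it only handles the simple foo.cc / foo.hh pairing (an _impl.hh file would need the extra tweaking mentioned above).

```python
import os
import re

# Matches both #include "path/foo.hh" and #include <foo>.
INCLUDE_RE = re.compile(r'#include\s+[<"]([^>"]+)[>"]')


def own_header_first(cc_path, include_lines):
    """Return include_lines with the header matching cc_path moved to the front."""
    stem = os.path.splitext(os.path.basename(cc_path))[0]
    own, rest = [], []
    for line in include_lines:
        match = INCLUDE_RE.match(line.strip())
        header = match.group(1) if match else ""
        header_stem, ext = os.path.splitext(os.path.basename(header))
        if header_stem == stem and ext in (".hh", ".h"):
            own.append(line)
        else:
            rest.append(line)
    # The file's own header goes first, so it must compile with nothing
    # included ahead of it; the remaining includes keep their order.
    return own + rest


reordered = own_header_first(
    "src/sim/eventq.cc",
    ['#include <cassert>', '#include "base/trace.hh"', '#include "sim/eventq.hh"'],
)
```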

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-07 Thread Gabriel Michael Black

Quoting Nilay Vaish ni...@cs.wisc.edu:


On Thu, 7 Apr 2011, Gabriel Michael Black wrote:

When you say this is portable, what do you mean? Portable between  
compilers? We usually use gcc, but we have at least partial support  
for other compilers. I think this is necessary on some platforms.


Gabe



I would still root for using popcount() builtin available with GCC.


--
Nilay


Between different versions of gcc. Do we actually test whether the  
code compiles using other compilers?


--
Nilay

I don't know if we actively test it, but it worked at one time. Ali  
did some work on that, I think to get it to build with sun's compiler  
back when he was doing the SPARC full system support. It would be a  
good idea not to bake in any dependence on gcc.
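
For reference, a compiler-independent bit count is only a few lines. The sketch below is written in Python purely to keep it short; it is not code from the tree, and the same loop carries over to C++ more or less verbatim as a fallback where the gcc builtin is unavailable.

```python
def popcount(x):
    """Count the set bits in a non-negative integer."""
    if x < 0:
        raise ValueError("popcount is only defined here for non-negative values")
    count = 0
    while x:
        # x & (x - 1) clears the lowest set bit, so the loop runs
        # once per set bit rather than once per bit position.
        x &= x - 1
        count += 1
    return count


mask_bits = popcount(0b1011)
```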


Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Running Ruby w/32 Cores

2011-04-06 Thread Gabriel Michael Black
When you say this is portable, what do you mean? Portable between  
compilers? We usually use gcc, but we have at least partial support  
for other compilers. I think this is necessary on some platforms.


Gabe



I would still root for using popcount() builtin available with GCC.


--
Nilay
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: X86: fnstsw: Another patch from Vince Weaver

2011-03-31 Thread Gabriel Michael Black
It might be ok, but I've been busy and forgot to look at it. Please  
give me a few more days.


Gabe

Quoting Lisa Hsu h...@eecs.umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/594/#review1056
---


Since there have been no objections, I'm going to commit this.

- Lisa


On 2011-03-17 16:07:24, Lisa Hsu wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/594/
---

(Updated 2011-03-17 16:07:24)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

X86:  fnstsw: Another patch from Vince Weaver


Diffs
-

  src/arch/x86/isa/decoder/x87.isa 2e269d6fb3e6
  src/arch/x86/isa/insts/x87/control/save_x87_status_word.py 2e269d6fb3e6
  src/arch/x86/isa/operands.isa 2e269d6fb3e6
  src/arch/x86/regs/misc.hh 2e269d6fb3e6

Diff: http://reviews.m5sim.org/r/594/diff


Testing
---


Thanks,

Lisa




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2011-03-22 Thread Gabriel Michael Black
The two issues below are copied from ARM_FS, but other targets had the  
same problems.


These errors are making the build fail.

build/ARM_FS/cpu/testers/networktest/networktest.cc: In member function 'void NetworkTest::completeRequest(Packet*)':
build/ARM_FS/cpu/testers/networktest/networktest.cc:160: warning: unused variable 'req'
build/ARM_FS/cpu/testers/networktest/networktest.cc: In member function 'void NetworkTest::tick()':
build/ARM_FS/cpu/testers/networktest/networktest.cc:194: error: call of overloaded 'pow(int, int)' is ambiguous



These warnings should probably be cleaned up too, although I don't  
know how long they've been there or how hard that would be.


build/ARM_FS/mem/ruby/system/Sequencer.cc: In member function 'void Sequencer::issueRequest(const RubyRequest)':
build/ARM_FS/mem/ruby/system/Sequencer.cc:616: warning: 'ctype' may be used uninitialized in this function
build/ARM_FS/mem/ruby/system/Sequencer.cc:653: warning: 'amtype' may be used uninitialized in this function


Quoting Cron Daemon r...@zizzer.eecs.umich.edu:


scons: *** [build/ALPHA_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ALPHA_SE_MOESI_hammer/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ALPHA_SE_MESI_CMP_directory/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ALPHA_SE_MOESI_CMP_directory/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ALPHA_SE_MOESI_CMP_token/cpu/testers/networktest/networktest.fo] Error 1

scons: *** [build/ALPHA_FS/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/MIPS_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/POWER_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/SPARC_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/X86_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/X86_FS/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ARM_SE/cpu/testers/networktest/networktest.fo] Error 1
scons: *** [build/ARM_FS/cpu/testers/networktest/networktest.fo] Error 1

See /z/m5/regression/regress-2011-03-22-03:00:01 for details.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: sim: use nextCycle() for quiesceSkip function

2011-03-21 Thread Gabriel Michael Black
This looks like the compiler being a little over ambitious looking for  
problems. Can we just turn off that warning?


Gabe

Quoting Korey Sewell ksew...@umich.edu:


Sorry, I didn't give the full details here. I'm using gcc 4.4.1.  And when I
compile for m5.opt/debug then I get the following errors:
[ CXX] ALPHA_FS_MOESI_CMP_directory/sim/pseudo_inst.cc -> .o
...
cc1plus: warnings being treated as errors
build/ALPHA_FS_MOESI_CMP_directory/sim/eventq.hh: In function 'void PseudoInst::quiesceSkip(ThreadContext*)':
build/ALPHA_FS_MOESI_CMP_directory/sim/eventq.hh:526: error: assuming signed overflow does not occur when assuming that (X + c) < X is always false


Basically, the curTick() + 1 line in pseudo_inst.cc and the subsequent
assert from eventq.hh aren't playing nice together.

On Mon, Mar 21, 2011 at 9:43 AM, Ali Saidi sa...@eecs.umich.edu wrote:


What is the exact problem you're trying to solve here?  Why can the
compiler complain about this? What is the error message?


Ali

Sent from my ARM powered device

On Mar 20, 2011, at 8:14 PM, Korey Sewell ksew...@umich.edu wrote:


 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.m5sim.org/r/603/
 ---

 Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and
Nathan Binkert.


 Summary
 ---

 sim: use nextCycle() for quiesceSkip function
 the increment of curTick causes some compiler to complain on an assert in
 the event queue scheduler. Since the code is only scheduling for the next
 cycle it seems safe to go ahead and just use the cpu's function to trick
 the compiler. NOTE: this only comes up in opt/debug builds since asserts
 are taken out of fast


 Diffs
 -

  src/sim/pseudo_inst.cc c1c6f36e118e

 Diff: http://reviews.m5sim.org/r/603/diff


 Testing
 ---

 This passed the simple-atomic, simple-timing, and o3 regressions tests
for ARM_FS.


 Thanks,

 Korey

--
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: sim: use nextCycle() for quiesceSkip function

2011-03-21 Thread Gabriel Michael Black
I just saw Ali's email (casting to UTick). I like that idea better  
than mine. I had just assumed Tick already was unsigned, but  
apparently that's not true.
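
For anyone puzzling over the warning itself, the arithmetic can be sketched like this. The snippet emulates 64-bit wraparound in Python, assuming (as this thread implies) that Tick is a signed 64-bit integer; the helper names are made up for illustration. In C++, signed overflow is undefined, so the compiler is allowed to assume "(X + c) < X" is always false, and it warns when it relies on that. Unsigned arithmetic wraps by definition, which is why the cast sidesteps the issue.

```python
INT64_MAX = 2**63 - 1
UINT64_MASK = 2**64 - 1


def wrap_signed64(value):
    """Reduce an integer to a two's-complement 64-bit signed value."""
    value &= UINT64_MASK
    return value - 2**64 if value > INT64_MAX else value


def wrap_unsigned64(value):
    """Reduce an integer to a 64-bit unsigned value (well-defined wraparound)."""
    return value & UINT64_MASK


def next_tick_is_earlier(tick):
    # Models a check like "curTick() + 1 < curTick()" on a wrapping signed type.
    return wrap_signed64(tick + 1) < tick


ordinary = next_tick_is_earlier(100)        # no wrap, so the comparison is false
at_limit = next_tick_is_earlier(INT64_MAX)  # only true if the addition wraps
```

The comparison can only come out true at the very top of the signed range, which is exactly the case the compiler assumes away.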


Gabe

Quoting Gabriel Michael Black gbl...@eecs.umich.edu:

This looks like the compiler being a little over ambitious looking  
for problems. Can we just turn off that warning?


Gabe

Quoting Korey Sewell ksew...@umich.edu:


Sorry, I didn't give the full details here. I'm using gcc 4.4.1.  And when I
compile for m5.opt/debug then I get the following errors:
[ CXX] ALPHA_FS_MOESI_CMP_directory/sim/pseudo_inst.cc -> .o
...
cc1plus: warnings being treated as errors
build/ALPHA_FS_MOESI_CMP_directory/sim/eventq.hh: In function 'void PseudoInst::quiesceSkip(ThreadContext*)':
build/ALPHA_FS_MOESI_CMP_directory/sim/eventq.hh:526: error: assuming signed overflow does not occur when assuming that (X + c) < X is always false


Basically, the curTick() + 1 line in pseudo_inst.cc and the subsequent
assert from eventq.hh aren't playing nice together.

On Mon, Mar 21, 2011 at 9:43 AM, Ali Saidi sa...@eecs.umich.edu wrote:


What is the exact problem you're trying to solve here?  Why can the
compiler complain about this? What is the error message?


Ali

Sent from my ARM powered device

On Mar 20, 2011, at 8:14 PM, Korey Sewell ksew...@umich.edu wrote:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/603/
---

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and
Nathan Binkert.



Summary
---

sim: use nextCycle() for quiesceSkip function
the increment of curTick causes some compiler to complain on an assert in
the event queue scheduler. Since the code is only scheduling for the next
cycle it seems safe to go ahead and just use the cpu's function to trick
the compiler. NOTE: this only comes up in opt/debug builds since asserts
are taken out of fast


Diffs
-

 src/sim/pseudo_inst.cc c1c6f36e118e

Diff: http://reviews.m5sim.org/r/603/diff


Testing
---

This passed the simple-atomic, simple-timing, and o3 regressions tests
for ARM_FS.



Thanks,

Korey

--
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


[m5-dev] Hung up for a bit

2011-03-18 Thread Gabriel Michael Black
Hey folks. My computer decided to eat itself yesterday and my file  
system ended up mangled. I think the important stuff in my home  
directory survived, but I'm in a Starbucks right now trying to get it  
straightened out. Please give me a few days extra slack responding  
since I don't know when I'll be back up and running.


Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] A small bug on thread pid and start time.

2011-03-03 Thread Gabriel Michael Black

They did mention the line that needed to be changed, so that's covered.

But I think we'll still need that other information (how to reproduce  
it, basically) so we can verify the fix ourselves. In any case, thanks  
for letting us know!


Gabe

Quoting Korey Sewell ksew...@umich.edu:


Many more details are needed for us to incorporate this bug fix.

What is the command line you ran that caused the error?

What is the output that shows the error?

What files do you change to fix the error?

On Thu, Mar 3, 2011 at 9:05 PM, Yi Xiang y...@colostate.edu wrote:


I found a bug while trying to generate a trace file for thread execution. At
first, --trace-thread returned wrong values for thread tid and start time.
For example, the tids for all the threads were 3211264.

Actually, this bug can be fixed by changing from Addr offset to int32_t
offset in line 106 and line 119 of threadinfo.hh.

After this change, I got the right thread pid.

All the best.

Yi Xiang
--
- Korey
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: ARM: Fix rfe macroops.

2011-02-24 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:





On 2011-02-24 14:56:09, Gabe Black wrote:
 This is actually better than my initial impression suggested. I
think I saw all the lines from the Ra->URa change and got confused
since those didn't have anything to do with RFE. I jumped to the
conclusion that you guys had some big, elaborate, roundabout
implementation for this when really you have two changes stuck
together.


 I'm not 100% sure you're safe without the PC change even if you
rearrange things this way. What's going to happen is the NPC is  
going to be forced to PC + size of the instruction after every  
instruction is generated, even if that's wrong and is corrected by  
a branch mispredict. The problem manifests itself functionally when  
you have a mispredict and don't immediately move the NPC into the  
PC, preserving the correct control flow. That happened here because  
you were staying on the same inst to do more microops. I think most  
of the time if you put the branching microop last you'll get away  
with it, but I have a creeping doubt that there's some other corner  
case where it'll get you in trouble. I do agree that the ARM PC  
state is pretty complicated and it would be nice to slim it down  
somehow, or at least not make it worse.


There is always that case, but there is also the possibility that  
we'll accidentally introduce other weird bugs in the process. Right  
now we know this devil and there aren't many left. This is one of  
the last changes until the O3 cpu boots Linux/ARM.




Good to hear. If you guys are comfortable with it and you know the  
risks, then it's ok with me. We should put a big fat comment someplace  
so nobody unwittingly reintroduces the problem. I'm not sure where to  
put it where it's guaranteed to be seen, though.



On 2011-02-24 14:56:09, Gabe Black wrote:
 src/arch/arm/isa/operands.isa, line 231
 http://reviews.m5sim.org/r/474/diff/1/?file=10293#file10293line231

 There isn't anything inherently wrong with changing these  
names, but it doesn't belong in this patch. It causes a lot of code  
to change that doesn't have anything to do with RFE.


This patch made it clear that the names had to be changed, although  
it can be done in a separate patch. There was some conflict with one  
of the names that led to an invalid substitution.





Yeah, it makes sense to put that in a patch preceding this one then.  
Those sound like two distinct fixes.


Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-23 Thread Gabriel Michael Black
I don't want to beat the issue to death (too late?), but I think  
you'll agree that as implemented this is a hack. A sometimes useful  
hack that hasn't caused any serious problems so far, but still a hack.  
I'd like us to get rid of hacks by getting rid of the code or turning  
them into non-hack features. In this case, turning it into a feature  
would add a lot of code and complexity for not a lot of extra  
functionality (which I think you've said as well), so I'd prefer it  
got axed. Then again I didn't even know the code was there until Ali  
brought it up and I actively looked for it, so it's more a  
philosophical question whether to keep it or not (vs. expanding it  
which is a different sort of issue). If it helps you and it doesn't  
hurt anybody then I wouldn't mind leaving it alone.


Gabe

Quoting nathan binkert n...@binkert.org:


Ok, there's this long thread on how to deal with default configuration
options which turned into a specific argument about M5_PATH=

So, what's the deal?  I created a mechanism for having an m5
configuration directory ($HOME/.m5, but can be overridden by the
M5_CONFIG environment variable) and specifically an options.py file in
that config directory.  options.py gets executed before main() with
one parameter in the global dict, the options dictionary.  This
options dictionary can be modified to change default values of m5
options (the things that go before the script name), or even add
options that are specific to an installation.  Examples of things that
are convenient to do in such a file: change the default output
directory, change the default statistics output file name, change the
defaults for stdout and stderr (and redirecting those outputs), making
the output quiet, changing the default port of the remote debugger,
and set the script path.
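
A minimal sketch of how a hook file like this can work, for readers who have not seen the mechanism. This is not the actual m5 code: run_options_hook(), the option names "outdir" and "quiet", and the temporary directory standing in for $HOME/.m5 are all assumptions made up for illustration. The only part taken from the description above is that the file is executed with the options dictionary in its globals.

```python
import os
import tempfile


def run_options_hook(config_dir, options):
    """Execute config_dir/options.py, if present, with `options` as a global."""
    path = os.path.join(config_dir, "options.py")
    if not os.path.exists(path):
        return options
    with open(path) as f:
        source = f.read()
    # The hook sees a single name, `options`, and edits the dict in place.
    exec(compile(source, path, "exec"), {"options": options})
    return options


# Stand-in for a user's config directory with a two-line options.py.
config_dir = tempfile.mkdtemp()
with open(os.path.join(config_dir, "options.py"), "w") as f:
    f.write('options["outdir"] = "/tmp/m5out-custom"\n')
    f.write('options["quiet"] = True\n')

defaults = {"outdir": "m5out", "quiet": False}
result = run_options_hook(config_dir, defaults)
```

Because the hook is ordinary Python, it can do far more than set values, which is both the appeal and the reason it is hard to document fully.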

All of these things have been pretty convenient for me at times.  You
can even parse command line options to add your own in here (though
that can be a bit of voodoo), and do things like enable iron python if
you ever use the interactive python prompt, or do things with pdb if
you use pdb.

What I wrote above is now the extent of the documentation of this
feature.  The thought was that the files in this directory could
basically be used like hooks are used in mercurial.

Now it's true that few (one?) people ever use this, and a lot of
people end up writing wrapper scripts for m5 command line stuff, e.g.
if you're running in a batch system.  That's of course possible, but
it begs the question, if you're going to write a bash script to wrap
M5, wouldn't it be better to write a python script that more or less
has access to all of M5 (and requires less code at the same time?).  I
agree that it is more or less impossible to document this feature
fully, but it may be easy enough to document the few things that
people actually want to use.

I'll totally agree that the way it is implemented may be poor, and we
may want to have a single configuration file with specific entry
points (a .m5.py file that exports a set of functions that get called
explicitly like hooks), but that's not hugely different.  Requiring me
to write wrapper scripts for M5 is annoying though.

There is one totally different way to approach this: we (or
specific users that care) can stop using the m5 binary and start
using libm5.so.  Did you guys know that you can build m5 completely as
a library and do "import m5"?  That would then make it obvious that a
script can choose whether or not to call M5's main() routine.

No matter how we do it, I still like the ability to control what's
happening in M5 way early in execution.  Even if it is a mostly
undocumented feature for m5 hackers like me, it has made it easier for
me to implement and do stuff and it's easy enough to document the few
things that people might want to do.  Anything else (environment vars,
ini files, etc.) is actually more code and more maintenance.

  Nate




___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-22 Thread Gabriel Michael Black
Oh, actually, this made me think of something I wanted to bring up.  
What we're talking about now is basically a config file, but we  
already have something called that. I was describing M5 to some folks  
at work a little while ago, and there seemed to be some confusion  
about how configuration worked. I think they may have assumed ahead of  
time a config would be more like a static file of values (like the  
ini-s used to be).


So to straighten out both of those issues, what if we called these  
default and command line values the configuration, and then call the  
python scripts simulation scripts? That disambiguates the term and  
hopefully makes it more obvious what, say, se.py is for and how to set  
up a simulation. I think it's something people usually figure out  
relatively quickly, but it wouldn't hurt to make it a little more  
obvious.


Gabe

Quoting Gabriel Michael Black gbl...@eecs.umich.edu:

Three places does seem like a lot. What are they specifically?  
Hopefully we can get rid of one or two.


That said, using python to generate python to put in a hidden  
directory to read with python embedded in a binary is, in my  
opinion, WAY more complicated than the problem warrants. The  
generating script would have to be fairly smart too so you could  
quickly update particular fields, remove previously set defaults, etc.


How about a file with key value pairs separated by equals signs.
Those would be used as the defaults when the options were set up,
and that would be the whole deal. If something doesn't make sense
as an option to the m5 binary, it wouldn't go in that file.
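
To show how little code the key value idea needs, here is a rough Python sketch. Nothing here exists in the tree: parse_defaults(), the "#" comment syntax, and the option names in the sample are assumptions for illustration only.

```python
def parse_defaults(text):
    """Parse 'key = value' lines into a dict of default option values."""
    defaults = {}
    for raw in text.splitlines():
        # Assumed convention: '#' starts a comment, blank lines are ignored.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        defaults[key.strip()] = value.strip()
    return defaults


sample = """
# per-user defaults (hypothetical option names)
m5path = /dist/m5/system
outdir=m5out
"""
parsed = parse_defaults(sample)
```

Every value comes back as a string, so the option setup code would still be responsible for converting types, the same as it does for command line arguments.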


There are a couple of things that follow from that. First, m5path  
would become an option to the binary which seems like a reasonable  
thing to do in its own right. That would mean we could eliminate the  
M5_PATH environment variable entirely, or leave it in somehow for  
compatibility.


Second, there would potentially be a significant increase in the  
number of options the binary supports which could make --help less  
useful. We could define two levels of help, regular and verbose,  
like hg does I think.


If we wanted to get fancy we could treat our file like hgrc and have  
one version in the repo, one in the home directory, and one system  
wide. Then they'd have to accumulate for paths and whatnot, I suppose.
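
A sketch of what the accumulation might look like, under assumptions of my own: layers are applied from lowest to highest priority (system wide, then home directory, then repo), ordinary keys are simply overridden, and path-like keys are joined with ":" so the higher-priority entry is searched first. The function name and the key names are hypothetical.

```python
def merge_layers(layers, path_keys=("m5path",)):
    """Merge dicts of defaults, given in order from lowest to highest priority."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if key in path_keys and key in merged:
                # Paths accumulate; the newer (higher-priority) entry goes first.
                merged[key] = value + ":" + merged[key]
            else:
                # Everything else is a plain override.
                merged[key] = value
    return merged


system_wide = {"m5path": "/dist/m5/system", "outdir": "m5out"}
home = {"m5path": "/home/me/m5", "outdir": "out"}
merged = merge_layers([system_wide, home])
```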


Gabe

Quoting Ali Saidi sa...@umich.edu:




We could create a script and put it in the m5 root that emitted
python files into .m5 that properly set up the environment based on
the user input. I rather like that idea and that way it could ask
where various things were and we aren't setting a bunch of
environment variables. I'm not a huge fan of the config.ini files
because pretty much everything we're going to want to add you need
to add and check in three places. It would be better if it was just
python that set up SysPaths and whatever else correctly.


Ali


___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-22 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



Regarding /dist:
I think the directories of interest were binaries, disks, cpu2000.  
What more of a rundown do you want?




That's not what I meant. I meant why is /dist in /dist at all? Where  
did /dist come from? I've never seen it anywhere else. Everything else  
puts its files in /opt or /usr/share or something along those lines.


Gabe

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-22 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



On Feb 22, 2011, at 5:25 PM, Gabe Black wrote:

Like I said, why can't you just do that with a simple bash alias that
appends the options to the binary? Or a wrapper script? Why make a new
tool when an existing one works just as well?


I don't think any of those mechanisms are that great. Wrapper
scripts normally muck with job control, stdin, and stdout, and bash
aliases assume you're using a particular shell. I think aliases are
just magic and will cause a lot of confusion (e.g. it looks like I'm just
running X, never mind that your shell is secretly substituting something
else without telling you). If you're in the unfortunate situation of
having multiple csh/bashrc files depending on what machine/cluster
you're on (I do) you have to try and keep those in sync rather than
just keeping all your m5 settings in sync.
Finally, the last thing I want is my m5 command line to be even
longer than it is now. They already wrap a couple of times on a
terminal. Isn't this akin to saying, why have a .vimrc? Yes I can
pass all the settings I have in my .vimrc on the command line to
vim, but why would I want to?


The more I think about it the more I like a configs.py or whatever
you want to call it. It lets you be very simple and should provide a
single place to change any environment-like settings and then there
won't be any random environment variables or other things grabbed in
an ad hoc fashion in the scripts.




So basically what I'm worried about is creating a new, m5-only rat's
nest, perhaps like those rc files you have to deal with. If we allow
multiple files that cascade we have that problem, but if we don't then  
it's harder to have per machine settings that reflect the local file  
system layout, for instance. If we have per machine settings at all,  
they might cause unanticipated problems because they change things  
your batch job or whatever wasn't expecting.


As far as your .vimrc example, I think you have a point. That's an  
example of where this sort of thing is useful. The problem I see with  
that, though, is that we already have a programmable configuration  
system in the form of the simulation scripts. This is just for the  
little stuff you'd configure through the main binary which is much  
more limited than all the stuff you can control with a .vimrc.


What makes your command line so long? Is it options for the binary
itself, or for the simulation script? This sort of thing won't help
for the latter, at least not without making it a bit of a hack. The
only thing that makes my command line long as far as the binary
section goes is any traceflags I'm turning on, and I doubt you'd want to
force those on all the time. Your usage may be very different, of
course.


Gabe


[m5-dev] setting M5_PATH for the regressions

2011-02-22 Thread Gabriel Michael Black
Independent of how the specifics end up hashed out, I think it's safe  
to say it would be a good idea to set M5_PATH explicitly for the  
regressions instead of relying on the default built into M5 being  
tuned for that environment. I'd like to attempt to do that now. Does  
anyone object? Could somebody give me a quick explanation of how to do  
that? I have an idea, but I don't want to guess and screw something  
up. This would just be for the cron job running the nightly  
regressions. I wouldn't be changing any of the code in M5 yet,  
although I may send a patch or patches out for that too in the not too  
distant future.


Gabe



Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-21 Thread Gabriel Michael Black
Three places does seem like a lot. What are they specifically?  
Hopefully we can get rid of one or two.


That said, using python to generate python to put in a hidden  
directory to read with python embedded in a binary is, in my opinion,  
WAY more complicated than the problem warrants. The generating script  
would have to be fairly smart too so you could quickly update  
particular fields, remove previously set defaults, etc.


How about a file with key value pairs separated by equals signs. Those  
would be used as the defaults when the options were set up, and that  
would be the whole deal. If something doesn't make sense as an option
to the m5 binary, it wouldn't go in that file.
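The key/value file idea above could be sketched roughly as follows. This is only an illustration: the option names (m5path, outdir), their built-in values, and the helper names read_defaults and build_parser are hypothetical and not taken from the m5 source.

```python
import argparse
import io

# Hypothetical built-in defaults; the real option set would be whatever
# the m5 binary already accepts on its command line.
BUILTIN_DEFAULTS = {"m5path": "/path/to/system", "outdir": "m5out"}


def read_defaults(stream):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    defaults = {}
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: %r" % line)
        defaults[key.strip()] = value.strip()
    return defaults


def build_parser(file_defaults):
    """Create the option parser, using file values as option defaults."""
    unknown = set(file_defaults) - set(BUILTIN_DEFAULTS)
    if unknown:
        # Anything that is not an option to the binary does not belong here.
        raise ValueError("not a binary option: %s" % sorted(unknown))
    merged = dict(BUILTIN_DEFAULTS, **file_defaults)
    parser = argparse.ArgumentParser(prog="m5")
    for name, value in merged.items():
        parser.add_argument("--" + name, default=value)
    return parser


if __name__ == "__main__":
    defaults = read_defaults(io.StringIO("# local layout\nm5path = /opt/m5\n"))
    print(build_parser(defaults).parse_args([]))
```

An explicit command line option still wins over the file, since the file only changes the default.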


There are a couple of things that follow from that. First, m5path  
would become an option to the binary which seems like a reasonable  
thing to do in its own right. That would mean we could eliminate the  
M5_PATH environment variable entirely, or leave it in somehow for  
compatibility.


Second, there would potentially be a significant increase in the  
number of options the binary supports which could make --help less  
useful. We could define two levels of help, regular and verbose, like  
hg does I think.


If we wanted to get fancy we could treat our file like hgrc and have  
one version in the repo, one in the home directory, and one system  
wide. Then they'd have to accumulate for paths and whatnot, I suppose.
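A rough sketch of how hgrc-style layering might behave, assuming each file has already been read into a dict. The set of accumulating keys (PATH_KEYS) and the function name merge_layers are made up for illustration.

```python
import os

# Hypothetical: keys whose values accumulate across files instead of
# being overridden by the most specific file.
PATH_KEYS = {"m5path"}


def merge_layers(layers):
    """Merge settings from least specific (system) to most specific (repo)."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if key in PATH_KEYS and key in merged:
                # More specific entries are searched first.
                merged[key] = value + os.pathsep + merged[key]
            else:
                merged[key] = value
    return merged


if __name__ == "__main__":
    system_wide = {"m5path": "/usr/share/m5", "outdir": "m5out"}
    home = {"m5path": "/home/me/m5"}
    repo = {"outdir": "out"}
    print(merge_layers([system_wide, home, repo]))
```

Whether accumulation or plain override is the right behavior for a given key is exactly the sort of per-machine surprise worth deciding up front.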


Gabe

Quoting Ali Saidi sa...@umich.edu:




We could create a script and put it in the m5 root that emitted
python files into .m5 that properly set up the environment based on
the user input. I rather like that idea and that way it could ask
where various things were and we aren't setting a bunch of
environment variables. I'm not a huge fan of the config.ini files
because pretty much everything we're going to want to add you need
to add and check in three places. It would be better if it was just
python that set up SysPaths and whatever else correctly.


Ali








Re: [m5-dev] paths for disk images, kernels, etc.

2011-02-21 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:


Let me ponder the other stuff in this for a day or two...
Ali


Speaking of the other stuff, could anyone give a quick rundown of /dist?

Gabe



Re: [m5-dev] Ruby: Recompiling SLICC

2011-02-17 Thread Gabriel Michael Black
I was actually thinking about this the other day so I'll chime in. I  
think the reason SLICC is being run is that you need to see what files  
it's going to produce to set up dependencies, basically like you might  
have to do if the ISA parser split up its output files. It might be  
possible to cache a list of some sort that said what all the input  
files were last time and check those for changes. If you know what all  
the input files were last time and none of them changed, presumably  
the set of input files would end up being the same and so would all  
the outputs. If any input file changed all bets would be off. I'm not  
sure how you'd accomplish that in scons, but it might cut out a lot of  
unnecessary SLICC runs.
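The caching idea could look something like this outside of scons. It is a sketch only: the manifest file name, the use of SHA-256, and the function names are assumptions, and how to hook it into scons is left open, as above.

```python
import hashlib
import json
import os
import tempfile


def fingerprint(paths):
    """Map each input path to a hash of its current contents."""
    result = {}
    for path in sorted(paths):
        with open(path, "rb") as f:
            result[path] = hashlib.sha256(f.read()).hexdigest()
    return result


def needs_rerun(cache_file):
    """True if there is no usable manifest or any recorded input changed."""
    try:
        with open(cache_file) as f:
            recorded = json.load(f)
    except (OSError, ValueError):
        return True  # no manifest yet, or it is unreadable
    try:
        return fingerprint(recorded.keys()) != recorded
    except OSError:
        return True  # an input from last time has disappeared


def record_run(cache_file, input_paths):
    """After a real SLICC run, remember which inputs it read."""
    with open(cache_file, "w") as f:
        json.dump(fingerprint(input_paths), f)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "protocol.sm")
    cache = os.path.join(workdir, "slicc_inputs.json")
    with open(src, "w") as f:
        f.write("machine(L1Cache) {}\n")
    print(needs_rerun(cache))   # nothing recorded yet
    record_run(cache, [src])
    print(needs_rerun(cache))   # inputs unchanged
```

If none of last time's inputs changed, the set of inputs and therefore the set of outputs should be the same, so the cached output list can be reused for dependency setup.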


Gabe

Quoting Ali Saidi sa...@umich.edu:


How about a way to make scons not rebuild if there isn't a change?

Ali

On Feb 17, 2011, at 10:21 AM, Korey Sewell wrote:


Thanks Nilay.

I wonder if people think that NO_HTML=True should be the default  
setting for builds?


On Wed, Feb 16, 2011 at 8:20 PM, Nilay Vaish ni...@cs.wisc.edu wrote:
On Wed, 16 Feb 2011, Korey Sewell wrote:

Hi all,
I noticed that on every build, SLICC wants to parse and generate C++/HTML on
every compile regardless of any changes
to code.

The C++ part seems pretty quick, but the HTML portion hangs slightly.

For the scons aficionados, how hard would it be to just have SLICC (or
maybe scons) be smart about this and cancel
the rebuild of all the files???

It's probably not a high priority task, but it's just one of the things you
notice after compiling a good # of times...

--
- Korey


If you set NO_HTML=True when you issue scons command for  
compilation, that would prevent html files from being generated.


--
Nilay



--
- Korey








Re: [m5-dev] Incompleteness in MOESI_CMP_directory-L1cache.sm

2011-02-16 Thread Gabriel Michael Black
Could you please use review board? I wouldn't know what I'm looking  
at, but other people might want a chance to look it over.


Gabe

Quoting Nilay Vaish ni...@cs.wisc.edu:

Can you email your patch? I'll take a look and commit the changes to
the repository.


Thanks!
Nilay

On Wed, 16 Feb 2011, Joseph Pusdesris wrote:


Bump.
-Joseph

On Fri, Feb 11, 2011 at 3:28 PM, Joseph Pusdesris jo...@umich.edu wrote:


I have noticed that many of the action definitions are missing
out_msg.RequestorMachine assignments and it causes m5 to die when run with
--trace-flags=RubyNetwork.  I have fixed it locally by setting the
RequestorMachine to L1Cache for each of these, but I don't know enough about
the codebase to know if that is a proper fix.

Is there a fix for this in the pipeline?

-Joseph










Re: [m5-dev] Condition code bits in X86 O3

2011-02-11 Thread Gabriel Michael Black
Hello again. I've had a chance to talk with an expert, and I have an  
idea of how to approach this. It's going to require more flexibility  
than the ISA parser has currently, though, specifically in how the  
list of source and destination registers are managed. It would also be  
nice to have a more integrated idea of composite operands, ie. ones  
where some bits come from here, some from there, and in the end it  
builds a single uint64_t, double precision float, vector of uint32_ts,  
etc.


Rather than try to shoe horn this into a system that's already  
suffered enough of my abuse, aka the ISA description language, I'm  
going to attempt to build a parallel facility for defining  
instructions usable from inside the python in let blocks. Basically  
it would be python classes, functions, etc., (hopefully not that many)  
exported into the let block context that would allow more direct  
interaction with the parser's guts, and more control over how things  
are put together.


In the future I'd like to see this bud into isa_parser2.py, but that's  
going to be a lot of work and is a somewhat orthogonal issue. Ideally  
this sort of thing will also make it easier to split output into  
smaller files.


Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:


I'm looking at why x86 goes so much slower than Alpha on O3 (4x the
ticks), and I think one culprit is the dependencies set up by the condition
code bits of the flags register. Many instructions in x86 modify or
depend on those bits, and even though the condition codes are separated
out from the flags register (which does a lot of other stuff too),
they're being updated with a read-modify-write sort of mechanism. I
expect that's setting up long chains of serializing dependencies which
is killing parallelism and performance.

Basically, there are 6 condition codes in x86: Z, C, A, S, P, O, or zero,
carry, auxiliary carry, sign, parity and overflow. In M5's
implementation (and in the patent I patterned it after) there are also
artificial emulation zero and carry flags that work like the regular
ones but are maintained separately. They can be updated independently
and checked separately, and are useful behind the scenes when
implementing some macroops. Instructions may update all of these flags
or only some of them. The PTLSim manual claims that there's a ZAPS
rule where the zero, auxiliary carry, parity and sign bits are always
updated together. That's usually true, but certain instructions change
only the zero flag. CMPXCHG8B is an example.

What I'd been thinking of doing to handle this is to further split up
the condition code bits into separate registers to be managed
independently for any register renaming. There are a couple of issues
with that, though. First, it looks like there'd have to be 6 different
registers, APS, Z, O, C, EZ, and EC. A non-trivial number of
instructions would need to update 4 or more of those, putting a perhaps
unrealistic burden on any rename mechanism. That would also make the
simple CPUs slower because they'd have to read/write all those extra
registers. Bread and butter x86 tends to be condition code happy, so
that could be a significant slow down.

Also, that complicates decoding significantly. Conceptually it's easy to
imagine reading/writing the registers with the bits you need, but with
the ISA parser, the code needs to either be there or not be there. If
you have code that's never used but accesses a register, it'll still get
pulled in as a source or dest. That means there would need to be a hard
coded version of every microop that would correspond to each possible
combination of condition code bits. Since there are 6 bits, that's 2^6,
plus 2 variants for partial or complete register writes, so 2^7 or 128
versions of every microop. There are also register/immediate versions of
many microops. We would likely end up with thousands of microop classes.
We'd also need to generate selection functions that would pick which
variant to use. This is all possible, but fairly ugly and clunky.
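The count above can be checked by enumerating the combinations. The register names come from the split proposed earlier in this message; the "partial"/"complete" labels and the function name are only illustrative.

```python
from itertools import combinations

# The six separately renamed condition code registers proposed above.
FLAG_REGS = ["APS", "Z", "O", "C", "EZ", "EC"]
# Partial vs. complete destination register writes.
WRITE_KINDS = ["partial", "complete"]


def microop_variants():
    """Enumerate every (flag subset, write kind) pair for one microop."""
    variants = []
    for size in range(len(FLAG_REGS) + 1):
        for subset in combinations(FLAG_REGS, size):
            for kind in WRITE_KINDS:
                variants.append((subset, kind))
    return variants


if __name__ == "__main__":
    per_microop = len(microop_variants())
    print(per_microop)        # 2**6 subsets times 2 write kinds
    print(per_microop * 2)    # if register and immediate forms both exist
```

Multiplying that per-microop figure by the number of distinct microops is what pushes the total into the thousands.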

So does anybody have any suggestions on how to unserialize these
microops? I found a paper here:
http://www.wseas.us/e-library/conferences/2006elounda1/papers/537-325.pdf
that claims IPC for x86 CPUs is significantly worse than other ISAs
specifically because of this sort of thing. Is this just a fact of life
with x86? Would fixing it be not only very annoying but also
unrealistic? Is that paper's claim actually true?

Gabe






Re: [m5-dev] Profile Results for Mesh Network

2011-01-24 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


 Gabe, how many bytes at a time does the x86 predecoder fetch?  If it
doesn't currently grab a cache line at a time, could it be made to do so,
and do you know if that would cause any issues with SMC?


All of the predecoders expect to receive one MachInst at a time so  
that for fixed width ISAs they naturally get one instruction at a  
time. For x86 that type doesn't really mean anything other than  
controlling that width. Currently it's defined to be a uint64_t, but  
that could be changed pretty easily with hopefully basically no  
functional effect. Whatever it is would need to have some integer like  
properties, though, where it could be shifted, masked, etc. I have it  
set to a big int to try to minimize trips through the predecoder, the  
work needed to glue big immediates together, etc.


I heard during a presentation by a pretty senior VMware guy that x86  
isn't guaranteed to refetch instruction bytes until a control flow  
instruction and/or a serializing instruction or something along those  
lines. A former AMD guy in the room was surprised by that, though, so  
I'm not sure it's true. It could be one of those things that's  
supposed to be true, but to get minesweeper to work on Windows 95 they  
have to refetch anyway. It could also be a difference between Intel  
and AMD. So the take away answer is possibly no, possibly yes.


Gabe


Re: [m5-dev] EIO Regression Tests

2011-01-17 Thread Gabriel Michael Black

I think there are two important aspects of this issue.

1. Using regression tests we can't distribute freely has some  
important limitations. It would be nice to replace them with ones we  
can.


2. The majority of the regression tests we have now are really  
benchmarks which provide basic coverage by working/not working and not  
changing behavior unexpectedly. That's an important element to have  
since it's a practical reality check and probably hits things we  
wouldn't think to test. They have significant limitations, though,  
since they take a long time to run and tend to exercise the same  
simulator functionality over and over. For instance, gcc may generate  
code that always has the same type of backward branch for a for loop.  
Using gzip as a test will verify that that branch works, but possibly  
not the slightly different variant that may, for instance, use a large  
branch displacement. Even when writing code in x86 assembly it can be  
impossible to predict which of the possibly many redundant instruction  
encodings the assembler might pick.


So, in everyone's infinite free time, I think we should replace our  
benchmark based regressions with a smaller set of freely distributable  
regressions/inputs, and augment them with shorter, targeted tests that  
exercise particular mechanisms, circumstances, instructions, etc.  
Instead of replacing our existing benchmarks which are useful as  
actual benchmarks and are good to keep working, we could build up this  
second set of tests in parallel.


Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:


Hi Nilay,

I understand your confusion.  This is an example of where the wiki  
needs to be updated.  I believe the wiki only mentions the  
encumbered tar ball and doesn't mention the encumbered hg repo on  
repo.m5sim.org.  As far as the anagram test program goes, I remember  
Lisa and I encountered the same issue a while back and to resolve it  
I believe Lisa copied that test along with several other regression  
tester programs from Michigan to AMD.


I can provide you those regression tester programs, but at a higher  
level, I think this is a good time to ask the question on how we  
want to provide external users all the files necessary to run the  
regression tester?  As Nilay points out, the encumbered repo has  
some, but not all of the necessary files.  I believe, one also needs  
another set of regression tester programs which include both the  
anagram files, as well as the SPECCPU files for the long regression  
tester runs.


Thoughts?

Brad



-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
Behalf Of Nilay Vaish
Sent: Monday, January 17, 2011 1:55 PM
To: M5 Developer List
Subject: Re: [m5-dev] EIO Regression Tests

I figured that out, but there is no anagram directory in tests/test-
progs.
I, therefore, receive the following error:

gzip: tests/test-progs/anagram/bin/alpha/eio/anagram-vshort.eio.gz: No
such file or directory

--
Nilay

On Mon, 17 Jan 2011, Steve Reinhardt wrote:

 The one where the EIO code lives.  That's its name, at
 http://repo.m5sim.org.

 On Mon, Jan 17, 2011 at 12:59 PM, Nilay Vaish ni...@cs.wisc.edu
wrote:

 What do you mean by the encumbered repository?


 On Mon, 17 Jan 2011, Steve Reinhardt wrote:

  Yes, it should be a concern... it should work.  Did you do a pull
on the
 encumbered repository?  There were some changes there needed to
maintain
 compatibility with the latest m5 dev repo.

 Otherwise you'll need to provide more detail about how things
failed.

 Steve

 On Mon, Jan 17, 2011 at 10:21 AM, Nilay Vaish ni...@cs.wisc.edu
wrote:

  I just ran the regression tests for the patch (deals with SLICC
and cache
 coherence protocols) that I need to commit. The EIO tests fail.
Should
 this
 be a concern?

 --
 Nilay













Re: [m5-dev] Review Request: O3: Enhance data address translation by supporting hardware page table walkers.

2011-01-17 Thread Gabriel Michael Black
I looked at Alpha's ISA description briefly, and I didn't see anywhere
an instruction did anything with the fault returned by a read/write in
initiateAcc other than return it. Do you have an example where that isn't
true? The only place I can think of where that would be useful is
prefetches, but I think we already have a separate mechanism that  
handles that.


Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:





On 2011-01-17 00:28:39, Gabe Black wrote:
 This change seems to have some dead functionality in it. The  
delay member added to translations is never used, the delay()  
pure virtual function is never used and not defined for any other  
ISAs (which I think will break them all), and the  
translationDelayed variable is never used. Unnecessary copyright  
changes should also be rolled back when removing that dead code.


 Are there any instructions that actually expect to get a valid  
fault when performing initiateAcc? I could imagine there might be  
since it used to be something you could always do for the most  
part, but getting rid of those instances would help simplify this  
code I think. The code that prepares a request for the memory  
system could follow from the finishTranslation function no matter  
if it happened immediately or after a table walk. Then an  
instruction would (hopefully) only require one pass since its work
would be done and it would either be ready to commit (a store) or  
ready for completeAcc (a load). The actual load/store queue would  
have to wait since the address might not be ready, but that might  
be a pretty simple extension on top of waiting for initiateAcc to  
happen.


Ali Saidi wrote:
I pointed out where delay() is used and it's not defined in  
BaseTLB, it's in Translation. All the regressions pass just fine.  
If you search for translationDelayed in the diff you'll see it is  
used too.


Yes, Alpha still returns a fault on initiateAcc() since there  
is no table walk, the tlb lookup is known immediately.


This is quite a substantial patch, so I would prefer to commit
it close to as is and then work on a patch on top of it to rework
finishTranslation if it's possible.


Ok, yeah, I see where translationDelayed is used. I see where  
delay() is called, but I don't see where the value it sets (member  
variable delay) is actually used for anything. There is a lot of  
code here, so I might have missed it. My biggest concern about this  
patch is that it's adding a lot of code and state (relatively  
speaking) to handle the delayed translation case, and O3 is already  
really complicated. That may just be what's necessary, though. It  
would be nice to go into Alpha and adjust its semantics as far as  
expecting faults in initiateAcc. When I get a chance I'll look at  
them to see how easy that might be. I'm not sure if or when that  
would get done and taking advantage of it would require reworking a  
decent bit of this code, so it's not necessarily something you  
should wait for.


This is also at least the second time where I've been confused by  
nested classes or multiple classes in a file and thought I was  
looking at something I wasn't. I need to be more careful about that.



- Gabe


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/422/#review741
---


On 2011-01-12 09:12:18, Ali Saidi wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/422/
---

(Updated 2011-01-12 09:12:18)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

O3: Enhance data address translation by supporting hardware page  
table walkers.


Some ISAs (like ARM) rely on hardware page table walkers.  For those ISAs,
when a TLB miss occurs, initiateTranslation() can return with NoFault but with
the translation unfinished.

Instructions experiencing a delayed translation due to a hardware page table
walk are deferred until the translation completes and kept in the IQ.  In
order to keep track of them, the IQ has been augmented with a queue of the
outstanding delayed memory instructions.  When their translation completes,
instructions are re-executed (only their initiateAccess() was already
executed; their DTB translation is now skipped).  The IEW stage has been
modified to support such a 2-pass execution.


Diffs
-

  src/arch/arm/tlb.cc 5d0f62927d75
  src/cpu/base_dyn_inst.hh 5d0f62927d75
  src/cpu/base_dyn_inst_impl.hh 5d0f62927d75
  src/cpu/o3/fetch.hh 5d0f62927d75
  src/cpu/o3/iew_impl.hh 5d0f62927d75
  src/cpu/o3/inst_queue.hh 5d0f62927d75
  src/cpu/o3/inst_queue_impl.hh 5d0f62927d75
  src/cpu/o3/lsq_unit_impl.hh 5d0f62927d75
  src/cpu/simple/timing.hh 5d0f62927d75
  

Re: [m5-dev] Review Request: Time: Add a mechanism to prevent M5 from running faster than real time.

2011-01-10 Thread Gabriel Michael Black

No problem, I'd been meaning to do it for a while anyway.

As far as librt, I agree it has its drawbacks and I have no problem  
with alternatives. An alternative for clock_nanosleep is nanosleep,  
the big difference I think being that you can't tell it to wait until  
an absolute time, and you don't need to tell it what clock source to  
use. I don't have my book here with me so I'm not 100% sure of that. I  
looked briefly for a ns accurate get time function, but the one I  
used was the only one I found. I think that book said gettimeofday is  
now deprecated or even removed in the latest version of some standard,  
so it might not be the right choice. Nanosecond precision is probably overkill
and you're not actually guaranteed that level of resolution anyway, so
microseconds are probably fine. I'm in new territory here so I'm definitely open
to suggestions. As far as the lib check in the SConstruct, I mean to  
put an Exit(1) after the error message to make the build die. I think  
if we do stick with something requiring librt, I'd probably change  
that to set some flag so that timesyncing was disabled instead of  
failing out.


Also, I agree or at least don't disagree with all of the suggestions  
so far. I'll spin up a new version when I get a chance, possibly not  
changing out functions yet since I don't think we've nailed down what  
to change them to.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:





On 2011-01-10 15:41:41, Ali Saidi wrote:
 src/sim/root.cc, line 200
 http://reviews.m5sim.org/r/419/diff/1/?file=9421#file9421line200

 Thanks for taking the time to do this!


Ditto... I meant to say something in my review but forgot... despite  
the nitpicking, thanks for whipping up a solution so quickly.   
Otherwise we'd have nothing to nitpick :-)



- Steve


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/419/#review709
---


On 2011-01-10 07:44:05, Gabe Black wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/419/
---

(Updated 2011-01-10 07:44:05)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

Time: Add a mechanism to prevent M5 from running faster than real time.

M5 skips over any simulated time where it doesn't have any work to do. When
the simulation is active, the time skipped is short and the work done at any
point in time is relatively substantial. If the time between events is long
and/or the work to do at each event is small, it's possible for simulated time
to pass faster than real time. When running a benchmark that can be good
because it means the simulation will finish sooner in real time. When
interacting with the real world through, for instance, a serial terminal or
bridge to a real network, this can be a problem. Human or network response time
could be greatly exaggerated from the perspective of the simulation and make
simulated events happen too soon from an external perspective.

This change adds the capability to force the simulation to run no faster than
real time. It does so by scheduling a periodic event that checks to see if
its simulated period is shorter than its real period. If it is, it stalls the
simulation until they're equal. This is called time syncing.
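As a rough illustration of the mechanism (not the code in src/sim/root.cc), the periodic check can be sketched like this. The class name TimeSync, the period in seconds, and the injectable clock and sleep functions are all assumptions made so the idea can be run on its own.

```python
import time


class TimeSync:
    """Stall so that one period of simulated time takes at least as long
    in real time. Sketch only; names and units are hypothetical."""

    def __init__(self, sim_period, clock=time.monotonic, sleep=time.sleep):
        self.sim_period = sim_period  # simulated seconds between checks
        self.clock = clock
        self.sleep = sleep
        self.last_real = clock()

    def check(self):
        """Call once per sim_period of simulated time.

        Returns the number of real seconds the simulation was stalled.
        """
        elapsed = self.clock() - self.last_real
        stall = max(0.0, self.sim_period - elapsed)
        if stall > 0:
            # Real time is behind simulated time, so wait for it to catch up.
            self.sleep(stall)
        self.last_real = self.clock()
        return stall


if __name__ == "__main__":
    sync = TimeSync(sim_period=0.05)
    for _ in range(3):
        # Pretend the events in this period took almost no real time.
        print("stalled for %.3f s" % sync.check())
```

When the simulator is busy, the elapsed real time exceeds the period and the check returns immediately, which matches the expectation that time syncing rarely triggers during a benchmark.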

A future change could add pseudo instructions which turn time syncing on and
off from within the simulation. That would allow time syncing to be used for
the interactive parts of a session but then turned off when running a
benchmark using the m5 utility program inside a script. Time syncing would
probably not happen anyway while running a benchmark because there would be
plenty of work for M5 to do, but the event overhead could be avoided.


Diffs
-

  SConstruct c06505ff551e
  configs/example/fs.py c06505ff551e
  src/sim/Root.py c06505ff551e
  src/sim/SConscript c06505ff551e
  src/sim/root.hh PRE-CREATION
  src/sim/root.cc c06505ff551e

Diff: http://reviews.m5sim.org/r/419/diff


Testing
---


Thanks,

Gabe










Re: [m5-dev] Review Request: IntDev: latency fix

2011-01-07 Thread Gabriel Michael Black
Ok, yeah, looking at it again I think you probably have something. I  
keep mixing up the port and the containing device in my head when I  
think about this. A revised version of my suggestion would be to move  
the latency into IntDev and use it there. Sorry if I propagated any of  
my own confusion.


Gabe

Quoting Joel Hestness hestn...@cs.utexas.edu:





On 2011-01-07 04:34:28, Gabe Black wrote:
 See review of the earlier IntDev patch. Basically this is  
displacing the latency value from the base class that uses it into  
the subclass that gets it from the config. I don't think it's  
necessary as described previously, but also that decentralizes a  
value that's always used in the same place for the same purpose.


**Note that this patch removes the latency member from IntPort.**   
This patch doesn't indicate where the latency member should end up  
(I'll comment on that in the other review request).  Regardless of  
where the latency is handled, the rest of the codebase indicates  
that a port should not be responsible for assessing latency (see  
mem/port.*, mem/tport.* and mem/mport.*), so this is why I removed  
latency from the IntPort definition.



- Joel


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/384/#review641
---


On 2011-01-06 15:57:01, Brad Beckmann wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/384/
---

(Updated 2011-01-06 15:57:01)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

IntDev: latency fix

Since the device should be responsible for latency of packets, remove the
latency field of the IntPort completely.


Diffs
-

  src/dev/x86/intdev.hh 9f9e10967912

Diff: http://reviews.m5sim.org/r/384/diff


Testing
---


Thanks,

Brad










Re: [m5-dev] Review Request: m5: added work completed monitoring support

2011-01-07 Thread Gabriel Michael Black

Please update existing review requests instead of creating new ones.

Gabe

Quoting Brad Beckmann brad.beckm...@amd.com:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/418/
---

Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

m5: added work completed monitoring support


Diffs
-

  configs/common/FSConfig.py 9f9e10967912
  configs/common/Options.py 9f9e10967912
  configs/example/fs.py 9f9e10967912
  configs/example/ruby_fs.py 9f9e10967912
  src/arch/x86/isa/decoder/two_byte_opcodes.isa 9f9e10967912
  src/cpu/base.hh 9f9e10967912
  src/cpu/base.cc 9f9e10967912
  src/sim/SConscript 9f9e10967912
  src/sim/System.py 9f9e10967912
  src/sim/pseudo_inst.hh 9f9e10967912
  src/sim/pseudo_inst.cc 9f9e10967912
  src/sim/system.hh 9f9e10967912
  src/sim/system.cc 9f9e10967912
  util/m5/m5op_x86.S 9f9e10967912
  util/m5/m5ops.h 9f9e10967912

Diff: http://reviews.m5sim.org/r/418/diff


Testing
---


Thanks,

Brad







Re: [m5-dev] Review Request: MessagePort: implemented virtual recvTiming avoiding double delete

2011-01-07 Thread Gabriel Michael Black
I'll have to look at this again and see if I can figure out what's  
going on. For now I wanted to mention that Valgrind isn't necessarily  
going to be useful in determining if there's a memory leak here  
because these messages are sent infrequently and only leak a little  
bit each time. In the course of a simulation there will be a lot of  
other stuff that doesn't get deallocated, and this would pretty easily  
slip into the noise. I think it might break out things that it only  
thinks are leaks and memory that's no longer reachable, but we've got  
enough crazy stuff going on with python/swig/etc., that may not be  
very accurate. You might keep a count of the packets that have been  
allocated but not deleted and see if it gradually goes up on average  
or stays fairly steady.
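
One way to act on the counting suggestion above is to log the number of live packets (allocated minus deleted) at regular intervals and fit a line through the samples. The Python sketch below is only an illustration: `live_count_slope` and the sample data are hypothetical, and it assumes the simulator has been instrumented separately to print those counts.

```python
def live_count_slope(samples):
    """Least-squares slope of live-packet counts versus sample index.

    A slope near zero suggests the count is steady; a persistently
    positive slope suggests packets are being leaked.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / float(n)
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


# Hypothetical counts sampled at fixed intervals (allocated minus deleted).
steady = [12, 15, 11, 14, 12, 13, 15, 11]
leaking = [12, 15, 14, 18, 17, 21, 22, 24]

print("steady slope:  %.2f" % live_count_slope(steady))
print("leaking slope: %.2f" % live_count_slope(leaking))
```

A slope is less sensitive to noise than comparing the first and last sample, which matters here because the leak, if any, is small per message.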


Gabe

Quoting Joel Hestness hestn...@cs.utexas.edu:





On 2011-01-07 04:21:05, Gabe Black wrote:
 I think there are two problems with this patch. First, if at all  
possible we should avoid the code duplication we'd now have for the  
recvTiming function. Second, while this probably does fix the  
legitimate problem of deleting packets twice, I think it creates a  
memory leak in the process. I suspect if you leave your other  
changes in place but get rid of your custom recvTiming function,  
things will still work. The packet won't be deleted by the device,  
won't be deleted after being received as a request in either atomic  
or timing mode, but will be deleted in both modes after being  
received as a response. The virtual you added in tport.hh could  
almost certainly go away then too.


Brad Beckmann wrote:
Joel is the one who actually wrote this patch, so hopefully he  
can elaborate on the possible memory leak.  I'll hold off on  
this patch until he can respond.


Actually, the double delete problem still exists if we removed the  
(almost) replicated recvTiming code.  This is because  
pkt->needsResponse() returns false when the message type is  
MemCmd::MessageResp, which causes execution of the needsResponse  
else clause in SimpleTimingPort::recvTiming.  It would be freed  
there, as well as in recvAtomic.


I think when I tested this with Valgrind, I didn't see the memory  
leak (doesn't mean it doesn't exist).  However, I don't think I was  
able to justify to myself why it didn't occur.


I remember that I spent a while trying to figure out how to make  
this work nicely, but the inheritance SimpleTimingPort ->  
MessagePort -> IntPort, and the overloading that that implies makes  
this quite difficult to analyze.  For instance, I'm still not clear  
why the new MemCmd, MessageReq/Resp, needed to be defined for this.




On 2011-01-07 04:21:05, Gabe Black wrote:
 src/mem/tport.hh, line 145
 http://reviews.m5sim.org/r/382/diff/1/?file=9048#file9048line145

 Marking this as explicitly virtual shouldn't really be  
necessary. Is there a reason you want to?


I think I had trouble compiling since MessagePort overloads  
recvTiming.  In this patch, MessagePort would become the first  
(only) descendant class of SimpleTimingPort that overloads recvTiming.



- Joel


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/382/#review639
---


On 2011-01-06 15:56:19, Brad Beckmann wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/382/
---

(Updated 2011-01-06 15:56:19)


Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt,  
and Nathan Binkert.



Summary
---

MessagePort: implemented virtual recvTiming avoiding double delete

Double packet delete problem is due to an interrupt device deleting a packet
that the SimpleTimingPort also deletes. Since MessagePort descends from
SimpleTimingPort, simply reimplement the failing code from SimpleTimingPort:
recvTiming.


Diffs
-

  src/arch/x86/interrupts.cc 9f9e10967912
  src/dev/x86/intdev.hh 9f9e10967912
  src/mem/mport.hh 9f9e10967912
  src/mem/mport.cc 9f9e10967912
  src/mem/tport.hh 9f9e10967912

Diff: http://reviews.m5sim.org/r/382/diff


Testing
---


Thanks,

Brad










Re: [m5-dev] changeset in m5: ARM: Add checkpointing support

2010-12-09 Thread Gabriel Michael Black
In reference to this and the next email, I don't think these should be  
conditioned on the ARM ISA because it looks like those values are  
valid and need to be serialized for all ISAs. It's just that the other  
ISAs happen to have checkpoints that are old enough that they break,  
and that's not a good enough reason to introduce a superficial ISA  
incompatibility.


The core problem, which is a real issue, is that checkpoints created  
with one version of M5 aren't necessarily compatible with other  
versions, although they tend to be. That's unfortunate, but I think to  
avoid really binding our hands as far as how we can change things,  
that's just one of those things we'll need to live with. We could try  
to implement some sort of versioning system maybe just to warn that  
something is incompatible, or maybe have a default value if something  
isn't found. I'm not sure whether that would solve the problem,  
though, and it would add a decent amount of complexity.
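
To make the versioning and default-value ideas a little more concrete, here is a rough Python sketch. It assumes a checkpoint stored as INI-style text; the `version` key, the section names, and the helpers `load_checkpoint` and `read_param` are made up for illustration and are not part of M5.

```python
import configparser
import warnings

CURRENT_VERSION = 2  # hypothetical version written by the running simulator


def load_checkpoint(text):
    """Parse INI-style checkpoint text and warn if its version is older."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    # A checkpoint with no version key at all is treated as version 0.
    version = cp.getint("Globals", "version", fallback=0)
    if version < CURRENT_VERSION:
        warnings.warn("checkpoint version %d is older than %d; "
                      "missing values will use defaults"
                      % (version, CURRENT_VERSION))
    return cp


def read_param(cp, section, name, default):
    """Return the stored value, or the default if the key is absent."""
    return cp.get(section, name, fallback=default)


old_checkpoint = """
[Globals]
version = 1

[system.physmem]
size = 134217728
"""

cp = load_checkpoint(old_checkpoint)
# lal_addr did not exist when this checkpoint was written, so fall back.
lal_addr = read_param(cp, "system.physmem", "lal_addr", "0")
print("lal_addr =", lal_addr)
```

Whether a default is actually safe depends on the value being restored, which is part of why this may not solve the problem in general.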


As far as regression testing, that would be great. There have been a  
lot of times where I've wished we had that since I accidentally broke  
something :-). I remember hearing that would be hard to do for some  
reason, though.


Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:


Hi Ali,

I just synced with this changeset 7733, as well as changeset 7730,  
and I now notice that the modifications to physical.cc break all  
previous checkpoints.  Can we put the lal_addr and lal_cid  
serialization and unserialization in a conditional that tests for  
the ARM ISA?  I welcome other suggestions as well.


In general, I would be interested to hear other people's thoughts on  
adding a checkpoint test to the regression tester.  It would be  
great if we can at least identify ahead of time what changesets  
break older checkpoints.


Brad




-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
Behalf Of Ali Saidi
Sent: Monday, November 08, 2010 11:59 AM
To: m5-dev@m5sim.org
Subject: [m5-dev] changeset in m5: ARM: Add checkpointing support

changeset 08d6a773d1b6 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=08d6a773d1b6
description:
ARM: Add checkpointing support

diffstat:

 src/arch/arm/isa.hh  |  12 +-
 src/arch/arm/linux/system.cc |   5 +-
 src/arch/arm/linux/system.hh |   4 +-
 src/arch/arm/pagetable.hh|  87 +++

 src/arch/arm/table_walker.cc |  16 ++-
 src/arch/arm/table_walker.hh |   2 +-
 src/arch/arm/tlb.cc  |  14 ++-
 src/arch/arm/tlb.hh  |   2 -
 src/dev/arm/gic.cc   |  44 +-
 src/dev/arm/pl011.cc |  42 -
 src/dev/arm/rv_ctrl.cc   |   2 -
 src/dev/arm/timer_sp804.cc   |  59 -
 src/dev/arm/timer_sp804.hh   |   4 ++
 src/mem/physical.cc  |  30 +++
 src/mem/physical.hh  |   5 ++
 src/sim/SConscript   |   1 +
 src/sim/system.cc|   2 +-
 src/sim/system.hh|   2 +-
 18 files changed, 268 insertions(+), 65 deletions(-)

diffs (truncated from 587 to 300 lines):

diff -r a2c660de7787 -r 08d6a773d1b6 src/arch/arm/isa.hh
--- a/src/arch/arm/isa.hh   Mon Nov 08 13:58:24 2010 -0600
+++ b/src/arch/arm/isa.hh   Mon Nov 08 13:58:25 2010 -0600
@@ -178,10 +178,18 @@
 }

 void serialize(EventManager *em, std::ostream &os)
-{}
+{
+DPRINTF(Checkpoint, "Serializing Arm Misc Registers\n");
+SERIALIZE_ARRAY(miscRegs, NumMiscRegs);
+}
 void unserialize(EventManager *em, Checkpoint *cp,
 const std::string &section)
-{}
+{
+DPRINTF(Checkpoint, "Unserializing Arm Misc Registers\n");
+UNSERIALIZE_ARRAY(miscRegs, NumMiscRegs);
+CPSR tmp_cpsr = miscRegs[MISCREG_CPSR];
+updateRegMap(tmp_cpsr);
+}

 ISA()
 {
diff -r a2c660de7787 -r 08d6a773d1b6 src/arch/arm/linux/system.cc
--- a/src/arch/arm/linux/system.cc  Mon Nov 08 13:58:24 2010 -0600
+++ b/src/arch/arm/linux/system.cc  Mon Nov 08 13:58:25 2010 -0600
@@ -99,9 +99,9 @@
 }

 void
-LinuxArmSystem::startup()
+LinuxArmSystem::initState()
 {
-ArmSystem::startup();
+ArmSystem::initState();
 ThreadContext *tc = threadContexts[0];

 // Set the initial PC to be at start of the kernel code
@@ -117,7 +117,6 @@
 {
 }

-
 LinuxArmSystem *
 LinuxArmSystemParams::create()
 {
diff -r a2c660de7787 -r 08d6a773d1b6 src/arch/arm/linux/system.hh
--- a/src/arch/arm/linux/system.hh  Mon Nov 08 13:58:24 2010 -0600
+++ b/src/arch/arm/linux/system.hh  Mon Nov 08 13:58:25 2010 -0600
@@ -67,8 +67,8 @@
 LinuxArmSystem(Params *p);
 ~LinuxArmSystem();

-/** Initialize the CPU for booting */
-void startup();
+void initState();
+
   private:
 #ifndef NDEBUG
 /** Event to halt the simulator if the kernel calls panic()  

Re: [m5-dev] changeset in m5: ARM: Add checkpointing support

2010-12-09 Thread Gabriel Michael Black
Right, I don't want to sound like those sort of changes should just be  
done cavalierly. I'm just saying we shouldn't be afraid to make them  
when we need to.


Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:


Thanks Gabe.

Yeah, I don't think conditioning on the ARM ISA for this change is  
necessarily a good idea.  I just suggested it to see if there was  
some sort of conditional we could add.  As far as the _size  
parameter goes, it is not clear to me why that needs to be in the  
checkpoint.  Isn't the memory size passed in from the configuration?  
 Maybe I'm not fully understanding Ali's change.


Overall, I think we need to be careful by just saying that these  
sort of changes are just something people need to deal with.  It is  
really hard for someone to easily fix these sort of problems because  
it often requires understanding someone else's code in a good amount  
of detail.  I agree that any sort of versioning system would add a  
decent amount of complexity.  But continually dealing with these  
sorts of problems as well as helping others deal with them, will not  
be easy either.


Brad




-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
Behalf Of Gabriel Michael Black
Sent: Thursday, December 09, 2010 9:04 PM
To: m5-dev@m5sim.org
Subject: Re: [m5-dev] changeset in m5: ARM: Add checkpointing support

In reference to this and the next email, I don't think these should be
conditioned on the ARM ISA because it looks like those values are
valid and need to be serialized for all ISAs. It's just that the other
ISAs happen to have checkpoints that are old enough that they break,
and that's not a good enough reason to introduce a superficial ISA
incompatibility.

The core problem, which is a real issue, is that checkpoints created
with one version of M5 aren't necessarily compatible with other
versions, although they tend to be. That's unfortunate, but I think to
avoid really binding our hands as far as how we can change things,
that's just one of those things we'll need to live with. We could try
to implement some sort of versioning system maybe just to warn that
something is incompatible, or maybe have a default value if something
isn't found. I'm not sure whether that would solve the problem,
though, and it would add a decent amount of complexity.

As far as regression testing, that would be great. There have been a
lot of times where I've wished we had that since I accidentally broke
something :-). I remember hearing that would be hard to do for some
reason, though.

Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:

 Hi Ali,

 I just synced with this changeset 7733, as well as changeset 7730,
 and I now notice that the modifications to physical.cc break all
 previous checkpoints.  Can we put the lal_addr and lal_cid
 serialization and unserialization in a conditional that tests for
 the ARM ISA?  I welcome other suggestions as well.

 In general, I would be interested to hear other people's thoughts on
 adding a checkpoint test to the regression tester.  It would be
 great if we can at least identify ahead of time what changesets
 break older checkpoints.

 Brad



 -Original Message-
 From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On
 Behalf Of Ali Saidi
 Sent: Monday, November 08, 2010 11:59 AM
 To: m5-dev@m5sim.org
 Subject: [m5-dev] changeset in m5: ARM: Add checkpointing support

 changeset 08d6a773d1b6 in /z/repo/m5
 details: http://repo.m5sim.org/m5?cmd=changeset;node=08d6a773d1b6
 description:
ARM: Add checkpointing support

 diffstat:

  src/arch/arm/isa.hh  |  12 +-
  src/arch/arm/linux/system.cc |   5 +-
  src/arch/arm/linux/system.hh |   4 +-
  src/arch/arm/pagetable.hh|  87 +++-
---
 
  src/arch/arm/table_walker.cc |  16 ++-
  src/arch/arm/table_walker.hh |   2 +-
  src/arch/arm/tlb.cc  |  14 ++-
  src/arch/arm/tlb.hh  |   2 -
  src/dev/arm/gic.cc   |  44 +-
  src/dev/arm/pl011.cc |  42 -
  src/dev/arm/rv_ctrl.cc   |   2 -
  src/dev/arm/timer_sp804.cc   |  59 -
  src/dev/arm/timer_sp804.hh   |   4 ++
  src/mem/physical.cc  |  30 +++
  src/mem/physical.hh  |   5 ++
  src/sim/SConscript   |   1 +
  src/sim/system.cc|   2 +-
  src/sim/system.hh|   2 +-
  18 files changed, 268 insertions(+), 65 deletions(-)

 diffs (truncated from 587 to 300 lines):

 diff -r a2c660de7787 -r 08d6a773d1b6 src/arch/arm/isa.hh
 --- a/src/arch/arm/isa.hh  Mon Nov 08 13:58:24 2010 -0600
 +++ b/src/arch/arm/isa.hh  Mon Nov 08 13:58:25 2010 -0600
 @@ -178,10 +178,18 @@
  }

  void serialize(EventManager *em, std::ostream &os)
 -{}
 +{
 +DPRINTF(Checkpoint, "Serializing Arm Misc
Registers\n");
 +SERIALIZE_ARRAY(miscRegs, NumMiscRegs

Re: [m5-dev] param type documentation

2010-12-06 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:


I put together a wiki page here:

http://m5sim.org/wiki/index.php/Python_Parameter_Types

that attempts to document the parameter types that are available and
how they work. If the experts (that's probably you, Steve and Nate)
could look it over and make sure I didn't misinterpret/mangle
something, I'd appreciate it.


Looks pretty good. I have one question about this:

IPAddress: Four 2 digit hex values separated by .s, for instance
01.23.45.67, or an integer where the leftmost component is the most
significant byte.


Sorry crazy mailer.  Is it really hex?  Who uses hex for IP Addrs?  It
should be integers between 0 and 255

  Nate



No, it isn't. Fortunately that's a bug in my documentation and not the  
parameter types. I'll fix it.
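
For reference, the corrected description (four decimal components between 0 and 255, with the leftmost component as the most significant byte) corresponds to something like the following Python sketch. The function name `ip_to_int` is hypothetical and this is not the actual parameter class.

```python
def ip_to_int(text):
    """Convert dotted-decimal 'a.b.c.d' to a 32-bit integer.

    The leftmost component becomes the most significant byte.
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four components: %r" % text)
    value = 0
    for part in parts:
        byte = int(part, 10)  # decimal, not hex
        if not 0 <= byte <= 255:
            raise ValueError("component out of range: %r" % part)
        value = (value << 8) | byte
    return value


print(hex(ip_to_int("192.168.0.1")))
```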


Gabe



Re: [m5-dev] Review Request: O3: Make all instructions that write a misc register not perform the write until commit.

2010-11-15 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



On Nov 15, 2010, at 8:29 PM, Gabriel Michael Black wrote:
It seems like our notion of serializing came from the definition  
Alpha uses, and sometimes we need a stronger one. X86 is going to  
have similar issues in some cases, although I couldn't necessarily  
list them for you off hand. The nuke everything flag you're  
proposing might be the best solution because I doubt we'd ever need  
anything stronger than that. Maybe you could make the CPU stop  
fetching too, but I don't see how that would be useful and it's  
probably very hard to do.


This also highlights the usefulness of target tests of particular  
features like changing the ISA and then immediately using it as  
apposed to getting specific workloads to work. The compiler, code  
author, etc., only wants to achieve a functional result, and  
they'll probably use the same structure and features over and over  
again since those work well, are equivalently good to the other  
options, etc. There are swathes of x86, which granted is very  
large, that aren't implemented at all and Linux boots just fine,  
but depending on how picky you are you could say those areas are  
severely broken. The same thing could be happening less  
intentionally elsewhere.


I don't mean to pick on you Gabe, I'm just surprised that when the  
compiler inserts wr asi it happens to also include enough padding  
before it needs it that it worked in the past. I think the load that  
used it would minimally have to be a cache line away. It's just  
bizarre to me unless there was an interlock on wr asi that they knew  
about. Anyway, I've added a flag called isSquashState. I don't like  
the name, so please propose a better one. isSquashAfter? After  
commit in the o3 cpu it checks for the flag and squashes the world.  
This clears up the issue enough for gzip to start running. I should  
know in 3 hours if it is sufficient.


Ali




No problem, I didn't think you were picking on me :-). Weren't there  
fake faults introduced to ARM that were used for a similar effect? The  
flag approach is better I think, but the instructions that throw that  
fault could be refitted, right?


Gabe


[m5-dev] param type documentation

2010-11-11 Thread Gabriel Michael Black

I put together a wiki page here:

http://m5sim.org/wiki/index.php/Python_Parameter_Types

that attempts to document the parameter types that are available and  
how they work. If the experts (that's probably you, Steve and Nate)  
could look it over and make sure I didn't misinterpret/mangle  
something, I'd appreciate it.


Gabe



Re: [m5-dev] changeset in m5: scons: Work around for old versions of scons mi...

2010-11-11 Thread Gabriel Michael Black
I'd guess it's trying to be smart and make sure the single string  
argument ends up being a single entry in argv. I'd also guess that's  
why g++ is upset.
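
A small Python illustration of that guess, independent of scons itself: a list element containing a space is one argument, and anything that renders it back into a shell command line has to quote it to keep it that way. `shlex` is from the standard library (`shlex.join` needs Python 3.8 or later); the flag lists mirror the ones in the changeset.

```python
import shlex

one_entry = ['-arch x86_64']        # a single argv entry with a space in it
two_entries = ['-arch', 'x86_64']   # two separate argv entries

# Rendering each list as a shell command line: the single entry must be
# quoted to stay a single argument, which is the form g++ rejects.
print(shlex.join(one_entry))
print(shlex.join(two_entries))

# Splitting a plain string on whitespace gives the two-entry form.
print(shlex.split('-arch x86_64'))
```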


Gabe

Quoting Ali Saidi sa...@umich.edu:



g++ -o build/ARM_FS/arch/arm/predecoder.do -c -Wno-deprecated -pipe  
-fno-strict-aliasing -Wall -Wno-sign-compare -Wundef "-arch x86_64"  
-ggdb3 -Werror -DDEBUG -DTRACING_ON=1 -Ibuild/gzstream  
-Ibuild/libelf -Iext  
-I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6  
-Ibuild/ARM_FS build/ARM_FS/arch/arm/predecoder.cc

cc1plus: error: unrecognized command line option "-arch x86_64"

$ g++ -o build/ARM_FS/arch/arm/predecoder.do -c -Wno-deprecated  
-pipe -fno-strict-aliasing -Wall -Wno-sign-compare -Wundef  
"-arch x86_64" -ggdb3 -Werror -DDEBUG -DTRACING_ON=1 -Ibuild/gzstream  
-Ibuild/libelf -Iext  
-I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6  
-Ibuild/ARM_FS build/ARM_FS/arch/arm/predecoder.cc

cc1plus: error: unrecognized command line option "-arch x86_64"
$
$ g++ -o build/ARM_FS/arch/arm/predecoder.do -c -Wno-deprecated  
-pipe -fno-strict-aliasing -Wall -Wno-sign-compare -Wundef -arch  
x86_64 -ggdb3 -Werror -DDEBUG -DTRACING_ON=1 -Ibuild/gzstream  
-Ibuild/libelf -Iext  
-I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6  
-Ibuild/ARM_FS build/ARM_FS/arch/arm/predecoder.cc

$

for whatever reason g++ doesn't like "-arch x86_64", but -arch  
x86_64 is fine. Why is it quoting it anyway?


Ali




On Thu, 11 Nov 2010 14:04:26 -0800, nathan binkert n...@binkert.org wrote:

Out of curiosity, what happens if you do ['-arch', 'x86_64' ]

I haven't compiled on my mac in a little while.  I can try next week.

 Nate


This patch breaks compilation on m5 mac with scons 1.3.0. reverting the []
around the -arch x86_64 fixes the problem. Nate, have you run into this?

Thanks,
Ali

On Tue, 09 Nov 2010 14:04:05 -0500, Gabe Black gbl...@eecs.umich.edu
wrote:


changeset f97a5f4d0879 in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=f97a5f4d0879
description:
       scons: Work around for old versions of scons mistaking strings for
sequences.

diffstat:

 SConstruct            |  32 
 ext/libelf/SConscript |   2 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diffs (86 lines):

diff -r e2e8ca8d9640 -r f97a5f4d0879 SConstruct
--- a/SConstruct        Tue Nov 09 10:45:02 2010 -0800
+++ b/SConstruct        Tue Nov 09 11:03:40 2010 -0800
@@ -358,10 +358,10 @@

 # Set up default C++ compiler flags
 if main['GCC']:
-    main.Append(CCFLAGS='-pipe')
-    main.Append(CCFLAGS='-fno-strict-aliasing')
+    main.Append(CCFLAGS=['-pipe'])
+    main.Append(CCFLAGS=['-fno-strict-aliasing'])
    main.Append(CCFLAGS=['-Wall', '-Wno-sign-compare', '-Wundef'])
-    main.Append(CXXFLAGS='-Wno-deprecated')
+    main.Append(CXXFLAGS=['-Wno-deprecated'])
    # Read the GCC version to check for versions with bugs
    # Note CCVERSION doesn't work here because it is run with the CC
    # before we override it from the command line
@@ -369,16 +369,16 @@
    if not compareVersions(gcc_version, '4.4.1') or \
       not compareVersions(gcc_version, '4.4.2'):
        print 'Info: Tree vectorizer in GCC 4.4.1 & 4.4.2 is buggy,
disabling.'
-        main.Append(CCFLAGS='-fno-tree-vectorize')
+        main.Append(CCFLAGS=['-fno-tree-vectorize'])
 elif main['ICC']:
    pass #Fix me... add warning flags once we clean up icc warnings
 elif main['SUNCC']:
-    main.Append(CCFLAGS='-Qoption ccfe')
-    main.Append(CCFLAGS='-features=gcc')
-    main.Append(CCFLAGS='-features=extensions')
-    main.Append(CCFLAGS='-library=stlport4')
-    main.Append(CCFLAGS='-xar')
-    #main.Append(CCFLAGS='-instances=semiexplicit')
+    main.Append(CCFLAGS=['-Qoption ccfe'])
+    main.Append(CCFLAGS=['-features=gcc'])
+    main.Append(CCFLAGS=['-features=extensions'])
+    main.Append(CCFLAGS=['-library=stlport4'])
+    main.Append(CCFLAGS=['-xar'])
+    #main.Append(CCFLAGS=['-instances=semiexplicit'])
 else:
    print 'Error: Don\'t know what compiler options to use for your
compiler.'
    print '       Please fix SConstruct and src/SConscript and try again.'
@@ -399,7 +399,7 @@

 if sys.platform == 'cygwin':
    # cygwin has some header file issues...
-    main.Append(CCFLAGS="-Wno-uninitialized")
+    main.Append(CCFLAGS=["-Wno-uninitialized"])

 # Check for SWIG
 if not main.has_key('SWIG'):
@@ -489,10 +489,10 @@
    uname = platform.uname()
    if uname[0] == 'Darwin' and compareVersions(uname[2], '9.0.0') >= 0:
        if int(readCommand('sysctl -n hw.cpu64bit_capable')[0]):
-            main.Append(CCFLAGS='-arch x86_64')
-            main.Append(CFLAGS='-arch x86_64')
-            main.Append(LINKFLAGS='-arch x86_64')
-            main.Append(ASFLAGS='-arch x86_64')
+            main.Append(CCFLAGS=['-arch x86_64'])
+            main.Append(CFLAGS=['-arch x86_64'])
+            main.Append(LINKFLAGS=['-arch x86_64'])
+            

Re: [m5-dev] Times completeAcc has to be called on stores

2010-11-11 Thread Gabriel Michael Black

Sorry about that. Do you need to fix it, or me? It may be simpler if you do.

That also brings up the question of testing with ARM. Internal to ARM  
testing is fairly sophisticated and gets at most of what's implemented  
instruction wise. External to ARM testing is much less so, so it's  
hard to know whether something broke. I don't have any suggestions  
there, but it would be nice to bring those into alignment somehow.  
Maybe an FS regression would cover a lot of the gaps? I'm pretty sure  
that's in the works, right?


Gabe

Quoting Ali Saidi sa...@umich.edu:



Turns out this isn't entirely correct.

SRS is writing back in initiateAcc() and your change fe91d5e2c374  
unfortunately breaks those cases. It really shouldn't be doing this,  
it either needs to do it in completeAcc or needs to be a macro-op,  
but it broke some stuff.


Ali

On Tue, 19 Oct 2010 15:24:56 -0700, Gabe Black gbl...@eecs.umich.edu wrote:

I've been surveying all the ISAs to see how they're using initiateAcc
and completeAcc, and it looks like there are two cases where completeAcc
needs to be (and apparently is) called on stores or store like
instructions. The first are the StoreConditional instructions which have
to (at least in some cases) write whether the store succeeded somewhere.
The second are the swap instructions which have to write the old value
of memory to a register. I don't expect this to be a revelation for
anyone and it's not the source of any problems, but I wanted to verify
what I'm saying here is correct and see if anyone can think of any other
places where completeAcc needs to be called.

I've verified, as best I can by inspection, that initiateAcc isn't doing
any writing back of registers. That's good, because that wouldn't work
anyway. Some templates in ARM make it look like it might, but that's
just from copy/paste, I think, and tracing back to where the templates
get filled in and with what, there shouldn't be any registers to write.

Gabe


Re: [m5-dev] Review Request: Scons: Try to make SCons output prettier.

2010-11-10 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:

Feel free to propose some more verbose names, but the whole idea is  
to remove the verbosity. Almost everything else but [CC] and [SW]  
gets executed once per build. If there is an error it's going to be  
pretty obvious where it came from, especially with GN where it's  
just a simple replacement of Generating: with [GN]. Honestly, how  
many people do you think will be messing with the parser? It's less  
than 5. We've run out of ISAs at this point. The only ISAs that  
come to mind at the moment are PARISC and IA64. One is dead and the  
other might as well be.


A few things come to mind.
1) I think slightly more verbose names are OK and probably good.
We've cut down significantly on the verbosity already, so we can add
some back.
2) One thing that needs to be thought about is SOURCE vs TARGET.  It
seems that we should always use one or the other, no?  My guess is
that we should always use $TARGET.  (Should we use both?)
3) If we choose $SOURCE, should it be from the actual source directory
(i.e. not the BUILDDIR copy), or not?
4) I think we should strip BUILDDIR from the output (which for many
people is just build/, but for me is much longer).  Stripping build/
adds room for #2.  I sent Ali code for this.  (And I just realized
that there is better code so talk to me)

  Nate




Here are my renaming suggestions. The first alternative is if we  
restrict tags to 7 characters which seems to be what Linux was doing  
judging by the spacing in the commands from the Makefile. The second  
alternative is if we go with arbitrarily long but reasonable tags. The  
right answer might be somewhere between the two alternatives. If  
what I've got here sucks and you have a better idea, please suggest it.


C - CC - CC
CC - CXX - CXX
AS - AS - AS
SW - SWIG - SWIG
AR - AR - AR
LN - LD - LD
RN - RANLIB - RANLIB
M4 - M4 - M4
GN - HEADER - SWITCH_HDR
IA - THE_ISA - THE_ISA
DF - DEFINES - DEFINES_PY
IF - INFO_PY - INFO_PY
SM - C_PARAM - SIMOBJ_PARAM
SG - S_PARAM - SWIG_PARAM
ES - C_ENUM - ENUM_STRS
EW - S_ENUM - ENUM_PARAM
PM - B_PARAM - PARAM    How specifically is this different from SG?
SW - SWIG - SWIG
TF - TRACE - TRACE_FLAGS
EP - P_EMBED - EMBED_PY
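
As a rough illustration of the first alternative, this Python sketch pads each tag to a 7-character column and strips the build directory from the target path. It is standalone and does not use scons; `format_line` and the example paths are hypothetical.

```python
TAG_WIDTH = 7  # width of the tag column in the first alternative


def format_line(tag, target, build_dir="build"):
    """Return a short build line such as 'SWIG    X86_FS/foo.cc'."""
    if len(tag) > TAG_WIDTH:
        raise ValueError("tag %r is longer than %d characters"
                         % (tag, TAG_WIDTH))
    # Drop the leading build directory so more of the path fits on a line.
    prefix = build_dir.rstrip("/") + "/"
    if target.startswith(prefix):
        target = target[len(prefix):]
    # Left-justify the tag in a fixed-width column, then the target.
    return "%-*s %s" % (TAG_WIDTH, tag, target)


for tag, target in [("CXX", "build/ARM_FS/arch/arm/predecoder.o"),
                    ("SWIG", "build/ARM_FS/python/swig/core.i"),
                    ("C_PARAM", "build/ARM_FS/params/System.hh")]:
    print(format_line(tag, target))
```

With the second alternative the width would have to grow to the longest tag (SIMOBJ_PARAM is 12 characters), which is the trade-off between the two columns above.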

Gabe


Re: [m5-dev] Review Request: Scons: Try to make SCons output prettier.

2010-11-09 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:



Generally I'm in favor of this, but I think two characters are too  
few, and the abbreviations are almost arbitrary in some places.  
It's great to be brief and remove clutter but it's bad to be  
cryptic. I'd be in favor of short but meaningful tags like [PARAMS]  
or [ISA] or even [SWITCH HEADER].


I feel similarly, though with experience, we would get used to it.  I know

that linux does something like this.  What does it do for things other than
CC and LN?




Well, diving into their Makefile, I found this comment

# Beautify output
# ---
#
# Normally, we echo the whole command before executing it. By making
# that echo $($(quiet)$(cmd)), we now have the possibility to set
# $(quiet) to choose other forms of output instead, e.g.
#
# quiet_cmd_cc_o_c = Compiling $(RELDIR)/$@
# cmd_cc_o_c   = $(CC) $(c_flags) -c -o $@ $<
#
# If $(quiet) is empty, the whole command will be printed.
# If it is set to quiet_, only the short version will be printed.
# If it is set to silent_, nothing will be printed at all, since
# the variable $(silent_cmd_cc_o_c) doesn't exist.
#
# A simple variant is to prefix commands with $(Q) - that's useful
# for commands that shall be hidden in non-verbose mode.
#
#   $(Q)ln $@ :<
#
# If KBUILD_VERBOSE equals 0 then the above command will be hidden.
# If KBUILD_VERBOSE equals 1 then the above command is displayed.









So I ran the grep below and got the output after it. There are a lot  
of examples of longer tags like DOCPROC or BUILD just looking at the  
first few lines. Also note they don't use []s, although I'm not  
deathly opposed if we decide to.








gbl...@cake /usr/src/linux $ grep -R quiet_cmd *
Documentation/kbuild/makefiles.txt: quiet_cmd_command - what  
shall be echoed

Documentation/kbuild/makefiles.txt: quiet_cmd_image = BUILD   $@
Documentation/DocBook/Makefile:quiet_cmd_docproc = DOCPROC $@
Documentation/DocBook/Makefile:quiet_cmd_db2ps = PS  $@
Documentation/DocBook/Makefile:quiet_cmd_db2pdf = PDF $@
Documentation/DocBook/Makefile:quiet_cmd_db2html = HTML$@
Documentation/DocBook/Makefile:quiet_cmd_db2man = MAN $@
Documentation/DocBook/Makefile:quiet_cmd_fig2eps = FIG2EPS $@
Documentation/DocBook/Makefile:quiet_cmd_fig2png = FIG2PNG $@
Kbuild:quiet_cmd_bounds = GEN $@
Kbuild:quiet_cmd_offsets = GEN $@
Kbuild:quiet_cmd_syscalls = CALL    $<
Makefile:# quiet_cmd_cc_o_c = Compiling $(RELDIR)/$@
Makefile:quiet_cmd_vmlinux__ ?= LD  $@
Makefile:quiet_cmd_vmlinux_version = GEN .version
Makefile:quiet_cmd_sysmap = SYSMAP
Makefile:quiet_cmd_kallsyms = KSYM$@
Makefile:quiet_cmd_vmlinux-modpost = LD  $@
Makefile:quiet_cmd_tags = GEN $@
Makefile:quiet_cmd_rmdirs = $(if $(wildcard $(rm-dirs)),CLEAN
$(wildcard $(rm-dirs)))
Makefile:quiet_cmd_rmfiles = $(if $(wildcard $(rm-files)),CLEAN
$(wildcard $(rm-files)))

Makefile:quiet_cmd_depmod = DEPMOD  $(KERNELRELEASE)
Makefile:quiet_cmd_as_o_S = AS  $@
arch/x86/boot/compressed/Makefile:quiet_cmd_relocs = RELOCS  $@
arch/x86/boot/compressed/Makefile:quiet_cmd_mkpiggy = MKPIGGY $@
arch/x86/boot/Makefile:quiet_cmd_cpustr = CPUSTR  $@
arch/x86/boot/Makefile:quiet_cmd_image = BUILD   $@
arch/x86/boot/Makefile:quiet_cmd_voffset = VOFFSET $@
arch/x86/boot/Makefile:quiet_cmd_zoffset = ZOFFSET $@
arch/x86/tools/Makefile:quiet_cmd_posttest = TEST$@
arch/x86/lib/Makefile:quiet_cmd_inat_tables = GEN $@
arch/x86/kernel/cpu/Makefile:quiet_cmd_mkcapflags = MKCAP   $@
arch/x86/vdso/Makefile:quiet_cmd_vdsosym = VDSOSYM $@
arch/x86/vdso/Makefile:quiet_cmd_vdso32sym = VDSOSYM $@
arch/x86/vdso/Makefile:quiet_cmd_vdso = VDSO$@
arch/x86/vdso/Makefile:quiet_cmd_vdso_install = INSTALL $@
arch/avr32/boot/images/Makefile:quiet_cmd_uimage = UIMAGE $@
arch/avr32/boot/images/Makefile:quiet_cmd_sfdwarf = SFDWARF $@
arch/avr32/Makefile:quiet_cmd_listing = LST $@
arch/avr32/Makefile:quiet_cmd_disasm  = DIS $@
arch/cris/boot/compressed/Makefile:quiet_cmd_image = BUILD   $@
arch/microblaze/boot/Makefile:quiet_cmd_cp = CP  $< $@$2
arch/microblaze/boot/Makefile:quiet_cmd_strip = STRIP   $@
arch/microblaze/boot/Makefile:quiet_cmd_uimage = UIMAGE  $@.ub
arch/microblaze/boot/Makefile:quiet_cmd_dtc = DTC $@
arch/alpha/boot/Makefile:quiet_cmd_strip = STRIP  $@
arch/alpha/boot/Makefile:quiet_cmd_objstrip = OBJSTRIP $@
arch/um/kernel/Makefile:quiet_cmd_quote1 = QUOTE   $@
arch/um/kernel/Makefile:quiet_cmd_quote2 = QUOTE   $@
arch/sparc/boot/Makefile:quiet_cmd_elftoaout= ELFTOAOUT $@
arch/sparc/boot/Makefile:quiet_cmd_piggy= PIGGY   $@
arch/sparc/boot/Makefile:quiet_cmd_btfix= BTFIX   $@
arch/sparc/boot/Makefile:quiet_cmd_sysmap= SYSMAP  $(obj)/System.map
arch/sparc/boot/Makefile:quiet_cmd_image = LD  $@
arch/sparc/boot/Makefile:quiet_cmd_strip = STRIP   $@

Re: [m5-dev] Review Request: Scons: Try to make SCons output prettier.

2010-11-08 Thread Gabriel Michael Black

I've done this before, just a sec...

(some googling)

I think you have to use an Action object instead of a raw command in  
the Command builder. When building the Action object, the second  
parameter is the alternative text to output.


It might look like the following:

env.Command(target, source, Action("foo $TARGET $SOURCES", "FOOING $SOURCES"))

The []s are probably not necessary, but that's just my opinion.

It might be better to support a -v or --verbose option on the scons  
command line if we can. An environment variable is a little obscure,  
and it's likely you'll just want verbose output temporarily, not as a  
long term environment setting. I don't really remember whether adding  
command line options to the scons command line is feasible and/or  
advisable, so I'll defer to other people's opinions, but it seems a  
little more natural to me.


Gabe

Quoting Ali Saidi sa...@umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/299/
---

(Updated 2010-11-08 15:49:05.987230)


Review request for Default.


Summary (updated)
---

Scons: Try to make SCons output prettier.

This change has scons print [ C], [CC], [LN], etc. in front of normal
commands instead of the entire commands themselves, and cleans up the
build a good bit. Unfortunately, I couldn't figure out a way to get
the same behavior from env.Command() calls so they're still verbose.


Thoughts? Like it? Hate it?


Diffs
-

  SConstruct f61e079ad05e

Diff: http://reviews.m5sim.org/r/299/diff


Testing
---


Thanks,

Ali

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev






Re: [m5-dev] Review Request: Scons: Try to make SCons output prettier.

2010-11-08 Thread Gabriel Michael Black
Oh wait, that's not an environment variable, that's a scons variable  
from the command line. My opinion still stands since it'd be sticky  
and it's not as nice as a -- option, but it's better than an  
environment variable.


Gabe

Quoting Gabriel Michael Black gbl...@eecs.umich.edu:


I've done this before, just a sec...

(some googling)

I think you have to use an Action object instead of a raw command in  
the Command builder. When building the Action object, the second  
parameter is the alternative text to output.


It might look like the following:

env.Command(target, source, Action("foo $TARGET $SOURCES", "FOOING $SOURCES"))


The []s are probably not necessary, but that's just my opinion.

It might be better to support a -v or --verbose option on the scons  
command line if we can. An environment variable is a little obscure,  
and it's likely you'll just want verbose output temporarily, not as  
a long term environment setting. I don't really remember whether  
adding command line options to the scons command line is feasible  
and/or advisable, so I'll defer to other people's opinions, but it  
seems a little more natural to me.


Gabe

Quoting Ali Saidi sa...@umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/299/
---

(Updated 2010-11-08 15:49:05.987230)


Review request for Default.


Summary (updated)
---

Scons: Try to make SCons output prettier.

This change has scons print [ C], [CC], [LN], etc. in front of normal
commands instead of the entire commands themselves, and cleans up the
build a good bit. Unfortunately, I couldn't figure out a way to get
the same behavior from env.Command() calls so they're still verbose.


Thoughts? Like it? Hate it?


Diffs
-

 SConstruct f61e079ad05e

Diff: http://reviews.m5sim.org/r/299/diff


Testing
---


Thanks,

Ali











[m5-dev] custom parameter types

2010-11-07 Thread Gabriel Michael Black
Is there a way to define custom parameter types, particularly from an  
EXTRAS directory? I want to define an IPAddress type (or similar)  
which takes input in the form a.b.c.d/n (netmask) or a.b.c.d  
(plain IP) or a.b.c.d:p (with port) or something along those lines.  
I see we have an EthernetAddress type, but it looks like that's a MAC  
address and not an IP address. I would also appreciate suggestions on  
how to organize those hypothetical parameter types since an IP with  
netmask wouldn't be used in the same places as an IP with a port, even  
though they're fairly similar.
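
To make the three input forms concrete, here is a minimal parsing sketch. It is not M5's parameter machinery: `IpSpec`, `makeSpec` and `parseIp` are hypothetical names, and folding all three forms into one tagged struct is only one possible answer to the organization question (separate netmask and port types would catch misuse at compile time instead).

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical parsed form; not an existing M5 parameter type.
struct IpSpec {
    enum Kind { Plain, WithNetmask, WithPort, Invalid };
    Kind kind;
    uint32_t addr;   // a.b.c.d packed into one word, a in the top byte
    unsigned extra;  // prefix length or port; 0 for Plain
};

static IpSpec makeSpec(IpSpec::Kind kind, uint32_t addr, unsigned extra)
{
    IpSpec s;
    s.kind = kind;
    s.addr = addr;
    s.extra = extra;
    return s;
}

// Accepts "a.b.c.d", "a.b.c.d/n" (n <= 32) or "a.b.c.d:p" (p <= 65535).
IpSpec parseIp(const std::string &text)
{
    unsigned a = 0, b = 0, c = 0, d = 0, extra = 0;
    char sep = 0, junk = 0;
    // sscanf returns 4 for a.b.c.d, 6 for a.b.c.d<sep>n, and 7 if
    // anything trails the number, which is rejected below.
    int n = std::sscanf(text.c_str(), "%u.%u.%u.%u%c%u%c",
                        &a, &b, &c, &d, &sep, &extra, &junk);
    if ((n != 4 && n != 6) || a > 255 || b > 255 || c > 255 || d > 255)
        return makeSpec(IpSpec::Invalid, 0, 0);

    uint32_t addr = (a << 24) | (b << 16) | (c << 8) | d;
    if (n == 4)
        return makeSpec(IpSpec::Plain, addr, 0);
    if (sep == '/' && extra <= 32)
        return makeSpec(IpSpec::WithNetmask, addr, extra);
    if (sep == ':' && extra <= 65535)
        return makeSpec(IpSpec::WithPort, addr, extra);
    return makeSpec(IpSpec::Invalid, 0, 0);
}
```

A caller would check `kind` before using `extra`, which is exactly the kind of runtime check that separate types would make unnecessary.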


Gabe



Re: [m5-dev] [m5-users] Hope for m5 API for Configuration a system.

2010-11-03 Thread Gabriel Michael Black
The best place to look is in the .py files in the src directory. These  
are where the parameters are set up, and there isn't really any other  
documentation. You shouldn't change the values in those files since  
those just set the defaults, but that will let you know what  
parameters there are and in most cases (I hope) give you some  
information about what they do.


Gabe

Quoting Gdansk Amir gdanska...@gmail.com:


Hello:
   I am an m5 user, and I want to use m5 to build a new system. But when
I try to write a configuration script, I find it very hard to define
the attributes of the objects, because I don't know which attributes each
object has or the exact name of each attribute. So I want to know
whether m5 has API docs for configuration, and how I can get them.
   Many thanks!







Re: [m5-dev] build_dir has been deprecated

2010-11-03 Thread Gabriel Michael Black
A little research shows that the variant_dir keyword has been available
since scons 0.98, which is older than the 0.98.1 we require. It looks like
as of 2.0.1 they started complaining if you use build_dir. I'll put together
a patch to move us over in the next day or two. I expect it to be very  
simple, but given its potential impact I'll still put it on review  
board to give people one last chance to object.


All my info about versions comes from this:
http://www.scons.org/CHANGES-2.x.txt

And my info about our required version comes from this:
http://m5sim.org/wiki/index.php/Compiling_M5

Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:


 I went to build ALPHA_FS just now, and I must have upgraded scons as
part of my most recent system update because now I get a bunch of the
following warnings.

scons: warning: The build_dir keyword has been deprecated; use the
variant_dir keyword instead.
File "/home/gblack/m5/repos/m5/build/ALPHA_FS/SConscript", line 251, in
<module>

Things still seem to work, but do we want to change build_dir to
variant_dir to clean that up? Will that break compatibility with an old
version we still want to work?

Gabe






Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.

2010-11-02 Thread Gabriel Michael Black
I made some changes related to that with x86, although I don't  
remember the specifics. It looks like at least in the timing simple  
CPU completeAcc is still called. The packet will have junk data since  
no access was performed, but it still makes a complete trip for  
whatever reason. I don't know if we really -want- to do things that  
way, but it looks like we currently are.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


Aren't there other NO_ACCESS references (in other ISAs) that call
initiateAcc() but not completeAcc()?  If so, then that by itself doesn't
seem like justification to avoid solution (2). If not, then I suppose I
agree with you.

Steve

On Tue, Nov 2, 2010 at 12:54 PM, Ali Saidi sa...@umich.edu wrote:


Unfortunately, the stats change in all cases. For (1) the instructions no
longer have IsMemRef set which means the num_refs changes for all CPUs and
the change causes some minor changes in the O3. With (2) they're half baked,
so the models call initiateAcc() but it doesn't actually initiate the
access, so completeAcc() is never called and thus they aren't counted as
part of the instruction count. (2) isn't ideal since half-calling the
initiateAcc() might lead to some problems down the road.
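
The effect of the two options on the counters can be pictured with a toy model. This is only a sketch under the assumptions stated in this thread (memory ops are counted as instructions in completeAcc(), and a prefetch that never initiates an access never completes); `TimingCpuModel` and its members are illustrative names, not the real CPU classes or stats.

```cpp
// Toy model of a timing CPU's counters; names are illustrative only.
struct TimingCpuModel {
    unsigned long numInsts;
    unsigned long numRefs;

    TimingCpuModel() : numInsts(0), numRefs(0) {}

    // Memory ops are only counted as instructions once they complete.
    void completeAcc() { ++numInsts; }

    void execute(bool isMemRef, bool accessInitiated)
    {
        if (!isMemRef) {        // option (1): a plain no-op
            ++numInsts;
            return;
        }
        ++numRefs;              // option (2): still counted as a mem ref
        if (accessInitiated)    // a half-baked prefetch never gets here
            completeAcc();
    }
};
```

Running one prefetch through each case gives one instruction and no mem ref for option (1), and one mem ref but no instruction for option (2), which is why the stats move either way.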

I'll post a diff today.

Ali





On Tue, 2 Nov 2010 12:18:08 -0700, Steve Reinhardt ste...@gmail.com
wrote:

Do you mean (1) or (2)?  I thought that with (1) the stats would not
change.

My bias would be (2), but (1) seems livable enough.  In either case it
would be nice to put in a warn_once() if we don't already have one so it's
obvious that SW prefetches are being ignored.

Steve

On Sun, Oct 31, 2010 at 9:45 AM, Ali Saidi sa...@umich.edu wrote:


Any input? Otherwise I'm going with (1) and have new stats to go with it.

Ali

On Oct 27, 2010, at 12:02 AM, Ali Saidi wrote:

 Hmmm... three emails when one should have done. There are three options:
 1. Make them actual no-ops (e.g. stop marking them as mem refs, data
    prefetch, etc). The instruction count will stay the same here. The
    functionality will stay the same. The instructions will be further
    away from working -- not that I think anyone will make them work in
    the future.
 2. Leave them in their half-baked memop state where they're memops that
    never call read() and don't write back anything, so the instruction
    count is different since the inst count gets incremented after the op
    completes. This is what I currently have.
 3. Make them actually work. I've tried to muck with this without success
    for a while now.

 Ali



 On Oct 26, 2010, at 11:58 PM, Ali Saidi wrote:

 The other portion of this, is when I try to make them act like loads,
but not actually write a register I break the o3 cpu in ways that 4 hours
has not been able to explain.

 Ali

 On Oct 26, 2010, at 10:42 PM, Ali Saidi wrote:

 The count gets smaller because since they don't actually access
memory, they never complete and therefore they never increment the
instruction count.

 Ali

 On Oct 26, 2010, at 9:53 PM, Steve Reinhardt wrote:

 I vote for updating the stats... it's really wrong that we ignored
them previously.

 On Tue, Oct 26, 2010 at 5:47 PM, Ali Saidi sa...@umich.edu wrote:
 Ok. So next question. With the CPU model treating prefetches as
normal memory instructions the # of instructions changes for the timing
simple cpu because the inst count stat is incremented in completeAccess().
So, one option is to update the stats to reflect the new count. The other
option would be to stop marking the prefetch instructions as memory ops in
which case they would just execute as nop. Any thoughts?

 Ali





 On Oct 24, 2010, at 12:14 AM, Steve Reinhardt wrote:

 No, we've lived with Alpha prefetches the way they are for long
 enough now I don't see where fixing them buys us that much.

 Steve

 On Sat, Oct 23, 2010 at 6:13 PM, Ali Saidi sa...@umich.edu wrote:
 Sounds good to me. I'll take a look at what I need to do to
implement it.  Any arguments with the Alpha prefetch instructions staying
nops?

 Ali

 On Oct 22, 2010, at 6:52 AM, Steve Reinhardt wrote:

 On Tue, Oct 19, 2010 at 11:14 PM, Ali Saidi sa...@umich.edu
wrote:

 I think the prefetch should be sent to the TLB unconditionally,
and then if the prefetch faults the CPU should toss the instruction rather
than the TLB returning no fault and the CPU, I guess, checking if the PA is
set?

 I agree that we should override the fault in the CPU. Are we
violently agreeing?

 OK, it's becoming a little clearer to me now.  I think we're agreeing
 that the TLB should be oblivious to whether an access is a prefetch or
 not, so that's a start.

 The general picture I'd like to see is that once a prefetch returns
 from the TLB, the CPU does something like:

 if (inst->fault == NoFault) {
     access the cache
 } else if (inst->isPrefetch()) {
     maybe set a flag if necessary
     inst->fault = NoFault;
 }

 ...so basically everywhere else down the pipeline where we check for
 faults we don't have to explicitly except prefetches from 
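
A compilable version of the pseudocode above might look like the following. It is a sketch only: `Fault`, `Inst` and `handleTranslation` are hypothetical stand-ins for the much richer types in the simulator, and the `squashedPrefetch` member is one guess at what "a flag if necessary" could be.

```cpp
// Hypothetical stand-ins; the simulator's Fault and instruction types
// carry far more state than this.
enum class Fault { NoFault, PageFault };

struct Inst {
    Fault fault = Fault::NoFault;   // result of the TLB translation
    bool prefetch = false;
    bool squashedPrefetch = false;  // the "flag if necessary"

    bool isPrefetch() const { return prefetch; }
};

// Called once translation returns. Returns true if the cache should be
// accessed for this instruction.
bool handleTranslation(Inst &inst)
{
    if (inst.fault == Fault::NoFault)
        return true;                    // normal path: access the cache

    if (inst.isPrefetch()) {
        inst.squashedPrefetch = true;   // remember that it was dropped
        inst.fault = Fault::NoFault;    // later fault checks see nothing
    }
    return false;   // a faulting access never reaches the cache
}
```

With the fault cleared at this one point, later pipeline stages can keep their plain fault checks, while a faulting non-prefetch access still carries its fault forward.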

Re: [m5-dev] Review Request: ISA, CPU, etc: Create an ISA defined PC type that abstracts out ISA behaviors.

2010-10-25 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


I don't see any responses from you on reviewboard, Gabe; clicking on
View Reviews does show a review from you at October 24th, 2010,
12:30 a.m., but it's empty as far as I can tell.

Anyway, as I think I mentioned earlier, I'm thinking that email works
better for back-and-forth conversations than trying to go through
reviewboard, so that's fine with me.



Apparently the confusion was mutual. I think review board had a new  
review started and two replies to your two reviews. I cleared out the  
text of the review since I meant it to be a reply, but there was still  
a banner at the top to publish that review. I thought it was to  
publish my replies. Clicking it published a blank review and left the  
actual content as a draft. Anyway, those should be sent now.


Gabe



Re: [m5-dev] Review Request: ISA, CPU, etc: Create an ISA defined PC type that abstracts out ISA behaviors.

2010-10-25 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Sun, Oct 24, 2010 at 10:05 PM, Gabe Black gbl...@eecs.umich.edu wrote:




On 2010-10-22 10:30:06, Steve Reinhardt wrote:
 src/arch/alpha/ev5.cc, line 63
 http://reviews.m5sim.org/r/255/diff/11/?file=4275#file4275line63

     Why not just redefine setPC() to do this PCState update, then
     just delete the setNextPC() line?  In general I feel like a lot of
     the syntax changes are bigger than necessary.  I'm fine with
     renaming methods for clarity (e.g., readPC() -> instAddr()) but
     taking simple functions/assignments and replacing them with
     multi-stmt sequences seems like a step backwards.


This is already very close to that. The only difference is that the  
new version was split into two lines to keep it from getting  
unmanageably long. I'm not sure if g++ would figure out that it  
needed to build a PCState object it could attach a const reference  
to if the function was called with just an address. There's a  
constructor to create a PCState from an address which you see used  
here, I just don't know if the language would make that logical leap.


I was thinking more of the idea (that I think Nate mentioned) of
having an overloaded call that just took an Addr as a new PC and did
all the other necessary stuff internally.  Whether you want to have
this be an overloaded flavor of pcState() with a different arg type or
give it a different name like setPC() is an open question.



It turns out if you pass it an Addr and it expects a PCState and  
there's an appropriate constructor, it just does the conversion for  
you. That means you can get what you're talking about just by calling  
the existing function differently, a fact I've taken advantage of in  
my local version of the patch.
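
The language rule at work here is the converting constructor: a non-explicit, single-argument constructor lets the compiler build a temporary and bind it to a const reference parameter. The sketch below uses pared-down, hypothetical versions of `PCState` and `ThreadContext`; the fixed 4-byte step to the next PC is an assumption made for the example.

```cpp
#include <cstdint>

typedef uint64_t Addr;

// Pared-down stand-in; the real class carries more state.
class PCState
{
  public:
    // Non-explicit single-argument constructor: this is what allows an
    // Addr to be converted to a PCState at a call site. Marking it
    // 'explicit' would turn the call in the usage note into an error.
    PCState(Addr pc) : _pc(pc), _npc(pc + 4) {}

    Addr pc() const { return _pc; }
    Addr npc() const { return _npc; }

  private:
    Addr _pc;
    Addr _npc;
};

class ThreadContext
{
  public:
    ThreadContext() : _pcState(0) {}

    // Takes a PCState, but also accepts a bare Addr via the conversion.
    void pcState(const PCState &val) { _pcState = val; }
    const PCState &pcState() const { return _pcState; }

  private:
    PCState _pcState;
};
```

With these definitions, `tc.pcState(entry)` compiles for an `Addr entry` and leaves the context with pc equal to `entry` and npc equal to `entry + 4`, so no separate setPC() overload is needed.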


     one more example: why not just delete the last two lines
here, or replace them all with:

     tc->setPC(objFile->entryPoint());



We're moving away from setXXX, I thought. Also the fewer methods
that need to be plumbed all over the place the better. tc->pcState
is basically already what you're talking about turning tc->setPC
into, I think. The difference is that the new name is more
appropriate, and it takes a PC state object instead of an address.


Yea, we're moving away from setXX/getXX as accessors for a field named
XX.  However in this case, it's not really just an accessor as it's
doing all the side effect stuff of setting NPC etc.  I'm not saying
it's the best name, but it does have the advantage of minimizing the
amount of change to legacy code.


My philosophy (which I just made up) is the only distinguishing  
characteristic of legacy code is that it's old. It can be painful to  
change, but by not changing it you make it older, and even more  
painful to change. Then you end up with X86. I'm not going to go  
changing things just because I'm bored, but if changing legacy code  
makes it better that's what I want to do.


Gabe


Re: [m5-dev] Review Request: ISA, CPU, etc: Create an ISA defined PC type that abstracts out ISA behaviors.

2010-10-25 Thread Gabriel Michael Black


OK, that seems like a good explanation, but (1) as Ali said, it really
needs to go in a doxygen comment where these accessors are defined in
the base PCState class and (2) I seem to recall some specific comments
where substitutions were made that didn't necessarily make sense wrt
this comment, so it would be good to double-check those, and possibly
add some explanatory comments to the less obvious uses as well.



I'll go back and look. I was going through so much code I'd  
occasionally forget if I was in an ISA dependent or independent file.  
I think I got it right most of the time.


The PCs were previously managed entirely by the CPU which had to know about PC
semantics, try to figure out which dimension to increment the PC in, what to
set NPC/NNPC, etc. These decisions are best left to the ISA in conjunction
with the PC type itself. Because most of the information about how to
increment the PC (mainly what type of instruction it refers to) is contained
in the instruction object, a new advancePC virtual function was added to the
StaticInst class. Subclasses provide an implementation that moves around the
right element of the PC with a minimal amount of decision making. In ISAs like
Alpha, the instructions always simply assign NPC to PC without having to worry
about micropcs, nnpcs, etc. The added cost of a virtual function call should
be outweighed by not having to figure out as much about what to do with the
PCs and mucking around with the extra elements.



Does it make sense to go through an ISA-specific function, for example:

inline
TheISA::advancePC(PCState pcState, StaticInstPtr inst)
{
    pcState.advance();
OR
    inst->advancePC(pcState);
}



As far as pcState.advance(), no. The idea is that the instruction knows
what would follow it (a microop or a new instruction) and advances the
PC in the right dimension. As far as inst->advancePC that would
functionally work, but I'm not sure how it would be useful. Wouldn't
that just obscure what's going on?


I'm not sure we're communicating here... you're talking about how in
some ISAs (like x86) you need to know the current instruction to know
how to advance the PC, and in other ISAs (like Alpha) you don't, and
always calling inst->advancePC(pcState) costs you a virtual function
call even in the latter case.  My suggestion is that this ISA-specific
advancePC function (where the OR is a compile-time decision based on
which ISA you're using) would allow you to avoid the virtual function
call when it's not needed.
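
The suggestion can be sketched as two per-ISA inline functions with the same signature, one of which the build would select. Everything here is illustrative: the namespaces `AlphaLikeISA` and `MicrocodedISA`, the `MicroOp` class, and the 4-byte instruction size are assumptions made for the example, not code from the patch.

```cpp
#include <cstdint>

typedef uint64_t Addr;

struct PCState {
    Addr pc, npc, upc, nupc;

    // Fixed-width step: no knowledge of the instruction is needed.
    void advance() { pc = npc; npc += 4; }
};

struct StaticInst {
    virtual ~StaticInst() {}
    virtual void advancePC(PCState &pcState) const = 0;
};

// Hypothetical microop: only the instruction knows whether another
// microop or a new macro instruction follows it.
struct MicroOp : StaticInst {
    bool last;
    explicit MicroOp(bool isLast) : last(isLast) {}

    void advancePC(PCState &p) const override
    {
        if (last) {
            p.pc = p.npc; p.npc += 4; p.upc = 0; p.nupc = 1;
        } else {
            p.upc = p.nupc; p.nupc++;
        }
    }
};

namespace AlphaLikeISA {
// No virtual call: the PC type can advance itself.
inline void advancePC(PCState &pcState, const StaticInst *)
{
    pcState.advance();
}
}

namespace MicrocodedISA {
// The instruction decides which dimension of the PC moves.
inline void advancePC(PCState &pcState, const StaticInst *inst)
{
    inst->advancePC(pcState);
}
}
```

ISA-independent code would call `TheISA::advancePC(pcState, inst)` in both cases, with `TheISA` resolving to one of the namespaces at compile time, so only ISAs that need the instruction pay for the virtual call.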



Ok.


ISA parser:

To support the new API, all PC related operand types were removed from the
parser and replaced with a PCState type. There are two warts on this
implementation. First, as with all the other operand types, the PCState still
has to have a valid operand type even though it doesn't use it. Second, using
syntax like PCS.npc(target) doesn't work for two reasons, this looks like the
syntax for operand type overriding, and the parser can't figure out if you're
reading or writing. Instructions that use the PCS operand (which I've
consistently called it) need to first read it into a local variable,
manipulate it, and then write it back out.



See my comments below... I don't see why this change is necessary  
or even desirable as opposed to just making the existing  
PC/NPC/NNPC/UPC etc. operands do the right thing, at least in the  
common cases.




I would have really liked to keep the old syntax since I completely agree
this new version is a bit clunky and roundabout. The basic problem is
that unlike in the previous system, there aren't accessors for reading
and writing parts of the PC since what those parts are isn't defined
external to the ISA. Because of that, an instruction that wants to
modify the PC needs to bring in a copy of the whole thing, do any
modifications locally, and then send the whole revised version back.
This could be done locally to update each operand, but then you'd have a
lot of wasteful copying of the PCState structure and calls to get/set it.

One possible solution would be to keep track of whether -any- member of
the PC was being accessed, read it in once, use it, and if any member
was written to write it out at the end. Then the individual components
could be accessed with operands.


I think I understand everything you're saying up to here.  I think we
agree that you could take an ISA description statement like
NNPC = foo;
and automatically rewrite it to something like
PCState _tmp_pc = tc->pc();
_tmp_pc.nnpc(foo);
tc->pc(_tmp_pc);
if necessary, or perhaps something a little simpler if possible.

Your concern seems to be that if we have multiple statements like
  PC = foo;
  NPC = bar;
then a naive translation would do two read-modify-writes on the pcState.
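
The cost difference between the two translations can be shown with a small counting model. This is a sketch, not parser output: `ThreadContext` here just counts how often the whole PCState is read and written, and `naiveUpdate` and `coalescedUpdate` are hypothetical names for the two code shapes being discussed.

```cpp
#include <cstdint>

typedef uint64_t Addr;

class PCState
{
  public:
    PCState() : _pc(0), _npc(0) {}
    Addr pc() const { return _pc; }
    Addr npc() const { return _npc; }
    void pc(Addr v) { _pc = v; }
    void npc(Addr v) { _npc = v; }

  private:
    Addr _pc, _npc;
};

// Counts whole-structure reads and writes of the PC state.
struct ThreadContext {
    PCState state;
    int reads;
    int writes;

    ThreadContext() : reads(0), writes(0) {}
    PCState pcState() { ++reads; return state; }
    void pcState(const PCState &s) { ++writes; state = s; }
};

// Naive translation: each operand assignment is its own
// read-modify-write of the whole PCState.
void naiveUpdate(ThreadContext &tc, Addr foo, Addr bar)
{
    { PCState t = tc.pcState(); t.pc(foo);  tc.pcState(t); }
    { PCState t = tc.pcState(); t.npc(bar); tc.pcState(t); }
}

// Coalesced translation: read once, apply every update, write once.
void coalescedUpdate(ThreadContext &tc, Addr foo, Addr bar)
{
    PCState t = tc.pcState();
    t.pc(foo);
    t.npc(bar);
    tc.pcState(t);
}
```

Both versions leave the same final state; the naive one does two reads and two writes where the coalesced one does one of each.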


Yeah.



I agree that's a potential issue, but how many times does that really
happen?  In particular, it seems like there are a lot of places where
we used to have two updates like
PC = foo;

Re: [m5-dev] stores that update their base registers

2010-10-22 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Fri, Oct 22, 2010 at 10:57 AM, Gabe Black gbl...@eecs.umich.edu wrote:

Is this just to get STUPD to be a single uop instead of two
uops that communicate via a temp reg, without forcing dependent
instructions to wait for the STUPD to commit to get the updated base
value?



I wouldn't say just, but essentially yes.


So it seems like the overriding question is: is all this hassle really
worth it?  How often do we use a STUPD uop dynamically anyway?



That is a good question. Since the operation isn't visible  
architecturally (at least as far as I remember), I think its use boils  
down to stack pushes and perhaps some other microcoded operations  
using the stack like constructing exception stack frames and that sort  
of thing. Pushes and pops would normally be considered a core part of  
the ISA, but I think compilers generally just add or subtract from the  
stack pointer instead. There are some things where the only expedient  
mechanism to get them is through pushes and pops like the flags  
register, so it's at least partially unavoidable.


I'll instrument the simulator in the near future and measure what  
percentage of instructions are stupds. I'll run SE and FS workloads  
which will likely be different.



Do we need another execution phase like completeTrans() that can be
overridden here?  Generally it's not unreasonable to say that any
exception that occurs post-translation on a store is imprecise... I
don't know if x86 specifically has any exceptions to that rule.



I think that would be a fairly major change, and 99% of the time
completeTrans either wouldn't be used or wouldn't do anything, depending
on how it's implemented


I'm not overwhelmingly concerned about that... O3 is slow enough that
doing one more virtual function call per dynamic memory access (that
will typically hit in the BTB if all the no-op versions point to the
same base implementation) probably won't make a major difference.

Same with calling completeAcc() on stores, though in that case I agree
that it still isn't really the right point to do the update.  In fact,
since O3 explicitly checks to see if an instruction is a
store-conditional to know whether to call completeAcc(), it might even
be faster to call completeAcc() unconditionally and let the virtual
function call replace that if test.


That's true.




I don't think we're talking about exceptions
post translation, just during translation.


Yea, what I meant was that if you do the update post translation
(including waiting for a delayed translation, so you know the
translation didn't fault), then you don't have to worry about rolling
it back because the instruction won't take a later exception, so it
would be safe to commit the value at that point.  That does force
the update to potentially wait for a page-table walk though which is
still not ideal.

So one annoying thing is that there's no benefit to doing the update
in initiateAcc() for TimingSimpleCPU; the only reason to make that
work is so that we can do it in initiateAcc() in O3 and have the same
code work in both places.  It seems like the problem is that we either
call execute() or initiateAcc()/completeAcc(), and in this case we
really want to continue to call execute() to do the update in addition
to using initiateAcc()/completeAcc().  Again, the easy way to do this
is to use two uops.  If we really feel we need an alternative, it
still feels to me like the right thing to do is to define some new
StaticInst method that gets called when initiateAcc() gets called in
O3, but gets called when the instruction commits in TimingSimpleCPU.
Either that or find a way for the instruction to know which model it's
in, and do the update in initiateAcc() for O3 and in completeAcc() for
TimingSimpleCPU.  (I really don't like that last one, but I still like
it better than implementing speculation via a temp reg inside the
instruction definition itself.)


Yeah, I -really- don't like the last one :-). The stupd measurement  
will likely help shape the conversation here, and if Ali were willing  
to look at how often this sort of thing happens in ARM that would be  
good to know too. Since it's architecturally visible there I'd guess a  
lot more often. I'd like to try to do this with a really light  
mechanism, but an extra static inst method may be the way to go in the  
end.


Gabe


Re: [m5-dev] Review Request: Moving Ruby to M5's debug print support

2010-10-22 Thread Gabriel Michael Black
One exception to that are the PCState structs I'm in the process of
adding. These each overload << to print themselves as
(X1=>X2=>X3).(Y1=>Y2) where the X*s are as many architecture level PCs
as are needed and the Y*s are microPCs. This is in the middle where
it's too complex for a single built in data type but still simple
enough to be printed as a unit. This also allows ISA independent code
to do things like:


DPRINTF(Fetch, "The PC is %s.\n", pc);

without having to know what's in the pc or how to print its components.
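
As a rough illustration of that kind of overload, the sketch below prints a PC state with two architectural PCs and two micro PCs. The struct layout and the `describe` helper are assumptions made for the example, and a plain string stream stands in for the DPRINTF formatting layer.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

typedef uint64_t Addr;

// Illustrative layout: two architectural PCs and two micro PCs.
struct PCState {
    Addr pc, npc, upc, nupc;
};

// Prints as (pc=>npc).(upc=>nupc), with the architectural PCs in hex.
std::ostream &operator<<(std::ostream &os, const PCState &s)
{
    std::ios_base::fmtflags saved = os.flags();
    os << std::hex << std::showbase
       << "(" << s.pc << "=>" << s.npc << ").("
       << std::dec << s.upc << "=>" << s.nupc << ")";
    os.flags(saved);    // leave the caller's formatting untouched
    return os;
}

// ISA independent code only needs operator<<, not the field names.
std::string describe(const PCState &pc)
{
    std::ostringstream os;
    os << "The PC is " << pc << ".";
    return os.str();
}
```

For a state of {0x1000, 0x1004, 0, 1}, `describe` returns "The PC is (0x1000=>0x1004).(0=>1).".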

Generally, though, I agree with Steve that there's a pretty narrow  
window where this sort of thing makes sense and meshes at least  
stylistically with the rest of M5.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


I think part of the confusion is that we don't typically overload
operator<< in M5 because we don't typically define complex types that
have standard ways of printing themselves out.  So for example simple
M5 types like Tick and Addr are just typedefs for uint64_t (or
something like that), so << just works on them.  More complex objects
are just too complex to have simple string representations, or at
least we haven't felt the need to standardize them, so we just have
ad-hoc DPRINTFs that print out the relevant scalar fields as needed.

So I guess the upshot is that if there are complex Ruby types (not
just typedefs of builtin types) that want to have standard ways of
printing themselves out, that's kind of inconsistent with M5 right
there :-), but if you want to keep that behavior then overloading
operator<< is the way to do it.

Steve






Re: [m5-dev] Review Request: Configs: Stop setting the mem parameter in splash2 config files.

2010-10-22 Thread Gabriel Michael Black
I put this together a while ago but forgot to actually publish it. I  
saw it again when I was clearing out submitted reviews.


Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/278/
---

Review request for Default.


Summary
---

Configs: Stop setting the mem parameter in splash2 config files.

This parameter is no longer used, and trying to set it like these scripts were
gives a simobject two parents and causes the simulation to die.


Diffs
-

  configs/splash2/cluster.py fc12f4d657f0
  configs/splash2/run.py fc12f4d657f0

Diff: http://reviews.m5sim.org/r/278/diff


Testing
---


Thanks,

Gabe







Re: [m5-dev] please mark reviews for committed patches as such

2010-10-21 Thread Gabriel Michael Black
Yeah, I'd noticed that too but I wasn't sure what to do about it. I'll  
take care of mine tonight.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


Hi folks,

Just a reminder that once you've committed a patch to the repo, you
should go back to reviewboard and mark the corresponding review
submission as submitted.  The review request list is getting pretty
long, and I know there are some patches in there that have been
committed and so don't really need reviewing anymore.

Thanks,

Steve







Re: [m5-dev] stores that update their base registers

2010-10-18 Thread Gabriel Michael Black

Quoting Gabe Black gbl...@eecs.umich.edu:


Gabe Black wrote:

This has come up in ARM and also in X86 with its STUPD (store with
update) microop. The problem has been updating the base register when,
one, the instruction may fault after initiateAcc and the initial value
is lost, and two, completeAcc isn't called by O3. The problem is
compounded by the fact that O3 can speculatively update the register and
recover the old value if there's a fault, and the simple CPUs can't.

What if we changed the instructions that update the base to update the
base in initiateAcc and store the old value in an architecturally
invisible register? Then, if the instruction faults for whatever reason,
the fault object can know it needs to restore the old value of the base
before vectoring into the fault handler. If the instruction completes
normally the value of the base will be updated for consumption by later
instructions, and the value of the backup register can be ignored. I
don't -think- there would be performance distortions from this since the
actual number of sources/destinations doesn't matter, and this would be
at least a little more realistic and simulator level performant than
splitting things into microops.

This would be pretty easy to implement, I think, and would be entirely
contained in existing mechanisms in the ISA, so there isn't really any
question there. What I'd like to know is whether people think this is a
reasonable approach to this problem in the first place.

Gabe



Hmm. This probably won't work. O3 would revert to the old value of the
backup register, I think, and the fault object would clobber the
correctly restored base register with that old value.

Gabe



OK, so, ideally we'd want to put any register updates for a store
in initiateAcc since they're not dependent on memory, that way other  
instructions can use them sooner, O3 doesn't run completeAcc, etc.,  
but that doesn't work because SimpleCPU and O3 are inconsistent as far  
as the commit points an instruction goes through as it runs. In  
SimpleCPU state is updated live, so every setIntReg is a commit point.  
In O3, the instructions are updating a dyninst so the commit point is  
the actual commit stage. For regular instructions this is masked by  
writing back results at the end of the execute function and only if  
there's no fault, but for memory ops with initiateAcc and completeAcc,  
all possible faults haven't happened by the point the instruction  
loses control. The actual commit points of the instruction then  
introduce functional differences and break the consistency of the  
instruction model.
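
A toy model of the two commit behaviors may help here. LiveContext and BufferedContext are invented names for this sketch, not M5 classes; they only mimic "every setIntReg is a commit point" versus "results sit with the dyninst until commit".

```cpp
#include <cstdint>
#include <map>

using RegFile = std::map<int, uint64_t>;

// SimpleCPU-like context: every setIntReg lands in architectural state
// immediately.
struct LiveContext {
    RegFile &arch;
    void setIntReg(int idx, uint64_t val) { arch[idx] = val; }
};

// O3-like context: writes are held with the in-flight instruction and only
// reach architectural state at commit. A squash throws them away.
struct BufferedContext {
    RegFile &arch;
    RegFile pending;

    void setIntReg(int idx, uint64_t val) { pending[idx] = val; }

    void commit() {
        for (const auto &p : pending)
            arch[p.first] = p.second;
        pending.clear();
    }

    void squash() { pending.clear(); }
};

// A store that updates its base register (index 1 here) in initiateAcc and
// then faults during the access, so the instruction never finishes.
template <typename Context>
void faultingStore(Context &xc) {
    xc.setIntReg(1, 0x2000); // base update done in initiateAcc
    // ... the access faults here ...
}
```

Running faultingStore on both contexts and then squashing the buffered one leaves the new base visible in the live register file and the old base in the buffered one, which is the functional difference described above.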


The problem seems to be that O3 is smarter than SimpleCPU, or really  
that O3 is more capable at undoing things that shouldn't have  
happened. One solution might be to make SimpleCPU smarter, but why  
don't we make O3 selectively dumber?


We might be able to solve this problem if we change the semantics of  
initiateAcc, the access, and completeAcc for stores. We could do the
same for loads for symmetry, but I won't push for it because of the
arguments Steve made about base updating loads and the fact that it
might not work as well there. Anyway, instead of trying
(unsuccessfully) to string initiateAcc, the access, and completeAcc
together as one large atomic operation, let's make them all separate.
Once initiateAcc finishes, if it doesn't return a fault it commits. If  
the access faults later that's handled, but the state written to in  
initiateAcc is already permanent. If something needs to be rolled  
back, initiateAcc needs to set up backup state like I talked about in  
my earlier email. completeAcc would then never be called for stores.


This is nice because it means all CPUs can behave the same, we get all  
the benefits of writing back state in initiateAcc, there's no  
simulated performance overhead as far as I can see, the impact on  
existing ISA code is minimal, and (I hope) it shouldn't be that hard  
to implement or carry that much baggage for later.


So what do people think of this second version? Hopefully we don't  
need a third :-).


Gabe


Re: [m5-dev] generalization of hypercalls/simulator calls

2010-10-01 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Sat, Sep 25, 2010 at 5:03 AM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

[...]


1 & 2 sound good to me.


3. For selectively trapped instructions, StaticInst properties might now
vary based on runtime information which doesn't really work.


I can see how this is a theoretical issue but I'm not convinced it
will matter in practice. Basically either you're always faking
something, just possibly in different ways, or you're faking it or not
depending on the mode (syscalls in SE vs FS).  And as long as you're
faking it anyway, I don't think you need to worry about imprecision in
the instruction itself.


You're -not- faking it anyway, though. If the same code works in SE  
and FS modes, you're faking in one situation and not in the other with  
the same code and the same StaticInsts. The instruction needs to  
decide if it's non-speculative, serialize before/after, artificially
dependent on register so and so, and the right answer wouldn't be  
fixed ahead of time if SE and FS were just artifacts of the  
configuration and convention and not enforced with #ifdefs.


That said, while worrying about this is probably healthy planning, we  
(I) might not want to get too far ahead of our(my)self and worry about  
the fine details of merging FS and SE modes yet. That will come, I  
think, but there are benefits from just consolidating the mechanisms  
providing magic instruction type behavior.





4. As I understand it we have system objects in SE and FS mode, and process
objects in SE mode but nothing to correspond to them in FS. We may want to
introduce an OS object to be the backing object there.


Are you proposing this for the case where we do hypercall emulation
to have M5 fake the existence of a hypervisor, like it fakes an OS in
SE mode now?  It makes sense in that context, but I don't see how it's
necessary otherwise.  I would say that there's no need to push on this
unless we really have a desire to support HE mode.  I like the
elegance of this generalized model that makes HE mode possible, but
I'm not convinced there's much practical need for it... I think
implementing the virtualization extensions of a real ISA so we can run
something like KVM under M5 would be more useful.


No. I think HE mode is a fairly interesting idea (paravirtualize  
tricky devices?), but I'm talking about something that understands the  
ABI and knows where to pull arguments from when dispatching a pseudo  
inst for instance. 64 bit Linux and 32 bit Linux and Windows (maybe 32  
bit vs. 64 bit) will all most naturally (and maybe compulsively) do  
that differently, but they don't and can't with the current set up.  
We've made the implicit assumption that we'll always be dealing with  
one static ABI per ISA. We might also want instructions that recognize  
that you've requested a BIOS service, for instance, and fake that  
through a backing BIOS object. I don't know exactly how PAL mode calls  
are handled in FS for Alpha (I think we have a ROM?) but if those are  
faked this would be a similar sort of thing. Basically I want to have  
one or more objects associated with whatever the workload is, or maybe  
really whatever code might run since the BIOS isn't really a workload,  
and have them register to handle families of hypercalls.
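
As a sketch of what "register to handle families of hypercalls" might look like: the CallFamily enum, the HypercallRegistry class, and the handler signature are all hypothetical, just enough to show the registration and dispatch shape.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical families of calls a workload-side object might claim.
enum class CallFamily { Syscall, PseudoInst, BiosService };

class HypercallRegistry {
  public:
    // A handler takes the call number and returns a description of what it
    // did. A real handler would pull arguments according to its own ABI.
    using Handler = std::function<std::string(uint64_t)>;

    void registerHandler(CallFamily family, Handler handler) {
        handlers[family] = std::move(handler);
    }

    // Returns false if no backing object registered for this family.
    bool dispatch(CallFamily family, uint64_t num, std::string &result) const {
        auto it = handlers.find(family);
        if (it == handlers.end())
            return false;
        result = it->second(num);
        return true;
    }

  private:
    std::map<CallFamily, Handler> handlers;
};
```

A 64 bit Linux object and a BIOS object would each register for their own family, and the instruction would only need to know which family it belongs to.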


On top of that, the code that sets up an OS kernel is mechanically  
pretty similar to the thing that loads a process in SE mode. They both
take an image and blast it into memory, set up supporting memory  
structures (initial stack, BIOS tables, etc.) and both configure the  
hardware to start running whatever they loaded. This seems ripe for  
generalization.


Gabe


Re: [m5-dev] generalization of hypercalls/simulator calls

2010-10-01 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Fri, Oct 1, 2010 at 5:44 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:



3. For selectively trapped instructions, StaticInst properties might now
vary based on runtime information which doesn't really work.


I can see how this is a theoretical issue but I'm not convinced it
will matter in practice. Basically either you're always faking
something, just possibly in different ways, or you're faking it or not
depending on the mode (syscalls in SE vs FS).  And as long as you're
faking it anyway, I don't think you need to worry about imprecision in
the instruction itself.


You're -not- faking it anyway, though. If the same code works in SE and FS
modes, you're faking in one situation and not in the other with the same
code and the same StaticInsts.


Yea, that's exactly what I meant when I said faking it or not
depending on the mode.  My point is that since you're faking it in at
least one case, then you can set the static flags to make the
non-faking case work and not worry if that's not precisely what you
would have done in the faking case since it's fake anyway.



What I'm saying is that that won't work since the faked case is -more-  
restrictive than the not faked case. You can't speculatively execute a  
write syscall on behalf of the simulated code, but you can  
speculatively execute a software interrupt instruction. Also, if you  
need the syscall number from a register to call the right emulation  
function, you can't get that from within the instruction since it  
won't be an operand. That one isn't that hard to work around since by  
the time you execute the call you can't be speculating, and the  
backing code can look it up with the thread context.


Gabe


Re: [m5-dev] generalization of hypercalls/simulator calls

2010-10-01 Thread Gabriel Michael Black





I know you're busy right now so don't feel compelled to respond right  
away, but I had one other comment I wanted to bring up here. One other  
change I'd like to see eventually and I think I've brought up before  
(although maybe that's my imagination) is to turn the decode function  
into a stateful decode object that can be put into various modes,  
and/or a set of decode objects where the actual decode function is  
virtual.


That would complicate things at least a little as far as defining the
decoders, but then you wouldn't have to keep figuring out what mode  
you were in. That may also mean the ExtMachInsts could have less  
contextualizing state and wouldn't take as much effort to load up. I  
vaguely remember someone saying that might mean ExtMachInsts could go  
away (not true at least because of x86's stream of consciousness like  
instruction encoding) so I guess I did bring this up before. Anyway,  
with a setup like that, 99% of the decoder could be shared, and then  
it could statefully know when it sees a syscall instruction whether to  
return the faking one or the real one.
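
Roughly, the stateful decoder idea could look like the sketch here. The Mode enum, the opcode value, and the returned class names are placeholders; a real decoder would hand back StaticInst objects rather than strings.

```cpp
#include <cstdint>
#include <string>

enum class Mode { SyscallEmulation, FullSystem };

// Hypothetical encoding of "the syscall instruction" in a toy ISA.
constexpr uint32_t SyscallOpcode = 0x0f05;

class Decoder {
  public:
    void setMode(Mode m) { mode = m; }

    // Returns the name of the instruction class that would be instantiated.
    // Everything except the syscall instruction is shared between modes.
    std::string decode(uint32_t machInst) const {
        if (machInst == SyscallOpcode) {
            return mode == Mode::SyscallEmulation ? "EmulatedSyscall"
                                                  : "RealSyscall";
        }
        return "SharedInst";
    }

  private:
    Mode mode = Mode::FullSystem;
};
```

The decode call site never has to ask what mode it is in; it is told once through setMode and the decoder remembers.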


One minor issue with that is how to cache those instructions. Maybe  
they'd be kept separate so the right sub-cache-thingy would be used  
depending on what mode you're in, keeping the other 99% in play all  
the time, maybe when you switch modes you dump the cache and expect  
switches to be infrequent and the hot instructions to fill it again  
quickly, maybe you just have separate caches for each mode... I don't  
know what the right answer is there, but there are at least a number  
of valid options that would be functionally correct if not good or  
elegant :-), and I think we're a good way off from having to worry  
about it.


Gabe


Re: [m5-dev] possible speed up of X86_FS boot

2010-09-27 Thread Gabriel Michael Black

Sorry, try this one:

http://lxr.linux.no/#linux+v2.6.28.4/arch/x86/kernel/apic.c#L576

I copied the first one from the link itself. This one I followed the  
link and copied from the address bar.


Gabe

Quoting nathan binkert n...@binkert.org:


That link doesn't seem to work.

  Nate

On Mon, Sep 27, 2010 at 9:03 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

The function in question is here:

http://lxr.linux.no/linux+*/arch/x86/kernel/apic.c#L576

It looks complicated enough and does enough work that, to me, it doesn't
look feasible to fake, but you have more experience with these sorts of
modifications than I do. The function or perhaps macro that has the pause
in it is cpu_relax.

Gabe

Quoting nathan binkert n...@binkert.org:


It should be easy enough to measure the latency of a pause on a real
machine, but given the description and its purpose of avoiding memory
order violations, my guess is that the latency is in the 10s of ns,
possibly 100s.

That said, you should probably figure out a way to skip the function
entirely and save the time.  In fact, it might be worthwhile to search
for pause in the code and skip all such functions.

 Nate


Hey everybody. I'm currently waiting on x86 to boot under gdb where I'm
occasionally stepping in to see what it's doing. It's spinning furiously in
a tight, three instruction loop while it calibrates the local APIC timer.
These instructions are:

pause
cmpl   $0x19,0x2860b(%rip)        # 0x80903438
jle    0x808dae24 <setup_boot_APIC_clock+142>


The second instruction checks to see if the loop can exit and the third is
the jump back to the top. The first instruction, pause, is supposed to
make the CPU hang out briefly where briefly is a small, possibly zero,
delay. For us, this could make M5 boot X86 faster because it means fewer
instructions would be executed waiting for timer interrupts and in other
delay loops.

I think the quiescCycles pseudo op function would make implementing this
behavior pretty easy, but then the question is how many cycles to wait.
Does anyone have an informed (but not too informed) opinion about what an
appropriate number of cycles would be?
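
For a feel of why the number matters, here is a back-of-the-envelope sketch. It assumes the cmpl and jle each cost one cycle and that the loop exits at the first check after the awaited event; both are simplifications for illustration, not measured numbers.

```cpp
#include <cstdint>

// Number of trips through the three instruction loop before an event that
// arrives at eventCycle is noticed, given how long pause stalls.
// Assumption: cmpl and jle take one cycle each.
inline uint64_t spinIterations(uint64_t eventCycle, uint64_t pauseCycles) {
    const uint64_t cyclesPerIteration = pauseCycles + 2;
    // Round up: the loop only exits at the end of an iteration.
    return (eventCycle + cyclesPerIteration - 1) / cyclesPerIteration;
}
```

With an event 3000 cycles away, a one cycle pause means 1000 iterations (3000 simulated instructions), while a 98 cycle pause means 30 iterations (90 simulated instructions).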

Gabe


Re: [m5-dev] Review Request: ISA, CPU, etc: Create an ISA defined PC type that abstracts out ISA behaviors.

2010-09-24 Thread Gabriel Michael Black
Darn it, this is still not right. I must have forgotten to refresh the  
patch. Before I create yet another review request, does anyone know  
how to update the one I already have? I tried using the -u option to  
postreview, but maybe I didn't do it right.


I think if you try to compile ARM with this version, you'll get an  
error in process.cc. Just take the parenthesis off the declaration of  
PCState pc on line 363.


Gabe

Quoting Gabe Black gbl...@eecs.umich.edu:



---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/255/
---

Review request for Default.


Summary
---

Below is the actual change description. THIS IS A BIG, DISRUPTIVE  
CHANGE. Conceptually it's not that complicated, but it touches just  
about every part of the simulator outside of the memory system  
including all the CPU models, all the ISAs, the ISA parser,
instruction tracing, remote gdb support, starting processes in SE  
mode, etc. etc. Basically anything involving a simulated PC. Given  
the huge reach of this change and its correspondingly huge potential  
to introduce bugs, if at all possible please help me verify it by  
downloading it and trying things out. Specifically if you could try  
checkpointing and/or remote GDB that would be very helpful. If you  
have a workload you care about that isn't part of the regressions  
please try that too. I'd much rather work with you to fix bugs  
before they break your stuff than after.


This version leaves out stats updates since I'm still running the  
full regressions, and some nearly trivial changes to the EIO stuff.  
The TODOs are future TODOs since this change is already very large  
and I don't want to delay getting it checked in.



ISA,CPU,etc: Create an ISA defined PC type that abstracts out ISA behaviors.


This change is a low level and pervasive reorganization of how PCs are managed
in M5. Back when Alpha was the only ISA, there were only 2 PCs to worry about,
the PC and the NPC, and the lsb of the PC signaled whether or not you were in
PAL mode. As other ISAs were added, we had to add an NNPC, micro PC and next
micropc, x86 and ARM introduced variable length instruction sets, and ARM
started to keep track of mode bits in the PC. Each CPU model handled PCs in
its own custom way that needed to be updated individually to handle the new
dimensions of variability, or, in the case of ARM's mode-bit-in-the-pc hack,
the complexity could be hidden in the ISA at the ISA implementation's expense.

Areas like the branch predictor hadn't been updated to handle branch delay
slots or micropcs, and it turns out that had introduced a significant (10s of
percent) performance bug in SPARC and to a lesser extent MIPS. Rather than
perpetuate the problem by reworking O3 again to handle the PC features needed
by x86, this change was introduced to rework PC handling in a more modular,
transparent, and hopefully efficient way.


PC type:

Rather than having the superset of all possible elements of PC state declared
in each of the CPU models, each ISA defines its own PCState type which has
exactly the elements it needs. A cross product of canned PCState classes are
defined in the new generic ISA directory for ISAs with/without delay slots
and microcode. These are either typedef-ed or subclassed by each ISA. To read
or write this structure through a *Context, you use the new pcState() accessor
which reads or writes depending on whether it has an argument. If you just
want the address of the current or next instruction or the current micro PC,
you can get those through read-only accessors on either the PCState type or
the *Contexts. These are instAddr(), nextInstAddr(), and microPC(). Note the
move away from readPC. That name is ambiguous since it's not clear whether or
not it should be the actual address to fetch from, or if it should have extra
bits in it like the PAL mode bit. Each class is free to define its own
functions to get at whatever values it needs, however it needs to, to be used in
ISA specific code. Eventually Alpha's PAL mode bit could be moved out of the
PC and into a separate field like ARM.

These types can be reset to a particular pc (where npc = pc +
sizeof(MachInst), nnpc = npc + sizeof(MachInst), upc = 0, nupc = 1 as
appropriate), printed, serialized, and compared. There is a branching()
function which encapsulates code in the CPU models that checked if an
instruction branched or not. Exactly what that means in the context of branch
delay slots which can skip an instruction when not taken is ambiguous, and
ideally this function and its uses can be eliminated. PCStates also generally
know how to advance themselves in various ways depending on if they point at
an instruction, a microop, or the last microop of a macroop. More on that
later.
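
As an illustration of the shape described above, here is a stripped down PC state for an ISA with no delay slots and no microcode. This is a sketch, not the class from the patch; the fixed 4 byte MachInst and the npc() setter are assumptions made for the example.

```cpp
#include <cstdint>

using Addr = uint64_t;
using MachInst = uint32_t; // assumption: fixed 4 byte instructions

class SimplePCState {
  public:
    SimplePCState() { set(0); }
    explicit SimplePCState(Addr pc) { set(pc); }

    // Reset to a particular pc, where npc = pc + sizeof(MachInst).
    void set(Addr pc) {
        _pc = pc;
        _npc = pc + sizeof(MachInst);
    }

    // Read-only accessors for the common cases.
    Addr instAddr() const { return _pc; }
    Addr nextInstAddr() const { return _npc; }
    Addr microPC() const { return 0; } // no microcode in this toy ISA

    // An executed branch redirects the next PC.
    void npc(Addr target) { _npc = target; }

    // True if the next PC is not simply the fall-through address.
    bool branching() const { return _npc != _pc + sizeof(MachInst); }

    // Move on to the next instruction.
    void advance() {
        _pc = _npc;
        _npc += sizeof(MachInst);
    }

    bool operator==(const SimplePCState &other) const {
        return _pc == other._pc && _npc == other._npc;
    }

  private:
    Addr _pc;
    Addr _npc;
};
```

A CPU model written against instAddr(), nextInstAddr(), branching() and advance() would not need to know whether the ISA behind it also carries an nnpc or a micro PC.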

Ideally, accessing all the PCs at once when setting them will improve

Re: [m5-dev] DPRINTF changes

2010-09-10 Thread Gabriel Michael Black

Quoting nathan binkert n...@binkert.org:


So in debugging some python stuff, I was thinking to myself about how
it would be nice if we had DPRINTF support in python, so it got me
thinking about how I might implement that.  It turns out that it isn't
that hard since I've already exposed the flags in python, but I
started to get annoyed about the fact that I have to define new flags
in SConscript files.  I've also recently removed almost all cases of
monolithic files gathering a bunch of info from across the build (to
make it easier to modularize things).  So, I'd like to do this to the
trace flags.  Right now, we have a std::vector<bool> of all of the
trace flags, and each trace flag is an index into that vector.  SCons
has a TraceFlag

Instead, I was thinking of having Trace::Flag objects that are
statically allocated and register themselves with a global dictionary
of flags at runtime.  I think that this should be just as fast as the
existing system since each flags object would contain a bool
representing if the flag is on or off.  It would also have the benefit
of making each TraceFlag localized from a compile perspective, so
you'll no longer have to rebuild the world just because someone defined
a new trace flag.
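
A minimal sketch of self-registering flag objects, to show the general shape. The names allFlags(), setFlag() and FetchFlag are invented for this example and the eventual syntax for defining flags may well differ.

```cpp
#include <map>
#include <string>

namespace Trace {

class Flag;

// Global dictionary of flags. A function-local static sidesteps static
// initialization order problems between source files.
inline std::map<std::string, Flag *> &allFlags() {
    static std::map<std::string, Flag *> flags;
    return flags;
}

class Flag {
  public:
    // Each flag registers itself under its name when it is constructed.
    explicit Flag(const std::string &name) : _name(name) {
        allFlags()[name] = this;
    }

    // Checking a flag is a single bool read.
    bool enabled() const { return _on; }
    void enable(bool on = true) { _on = on; }
    const std::string &name() const { return _name; }

  private:
    std::string _name;
    bool _on = false;
};

// Look a flag up by name, as a command line option handler might.
inline bool setFlag(const std::string &name, bool on) {
    auto it = allFlags().find(name);
    if (it == allFlags().end())
        return false;
    it->second->enable(on);
    return true;
}

} // namespace Trace

// A flag defined next to the code that uses it. Adding another one only
// touches the file it lives in.
static Trace::Flag FetchFlag("Fetch");
```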

Any comments or objections to this approach?  I'm pretty sure that the
end result will have the same DPRINTF syntax but a different syntax
for defining trace flags.

  Nate



I have no objections based on your description, but I'm curious what  
python code is complicated enough to need DPRINTFs. Also, how would  
you expect the syntax for defining traceflags to change?


Gabe


[m5-dev] possible problem with branch predictor squash ordering assert

2010-09-02 Thread Gabriel Michael Black
Hi everybody. I'm reworking a lot of PC related stuff across CPUs and  
ISAs (more to come soon) and I'm running into an assertion in the  
branch predictor when running MIPS in O3. The assert is during a  
squash when the branch that was mispredicted isn't the oldest pending  
branch.


I don't have the code in front of me since I'm away from my
development machine, but I think the offending line is likely this one:


http://repo.m5sim.org/m5/file/c1b66fc648e2/src/cpu/o3/bpred_unit_impl.hh#l353

I looked at the instruction stream coming out of fetch and, again from  
memory so don't hold me to specifics, the sequence that seems to cause  
it is a jalr (sn:23) to a computed address in a register which is  
unconditional but mispredicts as not taken since the target isn't  
known, and then an unconditional branch (sn:25) which also doesn't  
have a known target so is predicted as not taken but whose target I
think is resolved in decode.


The first branch, the jalr, is still winding its way to commit or  
wherever its mispredict would be detected when the other branch is  
detected as mispredicted in decode. Squash is called on the branch  
predictor for this branch, sn:25, but sn:23 is at the head of the  
prediction history. Kablooey.
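
To pin down the scenario, here is a toy model of the prediction history. It assumes the history is a queue of sequence numbers with the oldest pending branch at the front, which may not match how the real branch predictor stores it. The tolerantSquash function is only one conceivable relaxation, not a claim about what the fix should be.

```cpp
#include <cstdint>
#include <deque>

using SeqNum = uint64_t;

// Pending predictions, oldest first (an assumption of this toy model).
using History = std::deque<SeqNum>;

// The strict condition: the squashing branch must be the oldest pending one.
inline bool strictCheckPasses(const History &history, SeqNum squashed) {
    return !history.empty() && history.front() == squashed;
}

// A more tolerant squash: drop everything younger than the squashing branch
// and leave older, still unresolved branches alone.
inline void tolerantSquash(History &history, SeqNum squashed) {
    while (!history.empty() && history.back() > squashed)
        history.pop_back();
}
```

With sn:23 still pending and sn:25 resolving first in decode, the strict check fails, while the tolerant version would keep 23 and 25 and discard anything younger.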


I think this didn't happen with MIPS before because it would trick  
fetch into thinking it was branching even when it didn't need to be  
and it would space out the branches enough for everything to be worked  
out. That matches the timing difference I saw in this early part of  
the Exec trace, but I don't know if it's causal or just coincidence.


This sequence of events as described seems like an entirely plausible  
thing to happen considering branches can be resolved out of order if  
they're different types. Is the assert just too picky? Is there some  
structural reason this would break the branch predictor? Will this  
just never happen if some invariant I inadvertently violated was  
still true? I think the assert is just too picky, but I want to hear  
from the experts before I go trying to fix it.


Gabe


Re: [m5-dev] ARM_SE and the InOrderCPU

2010-08-22 Thread Gabriel Michael Black

Quoting Korey Sewell ksew...@umich.edu:


comments below

On Sun, Aug 22, 2010 at 3:46 PM, Gabe Black gbl...@eecs.umich.edu wrote:


The CPSR might be that high because it's after the banks of registers
for all the modes. Korey, are you asking about register flattening in
general, or how the indices are mapped in ARM?


I guess both :) 

I never really had a *full hold* on it, but I'm assuming for now that in
order to keep track of the initial register dependencies the unflattened
register index is enough (what you get after inst. decode), but in execution
you need to flatten the register to access the right spot in the register
bank.

Please correct me if I'm wrong though, I'm learning this on the fly. But
once I do understand that I'll be more ready to get ARM and SPARC working
for InOrder.




--
- Korey



That's not quite right. The indices should be flattened each time an  
instruction goes through since they might not always go to the same  
thing, and only the flattened indices should be used by the CPU. The  
preflattened indices are between the instructions and the flattening  
function and have no fixed relationship to anything the CPU can use.


Here's the nickel tour of flattening, just to make sure we're on the
same page.


First, an instruction has identified that it needs register such and  
such as determined by its encoding (or the fact that it always uses a  
certain register, or ...). For the sake of argument, let's say we're
talking about SPARC, the register is %g1, and the second bank of
globals is active. From the instruction's point of view, the
unflattened register is %g1, which, likely, is just represented by the  
index 1.


Next, we need to map from the instruction's view of the register  
file(s) down to actual storage locations. Think of this like virtual  
memory. The instruction is working within an index space which is like  
a virtual address space, and it needs to be mapped down to the  
flattened space which is like physical memory. Here, the index 1 is  
likely mapped to, say, 9, where 0-7 is the first bank of globals and  
8-15 is the second.


This is the point where the CPU gets involved. The index 9 refers to  
an actual register the instruction expects to access, and it's the  
CPU's job to make that happen. Before this point, all the work was  
done by the ISA with no insight available to the CPU, and beyond this  
point all the work is done by the CPU with no insight available to the  
ISA.


The CPU is free to provide a register directly like the simple CPU by  
having an array and just reading and writing the 9th element on behalf  
of the instruction. The CPU could, alternatively, do something  
complicated like renaming and mapping the flattened index further into  
a physical register like O3.
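
The %g1 example can be written out as a small sketch. The constants and the SimpleRegFile struct are made up for illustration; the real SPARC flattening covers windows and more global banks than this.

```cpp
#include <array>
#include <cstdint>

constexpr int NumGlobalBanks = 2;  // assumption: only two banks in this toy
constexpr int GlobalsPerBank = 8;

// ISA side: map the index the instruction sees (%g1 is 1) to a flattened
// index, based on which bank of globals is currently active.
inline int flattenGlobal(int archIdx, int activeBank) {
    return activeBank * GlobalsPerBank + archIdx;
}

// CPU side, simple CPU style: the flattened index is used directly as an
// array index. Note the table is sized by the post-flattening space.
struct SimpleRegFile {
    std::array<uint64_t, NumGlobalBanks * GlobalsPerBank> regs{};

    uint64_t read(int flatIdx) const { return regs[flatIdx]; }
    void write(int flatIdx, uint64_t val) { regs[flatIdx] = val; }
};
```

With the second bank active (bank 1), flattenGlobal(1, 1) gives 9, matching the walkthrough above. The CPU only ever sees the 9.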



One important property of all this, which makes sense if you think  
about the virtual memory analogy, is that the size of the index space  
before flattening has nothing to do with the size after. The virtual  
memory space could be very large (presumably with gaps) and map to a  
smaller physical space, or it could be small and map to a larger  
physical space where the extra is for, say, other virtual spaces used  
at other times. You need to make sure you're using the right size  
(post flattening) to size your tables because that's the space of  
possible options.


One other tricky part comes from the fact that we add offsets into the  
indices to distinguish ints from floats from miscs. Those offsets  
might be one thing in the preflattening world, but then need to be  
something else in the post flattening world to keep things from  
landing on top of each other without leaving gaps. It's easy to make a  
mistake here, and it's one of the reasons I don't like this offset  
idea as a way to keep the different types separate. I'd rather see a  
two dimensional index where the second coordinate was a register type.  
But in the world as it exists today, this is something you have to  
keep track of.
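
Here is a sketch of the contrast between the two schemes. RegClass, RegIndex, and the FloatBase offset are hypothetical names and values chosen for the example.

```cpp
#include <cstdint>
#include <tuple>

enum class RegClass : uint8_t { Int, Float };

// Offset scheme: floats live above a fixed base in one shared index space.
constexpr int FloatBase = 32; // hypothetical pre-flattening offset

inline int offsetIndex(RegClass cls, int idx) {
    return cls == RegClass::Float ? FloatBase + idx : idx;
}

// Two dimensional scheme: the register type is its own coordinate, so the
// size of one space can never push indices into another.
struct RegIndex {
    RegClass cls;
    uint16_t idx;

    bool operator==(const RegIndex &other) const {
        return cls == other.cls && idx == other.idx;
    }

    bool operator<(const RegIndex &other) const {
        return std::tie(cls, idx) < std::tie(other.cls, other.idx);
    }
};
```

If flattening grows the integer space past 32 entries, integer 33 and float 1 land on the same offset index (33), while the two dimensional indices stay distinct.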




As far as the ARM registers, you'll find what amounts to a map of the  
integer registers here:


http://repo.m5sim.org/m5/file/417ef5d444bd/src/arch/arm/intregs.hh

The first swath of indices all correspond to actual storage locations  
needed by the ISA. These are the regular user registers, and the extra  
registers that get swapped in in the other modes. There are also spots  
for the zero register, the microcode register, and the condition codes.


After that, there are a bunch of constants defined that show how to  
map from the 16 normal registers down to the actual storage locations  
in each of the modes. At the end of the enum, those are put into  
arrays which are used as translation tables.



You can find constants for use outside of the ISA (ie in your CPU) here:

http://repo.m5sim.org/m5/file/417ef5d444bd/src/arch/arm/registers.hh

To the first order, ignore the constants with Arch in them and just go  
with the ones like NumIntRegs. I 

Re: [m5-dev] Review Request: ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.

2010-08-20 Thread Gabriel Michael Black
I'd like the CPUs to remain as dumb as possible as far as ISA  
semantics and mechanisms so neither they nor the ISAs are  
unnecessarily constrained or complicated. In light of that, I think  
it's actually better if the CPU has no idea what predication is or  
when it may or may not have happened.


This is a class of stats I don't think we have a good way to collect,  
though. We might want to know how often store conditionals fail, how  
often compare and swaps fail, how often a register window push/pop  
needs to fill/spill, etc. I think a more general mechanism that solved  
-this- problem would solve the larger problem, and I think could be  
quite useful.
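
Something along these lines is what I have in mind, as a very rough sketch. EventStats and the event names are invented here and nothing like this exists in the tree as far as this thread says.

```cpp
#include <cstdint>
#include <map>
#include <string>

// A counter bank that instructions can bump by event name. The CPU model
// owns the object but doesn't need to know what any event means.
class EventStats {
  public:
    void record(const std::string &event) { ++counts[event]; }

    uint64_t count(const std::string &event) const {
        auto it = counts.find(event);
        return it == counts.end() ? 0 : it->second;
    }

  private:
    std::map<std::string, uint64_t> counts;
};
```

An ARM instruction could record "predicatedFalse", a store conditional could record "storeCondFailed", and SPARC could record window spills, all through the same interface.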


Gabe

Quoting Ali Saidi sa...@umich.edu:



Yes, but we've not really solved the larger problem, which is that a CPU model
should be able to see if an instruction is predicated false. For example,
it's impossible to create a statistic that is the number of instructions
that were predicated false when we should just be able to do if
(inst->readPredicate() == false) predicatedFalse++; in the CPU models.

Ali


On Fri, 20 Aug 2010 11:42:23 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:

Well the problem is just the load/store instructions, right? Otherwise
the execute method can just do/not do whatever it needs without having
to coordinate with the CPU. If you make your proposed isa parser
changes, the instructions should be able to handle all the other cases
internally without too much fuss.

Gabe

Ali Saidi wrote:

Interesting... we're top posting on this thread and bottom posting on the
other one...

Anyway... yes you're correct initiateAcc() is called one instruction at a
time.

For memory ops alone the o3 model could be changed in executeLoad/Store():

inst->predicated = false; // assume failure
load_fault = inst->initiateAcc();

if (inst->predicated) 

However, to make this work, the read()/write() methods would have to set
inst->predicated = true;

That isn't so bad, but I still have two problems with this method:
1) This only works for load/store instructions. There isn't a
corresponding way to do this for any other type of op since in
either case they're going to call setInt/FloatReg().
2) I'm not really a fan of the idea that the absence of a call is
what triggers the mechanism. It seems like a convoluted way to do it
instead of having:
if (testPredicate()) {
   .
} else {
  
  xc->setPredicate(false);
}

Doing it the way it's currently implemented also means that the same
mechanism works for multiple cpu models that might need it.
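
A self-contained sketch of the explicit flag approach, for reference. The ExecContext here is a toy with a single register, and the Cpu struct and its counter are illustrative only, not the code under review.

```cpp
#include <cstdint>

// Toy execution context with one register and a predicate result flag.
struct ExecContext {
    uint64_t reg = 0;
    bool predicate = true;

    void setPredicate(bool p) { predicate = p; }
    bool readPredicate() const { return predicate; }
};

// A predicated increment: the instruction reports a false predicate
// explicitly instead of the CPU inferring it from a missing call.
inline void execute(ExecContext &xc, bool conditionHolds) {
    if (conditionHolds) {
        xc.reg += 1;
    } else {
        xc.setPredicate(false);
    }
}

// Any CPU model can use the same check, and the stat falls out for free.
struct Cpu {
    uint64_t predicatedFalse = 0;

    void run(ExecContext &xc, bool conditionHolds) {
        xc.setPredicate(true); // assume the predicate holds
        execute(xc, conditionHolds);
        if (!xc.readPredicate())
            ++predicatedFalse;
    }
};
```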

Thanks,
Ali



On Thu, 19 Aug 2010 20:45:41 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:


O3 only -seems- to execute multiple memory instructions at a time. Each
initiateAcc is called one at a time and completes execution before the
next starts, so the same thing should apply as far as that goes.

Gabe

Ali Saidi wrote:


Remember that the timing cpu is only executing one instruction at a time.
If the instruction calls read() and no access isn't set the timing cpu
packages up the request, ships it out, and sets its state to
DcacheWaitResponse. If the instruction doesn't call read() it continues on
like nothing happened (because it didn't). With the o3 cpu there are
multiple instructions in-flight, so simply waiting for it to call read or
not isn't an option. Something has to be passed through the xc to tell the
cpu that this instruction is done since the normal mechanism won't take
care of it. The way this works now is by passing back something other than
NoFault. However, the instruction didn't actually fault, so then we would
have to special case everything that reads that fault later on in the
pipeline to say if it's a predication fault, do the same thing you would do
for no fault. This seems worse and more error prone.

Thanks,
ali


On Thu, 19 Aug 2010 15:16:22 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:



I don't think that's true, but I may be confused. I think, at least
for the timing CPU, that it checks if a read/write was called and
doesn't just fall through. The timing CPU would wait forever for a
read/write response that would never come otherwise. (digs around a
little) I think it's sort of like what I described. The CPU will
continue if it's not waiting for anything, which would be the case if
no access actually happened. We could probably get the same behavior
if we checked if the instruction was waiting for a read/write
response, but that might be kept somewhere annoying to get at.
Generally, if we can hide the existence of predication from the CPUs,
I think that'll make everyone's life easier (except for the ARM ISA's,
I suppose).

Gabe

Quoting Ali Saidi sa...@umich.edu:




It's not the same issue here. The simple cpus just have their
execute/completeAccess methods guarded by a predicate condition test. If
nothing happens in there, so be it and the cpu goes onto the next
instruction

Re: [m5-dev] Review Request: ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.

2010-08-20 Thread Gabriel Michael Black
Or even something that builds a histogram of instructions/instruction  
types executed.


Gabe

Quoting Gabriel Michael Black gbl...@eecs.umich.edu:

I'd like the CPUs to remain as dumb as possible as far as ISA  
semantics and mechanisms so neither they nor the ISAs are  
unnecessarily constrained or complicated. In light of that, I think  
it's actually better if the CPU has no idea what predication is or  
when it may or may not have happened.


This is a class of stats I don't think we have a good way to  
collect, though. We might want to know how often store conditionals  
fail, how often compare and swaps fail, how often a register window  
push/pop needs to fill/spill, etc. I think a more general mechanism  
that solved -this- problem would solve the larger problem, and I  
think could be quite useful.


Gabe

Quoting Ali Saidi sa...@umich.edu:



Yes, but we've not really solved the larger problem, which is that a CPU
model should be able to see if an instruction is predicated false. For
example, it's impossible to create a statistic that is the number of
instructions that were predicated false, when we should just be able to do
if (inst->readPredicate() == false) predicatedFalse++; in the CPU models.
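
A minimal sketch of the kind of counter described above, assuming a
per-instruction predicate flag is visible to the CPU model. SketchInst,
SketchCpuStats, recordCommit() and tally() are hypothetical names made up
for illustration; they are not the simulator's DynInst or stats classes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dynamic instruction (not the real DynInst).
struct SketchInst
{
    bool predicate = true;  // ISA code would clear this when the predicate fails
    bool readPredicate() const { return predicate; }
};

// Hypothetical CPU-side counters.
struct SketchCpuStats
{
    std::uint64_t committed = 0;
    std::uint64_t predicatedFalse = 0;
};

// Called once per committed instruction by the (hypothetical) CPU model.
inline void
recordCommit(SketchCpuStats &stats, const SketchInst &inst)
{
    ++stats.committed;
    if (!inst.readPredicate())
        ++stats.predicatedFalse;
    // The predicated-false count can never exceed the commit count.
    assert(stats.predicatedFalse <= stats.committed);
}

// Convenience helper: tally a whole batch of instructions.
inline SketchCpuStats
tally(const std::vector<SketchInst> &insts)
{
    SketchCpuStats stats;
    for (const auto &inst : insts)
        recordCommit(stats, inst);
    return stats;
}
```

The only point is that the statistic becomes a one-line check once the
flag exists on the instruction; where the flag lives is exactly what the
thread is debating.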

Ali


On Fri, 20 Aug 2010 11:42:23 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:

Well the problem is just the load/store instructions, right? Otherwise
the execute method can just do/not do whatever it needs without having
to coordinate with the CPU. If you make your proposed isa parser
changes, the instructions should be able to handle all the other cases
internally without too much fuss.

Gabe

Ali Saidi wrote:

Interesting... we're top posting on this thread and bottom posting on
the other one...

Anyway... yes, you're correct: initiateAcc() is called one instruction
at a time.

For memory ops alone the o3 model could be changed in
executeLoad/Store():

inst->predicated = false; // assume failure
load_fault = inst->initiateAcc();

if (inst->predicated) 

However, to make this work, the read()/write() methods would have to
set inst->predicated = true;

That isn't so bad, but I still have two problems with this method:
1) This only works for load/store instructions. There isn't a
corresponding way to do this for any other type of op, since in either
case they're going to call setInt/FloatReg().
2) I'm not really a fan of the idea that the absence of a call is what
triggers the mechanism. It seems like a convoluted alternative to
having:
if (testPredicate()) {
  .
} else {
 
 xc->setPredicate(false);
}

Doing it the way it's currently implemented also means that the same
mechanism works for multiple cpu models that might need it.

Thanks,
Ali



On Thu, 19 Aug 2010 20:45:41 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:


O3 only -seems- to execute multiple memory instructions at a time. Each
initiateAcc is called one at a time and completes execution before the
next starts, so the same thing should apply as far as that goes.

Gabe

Ali Saidi wrote:


Remember that the timing cpu is only executing one instruction at a
time. If the instruction calls read() and no access isn't set, the
timing cpu packages up the request, ships it out, and sets its state to
DcacheWaitResponse. If the instruction doesn't call read(), it continues
on like nothing happened (because it didn't). With the o3 cpu there are
multiple instructions in flight, so simply waiting for it to call read
or not isn't an option. Something has to be passed through the xc to
tell the cpu that this instruction is done, since the normal mechanism
won't take care of it. The way this works now is by passing back
something other than NoFault. However, the instruction didn't actually
fault, so then we would have to special case everything that reads that
fault later on in the pipeline to say: if it's a predicationfault, do
the same thing you would do for no fault. This seems worse and more
error prone.

Thanks,
ali


On Thu, 19 Aug 2010 15:16:22 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:



I don't think that's true, but I may be confused. I think, at least
for the timing CPU, that it checks if a read/write was called and
doesn't just fall through. The timing CPU would wait forever for a
read/write response that would never come otherwise. (digs around a
little) I think it's sort of like what I described. The CPU will
continue if it's not waiting for anything, which would be the case if
no access actually happened. We could probably get the same behavior
if we checked if the instruction was waiting for a read/write
response, but that might be kept somewhere annoying to get at.
Generally, if we can hide the existence of predication from the CPUs,
I think that'll make everyone's life easier (except for the ARM ISA's,
I suppose).

Gabe

Quoting Ali Saidi sa...@umich.edu:




It's not the same issue here. The simple cpus just have their
execute

Re: [m5-dev] Review Request: ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.

2010-08-20 Thread Gabriel Michael Black
But we shouldn't mandate features on either side of the boundary. We'd  
be forcing all CPU models to support predication when only ARM will  
actually use it and there are easy tweaks to avoid needing it even then.


Gabe

Quoting Ali Saidi sa...@umich.edu:



This isn't constraining anything. Either you check the state variable
in the dyn inst or you don't. If you don't check it, it's 1 byte of data
that doesn't matter. As another example, what if I want a smarter CPU
model that is able to issue predicated instructions before the
destination registers are ready? Unless this piece of state is available,
that would be impossible to do.

The point should be to create interfaces that encapsulate features, and
not architectures. Predication is a feature of an architecture. Itanium
predicates lots of instructions too...

Ali


On Fri, 20 Aug 2010 15:18:45 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

I'd like the CPUs to remain as dumb as possible as far as ISA
semantics and mechanisms so neither they nor the ISAs are
unnecessarily constrained or complicated. In light of that, I think
it's actually better if the CPU has no idea what predication is or
when it may or may not have happened.

This is a class of stats I don't think we have a good way to collect,
though. We might want to know how often store conditionals fail, how
often compare and swaps fail, how often a register window push/pop
needs to fill/spill, etc. I think a more general mechanism that solved
-this- problem would solve the larger problem, and I think could be
quite useful.

Gabe

Quoting Ali Saidi sa...@umich.edu:



Yes, but we've not really solved the larger problem, which is that a CPU
model should be able to see if an instruction is predicated false. For
example, it's impossible to create a statistic that is the number of
instructions that were predicated false, when we should just be able to do
if (inst->readPredicate() == false) predicatedFalse++; in the CPU models.

Ali


On Fri, 20 Aug 2010 11:42:23 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:

Well the problem is just the load/store instructions, right? Otherwise
the execute method can just do/not do whatever it needs without having
to coordinate with the CPU. If you make your proposed isa parser
changes, the instructions should be able to handle all the other cases
internally without too much fuss.

Gabe

Ali Saidi wrote:

Interesting... we're top posting on this thread and bottom posting on
the other one...

Anyway... yes, you're correct: initiateAcc() is called one instruction
at a time.

For memory ops alone the o3 model could be changed in
executeLoad/Store():

inst->predicated = false; // assume failure
load_fault = inst->initiateAcc();

if (inst->predicated) 

However, to make this work, the read()/write() methods would have to
set inst->predicated = true;

That isn't so bad, but I still have two problems with this method:
1) This only works for load/store instructions. There isn't a
corresponding way to do this for any other type of op, since in either
case they're going to call setInt/FloatReg().
2) I'm not really a fan of the idea that the absence of a call is what
triggers the mechanism. It seems like a convoluted alternative to
having:
if (testPredicate()) {
   .
} else {
  
  xc->setPredicate(false);
}

Doing it the way it's currently implemented also means that the same
mechanism works for multiple cpu models that might need it.

Thanks,
Ali



On Thu, 19 Aug 2010 20:45:41 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:


O3 only -seems- to execute multiple memory instructions at a time. Each
initiateAcc is called one at a time and completes execution before the
next starts, so the same thing should apply as far as that goes.

Gabe

Ali Saidi wrote:


Remember that the timing cpu is only executing one instruction at a
time. If the instruction calls read() and no access isn't set, the
timing cpu packages up the request, ships it out, and sets its state to
DcacheWaitResponse. If the instruction doesn't call read(), it continues
on like nothing happened (because it didn't). With the o3 cpu there are
multiple instructions in flight, so simply waiting for it to call read
or not isn't an option. Something has to be passed through the xc to
tell the cpu that this instruction is done, since the normal mechanism
won't take care of it. The way this works now is by passing back
something other than NoFault. However, the instruction didn't actually
fault, so then we would have to special case everything that reads that
fault later on in the pipeline to say: if it's a predicationfault, do
the same thing you would do for no fault. This seems worse and more
error prone.

Thanks,
ali


On Thu, 19 Aug 2010 15:16:22 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:



I don't think that's true, but I may be confused. I think, at least
for the timing CPU, that it checks if a read/write

Re: [m5-dev] Review Request: ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.

2010-08-20 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



We're not forcing all cpu models to support predication, only to have a
function that allows the cpu to understand if an instruction didn't
execute because it was predicated. If the CPU chooses to do nothing with
that, as the simple cpus do, then so be it.

It's like saying that we shouldn't have a micro-pc function because Alpha
doesn't have any micro-coded instructions or we shouldn't have a NextNPC
because only sparc has delay slots.


A difference is those ISAs simply wouldn't function otherwise, and  
that's not the case here. I've actually thought about ways to abstract  
away even those differences and just have an opaque PC blob that can  
increment and be compared for equality and leave all the details to  
the ISA.
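
To make the "opaque PC blob" idea concrete, here is a rough sketch under
stated assumptions: the CPU-side code only advances a PC object and
compares two of them for equality, while the ISA-side class hides
whatever it needs (here a delay-slot style pc/npc/nnpc triple).
DelaySlotPC, branchTo(), instAddr() and stepsUntil() are hypothetical
names invented for this example and do not exist in the tree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ISA-defined PC state for an ISA with one branch delay slot.
class DelaySlotPC
{
  public:
    explicit DelaySlotPC(std::uint64_t pc)
        : _pc(pc), _npc(pc + 4), _nnpc(pc + 8) {}

    // Move to the next instruction; the only mutation the CPU performs.
    void advance() { _pc = _npc; _npc = _nnpc; _nnpc = _npc + 4; }

    // ISA-side hook: a taken branch redirects the instruction after the
    // delay slot.
    void branchTo(std::uint64_t target) { _nnpc = target; }

    bool operator==(const DelaySlotPC &o) const
    {
        return _pc == o._pc && _npc == o._npc && _nnpc == o._nnpc;
    }

    std::uint64_t instAddr() const { return _pc; }

  private:
    std::uint64_t _pc, _npc, _nnpc;
};

// CPU-side code written only against advance() and operator==.
template <typename PCState>
unsigned
stepsUntil(PCState cur, const PCState &target, unsigned limit)
{
    unsigned steps = 0;
    while (!(cur == target) && steps < limit) {
        cur.advance();
        ++steps;
    }
    assert(steps <= limit);
    return steps;
}
```

Because stepsUntil() is a template over the PC type, an ISA without delay
slots could supply a class holding a single address and the CPU-side code
would not change.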


I'd really like to try to bring down the complexity inside our CPUs  
and ISA descriptions, and while this change isn't the end of the  
world, it's definitely an avoidable step in the opposite direction.


Gabe
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] Review Request: ARM/O3: store the result of the predicate evaluation in DynInst or Threadstate.

2010-08-19 Thread Gabriel Michael Black
I don't think that's true, but I may be confused. I think, at least  
for the timing CPU, that it checks if a read/write was called and  
doesn't just fall through. The timing CPU would wait forever for a  
read/write response that would never come otherwise. (digs around a  
little) I think it's sort of like what I described. The CPU will  
continue if it's not waiting for anything, which would be the case if  
no access actually happened. We could probably get the same behavior  
if we checked if the instruction was waiting for a read/write  
response, but that might be kept somewhere annoying to get at.  
Generally, if we can hide the existence of predication from the CPUs,  
I think that'll make everyone's life easier (except for the ARM ISA's,  
I suppose).


Gabe

Quoting Ali Saidi sa...@umich.edu:



It's not the same issue here. The simple cpus just have their
execute/completeAccess methods guarded by a predicate condition test. If
nothing happens in there, so be it and the cpu goes onto the next
instruction without complaint.  The out of order cpu on the other hand
needs to know if the instruction was predicated false so it can notify
commit that it is complete, even though it hasn't done anything. If commit
isn't notified, the instruction will never commit and the processor will
stall.
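
The stall described above can be illustrated with a toy in-order commit
queue. This is only a sketch under simplifying assumptions: RobEntry,
executeEntry() and commitReady() are hypothetical names, and the real O3
pipeline is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical reorder-buffer entry.
struct RobEntry
{
    RobEntry(bool load, bool pred)
        : isLoad(load), predicate(pred), executed(false) {}
    bool isLoad;     // does the instruction access memory?
    bool predicate;  // result of the predicate evaluation
    bool executed;   // set once commit is allowed to retire the entry
};

// Execute one entry. Returns true if a memory request was sent, in which
// case 'executed' would be set later when the response arrives.
inline bool
executeEntry(RobEntry &e, bool notifyOnPredicateFalse)
{
    if (!e.predicate) {
        // Nothing is sent, so no response will ever mark the entry done.
        if (notifyOnPredicateFalse)
            e.executed = true;
        return false;
    }
    if (e.isLoad)
        return true;
    e.executed = true;
    return false;
}

// In-order commit: retire from the head while entries are executed.
inline std::size_t
commitReady(std::deque<RobEntry> &rob)
{
    const std::size_t before = rob.size();
    std::size_t retired = 0;
    while (!rob.empty() && rob.front().executed) {
        rob.pop_front();
        ++retired;
    }
    assert(retired + rob.size() == before);
    return retired;
}
```

With notifyOnPredicateFalse off, a predicated-false load at the head is
never marked executed and blocks everything behind it; with it on, the
queue drains.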

This information should clearly belong in the dyninst. Unless there is
some other way to access the class from the isa description, I think
the change is correct. An alternate approach would be to have the method in
threadstate do nothing because it's unimportant for it.

Ali



On Tue, 17 Aug 2010 19:04:10 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Sorry if I wasn't clear before (I reread my post and it sounded a
little vague) but what the simple CPU does is keep track of whether
the supposed memory instruction actually calls read or write on the
execution context. If not, then the CPU doesn't try to complete any
access, it just considers that part over. Ideally we can do the same
thing here.
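
As a rough illustration of that approach (a sketch, not the actual simple
CPU code), the CPU can reset a flag before running the instruction and
look at it afterwards. SketchContext, CpuState and initiateAndClassify()
are hypothetical names; only DcacheWaitResponse is borrowed from the
discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical execution context that records whether read() was called.
class SketchContext
{
  public:
    void read(std::uint64_t addr)
    {
        // This sketch allows at most one access per instruction.
        assert(!accessRequested);
        lastAddr = addr;
        accessRequested = true;
    }
    bool accessWasRequested() const { return accessRequested; }
    void reset() { accessRequested = false; }

    std::uint64_t lastAddr = 0;

  private:
    bool accessRequested = false;
};

enum class CpuState { Running, DcacheWaitResponse };

// Run the instruction's initiateAcc and decide what the CPU does next,
// based purely on whether the instruction touched memory.
inline CpuState
initiateAndClassify(SketchContext &xc,
                    const std::function<void(SketchContext &)> &initiateAcc)
{
    xc.reset();
    initiateAcc(xc);
    return xc.accessWasRequested() ? CpuState::DcacheWaitResponse
                                   : CpuState::Running;
}
```

The CPU never learns why read() was skipped, which is the point: a false
predicate looks the same as any other instruction that did not access
memory.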

Gabe

Quoting Ali Saidi sa...@umich.edu:



Anyone have comments on this? It seems like this is the only way to
access the DynInst from the isa description. Threadstate does have the
current instruction in it, as well as things like temporary storage to
pass the source address from copy_load to. It doesn't seem too out of
place to include current instruction predication state in there.

Ali


On Sat, 14 Aug 2010 07:08:35 -, Gabe Black gbl...@eecs.umich.edu
wrote:

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/177/#review185
---



src/cpu/thread_context.hh
http://reviews.m5sim.org/r/177/#comment326

This isn't really a property of a thread, it's the property of a
single instruction. I don't think this is being done in the right
place. I think we should have a discussion on m5-dev to determine the
best way to handle this. There was a little code added to the simple
CPU that does what this is supposed to do if a memory instruction
didn't actually read or write memory, and I think this is a better way
to handle this. We should have a discussion about this on m5-dev,
especially since it touches lots of low level bits like *contexts,
instruction behavior, CPUs, etc. These sorts of changes need to be made
carefully.


- Gabe


On 2010-08-13 10:12:35, Ali Saidi wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/177/
---

(Updated 2010-08-13 10:12:35)


Review request for Default and Min Kyu Jeong.


Summary
---

ARM/O3: store the result of the predicate evaluation in DynInst or
Threadstate.
This allows the CPU to handle predicated-false instructions accordingly.
This particular patch makes loads that are predicated-false go straight
to the commit stage, not waiting for the return of data that was never
requested since the load was predicated-false.


Diffs
-

  src/arch/arm/isa/templates/mem.isa 3c48b2b3cb83
  src/arch/arm/isa/templates/pred.isa 3c48b2b3cb83
  src/cpu/base_dyn_inst.hh 3c48b2b3cb83
  src/cpu/base_dyn_inst_impl.hh 3c48b2b3cb83
  src/cpu/o3/lsq_unit_impl.hh 3c48b2b3cb83
  src/cpu/simple/base.hh 3c48b2b3cb83
  src/cpu/simple_thread.hh 3c48b2b3cb83
  src/cpu/thread_context.hh 3c48b2b3cb83

Diff: http://reviews.m5sim.org/r/177/diff


Testing
---


Thanks,

Ali





Re: [m5-dev] Review Request: ARM: make predicated-false instruction to move data from a old register.

2010-08-19 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



On Tue, 17 Aug 2010 14:19:49 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Quoting Steve Reinhardt ste...@gmail.com:


On Sun, Aug 15, 2010 at 6:07 PM, Min Kyu Jeong mkje...@gmail.com wrote:



I needed to spit out code that reads from a register and writes to it
again. The thing is, arch reg indices are renamed (reg renaming and
shadow reg file), so many structures need to be looked up to find a
previous/current physical register for the given arch reg name. These
indirections are handled in ISA-specific code and encapsulated, and the
operand section of the ISA description specifies which one to use.


I think we're thinking of these things on different levels... the
operand registers are available via StaticInst::srcRegIdx() and
destRegIdx(), and can be accessed via (for example) readIntReg() in
o3/thread_context_impl.hh, none of which are ISA dependent.  I don't
see why you can't build on these calls to handle this entirely within
the O3 model in an ISA-independent fashion (i.e., a way that would
generalize to any ISA with universal predication, if we ever run into
another one).

I was actually thinking this could be handled at a level higher than
the parser, namely in the ISA description.

If the predicate is false, then the instruction still needs to forward
the value from the physical source register to the physical
destination register so future consumers get the right thing. For
renaming to work, it's necessary for all those registers to be listed
as both sources and destinations. By putting a fall through condition
in the instruction as defined by the ISA description that just assigns
registers to themselves (DestReg = DestReg; SrcReg1 = SrcReg1) both of
those conditions are satisfied automatically without the CPUs having
to get any smarter/complex/constrained and without retrofitting the
ISA parser.

This will mean having to change a lot of instruction templates so that
they have the fall through case and so that the definitions know what
registers to assign to themselves. That's going to be tedious, but
hopefully fairly mechanical to do.

If the ISA parser were a python framework instead of a parser, then
it's conceivable this sort of thing could be handled automatically and
cleanly by subclassing, replacing default function implementations,
etc. etc., and kept local to ARM. That would give the benefits you
might have been going for by changing the parser (the parser does the
work, fewer bugs...) , but without impacting the complexity seen by
all the other ISAs. That's not going to happen tomorrow or possibly
ever so I hesitated to mention it and don't make any plans around it,
but it's something I would really like to see happen.



I'm not sure I know why people are so opposed to changing a piece of code
that is supposed to make creating an isa description easier, not more
difficult. With the changes all the regression tests still pass, and the
parser is generating the exact same code for the other ISAs with and
without this change.

The isa parser already generates 130,000 lines of code per cpu model and
75,000 lines of decode. It takes over 1GB of memory and 4 minutes to
compile a CPU model. To do this without changing the ISA parser and to
allow for out-of-order execution, conditional always and plain conditional
instructions would have to be decoded separately, doubling the size of the
decoder and cpu models. Additionally, not only would we have to go through
the process of changing all the isa descriptions, but there would also be
the possibility of introducing subtle bugs where predicated false
instructions wouldn't be guaranteed to properly update their renamed
registers. It seems like the right answer in this case is to have the isa
parser be able to figure out what registers need updating when an
instruction is predicated false and do the update.

Thanks,
Ali



Between the CPU and the parser I think it should be in the parser.  
You're right that it's a mess trying to do it in the ISA description,  
but that's a localized mess. If we continue making the parser more and  
more complicated it turns into a global mess. Fixing problems in the  
parser can be very hard and annoyingly subtle.


Gabe


Re: [m5-dev] TimingSimpleCPU, x86: sendSplitData packet sender states

2010-08-19 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


I don't think I changed anything here... hg annotate seems to back me
up on that, too.  I think the fundamental (but subtle) issue here is
that once you successfully send a packet, the ownership for that
packet object is conceptually handed off to the recipient, so
technically the sender shouldn't be dereferencing that packet pointer
anymore.  Thus it's OK for Ruby to push its own senderState into the
packet if it wants.  If I had to guess I think it might just be that
Gabe hasn't been testing with Ruby...



Yes, that's probably true. Mercurial backs you up, but I honestly  
don't remember writing that code. Old age I guess :-).





(That said, looking over the Ruby code with fresh eyes after not
having thought about it for a while, I think the Ruby code might be
overcomplicated... instead of only tracking the m5 packet pointer in
the ruby request object, then using senderState to look up the port
based on the packet, why don't we just keep both the packet and
the port in the ruby request?)

At a high level I think part of the issue with the sendSplitData()
code is that buildSplitPacket doesn't return a pointer to the big
packet, so the only way to access it is via the senderState objects of
the sub-packets.  I expect that with some thought we could restructure
the code to be a little cleaner, but Joel's idea of holding on to the
original senderState pointers on the stack seems like a reasonable
interim solution.
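
A small sketch of that interim approach, assuming only what is stated
above: once a packet is sent, the recipient may replace its senderState,
so the sender copies the pointers it still needs into locals first.
SketchPacket, SketchRecipient and sendSplit() are hypothetical and much
simpler than the real packet and port classes.

```cpp
#include <cassert>

// Hypothetical per-sender bookkeeping attached to a packet.
struct SenderState
{
    int id;
};

// Hypothetical packet carrying a senderState pointer.
struct SketchPacket
{
    SenderState *senderState = nullptr;
};

// Hypothetical recipient that pushes its own state into accepted packets.
struct SketchRecipient
{
    SenderState own{-1};

    bool accept(SketchPacket &pkt)
    {
        pkt.senderState = &own;  // recipient now owns the packet
        return true;
    }
};

// Send both fragments of a split access. The original senderState
// pointers are saved on the stack before sending, so nothing is read
// back out of the packets after ownership has been handed off.
inline int
sendSplit(SketchPacket &pkt1, SketchPacket &pkt2, SketchRecipient &port)
{
    SenderState *state1 = pkt1.senderState;
    SenderState *state2 = pkt2.senderState;
    assert(state1 && state2);

    port.accept(pkt1);
    port.accept(pkt2);

    // Safe: uses the saved pointers, not pkt1/pkt2.
    return state1->id + state2->id;
}
```

Reading pkt1.senderState after accept() would return the recipient's
state instead, which is the kind of surprise described in this thread.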


That sounds reasonable.

Gabe


Re: [m5-dev] TimingSimpleCPU, x86: sendSplitData + TLB miss

2010-08-19 Thread Gabriel Michael Black
For that we might want two walkers. They'd need to start with  
different CR3s (roots), interleave, etc. and doing it all in one might  
be overly complicated. I honestly haven't thought about it deeply.  
That's with AMD's SVM too, mind you, since I think Intel's VT uses a  
substantially different structure for the hypervisor level page  
tables. Why would they do that? I have no idea.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


Note that if/when we get around to supporting virtualization, we'll
need to handle nested page table walks too, so if there's a solution
to this problem that looks like it would lend itself better to nesting
walks too then that would be a preferable direction to go.

Steve

On Wed, Aug 18, 2010 at 8:14 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Off hand I would say yes, queuing up a second request behind one that caused
a table walk makes sense. Since there should only be one other request, just
having a variable for the on deck request should be sufficient instead of a
full blown queue. Please let me know if you think otherwise. I'll think
about it some more and let you know if I change my mind, but I think that
makes sense.

Gabe

Quoting Joel Hestness hestn...@cs.utexas.edu:


Hi,
 I am currently running a benchmark in X86_FS timing mode (single or
multicore) that crashes due to the page table walker.  On a data write (or
read) instruction that causes TimingSimpleCPU::write to split the TLB access
into two accesses (cpu/simple/timing.cc:~560), if the first TLB access
misses, it causes the page table walker to start a walk and its state =
Waiting.  Since the second access happens immediately
in TimingSimpleCPU::write, if the second request also misses, it causes
another walk that fails the (state == Ready) assertion in
X86ISA::Walker::start (arch/x86/pagetable_walker.cc:~316).
 Seems this is a corner case of a corner case, namely, an unaligned (split)
data access, whose split TLB accesses both miss.  It doesn't look like there
is any code to handle the situation yet, and I'm hoping to get some guidance
on how to address it.
 It seems to me that since this only happens on a TLB miss, that the TLB or
walker should be able to handle the multiple requests.  I see that in the
ARM code, the page table walker has a queue of walks that are currently in
flight (I'm having trouble convincing myself that the queues can't conflict
when multiple walks are in flight :\).  Would it make sense to have similar
state queuing in the x86 page table walker?
 Thanks,
 Joel

--
 Joel Hestness
 PhD Student, Computer Architecture
 Dept. of Computer Science, University of Texas - Austin
 http://www.cs.utexas.edu/~hestness






Re: [m5-dev] TimingSimpleCPU, x86: sendSplitData + TLB miss

2010-08-18 Thread Gabriel Michael Black
Off hand I would say yes, queuing up a second request behind one that  
caused a table walk makes sense. Since there should only be one other  
request, just having a variable for the on deck request should be  
sufficient instead of a full blown queue. Please let me know if you  
think otherwise. I'll think about it some more and let you know if I  
change my mind, but I think that makes sense.
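
Roughly what I have in mind, as a sketch only: a single "on deck" slot
next to the walk in flight, under the assumption that a split access
produces at most two requests. SketchWalker and its members are
hypothetical; this is not the existing X86ISA::Walker interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page table walker with one in-flight walk and one
// pending ("on deck") request.
class SketchWalker
{
  public:
    enum class State { Ready, Waiting };

    // Start a walk, or park the request if a walk is already in flight.
    void start(std::uint64_t vaddr)
    {
        if (state == State::Ready) {
            current = vaddr;
            state = State::Waiting;
            return;
        }
        // Only one other request is expected, so one slot suffices.
        assert(!hasOnDeck);
        onDeck = vaddr;
        hasOnDeck = true;
    }

    // Called when the in-flight walk completes; returns its address and
    // promotes the on-deck request, if any.
    std::uint64_t finish()
    {
        assert(state == State::Waiting);
        const std::uint64_t done = current;
        if (hasOnDeck) {
            current = onDeck;
            hasOnDeck = false;
        } else {
            state = State::Ready;
        }
        return done;
    }

    bool ready() const { return state == State::Ready; }

  private:
    State state = State::Ready;
    std::uint64_t current = 0;
    std::uint64_t onDeck = 0;
    bool hasOnDeck = false;
};
```

If more than two requests could ever pile up, the assert in start() would
fire and a real queue would be needed instead.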


Gabe

Quoting Joel Hestness hestn...@cs.utexas.edu:


Hi,
  I am currently running a benchmark in X86_FS timing mode (single or
multicore) that crashes due to the page table walker.  On a data write (or
read) instruction that causes TimingSimpleCPU::write to split the TLB access
into two accesses (cpu/simple/timing.cc:~560), if the first TLB access
misses, it causes the page table walker to start a walk and its state =
Waiting.  Since the second access happens immediately
in TimingSimpleCPU::write, if the second request also misses, it causes
another walk that fails the (state == Ready) assertion in
X86ISA::Walker::start (arch/x86/pagetable_walker.cc:~316).
  Seems this is a corner case of a corner case, namely, an unaligned (split)
data access, whose split TLB accesses both miss.  It doesn't look like there
is any code to handle the situation yet, and I'm hoping to get some guidance
on how to address it.
  It seems to me that since this only happens on a TLB miss, that the TLB or
walker should be able to handle the multiple requests.  I see that in the
ARM code, the page table walker has a queue of walks that are currently in
flight (I'm having trouble convincing myself that the queues can't conflict
when multiple walks are in flight :\).  Would it make sense to have similar
state queuing in the x86 page table walker?
  Thanks,
  Joel

--
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Texas - Austin
  http://www.cs.utexas.edu/~hestness






Re: [m5-dev] Review Request: ARM: make predicated-false instruction to move data from a old register.

2010-08-17 Thread Gabriel Michael Black

Quoting Steve Reinhardt ste...@gmail.com:


On Sun, Aug 15, 2010 at 6:07 PM, Min Kyu Jeong mkje...@gmail.com wrote:



I needed to spit out code that reads from a register and writes to it
again. The thing is, arch reg indices are renamed (reg renaming and shadow
reg file), so many structures need to be looked up to find a
previous/current physical register for the given arch reg name. These
indirections are handled in ISA-specific code and encapsulated, and the
operand section of the ISA description specifies which one to use.


I think we're thinking of these things on different levels... the
operand registers are available via StaticInst::srcRegIdx() and
destRegIdx(), and can be accessed via (for example) readIntReg() in
o3/thread_context_impl.hh, none of which are ISA dependent.  I don't
see why you can't build on these calls to handle this entirely within
the O3 model in an ISA-independent fashion (i.e., a way that would
generalize to any ISA with universal predication, if we ever run into
another one).

Steve



I was actually thinking this could be handled at a level higher than  
the parser, namely in the ISA description.


If the predicate is false, then the instruction still needs to forward  
the value from the physical source register to the physical  
destination register so future consumers get the right thing. For  
renaming to work, it's necessary for all those registers to be listed  
as both sources and destinations. By putting a fall through condition  
in the instruction as defined by the ISA description that just assigns  
registers to themselves (DestReg = DestReg; SrcReg1 = SrcReg1) both of  
those conditions are satisfied automatically without the CPUs having  
to get any smarter/complex/constrained and without retrofitting the  
ISA parser.
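
As a sketch of what the fall through does once registers are renamed
(assumptions: a toy eight-entry physical register file and an add
instruction; RenamedAdd, PhysRegs and executePredicatedAdd() are
hypothetical names, not generated ISA code):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical result of renaming "add dest, src1, src2". Because the
// destination is also listed as a source, rename provides the physical
// register that held the previous value of the architectural dest.
struct RenamedAdd
{
    unsigned oldDest;  // physical register with the old dest value
    unsigned src1;
    unsigned src2;
    unsigned newDest;  // freshly allocated physical register
};

using PhysRegs = std::array<std::uint32_t, 8>;

// Execute the instruction. When the predicate is false, the fall
// through "DestReg = DestReg" copies the old value into the new
// physical register so later consumers read the right thing.
inline void
executePredicatedAdd(PhysRegs &regs, const RenamedAdd &inst, bool predicate)
{
    assert(inst.oldDest < regs.size() && inst.newDest < regs.size());
    assert(inst.src1 < regs.size() && inst.src2 < regs.size());

    if (predicate) {
        regs[inst.newDest] = regs[inst.src1] + regs[inst.src2];
    } else {
        regs[inst.newDest] = regs[inst.oldDest];
    }
}
```

Either way the new physical register gets written, so the CPU can treat
the instruction like any other and needs no predication-specific logic.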


This will mean having to change a lot of instruction templates so that  
they have the fall through case and so that the definitions know what  
registers to assign to themselves. That's going to be tedious, but  
hopefully fairly mechanical to do.


If the ISA parser were a python framework instead of a parser, then  
it's conceivable this sort of thing could be handled automatically and  
cleanly by subclassing, replacing default function implementations,  
etc. etc., and kept local to ARM. That would give the benefits you  
might have been going for by changing the parser (the parser does the  
work, fewer bugs...) , but without impacting the complexity seen by  
all the other ISAs. That's not going to happen tomorrow or possibly  
ever so I hesitated to mention it and don't make any plans around it,  
but it's something I would really like to see happen.


Gabe


Re: [m5-dev] Review Request: ruby: Resurrected Ruby's deterministic tests

2010-08-11 Thread Gabriel Michael Black
That seems reasonable, but I still don't think being deterministic is  
a good distinguishing characteristic. M5 is designed to always be  
deterministic and the regressions assume the same test will always get  
the same answer, so -all- tests are deterministic, even if they're  
deterministically random which admittedly sounds contradictory.  
Assuming my interpretation of your description is correct, I think you  
mean directed tests which are, I think, tests directed towards testing  
something in particular. I'm not an expert on the terminology but I  
think that's right.
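
To show what "deterministically random" means in practice, here is a
small sketch: a random-looking address stream that is fully reproducible
because the generator is seeded with a fixed value. randomAddresses() is
a hypothetical helper, not part of memtest or rubytest, and the 64-byte
alignment is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Generate 'count' pseudo-random, 64-byte-aligned addresses from a seed.
// The same seed always yields the same sequence.
inline std::vector<std::uint32_t>
randomAddresses(std::uint32_t seed, std::size_t count)
{
    std::mt19937 gen(seed);
    std::vector<std::uint32_t> addrs;
    addrs.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Clear the low six bits to align the address to a 64-byte block.
        addrs.push_back(static_cast<std::uint32_t>(gen()) & ~0x3fu);
    }
    assert(addrs.size() == count);
    return addrs;
}
```

Two runs with the same seed stress the same interleavings, so a
regression can compare outputs exactly even though the test itself is
"random".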


Gabe

Quoting Beckmann, Brad brad.beckm...@amd.com:

I think we should incorporate the ruby tester in the same directory  
structure.  How about we create a new directory cpu/testers and then  
move the three types of testers below that: cpu/testers/memtest,  
cpu/testers/rubytest, and cpu/testers/determtest


Brad


-Original Message-
From: Gabe Black [mailto:gabe.bl...@gmail.com] On Behalf Of Gabe Black
Sent: Sunday, August 15, 2010 2:16 PM
To: M5 Developer List
Cc: Beckmann, Brad; Nathan Binkert
Subject: Re: [m5-dev] Review Request: ruby: Resurrected Ruby's  
deterministic tests


How about putting the stuff currently in memtest into memtest/random and
this new stuff in memtest/directed? I think both are technically
deterministic as implemented.

Gabe

Brad Beckmann wrote:



On 2010-08-08 20:48:58, Nathan Binkert wrote:

I think the directory name needs to have tester in it.   
src/cpu/deterministic gives you no indication of what it actually  
is.  Beyond that, is it really necessary to have multiple  
testers?  Can we not fold the functionality into a single tester?




I used the name deterministic because the directory contains  
multiple deterministic testers.  I changed the name to  
determtest...does that work for you?  These deterministic testers  
serve a very different purpose than the random testers.  The  
deterministic testers are used for verifying latency and protocol  
operation for a specific transaction.  They do not stress races  
like the memtest or rubytest.  The deterministic testers are much  
simpler than the random testers.  I don't think it would be easy  
to merge the different testers' logic into a single tester and I'm  
not sure it is worth the effort.  This code is actually not new  
code.  It is actually fairly old GEMS code.  I'm not sure why it  
was moved over in the original transfer, but I've found the code to  
be very valuable.


I fixed the includes, guards, and comments.  Since these aren't new  
files, but actually old GEMS files, I don't think it appropriate to  
change the variable names in this patch.  Possibly later we can  
convert all the old GEMS files to the M5 variable convention.



- Brad


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/101/#review148
---


On 2010-08-11 12:01:17, Brad Beckmann wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/101/
---

(Updated 2010-08-11 12:01:17)


Review request for Default.


Summary
---

ruby: Resurrected Ruby's deterministic tests

Added the request series and invalidate deterministic tests as new  
cpu models and removed the no longer needed ruby tests


Diffs
-

  configs/example/determ_test.py PRE-CREATION
  src/cpu/determtest/DetermGenerator.hh PRE-CREATION
  src/cpu/determtest/DetermGenerator.cc PRE-CREATION
  src/cpu/determtest/InvalidateGenerator.hh PRE-CREATION
  src/cpu/determtest/InvalidateGenerator.cc PRE-CREATION
  src/cpu/determtest/RubyDetermTester.hh PRE-CREATION
  src/cpu/determtest/RubyDetermTester.cc PRE-CREATION
  src/cpu/determtest/RubyDetermTester.py PRE-CREATION
  src/cpu/determtest/SConscript PRE-CREATION
  src/cpu/determtest/SeriesRequestGenerator.hh PRE-CREATION
  src/cpu/determtest/SeriesRequestGenerator.cc PRE-CREATION
  src/mem/protocol/RubySlicc_Exports.sm a75564db03c3
  src/mem/ruby/tester/DetermGETXGenerator.hh a75564db03c3
  src/mem/ruby/tester/DetermGETXGenerator.cc a75564db03c3
  src/mem/ruby/tester/DetermInvGenerator.hh a75564db03c3
  src/mem/ruby/tester/DetermInvGenerator.cc a75564db03c3
  src/mem/ruby/tester/DetermSeriesGETSGenerator.hh a75564db03c3
  src/mem/ruby/tester/DetermSeriesGETSGenerator.cc a75564db03c3
  src/mem/ruby/tester/SConscript a75564db03c3

Diff: http://reviews.m5sim.org/r/101/diff


Testing
---


Thanks,

Brad







Re: [m5-dev] Review Request: ThreadContext suspension / activation in O3 SMT - nanosleep syscall

2010-07-29 Thread Gabriel Michael Black
It's fine with me, but we'd need to be really careful the semantics  
(64 bit vs. 32 bit operands, flags, etc.) are translated correctly. If  
a syscall isn't there at all it's obvious why it's not going to work,  
but if it is there and slightly wrong it'd be a lot harder to figure  
out what's wrong.


Gabe

Quoting Timothy Jones tjon...@inf.ed.ac.uk:





On 2010-07-29 08:46:47, Steve Reinhardt wrote:
 src/arch/alpha/linux/process.cc, line 470
 http://reviews.m5sim.org/r/68/diff/1/?file=834#file834line470

 This is a broader question (not just for Ioannis): does it  
make sense to proactively add this to the syscall tables for the  
other ISAs too?  I don't see anything ISA-specific here.


I would be in favour of doing that, yes.


- Timothy


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/68/#review115
---


On 2010-07-29 07:59:26, Ioannis Ilkos wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.m5sim.org/r/68/
---

(Updated 2010-07-29 07:59:26)


Review request for Default.


Summary
---

This is a patch fixing the ThreadContext suspension / activation  
described in  
http://www.mail-archive.com/m5-dev@m5sim.org/msg07307.html and  
implementing the nanosleep() syscall (albeit without the  
signal-relevant parts) for AlphaLinux.


Changes in O3:
- The tick scheduling was removed from activateContext() and moved  
into activateThread(). It seems more natural since  
activateContext() either calls activateThread() or schedules it. In  
the case of scheduling there is no need to schedule ticks  
prematurely.
- suspendContext() and haltContext() check the number of active  
threads before setting CPU _status to Idle.



Diffs
-

  src/arch/alpha/linux/linux.hh b28e7286990c
  src/arch/alpha/linux/process.cc b28e7286990c
  src/cpu/o3/cpu.cc b28e7286990c
  src/sim/syscall_emul.hh b28e7286990c

Diff: http://reviews.m5sim.org/r/68/diff


Testing
---


Thanks,

Ioannis






Re: [m5-dev] Review Request: ThreadContext suspension / activation in O3 SMT - nanosleep syscall

2010-07-29 Thread Gabriel Michael Black




According to the linux kernel sources:

#ifndef _STRUCT_TIMESPEC
#define _STRUCT_TIMESPEC
struct timespec {
        time_t  tv_sec;         /* seconds */
        long    tv_nsec;        /* nanoseconds */
};
#endif

It basically depends on sizeof(long) for each ISA. If all ISAs  
supported are 64bit it can be moved to ISA-independent code.




It's really per ISA/OS combination, and they are not all 64 bit. You  
can template it, though. There are examples of that floating around.
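
A minimal sketch of what templating on the ISA/OS combination might look like. The names Linux32, Linux64, TargetTimespec and toNanoseconds are made up for illustration and are not existing M5 types; the real syscall emulation code may be organized differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-ISA/OS description types. Each one states how wide the
// target's time_t and long are. These names are illustrative only.
struct Linux32 {
    typedef std::int32_t time_t_type;
    typedef std::int32_t long_type;
};

struct Linux64 {
    typedef std::int64_t time_t_type;
    typedef std::int64_t long_type;
};

// Target-layout timespec, parameterized on the OS description so the same
// syscall code can serve both 32 bit and 64 bit targets.
template <class OS>
struct TargetTimespec {
    typename OS::time_t_type tv_sec;   // seconds
    typename OS::long_type tv_nsec;    // nanoseconds
};

// Convert a target timespec into a total number of nanoseconds.
template <class OS>
std::int64_t
toNanoseconds(const TargetTimespec<OS> &ts)
{
    return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL +
           static_cast<std::int64_t>(ts.tv_nsec);
}
```

The syscall handler itself would then be a template on the same OS type, so each ISA/OS pair gets a copy that reads the structure with the right field widths.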


Gabe



Re: [m5-dev] Review Request: util/m5/m5.c: in readfile(), added memset to touch all pages - ensure they are in the page table

2010-07-29 Thread Gabriel Michael Black
Actually I have a minor comment there too. You should really only need  
to touch a byte on each page of the buffer, not all bytes like memset  
would. This will make a read_file take longer than it absolutely needs  
to, although it's likely negligible. Feel free to ignore this as you  
see fit, perhaps after giving it a try to see if the difference is  
even perceivable.
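
Roughly what I have in mind, as a sketch only. touchPages is a hypothetical helper, not something in util/m5, and the page size is passed in as a parameter here; real code would probably ask the OS for it.

```cpp
#include <cassert>
#include <cstddef>

// Touch one byte on every page that the buffer [buf, buf + len) overlaps,
// so each page gets faulted in without writing the whole buffer.
// Returns the number of bytes written.
inline std::size_t
touchPages(volatile char *buf, std::size_t len, std::size_t pageSize)
{
    std::size_t touched = 0;
    // Offsets that are one page apart always land on consecutive pages.
    for (std::size_t off = 0; off < len; off += pageSize) {
        buf[off] = 0;
        ++touched;
    }
    // The buffer may not be page aligned, so its last byte can sit on a
    // page the strided loop never reached.
    if (len > 0) {
        buf[len - 1] = 0;
        ++touched;
    }
    return touched;
}
```

For a 10000 byte buffer and 4096 byte pages that is four one-byte writes instead of 10000.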


Gabe

Quoting Joel Hestness hestn...@cs.utexas.edu:


So, it appears that the only change that we agree on for now is the change
to m5.c.  Should I submit that change as its own patch and withdraw this
one?
  Thanks,
  Joel

On Fri, Jul 23, 2010 at 3:45 PM, Gabriel Michael Black 
gbl...@eecs.umich.edu wrote:


Quoting Ali Saidi sa...@umich.edu:



On Fri, 23 Jul 2010 16:59:08 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Hmm, maybe we should be building these regularly too... What do you
think, Ali? Would it be possible to return reserved1_func and use a
different code?


It was reserved for me while I was doing the bottleneck analysis work and
didn't want anyone to grab that ID. Once I pushed all of the bottleneck
analysis changes, I changed reserved into the actual cp_annotate
operations. So, everything worked as intended.

reserved1_func shouldn't be used anywhere and shouldn't be added back to
the file.

Ali





I don't understand how that made it reserved. Wouldn't anyone else be able
to do the same thing you did but with some conflicting use? The comment next
to those says "Reserved for user", but it's not if it ends up being assigned
an official use. Why would we want to have reserved2_func but not
reserved1_func?

Gabe





--
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Texas - Austin
  http://www.cs.utexas.edu/~hestness






Re: [m5-dev] ARM Regression tests

2010-07-26 Thread Gabriel Michael Black
I don't think we/I found a good solution. I had (and may still have)  
copied and pasted versions of things with a 32 appended for SPARC, and  
x86 just doesn't have any 32 bit tests. The 32 bit SPARC tests may  
have just been in my own tree and never committed.


Gabe

Quoting Ali Saidi sa...@umich.edu:



Hi Everyone,

I'm trying to set up some regression tests for ARM at the moment. I've got
two sets of binaries, one for the Thumb instruction set and another for the
ARM instruction set. Is there any way to set up a test for both? I think
we've run into this before but I don't know that we've ever solved the
problem (sparc32/sparc64; x86/x86-64; etc). If not I'm just going to pick
the instructions for each test randomly.

Thanks,

Ali


Re: [m5-dev] Review Request: util/m5/m5.c: in readfile(), added memset to touch all pages - ensure they are in the page table

2010-07-23 Thread Gabriel Michael Black

Quoting Joel Hestness hestn...@cs.utexas.edu:


Hey Gabe,
  Comments are in-lined below.  If you'd like me to resubmit another review
of all or part, just let me know.
  Thanks,
  Joel




util/m5/Makefile.x86
http://reviews.m5sim.org/r/64/#comment248

   Why is this necessary? Is this so it runs under SE mode? In that case I
think we should make it run like before as the default since 99% of the time
this will run in FS, and provide a way to inject -static for the 1% of the
time it runs in SE.

   Compiling it as static all the time wouldn't be the end of the world,
but it seems like we'd be making universal changes for a very uncommon case.



Building the m5 binary without -static allows it to dynamically link a few
libraries:
  j...@capillary:~/research/m5-new/util/m5$ ldd m5
  linux-vdso.so.1 =>  (0x7fff3f9ff000)
  libc.so.6 => /lib/libc.so.6 (0x7fb05131f000)
  /lib64/ld-linux-x86-64.so.2 (0x7fb05168f000)
When I was putting together a disk image using busybox, it had issues with
library versions.  In general, since the m5 utility isn't performance
critical and just implements simulator magic, I think it would be easiest if
it was always built statically whether for FS or SE.  On the other hand, I
would imagine that it's built very infrequently and only for initial disk
image creation, so perhaps it's not worth changing.


You have a good reason to build it statically, but it would still be  
best, I think, if that was optional. I'm not sure what the best way to  
do that would be. Maybe another target? Leaving LDFLAGS in the command  
line but not setting it to anything might work too, although then  
there's the danger of picking something up from the environment. I  
don't think it's worth being too elaborate for. Another question might  
be why we have three different nearly identical make files instead of  
one, or maybe one with really short ones for each arch, or even just  
pulling it into scons. Maybe to avoid complicating things too much  
this would be its own little scons environment independent from the  
main one.


I think there are enough unanswered (but not serious) questions here  
that we might want to wait to check this part in.









util/m5/m5ops.h
http://reviews.m5sim.org/r/64/#comment249

   It looks like Ali commandeered that value on line 61. It might have been
better to use 0x5A for that, but it also might not be safe to change it now
since there may be binaries out there that use it (probably not too many).
It would be a little strange, but you could actually use 0x5A for
reserved1_func. I don't know what restrictions there are in the various ISAs
for function numbers, but in x86 it's a 16 bit value.



Ah, I didn't see that originally!
The only real trouble right now is that if you try to build the m5 utility
for x86_64 with the current version in the repo, it will fail with an
undefined reference to reserved1_func:
  gcc -O2  -o m5op_x86.o -c m5op_x86.S
  gcc -o m5 m5.o m5op_x86.o
  m5op_x86.o: In function `m5_reserved1_func':
  (.text+0x5c): undefined reference to `reserved1_func'
  collect2: ld returned 1 exit status
  make: *** [m5] Error 1
It looks like neither m5op_alpha.S nor m5op_sparc.S uses reserved1_func, so
another solution would be to remove it from m5op_x86.S (eliminate it
completely from the m5 utility codebase).


Hmm, maybe we should be building these regularly too... What do you  
think, Ali? Would it be possible to return reserved1_func and use a  
different code?


Gabe


Re: [m5-dev] Review Request: util/m5/m5.c: in readfile(), added memset to touch all pages - ensure they are in the page table

2010-07-23 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



On Fri, 23 Jul 2010 16:59:08 -0400, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:


Hmm, maybe we should be building these regularly too... What do you
think, Ali? Would it be possible to return reserved1_func and use a
different code?

It was reserved for me while I was doing the bottleneck analysis work and
didn't want anyone to grab that ID. Once I pushed all of the bottleneck
analysis changes, I changed reserved into the actual cp_annotate
operations. So, everything worked as intended.

reserved1_func shouldn't be used anywhere and shouldn't be added back to
the file.

Ali





I don't understand how that made it reserved. Wouldn't anyone else be  
able to do the same thing you did but with some conflicting use? The  
comment next to those says "Reserved for user", but it's not if it  
ends up being assigned an official use. Why would we want to have  
reserved2_func but not reserved1_func?


Gabe


Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
Yes it does, and that sounds reasonable to me. I'd still like to see  
us use ISA hooks as minimally as possible, but this seems ok.


Gabe

Quoting Timothy M Jones tjon...@inf.ed.ac.uk:

Oh, ok, I see where you're going with that.  However, the main idea  
of having TheISA::HasUnalignedMemAcc was that it is a constant  
specific to each ISA.  Therefore, the compiler should really  
recognise this and optimise it away wherever it's used.  In this  
case, for ISAs that don't have unaligned memory accesses the whole  
'if' block should disappear.  For ISAs that do have them then the  
condition should be reduced to just checking sreqLow.  Therefore  
it's better for the first set of ISAs for this to be kept in.  Does  
that make sense?
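
A small sketch of the constant-folding argument. AlignedOnlyIsa, UnalignedIsa, StoreEntry and markSplit are hypothetical stand-ins, not M5 code; only the HasUnalignedMemAcc name comes from this thread, and in M5 the selected ISA is reached through TheISA rather than a template parameter.

```cpp
#include <cassert>

// Hypothetical stand-ins for two ISAs, each exporting a compile-time flag.
struct AlignedOnlyIsa { static const bool HasUnalignedMemAcc = false; };
struct UnalignedIsa   { static const bool HasUnalignedMemAcc = true; };

// Simplified store queue entry; only the split flag matters here.
struct StoreEntry {
    bool isSplit;
    StoreEntry() : isSplit(false) {}
};

// Mark a store as split. Because HasUnalignedMemAcc is a compile-time
// constant, a compiler can drop the whole branch for AlignedOnlyIsa and
// reduce the condition to the pointer test for UnalignedIsa.
template <class Isa>
void
markSplit(StoreEntry &entry, const void *sreqLow)
{
    if (Isa::HasUnalignedMemAcc && sreqLow) {
        entry.isSplit = true;
    }
}
```

Whether a given compiler actually removes the branch depends on the optimization level; the observable behavior is the same either way.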


Tim

On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black gbl...@eecs.umich.edu wrote:


I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:

changeset 3bd51d6ac9ef in /z/repo/m5
details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef
description:
O3CPU: Fix a bug where stores in the cpu where never marked as split.

diffstat:

src/cpu/o3/lsq_unit.hh |  6 ++
1 files changed, 6 insertions(+), 0 deletions(-)

diffs (16 lines):

diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh
--- a/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:47:52 2010 +0100
+++ b/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:52:02 2010 +0100
@@ -822,6 +822,12 @@
storeQueue[store_idx].sreqLow = sreqLow;
storeQueue[store_idx].sreqHigh = sreqHigh;
storeQueue[store_idx].size = sizeof(T);
+
+// Split stores can only occur in ISAs with unaligned memory accesses.  If
+// a store request has been split, sreqLow and sreqHigh will be non-null.
+if (TheISA::HasUnalignedMemAcc && sreqLow) {
+storeQueue[store_idx].isSplit = true;
+}
assert(sizeof(T) <= sizeof(storeQueue[store_idx].data));

T gData = htog(data);




--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
To me there's actually a spectrum of ISA dependence, incompletely  
described below:


1. If ISA == suchandsuch do this, otherwise do that.
2. If ISA has characteristic such and such, do this, otherwise do that.
3. Here, ISA, you take care of this.
4. ISA data parameterizing non-ISA stuff.

Number 1 is the worst since it's hard to maintain, can be cumbersome  
to specify, and it isn't always clear -why- ISA suchandsuch needs a  
particular behavior.


I'd say number 2 is second worst, and that's what this is an example  
of. It's better since it's obvious why the code is separated out and  
there can be sharing between CPU models, etc., but at the same time it  
increases the CPU's awareness of the ISA on it and partially breaks  
down the barriers of abstraction. It also sets up special case code  
paths where, for instance, only x86 on the timing CPU would possibly  
exhibit a certain bug. If someone changes things for ARM and  
everything seems to work, they could be subtly breaking x86 which  
they aren't familiar with and weren't thinking about when they made  
their change.


3. Three is better in some ways and worse in others. It's clear what's  
happening, there's a lot of flexibility, and the CPU isn't actually  
aware of -how- something is being done, just that, say, this would be  
a good time to check for interrupts, whatever that means. It's not as  
great because you have more complex interactions between ISAs and  
CPUs, and you have to do a bit more work in the ISA. It can also be  
hard to make some of this functionality work sensibly in order, out of  
order, single cycle, multi cycle, timing mode, atomic mode, etc. etc.


4. This one is the best when you can get away with it. This is where  
you, say, make your integer register file 32 registers because the ISA  
says that's how many it needs. Basically the only drawback to this is  
that behavior changes a little based on each ISA, but if you can get  
away with it this is the safest.
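
To make number 4 concrete, here is a rough sketch. IsaA, IsaB and IntRegFile are invented for this example and are not the actual M5 classes; the point is only that the generic code consumes a value the ISA publishes and never asks which ISA it is running.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ISA descriptions that only export data.
struct IsaA { static const std::size_t NumIntRegs = 32; };
struct IsaB { static const std::size_t NumIntRegs = 16; };

// Generic integer register file. It has no ISA-specific branches; it only
// uses the register count the ISA provides.
template <class Isa>
class IntRegFile
{
  public:
    IntRegFile() : regs(Isa::NumIntRegs, 0) {}

    std::size_t size() const { return regs.size(); }

    // at() bounds-checks, so an out-of-range index throws instead of
    // silently touching a neighboring register.
    std::uint64_t read(std::size_t idx) const { return regs.at(idx); }
    void write(std::size_t idx, std::uint64_t val) { regs.at(idx) = val; }

  private:
    std::vector<std::uint64_t> regs;
};
```

Contrast that with number 2, where the same class would contain an "if this ISA has such and such" test somewhere in its body.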




I think having a multi ISA simulator that will be modified by its end  
users, especially one with as large a cross product of options as  
ours, needs to try to be safe and simple before being as absolutely  
fast as it can be. You know what they say about premature  
optimization. Ideally we should design things so the big, unnecessary  
pieces of code just aren't part of the equation because they're in the  
ISA defined objects, control just doesn't go that way when it wouldn't  
make sense, etc. And if, for instance, a pointer should always be 0,  
the code should still behave correctly. The code should do its job  
correctly no matter -why- it's being asked to do it.


I think it's bad news to have a big list of yes or no checkboxes in  
each ISA directory which turn on and off behaviors. This is especially  
true when it's ambiguous whether to say yes or no, if the behavior  
changes based on circumstances, etc., and if/when the checkbox is  
interpreted subtly (or not so subtly) differently by the consumer.


In this particular limited case it seems relatively ok although not  
necessarily advisable, but it's a slippery slope I don't consider us  
to have a completely clean history with.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


In general I think this is the kind of ISA hook we should be using...
in the sense that checking TheISA::HasUnalignedMemAcc is much better
than (TheISA == x86 || TheISA == Power).  I think it's useful not
only to avoid the overhead of a dynamic check for an ISA that doesn't
need it, but also to clarify that this is code that never gets
executed on those ISAs.  Maybe for a little one-liner like this it's
not a big deal either way, but for bigger hunks of code I think that
clarification is potentially useful.

Steve

On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Yes it does, and that sounds reasonable to me. I'd still like to see us use
ISA hooks as minimally as possible, but this seems ok.

Gabe

Quoting Timothy M Jones tjon...@inf.ed.ac.uk:


Oh, ok, I see where you're going with that.  However, the main idea of
having TheISA::HasUnalignedMemAcc was that it is a constant  
specific to each

ISA.  Therefore, the compiler should really recognise this and optimise it
away wherever it's used.  In this case, for ISAs that don't have unaligned
memory accesses the whole 'if' block should disappear.  For ISAs that do
have them then the condition should be reduced to just checking sreqLow.
 Therefore it's better for the first set of ISAs for this to be kept in.
 Does that make sense?

Tim

On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black gbl...@eecs.umich.edu
wrote:


I think you missed my point. If the check of TheISA::HasUnalignedMemAcc
is redundant, we shouldn't be checking it at all. It's a free, though
very small, performance bump, but more significantly it removes a direct
dependence on the ISA.

Gabe

Timothy M. Jones wrote:


changeset 3bd51d6ac9ef in /z/repo/m5
details

Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...

2010-07-22 Thread Gabriel Michael Black
Actually, maybe this falls into a subcategory of 2. which is why it  
seems more acceptable.


2.a If the ISA says it's safe to use this optimization which doesn't  
affect behavior and which could be turned off and only result in lower  
simulator performance, skip this code/check/whatever.


This still has some of the drawbacks I mentioned for the  
list-of-checkboxes model, but at least if it doesn't fit there's  
always a failsafe option. These would be optional hints, like telling  
the compiler a function never returns.


This may have been what you were talking about in the first place, in  
which case great, we're on the same page. I'd been debating whether to  
send out my little list anyway since I think it's productive to spend  
a few cycles thinking about this stuff.


Gabe

Quoting Gabriel Michael Black gbl...@eecs.umich.edu:

To me there's actually a spectrum of ISA dependence, incompletely  
described below:


1. If ISA == suchandsuch do this, otherwise do that.
2. If ISA has characteristic such and such, do this, otherwise do that.
3. Here, ISA, you take care of this.
4. ISA data parameterizing non-ISA stuff.

Number 1 is the worst since it's hard to maintain, can be cumbersome  
to specify, and it isn't always clear -why- ISA suchandsuch needs a  
particular behavior.


I'd say number 2 is second worst, and that's what this is an example  
of. It's better since it's obvious why the code is separated out and  
there can be sharing between CPU models, etc., but at the same time  
it increases the CPU's awareness of the ISA on it and partially  
breaks down the barriers of abstraction. It also sets up special  
case code paths where, for instance, only x86 on the timing CPU  
would possibly exhibit a certain bug. If someone changes things for  
ARM and everything seems to work, they could be subtly breaking x86  
which they aren't familiar with and weren't thinking about when they  
made their change.


3. Three is better in some ways and worse in others. It's clear  
what's happening, there's a lot of flexibility, and the CPU isn't  
actually aware of -how- something is being done, just that, say,  
this would be a good time to check for interrupts, whatever that  
means. It's not as great because you have more complex interactions  
between ISAs and CPUs, and you have to do a bit more work in the  
ISA. It can also be hard to make some of this functionality work  
sensibly in order, out of order, single cycle, multi cycle, timing  
mode, atomic mode, etc. etc.


4. This one is the best when you can get away with it. This is where  
you, say, make your integer register file 32 registers because the  
ISA says that's how many it needs. Basically the only drawback to  
this is that behavior changes a little based on each ISA, but if you  
can get away with it this is the safest.




I think having a multi ISA simulator that will be modified by its  
end users, especially one with as large a cross product of options  
as ours, needs to try to be safe and simple before being as  
absolutely fast as it can be. You know what they say about premature  
optimization. Ideally we should design things so the big,  
unnecessary pieces of code just aren't part of the equation because  
they're in the ISA defined objects, control just doesn't go that way  
when it wouldn't make sense, etc. And if, for instance, a pointer  
should always be 0, the code should still behave correctly. The code  
should do its job correctly no matter -why- it's being asked to do it.


I think it's bad news to have a big list of yes or no checkboxes in  
each ISA directory which turn on and off behaviors. This is  
especially true when it's ambiguous whether to say yes or no, if the  
behavior changes based on circumstances, etc., and if/when the  
checkbox is interpreted subtly (or not so subtly) differently by  
the consumer.


In this particular limited case it seems relatively ok although not  
necessarily advisable, but it's a slippery slope I don't consider us  
to have a completely clean history with.


Gabe

Quoting Steve Reinhardt ste...@gmail.com:


In general I think this is the kind of ISA hook we should be using...
in the sense that checking TheISA::HasUnalignedMemAcc is much better
than (TheISA == x86 || TheISA == Power).  I think it's useful not
only to avoid the overhead of a dynamic check for an ISA that doesn't
need it, but also to clarify that this is code that never gets
executed on those ISAs.  Maybe for a little one-liner like this it's
not a big deal either way, but for bigger hunks of code I think that
clarification is potentially useful.

Steve

On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black
gbl...@eecs.umich.edu wrote:

Yes it does, and that sounds reasonable to me. I'd still like to see us use
ISA hooks as minimally as possible, but this seems ok.

Gabe

Quoting Timothy M Jones tjon...@inf.ed.ac.uk:


Oh, ok, I see where you're going with that.  However, the main idea of
having TheISA

Re: [m5-dev] fixed a problem with XCHG macro-op in X86

2010-07-20 Thread Gabriel Michael Black
You're probably right as far as your diagnosis and fix for the problem  
go, but please give me a chance to verify before you check in.


The M, R and P suffixes are for the different operand types, and as  
I'm sure you guessed, M means memory and R means register. P means RIP  
relative memory reference, and is separated out because those versions  
need to put rip into a regular microcode register before actually  
doing the access. This is partially an effect of the microop ISA I  
chose predating 64 bit mode and RIP relative addressing.


Gabe

Quoting Krishna, Tushar tushar.kris...@amd.com:


Hi,
I ran into what could be a bug in the implementation of the XCHG  
macro-op in X86.
I was trying to run a binary built using m5threads through X86_SE,  
and observed incorrect mutex-locking behavior.
Closer inspection showed this (execution trace, I have copied only  
the relevant micro-ops):


177896500: system.cpu05 T0 : @pthread_mutex_unlock+19.1  :   MOV_M_I  
: st   t1b, DS:[rbx + 0x4] : MemWrite :  D=0x  
A=0x695dc4

...
17790: system.cpu06 T0 : @pthread_mutex_lock+33.0  :   XCHG_M_R  
: ldst   t1b, DS:[rdx] : MemRead :  D=0x00401700 A=0x695dc4
17790: system.cpu07 T0 : @pthread_mutex_lock+33.0  :   XCHG_M_R  
: ldst   t1b, DS:[rdx] : MemRead :  D=0x00401700 A=0x695dc4
177900500: system.cpu07 T0 : @pthread_mutex_lock+33.1  :   XCHG_M_R  
: st   al, DS:[rdx] : MemWrite :  D=0x0001 A=0x695dc4
177900500: system.cpu06 T0 : @pthread_mutex_lock+33.1  :   XCHG_M_R  
: st   al, DS:[rdx] : MemWrite :  D=0x0001 A=0x695dc4

...

cpu05 does an unlock, and then *BOTH* cpu06 and cpu07 get the lock!
The reason is that both try and do a XCHG_M_R (atomic exchange  
between memory and register) and succeed!


The XCHG implementation  
(src/arch/x86/isa/insts/general_purpose/data_transfer/xchg.py)

has a comment saying
# All the memory versions need to use LOCK, regardless of if it was set

However, the description of XCHG_M_R is
def macroop XCHG_M_R
{
ldst t1, seg, sib, disp
st reg, seg, sib, disp
mov reg, reg, t1
};

while the description of XCHG_LOCKED_M_R is
def macroop XCHG_LOCKED_M_R
{
ldstl t1, seg, sib, disp
stul reg, seg, sib, disp
mov reg, reg, t1
};


Changing XCHG_M_R to use the locked micro-ops (ldstl and stul) fixed  
the incorrect mutex-locking issue.
I am not sure if this is a bug, or is there some reason for not  
using the locked micro-ops?


There are similar, unlocked micro-ops used for XCHG_R_M, XCHG_R_P  
and XCHG_P_R ... I am not sure if they all need to be changed to  
locked?

[What does the P in R_P  mean? Is this also referencing memory?].
If they all need to be changed to use ldstl and stul, I can create a  
patch for the same and check it in.


Thanks,
Tushar







Re: [m5-dev] configs dir

2010-07-19 Thread Gabriel Michael Black
I'm not running on a lot of sleep, so sorry in advance if this email  
is less coherent than normal.





I was actually thinking about this the other day I think because of  
the SMARTS config changes. My suggestion would be to build up  
composite primitives that can be wrapped up in thin, simple, flexible  
scripts to get common configurations. I think we really need to set up  
stronger, more opaque, better defined boundaries between the different  
pieces of the configuration package, and move things to where they  
make sense to be logically. All of this is handwavy, of course, but I  
think it's important to get away from the model, intentional or  
otherwise, of hacking up the example se.py. I think we need to either  
make se.py a real thing where it's an official interface to M5 (as  
it's actually being used today), or deflate it until people will  
basically be starting from scratch when they reuse it. The pieces are  
also all too tied to each other to be able to change things too much  
without some significant effort.


Changing configs to m5configs seems a little redundant to me. What  
would we be distinguishing it from?


To step back a little, how do we imagine people are going to use the  
config stuff? I'd imagine people will probably want to be able to put  
together a sane, archetypal configuration really quickly without a
lot of code, get at the --whatever options they're used to that allow  
them to control important parameters like the actual executable easily  
without having to reimplement them, and then they're going to want to  
be able to add their twist, whatever that might be, without having to  
redo everything from scratch.


That, to me, suggests it's good to have some canned, gently  
parameterized configurations that are not implemented directly in the  
main script.


It would be great, I think, if we had a blob of prepackaged --whatever  
options for the various scenarios that make sense. Then they can be  
defined in a common way, and either the various components of the  
system know where to look because the options are well known (or have  
some options.getVal thing for the option modules?) or they're just  
stubs and need to be plugged into the (hopefully clearly marked, in a  
philosophical sense) consuming objects.


Modularization and appropriate parameterization are key. If I want to
put my twist in there by, say, replacing all L1s with my superL1  
object, I shouldn't have to change any (or at least minimal) canonical  
library code, nor should I need to copy and paste large chunks of it  
to change one or two things. You can't parameterize everything, but  
more is likely better than less.


Finally, this hypothetical person will probably want to put their  
scripts someplace, and if there's no obvious alternative I think  
they'll just end up wedged in the configs directory as  
examples/myse.py for instance. Maybe we should have a directory  
specifically for the highest level scripts that are either written by  
users or do very little to wrap other functionality?


Also, how strong is the name "common"? Is that common as in "this
shows up every now and then" or common as in "this is common to all
scripts"? I don't know if it's a problem or not, but we should pick a
strength and stick to it, and things that don't really fit should go
somewhere else. Stuff that's occasionally useful and can be pulled in
when needed shouldn't get in the way of truly universal stuff.


Would using -m and -c add additional required knowledge for someone  
using M5? I had to look up what they do. If they don't make anything  
fundamentally cleaner or more capable they seem like unnecessary
complication to me. Like I said, though, I had to look them up so I'm  
sure I'm missing their subtle nuances.


Gabe

Quoting nathan binkert n...@binkert.org:


Ok, so the current configs dir is a mess.  I wanted to use it to try
to create an example for my batch scripts, but as it is, the code is
pretty messy, so I'd like to start a conversation about where we'd
like to take the config stuff.  I think the following are some goals:

1) Turn the configurations into a package
2) Move as much code as possible out of the various example files
3) Support the python -m command line option in M5.  (and perhaps use
the runpy module to do stuff in main.py)  (Support -c as well?)
4) Move the examples stuff into the configs package and expect that -m
will be used for examples (m5 -d -m configs.examples.se options)
5) get rid of addToPath usage

Suggested steps:
Move configs/common/* into configs/
Create empty configs/__init__.py
Create empty configs/examples/__init__.py
Move common/Options.py into configs/examples/Options.py
Fix up imports.
Should we rename configs to m5config?

Comments?  One question is, what do we do with the boot dir?  I think
we should just leave it.  There's no prohibition against having data
files in python packages.

  Nate

Re: [m5-dev] m5threads in X86_SE

2010-07-16 Thread Gabriel Michael Black
It would be better to send this to m5-users where you'll reach more  
people. I expect basically everyone on this list is on that list as  
well.


Gabe

Quoting Krishna, Tushar tushar.kris...@amd.com:


Hi,

I am trying to run the tests in the m5threads package through M5's X86_SE.
[I was successfully able to run all the tests cross-compiled for  
sparc through M5's SPARC_SE].


All the multi-threaded tests get stuck somewhere in the middle when  
the number of spawned threads becomes > 1.
(They run fine on the native x86_64 machine which means that there  
is nothing wrong with the compilation of the tests).


The only visible problem seems to be this warning in copyMiscRegs  
that is printed out whenever the clone system call is made:

warn: copyMiscRegs is naively implemented for x86

There are also a bunch of futex warnings, which the README file  
attributes to printfs in the test files, so I am ignoring those.
I also tried building M5 with linuxthreads instead of NPTL, as  
suggested in the README, but the problem still exists.


Could anyone give me some pointers on why m5threads does not work  
with M5's X86_SE?


Thanks,
Tushar






Re: [m5-dev] ARM Linux

2010-07-15 Thread Gabriel Michael Black

Awesome :-).

Gabe

Quoting Ali Saidi sa...@umich.edu:



The UART is still a bit flaky and there is tons more to do, however:

 m5 slave terminal: Terminal 0 
[0.00] Linux version 2.6.28-arm2-eb-a9-arm-nano-tiny-up-wa-4.3.3
(alisa...@aus-bc5-b7) (gcc version 4.3.3 (Sourcery G++ Lite 2009q1-203) )
#2 Tue Jun 15 16:05:55 CDT 2010
[0.00] CPU: ARMv7 Processor [350f] revision 0 (ARMv7),
cr=10c53c7f
[0.00] CPU: VIPT nonaliasing data cache, VIPT nonaliasing
instruction cache
[0.00] Machine: ARM-RealView PBX
[0.00] Ignoring unrecognised tag 0x
[0.00] Memory policy: ECC disabled, Data cache writeback
[0.00] Built 1 zonelists in Zone order, mobility grouping on.
Total pages: 32512
[0.00] Kernel command line: earlyprintk mem=128MB console=ttyAMA0
lpj=19988480
[0.00] console [earlyser0] enabled
[0.00] PID hash table entries: 512 (order: 9, 2048 bytes)
[   40.381302] Dentry cache hash table entries: 16384 (order: 4, 65536
bytes)
[  149.222615] Inode-cache hash table entries: 8192 (order: 3, 32768
bytes)
[   11.936310] Memory: 128MB = 128MB total
[   52.760156] Memory: 126008KB available (1028K code, 107K data, 2580K
init)
[   31.414578] Calibrating delay loop (skipped) preset value.. 3997.69
BogoMIPS (lpj=19988480)
[   54.839468] Mount-cache hash table entries: 512
[  135.932017] CPU: Testing write buffer coherency: ok
[   39.369672] L2X0 cache controller enabled
[7.872702] msgmni has been set to 246
[  119.403964] io scheduler noop registered (default)
[  100.576888] Serial: AMBA PL011 UART driver
[   76.364701] dev:f1: ttyAMA0 at MMIO 0x10009000 (irq = 44) is a
AMBA/PL011
[   82.096644] console handover: boot [earlyser0] - real [ttyAMA0]
[  141.660657] oprofile: cpu_architecture() returns 0x9, using arm/a9
model
[   79.735997] VFP support v0.3: implementor 00 architecture 0 part 00
variant 0 rev 0
[   35.900503] Freeing init memory: 2580K
Mounting /proc... done.
Mounting /sys... done.
Mounting /tmp as tmpfs

(none) login: root
~ $ ls
~ $ cat /proc/cpuinfo
Processor   : ARMv7 Processor rev 0 (v7l)
BogoMIPS: 3997.69
Features: swp half thumb fastmult vfp edsp neon
CPU implementer : 0x35
CPU architecture: 7
CPU variant : 0x0
CPU part: 0x000
CPU revision: 0

Hardware: ARM-RealView PBX
Revision: 
Serial  : 



Re: [m5-dev] a question on CPU assertions

2010-07-13 Thread Gabriel Michael Black
That one I think should be fixed in the decoder. If the mode isn't
valid, an Unknown(machInst) should be returned rather than the Srs
instruction. I'll try to get the previous patch cleaned up and
committed and a patch for this to you this evening. I think it would
be a worthwhile but time consuming and tedious exercise to make sure
the bad instruction encodings (like in this case) actually cause
undefined exceptions rather than blissfully plowing ahead.
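
To make the decoder-side check concrete, here is a minimal sketch of the idea. It is not the actual patch: StaticInst, UnknownInst, SrsInst, isValidMode and decodeSrs are made-up stand-ins for the real classes, and only the mode encodings (the ARMv7 CPSR M[4:0] values) are taken from the architecture.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for the simulator's instruction base class.
struct StaticInst {
    virtual ~StaticInst() = default;
    virtual std::string name() const = 0;
};

// ARMv7 CPSR mode encodings (M[4:0]); anything else is not a real mode.
inline bool
isValidMode(uint32_t mode)
{
    switch (mode) {
      case 0x10: // User
      case 0x11: // FIQ
      case 0x12: // IRQ
      case 0x13: // Supervisor
      case 0x16: // Monitor
      case 0x17: // Abort
      case 0x1b: // Undefined
      case 0x1f: // System
        return true;
      default:
        return false;
    }
}

// Hypothetical "undefined instruction" object.
struct UnknownInst : StaticInst {
    std::string name() const override { return "unknown"; }
};

// Hypothetical SRS instruction; it may only be built around a valid mode.
struct SrsInst : StaticInst {
    uint32_t regMode;
    explicit SrsInst(uint32_t mode) : regMode(mode)
    {
        assert(isValidMode(mode));
    }
    std::string name() const override { return "srs"; }
};

// Pull the low five bits out of the encoding and refuse to build an SRS
// instruction if they do not name an existing mode.
inline std::unique_ptr<StaticInst>
decodeSrs(uint32_t machInst)
{
    const uint32_t mode = machInst & 0x1f;
    if (!isValidMode(mode))
        return std::make_unique<UnknownInst>();
    return std::make_unique<SrsInst>(mode);
}
```

With a check like this a garbage encoding turns into an undefined instruction at decode time instead of reaching the rename stage with a bogus mode.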


Gabe

Quoting Min Kyu Jeong mkje...@gmail.com:


I tried it, and it solved the particular segfault problem. Thanks!

However, a new problem came up further down the execution stream. It also
comes from a garbage instruction. An invalid register mode is extracted from
the garbage binary and used in intRegInMode(). Later, during the rename stage,
flattenIntIndex() extracts this mode again, finds that it is invalid, and
panics.

decoder.cc
  if (val == 0x4) {
      // The mode comes straight from the encoding and is never validated.
      const uint32_t mode = bits(machInst, 4, 0);
      switch (bits(machInst, 24, 21)) {
        case 0x2:
          return new SRS_STORE_IMM_AN_PY_SN_UN_WN_SZ8(machInst, mode,
                  SrsOp::DecrementAfter, false);


SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8::SRS_STORE_IMM_AY_PY_SN_UN_WY_SZ8(
        ExtMachInst machInst, uint32_t _regMode, int _mode, bool _wb)
    : SrsOp("srs", machInst, MemWriteOp,
            (OperatingMode)_regMode, (AddrMode)_mode, _wb)
{

    _srcRegIdx[0] = MISCREG_CPSR + Ctrl_Base_DepTag;
    _srcRegIdx[1] = (condCode == COND_AL || condCode == COND_UC) ?
        INTREG_ZERO : INTREG_CONDCODES;
    // The unchecked mode is folded into the register index here.
    _srcRegIdx[2] = intRegInMode((OperatingMode)regMode, INTREG_SP);



int
flattenIntIndex(int reg)
{
    assert(reg >= 0);
    if (reg < NUM_ARCH_INTREGS) {
        return intRegMap[reg];
    } else if (reg < NUM_INTREGS) {
        return reg;
    } else {
        // Recover the mode and the per-mode register number.
        int mode = reg / intRegsPerMode;
        reg = reg % intRegsPerMode;
        switch (mode) {
          case MODE_USER:
          case MODE_SYSTEM:
            return INTREG_USR(reg);
          case MODE_FIQ:
            return INTREG_FIQ(reg);
          case MODE_IRQ:
            return INTREG_IRQ(reg);
          case MODE_SVC:
            return INTREG_SVC(reg);
          case MODE_MON:
            return INTREG_MON(reg);
          case MODE_ABORT:
            return INTREG_ABT(reg);
          case MODE_UNDEFINED:
            return INTREG_UND(reg);
          default:
            // This is where the garbage mode ends up.
            panic("Flattening into an unknown mode.\n");
        }
    }



On Mon, Jul 12, 2010 at 10:04 PM, Min Kyu Jeong mkje...@gmail.com wrote:


I will try it tomorrow and let you know.


On Mon, Jul 12, 2010 at 6:51 PM, Gabriel Michael Black 
gbl...@eecs.umich.edu wrote:


Did that patch fix it?

Gabe


Quoting Gabe Black gbl...@eecs.umich.edu:

Here's more or less what's going on as far as the register index. The
load microop needs to store into register 1, and it needs to be sure it
stores into the version visible from the user mode. It does that by
applying the intRegInMode function which shifts the register index 1 by
MODE_USER * the number of integer registers. Later, the flattenIntIndex
function is called to unambiguously figure out what a particular
register index goes with given the current values of various ISA state
(specifically the CPU mode for ARM) or other conditions flagged by
putting the register index in a particular range. You can see in the
else clause that the mode is re-extracted from the index using an
integer division and the offset of 1 is extracted using a mod. This is
then translated into the actual register visible from that mode with
that index. From this point forward, the CPU can pretend the integer
register file is one big flat contiguous space and totally ignore the
ISA's register indexing semantics.
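
The packing and unpacking described above can be sketched as follows. This is an illustration only: RegsPerMode, ModeReg, regInMode and splitModeReg are hypothetical names and the stride of 32 is an assumed value, not the simulator's actual intRegsPerMode.

```cpp
#include <cassert>

// Assumed stride between modes; the real constant lives in the ARM ISA code.
const int RegsPerMode = 32;

// ARM mode encodings used as the "mode" part of the packed index.
enum Mode { ModeUser = 0x10, ModeFiq = 0x11, ModeIrq = 0x12 };

struct ModeReg {
    int mode;
    int reg;
};

// Pack a (mode, register) pair into one index: mode * stride + register.
inline int
regInMode(int mode, int reg)
{
    assert(reg >= 0 && reg < RegsPerMode);
    return mode * RegsPerMode + reg;
}

// Undo the packing with an integer division and a remainder, the same way
// the else clause of flattenIntIndex does.
inline ModeReg
splitModeReg(int index)
{
    ModeReg result;
    result.mode = index / RegsPerMode;
    result.reg = index % RegsPerMode;
    return result;
}
```

For example, splitModeReg(regInMode(ModeUser, 1)) gives back mode ModeUser and register 1. Whatever garbage was packed in as the mode comes back out unchanged, so an invalid mode stays hidden until the flattening step looks at it.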

One additional mechanism is at work when actually storing the register
index in the StaticInst object. There are really three different types
of register indices, integer, float and misc (which could have also been
called "other" or "control"), but these are all stored in the same array
with no flag to distinguish them. To be able to tell them apart later,
an offset is added to them so that the integer indexes are all from 0 to
FP_Base_DepTag - 1, the floating point registers are all from
FP_Base_DepTag to Ctrl_Base_DepTag - 1 (inconsistently named for
historical reasons), and the misc registers are from Ctrl_Base_DepTag
and up. This is a fairly fragile system since if a, say, integer index
is large enough, it might spill into the fp or misc range and be
misidentified later. I'd like to replace

Re: [m5-dev] O3CPU + translateTiming

2010-07-13 Thread Gabriel Michael Black
I think you've mostly interpreted this correctly. The instructions
aren't retried if the translation fails, they just hang around and
wait for it. The check if fault == NoFault will work if the
translation is finished by the time initiateTranslation is done.
That's true for everything we have now except x86 and ARM, neither of
which is currently supported by O3. What might work to fix this is to
move the code that checks for a fault and calls the CPU read/write
function into the callback itself. That way once translation is done,
whenever that may be, the correct action will happen.
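
A rough sketch of what moving the check into the callback could look like. None of this is the O3 code: ToyTlb, ToyInst, translateTiming as written here, and finishWalks are invented for illustration, and the flags merely stand in for cpu->read() and this->setExecuted().

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

enum class Fault { None, PageFault };

// Toy TLB: a translation either completes inside the call (a hit) or is
// parked until a simulated table walk finishes (a miss).
class ToyTlb
{
  public:
    using Callback = std::function<void(Fault)>;

    void
    translateTiming(bool hit, Callback done)
    {
        if (hit)
            done(Fault::None);
        else
            pending.push_back(std::move(done));
    }

    // Called when the table walker is done; completes every parked request.
    void
    finishWalks(Fault result)
    {
        std::vector<Callback> ready;
        ready.swap(pending);
        for (auto &cb : ready)
            cb(result);
    }

  private:
    std::vector<Callback> pending;
};

struct ToyInst
{
    bool accessStarted = false;
    bool executed = false;
    Fault fault = Fault::None;

    // The fault check and the memory access live in the callback, so they
    // run when translation completes, not right after it is started.
    void
    initiateAccess(ToyTlb &tlb, bool tlbHit)
    {
        tlb.translateTiming(tlbHit, [this](Fault f) {
            fault = f;
            if (f == Fault::None)
                accessStarted = true;   // stands in for cpu->read(...)
            else
                executed = true;        // stands in for this->setExecuted()
        });
    }
};
```

On a hit the callback fires before initiateAccess returns, so the existing same-cycle behaviour is unchanged. On a miss nothing happens to the instruction until finishWalks runs.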


Gabe


Quoting Min Kyu Jeong mkje...@gmail.com:


Thanks, Tim

It looks like, for the DTLB translation, there is some code to handle this,
but it is not complete for the ISAs that do a hardware page table walk.

cpu/base_dyn_inst.hh
BaseDynInst<Impl>::read(Addr addr, T &data, unsigned flags)
{
    ...
    initiateTranslation(req, sreqLow, sreqHigh, NULL, BaseTLB::Read);

    if (fault == NoFault) {
        effAddr = req->getVaddr();
        effAddrValid = true;
        fault = cpu->read(req, sreqLow, sreqHigh, data, lqIdx);
    } else {
        ...
        this->setExecuted();
    }

It first initiates translation, and then calls cpu->read() as long as no
fault was generated during the translation. This should work for Alpha,
where a TLB miss is treated as a fault and handled in software by PALcode;
the Alpha TLB returns a fault on a miss.

For the ISAs that do a hardware page-table walk, the instruction that missed
in the TLB should neither start a read (cpu->read()) nor be taken out of the
instruction window (this->setExecuted()). I think it should wait for the
table walk to finish and then retry the execution of the load/store (this
might not be true, depending on the implementation?).

I looked into the x86 code: if the memory is in timing mode, the page table
walker initiates a memory access and returns without a fault, which means
cpu->read() would be called without the translation having finished. The
same is true for ARM.

Is there any plan or ongoing effort to support this wait-on-TLB-miss on the
other ISAs? or ideas about how to go about implementing it?

Thanks,

Min

On Mon, Jul 12, 2010 at 5:44 PM, Timothy M Jones tjon...@inf.ed.ac.uk wrote:


Hi Min,

The way that the TLB deals with a timing translation is specific to each
ISA.  I don't have much experience with anything other than Power but for
that ISA, yes, you're correct.  The timing translation is just a wrapper
around the atomic translation.  It seems from a quick check that Alpha is
the same.

If you actually wanted to have the fetch translation finish on a different
cycle to the one it was initiated on then you would have to make some
changes to the fetch stage to allow that.  I wouldn't have thought it would
be too difficult but might require splitting up several functions into code
that's executed before the translation and code that's executed afterwards.

Cheers
Tim


On 12/07/2010 18:14, Min Kyu Jeong wrote:


Hi,

This question is regarding the changeset
(http://repo.m5sim.org/m5?cmd=changeset;node=a123bd350935).

   This initiates a timing translation and passes the read or write on
   to the processor before waiting for it to finish


It looks like even in the event of a TLB miss, the TLB walk does not delay
the actual execution of the loads. Am I correct?

I was trying to find a reference for replacing the translateAtomic() in
the fetch stage with translateTiming(). It would require some mechanism to
stop the actual fetch until the translation is finished, which doesn't
seem to exist in the O3 CPU even for the data translation.

Thanks,

Min






--
Timothy M. Jones
http://homepages.inf.ed.ac.uk/tjones1

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [m5-dev] cleaning up TimingSimpleCPU

2010-07-13 Thread Gabriel Michael Black

Quoting Ali Saidi sa...@umich.edu:



On Tue, 13 Jul 2010 11:20:26 -0700, Gabe Black gbl...@eecs.umich.edu
wrote:


This also brings up another idea I've been rolling around for a while.
Why is all the control state local to the miscregfile/its descendant the
ISA object? Why don't we put control state that matters to the TLB, or
at least a copy of it, in the TLB itself and then communicate it back
and forth as necessary? That would be easier to code (or at least I'm
guessing) since you'd just have the state right there, faster since it
avoids calling out for it, and would more conceptually match real
hardware where all the control state isn't put in one huge blob
someplace. The same thing could be done for other structures like the
interrupt controller, and maybe the decoder and/or predecoder. Speaking
of the decoder, it would be nice to make that a little stateful as well.
As it is in, say, ARM, the decoder has to rediscover what mode it's in
over and over. I'm guessing it would be better to explicitly switch its
state (or it entirely) when changing modes instead, although that might
add a fair amount of complexity. Perhaps the decoder should be an object
instead of a bare function? I'm less sure how that would work. It could,
hypothetically, allow us to return the two PC bits commandeered to
signal the mode.


At least for ARM, I intended to have the TLB cache the various translation
state (CPSR, SCTLR) and invalidate the cache on CPSR and SCTLR writes, so
that it wouldn't have to access those misc registers every cycle. I think
part of the issue is that those registers are accessed by a variety of
things, so it isn't necessarily better for them to live in the TLB. I don't
know that there is anything preventing the misc register accesses from
calling another function on the TLB to update registers when they're
written. Couldn't the same thing be used for these other suggestions?
Assuming that we make the decoder an object (which I don't think should be
too hard), in the case of changing the state of the CPU, couldn't the misc
register write call tc->decoder->changeState(foo)?
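
A small sketch of the "misc register write pushes state into the TLB" idea. It is purely illustrative: CachingTlb, ToyMiscRegs, updateState and the register ids are hypothetical, and the only architectural fact used is that bit 0 of SCTLR is the MMU enable bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register ids; not the simulator's MISCREG_* values.
enum MiscReg { RegCpsr, RegSctlr, RegOther, NumMiscRegs };

// TLB that keeps its own copy of the state translation depends on.
class CachingTlb
{
  public:
    // Called by the register file when translation-relevant state changes.
    void
    updateState(uint32_t cpsr, uint32_t sctlr)
    {
        cachedCpsr = cpsr;
        cachedSctlr = sctlr;
        ++updates;
    }

    // Answered from the cached copy, with no call out to the register file.
    bool mmuEnabled() const { return (cachedSctlr & 0x1) != 0; }
    int updateCount() const { return updates; }

  private:
    uint32_t cachedCpsr = 0;
    uint32_t cachedSctlr = 0;
    int updates = 0;
};

class ToyMiscRegs
{
  public:
    explicit ToyMiscRegs(CachingTlb &t) : tlb(t) {}

    void
    write(MiscReg reg, uint32_t val)
    {
        assert(reg < NumMiscRegs);
        regs[reg] = val;
        // Only writes to registers the TLB cares about refresh its copy.
        if (reg == RegCpsr || reg == RegSctlr)
            tlb.updateState(regs[RegCpsr], regs[RegSctlr]);
    }

    uint32_t read(MiscReg reg) const { return regs[reg]; }

  private:
    CachingTlb &tlb;
    uint32_t regs[NumMiscRegs] = {};
};
```

The register file stays the owner of the values, which fits registers that many things read, while the TLB pays for an update only on the comparatively rare writes. A decoder object could presumably be notified through the same kind of hook.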

Ali




That's a reasonable approach, especially for CPSR which (for people  
that don't know ARM) has a bunch of execution level control state like  
which endianness to use for loads/stores. On the other hand, though,  
there's control state that only applies to translation like what ASI  
to use, segmentation limits and attributes in x86 which aren't  
directly accessible most of the time except with hardware  
virtualization extensions, etc. Then it makes sense to stick it in the  
TLB and make the ISA object ask for it when it needs it. Fortunately  
all these things are ISA specific objects, I think, and can be  
designed to work together.


Gabe