Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Leopold Toetsch
Bill Coffman [EMAIL PROTECTED] wrote:
 Leo,

 Thanks for your suggestions and comments.

Welcome and thanks to you for looking at that nasty piece of code ;)

 On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Some remarks WRT gen{3,4}.pl:
 1) While these programs exhibit some worst case register layout it's
probably not a very typical layout.

 Agreed.  The idea was to automate and compare to gcc.  There are
 real-world tests already in the parrot test suite, but because I can
 generate millions of such cases, one hope is to detect errors, and to
 get some kind of performance metric, even if these programs are
 artificial.

Ok, that's of course very reasonable. One of the problems Dan
encountered is memory usage, though. Currently the interference graph is
built for all four kinds of registers in one piece. By splitting the
register allocation into four passes, much memory can be saved.

But the memory usage can be estimated w/o using four register kinds too.
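
A rough way to see the saving, assuming the interference graph is stored as a dense bit matrix (an assumption; the thread does not say how imcc lays it out). The symbol counts below are hypothetical.

```python
def bit_matrix_bytes(n_symbols):
    """Bytes needed for a dense n x n interference bit matrix (assumed layout)."""
    return (n_symbols * n_symbols + 7) // 8


def compare_layouts(counts):
    """counts maps register kind -> number of symbols of that kind.

    Returns (bytes for one combined graph,
             peak bytes if one graph per kind is built at a time,
             total bytes if all four per-kind graphs are kept).
    """
    combined = bit_matrix_bytes(sum(counts.values()))
    per_kind = {kind: bit_matrix_bytes(n) for kind, n in counts.items()}
    return combined, max(per_kind.values()), sum(per_kind.values())


# Hypothetical compilation unit: 1000 symbols of each register kind.
counts = {"I": 1000, "N": 1000, "S": 1000, "P": 1000}
combined, peak_split, total_split = compare_layouts(counts)
print(combined, peak_split, total_split)
```

With these made-up numbers the combined matrix needs 2,000,000 bytes, while a per-kind pass needs 125,000 bytes at a time. The quadratic term is what makes the split pay off.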

 I'd change the simulation program to use PMCs to allow for 2). Now when
 it comes to spilling, these lexicals or globals don't need a new
 storage, their live range can just get discarded and at the next usage
 of this lexical or global it just can be refetched[1]. Implementing this
 should already vastly improve the register allocation.

 [1] The refetching can of course be into a different Parrot register.
 Thus the usage of globals or lexicals wouldn't interfere and generate
 huge live ranges over the whole function as it currently does.

 I don't quite understand this.  You think  I should create PMC
 variables, instead of integers?

Yes. As said, with PMC lexicals or globals the register allocation can
be simplified. But it depends. E.g.

   .local pmc foo
   ...
   find_global foo, "foo"
   ...
   find_global foo, "foo"

The C<foo> is an C<out> argument. If your algorithm does register
renaming, all is fine. If it doesn't, like now, all the usages of the
C<foo>s take the same register over the whole unit, which is the major
PITA of the current register allocation.

And of course, lexicals and globals already have a storage, you don't
need to spill them.

One more difference between {I,N,S} and {P} registers is the value
behavior. E.g.

  =item B<add>(out INT, in INT, in INT)
  =item B<add>(out NUM, in NUM, in NUM)
  =item B<add>(in PMC, in PMC, in PMC)
               ^^

That is, with {I,N,S} register ranges are cut early at each binary
operation. You don't have that with PMCs.

 ... It's probably a good idea to have a
 mix of the four types I suppose.

If you want to compare with gcc too, you could use three types:

   I ... int
   N ... double
   P ... V4SI   (vector for sse, altivec, ...)

 One thing I can say is that the interference graph is pretty
 conservatively generated.  When a variable is reassigned, it can be
 treated as a new variable which can reduce register pressure, but I
 don't think the code is doing that yet.

Yep, register renaming. That would already improve things vastly.
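
A minimal sketch of what renaming buys, assuming straight-line code only (no branches, so no phi-style merging is needed). The instruction encoding as (dest, sources) tuples and the helper names are made up for illustration; this is not imcc's data structure.

```python
def live_ranges(code):
    """Map each name to (first position, last position) at which it appears."""
    ranges = {}
    for pos, (dest, srcs) in enumerate(code):
        for name in [dest, *srcs]:
            first, _ = ranges.get(name, (pos, pos))
            ranges[name] = (first, pos)
    return ranges


def rename(code):
    """Give every assignment a fresh versioned name (straight-line code only)."""
    version = {}   # name -> number of assignments seen so far
    current = {}   # name -> versioned name currently holding its value
    out = []
    for dest, srcs in code:
        # Sources are looked up before the destination is re-versioned,
        # so "x = x + 1" reads the old version and writes a new one.
        new_srcs = [current.get(s, s) for s in srcs]
        version[dest] = version.get(dest, 0) + 1
        current[dest] = f"{dest}.{version[dest]}"
        out.append((current[dest], new_srcs))
    return out


code = [
    ("foo", []),            # 0: foo is fetched
    ("a", ["foo"]),         # 1
    ("b", ["a"]),           # 2
    ("c", ["b"]),           # 3
    ("foo", []),            # 4: foo is fetched again
    ("d", ["foo", "c"]),    # 5
]

before = live_ranges(code)
after = live_ranges(rename(code))
print(before["foo"], after["foo.1"], after["foo.2"])
```

Without renaming, foo occupies one range over the whole sequence and interferes with everything in between. With renaming it splits into two short ranges that can land in different registers.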

[ metrics ]

 could be helpful.  That could be incorporated into the register
 allocator itself (which already is, if you run with -d 0008, then grep
 Spill).

With much less output -v. The spill number is ok, the usage count is
broken, though.

 I don't know if we need some syntax to easily support lexicals or
 globals or if the register allocation can deduce that itself. But if a
 new syntax simplifies that, we put it in.

 My preference is to not distinguish between various types of
 variables, but to have the algorithms deal best with nodes based on
 the structure of the graph(s).

That's good. But as said, you don't have to spill lexicals or globals.
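
The point can be sketched as an eviction plan per symbol. The "kind" tags and the bracketed spill placeholders are hypothetical. find_global is the op used earlier in this message, and find_lex is assumed here to be its lexical counterpart. Tied namespaces, raised elsewhere in this thread, could make the refetch more expensive than this suggests.

```python
def eviction_plan(name, kind):
    """Return (ops when the register is taken away, ops to get the value back).

    kind is a hypothetical tag: "global", "lexical", or anything else
    for an ordinary temporary.
    """
    if kind == "global":
        # The value already lives in the namespace: drop it, refetch later.
        return [], [f'find_global {name}, "{name}"']
    if kind == "lexical":
        # Same idea for a lexical pad entry (op name assumed).
        return [], [f'find_lex {name}, "{name}"']
    # Ordinary temporaries need real spill storage (placeholders, not real ops).
    return [f"<store {name} to spill slot>"], [f"<load {name} from spill slot>"]


for name, kind in [("foo", "global"), ("bar", "lexical"), ("tmp", "temp")]:
    print(name, eviction_plan(name, kind))
```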

 For 3) there is already a separate pass in
 imcc/reg_alloc.c:allocate_non_interfering().

 Yes.  This function seems to find variables whose live ranges (life
 ranges) are restricted to a single basic block, and assign them reg
 28,29, or 30.

 ... It's usually best to color the easy stuff last.

I did assign them first to reduce the size of the interference graph,
which was one of the problems - out of memory. But that is better
solved by doing four passes.

 ...  Another
 optimization to the algorithm is to incorporate a score that causes
 the most frequently accessed nodes not to get spilled.

... and stuff inside inner loops.

 scoring mechanism is pretty primitive right now

Yep. A broad field for experiments.
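
One possible starting point for such experiments is the classic Chaitin-style heuristic: weight each use by its loop depth and divide by the node's degree in the interference graph. This is a sketch of that textbook heuristic, not what imcc currently does. The factor 10 per loop level and the node data are assumptions.

```python
def spill_cost(use_depths, degree):
    """Weighted use count divided by interference degree.

    use_depths: loop nesting depth of each use/def of the symbol.
    degree: number of neighbours in the interference graph.
    """
    weighted_uses = sum(10 ** depth for depth in use_depths)
    return weighted_uses / max(degree, 1)


def pick_spill_candidate(nodes):
    """nodes maps name -> (use_depths, degree); the cheapest node is spilled."""
    return min(nodes, key=lambda name: spill_cost(*nodes[name]))


# Hypothetical symbols.
nodes = {
    "loop_counter": ([2, 2, 2, 1], 5),   # used inside an inner loop
    "config_flag": ([0], 6),             # used once, many neighbours
    "temp": ([0, 0, 1], 2),
}
print(pick_spill_candidate(nodes))
```

The rarely used, highly connected symbol is chosen, and the inner-loop counter stays in a register.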

 ...  I think it's important to
 look at experimental results as these decisions are made.

Yep.

 With so many registers, it may be more important to focus on swapping
 registers in and out between subroutine calls, rather than the actual
 register allocation optimization.

That's currently being addressed. Each subroutine gets a fresh set of
registers.

 ... Another thing that might be worth
 checking, after parrot gets out of alpha, is if reducing or
 increasing the number of registers will help performance.  Just a
 thought.

The 4 x 32 is pretty good. It matches 

Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Jeff Clites
On Oct 20, 2004, at 12:09 PM, Leopold Toetsch wrote:
 Dan Sugalski wrote:
  'Kay, now I'm confused. I thought we were talking about removing the
  registers from out of the interpreter structure, which'd leave us
  needing two pointers, one for the interpreter struct and one for the
  registers.

 Ok, short summary of future layout of JIT regs:

 item            PPC    i386

 interpreter     r13    -16(%ebp)
 frame pointer   r16    %ebx

 Register addressing is done relative to the frame pointer, which will
 be in a register. The interpreter isn't used that often inside integer
 JIT code, so it isn't in a register in i386 but is easily reloaded
 into one.

 Currently the frame pointer and the interpreter are the same.

Just to clarify: This is the approach wherein each frame gets a fresh
set of registers, and function call and return (or continuation
invocation) copy the relevant registers between the register sets? And
this isn't quite the scheme from the "towards a new call scheme"
thread, in which we'd be duplicating the interpreter context for each
frame, right? (And the latter was what you did in "Proof of concept -
hack_42 (was: the whole and everything)", right?)

Just trying to sort out all of the ideas.
JEff


Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Jeff Clites
On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote:
 Bill Coffman [EMAIL PROTECTED] wrote:

 And of course, lexicals and globals already have a storage, you don't
 need to spill them.

I'm not sure that's true. If there's no 'eval' in scope, lexicals don't 
have to live in pads--they could purely exist in registers. And with 
tied namespaces and such, it may not be legitimate to re-fetch a global 
(ie, to fetch it multiple times, if the code appears to only fetch it 
once) -- one could pathologically have a global whose value appears to 
increase each time it's fetched, for instance, or you could end up with 
multiple round-trips to a database.

  ... Another thing that might be worth
  checking, after parrot gets out of alpha, is if reducing or
  increasing the number of registers will help performance.  Just a
  thought.

 The 4 x 32 is pretty good. It matches recent hardware too. But if a
 good register algorithm shows that 4 x 16 is enough, we can of course
 decrease the register number. Increasing shouldn't be necessary.

Matching hardware is probably not too significant--even though the PPC 
has 32 int registers, we can't map to all of them in JIT (some are 
dedicated to holding the interpreter pointer, etc.), and we'd really 
need 3 x 32 hardware int registers to accommodate all we'd like (I, S, 
and P registers). So even currently it's a loose match.

JEff


Re: Python, Parrot, and lexical scopes

2004-10-21 Thread Allen Short
On Tue, Oct 19, 2004 at 11:23:13AM +0200, Leopold Toetsch wrote:
 * the import statement is simulated too by storing the lexicals into the
   caller's frame. This would very likely be another Python opcode.


I should point out that this is much more like Python's semantics for
"import *" than Dan's overlapping-namespaces idea -- import really
means "copy these bindings into my current module". In particular,
after doing "from foo import *", subsequent additions or removals of
names to foo will not be reflected in the bindings of the module
importing them.
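
The copy semantics are easy to check in Python itself. The sketch below builds a throwaway module named foo in memory (the module and its names are made up for the demonstration) and runs the import in a separate namespace dict that stands in for the importing module.

```python
import sys
import types


def import_star_snapshot():
    """Show that "from foo import *" copies bindings at import time."""
    foo = types.ModuleType("foo")
    foo.a = 1
    sys.modules["foo"] = foo      # make the fake module importable
    namespace = {}                # stands in for the importing module
    try:
        exec("from foo import *", namespace)
        foo.b = 2                 # added after the import
        foo.a = 99                # rebound after the import
    finally:
        del sys.modules["foo"]
    # Drop __builtins__ and friends that exec adds to the dict.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


print(import_star_snapshot())
```

The importer still sees a as 1 and never sees b, which is the behaviour described above.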

Allen


Should Resizable*Array's be chunked lists?

2004-10-21 Thread Bernhard Schmalhofer
Hi,

I have started to work on some of the missing ops for the Resizable*Array
PMCs.
I noticed that, unlike the Array and PerlArray PMC, they currently do not
use the functionality from src/list.c.
This means that a ResizablePMCArray cannot be broken up in chunks and

  new P0, .ResizablePMCArray
  P0[1000] = 1000

causes a lot of memory allocation.

Is there a design reason for not having a chunked implementation?

I have started to cut and paste some code from array.pmc to
resizablepmcarray.pmc.
So far it looks OK, with only test 15 of t/op/gc.t and test 10 of
t/pmc/resizablepmcarray.t failing.

One thing that is probably hard to cover is the method sort inherited
from FixedPMCArray. It looks like 'qsort' needs a contiguous piece of memory
for sorting.

CU, Bernhard

-- 
/* [EMAIL PROTECTED] */




perl6-compiler

2004-10-21 Thread Leopold Toetsch
perl6-compiler is missing at http://dev.perl.org/perl6/lists/ and it 
seems not to be gatewayed to nntp.perl.org. OTOH a lot of unused perl6 
lists and newsgroups are there.

Also, I have already tried to subscribe twice and got no answer.
Thanks,
leo


Re: perl6-compiler

2004-10-21 Thread Herbert Snorrason
http://www.nntp.perl.org/group/perl.perl6.compiler -- It's there.

Correct, though, that it's not listed on the lists page...

On Thu, 21 Oct 2004 09:02:23 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote:
 perl6-compiler is missing at http://dev.perl.org/perl6/lists/ and it
 seems not to be gatewayed to nntp.perl.org. OTOH a lot of unused perl6
 lists and newsgroups are there.
 
 Finally I tried to subscribe already twice and got no answer.
 
 Thanks,
 leo
 
 

-- 
To show weakness is to lose;
toughness means to rule.
  - Glas und Tränen, Megaherz


Re: Should Resizable*Array's be chunked lists?

2004-10-21 Thread Leopold Toetsch
Bernhard Schmalhofer [EMAIL PROTECTED] wrote:
 Hi,

 Is there a design reason for not having a chunked implementation?

Then you could use Array/PerlArray anyway (modulo the PerlUndef).

 One thing that is probably hard to cover is the method sort inherited
 from FixedPMCArray. It looks like 'qsort' needs a contiguous piece of memory
 for sorting.

Well, qsort can't remain for various reasons. The main one is that it
doesn't allow additional params to be passed to the sort function.
Currently these are static globals, which makes the sort non-reentrant.

For performance reasons we might need some sort/merge algorithm anyway
and then it can work through chunks too.
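
A sketch of the shape such a sort/merge could take: sort each chunk on its own, then merge the sorted chunks, with the extra parameters handed to the comparison function explicitly instead of through static globals. This is a Python analogue under made-up names (sort_chunks, compare_ints, the context dict), not a proposal for the actual C interface.

```python
import functools
import heapq


def sort_chunks(chunks, compare, context):
    """Sort a chunked list without needing one contiguous buffer.

    compare(a, b, context) returns <0, 0 or >0; context carries the
    extra parameters, so nested or concurrent sorts cannot interfere.
    """
    key = functools.cmp_to_key(lambda a, b: compare(a, b, context))
    sorted_chunks = [sorted(chunk, key=key) for chunk in chunks]
    # k-way merge of the individually sorted chunks.
    return list(heapq.merge(*sorted_chunks, key=key))


def compare_ints(a, b, context):
    """Hypothetical comparison function; direction comes from context."""
    result = (a > b) - (a < b)
    return -result if context["descending"] else result


chunks = [[5, 1, 9], [4, 8], [7, 2]]
print(sort_chunks(chunks, compare_ints, {"descending": False}))
print(sort_chunks(chunks, compare_ints, {"descending": True}))
```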

 CU, Bernhard

leo


Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:

 Just to clarify: This is the approach wherein each frame gets a fresh
 set of registers, and function call and return (or continuation
 invocation) copy the relevant registers between the register sets?

Yes. Function arguments and return values get copied as well as the
interpreter context. It's basically the same as what we had until around
0.0.3, except that we had 4 register frame pointers and 4 stacks. I
really want to have just one register frame pointer now. It avoids 3/4
of the stack push/pop overhead and maps nicely to run loops, including
JIT.
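
A toy model of that scheme, assuming one frame object that holds all four register kinds and a single frame pointer in the interpreter. The class names, the choice of which registers carry arguments and results, and the restriction to integer registers are all simplifications made up for this sketch.

```python
NUM_REGS = 32


class RegisterFrame:
    """One frame holding all four register kinds (I, N, S, P)."""

    def __init__(self):
        self.I = [0] * NUM_REGS
        self.N = [0.0] * NUM_REGS
        self.S = [""] * NUM_REGS
        self.P = [None] * NUM_REGS


class Interpreter:
    def __init__(self):
        self.frame = RegisterFrame()   # the single "frame pointer"

    def call(self, sub, arg_regs, result_regs):
        """Run sub in a fresh frame, copying only arguments and results."""
        caller = self.frame
        callee = RegisterFrame()
        for i, reg in enumerate(arg_regs):
            callee.I[i] = caller.I[reg]          # copy arguments in
        self.frame = callee
        try:
            sub(self)
            results = callee.I[:len(result_regs)]
        finally:
            self.frame = caller                  # caller's registers untouched
        for reg, value in zip(result_regs, results):
            caller.I[reg] = value                # copy return values out


def add_sub(interp):
    f = interp.frame
    f.I[0] = f.I[0] + f.I[1]
    f.I[10] = 12345   # scratch use, never visible to the caller


interp = Interpreter()
interp.frame.I[3], interp.frame.I[4], interp.frame.I[10] = 20, 22, 7
interp.call(add_sub, arg_regs=[3, 4], result_regs=[5])
print(interp.frame.I[5], interp.frame.I[10])
```

The caller needs no push/pop around the call: its I10 survives because the callee never saw that frame.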

 ... And
 this isn't quite the scheme from the "towards a new call scheme"
 thread, in which we'd be duplicating the interpreter context for each
 frame, right? (And the latter was what you did in "Proof of concept -
 hack_42 (was: the whole and everything)", right?)

Yep. That scheme was a bit too error prone during implementation
attempts and it didn't perform well for recursive functions. Changing
the interpreter pointer, or having an indirection for the access to the
interpreter, isn't really simple.

 JEff

leo


Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Leopold Toetsch
Jeff Clites [EMAIL PROTECTED] wrote:
 On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote:

 And of course, lexicals and globals already have a storage, you don't
 need to spill them.

 I'm not sure that's true.

It should read: if there are lexical or global opcodes, lexicals and
globals have a storage.

 ... If there's no 'eval' in scope, lexicals don't
 have to live in pads--they could purely exist in registers.

And if there is no introspection and what not.

 tied namespaces and such, it may not be legitimate to re-fetch a global
 (ie, to fetch it multiple times, if the code appears to only fetch it
 once) -- one could pathologically have a global whose value appears to
 increase each time it's fetched,

Then there's something horribly wrong with that usage of tie: it is not
the value that is refetched from the var - the var is refetched from the
namespace.

 JEff

leo


ICU failure on RedHat

2004-10-21 Thread Joshua Gatcomb
Yes, that's right folks - I said RedHat.

I got a shiny new development machine at work that I
can install anything I want on.  The trouble is that
it doesn't have any net access so I have to transfer
everything via memory stick or CD.  I just so happened
to have some RH CDs lying around, so that's what I
went with.

1.  Transfer ICU 3.0 and build from source
2.  Do a fresh CVS checkout of parrot, transfer and
build

$ perl Configure.pl --optimize
configure all goes well

$ make
all goes well until parrot is linked

c++ -o parrot -L/usr/local/lib -Wl,-E -g imcc/main.o
blibl/lib/libparrot.a -lnsl -ldl -lm -lpthread -lcrypt
-lutil -lrt -lgmp -lpthread -lm -L/usr/local/lib
-licuuc -licudata -lpthread -lm

./parrot -o runtime/parrot/include/parrotlib.pbc
runtime/parrot/library/parrotlib.imc

./parrot:  error while loading shared libraries: 
libicuuc.so.30: cannot open shared object file:  No
such file or directory
make: *** [runtime/parrot/include/parrotlib.pbc] Error
127

ICU libs are in /usr/local/lib
/usr/local/lib is in my path
all libs are executable
libicuuc.so.30 indeed does exist in that directory


Any advice?

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region






Re: Should Resizable*Array's be chunked lists?

2004-10-21 Thread Dan Sugalski
At 11:17 PM +0200 10/20/04, Bernhard Schmalhofer wrote:
 Hi,
 I have started to work on some of the missing ops for the Resizable*Array
 PMCs.
 I noticed that, unlike the Array and PerlArray PMC, they currently do not
 use the functionality from src/list.c.
 This means that a ResizablePMCArray cannot be broken up in chunks and
   new P0, .ResizablePMCArray
   P0[1000] = 1000
 causes a lot of memory allocation.
 Is there a design reason for not having a chunked implementation?

At this point, I think a chunked implementation is actually 
significantly sub-optimal. It makes sense for large or sparse arrays. 
Most of the arrays we'll be dealing with are going to be smallish, 
and the list stuff is unneeded overhead. The arrays can, if we want 
to get clever, switch to a chunked representation when things get big 
enough.

I think I'd rather the resizeable and fixed arrays went with a 
non-chunked scheme by default, rather than a chunked one. We'd pick 
up some speed on array access that way in the majority of cases.
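
The "switch when things get big enough" idea could look roughly like this: a flat list while the array is small, and a table of fixed-size chunks once an index crosses a threshold. The class name, chunk size and threshold are arbitrary assumptions, and negative indices are left out for brevity.

```python
class HybridArray:
    """Contiguous storage while small, chunked storage once large."""

    def __init__(self, chunk_size=256, threshold=4096):
        self.chunk_size = chunk_size
        self.threshold = threshold
        self.flat = []        # contiguous storage (small arrays)
        self.chunks = None    # {chunk index: list} once chunked
        self.length = 0

    def is_chunked(self):
        return self.chunks is not None

    def _to_chunks(self):
        size = self.chunk_size
        self.chunks = {}
        for start in range(0, len(self.flat), size):
            chunk = self.flat[start:start + size]
            chunk.extend([None] * (size - len(chunk)))   # pad last chunk
            self.chunks[start // size] = chunk
        self.flat = None

    def __setitem__(self, index, value):
        if not self.is_chunked() and index >= self.threshold:
            self._to_chunks()
        if self.is_chunked():
            # Only the chunk that is actually touched gets allocated.
            chunk = self.chunks.setdefault(
                index // self.chunk_size, [None] * self.chunk_size)
            chunk[index % self.chunk_size] = value
        else:
            if index >= len(self.flat):
                self.flat.extend([None] * (index + 1 - len(self.flat)))
            self.flat[index] = value
        self.length = max(self.length, index + 1)

    def __getitem__(self, index):
        if index >= self.length:
            raise IndexError(index)
        if self.is_chunked():
            chunk = self.chunks.get(index // self.chunk_size)
            return None if chunk is None else chunk[index % self.chunk_size]
        return self.flat[index]


a = HybridArray()
a[1000] = 1000            # still small: plain contiguous storage
print(a.is_chunked())
a[1_000_000] = 5          # sparse and large: switches to chunks
print(a.is_chunked(), len(a.chunks), a[1000])
```

Small arrays keep the fast direct indexing, and only the large or sparse case pays for the extra lookup.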
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk


Re: ICU failure on RedHat

2004-10-21 Thread Joshua Gatcomb

--- Joshua Gatcomb [EMAIL PROTECTED]
wrote:

 
 1.  Transfer ICU 3.0 and build from source
 2.  Do a fresh CVS checkout of parrot, transfer and
 build
 
 $ perl Configure.pl --optimize
 configure all goes well
 
 $ make
 all goes well until parrot is linked
 
 c++ -o parrot -L/usr/local/lib -Wl,-E -g imcc/main.o
 blibl/lib/libparrot.a -lnsl -ldl -lm -lpthread -lcrypt
 -lutil -lrt -lgmp -lpthread -lm -L/usr/local/lib
 -licuuc -licudata -lpthread -lm
 
 ./parrot -o runtime/parrot/include/parrotlib.pbc
 runtime/parrot/library/parrotlib.imc
 
 ./parrot:  error while loading shared libraries: 
 libicuuc.so.30: cannot open shared object file:  No
 such file or directory
 make: *** [runtime/parrot/include/parrotlib.pbc] Error
 127

Oddly enough, it works if I don't use a system ICU. 
All tests pass normally, but I get 3 tests failing
under JIT

t/op/interp.t #7
t/pmc/coroutine.t #10
t/pmc/exception.t #19

I haven't investigated further as I am still confused
as to why a system ICU would fail.

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region






Re: ICU failure on RedHat

2004-10-21 Thread Joshua Gatcomb

--- Joshua Gatcomb [EMAIL PROTECTED]
wrote:

 
 Oddly enough, it works if I don't use a system ICU. 
 All tests pass normally, but I get 3 tests failing
 under JIT
 
 t/op/interp.t #7
 t/pmc/coroutine.t #10
 t/pmc/exception.t #19

ok, so I did investigate a little further.  make testj
works fine (all tests pass) if I don't pass the
--optimize flag to Configure.pl.  


Cheers
Joshua Gatcomb
a.k.a. Limbic~Region






Re: Python, Parrot, and lexical scopes

2004-10-21 Thread Dan Sugalski
At 12:51 PM -0500 10/20/04, Allen Short wrote:
 On Tue, Oct 19, 2004 at 11:23:13AM +0200, Leopold Toetsch wrote:
  * the import statement is simulated too by storing the lexicals into the
    caller's frame. This would very likely be another Python opcode.

 I should point out that this is much more like Python's semantics for
 "import *" than Dan's overlapping-namespaces idea -- import really
 means "copy these bindings into my current module". In particular,
 after doing "from foo import *", subsequent additions or removals of
 names to foo will not be reflected in the bindings of the module
 importing them.

Right, but the overlapping namespaces stuff isn't for the import.
It's certainly possible that I overestimated the complexity involved 
in the semantics -- wouldn't be the first time, probably won't be the 
last. If lexicals actually work just fine for python I'm fine with 
tossing (or never getting around to implementing) the layered 
namespace stuff, though if it's implemented and never used it has no 
performance penalties for anything but failed name lookups.
--
Dan



Re: ICU failure on RedHat

2004-10-21 Thread Joshua Gatcomb
--- Joshua Gatcomb [EMAIL PROTECTED]
wrote:
 
 ok, so I did investigate a little further.  make
 testj
 works fine (all tests pass) if I don't pass the
 --optimize flag to Configure.pl.  

Ok, so optimizations break things - why not add more?
Interestingly, adding more aggressive options makes it
start working.

--ccflags='-march=i686 -O3 -s -funroll-loops
-fomit-frame-pointer' --debugging=0

everything works again (and is quite fast).

My guess was -march=i686

In IRC Dan said it might be a faulty assumption on our
part or on gcc's.  

The gcc that ships with RH9 was 3.2.2 fwiw

Cheers
Joshua Gatcomb
a.k.a. Limbic~Region






Re: ICU failure on RedHat

2004-10-21 Thread Leopold Toetsch
Joshua Gatcomb [EMAIL PROTECTED] wrote:
 All tests pass normally, but I get 3 tests failing
 under JIT

 t/op/interp.t #7
 t/pmc/coroutine.t #10
 t/pmc/exception.t #19

 ok, so I did investigate a little further.  make testj
 works fine (all tests pass) if I don't pass the
 --optimize flag to Configure.pl.

No problem here with --optimize and make testj. Yet another DeadRat
problem?

 Cheers
 Joshua Gatcomb

leo


Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Leopold Toetsch
Dan Sugalski wrote:
 In that case I won't worry about it, and I think I know what I'd like to
 do with the interpreter, the register frame, and the register backing
 stack. I'll muddle it about some and see where it goes.

JIT/i386 is up to date now, that is: it doesn't do any absolute register
addressing anymore.

So what next:
* deprecate the usage of almost all register stack push and pops?
I think we don't need them anymore. Register preserving is done as
part of the call sequence. The only ops needed are IMHO:
saveall/restoreall to support stack calling conventions. The question
is: is saveall supposed to copy registers or just prepare a fresh set of
registers?

* remove the {push,pop}{top,bottom}{i,s,p,n} opcodes from tests
* implement the new indirect register frame

Comments welcome,
leo




Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 4:13 AM, Leopold Toetsch wrote:
 Jeff Clites [EMAIL PROTECTED] wrote:
  On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote:

   And of course, lexicals and globals already have a storage, you don't
   need to spill them.

  I'm not sure that's true.

 It should read: if there are lexical or global opcodes, lexicals and
 globals have a storage.

Ah yes, true.

  tied namespaces and such, it may not be legitimate to re-fetch a global
  (ie, to fetch it multiple times, if the code appears to only fetch it
  once) -- one could pathologically have a global whose value appears to
  increase each time it's fetched,

 Then there's something horribly wrong with that usage of tie: it is not
 the value that is refetched from the var - the var is refetched from the
 namespace.

I think there'll be two types of tie--tied variables (like Perl has 
already), and tied namespaces (as supposedly some people really need, 
though I don't fully know why). But even without the above pathological 
case: with tied namespaces, a namespace fetch potentially has unknown 
overhead, and a compiler can't know if re-fetching is better than 
spilling. But on the other hand, maybe that's just part of the deal 
with tied namespaces--they may be fetched from more often than the code 
would imply, so the tied namespace needs to be prepared for that.

JEff


Re: Should Resizable*Array's be chunked lists?

2004-10-21 Thread Bernhard Schmalhofer
Leopold Toetsch wrote:
  Is there a design reason for not having a chunked implementation?

 Then you could use Array/PerlArray anyway (modulo the PerlUndef).

As I understood it, the language specific PMCs are planned to become
dynamic classes and are supposed to move to their respective languages/*
directory.
So that would leave the standard PMCs, from pdd17, in the core Parrot 
distribution. I'm not sure where the Array PMC stands there. My idea was 
that PerlArray should inherit from ResizablePMCArray.

This would make the Array PMC obsolete. Or a new incarnation of the 
Array PMC could serve as an abstract PMC.
I imagine an 'extends' hierarchy of PMCs like:

  Default
   +-- Hash
   |    +-- TclHash
   |    +-- PerlHash
   +-- Array
   |    +-- FixedIntegerArray
   |    |    +-- ResizableIntegerArray
   |    +-- FixedFloatArray
   |    |    +-- ResizableFloatArray
   |    +-- FixedPMCArray
   |         +-- ResizablePMCArray
   |              +-- TclArray
   |              +-- PerlArray
   +-- Integer
   +-- Float

I left out the Boolean, String, Env, ... PMCs.
This also depends on whether there will be multiple inheritance for C code
generation with 'pmc2c2.pl'.


  One thing that is probably hard to cover is the method sort inherited
  from FixedPMCArray. It looks like 'qsort' needs a contiguous piece of memory
  for sorting.

 Well, qsort can't remain for various reasons. The main one is that it
 doesn't allow additional params to be passed to the sort function.
 Currently these are static globals, which makes the sort non-reentrant.

So I don't have to worry about it.

 For performance reasons we might need some sort/merge algorithm anyway
 and then it can work through chunks too.

CU, Bernhard
--
**
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: [EMAIL PROTECTED]
Website: www.biomax.com
**


Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Dan Sugalski
At 9:24 AM -0700 10/21/04, Jeff Clites wrote:
 I think there'll be two types of tie--tied variables (like Perl has
 already), and tied namespaces (as supposedly some people really
 need, though I don't fully know why). But even without the above
 pathological case: with tied namespaces, a namespace fetch
 potentially has unknown overhead, and a compiler can't know if
 re-fetching is better than spilling. But on the other hand, maybe
 that's just part of the deal with tied namespaces--they may be
 fetched from more often than the code would imply, so the tied
 namespace needs to be prepared for that.

I'm OK with going on record as saying that the pir code generator and 
optimizer make no guarantees on the number of times something is 
fetched out of a global namespace or lexical pad. If a language wants 
guarantees, it can emit absolute code and not leave it up to the 
register spilling algorithm to decide.
--
Dan



Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Leopold Toetsch
I almost forgot: there is F<docs/nanoparrot.c> in CVS, which shows both 
the current and a possible future register layout (-DINDIRECT). You can 
time the normal function calling core with the mops benchmark by 
compiling it with -DMOPS.

Have fun,
leo


Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Dan Sugalski
At 5:34 PM +0200 10/21/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
  In that case I won't worry about it, and I think I know what I'd
  like to do with the interpreter, the register frame, and the
  register backing stack. I'll muddle it about some and see where it
  goes.

 JIT/i386 is up to date now, that is: it doesn't do any absolute
 register addressing anymore.

 So what next:
 * deprecate the usage of almost all register stack push and pops?
 I think we don't need them anymore. Register preserving is done as
 part of the call sequence. The only ops needed are IMHO:
 saveall/restoreall to support stack calling conventions. The
 question is: is saveall supposed to copy registers or just prepare a
 fresh set of registers?

 * remove the {push,pop}{top,bottom}{i,s,p,n} opcodes from tests
 * implement the new indirect register frame

I think the next steps are:
1) Implement the new indirect register frame
2) Note in the calling conventions that saving and restoring the top 
register set isn't required
3) Get IMCC to skip the save/restore set

and we see where we go from there. I'm not inclined, yet, to drop the 
register stacks and the push/pop ops as there are certainly times 
when it's useful, being a quick way to spill and unspill a set of 
registers for a basic block.
--
Dan



Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 8:34 AM, Leopold Toetsch wrote:
 Dan Sugalski wrote:
  In that case I won't worry about it, and I think I know what I'd like
  to do with the interpreter, the register frame, and the register
  backing stack. I'll muddle it about some and see where it goes.

 JIT/i386 is up to date now, that is: it doesn't do any absolute
 register addressing anymore.

 So what next:
 * deprecate the usage of almost all register stack push and pops?
 I think we don't need them anymore. Register preserving is done as
 part of the call sequence.

Are we still planning to move the current return continuation and 
current sub, out of the registers and into their own spots in the 
interpreter context (in order to avoid the PMC creation overhead in the 
common case, etc.)? (Or, have we already done this?)

JEff


C89

2004-10-21 Thread Bill Coffman
I read somewhere that the requirement for parrot code is that it
should be compliant with the ANSI C'89 standard.  Can someone point me
to a description of the C89 spec, so I can make sure my reg_alloc.c
patch is C89 compliant?

Thanks,
- Bill


Re: C89

2004-10-21 Thread Michael G Schwern
On Thu, Oct 21, 2004 at 02:51:15PM -0400, Dan Sugalski wrote:
 At 11:25 AM -0700 10/21/04, Bill Coffman wrote:
 I read somewhere that the requirement for parrot code is that it
 should be compliant with the ANSI C'89 standard.  Can someone point me
 to a description of the C89 spec, so I can make sure my reg_alloc.c
 patch is C89 compliant?
 
 I don't think the ANSI C89 spec is freely available, though I may be 
 wrong. (Google didn't find it easily, but I don't always get along 
 well with Google) If the patch builds without warning with parrot's 
 standard switches then you should be OK. (ANSI C89 was the first big 
 rev of C after the original K&R C. If you've got the second edition 
 or later of the K&R C book, it uses the C89 spec)

It's available for the low, low price of $18.  Makes a great stocking stuffer.
Or frightening accessory this Halloween!
http://webstore.ansi.org/ansidocstore/product.asp?sku=INCITS%2FISO%2FIEC+9899%2D1999

(That's the C99 spec but it should be clear from it what was C89 and what's
been introduced with C99).


-- 
Michael G Schwern[EMAIL PROTECTED]  http://www.pobox.com/~schwern/
Beef Coronary


Re: C89

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote:
 At 11:25 AM -0700 10/21/04, Bill Coffman wrote:
  I read somewhere that the requirement for parrot code is that it
  should be compliant with the ANSI C'89 standard.  Can someone point me
  to a description of the C89 spec, so I can make sure my reg_alloc.c
  patch is C89 compliant?

 I don't think the ANSI C89 spec is freely available, though I may be
 wrong. (Google didn't find it easily, but I don't always get along
 well with Google) If the patch builds without warning with parrot's
 standard switches then you should be OK. (ANSI C89 was the first big
 rev of C after the original K&R C. If you've got the second edition or
 later of the K&R C book, it uses the C89 spec)

Also, if you're compiling with gcc, then you can pass -std=c89 to the 
compiler to enforce that particular standard. (Apparently--though I 
haven't tried it.) I believe -ansi does the same thing.

JEff