Re: Inf and NaN

2006-07-31 Thread Bill Coffman

NegNan doesn't exist, except as a fluke of the representation (see link for
how they are represented).  A -NaN is the same as a NaN.  They both fail all
comparison tests, even NaN == NaN is false (unless your compiler optimizes
the comparison out).  Only difference is the way they are stringified, which
should be NaN, but stringification of NaN is non-standard.  Solaris
compilers will print -NaN, but gcc only prints nan.  Microsoft compiler
prints strange stuff.

Ron Blaschke [EMAIL PROTECTED] did some work in stringifying these
things, see [perl #38887]

http://en.wikipedia.org/wiki/IEEE_754

To summarize, if all the exponential bits are set, you get one of these odd
ducks.  If the mantissa is zero, then you get +/- Inf, based on the sign
bit.  If the mantissa is non-zero, you get a NaN, obviously there's lots of
ways ot make NaN's (if the sign bit is set, Solaris prints -NaN).
Fortunately, this is something you can rely on across architectures.  Only
thing is you need to know wheather it's big or little-endian.

The signaling NaN vs the non-signaling is architecture dependent, the
following link discusses AMD and Intel
http://en.wikipedia.org/wiki/NaN

It seems like you shouldn't need PMC for this, just the N-thingy (sorry my
parrot is rusty).  That is, these guys can be represented purely by floats
or doubles -- unless, of course, we're supporting non-ieee-754 machine
architecture.

On 7/31/06, Nicholas Clark [EMAIL PROTECTED] wrote:


On Sun, Jul 30, 2006 at 10:45:29PM -0700, jerry gay wrote:

 don't forget about negative-not-a-number, and the quiet (or signaling)

Ah yes. that oxymoron.

I've never yet seen the reasons for why it exists at all. Does anyone have
a URL?

Nicholas Clark





--
Bill Coffman
~~~_\\\|/
~~~-(@ @)
=oOO=(_)==OOo===


Re: [perl #38887] Result of INFINITY or NAN stringification is platform dependent

2006-07-26 Thread Bill Coffman

On 7/26/06, Ron Blaschke [EMAIL PROTECTED] wrote:


I'm looking into this issue and would like to ask for some advice.

I have added the following platform dependent functions.
int Parrot_math_isnan(double)
int Parrot_math_finite(double)

On Win32 the implementation is simple because the IEEE recommended
functions _finite and _isnan are supported.  I'm thinking about adding a
test for these functions and use them.  But what should happen if they
are not there?

Then I changed Parrot_sprintf_format in spf_render.c to use
Parrot_math_isnan and Parrot_math_finite to render NaN, -inf and
inf directly.

Is this a good way to solve this?



There is no platform independent way to produce NaN or Inf, so IMHO, you
did it the only way it can be done.

That being said, you can optimize by looking at the bits.  Wikipedia
explains IEEE-754 http://en.wikipedia.org/wiki/IEEE_754
First, you have to be aware if the machine is big or little-endian, second,
look at the exponent bits, if they are all set, this is not a normal number
(either +Inf, -Inf, or NaN, depending on mantissa and sign bit).  Unless
you're targeting non-ieee-754 systems like the VAX (an old computer), this
is garanteed to be portable.  I wouldn't bother with any platform specific
functions, as they are all different.  Just look at the bits.  (gcc uses
isnan, isinfinite, ... but I think you have to compile with the -c99 flag
and solaris does it a little different, etc).

I recently did something similar in PDL.  If looking at the bits is not
helpful, I can give you a little more detail (hint struct union).  If you
really don't want to look at the bits, I notice that gcc does have an
isnormal() function you can use, but again you have the portability
issues.

Personally, I'd like to see the i in inf capitolized or the Ns in
NaN lower case.  I like capital I for Inf.

Good luck,


Here's a sample (which should make its way into some tests):


type test.pir
.sub main
# -INF
set N0, 0.0
ln N1, N0

print N1
print \n

# +INF
abs N2, N1

print N2
print \n

# NaN
set N3, -1.0
sqrt N4, N3

print N4
print \n
.end

# trunk
parrot test.pir
-1.#INF00
1.#INF00
-1.#IND00

# local
parrot test.pir
-inf
inf
NaN

Thanks,
Ron





--
Bill Coffman
~~~_\\\|/
~~~-(@ @)
=oOO=(_)==OOo===


Re: [perl #36407] [BUG] imcc - register allocation

2005-06-28 Thread Bill Coffman
Isn't the register allocator pretty much minimized by the new architecture 
implementation? My understanding was that only temporary variables could 
benefit from it now. Perhaps the new changes aren't in effect yet? Just 
curious.

On 6/28/05, via RT Leopold Toetsch [EMAIL PROTECTED] 
wrote:
 
 # New Ticket Created by Leopold Toetsch
 # Please include the string: [perl #36407]
 # in the subject line of all future correspondence about this issue.
 # URL: https://rt.perl.org/rt3/Ticket/Display.html?id=36407 
 
 
 The register allocator doesn't properly track control flow, if a label
 has the same name as an opcode or a variable. The result is usually that
 registers are reused because the control flow change of these labels
 isn't considered.
 
 I tried to reduce the bug to a simple example and failed. Anyway
 languages/tcl/lib/commands/array.imc exposed it with labels named
 'subcommand' (like a var) and 'set' (opcode).
 
 See also r8468
 
 leo
 
 


-- 
-Bill


Re: Hackathon Day 2+3 Report

2005-06-13 Thread Bill Coffman
On 6/13/05, Chip Salzenberg [EMAIL PROTECTED] wrote:
 
 I've posted a report on the Hackathon Days 2+3 as a journal entry on
 use.perl.org http://use.perl.org:
 
 http://use.perl.org/~chip/journal/25182
 
 I'm not going to copy it here, but you probably want to read it, if
 only because it will point you to AN UPDATED PDD. Really.
 --
 Chip Salzenberg [EMAIL PROTECTED]


Chip,

This is really great to see! The whole continuations/regalloc thing is being 
addressed! It's great to see this kind of progress. Implementing this kind 
of register allocator could be tricky, for a number of reasons, but it's 
great that there is a direction now. OTOH, a minimally functional version of 
this architecture can be created by, as you say, just by register 
renumbering. Anything beyond that is premature optimization, at this point.

Thanks for moving on this, and Kudos to you and Leo!
Bill


Re: Attack of the fifty foot register allocator vs. the undead continuation monster

2005-06-12 Thread Bill Coffman
On 6/12/05, Curtis Rawls [EMAIL PROTECTED] wrote:
 
 [snip]
 It might also be helpful to take a look at other systems that also
 implement continuations:
 -Stackless Python (http://www.stackless.com/spcpaper.htm)
 -Standard ML (http://www.smlnj.org/doc/features.html)
 -Formalizing Implementation Strategies
 (http://citeseer.ist.psu.edu/danvy00formalizing.html)
 -Others (http://c2.com/cgi/wiki?ContinuationImplementation)
 
 -Curtis Rawls
 

I found this link helpful when trying to understand continuations. The code 
they use is not secure. It is basically using buffer overflow attacks as a 
programming technique? Well, you'll have to read to understand what I mean.

http://homepage.mac.com/sigfpe/Computing/continuations.html

The part that I understand about continuations, is that they wreak havoc in 
the control flow graph, as Chip initially said:
 Therefore, register allocation must allow for implicit flow of control
 from *every* function call to *every* function return ... or, more
 precisely, to where *every* continuation is taken, including function
 return continuations.

Although I think that is actually sugar coating it a bit. Continuations can 
be taken from within any sub, and possibly even when appending to a list, if 
you're using lazy list eval.

This, I found out when my changes to reg_alloc.c broke the continuations 
tests. The changes were not the problem, as Leo confirmed, it was the way 
the allocator and the continuations interacted. 

There are also some issue about changing variables through continuations, 
like you can only change PMC integers, which point to some fixed location, 
containing the int, i.e. you can only use references. These parts make less 
sense to me, but the issue is basically, some registers are restored, and 
some are kept safe? Well, that issue hasn't been quite resolved either.

If I thought the register allocator had a chance of working, I would 
probably find the time to start working on it again. But probably there are 
plenty of people who would be happy to do so, if this issue could be 
resolved.

-- 
-Bill Coffman


Re: Regarding Google's Summer of Code 2005

2005-06-03 Thread Bill Coffman
Hello,

I worked on the parrot register allocation problem a few months ago. The 
problem I encountered was that of continuations based dependencies. There 
were architectural issues about continuations, and Leo proposed changes that 
would fix this, but Dan said it was too late for architectral changes. 
Basically the existence of continuations adds a lot of dependencies that 
don't want to go away, defeating most of the benefit that one can get from 
register alloction. 

I am referring to parrot's use of full continuations, which allow a snap 
shot of the current runtime stack. Any function can take a snapshot or 
restore the stack to a previous snap shot. This makes traditional register 
allocation problematic.

As far as I can tell, the continuations depency problem was not solved, and 
in fact the existing register allocator is also in conflict with the 
continuations based dependencies. (Unless something has been done in the 
last few weeks.) The resolutions seems to be not to add tests that break 
things. Thus, continuations based tests are few, and unstable. Changing the 
way the registers allocate, even though they still don't violate the 
interference graph, will break some tests.

There are several threads in the parrot mailing list that discuss the 
continuations problem. I hoped to be able to address parrot register 
allocator again, at some point, but it could be a while before I get to 
that. In the mean time, search the perl6-internals mailing list for 
continuations and register allocation. I'm sure you'll find it.

The above not withstanding, I think there are still quite a few things that 
can be done to improve register allocation in parrot. If anyone is 
interested, I'm sure I could dig up my code.

Bill Coffman

On 6/3/05, Curtis Rawls [EMAIL PROTECTED] wrote:
 
 On 6/3/05, Leopold Toetsch [EMAIL PROTECTED] wrote:
  Dheeraj Kumar Arora wrote:
   I m interseted in one of LLVM project
   Implement well-known optimizations in PIR compiler (SSA - register
   allocation)
 
  [ I have answerd this ]
 
  Parrot is a register based virtual machine with 32*4 registers. There
  are a lot of studies WRT optimizations and register allocations for
  compilers and hardware CPUs but probably all these things can be applied
  to Parrot too. See also imcc/cfg.c and imcc/reg_alloc.c for existing 
 code.
 
  I'm not a computer scientist nor am I able to follow most of the papers
  regarding various optimization techniques or the needed infrastructure
  to implement it
 
  Anyway, there are folks on our mailing list like Curtis Rawls, who could
  probably better summarize what's needed and what should be done.
 
   Can Any send me the details?
 
  Yes please, folks.
 
  leo
 
 Wow, thanks for the link to the Summer of Code, that looks like fun!
 (I know I'll be thinking about my proposal all day at work : ).
 
 I don't have my TODO list with me, I will try to post it later today.
 
 I have been working and thinking about improvements to the Parrot/IMCC
 optimizer for a while now. Implementing SSA is definitely at the top
 of my list, because it simplifies a lot of optimizations and makes
 some others possible. The biggest challenge is to graft such a big
 change onto a working system, with small changes at a time.
 
 The reference to LLVM is interesting, since it has similar goals to
 Parrot, but is research-oriented. I bet there are some good
 techniques that could be learned (and borrowed : ) from LLVM.
 
 -Curtis Rawls
 



-- 
-Bill


Re: Register Allocation and Continuations problem definition

2005-06-03 Thread Bill Coffman
I hope that no one misunderstands when I say this. It is not meant to be a 
criticism, but rather it is my understanding of the problem, put very 
directly. I may be wrong, and in many ways, I hope am.

There is a fundamental flaw in the Parrot architecture. In it's current 
form, continuations and register allocation cannot coexist. The solution is 
to do away with the register allocator, or to do away with continuations. 
This can be done, perhaps by saying that certain functions won't be subject 
to continuation snapshots or calls. Then register allocation can be done on 
those routines. Or compiling the whole program with a switch of on or off.

When I say do away with the register allocator, I mean that nearly all 
registers will interfere, so essentially each variable gets a register. The 
extensible register stack is the best way to implement this strategy. If 
there are long sections of code with lots of temporary variables, then 
register allocation might be of some use. But most variables will interfere. 
Implementing this would be an interesting project. It would be quite 
different from a standard register allocator. It might even be worth writing 
a paper on the techniques involved.

Turning off full continuations, is a another matter. I don't know what 
impact that would have on parrot. It would be relatively straightforward to 
write the allocator with this approach.

There are other issues involved, regarding how registers change when called. 
What is preserved, and how. Matt touched on some of these issues. My main 
concearn in attempting to implement the register allocator was to find the 
minimal control flow graph (cfg) which helps to provide the variable 
interference graph. 

This is my understanding of the problem. Matt wants this thread to be a 
problem definition, so these thoughts seem appropriate for this thread. I 
hope no one is offended by my bluntness, but if this problem is to be 
solved, it must be understood.

Bill Coffman

ps: a couple important threads on this issue were
http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/3a61fe5e97d17389/0603ff13ca52f0ff?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#0603ff13ca52f0ff
http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/214157bf29879710/4c15aa4a1f56c20a?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#4c15aa4a1f56c20a


On 6/3/05, Matt Fowles [EMAIL PROTECTED] wrote:
 
 All~
 
 On 6/3/05, Bill Coffman [EMAIL PROTECTED] wrote:
 
  There are several threads in the parrot mailing list that discuss the
  continuations problem. I hoped to be able to address parrot register
  allocator again, at some point, but it could be a while before I get to
  that. In the mean time, search the perl6-internals mailing list for
  continuations and register allocation. I'm sure you'll find it.
 
 I would actually very much like to see this issue moved on. In the
 hopes of getting everything up to speed quickly I will try and
 summarize all of the discussions and suggestions that have been made.
 I am NOT trying to advocate any particular solution (yet), I just want
 to make sure that everyone is on the same page.
 
 I propose that we use this thread to ensure that we are not talking
 past each other. Discussions about specific solutions should and
 religion should have a seperate thread.
 
 
 PROBLEM:
 
 Continuations can be invoked repeatedly. Whenever a continuation is
 reinvoked, the registers must be in a defined state or the code will
 behave incorrectly.
 
 
 BASIC ANALYSIS
 
 The actual state of the registers does not matter as long as it is
 well defined.
 
 A) If the registers are defined to be garbage, then every continuation
 will start by returning them to the state it expects.
 
 B) If the registers are defined to be preserved, then every
 continuation merrily chugs along.
 
 Unfortunately these each have issues and as such much debate ensued.
 
 
 SUGGESTED SOLUTIONS
 
 1) Context includes registers
 
 + continuations don't have to preserve specifc registers
 + register allocator can ignore continuations wrt spilling
 - copying overheard
 - value based registers (IN) return to old values
 - pointer based registers (SP) cannot have their pointers moved or
 require double indirection
 
 
 2) Return Continuations include registers, non don't
 
 + register allocator can remain mostly ignorant
 + only non return continuations need to worry
 - promoting a return continuation will force copying as plan (1)
 
 
 3) register are not restored
 
 + simple to explain
 - register allocator must add many extra interference edges
 - largely prevents reuse of register
 
 
 4) variable sized register frames
 
 + never have unused registers
 + no need for a register allocator
 + could correspond directly to scratchpad
 - more complicated
 - no register reuse
 - large architectural change
 - more custom allocation (can't pool

Re: continuation enhanced arcs

2004-12-01 Thread Bill Coffman
All,

As with most technical problems, fully specifying them, is often half
the battle.  In this case, I think we're getting close to
understanding the issues at least.

[please treat all statements as possible questions]

First, consider my original post in this thread:
http://www.nntp.perl.org/group/perl.perl6.internals/27383
To summarize:
  Full continuations are powerful but expensive.  They are like hidden
goto's and add arcs to the control flow graph.  This causes more
registers to interfere.
  Proposed Solutions:
- Accept the arcs and probable spilling.  We still have the regions
between sub calls.
- Put labels on certain subs that denote they might call or be called
by a continuation.
- Pragmas indicating ranges where continuations won't be called (or will be).
- Restore registers from lexicals somewhere, after each function call.
- Accept incorrectness, as the current implementation does (discovered
when new allocator failed 2 tests, and Leo saw the problem).

Next, consider Dan's message, Lexicals, continuations, and register
allocation:
On Tue, 30 Nov 2004 10:22:29 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
 So far as I can tell there are two cases here:
 
 1) A return continuation
 
 2) A regular continuation
 
[snip return contuations, which are far from solved, but likely a
subset of the below]
 
 The second case, where code takes an arbitrary continuation that
 returns to location X, wherever X is. I'm missing the problem there,
 too. Assuming there's a way to note the destination to the register
 allocator (with those points being places where all registers must be
 assumed to be trash) I'm not seeing the problem either. There are
 only two cases here, where the destination is marked and where it
 isn't. If it's marked, the register allocator assumes things are
 dirty and loads everything, which is fine. If it's unmarked, the code
 has essentially shot itself and everything quietly goes to hell since
 you've lied to the register allocator and you shouldn't do that.
 Which is fine too. Don't Do That.

If I read the above correctly, Dan is advocating the strongest
restriction of all, which is to specify any arcs that might be added. 
That is, specify all sub_i - sub_j, where - means that a
continuation saved in sub_j is invoked by sub_i.

To reclassify the reasonable solutions:
1a. Concentrate on restoring registers after calling -- probably using
the lexicals.
1b. Possibly increase the size of the register set, and/or try to use
lexicals or globals (I think of this as a kind of improved spilling)..
2a. Insert all the arcs into the CFG, which will increase spilling, but
2b. Reduce the number of arcs in the CFG, by introducing various
annotations on subs indicating when they do or do not save or invoke a
continuation.  Especially target library functions.

Currently, I'm trying to work on 2a right now, but my day job is
making that tough lately.  Would like to see more attention payed to
2b, by those who care about the issue.  In particular I'd like to see
HLL designers say, Yeah, 2b looks like a great idea!, and maybe a
few, yeah, let's specifically implement the following...

Then, there are additional problems to contend with...

On Wed, 1 Dec 2004 09:49:34 -0800, Jeff Clites [EMAIL PROTECTED] wrote:
 But so it sounds like I and N registers are a bit of a waste then, no?
 They won't really be used. Even in your my int @foo case, you could
 have an optimized backing store for the array, but you'd have to
 on-the-fly create a PMC to hold any values you pulled out of the array,
 e.g. for a subsequent x = @foo[1] (in Perl6 syntax)--x would have to
 hold a full PMC there, it seems, so nothing there would end up in an I
 register (except as in intermediate in immediately creating the PMC).
[snippage of examples]

If I understand right, PerlArray's would be used for my int @foo. 
If you want to use these values, you have to copy back and forth from
I* registers, for the most part.

The ISN registers can be used in the nether regions, between sub calls
(even the lower half of the registers can be used there).  Subs which
are garanteed not to call or save continuations are safe (see 2b
above), and ISN registers can be used while crossing those.

However, we now know that only P registers can cross unsafe subs,
unless ...  well, there's probably certain cases where ISN's can be
used.  That question would take more thought.

My sense is that we want to support continuations, but we don't want
to be crushed under the heavy load.  Perhaps we can adapt the policy
that if one is brazen enough to use full continuations, then s/he
should be expected to at least inform the compiler about it.

Bill


Re: continuation enhanced arcs

2004-11-30 Thread Bill Coffman
On Tue, 30 Nov 2004 14:45:39 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 11:20 AM -0800 11/30/04, Jeff Clites wrote:
 % cat continuation6.ruby
 def strange
  callcc {|continuation| $saved = continuation}
 end
 
 def outer
  a = 0
  strange()
  a = a + 1
  print a = , a, \n
 end
 
 Through the joys of reference types, a will continue to increase
 forevermore, assuming the compiler hasn't incorrectly put a in an int
 register. (Which'd be wrong) 

I can see that there is true magic in the power of using references in
this way.  Nonetheless, how can the compiler figure out that it can't
use an integer here?  The compiler should use integers when it can,
but it sounds like you are saying that when a variable crosses a sub
call (which might invoke a continuation) it will then have to be a PMC
or String to do the right thing.

 Remember the PMC and string registers
 hold pointers to pmc/string structures, which is all we're preserving
 -- the *pointer* to the structure, not the contents of the structure.
 (Though that's an interesting thing to do for other reasons, like,
 say, transactions, we're not going to be doing that) The contents can
 change over and over without the register itself ever changing.
 
 
 
 # these two lines are main
 outer()
 $saved.call
 

Bill

 JEff
 
 --
 Dan


Re: continuation enhanced arcs

2004-11-29 Thread Bill Coffman
Hello All,

I haven't been able to devote much time lately, but this week I hope
to modify the register allocator to deal with these arcs, as well as
enhancements to the spill code.  I expect that in some cases it will
have a definite impact on speed, and others it won't. Most of the test
cases (make test) don't have that many variables, for instance.  I
hope to get this patch out in about a week.

There has been much discussion about how these arcs will impact the
performance, complexity, etc.  What we need are some hard numbers, so
as to compare results.

Perhaps Leo can impliment his architecture, and compare to this. 
Really, I think that Leo's idea is a form of spilling in itself, but
without the extra load/store instructions.  That kind of spilling is
much better.  HLL compilers can optimize this further, by adding
annotations to subs, saying no full continuations called by any
subroutines (actually, only one subroutine calling a full continuation
would not impact the register allocator).  Analysis could also be
helpful in certain cases.

Bill


Re: Lexicals, continuations, and register allocation

2004-11-24 Thread Bill Coffman
On Wed, 24 Nov 2004 09:39:27 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Bill Coffman wrote:
  On Tue, 23 Nov 2004 23:26:39 +0100, Leopold Toetsch [EMAIL PROTECTED] 
  wrote:
 
  Keep in mind that you don't actually have to add all those CFG edges.
  You already know precisely the effects of adding them.  All
  non-volatile symbols (those crossing subs that might make continuation
  invocations) are garanteed to interfere.  This is garanteed to be a
  clique in the interference graph.  No need to actually add the CFG
  edges to know how the interference graph is effected.
 
 Possible, but just another special cased exception. With that you get
 two possible interferences of different kinds, with additional coding
 overhead ...

Instead of n*(n-1) arcs for n continuation passing subs (both to and
from), we can simply add a pseudo instruction.  All n sub calls have
arcs to that pseudo-instruction, and out from it, to the instruction
after the sub call.  This means only 2*n arcs (both to and from).

Each continuation calling sub becomes a kind of wormhole.  Any symbol
crossing it, also crosses the pseudo-instruction.  What makes it a
little complicated is how do these ubiquetous symbols interact with
the non-ubuiquitous?  Those arcs are needed for this.



 I've no confirmation of a compiler writer that its possible.
 Annotating PIR can only work for nested closures. If libraries are
 involved you are out of luck.

It was Larry Wall who suggested the pragma in the long thread,
Continuations, basic blocks, loops and register allocation . 
Although his message is somewhat genaral, he indicates a prefernce for
pragmas to turn off certain features that slow things down for
correctness.  [gmail works great for finding stuff like that]

Bill


Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols

2004-11-23 Thread Bill Coffman
Wait, I just thought of a huge change. 

Dan, Does the patch you have implement Leo's U_NON_VOLATILE patch?  If
so, that restricts from 32 to 16 registers, in various cases for
*non-volatile* symbols (did I get that right?).  Anyway, the symbols
that cross sub calls can only use 16 registers, where they used 32
before.  That could have a huge effect in that more variables are
spilling.

Well, if you don't have that patch, then back to the drawing board.

~Bill

On Tue, 23 Nov 2004 11:55:47 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 5:40 PM +0100 11/23/04, Leopold Toetsch wrote:
 *But*, I've looked again at the new reg_alloc.c code. It seems to have a
 piece of code with qubic order in registers, which is for sure killing
 all performance advantage it has for a few hundreds of symbols.
 
 So the scales better to more symbols has some limits when more
 reaches 10K ;)
 
 I'll hold off then. I can't picture anything that -O3 could do that
 wouldn't get swamped by a cubic time algorithm.
 
 
 --
 Dan
 
 --it's like this---
 Dan Sugalski  even samurai
 [EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk



Re: Lexicals, continuations, and register allocation

2004-11-23 Thread Bill Coffman
On Tue, 23 Nov 2004 23:26:39 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Matt Fowles wrote:
 
  Have we seen that this actually destroys us?  Meaning, if we add the
  extra CFG arcs, do we start spilling like mad?  If not, this is much
  ado about nothing.
 
 Please first have a look at Dan's recent posting about Evil Sub. Then
 estimate, how many subs may be called in 14000 basic blocks. For 100
 subs only you get ~1 more edges ...
 
 Second, having this arcs means just that in that range i.e. from the
 first subroutine to the last, you can't reuse a register around a call.
 Question: how many from ~23000 registers might be effected. Please note
 the currently rather low spill count despite the huge register usage.

Keep in mind that you don't actually have to add all those CFG edges. 
You already know precisely the effects of adding them.  All
non-volatile symbols (those crossing subs that might make continuation
invocations) are garanteed to interfere.  This is garanteed to be a
clique in the interference graph.  No need to actually add the CFG
edges to know how the interference graph is effected.

This brings up the possibility of non-volatile symbols that cross subs
that are garanteed *not* to invoke continuations.  Those would not
necessarily be part of the clique, but would be non-volatile (forced
into the upper half of registers).  The register allocator could
handle that the way it currently does (the one I'm currently working
on, that is).

It bothers me that this discussion is not including the concept of
subs that don't call/invoke continuations.  Remember the previous
posts about adding a label, or setting a pragma?  Also, if that sub is
defined in the same imc source code, we can analyze it to see if there
is a continuation there or not.

Another interesting thing about this problem is that these new CFG
edges are rarely, or at least with low probability, ever travelled. 
That can also be used to our advantage I think.

~Bill


Re: Lexicals, continuations, and register allocation

2004-11-23 Thread Bill Coffman
On Wed, 24 Nov 2004 04:55:24 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Matt Fowles [EMAIL PROTECTED] wrote:
  ...  However, if a continuation restores registers to the data they
  had when the continuation was taken, then all of the registers will
  contain the things that exactly as the original allocator expects
  them.
 
 Yes. We had that scheme until the indirect register frame addressing.
 It was:
 
savetop
invokecc
restoretop
 
 It was too slow - remember a factor of 5!

I think the problem was that you were saving 32*4 = 128 registers! 
Even if you only used 5 or 10.  Thats a factor of ten right there, I
bet.  The right way to have done this was

savetop(n1,n2,n3,n4)
invokecc
restoretop(n1,n2,n3,n4)

Where n1..n4 are the number of registers to be saved, or better yet,
the length of the register array to save.  That way, only the critical
registers would be saved, and that could be minimized by the
allocator.  Then your memcpy's would have been optimized, but saving
600 bytes of unused territory was never going to work.

Bill

 leo



continuation enhanced arcs

2004-11-22 Thread Bill Coffman
Hello all,

I would like to address the problem of how continuations have affected
the register allocator.  This problem came to the public eye, when two
tests that the new register allocator failed.  Leo's analysis was
essentially that the allocator was fine, but that our handling of
continuations was incomplete, as far control flow and regiester
allocation went.

First, I think that it is important to acknowledge what is actually
agreed upon.  Exceptions are handled as a special case of
continuations, and as such, this is a subproblem of the general
continuations problem.  Leo's proposed solution, below seems to have
been accepted as workable, correct, and fast enough.

 Quoth [EMAIL PROTECTED]:
 1a) for exceptions:

  set_eh handler, catch_label

   This is just a small adaption of the sequence of installing an
   exception handler.
   It depends a bit, if exception handlers are inline or nested
   closures or both.

Second, is the larger, *unresolved* problem of continuations in
general.  Bear with me, as I am just barely beginning to understand
continuations myself.  Comments on my observations are more than
welcome.  Especially on my mistakes.

Continuations are a kind of goto.  When you invoke a continuation,
it moves the program counter to where the continuation was taken, and
restore most state (but not registers).  This scheme has provided a
wealth of power, with acceptable performance for a variety of
circumstances.  One neglected consideration, when implementing this
strategy, is the effect on the control flow graph.

sub1()  ---+  -+
... ||
sub2()  +-+ |
...| |
sub3()  ---+-+

In the continuations enhanced control flow graph (the control flow
graph (cfg), after one considers the effects of continuations), it is
possible for any subroutine to jump back to the exit point of any
previous subroutine.  Leo has drawn the picture a few times, as above,
showing the complex web of links from any sub to any sub.  It is the
control flow graph which provides the register allocator with the
information it nees to determines which variables interfere.  They
interfere if they could be active at the same time.  If they
interfere, they must be assigned different registers.

Now, consider a slightly different topic: According to recent
decisions, the lower half of the registers (0-15) for all types, will
not survive a subroutine call.  This has created what we can call
volatile, and non-volatile symbols (variables).  Non-volatile symbols
do not cross subroutine calls, and can therefore be put in the lower
register half.  Volatile symbols cross sub calls, and cannot be put in
the lower half.  (Think the volatile keyword from C, which means
don't mess with this value, well it sort of means that.)

Here's the funny thing -- in the new continuations enhanced flowgraph,
every volatile symbol in a subroutine will interfere with every other
volatile symbol.  Here's an informal proof.  Remember all those arcs
that Leo drew?  For instance, if v1 crosses sub1() and v2 crosses
sub2(), there is an arc from sub2() back to sub1() [wolog, sub1 is
first].  So v1 is live when v2 is live, and the two symbols interfere.
 All those arcs extend the life range of each symbol to that extent. 
Therefore, all volatile symbols in the sub will interfere.

What this means is that if your subroutine has more than 16 volatile
symbols of a given kind, there will be spilling.

Here's a list of proposals I have seen to deal with that issue:

1) Let the above stand (I think I was the only one who said that, and
that was before I had really thought about it).

2) Let the above stand, but allow for a pragma to turn off the arcs. 
Thus, I can create a range within the sub, that turns off all those
pesky continuation enhanced arcs within that range.  HLL compiler
writers will be able to set this pragma, knowing that none of it's
subs use continuations.  (Larry Wall's suggestion.)

3) Specifically a label for each subroutine that *might* set a
continuation.  Others have said that continuations should not be the
common case, if a continuation could be called, then labelling it as
RESUMABLE, isn't such a bad idea.  (Leo's idea.)
3a) The natural extension of that is to also have label, like
INVOKEABLE, for subs that might invoke a continuation.

4) Fetch all from lexicals/globals after a function call.  (Another
of Leo's.)
4a) Use lexicals instead of registers. (Not sure who mentioned it, but
it seems like just spilling everything, using a slightly different
spilling scheme.)
4b) Conditionally fetch ... (Ben's suggestion is one that deserves
some further thought.  Leo doesn't think it could work, and he
understands the problem better than I, but his answer is not a
difinitive impossible, so that gives some hope.)

5) Let the symbols interfere.  The chances are that the program will
run correctly anyway.  Let the register allocator do it's thing, and
every 

Re: silent effects of opcodes

2004-11-19 Thread Bill Coffman
On Thu, 18 Nov 2004 08:13:02 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Bill Coffman [EMAIL PROTECTED] wrote:
  ps: I'm making progress on grokking the cfg and register renaming
  stuff.  Will let you know.
 
 This needs an SSA graph of the data flow?

Static Single-Assignment (I had to look it up).  My approach is to
consider something you once said, which was to the effect of ensuring
the accuracy of the analysis.  I've got some code to do a fairly
simple depth first search per symbol.  It is slower (I presume), but
easier to understand.  Later, I plan to (or someone else could)
implement the data flow stuff with bit vectors and SSA to determine
reachability, etc.  In this way, I can compare the simple analysis
with the complex analysis, and convince myself, and anyone else that
the code is correct.  All this will be done even before verifying the
program runs correctly (passing make test).

Advanced_Compiler_Design__Implementation by Muchnick is my main
resource.  Register renaming is easy enough when you have DU (def-use
(def means value is written, use means value read) and UD chains,
and especifically webs (which connect defs to uses).  Webs are
components of def-use chains (using graph theory language).  They are
effectively different variables, and can be provided with separate
symbol names, which is actually very similar to what is done during a
spill.  My depth first search algorithm finds these too.  Like I said,
I wanted to start with something simple that is correct, and move
forward with the more optimized forms later.

Another thing I'd like to do, is throw in is a randomizer, to change
the way the allocator assigns registers.  Considering all the cruft
that was uncovered when the algorithm was changed, it might be a good
idea to have a debug feature that selects registers differently each
time it runs (each allocation should be valid, of course).  This could
help flesh out any other faulty assumptions in the parrot compiler.

 BTW, looking at analyse_life_block() it seems that this allocates
 unconditionally a Life_range structure. As these are O2 in (n_symbols *
 n_basic_blocks) we could safe huge amounts of memory, by defining that a
 missing life block propagates the previous one. Dunno if its possible,
 though.

That's definitely not very efficient.  Plus it's mallocing each
Life_Range, which is usually at least another 8 bytes per chunk, and
then a lot of potential thrashing when freed.  It seems I've seen a
lot of less than optimum code like this, and I suspect there's a lot
more like it.

Solving these things will bring me that much closer to the ultimate
goal of defeating Dan's evil code. ;)

 leo
 

Bill


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 I've now (locally here) extended Bill Coffman's register allocator by
 one subroutine that actually decides to use non-volatiles or volatiles
 according to pdd03. All variables that are live around a subroutine call
 are e.g. allocated from R16..R31.

Interesting.  I'd like to see it. 

Regarding pdd03, I am still not clear how it should be interpreted. 
The current pdd03, as well as the previous one, both seem to indicate
that registers 0-15 are likely to be overwritten, and anyone making a
call, should save those registers if they still want them.  The issue
with PIR Code, is that the author won't know which of their symbols
are mapping to registers about to be killed.  So, as previously
discussed, those registers will have to be hands off for the register
allocator.  That is essentially how the old and new alloctor have been
working.  But this doesn't have to always be the case.

* If the subroutine being allocated is a leaf (with no sub calls),
then all registers should be available.
* Registers P4, S1-S4, N0-N4 are free for allocation, regardless.
* It seems like it would be simple enough to provide a compiler
hint, to let the allocator know if the subs it calls are using the
parrot convention or not, or how many of the R5-R15 it will need. 
From this hint, a bit mask saying which registers are available could
be constructed.  This can then be used as part of a static analysis,
and can be incorporated into the unit data structure, or passed as a
separate parameter to imc_reg_alloc().

I wouldn't think this last idea would be considered a change to the
calling convention, but rather as an optional optimization prototype. 
Not part of pasm.  Dan, would something like this be allowed?

~Bill


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
So to generalize.  The following registers are available, under the
following conditions:

* Registers R16-R31 are always available for the allocator.
* Registers R0-R15 are available between sub calls.  That is, for any
symbol, whose life range does not cross a subroutine.  (This implies
that all registers are available if no subs are called.)  Since we
have no way to determine if a sub is using those or not, any sub call
will be assumed to possibly use R0-R15.  Furthermore, even though we
know there are certain registers in that range, which are unused by
the calling convention, we will still not use them through a sub call
for security reasons.
* [NEW] If register 15 or below is used, it should be cleared out,
ZEROED, after it's last use and before the next sub call.  This is for
security reasons.  Obviously, these registers will not be the first
choice to use.
* Availability of these registers is subject to the rules for using
the Parrot opcode Csetp_ind Ix, Py, which were (are being?) worked
through by Leo.

Other observations:

* Leo introduced a flag on the symbol, to indicate if it's volatile or
not.  These will be eligible for R0-R15 (volitile registers?).
* From new allocator bugs, and analysis, we've discovered that
exceptions cause new control flow edges, not previously considerd. 
This case is being reworked by Leo?  to provide missing CFG edges,
through a minor change in the try block declaration.  (thread
Continuations, basic blocks, loops and register allocation)
* The case of continuations has not been solved with respect to
register alloction.  Leo's RESUMEABLE: label might provide help here. 
In any case, we can expect to see some additional edges being inserted
though.  (also thread Continuations, basic blocks, loops and register
allocation)

Did I miss anything?


Re: silent effects of opcodes

2004-11-17 Thread Bill Coffman
 * [NEW] If register 15 or below is used, it should be cleared out,
 ZEROED, after it's last use and before the next sub call.  This is for
 security reasons.  Obviously, these registers will not be the first
 choice to use.
 
 Nope -- this isn't the job of the register allocator. We aren't
 leaving security issues up to bytecode except in a very few, limited
 cases. (All involving subroutines with elevated security credentials
 which the sub needs to drop after using things they allow)

Okay, looks like I misread an earlier message of Dan's.  The reason
that we cannot use R0-R15 through a sub, is that they are shredded. 
The values are not preserved through the sub call.

Since I understand the item about allocating registers between sub
calls, I can probably implement that change, as I work through the
control flow/data flow analysis.

Sounds like everything else is okay.  We're just missing a few CFG
arcs from the continuations stuff, which I'll let you all worry about.
:)

Bill

ps: I'm making progress on grokking the cfg and register renaming
stuff.  Will let you know.


Re: IEEE 754 double floats

2004-11-16 Thread Bill Coffman
Another way is to count the bits, as in the following:

.sub _main
N1 = 1
N2 = 0.5
I0 = 0
REPEAT:
I0 = I0 + 1
N2 = N2 / 2
N3 = N1 + N2
ne N1, N3, REPEAT

print I0
print  bits precision\n
end
.end



On Wed, 17 Nov 2004 01:13:21 +1300, Adam Warner [EMAIL PROTECTED] wrote:
 Hi Leopold Toetsch,
 
  PI is (very) approximately:
3.1415906535896946927266526472521945834159851074218750
   ^
3.141592653589793238462643383279502884197169399375105820974944
 
  You might probably want to run more iterations ;) And you'll never get
  60 digits out of long doubles.
 
 I appreciate that Leopold. I thought the zeros indicated where the stored
 value cuts off. But with 52 significant figures at ~3.3 bits per figure
 that's about 172 bits, which indicates they would have to be 256 bit
 floats when including the exponent (my mistake in calling them 128 bit)!
 
 In other words sprintf is printing trailing garbage:
 
 .sub _main
 set N1, 2.0
 set N2, 3.0
 div N1, N1, N2
 new P0, .PerlArray
 set P0, 1
 set P0[0], N1
 sprintf S0, %.60f\n, P0
 print   S0
 end
 .end
 
 On the CVS version I compiled this prints:
 0.2965923251249478198587894439697265625000
 
 On this win32 release (running on a different machine):
 http://www.jwcs.net/developers/perl/pow/download/pow-0.1.1-release.zip
 parrot prints:
 0.3000
 
 Regards,
 Adam
 



Re: Another issue with pdd03

2004-11-14 Thread Bill Coffman
PDD03: Responsibility for environment preservation
PDD03: 
PDD03: The caller is responsible for preserving any environment it is interested
PDD03: in keeping. This includes any and all registers, lexical scoping and 
PDD03: scratchpads, opcode libraries, and so forth.
PDD03: 
PDD03: Use of the savetop opcode is recommended if the caller wishes to save
PDD03: everything, and the restoretop opcode to restore everything
savetop saved.
PDD03: This saves off the top 16 of each register type, leaving the bottom 16
PDD03: registers, which generally contain the return values, intact.

The way I read it, paragraph one implies that when you print P5 after
calling foo(), you are expecting to get the return value.  You didn't
save and restore register P5, so you wanted foo() to do something to
it.

The above docs may be slightly ambiguous.  The first paragraph says
that you have to save everything.  The seconds says that
savetop/restoretop commands are there to help the user, by allowing
them to take care of registers 16-31, but what about 0-15?  Is there
an implication that you don't have to save those?  Based on the way
people are coding IMC, that seems to be the case.

Both the old and new register allocator are kind of hands off the
first 16 registers.  Recall that I was asking about this some time
ago, and found that if the allocator tries to use the bottom half
first, chaos ensues.  Our convention then ... our interpretation of
the calling convention, is that we try to allocate only the top 16
registers.  If more are needed, the new allocator starts from R15, and
go on downward.  The old one just went up from R0.

Note that PDD03 is specifying a dynamic calling convention.  Because
of this, the register allocator cannot use the convention to find what
should be saved, and what it can use.  If this were indicated at
compile time, the register allocator could keep hands off only the
used registers.  As it is, we may need to do as you suggest, and leave
the bottom 16 registers for parameter passing, and the top 16 for
local symbols and register allocation.  If so, we will obviously need
a better register allocator.  One exception to this.  If a sub calls
no other subs, it can use all 32 registers.  Also, P4, S1-S4, N0-N4
seem to be free.

~Bill

On Sun, 14 Nov 2004 18:32:50 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 As outlined in the analysis of dumper.t failures with the new register
 allocator, we have another problem with current calling or better return
 conventions.
 
 Given this simple program:
 
 $ cat ret.imc
 .sub main @MAIN
  P5 = new PerlString
  P5 = ok\n
  foo()
  print P5
 .end
 .sub foo
  .local pmc ok
  ok = new PerlString
  ok = bug\n
  .return(ok)
 .end
 
 $ parrot ret.imc
 bug
 
 The usage of the register P5 in main isn't invalid: the register
 allocator might have (in a more complex program) just not found any
 other free register. And as the main program just calls a function in
 void context the register P5 could have been used.
 
 Defining now that P5 has to be preserved in main, because it's a
 possible return result of foo() and therefore may be clobbered by foo()
 is meaning, that we have effectively just 16 registers per kind
 available for allocation around a function call.
 
 If the latter is true according to current pdd03 then we are wasting
 half of our registers for a very doubtful advantage: being able to pass
 return values in R5..R15.
 
 leo
 



Re: Continuations, basic blocks, loops and register allocation

2004-11-14 Thread Bill Coffman
On Sun, 14 Nov 2004 17:03:33 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 5:53 PM +0100 11/13/04, Leopold Toetsch wrote:
 As the analysis of test errors of the new reigster allocator has
 shown, we have a problem WRT register allocation. This problem isn't
 new, but as the allocator is more efficiently reusing registers (or
 reusing them in a different way) it's exposed again.
 
 We don't really have that much of a problem. What we have is just
 something more simple -- the target of a continuation marks the start
 of a basic block. That means that we have to assume everything we
 don't get handed back from the function's dirty and should be
 refetched.

I tend to agree that this is not such a problem.  Basically, Parrot
has various control flow possibilities that were not previously
considered.  (1) An exception can happen in a try block, so CFG arcs
need to be added from statements within the try block to the catch. 
(2) Continuations, which I don't pretend to understand, can also flow
in previously unconsidered directions.  Those arcs need to be added to
the CFG.

Alternately, for exceptions, it may be possible to circumvent that
process, and just add symbolic interfenence to all vars in the try
block with all vars in the catch block.

For continuations, however, it seems like those really are control
flow arcs that need to be added.  If analysis can trim a few of them,
then that would be great, but if people are using continuations, maybe
they shouldn't expect high performance, and extra register spilling
may be another side effect.

~Bill


Re: register allocation

2004-11-12 Thread Bill Coffman
Interesting.  I did a little more research, and checked out a fresh
CVS tree.  I did nothing more than insert a conflict checker into
reg_alloc.c  That is, whatever pre-colored registers have been
assigned to symbols, I verify that none of them are conflicting. 
Guess what I got?

make[1]: Leaving directory `/usr/home/coffman/dev/parrot/parrot/docs'
./parrot -o runtime/parrot/library/Data/Dumper/Base.pbc
runtime/parrot/library/Data/Dumper/Base.imc
**  added new here
node 1 = defname(S) is colored 5 and neighbor 9 = S5(S) is colored 5
parrot: imcc/reg_alloc.c:200: imc_reg_alloc: Assertion `r-color==-1
|| r-color != unit-reglist[y]-color' failed.
make: *** [runtime/parrot/library/Data/Dumper/Base.pbc] Aborted

The routine you mentioned, allocate_wanted_regs() doesn't assign
wanted regs if in conflict, but as far as I can tell, there's no check
on the integrity of precolored nodes.  Not quite sure what the correct
behaviour is for allocator, when it's given conflicts.

I saw 'K' compounded key somewhere, but don't know what it is.  I know
reg_alloc.c pretty well though.  Offhand I'd say it's not handled in
reg_alloc.  Maybe it should be?

~Bill

On Fri, 12 Nov 2004 10:26:51 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Bill Coffman wrote:
  Hello,
  * I have the below failed tests.  I haven't looked into these.  Can
  someone tell me if the tests are broken, or is my allocator broken.  I
  know they don't fail for the current cvs code (as of yesterday).
 
  Failed TestStat Wstat Total Fail  Failed  List of Failed
  ---
  t/library/dumper.t   13  332813   13 100.00%  1-13
  t/op/gc.t 1   256181   5.56%  13
  t/pmc/sub.t   1   256781   1.28%  78
 
 These errors look remarkable the same, when I turn on OPT_SUB that is,
 when allocate_wanted_regs() is used. And this code did miss registers
 sets like 'K' (compound keys).
 
 Changing imcc/reg_alloc.c:890 to:
 
if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set
 == 0))
 
 did fix this flaw.
 
 leo
 



Re: register allocation

2004-11-12 Thread Bill Coffman
 These errors look remarkable the same, when I turn on OPT_SUB that is,
 when allocate_wanted_regs() is used. And this code did miss registers
 sets like 'K' (compound keys).
 
 Changing imcc/reg_alloc.c:890 to:
 
if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set
 == 0))
 
 did fix this flaw.
 
 leo

You're patch looks suspiciously like a typo.  strchr(ISPN, r-set == 0))
will always returns true if r-set != 0.  This means that the patch
effectively disables allocate_wanted_regs(), unless there are certain
cases where r-set==0, which I doubt.

Try sticking this toward the beginning of imc_reg_alloc(), but after
build_interference_graph, like imcc/reg_alloc.c:187 (or so)

#ifndef NDEBUG
{
int x,y;
for (x = 0; x  unit-n_symbols; x++) {
SymReg* r = unit-reglist[x];
for (y = 0; y  unit-n_symbols; y++) {
if (ig_test(x, y, unit-n_symbols, unit-interference_graph)) {
if(r-color!=-1  r-color == unit-reglist[y]-color)
fprintf(stderr,node %d = %s(%c) is colored %d and 
neighbor %d = %s(%c) is colored %d\n,
x,r-name,r-set,r-color, 
y,unit-reglist[y]-name,unit-reglist[y]-set,
unit-reglist[y]-color);
assert(r-color==-1 || r-color != unit-reglist[y]-color);
}
}
}
}
#endif

And before ...
while (todo) {


Re: Register allocation/spilling and register volitility

2004-11-12 Thread Bill Coffman
On Mon, 8 Nov 2004 12:20:19 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Jeff Clites [EMAIL PROTECTED] wrote:
  On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
 
  If frames aren't adjacent, normal argument copying can be done anyway.
 
  This would seem to require the same types of runtime checks that you
  are objecting to below,
 
 Not runtime. The register allocator knows the amount of needed
 non-volatiles and volatile registers. Depending on that a call can place
 arguments directly into the incoming regs of the caller or not. Then if
 at runtime copying is needed (because e.g. frames aren't adjacent), the
 function arguments are copyied. This is fully transparent.
 
  I discussed that in item (1) under Further notes.
 
 Yep, saw that then ;)
 
  It's not needed. I've a better scheme in mind, which addressess
  efficieny as well as argument passing.
 
  And spilling?
 
 Well, I'm proposing a variable-sized register frame. With very little
 additions we could run with more then 32 registers per kind (there are a
 few bitmasks currently that would need adaptions, but not much).

I think this is a great idea.  Mostly, I was thinking of the case
where there are less than 32 registers needed.  The allocator can try
to reduce the number of registers that will need to be used.  I think
that addressing could become an issue is there are too many registers,
but I still haven't figured out how the bytecode looks yet.

 But basically, spilling should not be needed at all, if the register
 allocator isn't as broken as it now is. Dan got some really evil
 subroutines, so we have RL test cases. We'll see.

In that case, I'll focus on the register renaming, live-range
analysis, etc.  The memory leak, which might not be actually lost
memory, but just wasteful useage of memory, is pretty much related to
the spill code.  Most of the grotesqueness of the allocator is in the
spiller.  If this has a high probability of changing, I'll focus on
other stuff, like register renaming, which is a lot harder, but with
higher longterm payoff.

In light of this, I think I'll send in my patch, in case it is
helpful.  The problem is that some routine outside of imc_reg_alloc()
is sending in conflicting register preallocation.  My code does lot's
of assertions, and finds this, one way or another.  I'll put in a
variable to turn off color consistency checking, which will probably
then pass the tests.

Bill

 
  JEff
 
 leo



register allocation

2004-11-11 Thread Bill Coffman
Hello,

I have addressed almost all the code issues from the last patch.  That
broke some things, which I had to fix, but they're fixed.  A couple
other improvements too.  Here's the state of things:

* I have the below failed tests.  I haven't looked into these.  Can
someone tell me if the tests are broken, or is my allocator broken.  I
know they don't fail for the current cvs code (as of yesterday).

Failed TestStat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t   13  332813   13 100.00%  1-13
t/op/gc.t 1   256181   5.56%  13
t/pmc/sub.t   1   256781   1.28%  78
4 tests and 59 subtests skipped.
Failed 3/124 test scripts, 97.58% okay. 15/1975 subtests failed, 99.24% okay.


* Still have a memory leak that blocks Dan's evil code from compiling.

* Run time isn't so great, with 10x gcc -O2, for 200 symbols, and 100x
for 500 syms.  Below, is for 1000 syms.  Still, this is much better
than the previous allocator, which crashed my workstation around 130
symbols.

#vars  gcc  parrot
198   6.84   66.88
199   7.29   82.03
200   9.86   109.86
500   0.74   125.39
501   0.68   99.53
999   1.68   755.30
1000   2.06   988.55
1001   1.79   723.25

* Considerable improvements are on the way.  Some are fairly low
hanging fruit, but my biggest worry is not to break things along the
way, which is all too easy to do.

~Bill


Fwd: register allocation

2004-11-11 Thread Bill Coffman
Trying again...


-- Forwarded message --
From: Bill Coffman [EMAIL PROTECTED]
Date: Thu, 11 Nov 2004 14:05:29 -0800
Subject: register allocation
To: Perl 6 Internals [EMAIL PROTECTED]


Hello,

I have addressed almost all the code issues from the last patch.  That
broke some things, which I had to fix, but they're fixed.  A couple
other improvements too.  Here's the state of things:

* I have the below failed tests.  I haven't looked into these.  Can
someone tell me if the tests are broken, or is my allocator broken.  I
know they don't fail for the current cvs code (as of yesterday).

Failed TestStat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t   13  332813   13 100.00%  1-13
t/op/gc.t 1   256181   5.56%  13
t/pmc/sub.t   1   256781   1.28%  78
4 tests and 59 subtests skipped.
Failed 3/124 test scripts, 97.58% okay. 15/1975 subtests failed, 99.24% okay.

* Still have a memory leak that blocks Dan's evil code from compiling.

* Run time isn't so great, with 10x gcc -O2, for 200 symbols, and 100x
for 500 syms.  Below, is for 1000 syms.  Still, this is much better
than the previous allocator, which crashed my workstation around 130
symbols.

#vars  gcc  parrot
198   6.84   66.88
199   7.29   82.03
200   9.86   109.86
500   0.74   125.39
501   0.68   99.53
999   1.68   755.30
1000   2.06   988.55
1001   1.79   723.25

* Considerable improvements are on the way.  Some are fairly low
hanging fruit, but my biggest worry is not to break things along the
way, which is all too easy to do.

~Bill


Re: Solicitation of Ideas for Performance Statistics and Graphs

2004-11-03 Thread Bill Coffman
On Wed, 3 Nov 2004 12:52:26 -0800 (PST), Joshua Gatcomb
[EMAIL PROTECTED] wrote:
[snip]
 What we would like to do is determine if what we have
 done so far is sufficient or, if not, what specifically
 people would like to see.  Some of our unimplemented
 ideas so far are:
 1.  Include the computed goto core
 2.  Summary of results over N week(s)/month(s)
 3.  Provide user form for dynamic results
 If people would like this, they also need to
 indicate what the form should provide:
 (benchmark name, date, executable, etc)
 4.  Provide HTML table of data for some/all of graphs
 5.  Provide links for people to work locally
 A.  A db schema/structure dump so people can
 collect statistics on other architectures
 B.  Source code
 C.  daily db dump
 
 If you would like to see any of these ideas
 implemented, or you have some of your own - please
 respond to this on the list.
[snip]

I think this is a really great idea.  I made some picture graphs for
the register allocation stuff myself, and tried to evangelize the idea
a little, but you guys have introduced a useful tool that can help
developers see the results of their work.  Now, one can anxiously
check the results after a checkin and see if a desired speedup has
occurred or not.  Great job.

To make this idea, and philosophy, pay off, we need (35B)++.  I think
what you're suggesting is to provide the software to others, so they
can run their own tests.  That's great, but I'd rather create the
test, check it in, and modify a master config file, that says which
tests will run.  The administrator, of course would have to approve
the patch.  For long tests, they don't have to be in the nightly test
suit, but could be run by name, by the user.

For number 4, I'd say that a comma separated values (CSV) file would
be most useful.  Then users can view and manipulate the data
themselves.  Various spreadsheets have nice capabilities for this, and
of course certain scripting languages can parse this kind of data
well.  So I hear, at least :).

Once again, great idea, great job!

~Bill


Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols

2004-11-01 Thread Bill Coffman
Hello,
Just getting back to parrot.

[Leo]* alloca() isn't portable and not available everywhere
[Dan] Yep. This is a gcc-ism. Use Parrot's memory allocation functions instead.

reg_alloc.c uses malloc().  I saw alloca() used in the CVS tree
earlier, but it's gone now.  What is the correct way to allocate
memory in Parrot?


[Leo]* the Dbg and Dbg2 debug macros aren't needed. Just use the existing
debug(interp, level, ...) function in src/debug.c. If you need some
extra levels, you can use some more bits in imcc/debug.h

I notice an IMC_TRACE macro, as well as the associated debug stuff. 
Debug looks like it has a lot of overhead, even when turned off.  I
didn't really get Dbg cleaned up either, because I couldn't figure out
how to handle variable number of arguments, but something like
assert() would be much nicer.  I'll go ahead and follow the convention
of adding debug calls.  Should the IMC_TRACE stuff be removed?


[Leo] * all functions should have an Interp* and a IMC_Unit* argument to allow
[Leo] reentrancy. I.e. all state should be in the unit structure.

I'm not quite sure I understand this.  I get that each function call
has (interpreter,unit) as parameters.  But why is this?  Parrot
handles interrupts, I understand, so this probably has something to do
with interrupt processing.  Does this also mean that there should be
no other data structures, except those pointed to by interpreter or
unit?


[Dan] I'd like the code issues cleaned up before it gets committed.

I want to fix the memory leak, which involves rewriting a few
functions.  This might go fast, or it could become difficult.  It's
probably best to send an intermediate patch that has the code issues
fixed.  That way we can maximize code review, and make sure I'm
writing parrot compliant code.

- Bill


Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols

2004-10-29 Thread Bill Coffman
Sounds like the memory leak.  Let me try to fix this, and address the
other issues.  I'll get back to you.

Thanks,
-Bill

On Fri, 29 Oct 2004 10:15:34 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote:
 Bill Coffman (via RT) wrote:
 
 Patch does the following:
 
 - Applied Matula/Chaitin/Briggs algorithm for register allocation.
 - Color the graph all at once, and spill all symbols with high colors.
   Spill all at once to speed things up.
 
 Good. Hopefully Dan can provide some compile number compares.
 
 The numbers are... not good.
 
 I took one of the mid-sized programs and threw it at the new code.
 Parrot in CVS takes about 10 minutes to run through this program. The
 main sub's about 30Klines of code, and the stat from a parrot -v is:
 
 sub _MAIN:
  registers in .imc:   I2875, N0, S868, P7615
  0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch
  0 used once deleted
  0 invariants_moved
  registers needed:I2883, N0, S873, P7741
  registers in .pasm:  I31, N0, S31, P32 - 37 spilled
  5845 basic_blocks, 47622 edges
 
 I applied the patch to a copy of parrot and ran it. After 37 minutes
 I killed the thing. It had 1.6G of RAM allocated at the time of
 death, too.
 --
 
 
 Dan
 
 --it's like this---
 Dan Sugalski  even samurai
 [EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk



Re: register allocation questions

2004-10-28 Thread Bill Coffman
Hi all,

Thanks for your continued comments.  Btw, I usually read all the
parrot list, so don't think I'm not paying attention.

Currently, here's how the register allocator is doing.

Failed TestStat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t5  1280135  38.46%  1-2 5 8 13
4 tests and 51 subtests skipped.
Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

I recall Leo, or someone, saying that the data dumper routines are not
following the calling convention properly.  So I've decided not to
worry about it too much.  It passes the other tests, plus the
randomized tests that I created, up to 150 symbols.  At that range, it
still takes about 20x longer than g++ -O2, for equivalent programs to
compile (see gen4.pl).

Also, it is currently running about O(n^2) for n symbols, where the
old one was running about O(n^3) from my analysis.  The spill code is
still very expensive, and has a large constant associate.  I also have
data, which is attached.  The difference doesn't show up until a lot
of spilling is going on, around 80 symbols or so.

I've learned a lot about how the compiler works at this point, and I'd
like to contribute more :)

Would you like a patch?  Should I fix the data dumper routines first? 
What is all this talk about deferred registers?  What should I do
next?

Well, I'm making some comments on the below stuff.

On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
 At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
 
 So, if you want that really super efficient, you would allocate
 registers around function calls directly to that wanted register number,
 which should be in the SymReg's want_regno.

Yes, I think we are kind of doing this.  It's best to pass the
registers straight through though.  Like when a variable will be used
as a parameter, give it the appropriate reg num.  Sort of outside the
immediate scope of register coloring, but as I've learned, one must go
a little beyond, to see the input and output for each sub.

 While true, in the general case leaving 0-15 as non-preferred
 registers will probably make things easier. Those registers,
 especially the PMC ones, are going to see a lot of thrash as
 function calls are made, and it'll probably be easier to have them
 as scratch registers.

I guess I don't agree.  I'd like to pack down the number of registers
used to a minimum.  Then when a function is called, only those needed
registers are copied in/out.  Don't think the functionality exists. 
But the idea is to have each sub declare how many registers to
save/restore.  This would then save 0-k such registers.  Where k is
the number of registers used by the sub.  Pack 'em down, minimize the
number needed.

We can also minimize this number to match the physical architecture
that parrot is running on (for an arch specific optimization).  The
imc_reg_alloc function does not have 32 hard coded in there (well a
little bit, but can be easily changed).  It's pretty dynamic.

 Yep, that's the easy part ;) OTOH when the register allocator is
 doing register renaming anyway, the most inner loop with a function
 call should get registers assigned already matching the calling
 convemtions. With more then one call at that loop level, you have to
 move around registers anyway.

Yes, yes, renaming!  I want to do register renaming!

 Oh, sure, but keeping your scratch PMCs out of the way makes life a
 lot easier for the register coloring algorithms. Might not be
 optimal, but if it makes life simpler to start, optimal can come
 later.

p31 holds all the spill stuff.  It's a pain.  Maybe I'll move that
around, but if p31 is used, it means that there is no more room for
symbols, in at least one of the reg sets.

 [1] all except Dan's 6000 lines subroutines :) Did you start
 creating real subs for your code already?
 
 I wish. :( Unfortunately not, outside some simple stuff, and I doubt
 I will. The language just doesn't lend itself to that sort of thing.
 We're going to add actual real subroutines to the language after we
 roll out into production, but that doesn't help now, alas.

Interesting.  I'd like to test on something like that.  Maybe SPEC99 as well.

- Bill Coffman


compile.dat
Description: Binary data


compile.plot
Description: Binary data
attachment: compile.png

Re: register allocation

2004-10-28 Thread Bill Coffman
When I cvs up'd, cleared and reConfigure'd I got these stats:

Failed Test Stat Wstat Total Fail  Failed  List of Failed
---
t/library/streams.t1   256211   4.76%  11
t/op/gc.t  1   256181   5.56%  13
4 tests and 66 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.



 I'm curious to see how it behaves once the spilling gets up into the
 1000+ symbol range. Dropping from cubic to quadratic time ought to
 make a not-insignificant change in the running time, even if that
 constant's pretty big. :)

I think it's a bit more complicated.  M=number lines in code, N=number
variables.
- Time = O(M^2+N^2)  
- old time = O(M^2 +N^3)
Not quite sure of this either.  But the N^3 eventually dominates, I
think.  The data seems to bear this out.

There are more fixes I'd like to make as well.  I spotted several
things that could be fixed.  And I think the spill code can be
optimized a lot to reduce the big-O time as well.

More statistics #vars v. time in seconds: 
#vars  gcc  parrot2
200   7.92   89.20
201   11.86   146.31
202   18.11   246.37
203   9.54   107.88
204   11.81   134.60
205   14.75   190.95
206   13.25   161.83
207   10.63   138.83
208   11.02   117.73
209   7.14   88.29
210   15.14   176.69

I am also running gen3.pl with 1000 vars.  It's still on gcc.  We'll
see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight.

 Would you like a patch?
 
 Yes! Oh, yeah, definitely.
 [...]
 If you've got a patch, I'd be more than happy to give it a whirl, and
 I can likely get you a copy of the code in question to give a run on.

Soon, I'll send one the proper way.


  The
 imc_reg_alloc function does not have 32 hard coded in there (well a
 little bit, but can be easily changed).  It's pretty dynamic.
 By all means, go for it. I certainly don't want to curb your
 enthusiasm. It's the right thing to do, ultimately. I didn't want to
 presume on your time. Happy to have it, of course. :)

Thanks.  I've had a great time doing this.  Remembering graph
algorithms and compilers.  Great fun!  I'd also like to contribute to
getting Parrot out there, sooner rather than later.  So if I can help
with that, I'd like to hear suggestions.

-Bill


Re: register allocation questions

2004-10-28 Thread Bill Coffman
Thanks Matt,

I hope I can help out.  The patch I am submitting actually does
simplify register coloring a bit.  I've been waiting for perl6 with so
much anticipation, I just couldn't stand it any more, and I had to
participate.

-Bill


On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles [EMAIL PROTECTED] wrote:
 Bill~
 
 I have to say that I am really impressed by all of the work that you
 are doing, and if you can make the internals of imcc a little more
 approachable, you would be doing a great service.
 
 Thanks,
 Matt
 
 
 



Re: C89

2004-10-28 Thread Bill Coffman
Thanks for the info...

Apparently,

   gcc -ansi -pedantic 

is supposed to be ANSI C '89.  Equiv to -std=c89.  Also, my
Configure.pl generated make file uses neither -ansi nor -pedantic.  I
do have access to a KR C v2, but it doesn't look like it's going to
match the actual practice.  Oh well.  So long, as my code works, I'm
happy.

Incidentally, I tried adding -ansi and -pedantic and I got lots of
warnings, like long long not supported by ANSI C'89, etc. (how can
you do 64 bit ints then?).  I also got errors that caused outright
failure.  Perhaps it's best to forget the whole C'89 thing.  But maybe
someone should remove that from the documentation?  Just a thought.

-Bill

On Thu, 21 Oct 2004 22:41:36 -0700, Jeff Clites [EMAIL PROTECTED] wrote:
 On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote:
 
  At 11:25 AM -0700 10/21/04, Bill Coffman wrote:
  I read somewhere that the requirement for parrot code is that it
  should be compliant with the ANSI C'89 standard.  Can someone point me
  to a description of the C89 spec, so I can make sure my reg_alloc.c
  patch is C89 compliant?
 
  I don't think the ANSI C89 spec is freely available, though I may be
  wrong. (Google didn't find it easily, but I don't always get along
  well with Google) If the patch builds without warning with parrot's
  standard switches then you should be OK. (ANSI C89 was the first big
  rev of C after the original KR C. If you've got the second edition or
  later of the KR C book, it uses the C89 spec)
 
 Also, if you're compiling with gcc, then you can pass -std=c89 to the
 compiler to enforce that particular standard. (Apparently--though I
 haven't tried it.) I believe -ansi does the same thing.
 
 JEff
 



register allocation questions

2004-10-25 Thread Bill Coffman
Hello All,

I have been hard at work, trying to grok the reg_alloc.c code, and
with some success.  My code is assigning registers, so that none are
conflicting (which I double-verify), and I'm getting to the end of
make.  Nonetheless, I don't get too far into make test, and I'm
not quite sure why.  Maybe some of  you who understand this so much
better than myself, can help.

Here's some questions:

1)  In the existing parrot code, when a register is assigned, it uses
the following code:
   for (color = 0; color  MAX_COLOR; color++) {
int c = (color + MAX_COLOR/2) % MAX_COLOR;
if (!colors[c]) {
reglist[x]-color = c;

debug(interpreter, DEBUG_IMC,
#[%s] provisionally gets color [%d]
 (%d free colors, score %d)\n,
reglist[x]-name, c,
free_colors, reglist[x]-score);
break;
}
}

Thus, it seems to prefer to use register #16 and up, first, before it
uses 0-15.  This is mistifying to me, since it's not so trivial to
code it this way, and yet I cannot figure out why this color
preference.  It seems like maybe there's a reason which I need to
understand here.  My new code doesn't do this behaviour.

2)  When function imc_reg_alloc (the main register allocation thingy)
is called, some of the variables have already expressed a preference
as to which register they want.  I understand that this can optimize
certain parameter passing, etc.  The question that arrises is, what if
two conflicting variables want the same color (reg#)?  Obviously, they
don't get their way.  But what are the consequences, and what must be
done to fix this.

3) Failed test.  Here's a test that I fail:
##PIR##
.sub _main
.local pmc array
new array, .PerlArray
push array, 0
push array, 1

_dumper( array, array )
end
.end
.include library/dumper.imc

ERROR:
Method 'dumpWithName' not found
in file '(unknown file)' near line -1

4) When I change my code to start at reg#16, like in (1), the code of
(3) produces the output:
array = PerlArray (size:2) [
array\array[0],
array\array[1]
]

Where the array\array[0] should be just 0.  Incidentally, reg
allocation is done on the following subs:
_main __library_dumper_onload _dumper _register_dumper _global_dumper
__library_data_dumper_onload dumper
__library_data_dumper_default_onload dumpWithName dumpCached
dumpProperties dumpHash dumpStringEscaped pmcDefault pmcPMCArray
pmcIntList pmcStringArray pmcPerlArray pmcPerlHash pmcPerlString
pmcPerlInt pmcPerlNum pmcPerlUndef pmcSub pmcNull pmcOrderedHash
pmcManagedStruct pmcUnManagedStruct pmcArray
__library_data_dumper_base_onload prepare cache createIndent indent
newIndent deleteIndent dump simple String

which seems like a lot to me, but I guess it's compiling a lot of stuff.

---

I might have some kind of bug, in allocating registers, but this seems
strange, and there seems to be a little more going on than finding
non-conflicting registers for variable names.  The -d 008 output seems
to produce pretty much the same thing, but just different registers.

Maybe someone can point me at something to look at to fix this.  I'll
provide the patch if someone would like to play with it.

Thanks to all who made it this far,
Bill Coffman


C89

2004-10-21 Thread Bill Coffman
I read somewhere that the requirement for parrot code is that it
should be compliant with the ANSI C'89 standard.  Can someone point me
to a description of the C89 spec, so I can make sure my reg_alloc.c
patch is C89 compliant?

Thanks,
- Bill


Re: Pathological Register Allocation Test Generator

2004-10-20 Thread Bill Coffman
Leo,

Thanks for your suggestions and comments.

On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote:
 Some remargs WRT gen{3,4}.pl:
 1) While these programs exhibit some worst case register layout it's
probably not a very typical layout.

Agreed.  The idea was to automate and compare to gcc.  There are
real-world tests already in the parrot test suite, but because I can
generate millions of such cases, one hope is to detect errors, and to
get some kind of performance metric, even if these programs are
artificial.  To get real metrics that people would pay attention to,
we might implement SPEC99, etc.

 2) RL programs have lexicals and globals access spread over the code like
in the generated gen.imc
 3) and that's intersparsed with highly local access to temporaries coming
from expression evaluations.

I think I could modify the tests so that the variables have a Poisson
distribution, as far as their usage distance goes (max number of lines
away from first use).  This would cause some of them to look more like
temporary variables, and be a somewhat better simulation of real
programs.

 I'd change the simulation program to use PMCs to allow for 2). Now when
 it comes to spilling, these lexicals or globals don't need a new
 storage, their live range can just get discarded and at the next usage
 of this lexical or global it just can be refetched[1]. Implementing this
 should already vastly improve the register allocation.

 [1] The refetching can of course be into a different Parrot register.
 Thus the usage of globals or lexicals wouldn't interfer and generate
 huge live ranges over the whole function as it currently does.
 

I don't quite understand this.  You think  I should create PMC
variables, instead of integers?  It's probably a good idea to have a
mix of the four types I suppose.

Once thing I can say is that the interference graph is pretty
conservatively generated.  When a variable is reassigned, it can be
treated as a new variable which can reduce register pressure, but I
don't think the code is doing that yet.  It sounds like you're talking
about implementing the interference graph better.  I may get into that
too, but for now, I want to get the rester coloring (mapping vars to
registers) working better.

Also, I'd like to be able to capture the effects of changes in metrics
(via scripts like those I posted) so we can see if different ideas
will actually improve the situation or not.  I don't even have any
runtime checks yet, but creating a score to see how much was spilled,
could be helpful.  That could be incorporated into the register
allocator itself (which already is, if you run with -d 0008, then grep
Spill).

 I don't know if we need some syntax to easily support lexicals or
 globals or if the register allocation can deduce that itself. But if a
 new syntax simplifies that, we put it in.

My preference is to not distinguish between various types of
variables, but to have the algorithms deal best with nodes based on
the structure of the graph(s).

 
 For 3) there is already a separate pass in
 imcc/reg_alloc.c:allocate_non_interfering().

Yes.  This function seems to find variables whose live ranges (life
ranges) are restricted to a single basic block, and assign them reg
28,29, or 30.  My preference is that low hanging fruit like temporary
vars should be assigned by the algorithm.  Such variables will tend
not to have much interference with others, and are thus probably
easier to color.  My experience with coloring, is that assigning low
degree (aka arity, #neighbors) nodes to fixed colors first, will
actually reduce the performance of the graph coloring algorithm.  It's
usually best to color the easy stuff last.

The algorithm I'd like to use is one that happens to work very well
for interference graphs, and it is very simple.  This algorithm
selects a node that interferes with the fewest other variables (lowest
degree), and removes it from the interference graph, and puts it into
a stack.  It then recalculates the lower degrees of each neighbor, and
iteratively removes all nodes from the graph in this way, pushing them
onto the stack.  It then, pops off the nodes from the stack and colors
them in lifo order, using the first available color.  Nodes that need
a color higher that 30, or whatever the max. turns out to be, are
spilled.  The spilled nodes are always in the densest part of the
graph, where something has to spill.

I believe this is essentially the algorithm Briggs/Chaitin used, and
some variation of it has been used in most compilers since the 70's. 
One exception is that certain variables want certain colors
(want_regno) -- those should be colored first, as the current code
does.

A disadvantage of the above algorithm is that it tends to randomly
select which nodes get spilled, among the densest parts.  Another
optimization to the algorithm is to incorporate a score that causes
the most frequently accessed nodes not to get spilled.  (The 

Re: Pathological Register Allocation Test Generator

2004-10-19 Thread Bill Coffman
Hello All,

This is my first post to the parrot list, but I hope that many will
follow.  Thanks to all of you for working so dilligently on building
this wonderful new toy for all us geeks to play with!

I am currently working on a fix to the large subroutine register
allocation bug, aka, massive spilling not yet implemented.  The
problem, is that the register allocation code is complex, and I'm not
all that familiar with it, or even with working with compilers at the
coding level.

I am comming at the problem as a former graph theorist, who was a
consultant to a compiler project (at UC Davis, in the late 90's where
some of the work was applied toward what is the current Intel
compiler).  I talked a lot, and learned a lot, but didn't actually
contribute any coding to the project.  I also spent a lot of time
coding graph coloring heuristics with C++.

I expect to produce a patch soon, that will make register allocation
better.  To that end, I felt it was important to come up with some
metrics.  The first attached program (gen3.pl), which is based on Greg
Purdy's gen-pra.pl, gathers data on compilation performance for
varying numbers of variables and compares this to GCC.  Mine also uses
gnuplot to make pretty pictures.  If everything is installed okay on
your system, a graph will pop up displaying runtime results.  I'm
using Debian Linux, so it was pretty easy to set up.

The second program, gen4.pl, is a randomized, automated, blackbox
tester.  It works similar to the above, but the corresponding parrot
and GCC outputs are compared.

And now, for the philosophical note ...

One of the biggest wins of RISC over CISC was not that RISC was
smaller, but that architects looked at actual code that would run on
their machines.  They removed an instruction from the instruction set
if it wasn't used that frequently.  They considered wheather to
implement features in the hardware, or the compiler, and they tested
each, to find the optimum balance.  This approach may be helpful in
building parrot as well, and it's what I've tried to do, in a very
modest way, in my scripts.  Thanks to Gregg Purdy for getting the ball
rolling in this general direction.

Thanks,
Bill Coffman


On Mon, 11 Oct 2004 09:54:31 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 4:58 PM -0700 10/2/04, Gregor N. Purdy wrote:
 Dan et al. --
 
 I made a new version of the script that creates gen.cpp and gen.imc
 (attached). You can run it like this:
 
perl gen-pra.pl 1000 1
 
 (for 1000 labels and 1 variables) and it will create equivalent
 gen.imc and gen.cpp files. You can test-compile them with these
 commands:
 
 g++ -c gen.cpp
 
 and
 
imcc -o x.x gen.imc
 
 on my system, the g++ compiler does eventually finish, but the imcc
 compiler is eventually killed.
 
 Maybe this could be used to drive out the underlying problems that
 are keeping parrot from compiling Dan's really large subs?
 
 Mmmm, degenerate behaviour! Cool, and thanks. Should help judge how
 we're doing with the nastier code.
 --
 Dan
 
 --it's like this---
 Dan Sugalski  even samurai
 [EMAIL PROTECTED] have teddy bears and even
teddy bears get drunk

#!/usr/bin/perl -w

use strict;

die Usage: $0 starting_num_variables ending_num_variables\n unless @ARGV == 2;
my ($start,$stop) = @ARGV;

open COMP, compile.dat or die $!;
print COMP #vars   gcc   imc\n;

for (my $vars=$start;$vars=$stop;$vars++) {
  my ($total_locals) = $vars;
  my $total_labels = int ($total_locals / 2);

  my $labels_so_far = 1;
  my $locals_so_far = 0;

  open IMC, gen.imc;
  open CPP, gen.cpp;

  print IMC .sub __MAIN\n\n;
  printf IMC \n_L_%d:\n, 1;
  print CPP #include stdio.h\n\n;
  print CPP int main(int argc, char * arg[])\n{\n;
  printf CPP \n_L_%d:\n, 1;

  my %action_table = qw(l label v variable a arithmetic c control p print);
  my @actions = (('l')x2, ('v')x4, ('a')x9, ('c')x3, ('p')x0);

  while ($labels_so_far  $total_labels or $locals_so_far  $total_locals) {
   my $action = $action_table{$actions[rand @actions]};

   if ($action eq 'label' and $labels_so_far  $total_labels) {
 my $this_label = ++$labels_so_far;

 printf IMC \n_L_%d:\n, $this_label;
 printf CPP \n_L_%d:\n, $this_label;
   }
   elsif ($action eq 'variable' and $locals_so_far  $total_locals) {
 my $this_local = ++$locals_so_far;
 my $this_value = int(rand(99)) + 1;

 printf IMC   .local int V_%d\n, $this_local;
 printf IMC   V_%d = %d\n, $this_local, $this_value;

 printf CPP   int V_%d;\n, $this_local;
 printf CPP   V_%d = %d;\n, $this_local, $this_value;
   }
   elsif ($action eq 'arithmetic' and $locals_so_far  0) {
 my $result = 1 + int(rand($locals_so_far));
 my $arg1   = 1 + int(rand($locals_so_far));
 my $arg2   = 1 + int(rand($locals_so_far));
 next if $arg1 == $arg2;

 my $op