Re: Inf and NaN
NegNan doesn't exist, except as a fluke of the representation (see link for how they are represented). A -NaN is the same as a NaN. They both fail all comparison tests, even NaN == NaN is false (unless your compiler optimizes the comparison out). Only difference is the way they are stringified, which should be NaN, but stringification of NaN is non-standard. Solaris compilers will print -NaN, but gcc only prints nan. Microsoft compiler prints strange stuff. Ron Blaschke [EMAIL PROTECTED] did some work in stringifying these things, see [perl #38887] http://en.wikipedia.org/wiki/IEEE_754 To summarize, if all the exponential bits are set, you get one of these odd ducks. If the mantissa is zero, then you get +/- Inf, based on the sign bit. If the mantissa is non-zero, you get a NaN, obviously there's lots of ways ot make NaN's (if the sign bit is set, Solaris prints -NaN). Fortunately, this is something you can rely on across architectures. Only thing is you need to know wheather it's big or little-endian. The signaling NaN vs the non-signaling is architecture dependent, the following link discusses AMD and Intel http://en.wikipedia.org/wiki/NaN It seems like you shouldn't need PMC for this, just the N-thingy (sorry my parrot is rusty). That is, these guys can be represented purely by floats or doubles -- unless, of course, we're supporting non-ieee-754 machine architecture. On 7/31/06, Nicholas Clark [EMAIL PROTECTED] wrote: On Sun, Jul 30, 2006 at 10:45:29PM -0700, jerry gay wrote: don't forget about negative-not-a-number, and the quiet (or signaling) Ah yes. that oxymoron. I've never yet seen the reasons for why it exists at all. Does anyone have a URL? Nicholas Clark -- Bill Coffman ~~~_\\\|/ ~~~-(@ @) =oOO=(_)==OOo===
Re: [perl #38887] Result of INFINITY or NAN stringification is platform dependent
On 7/26/06, Ron Blaschke [EMAIL PROTECTED] wrote: I'm looking into this issue and would like to ask for some advice. I have added the following platform dependent functions. int Parrot_math_isnan(double) int Parrot_math_finite(double) On Win32 the implementation is simple because the IEEE recommended functions _finite and _isnan are supported. I'm thinking about adding a test for these functions and use them. But what should happen if they are not there? Then I changed Parrot_sprintf_format in spf_render.c to use Parrot_math_isnan and Parrot_math_finite to render NaN, -inf and inf directly. Is this a good way to solve this? There is no platform independent way to produce NaN or Inf, so IMHO, you did it the only way it can be done. That being said, you can optimize by looking at the bits. Wikipedia explains IEEE-754 http://en.wikipedia.org/wiki/IEEE_754 First, you have to be aware if the machine is big or little-endian, second, look at the exponent bits, if they are all set, this is not a normal number (either +Inf, -Inf, or NaN, depending on mantissa and sign bit). Unless you're targeting non-ieee-754 systems like the VAX (an old computer), this is garanteed to be portable. I wouldn't bother with any platform specific functions, as they are all different. Just look at the bits. (gcc uses isnan, isinfinite, ... but I think you have to compile with the -c99 flag and solaris does it a little different, etc). I recently did something similar in PDL. If looking at the bits is not helpful, I can give you a little more detail (hint struct union). If you really don't want to look at the bits, I notice that gcc does have an isnormal() function you can use, but again you have the portability issues. Personally, I'd like to see the i in inf capitolized or the Ns in NaN lower case. I like capital I for Inf. Good luck, Here's a sample (which should make its way into some tests): type test.pir .sub main # -INF set N0, 0.0 ln N1, N0 print N1 print \n # +INF abs N2, N1 print N2 print \n # NaN set N3, -1.0 sqrt N4, N3 print N4 print \n .end # trunk parrot test.pir -1.#INF00 1.#INF00 -1.#IND00 # local parrot test.pir -inf inf NaN Thanks, Ron -- Bill Coffman ~~~_\\\|/ ~~~-(@ @) =oOO=(_)==OOo===
Re: [perl #36407] [BUG] imcc - register allocation
Isn't the register allocator pretty much minimized by the new architecture implementation? My understanding was that only temporary variables could benefit from it now. Perhaps the new changes aren't in effect yet? Just curious. On 6/28/05, via RT Leopold Toetsch [EMAIL PROTECTED] wrote: # New Ticket Created by Leopold Toetsch # Please include the string: [perl #36407] # in the subject line of all future correspondence about this issue. # URL: https://rt.perl.org/rt3/Ticket/Display.html?id=36407 The register allocator doesn't properly track control flow, if a label has the same name as an opcode or a variable. The result is usually that registers are reused because the control flow change of these labels isn't considered. I tried to reduce the bug to a simple example and failed. Anyway languages/tcl/lib/commands/array.imc exposed it with labels named 'subcommand' (like a var) and 'set' (opcode). See also r8468 leo -- -Bill
Re: Hackathon Day 2+3 Report
On 6/13/05, Chip Salzenberg [EMAIL PROTECTED] wrote: I've posted a report on the Hackathon Days 2+3 as a journal entry on use.perl.org http://use.perl.org: http://use.perl.org/~chip/journal/25182 I'm not going to copy it here, but you probably want to read it, if only because it will point you to AN UPDATED PDD. Really. -- Chip Salzenberg [EMAIL PROTECTED] Chip, This is really great to see! The whole continuations/regalloc thing is being addressed! It's great to see this kind of progress. Implementing this kind of register allocator could be tricky, for a number of reasons, but it's great that there is a direction now. OTOH, a minimally functional version of this architecture can be created by, as you say, just by register renumbering. Anything beyond that is premature optimization, at this point. Thanks for moving on this, and Kudos to you and Leo! Bill
Re: Attack of the fifty foot register allocator vs. the undead continuation monster
On 6/12/05, Curtis Rawls [EMAIL PROTECTED] wrote: [snip] It might also be helpful to take a look at other systems that also implement continuations: -Stackless Python (http://www.stackless.com/spcpaper.htm) -Standard ML (http://www.smlnj.org/doc/features.html) -Formalizing Implementation Strategies (http://citeseer.ist.psu.edu/danvy00formalizing.html) -Others (http://c2.com/cgi/wiki?ContinuationImplementation) -Curtis Rawls I found this link helpful when trying to understand continuations. The code they use is not secure. It is basically using buffer overflow attacks as a programming technique? Well, you'll have to read to understand what I mean. http://homepage.mac.com/sigfpe/Computing/continuations.html The part that I understand about continuations, is that they wreak havoc in the control flow graph, as Chip initially said: Therefore, register allocation must allow for implicit flow of control from *every* function call to *every* function return ... or, more precisely, to where *every* continuation is taken, including function return continuations. Although I think that is actually sugar coating it a bit. Continuations can be taken from within any sub, and possibly even when appending to a list, if you're using lazy list eval. This, I found out when my changes to reg_alloc.c broke the continuations tests. The changes were not the problem, as Leo confirmed, it was the way the allocator and the continuations interacted. There are also some issue about changing variables through continuations, like you can only change PMC integers, which point to some fixed location, containing the int, i.e. you can only use references. These parts make less sense to me, but the issue is basically, some registers are restored, and some are kept safe? Well, that issue hasn't been quite resolved either. If I thought the register allocator had a chance of working, I would probably find the time to start working on it again. But probably there are plenty of people who would be happy to do so, if this issue could be resolved. -- -Bill Coffman
Re: Regarding Google's Summer of Code 2005
Hello, I worked on the parrot register allocation problem a few months ago. The problem I encountered was that of continuations based dependencies. There were architectural issues about continuations, and Leo proposed changes that would fix this, but Dan said it was too late for architectral changes. Basically the existence of continuations adds a lot of dependencies that don't want to go away, defeating most of the benefit that one can get from register alloction. I am referring to parrot's use of full continuations, which allow a snap shot of the current runtime stack. Any function can take a snapshot or restore the stack to a previous snap shot. This makes traditional register allocation problematic. As far as I can tell, the continuations depency problem was not solved, and in fact the existing register allocator is also in conflict with the continuations based dependencies. (Unless something has been done in the last few weeks.) The resolutions seems to be not to add tests that break things. Thus, continuations based tests are few, and unstable. Changing the way the registers allocate, even though they still don't violate the interference graph, will break some tests. There are several threads in the parrot mailing list that discuss the continuations problem. I hoped to be able to address parrot register allocator again, at some point, but it could be a while before I get to that. In the mean time, search the perl6-internals mailing list for continuations and register allocation. I'm sure you'll find it. The above not withstanding, I think there are still quite a few things that can be done to improve register allocation in parrot. If anyone is interested, I'm sure I could dig up my code. Bill Coffman On 6/3/05, Curtis Rawls [EMAIL PROTECTED] wrote: On 6/3/05, Leopold Toetsch [EMAIL PROTECTED] wrote: Dheeraj Kumar Arora wrote: I m interseted in one of LLVM project Implement well-known optimizations in PIR compiler (SSA - register allocation) [ I have answerd this ] Parrot is a register based virtual machine with 32*4 registers. There are a lot of studies WRT optimizations and register allocations for compilers and hardware CPUs but probably all these things can be applied to Parrot too. See also imcc/cfg.c and imcc/reg_alloc.c for existing code. I'm not a computer scientist nor am I able to follow most of the papers regarding various optimization techniques or the needed infrastructure to implement it Anyway, there are folks on our mailing list like Curtis Rawls, who could probably better summarize what's needed and what should be done. Can Any send me the details? Yes please, folks. leo Wow, thanks for the link to the Summer of Code, that looks like fun! (I know I'll be thinking about my proposal all day at work : ). I don't have my TODO list with me, I will try to post it later today. I have been working and thinking about improvements to the Parrot/IMCC optimizer for a while now. Implementing SSA is definitely at the top of my list, because it simplifies a lot of optimizations and makes some others possible. The biggest challenge is to graft such a big change onto a working system, with small changes at a time. The reference to LLVM is interesting, since it has similar goals to Parrot, but is research-oriented. I bet there are some good techniques that could be learned (and borrowed : ) from LLVM. -Curtis Rawls -- -Bill
Re: Register Allocation and Continuations problem definition
I hope that no one misunderstands when I say this. It is not meant to be a criticism, but rather it is my understanding of the problem, put very directly. I may be wrong, and in many ways, I hope am. There is a fundamental flaw in the Parrot architecture. In it's current form, continuations and register allocation cannot coexist. The solution is to do away with the register allocator, or to do away with continuations. This can be done, perhaps by saying that certain functions won't be subject to continuation snapshots or calls. Then register allocation can be done on those routines. Or compiling the whole program with a switch of on or off. When I say do away with the register allocator, I mean that nearly all registers will interfere, so essentially each variable gets a register. The extensible register stack is the best way to implement this strategy. If there are long sections of code with lots of temporary variables, then register allocation might be of some use. But most variables will interfere. Implementing this would be an interesting project. It would be quite different from a standard register allocator. It might even be worth writing a paper on the techniques involved. Turning off full continuations, is a another matter. I don't know what impact that would have on parrot. It would be relatively straightforward to write the allocator with this approach. There are other issues involved, regarding how registers change when called. What is preserved, and how. Matt touched on some of these issues. My main concearn in attempting to implement the register allocator was to find the minimal control flow graph (cfg) which helps to provide the variable interference graph. This is my understanding of the problem. Matt wants this thread to be a problem definition, so these thoughts seem appropriate for this thread. I hope no one is offended by my bluntness, but if this problem is to be solved, it must be understood. Bill Coffman ps: a couple important threads on this issue were http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/3a61fe5e97d17389/0603ff13ca52f0ff?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#0603ff13ca52f0ff http://groups-beta.google.com/group/perl.perl6.internals/browse_frm/thread/214157bf29879710/4c15aa4a1f56c20a?_done=%2Fgroup%2Fperl.perl6.internals%3F_doneTitle=Back+to+topics_doneTitle=Backd#4c15aa4a1f56c20a On 6/3/05, Matt Fowles [EMAIL PROTECTED] wrote: All~ On 6/3/05, Bill Coffman [EMAIL PROTECTED] wrote: There are several threads in the parrot mailing list that discuss the continuations problem. I hoped to be able to address parrot register allocator again, at some point, but it could be a while before I get to that. In the mean time, search the perl6-internals mailing list for continuations and register allocation. I'm sure you'll find it. I would actually very much like to see this issue moved on. In the hopes of getting everything up to speed quickly I will try and summarize all of the discussions and suggestions that have been made. I am NOT trying to advocate any particular solution (yet), I just want to make sure that everyone is on the same page. I propose that we use this thread to ensure that we are not talking past each other. Discussions about specific solutions should and religion should have a seperate thread. PROBLEM: Continuations can be invoked repeatedly. Whenever a continuation is reinvoked, the registers must be in a defined state or the code will behave incorrectly. BASIC ANALYSIS The actual state of the registers does not matter as long as it is well defined. A) If the registers are defined to be garbage, then every continuation will start by returning them to the state it expects. B) If the registers are defined to be preserved, then every continuation merrily chugs along. Unfortunately these each have issues and as such much debate ensued. SUGGESTED SOLUTIONS 1) Context includes registers + continuations don't have to preserve specifc registers + register allocator can ignore continuations wrt spilling - copying overheard - value based registers (IN) return to old values - pointer based registers (SP) cannot have their pointers moved or require double indirection 2) Return Continuations include registers, non don't + register allocator can remain mostly ignorant + only non return continuations need to worry - promoting a return continuation will force copying as plan (1) 3) register are not restored + simple to explain - register allocator must add many extra interference edges - largely prevents reuse of register 4) variable sized register frames + never have unused registers + no need for a register allocator + could correspond directly to scratchpad - more complicated - no register reuse - large architectural change - more custom allocation (can't pool
Re: continuation enhanced arcs
All, As with most technical problems, fully specifying them, is often half the battle. In this case, I think we're getting close to understanding the issues at least. [please treat all statements as possible questions] First, consider my original post in this thread: http://www.nntp.perl.org/group/perl.perl6.internals/27383 To summarize: Full continuations are powerful but expensive. They are like hidden goto's and add arcs to the control flow graph. This causes more registers to interfere. Proposed Solutions: - Accept the arcs and probable spilling. We still have the regions between sub calls. - Put labels on certain subs that denote they might call or be called by a continuation. - Pragmas indicating ranges where continuations won't be called (or will be). - Restore registers from lexicals somewhere, after each function call. - Accept incorrectness, as the current implementation does (discovered when new allocator failed 2 tests, and Leo saw the problem). Next, consider Dan's message, Lexicals, continuations, and register allocation: On Tue, 30 Nov 2004 10:22:29 -0500, Dan Sugalski [EMAIL PROTECTED] wrote: So far as I can tell there are two cases here: 1) A return continuation 2) A regular continuation [snip return contuations, which are far from solved, but likely a subset of the below] The second case, where code takes an arbitrary continuation that returns to location X, wherever X is. I'm missing the problem there, too. Assuming there's a way to note the destination to the register allocator (with those points being places where all registers must be assumed to be trash) I'm not seeing the problem either. There are only two cases here, where the destination is marked and where it isn't. If it's marked, the register allocator assumes things are dirty and loads everything, which is fine. If it's unmarked, the code has essentially shot itself and everything quietly goes to hell since you've lied to the register allocator and you shouldn't do that. Which is fine too. Don't Do That. If I read the above correctly, Dan is advocating the strongest restriction of all, which is to specify any arcs that might be added. That is, specify all sub_i - sub_j, where - means that a continuation saved in sub_j is invoked by sub_i. To reclassify the reasonable solutions: 1a. Concentrate on restoring registers after calling -- probably using the lexicals. 1b. Possibly increase the size of the register set, and/or try to use lexicals or globals (I think of this as a kind of improved spilling).. 2a. Insert all the arcs into the CFG, which will increase spilling, but 2b. Reduce the number of arcs in the CFG, by introducing various annotations on subs indicating when they do or do not save or invoke a continuation. Especially target library functions. Currently, I'm trying to work on 2a right now, but my day job is making that tough lately. Would like to see more attention payed to 2b, by those who care about the issue. In particular I'd like to see HLL designers say, Yeah, 2b looks like a great idea!, and maybe a few, yeah, let's specifically implement the following... Then, there are additional problems to contend with... On Wed, 1 Dec 2004 09:49:34 -0800, Jeff Clites [EMAIL PROTECTED] wrote: But so it sounds like I and N registers are a bit of a waste then, no? They won't really be used. Even in your my int @foo case, you could have an optimized backing store for the array, but you'd have to on-the-fly create a PMC to hold any values you pulled out of the array, e.g. for a subsequent x = @foo[1] (in Perl6 syntax)--x would have to hold a full PMC there, it seems, so nothing there would end up in an I register (except as in intermediate in immediately creating the PMC). [snippage of examples] If I understand right, PerlArray's would be used for my int @foo. If you want to use these values, you have to copy back and forth from I* registers, for the most part. The ISN registers can be used in the nether regions, between sub calls (even the lower half of the registers can be used there). Subs which are garanteed not to call or save continuations are safe (see 2b above), and ISN registers can be used while crossing those. However, we now know that only P registers can cross unsafe subs, unless ... well, there's probably certain cases where ISN's can be used. That question would take more thought. My sense is that we want to support continuations, but we don't want to be crushed under the heavy load. Perhaps we can adapt the policy that if one is brazen enough to use full continuations, then s/he should be expected to at least inform the compiler about it. Bill
Re: continuation enhanced arcs
On Tue, 30 Nov 2004 14:45:39 -0500, Dan Sugalski [EMAIL PROTECTED] wrote: At 11:20 AM -0800 11/30/04, Jeff Clites wrote: % cat continuation6.ruby def strange callcc {|continuation| $saved = continuation} end def outer a = 0 strange() a = a + 1 print a = , a, \n end Through the joys of reference types, a will continue to increase forevermore, assuming the compiler hasn't incorrectly put a in an int register. (Which'd be wrong) I can see that there is true magic in the power of using references in this way. Nonetheless, how can the compiler figure out that it can't use an integer here? The compiler should use integers when it can, but it sounds like you are saying that when a variable crosses a sub call (which might invoke a continuation) it will then have to be a PMC or String to do the right thing. Remember the PMC and string registers hold pointers to pmc/string structures, which is all we're preserving -- the *pointer* to the structure, not the contents of the structure. (Though that's an interesting thing to do for other reasons, like, say, transactions, we're not going to be doing that) The contents can change over and over without the register itself ever changing. # these two lines are main outer() $saved.call Bill JEff -- Dan
Re: continuation enhanced arcs
Hello All, I haven't been able to devote much time lately, but this week I hope to modify the register allocator to deal with these arcs, as well as enhancements to the spill code. I expect that in some cases it will have a definite impact on speed, and others it won't. Most of the test cases (make test) don't have that many variables, for instance. I hope to get this patch out in about a week. There has been much discussion about how these arcs will impact the performance, complexity, etc. What we need are some hard numbers, so as to compare results. Perhaps Leo can impliment his architecture, and compare to this. Really, I think that Leo's idea is a form of spilling in itself, but without the extra load/store instructions. That kind of spilling is much better. HLL compilers can optimize this further, by adding annotations to subs, saying no full continuations called by any subroutines (actually, only one subroutine calling a full continuation would not impact the register allocator). Analysis could also be helpful in certain cases. Bill
Re: Lexicals, continuations, and register allocation
On Wed, 24 Nov 2004 09:39:27 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Bill Coffman wrote: On Tue, 23 Nov 2004 23:26:39 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Keep in mind that you don't actually have to add all those CFG edges. You already know precisely the effects of adding them. All non-volatile symbols (those crossing subs that might make continuation invocations) are garanteed to interfere. This is garanteed to be a clique in the interference graph. No need to actually add the CFG edges to know how the interference graph is effected. Possible, but just another special cased exception. With that you get two possible interferences of different kinds, with additional coding overhead ... Instead of n*(n-1) arcs for n continuation passing subs (both to and from), we can simply add a pseudo instruction. All n sub calls have arcs to that pseudo-instruction, and out from it, to the instruction after the sub call. This means only 2*n arcs (both to and from). Each continuation calling sub becomes a kind of wormhole. Any symbol crossing it, also crosses the pseudo-instruction. What makes it a little complicated is how do these ubiquetous symbols interact with the non-ubuiquitous? Those arcs are needed for this. I've no confirmation of a compiler writer that its possible. Annotating PIR can only work for nested closures. If libraries are involved you are out of luck. It was Larry Wall who suggested the pragma in the long thread, Continuations, basic blocks, loops and register allocation . Although his message is somewhat genaral, he indicates a prefernce for pragmas to turn off certain features that slow things down for correctness. [gmail works great for finding stuff like that] Bill
Re: [perl #32418] Re: [PATCH] Register allocation patch - scales better to more symbols
Wait, I just thought of a huge change. Dan, Does the patch you have implement Leo's U_NON_VOLATILE patch? If so, that restricts from 32 to 16 registers, in various cases for *non-volatile* symbols (did I get that right?). Anyway, the symbols that cross sub calls can only use 16 registers, where they used 32 before. That could have a huge effect in that more variables are spilling. Well, if you don't have that patch, then back to the drawing board. ~Bill On Tue, 23 Nov 2004 11:55:47 -0500, Dan Sugalski [EMAIL PROTECTED] wrote: At 5:40 PM +0100 11/23/04, Leopold Toetsch wrote: *But*, I've looked again at the new reg_alloc.c code. It seems to have a piece of code with qubic order in registers, which is for sure killing all performance advantage it has for a few hundreds of symbols. So the scales better to more symbols has some limits when more reaches 10K ;) I'll hold off then. I can't picture anything that -O3 could do that wouldn't get swamped by a cubic time algorithm. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Lexicals, continuations, and register allocation
On Tue, 23 Nov 2004 23:26:39 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Matt Fowles wrote: Have we seen that this actually destroys us? Meaning, if we add the extra CFG arcs, do we start spilling like mad? If not, this is much ado about nothing. Please first have a look at Dan's recent posting about Evil Sub. Then estimate, how many subs may be called in 14000 basic blocks. For 100 subs only you get ~1 more edges ... Second, having this arcs means just that in that range i.e. from the first subroutine to the last, you can't reuse a register around a call. Question: how many from ~23000 registers might be effected. Please note the currently rather low spill count despite the huge register usage. Keep in mind that you don't actually have to add all those CFG edges. You already know precisely the effects of adding them. All non-volatile symbols (those crossing subs that might make continuation invocations) are garanteed to interfere. This is garanteed to be a clique in the interference graph. No need to actually add the CFG edges to know how the interference graph is effected. This brings up the possibility of non-volatile symbols that cross subs that are garanteed *not* to invoke continuations. Those would not necessarily be part of the clique, but would be non-volatile (forced into the upper half of registers). The register allocator could handle that the way it currently does (the one I'm currently working on, that is). It bothers me that this discussion is not including the concept of subs that don't call/invoke continuations. Remember the previous posts about adding a label, or setting a pragma? Also, if that sub is defined in the same imc source code, we can analyze it to see if there is a continuation there or not. Another interesting thing about this problem is that these new CFG edges are rarely, or at least with low probability, ever travelled. That can also be used to our advantage I think. ~Bill
Re: Lexicals, continuations, and register allocation
On Wed, 24 Nov 2004 04:55:24 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Matt Fowles [EMAIL PROTECTED] wrote: ... However, if a continuation restores registers to the data they had when the continuation was taken, then all of the registers will contain the things that exactly as the original allocator expects them. Yes. We had that scheme until the indirect register frame addressing. It was: savetop invokecc restoretop It was too slow - remember a factor of 5! I think the problem was that you were saving 32*4 = 128 registers! Even if you only used 5 or 10. Thats a factor of ten right there, I bet. The right way to have done this was savetop(n1,n2,n3,n4) invokecc restoretop(n1,n2,n3,n4) Where n1..n4 are the number of registers to be saved, or better yet, the length of the register array to save. That way, only the critical registers would be saved, and that could be minimized by the allocator. Then your memcpy's would have been optimized, but saving 600 bytes of unused territory was never going to work. Bill leo
continuation enhanced arcs
Hello all, I would like to address the problem of how continuations have affected the register allocator. This problem came to the public eye, when two tests that the new register allocator failed. Leo's analysis was essentially that the allocator was fine, but that our handling of continuations was incomplete, as far control flow and regiester allocation went. First, I think that it is important to acknowledge what is actually agreed upon. Exceptions are handled as a special case of continuations, and as such, this is a subproblem of the general continuations problem. Leo's proposed solution, below seems to have been accepted as workable, correct, and fast enough. Quoth [EMAIL PROTECTED]: 1a) for exceptions: set_eh handler, catch_label This is just a small adaption of the sequence of installing an exception handler. It depends a bit, if exception handlers are inline or nested closures or both. Second, is the larger, *unresolved* problem of continuations in general. Bear with me, as I am just barely beginning to understand continuations myself. Comments on my observations are more than welcome. Especially on my mistakes. Continuations are a kind of goto. When you invoke a continuation, it moves the program counter to where the continuation was taken, and restore most state (but not registers). This scheme has provided a wealth of power, with acceptable performance for a variety of circumstances. One neglected consideration, when implementing this strategy, is the effect on the control flow graph. sub1() ---+ -+ ... || sub2() +-+ | ...| | sub3() ---+-+ In the continuations enhanced control flow graph (the control flow graph (cfg), after one considers the effects of continuations), it is possible for any subroutine to jump back to the exit point of any previous subroutine. Leo has drawn the picture a few times, as above, showing the complex web of links from any sub to any sub. It is the control flow graph which provides the register allocator with the information it nees to determines which variables interfere. They interfere if they could be active at the same time. If they interfere, they must be assigned different registers. Now, consider a slightly different topic: According to recent decisions, the lower half of the registers (0-15) for all types, will not survive a subroutine call. This has created what we can call volatile, and non-volatile symbols (variables). Non-volatile symbols do not cross subroutine calls, and can therefore be put in the lower register half. Volatile symbols cross sub calls, and cannot be put in the lower half. (Think the volatile keyword from C, which means don't mess with this value, well it sort of means that.) Here's the funny thing -- in the new continuations enhanced flowgraph, every volatile symbol in a subroutine will interfere with every other volatile symbol. Here's an informal proof. Remember all those arcs that Leo drew? For instance, if v1 crosses sub1() and v2 crosses sub2(), there is an arc from sub2() back to sub1() [wolog, sub1 is first]. So v1 is live when v2 is live, and the two symbols interfere. All those arcs extend the life range of each symbol to that extent. Therefore, all volatile symbols in the sub will interfere. What this means is that if your subroutine has more than 16 volatile symbols of a given kind, there will be spilling. Here's a list of proposals I have seen to deal with that issue: 1) Let the above stand (I think I was the only one who said that, and that was before I had really thought about it). 2) Let the above stand, but allow for a pragma to turn off the arcs. Thus, I can create a range within the sub, that turns off all those pesky continuation enhanced arcs within that range. HLL compiler writers will be able to set this pragma, knowing that none of it's subs use continuations. (Larry Wall's suggestion.) 3) Specifically a label for each subroutine that *might* set a continuation. Others have said that continuations should not be the common case, if a continuation could be called, then labelling it as RESUMABLE, isn't such a bad idea. (Leo's idea.) 3a) The natural extension of that is to also have label, like INVOKEABLE, for subs that might invoke a continuation. 4) Fetch all from lexicals/globals after a function call. (Another of Leo's.) 4a) Use lexicals instead of registers. (Not sure who mentioned it, but it seems like just spilling everything, using a slightly different spilling scheme.) 4b) Conditionally fetch ... (Ben's suggestion is one that deserves some further thought. Leo doesn't think it could work, and he understands the problem better than I, but his answer is not a difinitive impossible, so that gives some hope.) 5) Let the symbols interfere. The chances are that the program will run correctly anyway. Let the register allocator do it's thing, and every
Re: silent effects of opcodes
On Thu, 18 Nov 2004 08:13:02 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Bill Coffman [EMAIL PROTECTED] wrote: ps: I'm making progress on grokking the cfg and register renaming stuff. Will let you know. This needs an SSA graph of the data flow? Static Single-Assignment (I had to look it up). My approach is to consider something you once said, which was to the effect of ensuring the accuracy of the analysis. I've got some code to do a fairly simple depth first search per symbol. It is slower (I presume), but easier to understand. Later, I plan to (or someone else could) implement the data flow stuff with bit vectors and SSA to determine reachability, etc. In this way, I can compare the simple analysis with the complex analysis, and convince myself, and anyone else that the code is correct. All this will be done even before verifying the program runs correctly (passing make test). Advanced_Compiler_Design__Implementation by Muchnick is my main resource. Register renaming is easy enough when you have DU (def-use (def means value is written, use means value read) and UD chains, and especifically webs (which connect defs to uses). Webs are components of def-use chains (using graph theory language). They are effectively different variables, and can be provided with separate symbol names, which is actually very similar to what is done during a spill. My depth first search algorithm finds these too. Like I said, I wanted to start with something simple that is correct, and move forward with the more optimized forms later. Another thing I'd like to do, is throw in is a randomizer, to change the way the allocator assigns registers. Considering all the cruft that was uncovered when the algorithm was changed, it might be a good idea to have a debug feature that selects registers differently each time it runs (each allocation should be valid, of course). This could help flesh out any other faulty assumptions in the parrot compiler. BTW, looking at analyse_life_block() it seems that this allocates unconditionally a Life_range structure. As these are O2 in (n_symbols * n_basic_blocks) we could safe huge amounts of memory, by defining that a missing life block propagates the previous one. Dunno if its possible, though. That's definitely not very efficient. Plus it's mallocing each Life_Range, which is usually at least another 8 bytes per chunk, and then a lot of potential thrashing when freed. It seems I've seen a lot of less than optimum code like this, and I suspect there's a lot more like it. Solving these things will bring me that much closer to the ultimate goal of defeating Dan's evil code. ;) leo Bill
Re: silent effects of opcodes
On Wed, 17 Nov 2004 14:14:18 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: I've now (locally here) extended Bill Coffman's register allocator by one subroutine that actually decides to use non-volatiles or volatiles according to pdd03. All variables that are live around a subroutine call are e.g. allocated from R16..R31. Interesting. I'd like to see it. Regarding pdd03, I am still not clear how it should be interpreted. The current pdd03, as well as the previous one, both seem to indicate that registers 0-15 are likely to be overwritten, and anyone making a call, should save those registers if they still want them. The issue with PIR Code, is that the author won't know which of their symbols are mapping to registers about to be killed. So, as previously discussed, those registers will have to be hands off for the register allocator. That is essentially how the old and new alloctor have been working. But this doesn't have to always be the case. * If the subroutine being allocated is a leaf (with no sub calls), then all registers should be available. * Registers P4, S1-S4, N0-N4 are free for allocation, regardless. * It seems like it would be simple enough to provide a compiler hint, to let the allocator know if the subs it calls are using the parrot convention or not, or how many of the R5-R15 it will need. From this hint, a bit mask saying which registers are available could be constructed. This can then be used as part of a static analysis, and can be incorporated into the unit data structure, or passed as a separate parameter to imc_reg_alloc(). I wouldn't think this last idea would be considered a change to the calling convention, but rather as an optional optimization prototype. Not part of pasm. Dan, would something like this be allowed? ~Bill
Re: silent effects of opcodes
So to generalize. The following registers are available, under the following conditions: * Registers R16-R31 are always available for the allocator. * Registers R0-R15 are available between sub calls. That is, for any symbol, whose life range does not cross a subroutine. (This implies that all registers are available if no subs are called.) Since we have no way to determine if a sub is using those or not, any sub call will be assumed to possibly use R0-R15. Furthermore, even though we know there are certain registers in that range, which are unused by the calling convention, we will still not use them through a sub call for security reasons. * [NEW] If register 15 or below is used, it should be cleared out, ZEROED, after it's last use and before the next sub call. This is for security reasons. Obviously, these registers will not be the first choice to use. * Availability of these registers is subject to the rules for using the Parrot opcode Csetp_ind Ix, Py, which were (are being?) worked through by Leo. Other observations: * Leo introduced a flag on the symbol, to indicate if it's volatile or not. These will be eligible for R0-R15 (volitile registers?). * From new allocator bugs, and analysis, we've discovered that exceptions cause new control flow edges, not previously considerd. This case is being reworked by Leo? to provide missing CFG edges, through a minor change in the try block declaration. (thread Continuations, basic blocks, loops and register allocation) * The case of continuations has not been solved with respect to register alloction. Leo's RESUMEABLE: label might provide help here. In any case, we can expect to see some additional edges being inserted though. (also thread Continuations, basic blocks, loops and register allocation) Did I miss anything?
Re: silent effects of opcodes
* [NEW] If register 15 or below is used, it should be cleared out, ZEROED, after it's last use and before the next sub call. This is for security reasons. Obviously, these registers will not be the first choice to use. Nope -- this isn't the job of the register allocator. We aren't leaving security issues up to bytecode except in a very few, limited cases. (All involving subroutines with elevated security credentials which the sub needs to drop after using things they allow) Okay, looks like I misread an earlier message of Dan's. The reason that we cannot use R0-R15 through a sub, is that they are shredded. The values are not preserved through the sub call. Since I understand the item about allocating registers between sub calls, I can probably implement that change, as I work through the control flow/data flow analysis. Sounds like everything else is okay. We're just missing a few CFG arcs from the continuations stuff, which I'll let you all worry about. :) Bill ps: I'm making progress on grokking the cfg and register renaming stuff. Will let you know.
Re: IEEE 754 double floats
Another way is to count the bits, as in the following: .sub _main N1 = 1 N2 = 0.5 I0 = 0 REPEAT: I0 = I0 + 1 N2 = N2 / 2 N3 = N1 + N2 ne N1, N3, REPEAT print I0 print bits precision\n end .end On Wed, 17 Nov 2004 01:13:21 +1300, Adam Warner [EMAIL PROTECTED] wrote: Hi Leopold Toetsch, PI is (very) approximately: 3.1415906535896946927266526472521945834159851074218750 ^ 3.141592653589793238462643383279502884197169399375105820974944 You might probably want to run more iterations ;) And you'll never get 60 digits out of long doubles. I appreciate that Leopold. I thought the zeros indicated where the stored value cuts off. But with 52 significant figures at ~3.3 bits per figure that's about 172 bits, which indicates they would have to be 256 bit floats when including the exponent (my mistake in calling them 128 bit)! In other words sprintf is printing trailing garbage: .sub _main set N1, 2.0 set N2, 3.0 div N1, N1, N2 new P0, .PerlArray set P0, 1 set P0[0], N1 sprintf S0, %.60f\n, P0 print S0 end .end On the CVS version I compiled this prints: 0.2965923251249478198587894439697265625000 On this win32 release (running on a different machine): http://www.jwcs.net/developers/perl/pow/download/pow-0.1.1-release.zip parrot prints: 0.3000 Regards, Adam
Re: Another issue with pdd03
PDD03: Responsibility for environment preservation PDD03: PDD03: The caller is responsible for preserving any environment it is interested PDD03: in keeping. This includes any and all registers, lexical scoping and PDD03: scratchpads, opcode libraries, and so forth. PDD03: PDD03: Use of the savetop opcode is recommended if the caller wishes to save PDD03: everything, and the restoretop opcode to restore everything savetop saved. PDD03: This saves off the top 16 of each register type, leaving the bottom 16 PDD03: registers, which generally contain the return values, intact. The way I read it, paragraph one implies that when you print P5 after calling foo(), you are expecting to get the return value. You didn't save and restore register P5, so you wanted foo() to do something to it. The above docs may be slightly ambiguous. The first paragraph says that you have to save everything. The seconds says that savetop/restoretop commands are there to help the user, by allowing them to take care of registers 16-31, but what about 0-15? Is there an implication that you don't have to save those? Based on the way people are coding IMC, that seems to be the case. Both the old and new register allocator are kind of hands off the first 16 registers. Recall that I was asking about this some time ago, and found that if the allocator tries to use the bottom half first, chaos ensues. Our convention then ... our interpretation of the calling convention, is that we try to allocate only the top 16 registers. If more are needed, the new allocator starts from R15, and go on downward. The old one just went up from R0. Note that PDD03 is specifying a dynamic calling convention. Because of this, the register allocator cannot use the convention to find what should be saved, and what it can use. If this were indicated at compile time, the register allocator could keep hands off only the used registers. As it is, we may need to do as you suggest, and leave the bottom 16 registers for parameter passing, and the top 16 for local symbols and register allocation. If so, we will obviously need a better register allocator. One exception to this. If a sub calls no other subs, it can use all 32 registers. Also, P4, S1-S4, N0-N4 seem to be free. ~Bill On Sun, 14 Nov 2004 18:32:50 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: As outlined in the analysis of dumper.t failures with the new register allocator, we have another problem with current calling or better return conventions. Given this simple program: $ cat ret.imc .sub main @MAIN P5 = new PerlString P5 = ok\n foo() print P5 .end .sub foo .local pmc ok ok = new PerlString ok = bug\n .return(ok) .end $ parrot ret.imc bug The usage of the register P5 in main isn't invalid: the register allocator might have (in a more complex program) just not found any other free register. And as the main program just calls a function in void context the register P5 could have been used. Defining now that P5 has to be preserved in main, because it's a possible return result of foo() and therefore may be clobbered by foo() is meaning, that we have effectively just 16 registers per kind available for allocation around a function call. If the latter is true according to current pdd03 then we are wasting half of our registers for a very doubtful advantage: being able to pass return values in R5..R15. leo
Re: Continuations, basic blocks, loops and register allocation
On Sun, 14 Nov 2004 17:03:33 -0500, Dan Sugalski [EMAIL PROTECTED] wrote: At 5:53 PM +0100 11/13/04, Leopold Toetsch wrote: As the analysis of test errors of the new reigster allocator has shown, we have a problem WRT register allocation. This problem isn't new, but as the allocator is more efficiently reusing registers (or reusing them in a different way) it's exposed again. We don't really have that much of a problem. What we have is just something more simple -- the target of a continuation marks the start of a basic block. That means that we have to assume everything we don't get handed back from the function's dirty and should be refetched. I tend to agree that this is not such a problem. Basically, Parrot has various control flow possibilities that were not previously considered. (1) An exception can happen in a try block, so CFG arcs need to be added from statements within the try block to the catch. (2) Continuations, which I don't pretend to understand, can also flow in previously unconsidered directions. Those arcs need to be added to the CFG. Alternately, for exceptions, it may be possible to circumvent that process, and just add symbolic interfenence to all vars in the try block with all vars in the catch block. For continuations, however, it seems like those really are control flow arcs that need to be added. If analysis can trim a few of them, then that would be great, but if people are using continuations, maybe they shouldn't expect high performance, and extra register spilling may be another side effect. ~Bill
Re: register allocation
Interesting. I did a little more research, and checked out a fresh CVS tree. I did nothing more than insert a conflict checker into reg_alloc.c That is, whatever pre-colored registers have been assigned to symbols, I verify that none of them are conflicting. Guess what I got? make[1]: Leaving directory `/usr/home/coffman/dev/parrot/parrot/docs' ./parrot -o runtime/parrot/library/Data/Dumper/Base.pbc runtime/parrot/library/Data/Dumper/Base.imc ** added new here node 1 = defname(S) is colored 5 and neighbor 9 = S5(S) is colored 5 parrot: imcc/reg_alloc.c:200: imc_reg_alloc: Assertion `r-color==-1 || r-color != unit-reglist[y]-color' failed. make: *** [runtime/parrot/library/Data/Dumper/Base.pbc] Aborted The routine you mentioned, allocate_wanted_regs() doesn't assign wanted regs if in conflict, but as far as I can tell, there's no check on the integrity of precolored nodes. Not quite sure what the correct behaviour is for allocator, when it's given conflicts. I saw 'K' compounded key somewhere, but don't know what it is. I know reg_alloc.c pretty well though. Offhand I'd say it's not handled in reg_alloc. Maybe it should be? ~Bill On Fri, 12 Nov 2004 10:26:51 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Bill Coffman wrote: Hello, * I have the below failed tests. I haven't looked into these. Can someone tell me if the tests are broken, or is my allocator broken. I know they don't fail for the current cvs code (as of yesterday). Failed TestStat Wstat Total Fail Failed List of Failed --- t/library/dumper.t 13 332813 13 100.00% 1-13 t/op/gc.t 1 256181 5.56% 13 t/pmc/sub.t 1 256781 1.28% 78 These errors look remarkable the same, when I turn on OPT_SUB that is, when allocate_wanted_regs() is used. And this code did miss registers sets like 'K' (compound keys). Changing imcc/reg_alloc.c:890 to: if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set == 0)) did fix this flaw. leo
Re: register allocation
These errors look remarkable the same, when I turn on OPT_SUB that is, when allocate_wanted_regs() is used. And this code did miss registers sets like 'K' (compound keys). Changing imcc/reg_alloc.c:890 to: if (r-color = 0 || r-want_regno == -1 || strchr(ISPN, r-set == 0)) did fix this flaw. leo You're patch looks suspiciously like a typo. strchr(ISPN, r-set == 0)) will always returns true if r-set != 0. This means that the patch effectively disables allocate_wanted_regs(), unless there are certain cases where r-set==0, which I doubt. Try sticking this toward the beginning of imc_reg_alloc(), but after build_interference_graph, like imcc/reg_alloc.c:187 (or so) #ifndef NDEBUG { int x,y; for (x = 0; x unit-n_symbols; x++) { SymReg* r = unit-reglist[x]; for (y = 0; y unit-n_symbols; y++) { if (ig_test(x, y, unit-n_symbols, unit-interference_graph)) { if(r-color!=-1 r-color == unit-reglist[y]-color) fprintf(stderr,node %d = %s(%c) is colored %d and neighbor %d = %s(%c) is colored %d\n, x,r-name,r-set,r-color, y,unit-reglist[y]-name,unit-reglist[y]-set, unit-reglist[y]-color); assert(r-color==-1 || r-color != unit-reglist[y]-color); } } } } #endif And before ... while (todo) {
Re: Register allocation/spilling and register volitility
On Mon, 8 Nov 2004 12:20:19 +0100, Leopold Toetsch [EMAIL PROTECTED] wrote: Jeff Clites [EMAIL PROTECTED] wrote: On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote: If frames aren't adjacent, normal argument copying can be done anyway. This would seem to require the same types of runtime checks that you are objecting to below, Not runtime. The register allocator knows the amount of needed non-volatiles and volatile registers. Depending on that a call can place arguments directly into the incoming regs of the caller or not. Then if at runtime copying is needed (because e.g. frames aren't adjacent), the function arguments are copyied. This is fully transparent. I discussed that in item (1) under Further notes. Yep, saw that then ;) It's not needed. I've a better scheme in mind, which addressess efficieny as well as argument passing. And spilling? Well, I'm proposing a variable-sized register frame. With very little additions we could run with more then 32 registers per kind (there are a few bitmasks currently that would need adaptions, but not much). I think this is a great idea. Mostly, I was thinking of the case where there are less than 32 registers needed. The allocator can try to reduce the number of registers that will need to be used. I think that addressing could become an issue is there are too many registers, but I still haven't figured out how the bytecode looks yet. But basically, spilling should not be needed at all, if the register allocator isn't as broken as it now is. Dan got some really evil subroutines, so we have RL test cases. We'll see. In that case, I'll focus on the register renaming, live-range analysis, etc. The memory leak, which might not be actually lost memory, but just wasteful useage of memory, is pretty much related to the spill code. Most of the grotesqueness of the allocator is in the spiller. If this has a high probability of changing, I'll focus on other stuff, like register renaming, which is a lot harder, but with higher longterm payoff. In light of this, I think I'll send in my patch, in case it is helpful. The problem is that some routine outside of imc_reg_alloc() is sending in conflicting register preallocation. My code does lot's of assertions, and finds this, one way or another. I'll put in a variable to turn off color consistency checking, which will probably then pass the tests. Bill JEff leo
register allocation
Hello, I have addressed almost all the code issues from the last patch. That broke some things, which I had to fix, but they're fixed. A couple other improvements too. Here's the state of things: * I have the below failed tests. I haven't looked into these. Can someone tell me if the tests are broken, or is my allocator broken. I know they don't fail for the current cvs code (as of yesterday). Failed TestStat Wstat Total Fail Failed List of Failed --- t/library/dumper.t 13 332813 13 100.00% 1-13 t/op/gc.t 1 256181 5.56% 13 t/pmc/sub.t 1 256781 1.28% 78 4 tests and 59 subtests skipped. Failed 3/124 test scripts, 97.58% okay. 15/1975 subtests failed, 99.24% okay. * Still have a memory leak that blocks Dan's evil code from compiling. * Run time isn't so great, with 10x gcc -O2, for 200 symbols, and 100x for 500 syms. Below, is for 1000 syms. Still, this is much better than the previous allocator, which crashed my workstation around 130 symbols. #vars gcc parrot 198 6.84 66.88 199 7.29 82.03 200 9.86 109.86 500 0.74 125.39 501 0.68 99.53 999 1.68 755.30 1000 2.06 988.55 1001 1.79 723.25 * Considerable improvements are on the way. Some are fairly low hanging fruit, but my biggest worry is not to break things along the way, which is all too easy to do. ~Bill
Fwd: register allocation
Trying again... -- Forwarded message -- From: Bill Coffman [EMAIL PROTECTED] Date: Thu, 11 Nov 2004 14:05:29 -0800 Subject: register allocation To: Perl 6 Internals [EMAIL PROTECTED] Hello, I have addressed almost all the code issues from the last patch. That broke some things, which I had to fix, but they're fixed. A couple other improvements too. Here's the state of things: * I have the below failed tests. I haven't looked into these. Can someone tell me if the tests are broken, or is my allocator broken. I know they don't fail for the current cvs code (as of yesterday). Failed TestStat Wstat Total Fail Failed List of Failed --- t/library/dumper.t 13 332813 13 100.00% 1-13 t/op/gc.t 1 256181 5.56% 13 t/pmc/sub.t 1 256781 1.28% 78 4 tests and 59 subtests skipped. Failed 3/124 test scripts, 97.58% okay. 15/1975 subtests failed, 99.24% okay. * Still have a memory leak that blocks Dan's evil code from compiling. * Run time isn't so great, with 10x gcc -O2, for 200 symbols, and 100x for 500 syms. Below, is for 1000 syms. Still, this is much better than the previous allocator, which crashed my workstation around 130 symbols. #vars gcc parrot 198 6.84 66.88 199 7.29 82.03 200 9.86 109.86 500 0.74 125.39 501 0.68 99.53 999 1.68 755.30 1000 2.06 988.55 1001 1.79 723.25 * Considerable improvements are on the way. Some are fairly low hanging fruit, but my biggest worry is not to break things along the way, which is all too easy to do. ~Bill
Re: Solicitation of Ideas for Performance Statistics and Graphs
On Wed, 3 Nov 2004 12:52:26 -0800 (PST), Joshua Gatcomb [EMAIL PROTECTED] wrote: [snip] What we would like to do is determine if what we have done so far is sufficient or, if not, what specifically people would like to see. Some of our unimplemented ideas so far are: 1. Include the computed goto core 2. Summary of results over N week(s)/month(s) 3. Provide user form for dynamic results If people would like this, they also need to indicate what the form should provide: (benchmark name, date, executable, etc) 4. Provide HTML table of data for some/all of graphs 5. Provide links for people to work locally A. A db schema/structure dump so people can collect statistics on other architectures B. Source code C. daily db dump If you would like to see any of these ideas implemented, or you have some of your own - please respond to this on the list. [snip] I think this is a really great idea. I made some picture graphs for the register allocation stuff myself, and tried to evangelize the idea a little, but you guys have introduced a useful tool that can help developers see the results of their work. Now, one can anxiously check the results after a checkin and see if a desired speedup has occurred or not. Great job. To make this idea, and philosophy, pay off, we need (35B)++. I think what you're suggesting is to provide the software to others, so they can run their own tests. That's great, but I'd rather create the test, check it in, and modify a master config file, that says which tests will run. The administrator, of course would have to approve the patch. For long tests, they don't have to be in the nightly test suit, but could be run by name, by the user. For number 4, I'd say that a comma separated values (CSV) file would be most useful. Then users can view and manipulate the data themselves. Various spreadsheets have nice capabilities for this, and of course certain scripting languages can parse this kind of data well. So I hear, at least :). Once again, great idea, great job! ~Bill
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
Hello, Just getting back to parrot. [Leo]* alloca() isn't portable and not available everywhere [Dan] Yep. This is a gcc-ism. Use Parrot's memory allocation functions instead. reg_alloc.c uses malloc(). I saw alloca() used in the CVS tree earlier, but it's gone now. What is the correct way to allocate memory in Parrot? [Leo]* the Dbg and Dbg2 debug macros aren't needed. Just use the existing debug(interp, level, ...) function in src/debug.c. If you need some extra levels, you can use some more bits in imcc/debug.h I notice an IMC_TRACE macro, as well as the associated debug stuff. Debug looks like it has a lot of overhead, even when turned off. I didn't really get Dbg cleaned up either, because I couldn't figure out how to handle variable number of arguments, but something like assert() would be much nicer. I'll go ahead and follow the convention of adding debug calls. Should the IMC_TRACE stuff be removed? [Leo] * all functions should have an Interp* and a IMC_Unit* argument to allow [Leo] reentrancy. I.e. all state should be in the unit structure. I'm not quite sure I understand this. I get that each function call has (interpreter,unit) as parameters. But why is this? Parrot handles interrupts, I understand, so this probably has something to do with interrupt processing. Does this also mean that there should be no other data structures, except those pointed to by interpreter or unit? [Dan] I'd like the code issues cleaned up before it gets committed. I want to fix the memory leak, which involves rewriting a few functions. This might go fast, or it could become difficult. It's probably best to send an intermediate patch that has the code issues fixed. That way we can maximize code review, and make sure I'm writing parrot compliant code. - Bill
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
Sounds like the memory leak. Let me try to fix this, and address the other issues. I'll get back to you. Thanks, -Bill On Fri, 29 Oct 2004 10:15:34 -0400, Dan Sugalski [EMAIL PROTECTED] wrote: At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote: Bill Coffman (via RT) wrote: Patch does the following: - Applied Matula/Chaitin/Briggs algorithm for register allocation. - Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up. Good. Hopefully Dan can provide some compile number compares. The numbers are... not good. I took one of the mid-sized programs and threw it at the new code. Parrot in CVS takes about 10 minutes to run through this program. The main sub's about 30Klines of code, and the stat from a parrot -v is: sub _MAIN: registers in .imc: I2875, N0, S868, P7615 0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch 0 used once deleted 0 invariants_moved registers needed:I2883, N0, S873, P7741 registers in .pasm: I31, N0, S31, P32 - 37 spilled 5845 basic_blocks, 47622 edges I applied the patch to a copy of parrot and ran it. After 37 minutes I killed the thing. It had 1.6G of RAM allocated at the time of death, too. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: register allocation questions
Hi all, Thanks for your continued comments. Btw, I usually read all the parrot list, so don't think I'm not paying attention. Currently, here's how the register allocator is doing. Failed TestStat Wstat Total Fail Failed List of Failed --- t/library/dumper.t5 1280135 38.46% 1-2 5 8 13 4 tests and 51 subtests skipped. Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay. I recall Leo, or someone, saying that the data dumper routines are not following the calling convention properly. So I've decided not to worry about it too much. It passes the other tests, plus the randomized tests that I created, up to 150 symbols. At that range, it still takes about 20x longer than g++ -O2, for equivalent programs to compile (see gen4.pl). Also, it is currently running about O(n^2) for n symbols, where the old one was running about O(n^3) from my analysis. The spill code is still very expensive, and has a large constant associate. I also have data, which is attached. The difference doesn't show up until a lot of spilling is going on, around 80 symbols or so. I've learned a lot about how the compiler works at this point, and I'd like to contribute more :) Would you like a patch? Should I fix the data dumper routines first? What is all this talk about deferred registers? What should I do next? Well, I'm making some comments on the below stuff. On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote: At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote: Dan Sugalski wrote: At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote: So, if you want that really super efficient, you would allocate registers around function calls directly to that wanted register number, which should be in the SymReg's want_regno. Yes, I think we are kind of doing this. It's best to pass the registers straight through though. Like when a variable will be used as a parameter, give it the appropriate reg num. Sort of outside the immediate scope of register coloring, but as I've learned, one must go a little beyond, to see the input and output for each sub. While true, in the general case leaving 0-15 as non-preferred registers will probably make things easier. Those registers, especially the PMC ones, are going to see a lot of thrash as function calls are made, and it'll probably be easier to have them as scratch registers. I guess I don't agree. I'd like to pack down the number of registers used to a minimum. Then when a function is called, only those needed registers are copied in/out. Don't think the functionality exists. But the idea is to have each sub declare how many registers to save/restore. This would then save 0-k such registers. Where k is the number of registers used by the sub. Pack 'em down, minimize the number needed. We can also minimize this number to match the physical architecture that parrot is running on (for an arch specific optimization). The imc_reg_alloc function does not have 32 hard coded in there (well a little bit, but can be easily changed). It's pretty dynamic. Yep, that's the easy part ;) OTOH when the register allocator is doing register renaming anyway, the most inner loop with a function call should get registers assigned already matching the calling convemtions. With more then one call at that loop level, you have to move around registers anyway. Yes, yes, renaming! I want to do register renaming! Oh, sure, but keeping your scratch PMCs out of the way makes life a lot easier for the register coloring algorithms. Might not be optimal, but if it makes life simpler to start, optimal can come later. p31 holds all the spill stuff. It's a pain. Maybe I'll move that around, but if p31 is used, it means that there is no more room for symbols, in at least one of the reg sets. [1] all except Dan's 6000 lines subroutines :) Did you start creating real subs for your code already? I wish. :( Unfortunately not, outside some simple stuff, and I doubt I will. The language just doesn't lend itself to that sort of thing. We're going to add actual real subroutines to the language after we roll out into production, but that doesn't help now, alas. Interesting. I'd like to test on something like that. Maybe SPEC99 as well. - Bill Coffman compile.dat Description: Binary data compile.plot Description: Binary data attachment: compile.png
Re: register allocation
When I cvs up'd, cleared and reConfigure'd I got these stats: Failed Test Stat Wstat Total Fail Failed List of Failed --- t/library/streams.t1 256211 4.76% 11 t/op/gc.t 1 256181 5.56% 13 4 tests and 66 subtests skipped. Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay. I'm curious to see how it behaves once the spilling gets up into the 1000+ symbol range. Dropping from cubic to quadratic time ought to make a not-insignificant change in the running time, even if that constant's pretty big. :) I think it's a bit more complicated. M=number lines in code, N=number variables. - Time = O(M^2+N^2) - old time = O(M^2 +N^3) Not quite sure of this either. But the N^3 eventually dominates, I think. The data seems to bear this out. There are more fixes I'd like to make as well. I spotted several things that could be fixed. And I think the spill code can be optimized a lot to reduce the big-O time as well. More statistics #vars v. time in seconds: #vars gcc parrot2 200 7.92 89.20 201 11.86 146.31 202 18.11 246.37 203 9.54 107.88 204 11.81 134.60 205 14.75 190.95 206 13.25 161.83 207 10.63 138.83 208 11.02 117.73 209 7.14 88.29 210 15.14 176.69 I am also running gen3.pl with 1000 vars. It's still on gcc. We'll see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight. Would you like a patch? Yes! Oh, yeah, definitely. [...] If you've got a patch, I'd be more than happy to give it a whirl, and I can likely get you a copy of the code in question to give a run on. Soon, I'll send one the proper way. The imc_reg_alloc function does not have 32 hard coded in there (well a little bit, but can be easily changed). It's pretty dynamic. By all means, go for it. I certainly don't want to curb your enthusiasm. It's the right thing to do, ultimately. I didn't want to presume on your time. Happy to have it, of course. :) Thanks. I've had a great time doing this. Remembering graph algorithms and compilers. Great fun! I'd also like to contribute to getting Parrot out there, sooner rather than later. So if I can help with that, I'd like to hear suggestions. -Bill
Re: register allocation questions
Thanks Matt, I hope I can help out. The patch I am submitting actually does simplify register coloring a bit. I've been waiting for perl6 with so much anticipation, I just couldn't stand it any more, and I had to participate. -Bill On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles [EMAIL PROTECTED] wrote: Bill~ I have to say that I am really impressed by all of the work that you are doing, and if you can make the internals of imcc a little more approachable, you would be doing a great service. Thanks, Matt
Re: C89
Thanks for the info... Apparently, gcc -ansi -pedantic is supposed to be ANSI C '89. Equiv to -std=c89. Also, my Configure.pl generated make file uses neither -ansi nor -pedantic. I do have access to a KR C v2, but it doesn't look like it's going to match the actual practice. Oh well. So long, as my code works, I'm happy. Incidentally, I tried adding -ansi and -pedantic and I got lots of warnings, like long long not supported by ANSI C'89, etc. (how can you do 64 bit ints then?). I also got errors that caused outright failure. Perhaps it's best to forget the whole C'89 thing. But maybe someone should remove that from the documentation? Just a thought. -Bill On Thu, 21 Oct 2004 22:41:36 -0700, Jeff Clites [EMAIL PROTECTED] wrote: On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote: At 11:25 AM -0700 10/21/04, Bill Coffman wrote: I read somewhere that the requirement for parrot code is that it should be compliant with the ANSI C'89 standard. Can someone point me to a description of the C89 spec, so I can make sure my reg_alloc.c patch is C89 compliant? I don't think the ANSI C89 spec is freely available, though I may be wrong. (Google didn't find it easily, but I don't always get along well with Google) If the patch builds without warning with parrot's standard switches then you should be OK. (ANSI C89 was the first big rev of C after the original KR C. If you've got the second edition or later of the KR C book, it uses the C89 spec) Also, if you're compiling with gcc, then you can pass -std=c89 to the compiler to enforce that particular standard. (Apparently--though I haven't tried it.) I believe -ansi does the same thing. JEff
register allocation questions
Hello All, I have been hard at work, trying to grok the reg_alloc.c code, and with some success. My code is assigning registers, so that none are conflicting (which I double-verify), and I'm getting to the end of make. Nonetheless, I don't get too far into make test, and I'm not quite sure why. Maybe some of you who understand this so much better than myself, can help. Here's some questions: 1) In the existing parrot code, when a register is assigned, it uses the following code: for (color = 0; color MAX_COLOR; color++) { int c = (color + MAX_COLOR/2) % MAX_COLOR; if (!colors[c]) { reglist[x]-color = c; debug(interpreter, DEBUG_IMC, #[%s] provisionally gets color [%d] (%d free colors, score %d)\n, reglist[x]-name, c, free_colors, reglist[x]-score); break; } } Thus, it seems to prefer to use register #16 and up, first, before it uses 0-15. This is mistifying to me, since it's not so trivial to code it this way, and yet I cannot figure out why this color preference. It seems like maybe there's a reason which I need to understand here. My new code doesn't do this behaviour. 2) When function imc_reg_alloc (the main register allocation thingy) is called, some of the variables have already expressed a preference as to which register they want. I understand that this can optimize certain parameter passing, etc. The question that arrises is, what if two conflicting variables want the same color (reg#)? Obviously, they don't get their way. But what are the consequences, and what must be done to fix this. 3) Failed test. Here's a test that I fail: ##PIR## .sub _main .local pmc array new array, .PerlArray push array, 0 push array, 1 _dumper( array, array ) end .end .include library/dumper.imc ERROR: Method 'dumpWithName' not found in file '(unknown file)' near line -1 4) When I change my code to start at reg#16, like in (1), the code of (3) produces the output: array = PerlArray (size:2) [ array\array[0], array\array[1] ] Where the array\array[0] should be just 0. Incidentally, reg allocation is done on the following subs: _main __library_dumper_onload _dumper _register_dumper _global_dumper __library_data_dumper_onload dumper __library_data_dumper_default_onload dumpWithName dumpCached dumpProperties dumpHash dumpStringEscaped pmcDefault pmcPMCArray pmcIntList pmcStringArray pmcPerlArray pmcPerlHash pmcPerlString pmcPerlInt pmcPerlNum pmcPerlUndef pmcSub pmcNull pmcOrderedHash pmcManagedStruct pmcUnManagedStruct pmcArray __library_data_dumper_base_onload prepare cache createIndent indent newIndent deleteIndent dump simple String which seems like a lot to me, but I guess it's compiling a lot of stuff. --- I might have some kind of bug, in allocating registers, but this seems strange, and there seems to be a little more going on than finding non-conflicting registers for variable names. The -d 008 output seems to produce pretty much the same thing, but just different registers. Maybe someone can point me at something to look at to fix this. I'll provide the patch if someone would like to play with it. Thanks to all who made it this far, Bill Coffman
C89
I read somewhere that the requirement for parrot code is that it should be compliant with the ANSI C'89 standard. Can someone point me to a description of the C89 spec, so I can make sure my reg_alloc.c patch is C89 compliant? Thanks, - Bill
Re: Pathological Register Allocation Test Generator
Leo, Thanks for your suggestions and comments. On Wed, 20 Oct 2004 10:35:04 +0200, Leopold Toetsch [EMAIL PROTECTED] wrote: Some remargs WRT gen{3,4}.pl: 1) While these programs exhibit some worst case register layout it's probably not a very typical layout. Agreed. The idea was to automate and compare to gcc. There are real-world tests already in the parrot test suite, but because I can generate millions of such cases, one hope is to detect errors, and to get some kind of performance metric, even if these programs are artificial. To get real metrics that people would pay attention to, we might implement SPEC99, etc. 2) RL programs have lexicals and globals access spread over the code like in the generated gen.imc 3) and that's intersparsed with highly local access to temporaries coming from expression evaluations. I think I could modify the tests so that the variables have a Poisson distribution, as far as their usage distance goes (max number of lines away from first use). This would cause some of them to look more like temporary variables, and be a somewhat better simulation of real programs. I'd change the simulation program to use PMCs to allow for 2). Now when it comes to spilling, these lexicals or globals don't need a new storage, their live range can just get discarded and at the next usage of this lexical or global it just can be refetched[1]. Implementing this should already vastly improve the register allocation. [1] The refetching can of course be into a different Parrot register. Thus the usage of globals or lexicals wouldn't interfer and generate huge live ranges over the whole function as it currently does. I don't quite understand this. You think I should create PMC variables, instead of integers? It's probably a good idea to have a mix of the four types I suppose. Once thing I can say is that the interference graph is pretty conservatively generated. When a variable is reassigned, it can be treated as a new variable which can reduce register pressure, but I don't think the code is doing that yet. It sounds like you're talking about implementing the interference graph better. I may get into that too, but for now, I want to get the rester coloring (mapping vars to registers) working better. Also, I'd like to be able to capture the effects of changes in metrics (via scripts like those I posted) so we can see if different ideas will actually improve the situation or not. I don't even have any runtime checks yet, but creating a score to see how much was spilled, could be helpful. That could be incorporated into the register allocator itself (which already is, if you run with -d 0008, then grep Spill). I don't know if we need some syntax to easily support lexicals or globals or if the register allocation can deduce that itself. But if a new syntax simplifies that, we put it in. My preference is to not distinguish between various types of variables, but to have the algorithms deal best with nodes based on the structure of the graph(s). For 3) there is already a separate pass in imcc/reg_alloc.c:allocate_non_interfering(). Yes. This function seems to find variables whose live ranges (life ranges) are restricted to a single basic block, and assign them reg 28,29, or 30. My preference is that low hanging fruit like temporary vars should be assigned by the algorithm. Such variables will tend not to have much interference with others, and are thus probably easier to color. My experience with coloring, is that assigning low degree (aka arity, #neighbors) nodes to fixed colors first, will actually reduce the performance of the graph coloring algorithm. It's usually best to color the easy stuff last. The algorithm I'd like to use is one that happens to work very well for interference graphs, and it is very simple. This algorithm selects a node that interferes with the fewest other variables (lowest degree), and removes it from the interference graph, and puts it into a stack. It then recalculates the lower degrees of each neighbor, and iteratively removes all nodes from the graph in this way, pushing them onto the stack. It then, pops off the nodes from the stack and colors them in lifo order, using the first available color. Nodes that need a color higher that 30, or whatever the max. turns out to be, are spilled. The spilled nodes are always in the densest part of the graph, where something has to spill. I believe this is essentially the algorithm Briggs/Chaitin used, and some variation of it has been used in most compilers since the 70's. One exception is that certain variables want certain colors (want_regno) -- those should be colored first, as the current code does. A disadvantage of the above algorithm is that it tends to randomly select which nodes get spilled, among the densest parts. Another optimization to the algorithm is to incorporate a score that causes the most frequently accessed nodes not to get spilled. (The
Re: Pathological Register Allocation Test Generator
Hello All, This is my first post to the parrot list, but I hope that many will follow. Thanks to all of you for working so dilligently on building this wonderful new toy for all us geeks to play with! I am currently working on a fix to the large subroutine register allocation bug, aka, massive spilling not yet implemented. The problem, is that the register allocation code is complex, and I'm not all that familiar with it, or even with working with compilers at the coding level. I am comming at the problem as a former graph theorist, who was a consultant to a compiler project (at UC Davis, in the late 90's where some of the work was applied toward what is the current Intel compiler). I talked a lot, and learned a lot, but didn't actually contribute any coding to the project. I also spent a lot of time coding graph coloring heuristics with C++. I expect to produce a patch soon, that will make register allocation better. To that end, I felt it was important to come up with some metrics. The first attached program (gen3.pl), which is based on Greg Purdy's gen-pra.pl, gathers data on compilation performance for varying numbers of variables and compares this to GCC. Mine also uses gnuplot to make pretty pictures. If everything is installed okay on your system, a graph will pop up displaying runtime results. I'm using Debian Linux, so it was pretty easy to set up. The second program, gen4.pl, is a randomized, automated, blackbox tester. It works similar to the above, but the corresponding parrot and GCC outputs are compared. And now, for the philosophical note ... One of the biggest wins of RISC over CISC was not that RISC was smaller, but that architects looked at actual code that would run on their machines. They removed an instruction from the instruction set if it wasn't used that frequently. They considered wheather to implement features in the hardware, or the compiler, and they tested each, to find the optimum balance. This approach may be helpful in building parrot as well, and it's what I've tried to do, in a very modest way, in my scripts. Thanks to Gregg Purdy for getting the ball rolling in this general direction. Thanks, Bill Coffman On Mon, 11 Oct 2004 09:54:31 -0400, Dan Sugalski [EMAIL PROTECTED] wrote: At 4:58 PM -0700 10/2/04, Gregor N. Purdy wrote: Dan et al. -- I made a new version of the script that creates gen.cpp and gen.imc (attached). You can run it like this: perl gen-pra.pl 1000 1 (for 1000 labels and 1 variables) and it will create equivalent gen.imc and gen.cpp files. You can test-compile them with these commands: g++ -c gen.cpp and imcc -o x.x gen.imc on my system, the g++ compiler does eventually finish, but the imcc compiler is eventually killed. Maybe this could be used to drive out the underlying problems that are keeping parrot from compiling Dan's really large subs? Mmmm, degenerate behaviour! Cool, and thanks. Should help judge how we're doing with the nastier code. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk #!/usr/bin/perl -w use strict; die Usage: $0 starting_num_variables ending_num_variables\n unless @ARGV == 2; my ($start,$stop) = @ARGV; open COMP, compile.dat or die $!; print COMP #vars gcc imc\n; for (my $vars=$start;$vars=$stop;$vars++) { my ($total_locals) = $vars; my $total_labels = int ($total_locals / 2); my $labels_so_far = 1; my $locals_so_far = 0; open IMC, gen.imc; open CPP, gen.cpp; print IMC .sub __MAIN\n\n; printf IMC \n_L_%d:\n, 1; print CPP #include stdio.h\n\n; print CPP int main(int argc, char * arg[])\n{\n; printf CPP \n_L_%d:\n, 1; my %action_table = qw(l label v variable a arithmetic c control p print); my @actions = (('l')x2, ('v')x4, ('a')x9, ('c')x3, ('p')x0); while ($labels_so_far $total_labels or $locals_so_far $total_locals) { my $action = $action_table{$actions[rand @actions]}; if ($action eq 'label' and $labels_so_far $total_labels) { my $this_label = ++$labels_so_far; printf IMC \n_L_%d:\n, $this_label; printf CPP \n_L_%d:\n, $this_label; } elsif ($action eq 'variable' and $locals_so_far $total_locals) { my $this_local = ++$locals_so_far; my $this_value = int(rand(99)) + 1; printf IMC .local int V_%d\n, $this_local; printf IMC V_%d = %d\n, $this_local, $this_value; printf CPP int V_%d;\n, $this_local; printf CPP V_%d = %d;\n, $this_local, $this_value; } elsif ($action eq 'arithmetic' and $locals_so_far 0) { my $result = 1 + int(rand($locals_so_far)); my $arg1 = 1 + int(rand($locals_so_far)); my $arg2 = 1 + int(rand($locals_so_far)); next if $arg1 == $arg2; my $op