Re: C89
Bill Coffman [EMAIL PROTECTED] wrote: Thanks for the info... Apparently, gcc -ansi -pedantic is supposed to be ANSI C '89. Not really. It's pedantic ;) Incidentally, I tried adding -ansi and -pedantic and I got lots of warnings, like long long not supported by ANSI C'89, etc. (how can you do 64 bit ints then?). A C compiler on a 64-bit machine uses long. ... I also got errors that caused outright failure. Perhaps it's best to forget the whole C'89 thing. Not the C'89 thing, but the -ansi thing of gcc. -Bill leo
Re: register allocation questions
Bill Coffman [EMAIL PROTECTED] wrote: Currently, here's how the register allocator is doing.

Failed Test         Stat Wstat Total Fail  Failed  List of Failed
---
t/library/dumper.t     5  1280    13    5  38.46%  1-2 5 8 13
4 tests and 51 subtests skipped.
Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

I recall Leo, or someone, saying that the data dumper routines are not following the calling convention properly. I didn't look too close, but it's probably only the entry points:

    .sub _dumper
        _global_dumper()

That's missing C<.param> statements, so there are none. I've learned a lot about how the compiler works at this point, and I'd like to contribute more :) Great. Thanks. Would you like a patch? Should I fix the data dumper routines first? Definitely - dumper.t tests are currently disabled, don't worry. What is all this talk about deferred registers? What should I do next? deferred registers doesn't make bells ring. What do you mean with that? - Send patch with explanation of algorithm. Yes, I think we are kind of doing this. It's best to pass the registers straight through though. Like when a variable will be used as a parameter, give it the appropriate reg num. Sort of outside the immediate scope of register coloring, but as I've learned, one must go a little beyond, to see the input and output for each sub. Well, it's not really outside of register coloring. It's part of parrot's calling conventions. You can think of it as part of the Parrot machine ABI. When you write a compiler for darwin-PPC, you have to pass function arguments in r3, r4, ... and you get a return value in r3. If you don't do that, you'll not be able to make any C library call. In Parrot we have similar calling conventions and the register allocator must be aware of that. E.g. when you have:

    some_function()   # (i, j) = some_function()
    $I0 = I5
    $I1 = I6

you know that I5 and I6 are return results. The live range or the previous usage of I5 and I6 is cut by the function call.
Using the return values directly is of course an optimization and not strictly necessary; nevertheless, the allocator has to be aware that the function call invalidates previous I5 and I6. But the idea is to have each sub declare how many registers to save/restore. Don't worry about save/restore. That's already changed. imcc doesn't emit any savetop/restoretop or similar opcodes any more. Registers are preserved now by allocating a new register frame for the subroutine. We can also minimize this number to match the physical architecture that parrot is running on (for an arch specific optimization). Yes. I did that some time ago in imcc/jit.c, which produced register mapping for the underlying hardware CPU. Parrot registers 0.. n-1 were given negative numbers and src/jit.c used these directly as mapping for CPU registers. This vastly reduced JIT startup time. Yes, yes, renaming! I want to do register renaming! Go for it please. p31 holds all the spill stuff. It's a pain. Maybe I'll move that around, but if p31 is used, it means that there is no more room for symbols, in at least one of the reg sets. I'd say that with register renaming, spilling will be very rare. But there is of course no need to use P31 for it. If we really have to spill we can optimize that a bit. - Bill Coffman leo
[PATCH] PPC JIT failure for t/pmc/threads_8.pasm
I was getting a failure under JIT on PPC for t/pmc/threads_8.pasm, and the problem turned out to be that emitting a restart op takes 26 instructions, or 104 bytes, and we were hitting the grow-the-arena logic just shy of what would have triggered a resize, then running off the end. The below patch fixes this; really that magic number (200, now) needs to be bigger than the amount of space we'd ever need to emit the JIT code for a single op (plus saving registers and such), but with the possibility of dynamically loadable op libs (with JIT?), it's hard to say what number is guaranteed to be large enough. Or, we can pick a reasonable, largish number that works for the built-in ops (empirically determined, as now), and document that loadable JITted ops which could take more than this need to make sure to grow the arena as necessary. (And we could provide a utility function to make this easy.)

JEff

Index: src/jit.c
===
RCS file: /cvs/public/parrot/src/jit.c,v
retrieving revision 1.95
diff -u -b -r1.95 jit.c
--- src/jit.c 25 Oct 2004 10:24:14 - 1.95
+++ src/jit.c 29 Oct 2004 07:50:09 -
@@ -1395,7 +1395,7 @@
     while (cur_op <= cur_section->end) {
         /* Grow the arena early */
         if (jit_info->arena.size <
-                (jit_info->arena.op_map[jit_info->op_i].offset + 100)) {
+                (jit_info->arena.op_map[jit_info->op_i].offset + 200)) {
 #if REQUIRES_CONSTANT_POOL
             Parrot_jit_extend_arena(jit_info);
 #else
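The utility function mentioned at the end of the message could look roughly like the following. This is only a minimal sketch under stated assumptions: `jit_arena`, `arena_reserve` and `arena_emit` are hypothetical names, not Parrot's JIT interface, and the sketch ignores that a real code arena needs executable memory and stable addresses. The point is that each emitter reserves the number of bytes it is about to write, instead of depending on a fixed safety margin such as 100 or 200.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified code arena (not Parrot's real structure). */
typedef struct {
    unsigned char *start;  /* beginning of the buffer  */
    size_t used;           /* bytes already emitted    */
    size_t size;           /* bytes currently allocated */
} jit_arena;

/* Make sure at least `need` more bytes fit.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int arena_reserve(jit_arena *a, size_t need)
{
    size_t new_size;
    unsigned char *p;

    if (a->size - a->used >= need)
        return 0;                       /* enough room already */

    new_size = a->size ? a->size : 64;
    while (new_size - a->used < need)
        new_size *= 2;                  /* grow geometrically */

    p = realloc(a->start, new_size);
    if (p == NULL)
        return -1;
    a->start = p;
    a->size = new_size;
    return 0;
}

/* Emit `len` bytes of code, growing the arena first if needed. */
static int arena_emit(jit_arena *a, const void *code, size_t len)
{
    if (arena_reserve(a, len) != 0)
        return -1;
    memcpy(a->start + a->used, code, len);
    a->used += len;
    return 0;
}
```

With a helper of this kind, a loadable op that needs, say, 104 bytes for one instruction sequence would call the reserve function with that size before writing, so no global magic number has to be large enough for every op.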
AIX PPC JIT warning
Recently config/gen/platform/darwin/asm.s was added, containing Parrot_ppc_jit_restore_nonvolatile_registers(). Corresponding code also needs to be added to config/gen/platform/aix/asm.s -- Parrot should fail to link on AIX currently, without this. I didn't try to update the AIX asm.s myself, since I wasn't confident that I could do this correctly without having a way to test. So, someone with AIX asm expertise, please take a look. Thanks, JEff
Re: register allocation questions
Leopold Toetsch [EMAIL PROTECTED] wrote: Bill Coffman [EMAIL PROTECTED] wrote: t/library/dumper.t5 1280135 38.46% 1-2 5 8 13 I didn't look too close, but it's probably only the entry points: .sub _dumper _global_dumper() Fixed. leo
Re: [PATCH] PPC JIT failure for t/pmc/threads_8.pasm
Jeff Clites [EMAIL PROTECTED] wrote: I was getting a failure under JIT on PPC for t/pmc/threads_8.pasm, and the problem turned out to be that emitting a restart op takes 26 instructions, or 104 bytes, and we were hitting the grow-the-arena logic just shy of what would have triggered a resize, then running off the end. Duh. It's probably time to generate a JITted restart function. The below patch fixes this; really that magic number (200, now) needs to be bigger than the amount of space we'd ever need to emit the JIT code for a single op (plus saving registers and such), but with the possibility of dynamically loadable op libs (with JIT?), Not easily. Or, it would not be too hard. Given a fairly complete implementation of core.jit, we have a function table (per platform) that has slots for every processor instruction (or almost: Parrot doesn't have vector opcodes yet). So a new opcode could be written in terms of existing opcodes, which would easily allow the generation of JITted variants. If a new opcode is too complex, it could be written (partly) in C, which would need just one new JIT opcode call_c_func. And that is exactly what Parrot_jit_build_call_func on i386 is already doing. ... it's hard to say what number is guaranteed to be large enough. If we ever have loadable JIT code, we will have an interface to set that magic number. Thanks, applied. JEff leo
Re: pmc_type
Paolo Molaro [EMAIL PROTECTED] wrote: On 10/27/04 Luke Palmer wrote: Ugh, yeah, but what does that buy you? In dynamic languages pure derivational typechecking is very close to useless. Actually, if I were to write a perl runtime for parrot, mono or even the JVM I'd experiment with the same pattern. For the latter two yes, but as Luke has outlined that doesn't really help for languages where methods are changing under the hood. You would assign small integer IDs to the names of the methods and build a vtable indexed by the id. Well, we already have a nice method cache, which makes lookup a vtable-like operation, i.e. an array lookup. But that's runtime only (and it needs invalidation still). So actually just the first method lookup is a hash operation. ... There are a number of optimizations that can be done to reduce the vtable size, but I'm not sure this would matter in parrot as long as bytecode values are as big as C ints:-) That ought to come ;) Cachegrind shows no problem with opcode fetch and you know, when it's compiled to JIT bytecode size doesn't matter anyway. We just avoid the opcode and operand decoding. lupus leo
[perl #32208] [PATCH] Register allocation patch - scales better to more symbols
# New Ticket Created by Bill Coffman
# Please include the string: [perl #32208]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32208

Patch does the following:
- Applied Matula/Chaitin/Briggs algorithm for register allocation.
- Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up.
- Remove several of the functions, which are incorporated into the new algorithm.
- Shortcoming: doesn't use score anymore, but the algorithm is smart enough that I hope it's okay to do that.
- Failed 2 tests for latest CVS. (See earlier posting.)

WANT TO DO:
- Apparently, there's a memory leak which prevents coloring graphs with more than a few hundred registers. I suspect this is in the spill, or update_life routine. Not sure if it's mine or pre-existing.
- Interference graph is using 8 times the memory it needs to use. This is still trivial compared to lost data in above bug.
- Smarten up algorithm to use score again. A good way to do so is commented in the code.
- Create spilling score, that prints out with a debug option. This can be a metric to compare various algorithms.
- Improve spill to spill all registers at once, adding speed.
- Introduce proper analysis of flow graph, to create less conservative interference graph.
- Color each of the four register types separately. Be sure to compare gains with losses for this, as it is not entirely clear.
- Introduce register renaming. When variable is reassigned, it might as well be considered a new symbol... well, much of the time, anyway.
- Introduce variable register size, in coordination with subroutine calls, to reduce copy cost. Coordinate with Dan and Leo on this.
- Improve flow-graph, basic block calculation, etc. Make it all a little easier to understand, and more efficient. Knuth style, literate programming. Well, just good comments, and a couple of decent pods.
Index: imcc/reg_alloc.c
===
RCS file: /cvs/public/parrot/imcc/reg_alloc.c,v
retrieving revision 1.22
diff -u -r1.22 reg_alloc.c
--- imcc/reg_alloc.c 30 Sep 2004 16:00:37 - 1.22
+++ imcc/reg_alloc.c 29 Oct 2004 04:39:07 -
@@ -41,15 +41,63 @@
 static void compute_du_chain(IMC_Unit * unit);
 static void compute_one_du_chain(SymReg * r, IMC_Unit * unit);
 static int interferes(IMC_Unit *, SymReg * r0, SymReg * r1);
-static int map_colors(IMC_Unit *, int x, unsigned int * graph, int colors[], int typ);
-#ifdef DO_SIMPLIFY
-static int simplify (IMC_Unit *);
-#endif
 static void compute_spilling_costs (Parrot_Interp, IMC_Unit *);
-static void order_spilling (IMC_Unit *);
 static void spill (Interp *, IMC_Unit * unit, int);
-static int try_allocate(Parrot_Interp, IMC_Unit *);
-static void restore_interference_graph(IMC_Unit *);
+
+/* New graph algorithm stuff */
+static void ig_color_graph(void);
+static void apply_coloring(IMC_Unit *);
+static void ig_precolor(IMC_Unit *);
+static int ig_init_graph(int num_nodes, unsigned* edge_bits);
+static void ig_clear_graph(void);
+static int spill_registers(Parrot_Interp interpreter, IMC_Unit * unit);
+
+typedef struct {
+    int deg;  /* degree of node (# neighbors) */
+    int col;  /* color assigned to this node */
+    int rank; /* position within the below D array */
+    char in;  /* boolean, indicating if removed yet */
+} node;
+
+typedef struct {
+    int n;       /* number of nodes */
+    node* V;     /* array of nodes */
+    int* D;      /* sorted nodes by degree */
+    unsigned* E; /* edge data, adjacency matrix */
+    int k;       /* maximum color used in graph (0 means uncolored) */
+} graph;
+
+graph G; /* must have as global to use qsort,
+            but there's only one at a time -- FIXME minimize this global */
+
+#define Dbg_level 0 /* FIXME -- must be a better way to implement this */
+#include <stdarg.h>
+static void my_message(const char *pat, ...)
+{
+    va_list args;
+    va_start(args, pat);
+#if Dbg_level >= 1
+    vfprintf(stderr,pat,args);
+#endif
+    va_end(args);
+}
+static void my_message2(const char *pat, ...)
+{
+    va_list args;
+    va_start(args, pat);
+#if Dbg_level >= 2
+    vfprintf(stderr,pat,args);
+#endif
+    va_end(args);
+}
+/*#define Dbg printf*/
+#define Dbg my_message
+#define Dbg2 my_message2
+
+
+/**/
+
+
 #if 0
 static int neighbours(int node);
 #endif
@@ -57,7 +105,7 @@
 extern int pasm_file;
 /* XXX FIXME: Globals: */
-static IMCStack nodeStack;
+static IMCStack nodeStack; /* FIXME -- this is used in a silly way */
 static unsigned int* ig_get_word(int i, int j, int N, unsigned int* graph, int* bit_ofs)
@@ -74,12 +122,14 @@
     *word |= (1 << bit_ofs);
 }
+/* currently unused.
 static void ig_clear(int i, int j, int N, unsigned int*
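For readers following the patch description, here is a small, self-contained sketch of smallest-last ordering followed by greedy coloring, the general idea behind the Matula-style approach named in the ticket. It is not the code from the patch: the function name `color_smallest_last`, the byte-per-edge adjacency matrix and the O(n^2) scans are assumptions chosen for brevity. Colors start at 1 and 0 means uncolored, matching the comment on `k` in the patch. In an allocator, nodes whose color exceeds the number of available registers would be the spill candidates.

```c
#include <stdlib.h>

/* adj is an n*n matrix stored row by row; adj[i*n + j] != 0 means that
 * symbols i and j interfere. color[] receives a color >= 1 per node.
 * Returns the highest color used, 0 for an empty graph, -1 on allocation
 * failure. Illustrative sketch only. */
static int color_smallest_last(int n, const unsigned char *adj, int *color)
{
    int *deg, *order;
    char *removed;
    int i, j, step, max_color = 0;

    if (n <= 0)
        return 0;
    deg = malloc((size_t)n * sizeof *deg);
    order = malloc((size_t)n * sizeof *order);
    removed = calloc((size_t)n, 1);
    if (!deg || !order || !removed) {
        free(deg); free(order); free(removed);
        return -1;
    }

    for (i = 0; i < n; i++) {
        deg[i] = 0;
        color[i] = 0;                       /* 0 means uncolored */
        for (j = 0; j < n; j++)
            if (j != i && adj[i * n + j])
                deg[i]++;
    }

    /* Repeatedly remove a node of minimum remaining degree. */
    for (step = 0; step < n; step++) {
        int best = -1;
        for (i = 0; i < n; i++)
            if (!removed[i] && (best < 0 || deg[i] < deg[best]))
                best = i;
        removed[best] = 1;
        order[step] = best;
        for (j = 0; j < n; j++)
            if (j != best && !removed[j] && adj[best * n + j])
                deg[j]--;
    }

    /* Color in reverse removal order with the lowest free color. */
    for (step = n - 1; step >= 0; step--) {
        int v = order[step];
        int c = 1;
        int clash;
        do {
            clash = 0;
            for (j = 0; j < n; j++)
                if (j != v && adj[v * n + j] && color[j] == c) {
                    clash = 1;
                    c++;
                    break;
                }
        } while (clash);
        color[v] = c;
        if (c > max_color)
            max_color = c;
    }

    free(deg); free(order); free(removed);
    return max_color;
}
```

A production version would keep degree buckets so that finding the minimum-degree node is not a linear scan, and would use a bit matrix for the edges.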
Re: [perl #32196] Yet Another GC Crash (YAGC)
Matt Diephouse [EMAIL PROTECTED] wrote: #0 0x0003d420 in pobject_lives (interpreter=0xd00140, obj=0x0) at src/dod.c:198 #1 0x48f0 in mark_1_seg (interpreter=0xd00140, cs=0xd01fd0) at src/packfile.c:360 Ah. The code assumed that there is always a valid subroutine name, which might not be true for dynamically created subs, like in forth. I've put a check for a NULL name in front. Thanks, leo
Re: JIT and platforms warning
Leopold Toetsch [EMAIL PROTECTED] wrote: arm, mips, and sun4 JIT platforms definitely need some work to even keep up with the current state of the JIT interface. That's actually wrong - sorry. Sun4 JIT is fairly complete and up to date, and shouldn't have been in the above sentence. I mixed it up with the arm platform, which doesn't use register mappings at all. All opcodes are reloading and storing from/to Parrot registers. Mips only has 3 JITted opcodes. Sorry again Stéphane, leo
Traceback or call chain
For quite some time now, we have had the current subroutine and the current continuation in the interpreter context structure. With that at hand, we should now be able to generate function tracebacks in the error case. We need the call chain too, to optimize register frame recycling: whenever a continuation is created, we have to walk up the call chain and mark all return continuations as non-recyclable. Should the traceback object be available as a PMC? What information should be included in the traceback (object)? Comments welcome, leo
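To make the two uses of the call chain concrete, here is a minimal sketch. The `call_frame` structure and both function names are hypothetical and much simpler than Parrot's interpreter context; the sketch only assumes that each frame can reach its caller. One function marks every frame up the chain as non-recyclable (what would happen when a continuation is created), the other collects sub names for a traceback, innermost first.

```c
#include <stddef.h>

/* Hypothetical, simplified call frame; not Parrot's context structure. */
typedef struct call_frame {
    struct call_frame *caller;  /* frame that invoked this one, NULL at the top */
    const char *sub_name;       /* name of the running sub, may be NULL */
    int recyclable;             /* nonzero if the frame may be reused on return */
} call_frame;

/* Walk up from `f` and mark every frame as non-recyclable.
 * Returns the number of frames visited. */
static int mark_chain_non_recyclable(call_frame *f)
{
    int count = 0;
    for (; f != NULL; f = f->caller) {
        f->recyclable = 0;
        count++;
    }
    return count;
}

/* Store up to `max` sub names, innermost first.
 * Returns how many names were stored. */
static int collect_traceback(const call_frame *f, const char **names, int max)
{
    int count = 0;
    for (; f != NULL && count < max; f = f->caller)
        names[count++] = f->sub_name ? f->sub_name : "(unnamed)";
    return count;
}
```

The placeholder for a missing name reflects that dynamically created subs may not have one. What else a traceback entry should carry (file, line, arguments) is exactly the open question in the message.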
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
Bill Coffman (via RT) wrote:
Patch does the following:
- Applied Matula/Chaitin/Briggs algorithm for register allocation.
- Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up.

Good. Hopefully Dan can provide some compile number compares.

- Shortcoming: doesn't use score anymore, but the algorithm is smart enough that I hope it's okay to do that.
- Failed 2 tests for latest CVS. (See earlier posting.)

I've fixed dumper.t in CVS. Only streams_11 is currently failing here.

WANT TO DO:
- Apparently, there's a memory leak which prevents coloring graphs with more than a few hundred registers. I suspect this is in the spill, or update_life routine. Not sure if it's mine or pre-existing.

There probably were already some leaks. But we really have to get rid of memory leaks altogether.

- Interference graph is using 8 times the memory it needs to use. This is still trivial compared to lost data in above bug.

That might kill Dan's 6000-liner.

- Color each of the four register types separately. Be sure to compare gains with losses for this, as it is not entirely clear.

That would reduce memory, wouldn't it?

- Introduce register renaming. When variable is reassigned, it might as well be considered a new symbol... well, much of the time, anyway.

Number 1 in my priority list.

- Introduce variable register size, in coordination with subroutine calls, to reduce copy cost. Coordinate with Dan and Leo on this.

Not needed. We don't copy registers anymore.

- Improve flow-graph, basic block calculation, etc.

Yeah. And create some means to test it.

Some more notes WRT the patch:
* the Dbg and Dbg2 debug macros aren't needed. Just use the existing debug(interp, level, ...) function in src/debug.c. If you need some extra levels, you can use some more bits in imcc/debug.h
* The global G is a no-no, and I don't think you need it for qsort (If you need it you should just use the global around the qsort). We finally have to have a reentrant compiler. Yes I know, there are still some other globals around, they are being reduced ...
* all functions should have an Interp* and an IMC_Unit* argument to allow reentrancy. I.e. all state should be in the unit structure.
* Variable names should be a bit more verbose, G.V is too terse.
* alloca() isn't portable and not available everywhere

I'm waiting for Dan's comments on usability. Thanks for the patch, leo
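On the qsort point: the standard `qsort` comparator receives no context argument, which is presumably why the patch reached for a global. One way around it, sketched below under that assumption, is to copy the sort key next to each index so that the comparator needs nothing but the two elements. The names `degree_entry` and `sort_nodes_by_degree` are made up for this illustration, and plain `malloc` stands in for whatever allocation function the project prefers.

```c
#include <stdlib.h>

/* Sort key and node index travel together, so no global is needed. */
typedef struct {
    int degree;
    int index;
} degree_entry;

static int cmp_degree(const void *a, const void *b)
{
    const degree_entry *x = a;
    const degree_entry *y = b;
    if (x->degree != y->degree)
        return (x->degree > y->degree) - (x->degree < y->degree);
    /* Tie-break on index so the result is deterministic. */
    return (x->index > y->index) - (x->index < y->index);
}

/* Fill order[0..n-1] with node indices sorted by ascending degree.
 * Returns 0 on success, -1 on allocation failure. */
static int sort_nodes_by_degree(const int *degree, int n, int *order)
{
    degree_entry *tmp;
    int i;

    if (n <= 0)
        return 0;
    tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (i = 0; i < n; i++) {
        tmp[i].degree = degree[i];
        tmp[i].index = i;
    }
    qsort(tmp, (size_t)n, sizeof *tmp, cmp_degree);
    for (i = 0; i < n; i++)
        order[i] = tmp[i].index;
    free(tmp);
    return 0;
}
```

Because degrees are small integers bounded by the node count, a bucket sort would also work and avoids the comparator entirely.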
Q: newsub opcodes
When PIR code has a function call syntax:

    foo(i, j)

the created code currently has (among others) a line:

    newsub Px, .Sub, foo

where the label foo is a relative branch offset. This is suboptimal for several reasons:
- it creates a new PMC for every call, although in 99.99% of cases the PMC constant for the sub could be used directly [1]
- the created subroutine PMC lacks information: only the start label is known when the PMC is created. The subroutine's name and the end of the opcodes for that sub aren't in Px. Obtaining that information would be a costly O(n) lookup in the fixup segment of the bytecode (or in the constants, which is probably still larger).

Subroutine length and name information is needed for introspection and for bounds checking in safe run cores. So I think we should instead do something like this:

    get_sub Px, foo       # find the PMC with label foo in constants
                          # at compile time and
                          # replace foo with the index in constants
    clone Py, Px          # if Px would be modified, clone it first [1]
    find_global Px, foo   # we have that already, but hash lookup!

The current syntax:

    newsub Px, .Closure, foo

could remain unchanged, except that again under the hood, the label foo is replaced with the index in the constant table. For closures it's probably best to actually return a new object by default, as a closure might have different state in the lexical pad in each invocation.

Is that reasonable?
leo

[1] a few tests attach properties to the Sub PMC
Re: C89
On Thu, 28 Oct 2004 19:22:02 -0700, Bill Coffman [EMAIL PROTECTED] wrote: Thanks for the info... Apparently, gcc -ansi -pedantic is supposed to be ANSI C '89. Equiv to -std=c89. Also, my Configure.pl generated make file uses neither -ansi nor -pedantic. I do have access to a K&R C v2, but it doesn't look like it's going to match the actual practice. Oh well. So long as my code works, I'm happy. Incidentally, I tried adding -ansi and -pedantic and I got lots of warnings, like long long not supported by ANSI C'89, etc. (how can you do 64 bit ints then?). I also got errors that caused outright failure. Perhaps it's best to forget the whole C'89 thing. But maybe someone should remove that from the documentation? Just a thought. I thought long long was only defined in C99, not C89? -- bd
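On the question of 64-bit integers under C89: `long long` was indeed only standardized in C99, so strictly conforming C89 code can only get 64 bits where plain `long` happens to be that wide. A minimal sketch of how that can be detected with nothing but `<limits.h>`; the function name `long_is_64bit` is made up for this example.

```c
#include <limits.h>

/* Returns 1 if `long` has at least 64 value bits on this platform,
 * 0 otherwise. Uses only C89 features: no long long, no <stdint.h>. */
static int long_is_64bit(void)
{
#if ULONG_MAX > 0xFFFFFFFFUL
    return 1;
#else
    return 0;
#endif
}
```

On an LP64 system this returns 1. On a typical 32-bit system it returns 0, and portable code then needs either a compiler extension or a two-word emulation.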
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote:
Bill Coffman (via RT) wrote:
Patch does the following:
- Applied Matula/Chaitin/Briggs algorithm for register allocation.
- Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up.

Good. Hopefully Dan can provide some compile number compares.

I'll give it a shot as soon as I can.

WANT TO DO:
- Apparently, there's a memory leak which prevents coloring graphs with more than a few hundred registers. I suspect this is in the spill, or update_life routine. Not sure if it's mine or pre-existing.

There probably were already some leaks. But we really have to get rid of memory leaks altogether.

- Interference graph is using 8 times the memory it needs to use. This is still trivial compared to lost data in above bug.

That might kill Dan's 6000-liner.

I should point out, for the folks following along at home, that it's 6K lines of source in the original language (DecisionPlus). The actual PIR code generated runs to 84k lines in the biggest sub.

Some more notes WRT the patch:
* the Dbg and Dbg2 debug macros aren't needed. Just use the existing debug(interp, level, ...) function in src/debug.c. If you need some extra levels, you can use some more bits in imcc/debug.h
* The global G is a no-no, and I don't think you need it for qsort (If you need it you should just use the global around the qsort). We finally have to have a reentrant compiler. Yes I know, there are still some other globals around, they are being reduced ...

I'd like to get us down to a single global for all of Parrot. I don't think it's possible to safely go any lower than that, though I suppose we could if we really, really tried, and didn't mind things crashing and burning in some really odd fringe edge cases.

* all functions should have an Interp* and an IMC_Unit* argument to allow reentrancy. I.e. all state should be in the unit structure.

Definitely.

* Variable names should be a bit more verbose, G.V is too terse.

Yeah. This stuff is abstruse enough as it is -- take pity on those of us with Very Little Brain. :)

* alloca() isn't portable and not available everywhere

Yep. This is a gcc-ism. Use Parrot's memory allocation functions instead.

I'm waiting for Dan's comments on usability.

I'd like the code issues cleaned up before it gets committed. I'll let you know the timing as soon as I can, though it'll probably take a few hours.

-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Q: newsub opcodes
At 2:46 PM +0200 10/29/04, Leopold Toetsch wrote: When PIR code has a function call syntax: foo(i, j) the created code has currently (amongst other) a line: newsub Px, .Sub, foo where the label foo is a relative branch offset. This is suboptimal for several reasons: [snip] So I think, we should do instead something like this: get_sub Px, foo # find the PMC with label foo in constants # at compile time and # replace foo with the index in constants clone Py, Px # if Px would be modified, clone it first [1] find_global Px, foo # we have that already, but hash lookup! [snip] Is that reasonable? Yeah, but I think I've a better approach. Instead of doing this, let's just get PMC constants implemented. (I know -- just he says :) Each sub in a bytecode segment can get a slot in the constant table, and we can map in sub fetches to the (as of now nonexistent) set_p_pc op. While we're at it we should see about adding in an integer constant table that can be fixed up on load (to take care of those pesky what number did my PMC class map to problems more quickly than the hash lookup) but we can put that off a bit. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote: Bill Coffman (via RT) wrote: Patch does the following: - Applied Matula/Chaitin/Briggs algorithm for register allocation. - Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up. Good. Hopefully Dan can provide some compile number compares. The numbers are... not good. I took one of the mid-sized programs and threw it at the new code. Parrot in CVS takes about 10 minutes to run through this program. The main sub's about 30Klines of code, and the stat from a parrot -v is: sub _MAIN: registers in .imc: I2875, N0, S868, P7615 0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch 0 used once deleted 0 invariants_moved registers needed:I2883, N0, S873, P7741 registers in .pasm: I31, N0, S31, P32 - 37 spilled 5845 basic_blocks, 47622 edges I applied the patch to a copy of parrot and ran it. After 37 minutes I killed the thing. It had 1.6G of RAM allocated at the time of death, too. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: pmc_type
On 10/29/04 Leopold Toetsch wrote: Ugh, yeah, but what does that buy you? In dynamic languages pure derivational typechecking is very close to useless. Actually, if I were to write a perl runtime for parrot, mono or even the JVM I'd experiment with the same pattern. For the latter two yes, but as Luke has outlined that doesn't really help for languages where methods are changing under the hood.

If a method changes you just replace the pointer in the vtable to point to the new method implementation. Invalidation is the same, you just replace it with a method that gives the method not found error/exception.

You would assign small integer IDs to the names of the methods and build a vtable indexed by the id. Well, we already have a nice method cache, which makes lookup a vtable-like operation, i.e. an array lookup. But that's runtime only (and it needs invalidation still). So actually just the first method lookup is a hash operation.

And where is it cached and how? Take (sorry, still perl5 syntax:-):

    foreach $i (@list) {
        $i->method ();
    }

With the vtable idea, the low-level operations are (in pseudo-C):

    vtable = $i->vtable;                 // just a memory dereference
    code = vtable [method-constant-id];  // another mem deref
    run_code (code);

From your description it seems it would look like:

    vtable = $i->vtable;
    code = vtable->method_lookup (method);  // C function call
    run_code (code);

Note that $i may be of different type for each loop iteration. Even a cached lookup is going to be slower than a simple memory dereference. Of course this only matters if the lookup is actually a bottleneck of your function call speed.

matter in parrot as long as bytecode values are as big as C ints:-) That ought to come ;) Cachegrind shows no problem with opcode fetch and you know, when it's compiled to JIT bytecode size doesn't matter anyway. We just avoid the opcode and operand decoding.
If you use a JIT, decode overhead is already very small:-) AFAIK, alpha is the only interesting architecture that doesn't do byte access (at least on older processors) and so it may be a little inefficient there. But I think you should optimize for the common case. On my machine going through a byte opcode array is faster than an int one by about 15% (more if level 2 cache or mem is needed to hold it). The only issue is when you need to load int values that don't fit in a byte, but those are not so common as register numbers in your bytecode which currently take a whole int could just use a byte. Anyway, the two approaches may also balance out if the opcodes are in ro memory. The issue is that in perl, for example, so much is supposed to happen at runtime, because the 'use' operator changes the compiling environment, so you actually need to compile at runtime in many cases, not only eval. That means emitting parrot bytecode in memory and this bytecode is per-process, so it increases memory usage and eventually swapping activity. As you say, since you jit, this memory is wasted, since it goes unused soon after it is written. Another issue is disk-load time: when you have small test apps it doesn't matter, but when you start having bigger apps it might (even mmapping has its cost, if you need a larger working set to load bytecodes). BTW, in the computed goto code, make the array of address labels const: it helps reducing the rw working set at least when parrot is built as an executable. lupus -- - [EMAIL PROTECTED] debian/rules [EMAIL PROTECTED] Monkeys do it better
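The ID-indexed vtable described in this message can be sketched in a few lines of C. Everything here is illustrative: the method IDs, the `method_vtable` and `object` types and the helper names are assumptions for the example, and methods take a plain `int` as their receiver to keep the code short. The sketch shows the three operations discussed: dispatch by two memory dereferences, replacing a method by overwriting a slot, and invalidation by pointing the slot at a not-found handler.

```c
#include <stddef.h>

typedef int (*method_fn)(int self);

/* Small integer IDs assigned to method names (hypothetical). */
enum { METHOD_DOUBLE = 0, METHOD_NEGATE = 1, METHOD_COUNT = 2 };

typedef struct {
    method_fn slot[METHOD_COUNT];
} method_vtable;

typedef struct {
    method_vtable *vtable;
    int value;
} object;

/* Stand-in for raising a "method not found" error. */
static int method_not_found(int self) { (void)self; return -1; }

static int double_impl(int self) { return self * 2; }
static int negate_impl(int self) { return -self; }
static int triple_impl(int self) { return self * 3; }

/* Every slot starts out as "not found". */
static void vtable_init(method_vtable *vt)
{
    int i;
    for (i = 0; i < METHOD_COUNT; i++)
        vt->slot[i] = method_not_found;
}

/* Install, replace, or (with fn == NULL) invalidate a method. */
static void vtable_set(method_vtable *vt, int id, method_fn fn)
{
    vt->slot[id] = (fn != NULL) ? fn : method_not_found;
}

/* Dispatch: load the vtable pointer, index by ID, call. */
static int call_method(const object *obj, int id)
{
    return obj->vtable->slot[id](obj->value);
}
```

The cost of this scheme is table size: every class needs a slot for every method ID it might be asked about, which is the vtable-size concern raised earlier in the thread.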
Re: Q: newsub opcodes
Dan Sugalski [EMAIL PROTECTED] wrote: get_sub Px, foo # find the PMC with label foo in constants Yeah, but I think I've a better approach. Instead of doing this, let's just get PMC constants implemented. Well, they are implemented, at least partly. Sub PMCs are in the constant table. The funny C<get_sub> opcode is actually a ... set_p_pc op. ... with the small difference that at compile time, the integer argument is a label (offset). What about the other part: closures - should they always be created via new? While we're at it we should see about adding in an integer constant table that can be fixed up on load (to take care of those pesky what number did my PMC class map to problems more quickly than the hash lookup) but we can put that off a bit. You are thinking of a 2-stage lookup? I1 = dynamic_type I0 # lookup runtime type mapping $P0 = new I1 A normal IntList Array can do that too. leo
Mostly a Perl task for the interested
classes/*.c is created by the bytecode compiler classes/pmc2c2.pl. Most of the actual code is in lib/Parrot/Pmc2c.pm. The created C code could use some improvements:
* the temp_base_vtable should be const. This is currently not possible, because items like .whoami are changed in the temp_base_vtable. But we don't have to do that, as the vtable is cloned a few lines below anyway. So we should create a const table and do the rest of the init stuff in the cloned table.
* same with the MMD init table.
* All constant strings in classes (whoami, isa_str, does_str) and method names in the delegate.c should use the CONST_STRING() macro. That would need some Makefile tweaks too, to add a dependency on the .str file. Note:
    foo = CONST_STRING(interpreter, "foo");
  should always be on its own line and not inside a multiline expression.

Thanks, leo
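The first item can be illustrated with a very small sketch. The struct below is a hypothetical, heavily reduced stand-in for the real vtable, and `vtable_clone_init` is an invented name; only `temp_base_vtable` and `whoami` come from the message. The template is declared const and never written to, and all per-class initialization happens in the copy.

```c
#include <string.h>

/* Hypothetical, much-reduced vtable; the real one has many more slots. */
typedef struct {
    const char *whoami;                    /* class name */
    int base_type;                         /* class type number */
    int (*get_integer)(const void *self);  /* one example method slot */
} vtable_sketch;

static int default_get_integer(const void *self)
{
    (void)self;
    return 0;
}

/* The template is never modified, so it can be const
 * (and the compiler may place it in read-only data). */
static const vtable_sketch temp_base_vtable = {
    NULL, 0, default_get_integer
};

/* Copy the template, then finish initialization in the copy only. */
static void vtable_clone_init(vtable_sketch *dest,
                              const char *whoami, int base_type)
{
    memcpy(dest, &temp_base_vtable, sizeof *dest);
    dest->whoami = whoami;
    dest->base_type = base_type;
}
```

The generator change would then amount to emitting the `const` qualifier and moving the assignments that currently touch the template so that they come after the clone.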
Re: [perl #32208] [PATCH] Register allocation patch - scales better to more symbols
Sounds like the memory leak. Let me try to fix this, and address the other issues. I'll get back to you. Thanks, -Bill On Fri, 29 Oct 2004 10:15:34 -0400, Dan Sugalski [EMAIL PROTECTED] wrote: At 12:30 PM +0200 10/29/04, Leopold Toetsch wrote: Bill Coffman (via RT) wrote: Patch does the following: - Applied Matula/Chaitin/Briggs algorithm for register allocation. - Color the graph all at once, and spill all symbols with high colors. Spill all at once to speed things up. Good. Hopefully Dan can provide some compile number compares. The numbers are... not good. I took one of the mid-sized programs and threw it at the new code. Parrot in CVS takes about 10 minutes to run through this program. The main sub's about 30Klines of code, and the stat from a parrot -v is: sub _MAIN: registers in .imc: I2875, N0, S868, P7615 0 labels, 0 lines deleted, 0 if_branch, 0 branch_branch 0 used once deleted 0 invariants_moved registers needed:I2883, N0, S873, P7741 registers in .pasm: I31, N0, S31, P32 - 37 spilled 5845 basic_blocks, 47622 edges I applied the patch to a copy of parrot and ran it. After 37 minutes I killed the thing. It had 1.6G of RAM allocated at the time of death, too. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Q: newsub opcodes
At 4:36 PM +0200 10/29/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote: get_sub Px, foo # find the PMC with label foo in constants Yeah, but I think I've a better approach. Instead of doing this, let's just get PMC constants implemented. Well, they are implemented, at least partly. Sub PMCs are in the constant table. The funny C<get_sub> opcode is actually a ... set_p_pc op. ... with the small difference that at compile time, the integer argument is a label (offset). Then we should toss the difference and have a single op to access the PMC constant table. We're going to need to do this for real PMC constants, and I don't see any point in having two ways to do the identical thing. What about the other part: closures - should they always be created via new? Yeah, I think so. They always need to capture a lexical scope, so I think they're going to have to. While we're at it we should see about adding in an integer constant table that can be fixed up on load (to take care of those pesky what number did my PMC class map to problems more quickly than the hash lookup) but we can put that off a bit. You are thinking of a 2-stage lookup? I1 = dynamic_type I0 # lookup runtime type mapping $P0 = new I1 More like what we do right now with all the other constant types. Integers aren't in a constant table since we just inline them, but the nice thing about a constant table is you can do fixups on it while not touching the actual bytecode, leaving it readonly and mmapped and all that. While I'd prefer to leave integers inlined in general, having an integer section of the constant table that can be accessed when necessary makes the things that need integer fixup easier. And yeah, this implies that our constant table isn't necessarily constant. I'm OK with that, though. :) A normal IntList Array can do that too. Sure, it could.
But we're trying to make sure we provide all the standard facilities in one place so all the different compiler writers don't have to bother. Fixed-up integer constants is a reasonable foundation piece for us to provide.

-- Dan
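A rough sketch of the fixup idea discussed in this message, for illustration only. Everything here is assumed: the op name `load_int_const`, the `fixups` mapping, and the tuple-based bytecode are stand-ins, not Parrot's packfile layout or opcode set. The point it tries to show is that the bytecode only ever names a slot in the integer constant section, so a load-time fixup can change the value without writing to the bytecode itself.

```python
def resolve_fixups(int_constants, fixups, type_numbers):
    """Return a copy of the integer constant section with fixups applied.

    fixups maps a slot index to the class name whose runtime type number
    belongs in that slot (hypothetical layout, for illustration only).
    """
    resolved = list(int_constants)
    for slot, class_name in fixups.items():
        resolved[slot] = type_numbers[class_name]
    return resolved


def run(bytecode, int_constants):
    """Tiny interpreter: returns the type numbers handed to 'new'."""
    iregs = {}
    created = []
    for op, a, b in bytecode:
        if op == "load_int_const":   # hypothetical op: I<a> = constants[b]
            iregs[a] = int_constants[b]
        elif op == "new":            # P<a> = new I<b>
            created.append(iregs[b])
    return created


# An immutable tuple stands in for a read-only, mmapped bytecode segment.
BYTECODE = (("load_int_const", 1, 0), ("new", 0, 1))
```

Running the same `BYTECODE` against two differently fixed-up constant sections yields two different type numbers, which is the property the discussion is after.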
Re: Q: newsub opcodes
Dan Sugalski wrote:
At 4:36 PM +0200 10/29/04, Leopold Toetsch wrote:

Well, they are implemented, at least partly. Sub PMCs are in the constant table. The funny get_sub opcode is actually a ... set_p_pc op. ... with the small difference, that at compile time, the integer argument is a label (offset).

Then we should toss the difference and have a single op to access the PMC constant table.

I'm all for that. We just need set_p_pc. OTOH we need some syntax bits, what this PMC constant in _pc denotes and how it's constructed. For subroutine PMCs it's quite simple: the subroutine label is defining the Sub PMC. So it's probably something like:

.pmc_constant .Sub, foo

A complex PMC could be

.pmc_constant .Complex, 2+3i

Putting PMC constants into the constant table isn't the problem here, nor changing the format on disc to use freeze/thaw; the construction of more or less arbitrary PMC constants needs some thought. We could probably just define that the new_extended vtable of a class is responsible for constructing an appropriate object from a given string. And that gets frozen to bytecode.

[ integer constants ]

More like what we do right now with all the other constant types.

Ah, sounds good.

Integers aren't in a constant table since we just inline them, but the nice thing about a constant table is you can do fixups on it while not touching the actual bytecode, leaving it readonly and mmapped and all that. While I'd prefer to leave integers inlined in general, having an integer section of the constant table that can be accessed when necessary makes the things that need integer fixup easier.

Well, we can't have both schemes coexisting. When running add_i_ic or new_p_ic, we have to know whether the integer is inlined or in the constant table. For RISC cpus JIT code a constant table is better even for integers. And having integer constants in the constant table would open the path to compile an INTVAL=64bit configuration on a 32-bit machine, where opcode_t is 32 bits.
That leads again to my warnocked proposal to just toss all variants of opcodes that have constants too. With all possible PMC constants in the constants table, we get another (estimated) times two opcode count increase. When we now have only:

add_p_p_p

we'd get:

add_p_p_pc
add_p_pc_p
add_p_pc_pc

additionally. We'll start blowing caches. The code gets too big (think compile problems with CGoto). JIT maintainers have to support all these opcodes.

Please consider to reduce constant usage to 4 opcodes:

set_i_ic
set_n_nc
set_s_sc
set_p_pc

(yes, that imposes a bit more pressure on the register allocator, but these constants are reloadable all the time and don't need spilling)

And yeah, this implies that our constant table isn't necessarily constant. I'm OK with that, though. :)

Well, constant Sub PMCs have already offsets relative to their code segment in the PBC. On loading the segment, this gets converted to absolute code addresses. Forget the constantness of constant segments ;) We'll do yet another fixup and I really like the idea WRT PMC types.

leo
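To make the "only four constant-loading opcodes" proposal concrete, here is a small sketch of how a compiler pass might lower a constant-operand variant into a load plus the register-only op. It is an assumption-laden illustration, not IMCC code: `lower`, `make_fresh`, and the `$P0`-style temporary names are invented for the example, and it assumes the operand kinds are exactly the underscore-separated suffixes of the op name (so an op whose base name itself contains an underscore would need extra handling).

```python
# The four constant loads from the proposal, keyed by operand kind.
SET_OPS = {"ic": "set_i_ic", "nc": "set_n_nc", "sc": "set_s_sc", "pc": "set_p_pc"}


def make_fresh():
    """Return a generator of temporary register names (hypothetical naming)."""
    counters = {}

    def fresh(kind):
        n = counters.get(kind, 0)
        counters[kind] = n + 1
        return f"${kind.upper()}{n}"

    return fresh


def lower(op, args, fresh):
    """Rewrite e.g. add_p_p_pc into set_p_pc followed by add_p_p_p."""
    if op in SET_OPS.values():
        return [(op, list(args))]          # already one of the four loads
    name, *kinds = op.split("_")
    out, new_kinds, new_args = [], [], []
    for kind, arg in zip(kinds, args):
        if kind in SET_OPS:
            tmp = fresh(kind[0])           # temp register of matching type
            out.append((SET_OPS[kind], [tmp, arg]))
            new_kinds.append(kind[0])
            new_args.append(tmp)
        else:
            new_kinds.append(kind)
            new_args.append(arg)
    out.append(("_".join([name] + new_kinds), new_args))
    return out
```

The extra temporaries are the "bit more pressure on the register allocator" mentioned above; since they hold constants, they can be reloaded instead of spilled.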
Re: Q: newsub opcodes
At 10:17 PM +0200 10/29/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
At 4:36 PM +0200 10/29/04, Leopold Toetsch wrote:

Well, they are implemented, at least partly. Sub PMCs are in the constant table. The funny get_sub opcode is actually a ... set_p_pc op. ... with the small difference, that at compile time, the integer argument is a label (offset).

Then we should toss the difference and have a single op to access the PMC constant table.

I'm all for that. We just need set_p_pc. OTOH we need some syntax bits, what this PMC constant in _pc denotes and how it's constructed. For subroutine PMCs it's quite simple: the subroutine label is defining the Sub PMC. So it's probably something like:

.pmc_constant .Sub, foo

For now I'm fine with restricting it to sub PMCs and opening it up to more later. :)

[ integer constants ]

More like what we do right now with all the other constant types.

Ah, sounds good.

Integers aren't in a constant table since we just inline them, but the nice thing about a constant table is you can do fixups on it while not touching the actual bytecode, leaving it readonly and mmapped and all that. While I'd prefer to leave integers inlined in general, having an integer section of the constant table that can be accessed when necessary makes the things that need integer fixup easier.

Well, we can't have both schemes coexisting.

Right, and I don't think it's a good idea to switch out from what we have now. For integer constants I think we ought to have an explicit op for fetching them:

getconstant Ix, Iconstantnumber

or something like that. We can have a corresponding setconstant op for constants that... aren't. And for code to set up the constant table in the first place.

And having integer constants in the constant table would open the path to compile an INTVAL=64bit configuration on a 32-bit machine, where opcode_t is 32 bits.

That leads again to my warnocked proposal to just toss all variants of opcodes that have constants too.
With all possible PMC constants in the constants table, we get another (estimated) times two opcode count increase. Please consider to reduce constant usage to 4 opcodes:

I have, and no. (though we can toss all the two-constant I, S, and N forms) We leave things as-is. When we run up to our 1.0 release we can run some analysis on the different compilers to see what ops aren't being used and pare out the list at that point.

And yeah, this implies that our constant table isn't necessarily constant. I'm OK with that, though. :)

Well, constant Sub PMCs have already offsets relative to their code segment in the PBC. On loading the segment, this gets converted to absolute code addresses. Forget the constantness of constant segments ;)

:) Works for me. I want to revisit packfile formats and whatnot again soon anyway -- I want us to start adding in source line numbers and source lines to the packfiles and have it available as we run so we can start throwing more informative error messages and making the debugger more useful.

-- Dan
Re: Mostly a Perl task for the interested
On Fri, Oct 29, 2004 at 05:47:55PM +0200, Leopold Toetsch wrote: classes/*.c is created by the bytecode compiler classes/pmc2c2.pl. Most of the actual code is in lib/Parrot/Pmc2c.pm. The created C code could need some improvements: Can I add a fourth - one I said to Dan I intended to do, but so far haven't managed: * The created C code could benefit from #line directives to track where C code came from the input .pmc file, so that compiler errors are reported for the original .pmc file. Perl 5's xsubpp does this well, using #line directives to switch between foo.c and foo.xs, depending on whether that section of code was human written, or autogenerated. It makes things much easier while developing. Of course, I may find time to do this before anyone else does, but anyone is welcome to beat me to it. Nicholas Clark
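A minimal sketch of the #line idea, assuming a generator that already knows which chunks of output were copied from the .pmc file and which were autogenerated. The function `emit_c` and the `(kind, pmc_line, text)` chunk format are hypothetical and are not how Pmc2c.pm is structured; the sketch only shows the bookkeeping: hand-written chunks get a directive pointing back at the .pmc file, and generated chunks get one pointing at the output file's own next line.

```python
def emit_c(chunks, pmc_file, c_file):
    """Join code chunks into C source with #line directives.

    chunks: list of (kind, pmc_line, text) where kind is "user" for code
    copied from the .pmc file (pmc_line is its first line there) or
    "generated" for autogenerated code (pmc_line is ignored).
    """
    out = []
    for kind, pmc_line, text in chunks:
        if kind == "user":
            out.append(f'#line {pmc_line} "{pmc_file}"')
        else:
            # The directive will sit on output line len(out) + 1, so the
            # code right after it is line len(out) + 2 of the C file.
            out.append(f'#line {len(out) + 2} "{c_file}"')
        out.extend(text.splitlines())
    return "\n".join(out) + "\n"
```

With this in place, a compiler error inside a hand-written method body would be reported against the .pmc file and line, while errors in glue code would still point into the generated .c file.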
Re: AIX PPC JIT warning
On Fri, 29 Oct 2004 01:05:18 -0700, Jeff Clites [EMAIL PROTECTED] wrote:

Recently config/gen/platform/darwin/asm.s was added, containing Parrot_ppc_jit_restore_nonvolatile_registers(). Corresponding code also needs to be added to config/gen/platform/aix/asm.s -- Parrot should fail to link on AIX currently, without this. I didn't try to update the AIX asm.s myself, since I wasn't confident that I could do this correctly without having a way to test. So, someone with AIX asm expertise, please take a look. Thanks, JEff

Worry not, it's already broken. I've been unable to test the AIX/PPC JIT since ICU went in. The configuration for ICU (at least as of 2.6) supports only a 64-bit build, while aix/asm.s is 32-bit only (the linker claims the .o is corrupt if assembled with OBJECT_MODE=64). To get it working again, one of three things needs to happen:

1. ICU becomes optional again (please!).
2. PPC64 JIT code is written which can be morphed into POWER code. Transforming PPC32->POWER was mostly straightforward, so hopefully 64-bit will be as well.
3. ICU's configure starts to support 32-bit compiles. This might happen with 3.0/CVS already, but I haven't checked.

1 is necessary anyway, but it doesn't seem like a high priority. 2 is best in the long run, but requires somebody who knows more about PPC64 ASM than I do to get started. I don't know if 3 has any chance of happening upstream, but I doubt there's anybody working on Parrot who wants to deal with it. If somebody can help with one or more of these, I can try to get it going on AIX 4.3.3 once again.

Adam
[perl #32223] [PATCH] Build dynclasses by default.
# New Ticket Created by Will Coleda
# Please include the string: [perl #32223]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32223

I think we should be building dynclasses by default.

oolong:~/research/parrot/config/gen/makefiles coke$ cvs diff root.in
Index: root.in
===================================================================
RCS file: /cvs/public/parrot/config/gen/makefiles/root.in,v
retrieving revision 1.254
diff -b -u -r1.254 root.in
--- root.in 12 Oct 2004 09:00:16 -0000 1.254
+++ root.in 30 Oct 2004 03:31:44 -0000
@@ -463,7 +463,7 @@
 #
 ###
 
-all : flags_dummy $(TEST_PROG) runtime/parrot/include/parrotlib.pbc runtime/parrot/include/config.fpmc docs $(LIBNCI_SO) $(GEN_LIBRARY)
+all : flags_dummy $(TEST_PROG) runtime/parrot/include/parrotlib.pbc runtime/parrot/include/config.fpmc docs $(LIBNCI_SO) $(GEN_LIBRARY) dynclasses_dummy
 
 .SUFFIXES : .c .h .pmc .dump $(O) .str .imc .pbc
 
@@ -581,6 +581,9 @@
 	@echo Compiling with:
 	@$(PERL) tools/dev/cc_flags.pl ./CFLAGS echo $(CC) $(CFLAGS) -I$(@D) ${cc_o_out} xx$(O) -c xx.c
 
+dynclasses_dummy :
+	cd dynclasses && $(MAKE)
+
 runtime/parrot/include/parrotlib.pbc: runtime/parrot/library/parrotlib.imc $(TEST_PROG)
 	./parrot -o $@ runtime/parrot/library/parrotlib.imc
[perl #25255] IMCC - no warning on duplicate .local vars
[coke - Sat Jan 24 19:32:16 2004]:

It would be helpful if IMCC complained about duplicate .local labels, so that the attached wouldn't compile, rather than dying at runtime.

A naive pass at this is:

oolong:~/research/parrot coke$ cvs diff imcc/symreg.c
Index: imcc/symreg.c
===================================================================
RCS file: /cvs/public/parrot/imcc/symreg.c,v
retrieving revision 1.55
diff -b -u -r1.55 symreg.c
--- imcc/symreg.c 17 Jul 2004 08:07:27 -0000 1.55
+++ imcc/symreg.c 30 Oct 2004 04:45:21 -0000
@@ -287,6 +287,11 @@
         ident->next = namespace->idents;
         namespace->idents = ident;
     }
+    if (_get_sym(cur_unit->hash,fullname)) {
+        fataly(1, sourcefile, line,
+            "duplicate .local or .sym: '%s'",
+            fullname);
+    }
     r = mk_symreg(fullname, t);
     r->type = VTIDENTIFIER;
     free(name);

This causes a few tests to fail:

Failed Test            Stat Wstat Total Fail  Failed  List of Failed
t/library/dumper.t       13  3328    13   13 100.00%  1-13
t/library/parrotlib.t     1   256     6    1  16.67%  3
t/library/streams.t      12  3072    21   12  57.14%  2 4-5 8 10-12 14-17 20
t/pmc/iter.t              1   256    44    1   2.27%  11

/Some/ of these seem to be valid errors. But it also seems to not like having .subs with the same name as a .local inside that sub. Also, the error message isn't reporting properly. Help?
Re: [perl #25255] IMCC - no warning on duplicate .local vars
That is to say, the file and line number appear to be off.

Will Coleda via RT wrote:
Also, the error message isn't reporting properly. Help?
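One way to picture the two problems raised in this ticket (false positives against the enclosing sub's name, and a wrong file and line in the message) is the sketch below. It is not IMCC's symbol table: the `Unit` class, the `declare` method, and the message format are assumptions for illustration, and it assumes that a .local sharing its sub's name is meant to be legal. Locals live in their own per-sub table, separate from sub names, and each entry records where it was declared so the diagnostic can use the position of the offending declaration.

```python
class DuplicateLocal(Exception):
    """Raised when a .local or .sym name is declared twice in one sub."""


class Unit:
    """Per-sub symbol table (hypothetical stand-in for a compilation unit)."""

    def __init__(self, sub_name):
        self.sub_name = sub_name   # kept apart from locals on purpose
        self.locals = {}           # name -> (type, source_file, line)

    def declare(self, name, type_, source_file, line):
        if name in self.locals:
            first_line = self.locals[name][2]
            raise DuplicateLocal(
                f"{source_file}:{line}: duplicate .local or .sym: '{name}' "
                f"(first declared at line {first_line})"
            )
        self.locals[name] = (type_, source_file, line)
```

Because the check only consults `self.locals`, a local named like the sub itself is accepted, while a genuine redeclaration is reported with the file and line passed in at the point of the second declaration.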
Re: Q: newsub opcodes
Dan Sugalski [EMAIL PROTECTED] wrote:
At 10:17 PM +0200 10/29/04, Leopold Toetsch wrote:

That leads again to my warnocked proposal to just toss all variants of opcodes that have constants too. With all possible PMC constants in the constants table, we get another (estimated) times two opcode count increase. Please consider to reduce constant usage to 4 opcodes:

I have, and no. (though we can toss all the two-constant I, S, and N forms) We leave things as-is. When we run up to our 1.0 release we can run some analysis on the different compilers to see what ops aren't being used and pare out the list at that point.

As a compromise, it strikes me that we're using 32-bit numbers to encode registers, but only using five bits of that 32. Could we not have constants indicated by numbers starting at 32 or something similar? We could probably do something very clever to abstract it, like load all the constants into a reserved, dynamically-sized set of registers starting at [INSP]32. A scheme like this would allow us to consolidate the constant and register variants of all the ops, while still allowing us to use constants whenever we wanted.

I'm not sure if the cost--allocating more register banks and loading the constants into those registers--is worth it, but it might be worth thinking about at least.

-- Brent 'Dax' Royal-Gordon [EMAIL PROTECTED] Perl and Parrot hacker There is no cabal.
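A tiny sketch of the encoding Brent describes, under the assumption that operand numbers 0..31 name real registers and anything from 32 upward indexes the constant table. The names `fetch` and `op_add` are made up for the example and nothing here reflects actual Parrot internals. With a decode step like this, one `add` implementation serves both the register and the constant forms.

```python
NUM_REGS = 32  # operands below this are registers, the rest are constants


def fetch(operand, registers, constants):
    """Read an operand: a register if < NUM_REGS, else a constant slot."""
    if operand < NUM_REGS:
        return registers[operand]
    return constants[operand - NUM_REGS]


def op_add(dest, a, b, registers, constants):
    """Single add op covering register and constant source operands."""
    registers[dest] = fetch(a, registers, constants) + fetch(b, registers, constants)
```

For example, `op_add(0, 1, 32, regs, consts)` adds register 1 to constant slot 0. The price is a comparison on every operand read (or, in the variant Brent mentions, copying constants into extra register banks up front), which is the cost he is unsure about.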