Re: Python/Pirate status
Sam Ruby [EMAIL PROTECTED] wrote:

I'm now converting to dynclasses. To be honest, I'm not thrilled with this. What I would really prefer is a Parrot_new_p_s opcode with the runtime worrying about caching class names across sub and module boundaries.

$P0 = new Py_int or some such has a considerable runtime overhead if it is emitted as a new_p_sc opcode. So we probably want to reserve a certain range of PMC enums for Python, Perl, whatever. With fixed, pre-assigned PMC types, the type lookup could use integer types again, and type numbers would no longer depend on the load order of PMC extensions.

- Sam Ruby

leo
Re: hash multithreading and cross language issue
Sam Ruby [EMAIL PROTECTED] wrote:

I note that the perlscalar code is careful about multithreading issues (example: if we morph to a string, first clear str_val so that after changing the vtable a parallel reader doesn't get a garbage pointer), but reuses a static PMC* intret.

Let's postpone multi-threading issues for a while.

dict = {}
dict[1] = 'foo'
dict[1] = 'bar'
print dict[1]

For Python support, it would be ideal if there were a hash method entry in the VTABLE for each object.

Not only ideal but necessary. The stringification of hash keys is a perlism that just isn't usable for Python.

- Sam Ruby

leo
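The point about stringified keys can be illustrated with a minimal Python 3 sketch. This is not Parrot code: the helper `perlish_store` is a hypothetical name, made up here to mimic a hash that converts every key to a string before storing it. Under that assumption, an integer key and a string key that print the same collapse into one entry, while a Python dict keeps them apart.

```python
# Python semantics: keys are compared by their own hash and equality,
# so the int 1 and the str '1' are two different keys.
py_dict = {}
py_dict[1] = 'foo'
py_dict['1'] = 'bar'


def perlish_store(table, key, value):
    """Hypothetical helper: mimic a hash that stringifies every key."""
    table[str(key)] = value


perl_like = {}
perlish_store(perl_like, 1, 'foo')
perlish_store(perl_like, '1', 'bar')   # overwrites the entry stored for 1

print(len(py_dict), len(perl_like))
```

A per-object hash entry in the vtable would let each PMC decide how it is hashed, which is what the first half of the sketch relies on.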
[CVS ci] indirect register frame 9 - go
I've now committed the new (internal) calling scheme. On the surface nothing has changed, at least if the code obeys the rules in docs/pdds/pdd03_calling_conventions.pod.

If you are using PIR code and the function call directives, everything will still work. PASM code or handcrafted calls have to take care to set up I0..I4 accordingly. If these registers don't indicate function arguments or return values, the other end will not see the passed values.

Some additional notes:

* t/library/streams_11 now produces a different result. I don't know which one is correct or why there is a difference.
* t/library/dumper.* seems to be broken WRT pdd03; it's disabled.
* t/op/gc_13 (Piers' backtracking example) needed the cloning of the 2nd Cchoose closure. I hope that this is correct, but as these closures are holding different state, it should be.
* All prederefed run cores (Prederef, CGP, Switch) are currently broken because they are still using absolute register addresses.
* All JIT platforms except ppc and i386 are broken.

Takers wanted for JIT fixes. See jit/ppc/* for necessary changes.

leo
Re: pmc_type
Stéphane Payrard writes:

That would allow typechecking to be implemented in imcc.

.sym Scalar a
a = new .PerlInt # ok. PerlInt is derived from Scalar

Ugh, yeah, but what does that buy you? In dynamic languages pure derivational typechecking is very close to useless. The reason C++[1] has pure derivational semantics is because of implementation. The vtable functions have the same relative address, so you can use a derived object interchangeably. In a language where methods are looked up by name, such strictures are more often over-restrictive than helpful.

Anyway, that's just my rant. If such a thing is to be in imcc, it _must_ be optional without loss of feature. I have a quibble with the automatic typechecking of .param variables for the same reason.

Luke

[1] And the reason Java has it is because C++ did. Great design work, guys.
[perl #32178] [TODO] include via relative paths
# New Ticket Created by Matt Diephouse
# Please include the string: [perl #32178]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32178

Currently there's no way to include a file using a relative path. This is a bit limiting. For example, Tcl must be run from the root parrot directory because the compiler is split across multiple files. There should be a way to include these files and just make it work.

--
matt diephouse
http://matt.diephouse.com
Prederefed run cores
With the indirect register addressing, all prederefed run cores (Prederefed, CGP, Switch) are currently not functional, as these run cores have absolute addresses in the prederefed code.

I see two ways to fix it:

1) Use frame pointer relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]

2) Re-prederef on function calls, if frame pointer differs:
   + no impact on code size
   - needs precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)

Comments welcome,
leo

[1] Due to absolute addressing, a constant argument and a register argument have the same code: set_i_ic and set_i_i are the same.
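The difference between absolute and frame-relative operands can be sketched in a few lines of Python 3. This is only a toy model, not how the Parrot run cores are written: `Frame`, `prederef_absolute` and `prederef_relative` are hypothetical names, and closures stand in for prederefed code. The absolute variant captures one frame's register storage when the op is prepared, so it keeps touching that frame after a call switches frames; the relative variant stores only register numbers and is handed the current frame at run time.

```python
class Frame:
    """Hypothetical register frame: just a list of integer registers."""

    def __init__(self, nregs=4):
        self.regs = [0] * nregs


def prederef_absolute(frame, dst, src):
    """Bind the op to one frame's register storage at prederef time."""
    regs = frame.regs                 # the "absolute address" is captured here

    def op():
        regs[dst] = regs[src] + 1
    return op


def prederef_relative(dst, src):
    """Keep only register numbers; the frame is supplied at run time."""
    def op(frame):
        frame.regs[dst] = frame.regs[src] + 1
    return op


caller, callee = Frame(), Frame()
callee.regs[1] = 41

abs_op = prederef_absolute(caller, 0, 1)
abs_op()                              # still reads and writes caller's registers

rel_op = prederef_relative(0, 1)
rel_op(callee)                        # follows whichever frame is passed in
```

In this model option 2) would amount to calling `prederef_absolute` again whenever the current frame changes, which is why it needs to know where each function's code starts and ends.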
Re: [perl #32176] [PATCH] Getopt/Long tidbits and Array access benchmark
Bernhard Schmalhofer [EMAIL PROTECTED] wrote: this patch adds a benchmark for random access of different Array PMCs. Thanks, applied. Their differences against /dev/null are part of the attached patch. Hope that works. works fine. leo
Re: Python/Pirate status
Leopold Toetsch wrote:

Sam Ruby [EMAIL PROTECTED] wrote:

I'm now converting to dynclasses. To be honest, I'm not thrilled with this. What I would really prefer is a Parrot_new_p_s opcode with the runtime worrying about caching class names across sub and module boundaries.

$P0 = new Py_int or some such has a considerable runtime overhead if it is emitted as a new_p_sc opcode. So we probably want to reserve a certain range of PMC enums for Python, Perl, whatever. With fixed, pre-assigned PMC types, the type lookup could use integer types again, and type numbers would no longer depend on the load order of PMC extensions.

Yes, I meant the ability to do things like '$P0 = new Py_int'. Could this be JITed? The mapping between string class name and assigned PMC type is constant throughout the life of the VM...

What provoked me to suggest that was a statement made in IRC yesterday that TCL is doing a find_type in every subroutine that does a new. And the knowledge that every local variable in Python and PHP is likely to be a PMC.

My concern is that if there isn't a convenient way to look up and cache these types, the considerable runtime overhead will still be incurred, but in ways that aren't readily amenable to optimization by the runtime.

- Sam Ruby
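The caching concern can be made concrete with a small Python 3 sketch. Everything in it is an assumption for illustration: `TypeRegistry` is a hypothetical stand-in for the VM's name-to-type-number table (its `find_type` method only borrows the name of the op mentioned above), and `CachedTypes` is one possible way a language runtime might resolve each class name once and reuse the integer afterwards.

```python
class TypeRegistry:
    """Hypothetical stand-in for a VM's name -> type-number table."""

    def __init__(self):
        self._types = {}
        self.lookups = 0              # counts slow, by-name lookups

    def register(self, name):
        self._types[name] = len(self._types) + 1
        return self._types[name]

    def find_type(self, name):
        self.lookups += 1
        return self._types[name]


class CachedTypes:
    """Resolve each class name once, then reuse the integer."""

    def __init__(self, registry):
        self._registry = registry
        self._cache = {}

    def __getitem__(self, name):
        try:
            return self._cache[name]
        except KeyError:
            number = self._cache[name] = self._registry.find_type(name)
            return number


registry = TypeRegistry()
registry.register('Py_int')
types = CachedTypes(registry)

for _ in range(1000):                 # e.g. 1000 subs that each do a "new"
    type_number = types['Py_int']

print(registry.lookups)
```

The sketch only shows the bookkeeping. In Python the cache is itself a dictionary lookup, so it says nothing about the real cost inside a VM, where the cached value could be a plain integer in a register or constant table.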
RE: Install-Problem
I left the make for overnight :) Here is the error I got..

xx.c
cc -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -g -Dan_Sugalski -Larry -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Waggregate-return -Winline -W -Wno-unused -Wsign-compare -Wformat-nonliteral -Wformat-security -Wpacked -Wdisabled-optimization -mno-accumulate-outgoing-args -Wno-shadow -falign-functions=16 -I./include -I/usr/include -DHAS_JIT -DI386 -DHAVE_COMPUTED_GOTO -I. -o xx.o -c xx.c
ops/core_ops_cg.c
cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
gmake: *** [ops/core_ops_cg.o] Error 1

Regards,
V.

-----Original Message-----
From: Dan Sugalski [mailto:[EMAIL PROTECTED]
Sent: Wednesday, October 27, 2004 6:19 PM
To: Vijay D.; [EMAIL PROTECTED]
Subject: Re: Install-Problem

At 4:31 PM +0530 10/27/04, Vijay D. wrote:

Hi, I was trying to install the latest Parrot. The latest source code is checked out from CVS. After configure, the make is stopping at

ops/core_ops.c
ops/core_ops_prederef.c
ops/core_ops_switch.c
ops/core_ops_cg.c

Is it stopping, or just taking a long time? Those files take a while to build, and a lot of memory to build in. Figure on a few minutes, depending on your CPU, if you have enough memory. If the compiler falls into swap (which'll happen if you've less than 256M or so) it can take a half hour or more.

-- Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: Install-Problem
On Thu, Oct 28, 2004 at 04:49:26PM +0530, Vijay D. wrote:

I left the make for overnight :) Here is the error I got..

xx.c
cc -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm -g -Dan_Sugalski -Larry -Wall -Wstrict-prototypes -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Wwrite-strings -Waggregate-return -Winline -W -Wno-unused -Wsign-compare -Wformat-nonliteral -Wformat-security -Wpacked -Wdisabled-optimization -mno-accumulate-outgoing-args -Wno-shadow -falign-functions=16 -I./include -I/usr/include -DHAS_JIT -DI386 -DHAVE_COMPUTED_GOTO -I. -o xx.o -c xx.c
ops/core_ops_cg.c
cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
gmake: *** [ops/core_ops_cg.o] Error 1

If you're really committed to using computed goto, removing the -g may help. It's gotten me past that kind of problem before.

--
It is our mission to synergistically negotiate mission-critical resources so that we may conveniently foster parallel intellectual capital
RE: Install-Problem
pass the --cgoto=0 flag to Configure.pl.

Thanks for the tip, I installed successfully ..

I also have RH 9.0 and would love someone to confirm that make testj will fail on 3 tests (unless you additionally pass it another flag).

Here is the output for the fulltest on my redhat machine.

3 tests and 49 subtests skipped.
Failed 104/112 test scripts, 7.14% okay. 1851/1905 subtests failed, 2.83% okay.
make: *** [testg] Error 2

Regards,
Vijay.
RE: Install-Problem
At 4:49 PM +0530 10/28/04, Vijay D. wrote:

I left the make for overnight :) Here is the error I got..

xx.c
ops/core_ops_cg.c
cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
gmake: *** [ops/core_ops_cg.o] Error 1

You just ran out of memory during the build. (If this is a server system you might want to check and make sure nothing else got killed by the OOM monitor)

The computed goto cores do make gcc more than a little unhappy. Pass in the --cgoto=0 switch to configure, or throw another half-gig or so of swap at your system. :)

-- Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: Prederefed run cores
At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:

With the indirect register addressing, all prederefed run cores (Prederefed, CGP, Switch) are currently not functional, as these run cores have absolute addresses in the prederefed code.

I see two ways to fix it:

1) Use frame pointer relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]

2) Re-prederef on function calls, if frame pointer differs:
   + no impact on code size
   - needs precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)

Or 3) Toss the prederef stuff entirely.

-- Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: register allocation questions
At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:

Dan Sugalski wrote:

At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

So, if you want that really super efficient, you would allocate registers around function calls directly to that wanted register number, which should be in the SymReg's want_regno.

While true, in the general case leaving 0-15 as non-preferred registers will probably make things easier. Those registers, especially the PMC ones, are going to see a lot of thrash as function calls are made, and it'll probably be easier to have them as scratch registers.

Yep, that's the easy part ;) OTOH when the register allocator is doing register renaming anyway, the innermost loop with a function call should get registers assigned already matching the calling conventions. With more than one call at that loop level, you have to move around registers anyway.

Oh, sure, but keeping your scratch PMCs out of the way makes life a lot easier for the register coloring algorithms. Might not be optimal, but if it makes life simpler to start, optimal can come later.

It's distinctly possible, of course, that there'll be very little pressure to actually *use* them for most code, as we've got plenty of registers in general. That's the hope, at least.

Yes, 16 regs are plenty and do suffice for all normal[1] code. Assigning to wanted reg numbers for a function is a nice optimization.

[1] all except Dan's 6000 lines subroutines :) Did you start creating real subs for your code already?

I wish. :( Unfortunately not, outside some simple stuff, and I doubt I will. The language just doesn't lend itself to that sort of thing. We're going to add actual real subroutines to the language after we roll out into production, but that doesn't help now, alas.

-- Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: Prederefed run cores
Dan Sugalski wrote: Or 3) Toss the prederef stuff entirely. Which might not be quite as bad as it sounds: on at least one strange platform (IA64 HP-UX) the native C compiler gets the switch core running faster than the prederef core! (!) Duraid
Re: extend.c:Parrot_call
Leopold Toetsch wrote:

Parrot_call() runs a Parrot subroutine, but it takes PMC arguments only and provides no return value. If no one hollers, I'll replace this function with a more flexible set of functions that are wrappers to the *runops* functions in src/inter_run.c:

void *Parrot_call_sub_(interp, sub, signature, ...) [1]
Parrot_Int Parrot_call_sub_ret_int
Parrot_Float Parrot_call_sub_ret_float
void *Parrot_call_meth(interp, sub, object, meth, sig, ...)

Done that now. The latter is:

void *Parrot_call_method(interp, sub, object, meth, sig, ...)

and the other two accordingly.

leo
Re: Prederefed run cores
Duraid Madina wrote: Dan Sugalski wrote: Or 3) Toss the prederef stuff entirely. Which might not be quite as bad as it sounds: on at least one strange platform (IA64 HP-UX) the native C compiler gets the switch core running faster than the prederef core! (!) Err, the switched core *is* a prederefed core. Duraid leo
Re: Install-Problem
Vijay D. [EMAIL PROTECTED] wrote:

Failed 104/112 test scripts, 7.14% okay. 1851/1905 subtests failed, 2.83% okay.
make: *** [testg] Error 2

Well, testing the now non-existent CGoto core with make testg is probably not really helpful ;)

Regards, Vijay.

leo
Re: Python/Pirate status
Sam Ruby [EMAIL PROTECTED] wrote:

Yes, I meant the ability to do things like '$P0 = new Py_int'. Could this be JITed? The mapping between string class name and assigned PMC type is constant throughout the life of the VM...

Not really or not easily. The fastest approach is to have type enum numbers, which needs reserved ranges for not-yet-loaded extensions.

What provoked me to suggest that was a statement made in IRC yesterday that TCL is doing a find_type in every subroutine that does a new. And the knowledge that every local variable in Python and PHP is likely to be a PMC.

Well, if only one set of PMC types is used and you control program initialization, it's not too hard to probe Parrot for the next PMC type number. Then the compiler can emit the load_bytecode ops on top and use type numbers. That doesn't work if the library loading isn't always using the same sequence, of course.

My concern is that if there isn't a convenient way to look up and cache these types, the considerable runtime overhead will still be incurred, but in ways that aren't readily amenable to optimization by the runtime.

Yes.

- Sam Ruby

leo
Re: pmc_type
On 10/27/04 Luke Palmer wrote:

Stéphane Payrard writes:

That would allow typechecking to be implemented in imcc.

.sym Scalar a
a = new .PerlInt # ok. PerlInt is derived from Scalar

Ugh, yeah, but what does that buy you? In dynamic languages pure derivational typechecking is very close to useless. The reason C++[1] has pure derivational semantics is because of implementation. The vtable functions have the same relative address, so you can use a derived object interchangeably. In a language where methods are looked up by name, such strictures are more often over-restrictive than helpful.

Actually, if I were to write a perl runtime for parrot, mono or even the JVM I'd experiment with the same pattern. I guess it could be applied to a python implementation, too. You would assign small integer IDs to the names of the methods and build a vtable indexed by the ID. In most cases the method name is known at compile time, so you know the ID and you can get the method with a simple load from the vtable. This is much faster than a hash table lookup (I hinted at this in my old RFC for perl6).

Of course the table would be sparse, especially in pathological programs, so you could have a limit, like 100 entries or less, with IDs bigger than that using a different lookup (binary search on an array, for example). There are a number of optimizations that can be done to reduce the vtable size, but I'm not sure this would matter in parrot as long as bytecode values are as big as C ints :-)

Maybe someone has time to write a script and run it on a bunch of perl programs and report how many different method names are usually created. Of course it also depends how much the hash lookup will cost wrt the total cost of a subroutine call...

lupus

--
- [EMAIL PROTECTED] debian/rules [EMAIL PROTECTED] Monkeys do it better
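The scheme lupus describes (small integer IDs for method names, a dense table up to a limit, binary search above it) could look roughly like the following Python 3 sketch. All names here (`MethodIds`, `VTable`, `DENSE_LIMIT`) are hypothetical, and the cutoff of 100 is simply the figure from the message above; nothing here is taken from Parrot, mono or the JVM.

```python
import bisect

DENSE_LIMIT = 100   # assumed cutoff, from the "100 entries" figure above


class MethodIds:
    """Hand out small integer IDs to method names, in order of first use."""

    def __init__(self):
        self._ids = {}

    def id_for(self, name):
        return self._ids.setdefault(name, len(self._ids))


class VTable:
    """Dense list for low IDs, sorted ID array plus binary search for the rest."""

    def __init__(self, methods):
        # methods: dict mapping method ID -> callable
        self._dense = [None] * DENSE_LIMIT
        overflow = []
        for method_id, func in methods.items():
            if method_id < DENSE_LIMIT:
                self._dense[method_id] = func
            else:
                overflow.append((method_id, func))
        overflow.sort(key=lambda pair: pair[0])
        self._overflow_ids = [pair[0] for pair in overflow]
        self._overflow_funcs = [pair[1] for pair in overflow]

    def lookup(self, method_id):
        if method_id < DENSE_LIMIT:
            return self._dense[method_id]        # one indexed load
        pos = bisect.bisect_left(self._overflow_ids, method_id)
        if pos < len(self._overflow_ids) and self._overflow_ids[pos] == method_id:
            return self._overflow_funcs[pos]
        return None                              # no such method on this class


ids = MethodIds()
LENGTH = ids.id_for('length')       # resolved once, "at compile time"
for n in range(150):                # pad the ID space past the dense limit
    ids.id_for('filler%d' % n)
RARE = ids.id_for('rarely_used')

table = VTable({LENGTH: len, RARE: sorted})
print(table.lookup(LENGTH)('abc'), table.lookup(RARE)([2, 1]))
```

Each class pays for `DENSE_LIMIT` slots whether it uses them or not, which is the sparseness trade-off mentioned in the message.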
Access to Parakeet in CVS
So I've *finally* created a Perl.org account in order to update Parakeet in CVS. As I understand it my next step is to inform the developers of my username michel so that I can be given access to that area.

I've got some exciting new changes to commit just as soon as I figure out if they work with Leo's changes to the PCC innards.

Thanks in advance,
-Michel
[perl #32196] Yet Another GC Crash (YAGC)
# New Ticket Created by Matt Diephouse
# Please include the string: [perl #32196]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32196

Parrot exploded when running my forth implementation after a cvs update. Below is the backtrace from gdb.

--
matt diephouse
http://matt.diephouse.com

ns:~/Projects/parrot/languages/forth ezekiel$ ulimit -c unlimited
ns:~/Projects/parrot/languages/forth ezekiel$ parrot -t forth.pir 2trace.log
Bus error (core dumped)
ns:~/Projects/parrot/languages/forth ezekiel$ ls /cores/
core.20226
ns:~/Projects/parrot/languages/forth ezekiel$ gdb ../../parrot /cores/core.20226
GNU gdb 5.3-20030128 (Apple version gdb-330.1) (Fri Jul 16 21:42:28 GMT 2004)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as powerpc-apple-darwin.
Reading symbols for shared libraries .. done
Core was generated by `/Users/ezekiel/bin/parrot'.
#0 0x0003d420 in pobject_lives (interpreter=0xd00140, obj=0x0) at src/dod.c:198
198 if (PObj_is_live_or_free_TESTALL(obj)) {
(gdb) bt
#0 0x0003d420 in pobject_lives (interpreter=0xd00140, obj=0x0) at src/dod.c:198
#1 0x48f0 in mark_1_seg (interpreter=0xd00140, cs=0xd01fd0) at src/packfile.c:360
#2 0x4990 in find_code_iter (seg=0xd01fd0, user_data=0xd00140) at src/packfile.c:375
#3 0x503c in PackFile_map_segments (dir=0xd01e40, callback=0x4920 find_code_iter, user_data=0xd00140) at src/packfile.c:687
#4 0x4a18 in mark_const_subs (interpreter=0xd00140) at src/packfile.c:399
#5 0x0003d734 in Parrot_dod_trace_root (interpreter=0xd00140, trace_stack=1) at src/dod.c:333
#6 0x0003d848 in trace_active_PMCs (interpreter=0xd00140, trace_stack=1) at src/dod.c:371
#7 0x0003e5b0 in Parrot_dod_ms_run (interpreter=0xd00140, flags=1) at src/dod.c:1168
#8 0x0003e76c in Parrot_do_dod_run (interpreter=0xd00140, flags=1) at src/dod.c:1224
#9 0x0009a0a4 in mem_allocate (interpreter=0xd00140, req_size=0xbfffe650, pool=0xd002d0, align_1=15) at src/resources.c:142
#10 0x0009aefc in Parrot_allocate_string (interpreter=0xd00140, str=0xfe7798, size=128) at src/resources.c:656
#11 0x0002a814 in string_make_empty (interpreter=0xd00140, representation=enum_stringrep_one, capacity=128) at src/string.c:352
#12 0x00102138 in Parrot_sprintf_format (interpreter=0xd00140, pat=0xfe77c0, obj=0xb7e0) at src/spf_render.c:290
#13 0x000ea728 in Parrot_vsprintf_s (interpreter=0xd00140, pat=0xfe77c0, args=0xb8f0 ) at src/misc.c:68
#14 0x000ea7d4 in Parrot_vsprintf_c (interpreter=0xd00140, pat=0x299f40 \n, args=0xb8f0 ) at src/misc.c:93
#15 0x00033da8 in PIO_eprintf (interpreter=0xd00140, s=0x299f40 \n) at io/io.c:1069
#16 0x001d469c in trace_op_dump (interpreter=0xd00140, code_start=0x102cc00, pc=0x102cda4) at src/trace.c:327
#17 0x001d4724 in trace_op (interpreter=0xd00140, code_start=0x102cc00, code_end=0x102d398, pc=0x102cda4) at src/trace.c:355
#18 0x001d33f8 in runops_slow_core (interpreter=0xd00140, pc=0x102cda4) at src/runops_cores.c:155
#19 0x0003fc8c in runops_int (interpreter=0xd00140, offset=0) at src/interpreter.c:808
#20 0x00038bd0 in runops (interpreter=0xd00140, offset=0) at src/inter_run.c:69
#21 0xc150 in Parrot_runcode (interpreter=0xd00140, argc=1, argv=0xbd98) at src/embed.c:750
#22 0xbf58 in Parrot_runcode (interpreter=0xd00140, argc=1, argv=0xbd98) at src/embed.c:679
#23 0x3f8c in main (argc=1, argv=0xbd98) at imcc/main.c:579
(gdb)
Re: register allocation questions
Hi all,

Thanks for your continued comments. Btw, I usually read all the parrot list, so don't think I'm not paying attention.

Currently, here's how the register allocator is doing.

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------
t/library/dumper.t      5  1280    13    5  38.46%  1-2 5 8 13
4 tests and 51 subtests skipped.
Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

I recall Leo, or someone, saying that the data dumper routines are not following the calling convention properly. So I've decided not to worry about it too much.

It passes the other tests, plus the randomized tests that I created, up to 150 symbols. At that range, it still takes about 20x longer than g++ -O2 for equivalent programs to compile (see gen4.pl).

Also, it is currently running about O(n^2) for n symbols, where the old one was running about O(n^3) from my analysis. The spill code is still very expensive, and has a large constant associated with it. I also have data, which is attached. The difference doesn't show up until a lot of spilling is going on, around 80 symbols or so.

I've learned a lot about how the compiler works at this point, and I'd like to contribute more :) Would you like a patch? Should I fix the data dumper routines first? What is all this talk about deferred registers? What should I do next?

Well, I'm making some comments on the below stuff.

On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:

At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:

Dan Sugalski wrote:

At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

So, if you want that really super efficient, you would allocate registers around function calls directly to that wanted register number, which should be in the SymReg's want_regno.

Yes, I think we are kind of doing this. It's best to pass the registers straight through though. Like when a variable will be used as a parameter, give it the appropriate reg num. Sort of outside the immediate scope of register coloring, but as I've learned, one must go a little beyond, to see the input and output for each sub.

While true, in the general case leaving 0-15 as non-preferred registers will probably make things easier. Those registers, especially the PMC ones, are going to see a lot of thrash as function calls are made, and it'll probably be easier to have them as scratch registers.

I guess I don't agree. I'd like to pack down the number of registers used to a minimum. Then when a function is called, only those needed registers are copied in/out. Don't think the functionality exists. But the idea is to have each sub declare how many registers to save/restore. This would then save 0-k such registers, where k is the number of registers used by the sub. Pack 'em down, minimize the number needed. We can also minimize this number to match the physical architecture that parrot is running on (for an arch specific optimization). The imc_reg_alloc function does not have 32 hard coded in there (well a little bit, but can be easily changed). It's pretty dynamic.

Yep, that's the easy part ;) OTOH when the register allocator is doing register renaming anyway, the innermost loop with a function call should get registers assigned already matching the calling conventions. With more than one call at that loop level, you have to move around registers anyway.

Yes, yes, renaming! I want to do register renaming!

Oh, sure, but keeping your scratch PMCs out of the way makes life a lot easier for the register coloring algorithms. Might not be optimal, but if it makes life simpler to start, optimal can come later.

p31 holds all the spill stuff. It's a pain. Maybe I'll move that around, but if p31 is used, it means that there is no more room for symbols, in at least one of the reg sets.

[1] all except Dan's 6000 lines subroutines :) Did you start creating real subs for your code already?

I wish. :( Unfortunately not, outside some simple stuff, and I doubt I will. The language just doesn't lend itself to that sort of thing. We're going to add actual real subroutines to the language after we roll out into production, but that doesn't help now, alas.

Interesting. I'd like to test on something like that. Maybe SPEC99 as well.

- Bill Coffman

compile.dat Description: Binary data
compile.plot Description: Binary data
attachment: compile.png
[PATCH] Re: [CVS ci] indirect register frame 9 - go
On Thu, Oct 28, 2004 at 10:06:05AM +0200, Leopold Toetsch wrote: * all JIT platforms except ppc and i386 are broken Takers wanted for JIT fixes. See jit/ppc/* for necessary changes. This patch fixes JIT for the sparc platform (make testj passes except for the streams and gc_10.pasm where it hangs - where apparently ppc has the issues). leo Thanks, Stéphane Index: jit/sun4/jit_emit.h === RCS file: /cvs/public/parrot/jit/sun4/jit_emit.h,v retrieving revision 1.30 diff -u -r1.30 jit_emit.h --- jit/sun4/jit_emit.h 10 Oct 2004 17:27:45 - 1.30 +++ jit/sun4/jit_emit.h 28 Oct 2004 21:24:47 - @@ -355,7 +355,7 @@ /* This register can be used only in jit_emit.h calculations */ #define XSR1 emitm_l(0) -#define Parrot_jit_regbase_ptr(i) ((i)-int_reg.registers[0]) +#define Parrot_jit_regbase_ptr(interpreter) REG_INT(0) /* The offset of a Parrot register from the base register */ #define Parrot_jit_regoff(a, i) (unsigned)(a) - (unsigned)(Parrot_jit_regbase_ptr(i)) @@ -469,25 +469,25 @@ break; case PARROT_ARG_I: -val = (int)interpreter-int_reg.registers[val]; +val = (int)REG_INT(val); emitm_ld_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; case PARROT_ARG_P: -val = (int)interpreter-pmc_reg.registers[val]; +val = (int)REG_PMC(val); emitm_ld_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; case PARROT_ARG_S: -val = (int)interpreter-string_reg.registers[val]; +val = (int)REG_STR(val); emitm_ld_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; case PARROT_ARG_N: -val = (int)interpreter-num_reg.registers[val]; +val = (int)REG_NUM(val); emitm_ldd_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; @@ -512,25 +512,25 @@ switch(op_type){ case PARROT_ARG_I: -val = (int)interpreter-int_reg.registers[val]; +val = (int)REG_INT(val); emitm_st_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, 
interpreter)); break; case PARROT_ARG_P: -val = (int)interpreter-pmc_reg.registers[val]; +val = (int)REG_PMC(val); emitm_st_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter)); break; case PARROT_ARG_S: -val = (int)interpreter-string_reg.registers[val]; +val = (int)REG_STR(val); emitm_st_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter)); break; case PARROT_ARG_N: -val = (int)interpreter-num_reg.registers[val]; +val = (int)REG_NUM(val); emitm_std_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter)); break; @@ -572,13 +572,13 @@ break; case PARROT_ARG_I: -val = (int)interpreter-int_reg.registers[val]; +val = (int)REG_INT(val); emitm_ldf_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; case PARROT_ARG_N: -val = (int)interpreter-num_reg.registers[val]; +val = (int)REG_NUM(val); emitm_lddf_i(jit_info-native_ptr, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter), hwreg); break; @@ -602,13 +602,13 @@ switch(op_type){ case PARROT_ARG_I: -val = (int)interpreter-int_reg.registers[val]; +val = (int)REG_INT(val); emitm_stf_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter)); break; case PARROT_ARG_N: -val = (int)interpreter-num_reg.registers[val]; +val = (int)REG_NUM(val); emitm_stdf_i(jit_info-native_ptr, hwreg, Parrot_jit_regbase, Parrot_jit_regoff(val, interpreter)); break; @@ -664,23 +664,27 @@ * i1 is reusable once past the jump. interpreter is preserved in i0 */ int ireg0_offset; +int ireg0_address; /* Standard Prolog */ emitm_save_i(jit_info-native_ptr, emitm_SP, -104, emitm_SP); /* Calculate the offset of I0 in the interpreter struct */ -ireg0_offset =
Re: register allocation questions
At 3:08 PM -0700 10/28/04, Bill Coffman wrote:

It passes the other tests, plus the randomized tests that I created, up to 150 symbols. At that range, it still takes about 20x longer than g++ -O2 for equivalent programs to compile (see gen4.pl).

Still, that's not bad.

Also, it is currently running about O(n^2) for n symbols, where the old one was running about O(n^3) from my analysis. The spill code is still very expensive, and has a large constant associated with it. I also have data, which is attached. The difference doesn't show up until a lot of spilling is going on, around 80 symbols or so.

I'm curious to see how it behaves once the spilling gets up into the 1000+ symbol range. Dropping from cubic to quadratic time ought to make a not-insignificant change in the running time, even if that constant's pretty big. :)

I've learned a lot about how the compiler works at this point, and I'd like to contribute more :) Would you like a patch?

Yes! Oh, yeah, definitely.

Well, I'm making some comments on the below stuff.

On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:

At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:

Dan Sugalski wrote:

At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

While true, in the general case leaving 0-15 as non-preferred registers will probably make things easier. Those registers, especially the PMC ones, are going to see a lot of thrash as function calls are made, and it'll probably be easier to have them as scratch registers.

I guess I don't agree. I'd like to pack down the number of registers used to a minimum. Then when a function is called, only those needed registers are copied in/out. Don't think the functionality exists. But the idea is to have each sub declare how many registers to save/restore. This would then save 0-k such registers, where k is the number of registers used by the sub. Pack 'em down, minimize the number needed. We can also minimize this number to match the physical architecture that parrot is running on (for an arch specific optimization). The imc_reg_alloc function does not have 32 hard coded in there (well a little bit, but can be easily changed). It's pretty dynamic.

By all means, go for it. I certainly don't want to curb your enthusiasm. It's the right thing to do, ultimately. I didn't want to presume on your time. Happy to have it, of course. :)

[1] all except Dan's 6000 lines subroutines :) Did you start creating real subs for your code already?

I wish. :( Unfortunately not, outside some simple stuff, and I doubt I will. The language just doesn't lend itself to that sort of thing. We're going to add actual real subroutines to the language after we roll out into production, but that doesn't help now, alas.

Interesting. I'd like to test on something like that. Maybe SPEC99 as well.

If you've got a patch, I'd be more than happy to give it a whirl, and I can likely get you a copy of the code in question to give a run on.

-- Dan

--it's like this---
Dan Sugalski          even samurai
[EMAIL PROTECTED]     have teddy bears and even
                      teddy bears get drunk
Re: register allocation
When I cvs up'd, cleared and reConfigure'd I got these stats:

Failed Test          Stat  Wstat  Total  Fail  Failed  List of Failed
---------------------------------------------------------------------
t/library/streams.t     1    256     21     1   4.76%  11
t/op/gc.t               1    256     18     1   5.56%  13
4 tests and 66 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.

I'm curious to see how it behaves once the spilling gets up into the 1000+ symbol range. Dropping from cubic to quadratic time ought to make a not-insignificant change in the running time, even if that constant's pretty big. :)

I think it's a bit more complicated. M=number lines in code, N=number variables.

- Time = O(M^2 + N^2)
- old time = O(M^2 + N^3)

Not quite sure of this either. But the N^3 eventually dominates, I think. The data seems to bear this out.

There are more fixes I'd like to make as well. I spotted several things that could be fixed. And I think the spill code can be optimized a lot to reduce the big-O time as well.

More statistics, #vars v. time in seconds:

#vars     gcc  parrot2
  200    7.92    89.20
  201   11.86   146.31
  202   18.11   246.37
  203    9.54   107.88
  204   11.81   134.60
  205   14.75   190.95
  206   13.25   161.83
  207   10.63   138.83
  208   11.02   117.73
  209    7.14    88.29
  210   15.14   176.69

I am also running gen3.pl with 1000 vars. It's still on gcc. We'll see if parrot doesn't crash my 1Gigabyte, 2.4Ghz workstation tonight.

Would you like a patch? Yes! Oh, yeah, definitely. [...] If you've got a patch, I'd be more than happy to give it a whirl, and I can likely get you a copy of the code in question to give a run on.

Soon, I'll send one the proper way.

The imc_reg_alloc function does not have 32 hard coded in there (well a little bit, but can be easily changed). It's pretty dynamic. By all means, go for it. I certainly don't want to curb your enthusiasm. It's the right thing to do, ultimately. I didn't want to presume on your time. Happy to have it, of course. :)

Thanks. I've had a great time doing this. Remembering graph algorithms and compilers. Great fun!
I'd also like to contribute to getting Parrot out there, sooner rather than later. So if I can help with that, I'd like to hear suggestions. -Bill
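To put rough numbers on the O(M^2+N^2) versus O(M^2+N^3) estimate in the message above, here is a small Python sketch. It assumes the M term stays fixed and looks only at how the N-dependent term scales between 200 and 1000 variables; the helper name growth is made up for illustration. The timing pairs are the gcc and parrot2 figures quoted in the message, and nothing here is a new measurement.

```python
def growth(n_from, n_to, exponent):
    """Factor by which an N**exponent cost term grows from n_from to n_to."""
    return (n_to / n_from) ** exponent


# Scaling of the N-dependent term only, going from 200 to 1000 variables.
quadratic = growth(200, 1000, 2)   # 25.0
cubic = growth(200, 1000, 3)       # 125.0

# (gcc seconds, parrot2 seconds) for 200..210 variables, as quoted above.
timings = [
    (7.92, 89.20), (11.86, 146.31), (18.11, 246.37), (9.54, 107.88),
    (11.81, 134.60), (14.75, 190.95), (13.25, 161.83), (10.63, 138.83),
    (11.02, 117.73), (7.14, 88.29), (15.14, 176.69),
]
ratios = [parrot / gcc for gcc, parrot in timings]

print(f"N^2 term grows {quadratic:.0f}x, N^3 term grows {cubic:.0f}x")
print(f"parrot2/gcc ratio ranges from {min(ratios):.1f} to {max(ratios):.1f}")
```

Under those assumptions the cubic term grows five times faster than the quadratic one over that range, which is the gap the 1000-variable run would be expected to expose if the N term dominates.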
Re: register allocation questions
Thanks Matt, I hope I can help out. The patch I am submitting actually does simplify register coloring a bit. I've been waiting for perl6 with so much anticipation, I just couldn't stand it any more, and I had to participate.

-Bill

On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles [EMAIL PROTECTED] wrote:

Bill~

I have to say that I am really impressed by all of the work that you are doing, and if you can make the internals of imcc a little more approachable, you would be doing a great service.

Thanks,
Matt
Re: C89
Thanks for the info... Apparently, gcc -ansi -pedantic is supposed to be ANSI C '89. Equiv to -std=c89. Also, my Configure.pl generated makefile uses neither -ansi nor -pedantic. I do have access to a K&R C v2, but it doesn't look like it's going to match the actual practice. Oh well. So long as my code works, I'm happy.

Incidentally, I tried adding -ansi and -pedantic and I got lots of warnings, like long long not supported by ANSI C'89, etc. (how can you do 64 bit ints then?). I also got errors that caused outright failure. Perhaps it's best to forget the whole C'89 thing. But maybe someone should remove that from the documentation? Just a thought.

-Bill

On Thu, 21 Oct 2004 22:41:36 -0700, Jeff Clites [EMAIL PROTECTED] wrote: On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote: At 11:25 AM -0700 10/21/04, Bill Coffman wrote:

I read somewhere that the requirement for parrot code is that it should be compliant with the ANSI C'89 standard. Can someone point me to a description of the C89 spec, so I can make sure my reg_alloc.c patch is C89 compliant?

I don't think the ANSI C89 spec is freely available, though I may be wrong. (Google didn't find it easily, but I don't always get along well with Google) If the patch builds without warning with parrot's standard switches then you should be OK. (ANSI C89 was the first big rev of C after the original K&R C. If you've got the second edition or later of the K&R C book, it uses the C89 spec)

Also, if you're compiling with gcc, then you can pass -std=c89 to the compiler to enforce that particular standard. (Apparently--though I haven't tried it.) I believe -ansi does the same thing.

JEff