Re: Python/Pirate status

2004-10-28 Thread Leopold Toetsch
Sam Ruby [EMAIL PROTECTED] wrote:

 I'm now converting to dynclasses.  To be honest, I'm not thrilled with
 this.  What I would really prefer is a Parrot_new_p_s opcode with the
 runtime worrying about caching class names across sub and module boundaries.

  $P0 = new Py_int

or some such has a considerable runtime overhead, if that is emitted as
a new_p_sc opcode. So we probably want to reserve a certain range of
PMC enums for Python, Perl, whatever. With fixed, pre-assigned PMC types, the
type lookup could use integer types again, and type numbers wouldn't
depend on the load order of PMC extensions.
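
As a rough illustration of the reserved-range idea (every name and number
below is invented for the example, not Parrot's actual pmc enum):

    /* Hypothetical sketch only: each language gets a fixed, reserved block
     * of type numbers, so "$P0 = new Py_int" can compile down to an
     * integer-typed new op regardless of the load order of extensions. */
    enum {
        enum_class_core_max  = 128,                     /* last core PMC type       */
        enum_class_Perl_base = 129,                     /* reserved for Perl PMCs   */
        enum_class_Py_base   = 161,                     /* reserved for Python PMCs */
        enum_class_Py_int    = enum_class_Py_base + 0,
        enum_class_Py_str    = enum_class_Py_base + 1,
        enum_class_dynamic   = 256                      /* first unreserved type    */
    };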

 - Sam Ruby

leo


Re: hash multithreading and cross language issue

2004-10-28 Thread Leopold Toetsch
Sam Ruby [EMAIL PROTECTED] wrote:
 I note that the perlscalar code is careful about multithreading issues
 (example: if we morph to a string, first clear str_val so that after
 changing the vtable a parallel reader doesn't get a garbage pointer),
 but reuses a static PMC* intret.

Let's postpone multi-threading issues for a while.

dict = {}
dict[1] = 'foo'
dict[1] = 'bar'
print dict[1]

 For Python support, it would be ideal if there would be a hash method
 entry in the VTABLE for each object.

Not only ideal but necessary. The stringification of hash keys is a
perlism that just isn't usable for Python.
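
To make that concrete, here is a toy sketch (plain C, not Parrot code) of
what a per-object hash entry buys: the key 1 and the key "1" hash through
their own functions and stay distinct keys, which stringifying every key
cannot guarantee:

    #include <stdio.h>

    typedef struct Obj Obj;
    typedef unsigned long (*hash_fn)(const Obj *self);

    struct Obj {
        hash_fn     hash;    /* the per-object "vtable" hash entry */
        long        ival;
        const char *sval;
    };

    static unsigned long hash_int(const Obj *o) { return (unsigned long)o->ival; }

    static unsigned long hash_str(const Obj *o) {
        unsigned long h = 5381;                  /* djb2, just for the example */
        const char *p;
        for (p = o->sval; *p; p++)
            h = h * 33 + (unsigned char)*p;
        return h;
    }

    int main(void) {
        Obj int_key = { hash_int, 1, NULL };     /* Python key 1   */
        Obj str_key = { hash_str, 0, "1"  };     /* Python key "1" */
        printf("hash(1)   = %lu\n", int_key.hash(&int_key));
        printf("hash(\"1\") = %lu\n", str_key.hash(&str_key));
        return 0;
    }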

 - Sam Ruby

leo


[CVS ci] indirect register frame 9 - go

2004-10-28 Thread Leopold Toetsch
I've now committed the new (internal) calling scheme. On the surface 
nothing has changed, at least if the code obeys the rules in 
docs/pdds/pdd03_calling_conventions.pod.

If you are using PIR code and the function call directives, everything will 
still work. PASM code or handcrafted calls have to take care to set up 
I0..I4 accordingly. If these registers don't indicate function arguments 
or return values, the other end will not see the passed values.

Some additional notes:
* t/library/streams_11 now produces a different result; I don't know 
which one is correct or why there is a difference
* t/library/dumper.* seems to be broken WRT pdd03; it's disabled
* t/op/gc_13 (Piers' backtracking example) needed the cloning of the 2nd 
C<choose> closure. I hope that this is correct, but as these closures 
hold different state, it should be.

* all prederefed run cores (Prederef, CGP, Switch) are currently broken 
because they are still using absolute register addresses.
* all JIT platforms except ppc and i386 are broken

Takers wanted for JIT fixes. See jit/ppc/* for necessary changes.
leo


Re: pmc_type

2004-10-28 Thread Luke Palmer
Stéphane Payrard writes:
 That would allow us to implement typechecking in imcc.
 
   .sym Scalar a
   a = new .PerlInt  # ok.  PerlInt is derived from Scalar

Ugh, yeah, but what does that buy you?  In dynamic languages pure
derivational typechecking is very close to useless.  The reason C++[1]
has pure derivational semantics is because of its implementation.  The
vtable functions have the same relative address, so you can use a
derived object interchangeably.  In a language where methods are looked
up by name, such strictures are more often over-restrictive than
helpful.

Anyway, that's just my rant.  If such a thing is to be in imcc, it
_must_ be optional without loss of features.  I have a quibble with the
automatic typechecking of .param variables for the same reason.

Luke

[1] And the reason Java has it is because C++ did.  Great design work,
guys.


[perl #32178] [TODO] include via relative paths

2004-10-28 Thread via RT
# New Ticket Created by  Matt Diephouse 
# Please include the string:  [perl #32178]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32178 


Currently there's no way to include a file using a relative path. This 
is a bit limiting. For example, Tcl must be run from the root parrot 
directory because the compiler is split across multiple files. There 
should be a way to include these files and just make it work.

--
matt diephouse
http://matt.diephouse.com



Prederefed run cores

2004-10-28 Thread Leopold Toetsch
With the indirect register addressing all prederefed run cores 
(Prederefed, CGP, Switch) are currently not functional, as these run 
cores have absolute addresses in the prederefed code.

I see two ways to fix it:
1) use frame pointer relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]
2) Re-prederef on function calls, if frame pointer differs
   + no impact on code size
   - needs precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)
Comments welcome,
leo
[1] due to absolute addressing, a constant argument and a register 
argument have the same code; set_i_ic and set_i_i are the same.
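
To make option 1 concrete, here is a toy example (plain C, none of the real
prederef structures) of why an absolute register address breaks once the
register frame can move, while a frame-relative offset keeps working:

    #include <stddef.h>
    #include <stdio.h>

    typedef struct { long int_reg[32]; } Frame;    /* stand-in for a register frame */

    int main(void) {
        Frame a, b;

        /* "prederef" with an absolute address: baked to frame a's I3 forever */
        long *absolute = &a.int_reg[3];

        /* "prederef" with a frame-relative offset: resolved against whatever
         * frame is current when the op runs */
        size_t offset = offsetof(Frame, int_reg) + 3 * sizeof(long);

        a.int_reg[3] = 111;
        b.int_reg[3] = 222;

        Frame *current = &b;    /* e.g. after a function call switched frames */
        printf("absolute: %ld\n", *absolute);                            /* 111: wrong frame */
        printf("relative: %ld\n", *(long *)((char *)current + offset));  /* 222: right frame */
        return 0;
    }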



Re: [perl #32176] [PATCH] Getopt/Long tidbits and Array access benchmark

2004-10-28 Thread Leopold Toetsch
Bernhard Schmalhofer [EMAIL PROTECTED] wrote:

 this patch adds a benchmark for random access of different Array PMCs.

Thanks, applied.

 Their differences against /dev/null are part of the attached patch. Hope
 that works.

works fine.

leo


Re: Python/Pirate status

2004-10-28 Thread Sam Ruby
Leopold Toetsch wrote:
Sam Ruby [EMAIL PROTECTED] wrote:
I'm now converting to dynclasses.  To be honest, I'm not thrilled with
this.  What I would really prefer is a Parrot_new_p_s opcode with the
runtime worrying about caching class names across sub and module boundaries.
  $P0 = new Py_int
or some such has a considerable runtime overhead, if that is emitted as
a new_p_sc opcode. So we probably want to reserve a certain range of
PMC enums for Python, Perl, whatever. With fixed, pre-assigned PMC types, the
type lookup could use integer types again, and type numbers wouldn't
depend on the load order of PMC extensions.
Yes, I meant the ability to do things like '$P0 = new Py_int'.
Could this be JITed?  The mapping between string class name and assigned 
PMC type is constant throughout the life of the VM...

What provoked me to suggest that was a statement made in IRC yesterday 
that TCL is doing a find_type in every subroutine that does a new.  And 
the knowledge that every local variable in Python and PHP is likely to 
be a PMC.

My concern is that if there isn't a convenient way to look up and cache 
these types, the considerable runtime overhead will still be incurred, 
but in ways that aren't readily amenable to optimization by the runtime.
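
Concretely, the kind of caching I have in mind looks roughly like this at 
the C level (the function names below are my assumptions about the current 
embedding/extension API, so treat this as a sketch rather than a patch):

    #include "parrot/parrot.h"   /* assumed to provide pmc_type(), pmc_new(),
                                  * string_from_cstring()                    */

    /* Do the string -> type-number lookup once per call site and remember
     * it, instead of running a find_type on every single "new".
     * (Not thread-safe, but we're postponing threading issues anyway.)    */
    static PMC *
    new_py_int(Parrot_Interp interpreter)
    {
        static INTVAL py_int_type = 0;        /* cached after the first call */

        if (py_int_type == 0)
            py_int_type = pmc_type(interpreter,
                    string_from_cstring(interpreter, "Py_int", 0));

        return pmc_new(interpreter, py_int_type);
    }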

- Sam Ruby


RE: Install-Problem

2004-10-28 Thread Vijay D.

I left the make running overnight :) 
Here is the error I got..

xx.c
cc -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-I/usr/include/gdbm -g -Dan_Sugalski -Larry -Wall -Wstrict-prototypes
-Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual
-Wcast-align -Wwrite-strings -Waggregate-return -Winline -W -Wno-unused
-Wsign-compare -Wformat-nonliteral -Wformat-security -Wpacked
-Wdisabled-optimization -mno-accumulate-outgoing-args -Wno-shadow
-falign-functions=16 -I./include -I/usr/include -DHAS_JIT -DI386
-DHAVE_COMPUTED_GOTO -I. -o xx.o -c xx.c
ops/core_ops_cg.c

cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
gmake: *** [ops/core_ops_cg.o] Error 1



Regards,
V.


-Original Message-
From: Dan Sugalski [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 27, 2004 6:19 PM
To: Vijay D.; [EMAIL PROTECTED]
Subject: Re: Install-Problem

At 4:31 PM +0530 10/27/04, Vijay D. wrote:
Hi,
I was trying to install the latest Parrot. The latest source code is
checked out from CVS.
After configure, the make is stopping at

ops/core_ops.c
ops/core_ops_prederef.c
ops/core_ops_switch.c
ops/core_ops_cg.c


Is it stopping, or just taking a long time? Those files take a while 
to build, and a lot of memory to build in. Figure on a few minutes, 
depending on your CPU, if you have enough memory. If the compiler 
falls into swap (which'll happen if you've less than 256M or so) it 
can take a half hour or more.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk


Re: Install-Problem

2004-10-28 Thread Peter Sinnott
On Thu, Oct 28, 2004 at 04:49:26PM +0530, Vijay D. wrote:
 
 I left the make running overnight :) 
 Here is the error I got..
 
 xx.c
 cc -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
 -I/usr/include/gdbm -g -Dan_Sugalski -Larry -Wall -Wstrict-prototypes
 -Wmissing-prototypes -Winline -Wshadow -Wpointer-arith -Wcast-qual
 -Wcast-align -Wwrite-strings -Waggregate-return -Winline -W -Wno-unused
 -Wsign-compare -Wformat-nonliteral -Wformat-security -Wpacked
 -Wdisabled-optimization -mno-accumulate-outgoing-args -Wno-shadow
 -falign-functions=16 -I./include -I/usr/include -DHAS_JIT -DI386
 -DHAVE_COMPUTED_GOTO -I. -o xx.o -c xx.c
 ops/core_ops_cg.c
 
 cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
 gmake: *** [ops/core_ops_cg.o] Error 1
 

If you're really committed to using computed goto, removing the -g may help. 
It's gotten me past that kind of problem before.

-- 
It is our mission to synergistically negotiate mission-critical resources 
so that we may conveniently foster parallel intellectual capital


RE: Install-Problem

2004-10-28 Thread Vijay D.

 pass the --cgoto=0 flag to Configure.pl.  
Thanks for the tip, I installed successfully .. 


I also have
RH 9.0 and would love someone to confirm that make
testj will fail on 3 tests (unless you additionally
pass it another flag). 

Here is the output for the fulltest on my redhat machine.

3 tests and 49 subtests skipped.
Failed 104/112 test scripts, 7.14% okay. 1851/1905 subtests failed,
2.83% okay.
make: *** [testg] Error 2



Regards,
Vijay.


RE: Install-Problem

2004-10-28 Thread Dan Sugalski
At 4:49 PM +0530 10/28/04, Vijay D. wrote:
I left the make running overnight :)
Here is the error I got..
xx.c
ops/core_ops_cg.c
cc1: Cannot allocate 56022680 bytes after allocating 116981760 bytes
gmake: *** [ops/core_ops_cg.o] Error 1
You just ran out of memory during the build. (If this is a server 
system, you might want to check and make sure nothing else got killed 
by the OOM monitor.) The computed goto cores do make gcc more than a 
little unhappy. Pass in the --cgoto=0 switch to configure, or throw 
another half-gig or so of swap at your system. :)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Prederefed run cores

2004-10-28 Thread Dan Sugalski
At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
With the indirect register addressing all prederefed run cores 
(Prederefed, CGP, Switch) are currently not functional, as these run 
cores have absolute addresses in the prederefed code.

I see two ways to fix it:
1) use frame pointer relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]
2) Re-prederef on function calls, if frame pointer differs
   + no impact on code size
   - needs precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)
Or 3) Toss the prederef stuff entirely.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation questions

2004-10-28 Thread Dan Sugalski
At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
Dan Sugalski wrote:
At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:

So, if you want that really super efficient, you would allocate
registers around function calls directly to that wanted register number,
which should be in the SymReg's want_regno.
While true, in the general case leaving 0-15 as non-preferred 
registers will probably make things easier. Those registers, 
especially the PMC ones, are going to see a lot of thrash as 
function calls are made, and it'll probably be easier to have them 
as scratch registers.
Yep, that's the easy part ;) OTOH when the register allocator is 
doing register renaming anyway, the innermost loop with a function 
call should get registers assigned already matching the calling 
conventions. With more than one call at that loop level, you have to 
move around registers anyway.
Oh, sure, but keeping your scratch PMCs out of the way makes life a 
lot easier for the register coloring algorithms. Might not be 
optimal, but if it makes life simpler to start, optimal can come 
later.

It's distinctly possible, of course, that there'll be very little 
pressure to actually *use* them for most code, as we've got plenty 
of registers in general. That's the hope, at least.
Yes, 16 regs are plenty and do suffice for all normal[1] code. 
Assigning to wanted reg numbers for a function is a nice 
optimization.

[1] all except Dan's 6000 lines subroutines :) Did you start 
creating real subs for your code already?
I wish. :( Unfortunately not, outside some simple stuff, and I doubt 
I will. The language just doesn't lend itself to that sort of thing. 
We're going to add actual real subroutines to the language after we 
roll out into production, but that doesn't help now, alas.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Prederefed run cores

2004-10-28 Thread Duraid Madina
Dan Sugalski wrote:
Or 3) Toss the prederef stuff entirely.
Which might not be quite as bad as it sounds: on at least one strange 
platform (IA64 HP-UX) the native C compiler gets the switch core 
running faster than the prederef core! (!)

	Duraid


Re: extend.c:Parrot_call

2004-10-28 Thread Leopold Toetsch
Leopold Toetsch wrote:
Parrot_call() runs a Parrot subroutine, but it takes PMC arguments only 
and provides no return value.

If no one hollers, I'll replace this function with a more flexible set 
of functions that are wrappers to the *runops* functions in 
src/inter_run.c:

  void *Parrot_call_sub_(interp, sub, signature, ...) [1]
  Parrot_Int    Parrot_call_sub_ret_int
  Parrot_Float  Parrot_call_sub_ret_float
  void *Parrot_call_meth(interp, sub, object, meth, sig, ...)
Done that now. The latter is:
   void *Parrot_call_method(interp, sub, object, meth, sig, ...)
and the other two accordingly.
leo


Re: Prederefed run cores

2004-10-28 Thread Leopold Toetsch
Duraid Madina wrote:
Dan Sugalski wrote:
Or 3) Toss the prederef stuff entirely.

Which might not be quite as bad as it sounds: on at least one strange 
platform (IA64 HP-UX) the native C compiler gets the switch core 
running faster than the prederef core! (!)
Err, the switched core *is* a prederefed core.
Duraid
leo


Re: Install-Problem

2004-10-28 Thread Leopold Toetsch
Vijay D. [EMAIL PROTECTED] wrote:

 Failed 104/112 test scripts, 7.14% okay. 1851/1905 subtests failed,
 2.83% okay.
 make: *** [testg] Error 2

Well, testing the now non-existent CGoto core with make testg is
probably not really helpful ;)

 Regards,
 Vijay.

leo


Re: Python/Pirate status

2004-10-28 Thread Leopold Toetsch
Sam Ruby [EMAIL PROTECTED] wrote:

 Yes, I meant the ability to do things like '$P0 = new Py_int'.

 Could this be JITed?  The mapping between string class name and assigned
 PMC type is constant throughout the life of the VM...

Not really, or not easily. Fastest is to have type enum numbers, which
needs reserved ranges for not-yet-loaded extensions.

 What provoked me to suggest that was a statement made in IRC yesterday
 that TCL is doing a find_type in every subroutine that does a new.  And
 the knowledge that every local variable in Python and PHP is likely to
 be a PMC.

Well, if only one set of PMC types is used and you control program
initialization, it's not too hard to probe Parrot for the next PMC type
number. Then the compiler can emit the load_bytecode ops on top and
use type numbers.

That doesn't work if the library loading isn't always done in the same
sequence, of course.

 My concern is that if there isn't a convenient way to look up and cache
 these types, the considerable runtime overhead will still be incurred,
 but in ways that aren't readily amenable to optimization by the runtime.

Yes.

 - Sam Ruby

leo


Re: pmc_type

2004-10-28 Thread Paolo Molaro
On 10/27/04 Luke Palmer wrote:
 Stéphane Payrard writes:
  That would allow us to implement typechecking in imcc.
  
    .sym Scalar a
    a = new .PerlInt  # ok.  PerlInt is derived from Scalar
 
 Ugh, yeah, but what does that buy you?  In dynamic languages pure
 derivational typechecking is very close to useless.  The reason C++[1]
 has pure derivational semantics is because of its implementation.  The
 vtable functions have the same relative address, so you can use a
 derived object interchangeably.  In a language where methods are looked
 up by name, such strictures are more often over-restrictive than
 helpful.

Actually, if I were to write a perl runtime for parrot, mono or 
even the JVM I'd experiment with the same pattern. I guess it could
be applied to a python implementation, too.
You would assign small integer IDs to the names of the methods
and build a vtable indexed by the ID. In most cases the method name
is known at compile time, so you know the ID and you can get
the method with a simple load from the vtable. This is much faster
than a hash table lookup (I hinted at this in my old RFC for perl6).
Of course the table would be sparse, especially in pathological 
programs, so you could have a limit, like 100 entries or less,
with IDs bigger than that using a different lookup (binary search 
on an array, for example). There are a number of optimizations that 
can be done to reduce the vtable size, but I'm not sure this would
matter in parrot as long as bytecode values are as big as C ints :-)
Maybe someone has time to write a script and run it on a bunch of 
perl programs and report how many different method names are usually
created. Of course it also depends on how much the hash lookup will cost
wrt the total cost of a subroutine call...
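
A compact sketch of that lookup (every name here is invented for
illustration; this is not Parrot, Mono, or JVM code): IDs below a cutoff
index a dense per-class table, and rarer methods with bigger IDs fall back
to a binary search over a small sorted array.

    #include <stddef.h>

    #define DENSE_LIMIT 100                   /* the cutoff discussed above */

    typedef void (*method_fn)(void *self);

    typedef struct { int id; method_fn fn; } SparseEntry;

    typedef struct {
        method_fn    dense[DENSE_LIMIT];      /* slot i = method with id i, or NULL  */
        SparseEntry *sparse;                  /* sorted by id, for ids >= DENSE_LIMIT */
        size_t       n_sparse;
    } ClassVTable;

    static method_fn
    lookup_method(const ClassVTable *vt, int id)
    {
        size_t lo = 0, hi;

        if (id < DENSE_LIMIT)
            return vt->dense[id];             /* one load: the common, compile-time-known case */

        /* fallback: binary search over the sparse entries */
        hi = vt->n_sparse;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (vt->sparse[mid].id < id)
                lo = mid + 1;
            else if (vt->sparse[mid].id > id)
                hi = mid;
            else
                return vt->sparse[mid].fn;
        }
        return NULL;                          /* not found: method_missing etc. */
    }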

lupus

-- 
-
[EMAIL PROTECTED] debian/rules
[EMAIL PROTECTED] Monkeys do it better


Access to Parakeet in CVS

2004-10-28 Thread Michel Pelletier
So I've *finally* created a Perl.org account in order to update Parakeet 
in CVS.  As I understand it, my next step is to inform the developers of 
my username ("michel") so that I can be given access to that area.  I've 
got some exciting new changes to commit just as soon as I figure out if 
they work with Leo's changes to the PCC innards.

Thanks in advance,
-Michel


[perl #32196] Yet Another GC Crash (YAGC)

2004-10-28 Thread via RT
# New Ticket Created by  Matt Diephouse 
# Please include the string:  [perl #32196]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32196 


Parrot exploded when running my forth implementation after a cvs 
update. Below is the backtrace from gdb.

--
matt diephouse
http://matt.diephouse.com

ns:~/Projects/parrot/languages/forth ezekiel$ ulimit -c unlimited
ns:~/Projects/parrot/languages/forth ezekiel$ parrot -t forth.pir 2>trace.log
Bus error (core dumped)
ns:~/Projects/parrot/languages/forth ezekiel$ ls /cores/
core.20226
ns:~/Projects/parrot/languages/forth ezekiel$ gdb ../../parrot 
/cores/core.20226
GNU gdb 5.3-20030128 (Apple version gdb-330.1) (Fri Jul 16 21:42:28 GMT 
2004)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and 
you are
welcome to change it and/or distribute copies of it under certain 
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for 
details.
This GDB was configured as powerpc-apple-darwin.
Reading symbols for shared libraries .. done
Core was generated by `/Users/ezekiel/bin/parrot'.
#0  0x0003d420 in pobject_lives (interpreter=0xd00140, obj=0x0) at 
src/dod.c:198
198 if (PObj_is_live_or_free_TESTALL(obj)) {
(gdb) bt
#0  0x0003d420 in pobject_lives (interpreter=0xd00140, obj=0x0) at 
src/dod.c:198
#1  0x48f0 in mark_1_seg (interpreter=0xd00140, cs=0xd01fd0) at 
src/packfile.c:360
#2  0x4990 in find_code_iter (seg=0xd01fd0, user_data=0xd00140) at 
src/packfile.c:375
#3  0x503c in PackFile_map_segments (dir=0xd01e40, callback=0x4920 
<find_code_iter>, user_data=0xd00140) at src/packfile.c:687
#4  0x4a18 in mark_const_subs (interpreter=0xd00140) at 
src/packfile.c:399
#5  0x0003d734 in Parrot_dod_trace_root (interpreter=0xd00140, 
trace_stack=1) at src/dod.c:333
#6  0x0003d848 in trace_active_PMCs (interpreter=0xd00140, 
trace_stack=1) at src/dod.c:371
#7  0x0003e5b0 in Parrot_dod_ms_run (interpreter=0xd00140, flags=1) at 
src/dod.c:1168
#8  0x0003e76c in Parrot_do_dod_run (interpreter=0xd00140, flags=1) at 
src/dod.c:1224
#9  0x0009a0a4 in mem_allocate (interpreter=0xd00140, 
req_size=0xbfffe650, pool=0xd002d0, align_1=15) at src/resources.c:142
#10 0x0009aefc in Parrot_allocate_string (interpreter=0xd00140, 
str=0xfe7798, size=128) at src/resources.c:656
#11 0x0002a814 in string_make_empty (interpreter=0xd00140, 
representation=enum_stringrep_one, capacity=128) at src/string.c:352
#12 0x00102138 in Parrot_sprintf_format (interpreter=0xd00140, 
pat=0xfe77c0, obj=0xb7e0) at src/spf_render.c:290
#13 0x000ea728 in Parrot_vsprintf_s (interpreter=0xd00140, 
pat=0xfe77c0, args=0xb8f0 ) at src/misc.c:68
#14 0x000ea7d4 in Parrot_vsprintf_c (interpreter=0xd00140, pat=0x299f40 
\n, args=0xb8f0 ) at src/misc.c:93
#15 0x00033da8 in PIO_eprintf (interpreter=0xd00140, s=0x299f40 \n) 
at io/io.c:1069
#16 0x001d469c in trace_op_dump (interpreter=0xd00140, 
code_start=0x102cc00, pc=0x102cda4) at src/trace.c:327
#17 0x001d4724 in trace_op (interpreter=0xd00140, code_start=0x102cc00, 
code_end=0x102d398, pc=0x102cda4) at src/trace.c:355
#18 0x001d33f8 in runops_slow_core (interpreter=0xd00140, pc=0x102cda4) 
at src/runops_cores.c:155
#19 0x0003fc8c in runops_int (interpreter=0xd00140, offset=0) at 
src/interpreter.c:808
#20 0x00038bd0 in runops (interpreter=0xd00140, offset=0) at 
src/inter_run.c:69
#21 0xc150 in Parrot_runcode (interpreter=0xd00140, argc=1, 
argv=0xbd98) at src/embed.c:750
#22 0xbf58 in Parrot_runcode (interpreter=0xd00140, argc=1, 
argv=0xbd98) at src/embed.c:679
#23 0x3f8c in main (argc=1, argv=0xbd98) at imcc/main.c:579
(gdb)



Re: register allocation questions

2004-10-28 Thread Bill Coffman
Hi all,

Thanks for your continued comments.  Btw, I usually read all of the
parrot list, so don't think I'm not paying attention.

Currently, here's how the register allocator is doing.

Failed Test          Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/library/dumper.t      5  1280    13    5  38.46%  1-2 5 8 13
4 tests and 51 subtests skipped.
Failed 1/123 test scripts, 99.19% okay. 5/1956 subtests failed, 99.74% okay.

I recall Leo, or someone, saying that the data dumper routines are not
following the calling convention properly.  So I've decided not to
worry about it too much.  It passes the other tests, plus the
randomized tests that I created, up to 150 symbols.  At that range, it
still takes about 20x longer than g++ -O2, for equivalent programs to
compile (see gen4.pl).

Also, it is currently running about O(n^2) for n symbols, where the
old one was running about O(n^3) from my analysis.  The spill code is
still very expensive, and has a large constant associated with it.  I also have
data, which is attached.  The difference doesn't show up until a lot
of spilling is going on, around 80 symbols or so.

I've learned a lot about how the compiler works at this point, and I'd
like to contribute more :)

Would you like a patch?  Should I fix the data dumper routines first? 
What is all this talk about deferred registers?  What should I do
next?

Well, I'm making some comments on the below stuff.

On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
 At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
 
 So, if you want that really super efficient, you would allocate
 registers around function calls directly to that wanted register number,
 which should be in the SymReg's want_regno.

Yes, I think we are kind of doing this.  It's best to pass the
registers straight through though.  Like when a variable will be used
as a parameter, give it the appropriate reg num.  Sort of outside the
immediate scope of register coloring, but as I've learned, one must go
a little beyond, to see the input and output for each sub.

 While true, in the general case leaving 0-15 as non-preferred
 registers will probably make things easier. Those registers,
 especially the PMC ones, are going to see a lot of thrash as
 function calls are made, and it'll probably be easier to have them
 as scratch registers.

I guess I don't agree.  I'd like to pack down the number of registers
used to a minimum.  Then when a function is called, only those needed
registers are copied in/out.  Don't think the functionality exists. 
But the idea is to have each sub declare how many registers to
save/restore.  This would then save 0-k such registers, where k is
the number of registers used by the sub.  Pack 'em down, minimize the
number needed.

We can also minimize this number to match the physical architecture
that parrot is running on (for an arch specific optimization).  The
imc_reg_alloc function does not have 32 hard coded in there (well a
little bit, but can be easily changed).  It's pretty dynamic.

 Yep, that's the easy part ;) OTOH when the register allocator is
 doing register renaming anyway, the innermost loop with a function
 call should get registers assigned already matching the calling
 conventions. With more than one call at that loop level, you have to
 move around registers anyway.

Yes, yes, renaming!  I want to do register renaming!

 Oh, sure, but keeping your scratch PMCs out of the way makes life a
 lot easier for the register coloring algorithms. Might not be
 optimal, but if it makes life simpler to start, optimal can come
 later.

p31 holds all the spill stuff.  It's a pain.  Maybe I'll move that
around, but if p31 is used, it means that there is no more room for
symbols, in at least one of the reg sets.

 [1] all except Dan's 6000 lines subroutines :) Did you start
 creating real subs for your code already?
 
 I wish. :( Unfortunately not, outside some simple stuff, and I doubt
 I will. The language just doesn't lend itself to that sort of thing.
 We're going to add actual real subroutines to the language after we
 roll out into production, but that doesn't help now, alas.

Interesting.  I'd like to test on something like that.  Maybe SPEC99 as well.

- Bill Coffman


compile.dat
Description: Binary data


compile.plot
Description: Binary data
attachment: compile.png

[PATCH] Re: [CVS ci] indirect register frame 9 - go

2004-10-28 Thread Stephane Peiry
On Thu, Oct 28, 2004 at 10:06:05AM +0200, Leopold Toetsch wrote:
 * all JIT platforms except ppc and i386 are broken
 
 Takers wanted for JIT fixes. See jit/ppc/* for necessary changes.

This patch fixes JIT for the sparc platform (make testj passes
except for the streams tests and gc_10.pasm, where it hangs -
apparently the same places where ppc has issues).

 leo

Thanks,
Stéphane
Index: jit/sun4/jit_emit.h
===
RCS file: /cvs/public/parrot/jit/sun4/jit_emit.h,v
retrieving revision 1.30
diff -u -r1.30 jit_emit.h
--- jit/sun4/jit_emit.h 10 Oct 2004 17:27:45 -  1.30
+++ jit/sun4/jit_emit.h 28 Oct 2004 21:24:47 -
@@ -355,7 +355,7 @@
 /* This register can be used only in jit_emit.h calculations */
 #define XSR1 emitm_l(0)
 
-#define Parrot_jit_regbase_ptr(i) ((i)->int_reg.registers[0])
+#define Parrot_jit_regbase_ptr(interpreter) REG_INT(0)
 
 /* The offset of a Parrot register from the base register */
 #define Parrot_jit_regoff(a, i) (unsigned)(a) - (unsigned)(Parrot_jit_regbase_ptr(i))
@@ -469,25 +469,25 @@
 break;
 
 case PARROT_ARG_I:
-val = (int)interpreter->int_reg.registers[val];
+val = (int)REG_INT(val);
 emitm_ld_i(jit_info->native_ptr, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter), hwreg);
 break;
 
 case PARROT_ARG_P:
-val = (int)interpreter->pmc_reg.registers[val];
+val = (int)REG_PMC(val);
 emitm_ld_i(jit_info->native_ptr, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter), hwreg);
 break;
 
 case PARROT_ARG_S:
-val = (int)interpreter->string_reg.registers[val];
+val = (int)REG_STR(val);
 emitm_ld_i(jit_info->native_ptr, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter), hwreg);
 break;
 
 case PARROT_ARG_N:
-val = (int)interpreter->num_reg.registers[val];
+val = (int)REG_NUM(val);
 emitm_ldd_i(jit_info->native_ptr, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter), hwreg);
 break;
@@ -512,25 +512,25 @@
 
 switch(op_type){
 case PARROT_ARG_I:
-val = (int)interpreter->int_reg.registers[val];
+val = (int)REG_INT(val);
 emitm_st_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
 
 case PARROT_ARG_P:
-val = (int)interpreter->pmc_reg.registers[val];
+val = (int)REG_PMC(val);
 emitm_st_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
 
 case PARROT_ARG_S:
-val = (int)interpreter->string_reg.registers[val];
+val = (int)REG_STR(val);
 emitm_st_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
 
 case PARROT_ARG_N:
-val = (int)interpreter->num_reg.registers[val];
+val = (int)REG_NUM(val);
 emitm_std_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
@@ -572,13 +572,13 @@
 break;
 
 case PARROT_ARG_I:
-val = (int)interpreter->int_reg.registers[val];
+val = (int)REG_INT(val);
 emitm_ldf_i(jit_info->native_ptr, Parrot_jit_regbase,
 Parrot_jit_regoff(val, interpreter), hwreg);
 break;
 
 case PARROT_ARG_N:
-val = (int)interpreter->num_reg.registers[val];
+val = (int)REG_NUM(val);
 emitm_lddf_i(jit_info->native_ptr, Parrot_jit_regbase,
  Parrot_jit_regoff(val, interpreter), hwreg);
 break;
@@ -602,13 +602,13 @@
 
 switch(op_type){
 case PARROT_ARG_I:
-val = (int)interpreter->int_reg.registers[val];
+val = (int)REG_INT(val);
 emitm_stf_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
 
 case PARROT_ARG_N:
-val = (int)interpreter->num_reg.registers[val];
+val = (int)REG_NUM(val);
 emitm_stdf_i(jit_info->native_ptr, hwreg, Parrot_jit_regbase,
Parrot_jit_regoff(val, interpreter));
 break;
@@ -664,23 +664,27 @@
  * i1 is reusable once past the jump. interpreter is preserved in i0
  */
 int ireg0_offset;
+int ireg0_address;
 
 /* Standard Prolog */
 emitm_save_i(jit_info->native_ptr, emitm_SP, -104, emitm_SP);
 
 /* Calculate the offset of I0 in the interpreter struct */
-ireg0_offset = 

Re: register allocation questions

2004-10-28 Thread Dan Sugalski
At 3:08 PM -0700 10/28/04, Bill Coffman wrote:
 It passes the other tests, plus the
randomized tests that I created, up to 150 symbols.  At that range, it
still takes about 20x longer than g++ -O2, for equivalent programs to
compile (see gen4.pl).
Still, that's not bad.
Also, it is currently running about O(n^2) for n symbols, where the
old one was running about O(n^3) from my analysis.  The spill code is
still very expensive, and has a large constant associate.  I also have
data, which is attached.  The difference doesn't show up until a lot
of spilling is going on, around 80 symbols or so.
I'm curious to see how it behaves once the spilling gets up into the 
1000+ symbol range. Dropping from cubic to quadratic time ought to 
make a not-insignificant change in the running time, even if that 
constant's pretty big. :)

I've learned a lot about how the compiler works at this point, and I'd
like to contribute more :)
Would you like a patch?
Yes! Oh, yeah, definitely.
Well, I'm making some comments on the below stuff.
On Thu, 28 Oct 2004 09:07:05 -0400, Dan Sugalski [EMAIL PROTECTED] wrote:
 At 9:36 PM +0200 10/27/04, Leopold Toetsch wrote:
 Dan Sugalski wrote:
  At 11:09 AM +0200 10/26/04, Leopold Toetsch wrote:
  While true, in the general case leaving 0-15 as non-preferred
 registers will probably make things easier. Those registers,
 especially the PMC ones, are going to see a lot of thrash as
 function calls are made, and it'll probably be easier to have them
 as scratch registers.
I guess I don't agree.  I'd like to pack down the number of registers
used to a minimum.  Then when a function is called, only those needed
registers are copied in/out.  Don't think the functionality exists.
But the idea is to have each sub declare how many registers to
 save/restore.  This would then save 0-k such registers, where k is
the number of registers used by the sub.  Pack 'em down, minimize the
number needed.
We can also minimize this number to match the physical architecture
that parrot is running on (for an arch specific optimization).  The
imc_reg_alloc function does not have 32 hard coded in there (well a
little bit, but can be easily changed).  It's pretty dynamic.
By all means, go for it. I certainly don't want to curb your 
enthusiasm. It's the right thing to do, ultimately. I didn't want to 
presume on your time. Happy to have it, of course. :)

  [1] all except Dan's 6000 lines subroutines :) Did you start
 creating real subs for your code already?
 I wish. :( Unfortunately not, outside some simple stuff, and I doubt
 I will. The language just doesn't lend itself to that sort of thing.
 We're going to add actual real subroutines to the language after we
 roll out into production, but that doesn't help now, alas.
Interesting.  I'd like to test on something like that.  Maybe SPEC99 as well.
If you've got a patch, I'd be more than happy to give it a whirl, and 
I can likely get you a copy of the code in question to give a run on.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: register allocation

2004-10-28 Thread Bill Coffman
When I cvs up'd, cleared and reConfigure'd I got these stats:

Failed Test Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/library/streams.t     1   256    21    1   4.76%  11
t/op/gc.t               1   256    18    1   5.56%  13
4 tests and 66 subtests skipped.
Failed 2/124 test scripts, 98.39% okay. 2/1957 subtests failed, 99.90% okay.



 I'm curious to see how it behaves once the spilling gets up into the
 1000+ symbol range. Dropping from cubic to quadratic time ought to
 make a not-insignificant change in the running time, even if that
 constant's pretty big. :)

I think it's a bit more complicated.  M = number of lines in code, N = number
of variables.
- new time = O(M^2 + N^2)
- old time = O(M^2 + N^3)
Not quite sure of this either.  But the N^3 eventually dominates, I
think.  The data seems to bear this out.

There are more fixes I'd like to make as well.  I spotted several
things that could be fixed.  And I think the spill code can be
optimized a lot to reduce the big-O time as well.

More statistics #vars v. time in seconds: 
#vars  gcc  parrot2
200   7.92   89.20
201   11.86   146.31
202   18.11   246.37
203   9.54   107.88
204   11.81   134.60
205   14.75   190.95
206   13.25   161.83
207   10.63   138.83
208   11.02   117.73
209   7.14   88.29
210   15.14   176.69

I am also running gen3.pl with 1000 vars.  It's still on gcc.  We'll
see if parrot doesn't crash my 1-gigabyte, 2.4GHz workstation tonight.

 Would you like a patch?
 
 Yes! Oh, yeah, definitely.
 [...]
 If you've got a patch, I'd be more than happy to give it a whirl, and
 I can likely get you a copy of the code in question to give a run on.

Soon, I'll send one the proper way.


  The
 imc_reg_alloc function does not have 32 hard coded in there (well a
 little bit, but can be easily changed).  It's pretty dynamic.
 By all means, go for it. I certainly don't want to curb your
 enthusiasm. It's the right thing to do, ultimately. I didn't want to
 presume on your time. Happy to have it, of course. :)

Thanks.  I've had a great time doing this.  Remembering graph
algorithms and compilers.  Great fun!  I'd also like to contribute to
getting Parrot out there, sooner rather than later.  So if I can help
with that, I'd like to hear suggestions.

-Bill


Re: register allocation questions

2004-10-28 Thread Bill Coffman
Thanks Matt,

I hope I can help out.  The patch I am submitting actually does
simplify register coloring a bit.  I've been waiting for perl6 with so
much anticipation, I just couldn't stand it any more, and I had to
participate.

-Bill


On Thu, 28 Oct 2004 18:17:57 -0400, Matt Fowles [EMAIL PROTECTED] wrote:
 Bill~
 
 I have to say that I am really impressed by all of the work that you
 are doing, and if you can make the internals of imcc a little more
 approachable, you would be doing a great service.
 
 Thanks,
 Matt
 
 
 



Re: C89

2004-10-28 Thread Bill Coffman
Thanks for the info...

Apparently,

   gcc -ansi -pedantic 

is supposed to be ANSI C '89, equivalent to -std=c89.  Also, my
Configure.pl-generated makefile uses neither -ansi nor -pedantic.  I
do have access to a K&R C v2, but it doesn't look like it's going to
match the actual practice.  Oh well.  So long as my code works, I'm
happy.

Incidentally, I tried adding -ansi and -pedantic and I got lots of
warnings, like "long long not supported by ANSI C89", etc. (how can
you do 64-bit ints then?).  I also got errors that caused outright
failure.  Perhaps it's best to forget the whole C89 thing.  But maybe
someone should remove that from the documentation?  Just a thought.

-Bill

On Thu, 21 Oct 2004 22:41:36 -0700, Jeff Clites [EMAIL PROTECTED] wrote:
 On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote:
 
  At 11:25 AM -0700 10/21/04, Bill Coffman wrote:
  I read somewhere that the requirement for parrot code is that it
  should be compliant with the ANSI C'89 standard.  Can someone point me
  to a description of the C89 spec, so I can make sure my reg_alloc.c
  patch is C89 compliant?
 
  I don't think the ANSI C89 spec is freely available, though I may be
  wrong. (Google didn't find it easily, but I don't always get along
  well with Google) If the patch builds without warning with parrot's
  standard switches then you should be OK. (ANSI C89 was the first big
  rev of C after the original K&R C. If you've got the second edition or
  later of the K&R C book, it uses the C89 spec)
 
 Also, if you're compiling with gcc, then you can pass -std=c89 to the
 compiler to enforce that particular standard. (Apparently--though I
 haven't tried it.) I believe -ansi does the same thing.
 
 JEff