[PATCH] Macro fixes

2001-11-10 Thread Jeff

This patch fixes two bugs in macro handling:

1) Macros with zero parameters were disallowed.
2) Local branches inside macros were not given unique names on a
per-invocation basis, which made it impossible to write the following
code:

--- cut here ---
answer macro R
eq R,42,$done
print 42
$done:
endm

answer 41
answer 42
end
--- cut here ---

When the macro was expanded twice, two separate expansions of
$done existed in the generated code, which caused an error. The
patch fixes this naively by inserting 'LOCAL_$gensym_' after the '$' of
the label name, creating '$LOCAL_0_done' in the first expansion,
'$LOCAL_1_done' in the second expansion, and so on. A slightly cleaner
solution would be to alter the parser to allow labels of the form
'LOCAL_0_$done', so that an author would stand a much lower risk of
colliding with compiler-generated labels.
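
The renaming step described above can be sketched in C (the patch itself is Perl, shown further down). This is only an illustration under assumptions: the function name uniquify_local_label and its buffer-based interface are hypothetical and do not come from the assembler. Like the patch's substitution, it rewrites only the first '$' on a line.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Parrot's assembler: copy `line` into
 * `out`, inserting "LOCAL_<gensym>_" right after the first '$', so that
 * "$done" becomes "$LOCAL_0_done" when gensym is 0.
 * Returns 0 on success, -1 if `out` is too small. */
int uniquify_local_label(const char *line, unsigned gensym,
                         char *out, size_t out_size)
{
    const char *dollar = strchr(line, '$');
    int written;

    if (dollar == NULL) {
        /* No local label on this line: copy it unchanged. */
        written = snprintf(out, out_size, "%s", line);
    } else {
        /* Length of the text up to and including the '$'. */
        int head = (int)(dollar - line) + 1;
        written = snprintf(out, out_size, "%.*sLOCAL_%u_%s",
                           head, line, gensym, dollar + 1);
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Calling it once per macro line with a counter that is bumped on every expansion gives each invocation its own label namespace, which is what the $gensym argument does in the patch.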

On the upside, this patch allows me to continue to add new instructions
to the compiler without fear of label collision.

--
--Jeff
[EMAIL PROTECTED]



#! perl -w

use Parrot::Test tests => 6;

output_is( <<'CODE', <<OUTPUT, "macro, zero parameters" );
answer  macro
print   42
print   "\n"
endm
answer
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one unused parameter, literal term" );
answer  macro   A
print   42
endm
answer  42
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one unused parameter, register term" );
answer  macro   A
print   42
endm
set I0, 43
answer  I0
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, literal" );
answer  macro   A
print   A
endm
answer  42
print   "\n"
end
CODE
42
OUTPUT

#
# Can't test because I can't capture errors
#
#output_is( <<'CODE', <<OUTPUT, "macro, one parameter in call, not in def" );
#answer macro
#   print A
#endm
#   answer 42
#   print "\n"
#end
#CODE
#42
#OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, register" );
answer  macro   A
print   A
endm
set I0,42
answer  I0
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, called twice" );
answer  macro   A
print   A
print   "\n"
inc A
endm
set I0,42
answer  I0
answer  I0
end
CODE
42
43
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, label" );
answer  macro   A
ne  I0,42,$done
print   A
print   "\n"
$done:
endm
set I0,42
answer  I0
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter run twice, label" );
answer  macro   A
ne  I0,42,$done
print   A
print   "\n"
$done:
endm
set I0,42
answer  I0
answer  I0
end
CODE
42
OUTPUT


diff -ru parrot_orig/Parrot/Assembler.pm parrot/Parrot/Assembler.pm
--- parrot_orig/Parrot/Assembler.pm Sat Nov  3 19:04:08 2001
+++ parrot/Parrot/Assembler.pm  Sat Nov 10 21:35:01 2001
@@ -581,6 +581,7 @@
 =cut
 
 sub process_program_lines {
+  my $gensym = 0;
   while( my $lineinfo = shift( @program ) ) {
 ($file, $line, $pline, $sline) = @$lineinfo;
 
@@ -609,7 +610,7 @@
   # found a macro, expand it and append its lines to the front of
   # the program lines array.  
 
-  my @expanded_lines = expand_macro( $opcode, @args );
+  my @expanded_lines = expand_macro( $opcode, $gensym++, @args );
   unshift( @program, @expanded_lines );
   $lineinfo->[2] = '';
   unshift( @program, $lineinfo );
@@ -687,7 +688,7 @@
 =cut
 
 sub has_asm_directive {
-  return $_[0] =~ /^[_a-zA-Z]\w*\s+macro\s+.+$/i ||
+  return $_[0] =~ /^[_a-zA-Z]\w*\s+macro(?:\s+.+)?$/i ||
  $_[0] =~ /^[_a-zA-Z]\w*\s+equ\s+.+$/i;
 }
 
@@ -710,11 +711,16 @@
 $equate{$name} = $data;
 return 1;
   }
-  elsif( $line =~ /^([_a-zA-Z]\w*)\s+macro\s+(.+)$/i ) {
+  elsif( $line =~ /^([_a-zA-Z]\w*)\s+macro(?:\s+(.+))?$/i ) {
 # a macro definition
 my ($name, $args) = ($1, $2);
 my $cur_macro = $name;
-$macros{$name} = [ [split( /,\s*/, $args)], [] ];
+if(defined $args) {
+  $macros{$name} = [ [split( /,\s*/, $args)], [] ];
+}
+else {
+  $macros{$name} = [ [], [] ];
+}
 while( 1 ) {
   if( !scalar( @program ) ) {
 error( "The end of the macro '$name' was never seen", $file, $line);
@@ -830,8 +836,9 @@
 =cut
 
 sub expand_macro {
-  my ($opcode, @args) = @_;
+  my ($opcode, $gensym, @args) = @_;
 
+  my $local_prefix = sprintf("LOCAL_%d_",$gensym);
   my (@margs) = @{ $macros{$opcode}[0] };
   my (@macro);
 
@@ -840,6 +847,11 @@
 
   foreach (@{ $macros{ $opcode }[1] } ) {
 push( @macro, [@$_] );
+  }
+  for(@macro) {
+$_->[2]=~/\$/ and do {
+  $_->[2]=~s/\$/\$$local_prefix/;
+};
   }
 
   my $nargs = scalar(@args);



Re: [PATCH] Macro fixes

2001-11-10 Thread Jeff

Erps, macro.t had a slight bug. The included version of macro.t fixes it.

--Jeff
[EMAIL PROTECTED]



#! perl -w

use Parrot::Test tests => 7;

output_is( <<'CODE', <<OUTPUT, "macro, zero parameters" );
answer  macro
print   42
print   "\n"
endm
answer
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one unused parameter, literal term" );
answer  macro   A
print   42
endm
answer  42
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one unused parameter, register term" );
answer  macro   A
print   42
endm
set I0, 43
answer  I0
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, literal" );
answer  macro   A
print   A
endm
answer  42
print   "\n"
end
CODE
42
OUTPUT

#
# Can't test because I can't capture errors
#
#output_is( <<'CODE', <<OUTPUT, "macro, one parameter in call, not in def" );
#answer macro
#   print A
#endm
#   answer 42
#   print "\n"
#end
#CODE
#42
#OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, register" );
answer  macro   A
print   A
endm
set I0,42
answer  I0
print   "\n"
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, called twice" );
answer  macro   A
print   A
print   "\n"
inc A
endm
set I0,42
answer  I0
answer  I0
end
CODE
42
43
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter, label" );
answer  macro   A
ne  I0,42,$done
print   A
print   "\n"
$done:
endm
set I0,42
answer  I0
end
CODE
42
OUTPUT

output_is( <<'CODE', <<OUTPUT, "macro, one used parameter run twice, label" );
answer  macro   A
ne  I0,42,$done
print   A
print   "\n"
$done:
endm
set I0,42
answer  I0
answer  I0
end
CODE
42
42
OUTPUT



[PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE

2001-11-10 Thread Jeff

string.pasm patches the operators mentioned
The other file, 'parrot.pasm', is a miniature Parrot compiler, written
in Parrot.

The patches in the string.diff file are required to make this work.
It's currently -very- limited, due to some issues that I found with
macro processing and some problems in local labels that I found during
development.

Specifically, macros with labels cannot be expanded more than once
without the labels colliding.

The only sample test program it can compile is below (test.pasm):
---cut here---
# comment that will be ignored by parrot.pasm
print 9
# Maybe another comment here, these are ignored.
end
---cut here---
To prove that it is indeed compiling a test file, change '9' to
something like 732, and then:

../assemble.pl parrot.pasm >parrot.pbc
../test_prog parrot.pbc
../test_prog test.pbc
9
~/parrot/

Now, of course, there are many limitations here.
For one, until macros are fixed (something I'm going to do tonight) I
can't have more than one macro invocation (like, say, sscanf) in a given
file. So, we can't scan for more than one instruction easily.
I also need to restructure the code to do one pass to collect the number
of operators, write that, then write the operator stream out.

--Jeff
[EMAIL PROTECTED]



#--
#
# read_file
#
# read_file STRING, FILE_NAME
#

read_file   macro  R, S, CHUNK_SIZE, TEMP_STRING
pushi
open   I31, S,  0
$read_chunk: read   TEMP_STRING, I31,CHUNK_SIZE
length I0,  TEMP_STRING
eq I0,  0,  $done
concat R,   TEMP_STRING
eq I0,  CHUNK_SIZE, $read_chunk
$done:  close  I31
popi
endm

#--
#
# sscanf
#
# sscanf STRING, INDEX, VALUE
#

sscanf  macro STRING, INDEX, RETURN_VALUE
length I3,STRING
set I2,INDEX
$next_char_2:   eq I2,I3,$done_2
ord I1,STRING,I2
lt I1,48,$done_2
gt I1,57,$done_2
sub I1,I1,48
mul RETURN_VALUE,RETURN_VALUE,10
add RETURN_VALUE,RETURN_VALUE,I1
inc I2
branch $next_char_2
$done_2:
endm

#--

write_magic macro FH 
write FH,20010401
endm

#--

write_print_ic  macro FH,IC
write FH,27
write FH,IC
endm

write_end   macro FH
write FH,0
endm

#--

parse_line  macro LINE, FILE, TS

eq  LINE, "end",   $write_end
substr  TS,   LINE,0,   5
eq  TS,   "print", $write_print
branch  $done_parsing
$write_end: write_end   FILE
branch  $done_parsing
$write_print:   
sscanf  LINE, 6, I28
#   ord I28,  LINE, 6
#   dec I28,  48
write_print_ic  FILE, I28
branch  $done_parsing
$done_parsing:
endm

split_file  macro   R, D, TEMP_FILE
open    TEMP_FILE,"test.pbc"
write_magic TEMP_FILE
write   TEMP_FILE,0
write   TEMP_FILE,4
write   TEMP_FILE,0

#
# Unfortunate problem
#
write   TEMP_FILE,12

set I31,0
length  I30,R
$next_char: substr  S31,R,I31,1
eq  S31,"\n",$end_of_line
concat  D,S31
inc I31
eq  I31,I30,$end_split
branch  $next_char
$end_of_line:   parse_line  D,TEMP_FILE,S2
set D,""
inc I31
branch  $next_char
$end_split: close   TEMP_FILE
endm

#--
#
# Main
#
set S0,""
read_file S0,"test.pasm",8,S31
set S1,""
split_file S0,S1,I29
end

#--


diff -ru parrot_orig/core.ops parrot/core.ops
--- parrot_orig/core.ops    Tue Nov  6 11:14:25 2001
+++ parrot/core.ops Sat Nov 10 17:55:47 2001
@@ -141,6 +141,26 @@
 
 
 
+=item B<ord>(i,s|sc)
+
+=item B<ord>(i,s|sc,i|ic)
+
+Set $1 to the appropriate character in string $2.
+Selects character $3 if $3 is present.
+
+=cut

Re: [PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE

2001-11-10 Thread Alex Gough

On Sat, 10 Nov 2001, Jeff wrote:

> string.pasm patches the operators mentioned
> The other file, 'parrot.pasm', is a miniature Parrot compiler, written
> in Parrot.
>
> The patches in the string.diff file are required to make this work.

ook, cool, but string_length returns an INTVAL, not an int.

Alex Gough

diff -ru parrot_orig/string.c parrot/string.c
--- parrot_orig/string.c    Wed Oct 31 17:51:31 2001
+++ parrot/string.c Sat Nov 10 18:16:27 2001
@@ -83,6 +83,33 @@
 return s->strlen;
 }
 
+/*=for api string string_ord
+ * return the length of the string
+ */
+INTVAL
+string_ord(STRING* s, INTVAL index) {
+if(s==NULL) {
+INTERNAL_EXCEPTION(ORD_OUT_OF_STRING,
+   "Cannot get character of empty string");
+}
+else {
+INTVAL len = string_length(s);
+if(index < 0) {
+INTERNAL_EXCEPTION(ORD_OUT_OF_STRING,
+   "Cannot get character at negative index");
+}
+else if(index > (len - 1)) {
+INTERNAL_EXCEPTION(ORD_OUT_OF_STRING,
+   "Cannot get character past end of string");
+}
+else {
+char *buf = s->bufstart;
+return buf[index];
+}
+}
+return -1;
+}
+
 /*=for api string string_copy
  * create a copy of the argument passed in
  */
@@ -175,13 +202,19 @@
  */
 STRING*
 string_concat(struct Parrot_Interp *interpreter, STRING* a, STRING* b, INTVAL flags) {
-if (a->type != b->type || a->encoding != b->encoding) {
-b = string_transcode(interpreter, b, a->encoding, a->type, NULL);
+if(a != NULL) {
+if (a->type != b->type || a->encoding != b->encoding) {
+b = string_transcode(interpreter, b, a->encoding, a->type, NULL);
+}
+string_grow(a, a->strlen + b->strlen);
+mem_sys_memcopy((void*)((ptrcast_t)a->bufstart + a->bufused),b->bufstart, 
+b->bufused);
+a->strlen = a->strlen + b->strlen;
+a->bufused = a->bufused + b->bufused;
+}
+else {
+  return string_make(interpreter,
+ b->bufstart,b->buflen,b->encoding,flags,b->type);
 }
-string_grow(a, a->strlen + b->strlen);
-mem_sys_memcopy((void*)((ptrcast_t)a->bufstart + a->bufused), b->bufstart,b->bufused);
-a->strlen = a->strlen + b->strlen;
-a->bufused = a->bufused + b->bufused;
 return a;
 }
 




Re: [PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE

2001-11-10 Thread Alex Gough

On Sun, 11 Nov 2001, Alex Gough wrote:

(but not quite enough...)

> On Sat, 10 Nov 2001, Jeff wrote:
>
> > string.pasm patches the operators mentioned
> > The other file, 'parrot.pasm', is a miniature Parrot compiler, written
> > in Parrot.
> >
> > The patches in the string.diff file are required to make this work.
>
> ook, cool, but string_length returns an INTVAL, not an int.

Remember that people who say negative usually mean positive, they
just don't know it yet.  Always look on the bright si-ide of life, de
do, de do de do de do.

Yes, also this doesn't follow the style of the rest of string.c
(s-strlen is your friend) and I'm not sure that (char*)[index]
is the right way to get ord for $encoding.

Have we actually worked out what ord should do yet?

string_ord should be more like this anyhow:

INTVAL
string_ord(STRING* s, INTVAL index) {
    if(s==NULL) {
        INTERNAL_EXCEPTION(ORD_OUT_OF_STRING,
                           "Cannot get character from empty string");
    }
    else {
        if (index < 0 ) {
            index = s->strlen + index; /* zero based */
        }

        if (index > (s->strlen - 1) || index < 0) {
            INTERNAL_EXCEPTION(ORD_OUT_OF_STRING,
                               "Cannot get character outside string");
        }
        else {
            /* WORK OUT WHAT ORD SHOULD BE */
        }
    }
    return -1;
}
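
A self-contained version of the same bounds logic might look like the sketch below. It is a sketch under loud assumptions, not Parrot code: demo_string is a hypothetical stand-in for STRING, errors are reported by returning DEMO_ORD_ERROR instead of raising INTERNAL_EXCEPTION, and the "what ord should be" question is sidestepped by assuming a single-byte encoding where character i is byte i.

```c
#include <stddef.h>

/* Hypothetical stand-in for Parrot's STRING; only the fields used here. */
typedef struct {
    const unsigned char *bufstart; /* raw bytes of the string */
    long length;                   /* length in characters */
} demo_string;

#define DEMO_ORD_ERROR (-1L)

/* Return the code of the character at `index`, counting from the end
 * when `index` is negative. Assumes a single-byte encoding, which is
 * exactly the part the thread leaves open for other encodings. */
long demo_string_ord(const demo_string *s, long index)
{
    if (s == NULL || s->length == 0) {
        return DEMO_ORD_ERROR;        /* no character to fetch */
    }
    if (index < 0) {
        index += s->length;           /* -1 means the last character */
    }
    if (index < 0 || index > s->length - 1) {
        return DEMO_ORD_ERROR;        /* outside the string */
    }
    return (long)s->bufstart[index];
}
```

Reading the buffer as unsigned char keeps every valid result in 0..255, so the -1 error value cannot collide with a real character in this single-byte setting.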

Also, is it wise to #define every one of our errors to be 1? Wouldn't
these be better as an enum, or is there merely not yet a plan
for exceptions that works?

(The general gist of the patch is damn good, btw)

Alex Gough





Re: A serious stab at regexes

2001-11-10 Thread Angel Faus

Hi Brent,


> It just means you have to be more explicit.  I consider that a Good
> Thing--Perl 5's regular expressions are compact enough to be represented
> like: but the internals to support them are an absolute jungle.  I'd rather
> have a few exposed ops than have a chunk of code like Perl 5's regular
> expressions.  Besides, being able to /see/ where we jump to on each op
> when we fail will help when debugging the RE compiler and such.


I totally agree. It is not about having fewer opcodes or a more compact 
syntax, but about a more maintainable system. I just claim that having a stack 
where you predeclare the possible continuation paths is simpler than your 
marks idea. 

I could be biased here, of course. I would love some comments (especially from 
someone with experience in this area) about whether we are going the right 
way here. 

> How do you plan to support lookahead?  Unless I'm mistaken, with my
> proposal it goes something like:
>
>   rePushindex
>   rePushmark
>   #lookahead code in here
>   rePopmark
>   rePopindex


This could be solved by having a new pair of ops (reGetIndex / reSetIndex) 
that save and restore the current index into an INTVAL register:

reGetIndex I1 
#lookahead code in here
reSetIndex I1

So lookahead would be a special case, but that's fine, because it _is_ a 
special case.
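
To make the save/restore idea concrete, here is a small C sketch. It assumes a much simpler matcher than the one in the patch: demo_re_state, demo_match_literal and demo_lookahead_literal are hypothetical names, and the "lookahead code" is reduced to matching one literal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher state: the subject string and the current index. */
typedef struct {
    const char *subject;
    size_t index;
} demo_re_state;

/* Consume `lit` at the current index. Returns 1 and advances the index
 * on a match, returns 0 and leaves the index alone otherwise. */
int demo_match_literal(demo_re_state *re, const char *lit)
{
    size_t n = strlen(lit);
    if (strncmp(re->subject + re->index, lit, n) != 0) {
        return 0;
    }
    re->index += n;
    return 1;
}

/* Positive lookahead: run the match, then put the index back. */
int demo_lookahead_literal(demo_re_state *re, const char *lit)
{
    size_t saved = re->index;             /* plays the role of reGetIndex I1 */
    int ok = demo_match_literal(re, lit); /* the lookahead code */
    re->index = saved;                    /* plays the role of reSetIndex I1 */
    return ok;
}
```

Whether the lookahead succeeds or fails, the index ends up where it started, so no mark stack is needed for this case.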

> There are advantages and disadvantages to both proposals.  The question
> is, which one (or both) is flexible enough to do what we want it to do?


Probably both proposals are good enough, so I would say that we should 
choose one (not necessarily mine) and go ahead. I would love to see some 
regex support committed soon (maybe as an oplib) so we can hack the languages 
(babyperl and others) and gain some experience with performance and 
optimizations...

Is there any plan for when to commit regular expression support to Parrot? 

Dan Sugalski has said that he is not interested at all in the design of regex 
opcodes.  Who is going to have the final say on this? 

Just an idea, but could we have someone with experience in perl5 regex 
internals play the role of regex pumpking?


> I look forward to seeing it.


I am attaching a patch that implements the modifications I suggest to your 
code. The tests and the compiler are updated too and work fine.

(btw, I am not sure whether I chose the right options for the diff command. 
Could someone please tell me privately what the right way of submitting 
patches is when there are several files?)

There is another (totally independent) question that bothers me: 

Should we use the regex flags at compile time to perform aggressive 
optimizations, or should we rely on runtime checks to get the job done?

I'd better explain myself with an example:

We have (in these patches) an opcode called reAnything. When called, it tests 
the RE_single_line flag and acts accordingly.

A small optimization could be gained if we had two versions of the op (like 
reAnythingML [multi-line version] and reAnythingSL [single-line version]) 
and chose between them at compile time.

This can be applied to most flags, and would hopefully result in a speed 
increase (fewer runtime checks), at the cost of using more opcode numbers.
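
The trade-off can be sketched in C as follows. This is only an illustration: all demo_* names are hypothetical, and it assumes the single-line flag means "anything" also matches a newline (as with Perl 5's /s), which the thread does not spell out.

```c
#include <stddef.h>

#define DEMO_RE_SINGLE_LINE 0x1u

/* Runtime-checked variant: one op that tests the flag on every call. */
int demo_any_checked(unsigned flags, char c)
{
    if (flags & DEMO_RE_SINGLE_LINE) {
        return 1;          /* single-line: newline matches too */
    }
    return c != '\n';      /* otherwise: anything except newline */
}

/* Specialized variants: the flag test has already happened. */
int demo_any_sl(char c) { (void)c; return 1; }
int demo_any_ml(char c) { return c != '\n'; }

typedef int (*demo_any_fn)(char);

/* "Compile time": pick the variant once, when the regex is compiled. */
demo_any_fn demo_select_any(unsigned flags)
{
    return (flags & DEMO_RE_SINGLE_LINE) ? demo_any_sl : demo_any_ml;
}
```

The selected function plays the role of a distinct opcode number: the per-character flag test disappears, at the cost of one extra entry per flag combination.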

Another example: case-insensitive matches can be implemented as run-time 
checks on the corresponding flag (that's the way most regex engines do it), 
or by normalizing the regular expression to lower case at compile time and 
then converting the string to lower case just at the beginning of the match.

(The latter is actually proposed by Jeffrey Friedl in the Mastering Regular 
Expressions book, claiming that it would be much faster.)

How aggressive are we going to be with this kind of optimization? Or are we 
going to rely on a very smart JIT compiler?

---
Angel Faus
[EMAIL PROTECTED]




diff -crN parrot_current/parrot/Parrot/OpsFile.pm parrot_patched/parrot/Parrot/OpsFile.pm
*** parrot_current/parrot/Parrot/OpsFile.pm Fri Oct 26 03:00:01 2001
--- parrot_patched/parrot/Parrot/OpsFile.pm Sat Nov 10 16:08:44 2001
***************
*** 22,27 ****
--- 22,42 ----
  #my %opcode;
  #my $opcode;
  
+ my $backtrack_macro = <<EOM;
+ {
+ 
+   opcode_t *dest;
+   if(stack_depth(interpreter, cur_re->stack_base)){
+ pop_generic_entry(interpreter, cur_re->stack_top, cur_re->index, 
+STACK_ENTRY_INT);
+ pop_generic_entry(interpreter, cur_re->dest_stack_top, dest, 
+STACK_ENTRY_DESTINATION);
+ return dest;
+   }
+   else {
+ return cur_re->onfaildest;
+   }
+ }
+ EOM
+ 
  
  #
  # trim()
***************
*** 217,235 ****
#
  
$body =~ s/HALT/{{=0}}/mg;
!   
$body =~ s/RESTART\(\*\)/{{=0,+=$op_size}}/mg;
$body =~ s/RESTART\((.*)\)/{{=0,+=$1}}/mg;
!   
$body =~ s/RETREL\(\*\)/{{+=$op_size}}/mg;
$body =~ s/RETREL\((.*)\)/{{+=$1}}/mg;
!   
$body =~ s/RETABS\((.*)\)/{{=$1}}/mg;
!   
$body =~ s/\$(\d+)/{{\@$1}}/mg;
!   

RE: Lexical implementation work

2001-11-10 Thread Dan Sugalski

At 01:39 PM 11/9/2001 -0800, Brent Dax wrote:
> Dan Sugalski:
> # At 12:39 AM 11/9/2001 -0500, Ken Fox wrote:
> # > 3. We've adopted a register machine architecture to
> # > reduce push/pop stack traffic. Register save/load
> # > traffic is similar, but not nearly as bad.
> # >
> # > Do we want to further reduce memory moves by allowing
> # > ops to work directly on lexicals?
> #
> # No, I don't think so--that's what the registers are for.
> # Fetch out the PMC pointer into a PMC register and just go
> # from there. Any changes made via the pointer will, of course,
> # be reflected in the lexical, since we're working on the real thing.
>
> Here's a thought--do we want to have variants on every PMC op to support
> a key?  IIRC, assembly.pod proposes something like:

I think we're going to switch over to some sort of key creation op, but I'm 
not sure at the moment. Constant keys are easy, of course--they can be 
thrown up into the constants section, built at compile-time.

>  fetchlex P1, %foo
>  add P3, P1, key, 1
>
> Why not just have:
>
>  fetchlex P1, %foo #or whatever
>  fetchhash P2, P1, key
>  add P3, P2, 1
>
> and save ourselves a few opcode numbers?

Why save the numbers? Also we'll end up needing variable-arg-count opcodes 
for proper multidimensional fetching.

> # > 4. There has been discussion about how useful non-PMC
> # > registers will be to the Perl compiler. If there are
> # > questions there, surely non-PMC lexicals would be even
> # > less useful to Perl.
> # >
> # > Do we want non-PMC lexical support?
> #
> # Nope, I wasn't going to bother. All variables are PMCs. The
> # int/string/num things are for internal speed hacks.
>
> You may want to bounce that off the -language people--they seem to be
> expecting that 'my int $foo' will only take up a pad entry and an
> int-sized chunk of memory.

They can think what they like, but it's not going to happen for plain 
scalars. There's not much point, really--the space savings comes in with 
array and hash storage. The typical program doesn't have that many plain 
scalars.

> # > 5. Perl 5 uses pads and scratchpads for holding lexicals.
> # >
> # > Do we want to consider other implementations such as
> # > traditional stack frames with a display or static
> # > link?
> #
> # Well, I don't think we can go with traditional stack frames
> # as such, since the individual frames (and their children) may
> # need to stick around for ages, courtesy of closures and suchlike
> # things. (We almost end up with a linked list of stacks when you
> # factor recursion in)
>
> With closures, don't we just clone the PMC pointer and pad entry, thus
> avoiding having to keep stack frames around until the end of time?  Or
> am I missing something?

Bad explanation. What I meant was we need to keep stacks of call frames 
around, or something of the sort. If you've got a recursive set of calls 
and create a closure inside them, the closure only captures the top-most 
version of all the variables. (Basically the current pads for each block) 
You still need to keep all of 'em around and available, in case at runtime 
something walks up them with caller and MY.

We're going to keep the full pads around for closures, otherwise you set 
yourself up for problems with evals inside them.

> # > Lexical implementation is as critical to
> # > Perl's performance as the dispatcher is, so we should
> # > take some time to get it right.
> #
> # I'm not sure it's as performance critical, but it's
> # definitely important. Fast is, of course, good. :)
>
> Of course.  Random question only very tangentially related to this: is
> INTVAL (and thus the I registers) supposed to be big enough to hold a
> pointer?

INTVAL shouldn't ever get a pointer in it. We're not going to be casting 
pointers to ints and back anywhere, except possibly in some of the 
deep'n'evil spots (like the memory allocator).

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: JIT compilation

2001-11-10 Thread Dan Sugalski

At 07:13 PM 11/8/2001 -0500, Ken Fox wrote:
> Dan Sugalski wrote:
> > [native code regexps] There's a hugely good case for JITting.
>
> Yes, for JITing the regexp engine. That looks like a much easier
> problem to solve than JITing all of Parrot.

The problem there's no different than for the rest of the opcodes. Granted, 
doing a real JIT of the integer math routines is a lot easier than doing it 
for, say, the map opcode, because they generally decompose to a handful of 
machine instructions that are easy to generate with templates.

> > If you think about it, the interpreter loop is essentially:
> >
> >   while (code) {
> >     code = (func_table[*code])();
> >   }
>
> That's *an* interpreter loop. ;)

Yes, I know.

Some day I'm going to have to write 
HOW_TO_RECOGNIZE_CODE_WHICH_IS_SIMPLIFIED_FOR_PURPOSES_OF_EXAMPLE.pod... :)

> > you pay a lot of overhead there in looping (yes, even with the computed
> > goto, that computation's loop overhead), plus all the pipeline misses from
> > the indirect branching. (And the contention for D cache with your real
> > data)
>
> The gcc goto-label dispatcher cuts the overhead to a single indirect
> jump through a register.

Based on data in the D stream. Say Hi to Mr. Pipeline Flush there.
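
For readers who want something they can compile, the simplified loop quoted above can be fleshed out roughly as below. This is a toy under stated assumptions and is not Parrot's dispatcher: the demo_* names, the three opcodes and the single accumulator are all hypothetical, and there is no bounds check on the opcode number.

```c
#include <stddef.h>

/* Hypothetical toy VM with one accumulator. */
typedef struct demo_vm { long acc; } demo_vm;

/* Each op gets the VM and a pointer to its own opcode word, and returns
 * the address of the next op, or NULL to stop the loop. */
typedef const long *(*demo_op_fn)(demo_vm *vm, const long *pc);

enum { DEMO_OP_END = 0, DEMO_OP_ADD = 1, DEMO_OP_NEG = 2 };

static const long *demo_op_end(demo_vm *vm, const long *pc)
{
    (void)vm; (void)pc;
    return NULL;                    /* halt */
}

static const long *demo_op_add(demo_vm *vm, const long *pc)
{
    vm->acc += pc[1];               /* one inline integer operand */
    return pc + 2;
}

static const long *demo_op_neg(demo_vm *vm, const long *pc)
{
    vm->acc = -vm->acc;
    return pc + 1;
}

static const demo_op_fn demo_func_table[] = {
    demo_op_end, demo_op_add, demo_op_neg
};

/* The loop from the thread: one indirect call per op, with the branch
 * target loaded from the bytecode stream (the "data in the D stream"). */
long demo_run(const long *code)
{
    demo_vm vm = { 0 };
    while (code) {
        code = demo_func_table[*code](&vm, code);
    }
    return vm.acc;
}
```

A program such as { DEMO_OP_ADD, 40, DEMO_OP_ADD, 2, DEMO_OP_NEG, DEMO_OP_END } runs three ops and then halts. Every iteration pays for a table lookup and an indirect call whose target depends on the bytecode just read, which is the overhead being argued about here.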

> If Perl code *requires* that everything goes through vtable PMC ops,
> then the cost of the vtable fetch, the method offset, the address
> fetch and the function call will completely dominate the dispatcher.

Oh, absolutely. Even with perl 5, the opcode overhead's not a hugely 
significant part of the cost of execution. Snipping out 3-5% isn't shabby, 
though, and we *will* need all the speed we can muster with regexes being 
part of the generic opcode stream.

> > Dynamic languages have potential overheads that can't be generally
> > factored out the way you can with static languages--nature of the beast,
> > and one of the things that makes dynamic languages nice.
>
> I know just enough to be dangerous. Lots of people have done work in
> this area -- and on languages like Self and Smalltalk which are as
> hard to optimize as Perl. Are we aiming high enough with our
> performance goals?

Our aim is higher than perl 5. No, it's not where a lot of folks want to 
aim, but we can get there using well-known techniques and proven theories. 
Perl is, above all, supposed to be practical. The core interpreter's not a 
place to get too experimental or esoteric. (That's what the cores written 
to prove I'm a chuckle-headed moron with no imagination are for :)

> I'll be happy with a clean re-design of Perl. I'd be *happier* with
> an implementation that only charges me for cool features when I use
> them.

There's a minimum charge you're going to have to pay for the privilege of 
dynamicity, or running a language not built by an organization with 20 
full-time engineers dedicated to it. Yes, with sufficient cleverness we can 
identify those programs that are, for optimization purposes, FORTRAN in 
perl, but that sort of cleverness is in short supply, and given a choice 
I'd rather it go to making method dispatch faster, or making the parser 
interface cleaner.

> (Oops. That's what Bjarne said and look where that took him... ;)

Yeah, that way lies madness. Or C++. (But I repeat myself)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk