Re: Disappearing code

2003-01-12 Thread Ken Fox
Damian Conway wrote:

sub debug is immediate is exported (@message) {
    return $debugging ?? { print $*STDERR: @message; } :: {;}
}


Won't @message need lazy evaluation? How will Perl know to
delay interpolation until the result of the macro is called
at run time?

- Ken




Re: right-to-left pipelines

2002-12-10 Thread Ken Fox
Damian Conway wrote:

For that reason, even if we can solve this puzzle, it might be far kinder
just to enforce parens.


I might be weird, but when I use parens to clarify code in Perl, I
like to use the Lisp convention:

  (method $object args)

Hopefully that will still work even if Perl 6 requires parens.

- Ken




Use CPAN to help define built-ins and globals?

2002-12-09 Thread Ken Fox
Smylers wrote:

Ken Fox wrote:

How about formalizing global namespace pollution with something like
the Usenet news group formation process?  Ship Perl 6 with a very
small number of global symbols and let it grow naturally.


If the initial release of Perl 6 doesn't have commonly-required
functions then people will write their own.  People will do these in
incompatible ways ...


You're taking that out of context. Ship the commonly required
functionality, but don't introduce new global symbols.

A global C<push> function:

  push @array, 1

A class C<push> method:

  push @array: 1

Methods work well with AUTOLOAD, so they probably don't require
C<use> statements. Anyways, I'd rather have C<use> statements than
globals. I know others disagree -- I even disagree when I'm
trying to write a one-liner on the command line.

Perl 6 is the community rewrite. One of the pillars of the
community is CPAN. Could CPAN help resolve simple library and
namespace issues? Adding C<purge> or C<part> is not a language
design issue.

- Ken




Re: purge: opposite of grep

2002-12-08 Thread Ken Fox
Damian Conway wrote:

sub part ($classifier, *@list) {



return @parts;
}


Given the original example

  (@foo,@bar,@zap) := part [ /foo/, /bar/, /zap/ ] @source;

this binds the contents of @parts to (@foo,@bar,@zap)? The
array refs in @parts are not flattened though. Is it correct
to think of flattening context as a lexical flattening? i.e.
only terms written with @ are flattened and the types of
the terms can be ignored?

BTW, if part were declared as an array method, the syntax
becomes

  @source.part [ /foo/, /bar/, /zap/ ]

or

  part @source: [ /foo/, /bar/, /zap/ ]

Can part be a multi-method defined in the array class
so the original example syntax can be used? (I'd prefer
the code too because the switch statement is eliminated.)


sub convert_to_sub ($classifier is topic) is cached {


Very nice.


for @classifiers.kv -> $index, $test {


An array::kv method? Very useful for sparse arrays, but
is this preferred for all arrays? An explicit index counter
seems simpler in this case.


my @indices = map { defined .key()($nextval) ?? .value 
:: () } %classifiers;

That map body looks like a syntax error, but it isn't. Can I add
extra syntax like

  map { defined(.key.($nextval)) ?? .value :: () }

to emphasize the fact that .key is returning a code ref?

Last, but not least, the Hash case returns a junction (most
likely of a single value). Junctions don't collapse like
superpositions, so I'm wondering what really happens.

Can you describe the evaluation? I'm really interested in how
long the junction lasts (how quickly it turns into an integer
index), and what happens with a duplicate (ambiguous?) index.

Sorry for so many questions. The code you wrote was just a
really, really good example of many Perl 6 features coming
together.

[This is out of order; Damian wrote it in another message.]
 Everything doesn't. Everything shouldn't be. Just the really common,
 important stuff.

So CGI.pm is in?

I don't think "really common, important" is a good criterion for
being in the core. IMHO it should be "language defining, awkward or
impossible to implement as a module".

Perhaps the part method can be implemented as a mix-in module that
extends array without subclassing it? AUTOLOAD can do that now
for packages. Are classes sealed or will they use AUTOLOAD too?

- Ken




Re: purge: opposite of grep

2002-12-08 Thread Ken Fox
Damian Conway wrote:

Ken Fox asked:

Is it correct
to think of flattening context as a lexical flattening? i.e.
only terms written with @ are flattened and the types of
the terms can be ignored?


I'm not sure I understand this question.


Sometimes array references behave as arrays, e.g.

  push $array, 1

In flattening context array refs don't flatten, only arrays.
I'm not even sure that only arrays flatten either -- it might
be anything that begins with @. e.g.

  my Point @p;
  ($x, $y) := @p;

If the flattening rule is "only @ symbols flatten" then it
would be lexical flattening -- we only have to look at the
text. (I'm using lexical in the same sense as lexical
variable uses it.)


Multimethods don't belong to any particular class.
Does it *need* to be a method or multimethod???


If C<part> is not a method or multimethod, then it acts
like a reserved word or built-in, like C<grep> or C<map>.
IMHO that's name space pollution.

I know multi-methods don't belong to a class. It seems
useful to develop standards on where the implementation
is found though. I would expect to find C<part> as an
auto-loaded multimethod in perl6/6.0/auto/array/part.al

It would actually be nice if all the C<push>, C<pop>,
etc. functions became methods, e.g.

  push @array: 1;


Depends on your definition of simpler, I guess.


I don't see anything particularly complex about this:

   my $index = 0;
   for @classifiers {
   return $index if $_.($nextval);
   ++$index
   }

That's understandable and it should produce simple bytecode.
If @classifiers is sparse or non-zero-based, then the .kv
method might be better.


I really think an C<Array::kv> method nicely meets the very common need of
iterating the indices and values of an array in parallel, with a minimum
of syntax and a maximum of maintainability.


Yes, I agree, but it needs to construct a stream generator
which isn't particularly efficient. I was surprised to see it
in a place where the generality and elegance isn't needed.

Thanks for the explanation of the junction. I'm not sure
whether I'm more excited by the possibility to write code
using junctions or more terrified by the certainty of
debugging that code... ;)


And I'm suggesting that C<part>ing is such sweet sorrow that everyone
will want to do it all the time. Or at least often enough that dragging
it in from a module will rapidly become a PITA. Just as it is in Perl 5
to use C<List::Utils::reduce> or C<List::Utils::max>.


How about formalizing global namespace pollution with something
like the Usenet news group formation process? Ship Perl 6 with a
very small number of global symbols and let it grow naturally.

- Ken




Re: purge: opposite of grep

2002-12-07 Thread Ken Fox
Michael Lazzaro wrote:

(@foo,@bar,@zap) := classify { /foo/ ;; /bar/ ;; /zap/ } @source;
A shorthand for:

  for @source {
  given {
  when /foo/ { push @foo, $_ }
  when /bar/ { push @bar, $_ }
  when /zap/ { push @zap, $_ }
  }
  }


How about just

  (@foo,@bar,@zap) := classify [ rx/foo/, rx/bar/, rx/zap/ ] @source;

and implement classify as a normal sub? Why does everything
have to be built into the first version of Perl 6?

Is there any reason classify can't be a normal sub? e.g. can
a sub return ( [], [], [] ) and have that bound to 3 array
variables? What about return @AoA when @AoA = ( [], [], [] )?

- Ken




Re: Continuations

2002-11-18 Thread Ken Fox
Damian Conway wrote:

my $iter = fibses();
for <$iter> {...}

(Careful with those single angles, Eugene!)


Operator < isn't legal when the grammar is expecting an
expression, right? The < must begin the circumfix <...> operator.

Is the grammar being weakened so that yacc can handle it? The
rule engine is still talked about, but sometimes I get the
feeling that people don't want to depend on it.

That <$iter> syntax reminds me too much of C++.

- Ken




Re: Continuations

2002-11-18 Thread Ken Fox
Damian Conway wrote:

Ken Fox wrote:

The < must begin the circumfix <...> operator.


Or the circumfix <<...>> operator. Which is the problem here.


This is like playing poker with God. Assuming you can get over
the little hurdles of Free Will and Omniscience, there's still
the problem of Him pulling cards out of thin air.

What does the circumfix <<...>> operator do? [1]

Here docs are re-syntaxed and the << introducer was stolen
for the <<...>> operator? [2]


Yes. But since iterating an iterator to get another iterator that
is immediately iterated will (I sincerely hope!) be a very rare
requirement, I doubt it will be anything like the serious inconvenience
it is in C++.


True. I suppose even multi-dimensional data structures will
rarely be iterated over with a simple:

  for <<$array>> {
  }

Most people will probably want more control:

  for <$array> {
     for <$_> {
     }
  }

Anyways, I was wondering about the general principle of using
C++ style hacks to make yacc happy. I should have known better
coming from the author of "C++ Resyntaxed". Did the immodest
proposal fix the <<...>> syntax? ;)

- Ken

[1] I can't google for <<...>>. Anybody know if Google can add perl6
operators to their word lists? Seriously!

[2] Hmm. Will the uproar on here docs beat string concatenation?




Re: Continuations

2002-11-18 Thread Ken Fox
Damian Conway wrote:

It's [...] the ASCII synonym for the «...» operator, which
is a synonym for the qw/.../ operator.



Nope. Heredocs still start with <<.


Hey! Where'd *that* card come from? ;)

Seriously, that's a good trick. How does it work? What do these
examples do?

  print <<a b c>>;

  print <<a
  b
  c;
  a
Is it illegal now to use quotes in qw()?

- Ken




Re: String concatentation operator

2002-11-14 Thread Ken Fox
Andy Wardley wrote:


Can we overload + in Perl 6 to work as both numeric addition
and string concatenation ...


Isn't there some nifty Unicode operator perl6 could enlist? ;)

How about concatenating adjacent operands? ANSI C does this
with string constants and it works very well. It would become
one of those great Perl sound bites too: the community
couldn't decide on the operator, so perl6 left it out.
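
In case it's unfamiliar, this is the C feature I mean -- a tiny
stand-alone example (nothing Perl-specific about it):

#include <stdio.h>

int main(void) {
    /* Adjacent string literals are glued together at compile time. */
    const char *msg = "Hello, " "world" "!\n";
    printf("%s", msg);    /* prints: Hello, world! */
    return 0;
}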

- Ken




Re: String concatentation operator

2002-11-14 Thread Ken Fox
Michael G Schwern wrote:

Before this starts up again, I hereby sentence all potential repliers to
first read:

string concatenation operator - please stop
http://archive.develooper.com/perl6-language;perl.org/msg06710.html


The bike shed thing is like Godwin's Law. Only I don't know
which side loses. ;)

Wasn't one of the main problems with Jarkko's juxtaposition
proposal that it would kill indirect objects? Have we chased
our tail on this subject after the colon became required for
indirect objects?

If the assignment variant of the invisible concatenation
operator could be solved, juxtaposition seems like a reasonable
approach. (Line ending juxtaposition problems could be fixed with
a special rule similar to the '} by itself' rule.)

- Ken




Keywords global or only in context?

2002-11-05 Thread Ken Fox
Me wrote:


YAK for marking something.


I've been assuming that a keyword will only have
meaning in contexts where the keyword is valid.
Given the shiny new top-down grammar system, there's
no requirement for keywords to be global. (Context
sensitive keywords fall out of Perl 6 grammars
naturally -- just the opposite of yacc.)

Is this a valid assumption?

What's the parse of the following code?

sub topic ($x is topic) {
   $x.topic
}

- Ken




Re: Supercomma! (was Re: UTF-8 and Unicode FAQ, demos)

2002-11-05 Thread Ken Fox
Jonathan Scott Duff wrote:


Um ... could we have a zip functor as well?  I think the common case
will be to pull N elements from each list rather than N from one, M
from another, etc.  So, in the spirit of timtowtdi:

	for zip(@a,@b,@c) -> $x,$y,$z { ... }


sub zip (\@:ref repeat{1,}) {
   my $max = max(map { $_.length } @_);
   my $i = 0;
   while ($i < $max) {
       for (@_) {
           yield $_[$i]
       }
       ++$i
   }
   return ( )
}

That prototype syntax is probably obsolete, but I'm not sure
what the current proposal is. It might be better to force scalar
context on the args so that both arrays and array refs can be
zipped.

I really like the idea of using generic iterators instead of
special syntax. Sometimes it seems like we're discussing 6.x
instead of just 6.0.

This iterator is nice too:

sub pairs (\@a, \@b) {
   my $max = max(@a.length, @b.length);
   my $i = 0;
   while ($i < $max) {
       yield @a[$i] => @b[$i];
       ++$i
   }
   return ( )
}

for pairs (@a, @b) {
   print .x, .y
}

- Ken




Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Ken Fox
Austin Hastings wrote:


At this point, Meestaire ISO-phobic Amairecain Programmaire, you have
achieved keyboard parity with the average Swiss six-year-old child.


The question is not about being ISO-phobic or pro-English. **

The question is whether we want a pictographic language. I like
the size of the English alphabet. It produces fairly short words,
but the words are very robust (people can read words in all
orientations, backwards, upside down, in crazy fonts, hand-written,
etc.) This is the opposite of Huffman encoding, but just
as useful IMHO.

I've had the unpleasant job of turning math into software. Hand
written formulae can be very difficult to read because mathematics
worships Huffman encoding. Multiplication is specified by *nothing*.
Exponents are just written a bit smaller and a bit raised. Is this
what we want in the core?

Does anyone have any references for reading and comprehension
rates for different types of languages? I'm ignorant on the subject
and this seems like something a Perl programmer should know.

- Ken

** I'm probably both. ISO-phobic because I actually represented my
company on an ISO standard committee. Pro-English because it's what
I use -- being pro-English doesn't make me against everything else.
A language would have to be pretty bad to have its native speakers
advocate something else!




Re: UTF-8 and Unicode FAQ, demos

2002-11-04 Thread Ken Fox
Damian Conway wrote:

Larry Wall wrote:

That suggests to me that the circumlocution could be >>*<<.

A five character multiple symbol??? I guess that's the penalty for not
upgrading to something that can handle unicode.


Unless this is subtle humor, the Huffman encoding idea is getting
seriously out of hand. That 5 char ASCII sequence is *identically*
encoded when read by the human eye. Humans can probably type the 5
char sequence faster too. How does Unicode win here?

I know I'm just another sample point in a sea of samples, but
my embedded symbol parser seems optimized for alphabetic symbols.
The cool non-alphabetic Unicode symbols are beautiful to look at,
but they don't help me read or write faster. There are rare
exceptions (like grouping) where I strongly prefer non-alphabetics,
but otherwise alphabetics help me get past the "what is this code?"
phase and into the "what does this code do?" phase as quickly as
possible.

(I just noticed that all the non-alphabetic symbols (except '?')
in the previous paragraph are used for grouping. Weird.)

- Ken




Re: How to set your Windows keyboard to ¶erl-mode

2002-11-04 Thread Ken Fox
Austin Hastings wrote:


The « and » ... are just as pictographic (or
not) as [ and ].


I'm not particularly fond of « or » either. ;) Damian just
wrote that he prefers non-alphabetic operators to help
differentiate nouns and verbs. I find it helpful when people
explain their biases like that. What's yours?


 They look the same from top or bottom, and are
unmistakable in direction when looked at from either side.


Well, anything can look like itself, that wasn't the point. The
goal is to not look like anything else in any orientation. The
chars O and 0 fail badly, but A and T are excellent. I'm not
sure where « and » fall because I don't have any experience
with them.

Programming languages probably get away with more because
most programmers don't spray paint algorithms on the side of
a bridge. (Well, Lisp programmers maybe. ;) My three points
against arbitrary punctuation as symbols are
 (1) it's impossible to identify symbol boundaries when
 reading punctuation -- you just have to guess,
 (2) it's harder to work with punctuation in non-digital
 communication, and
 (3) my memory doesn't work well on punctuation symbols!

Perl has some nice features like sigils that clue people in on
how to read a sentence. But...


difference between ' (apostrophe) and ` (tick)


is a horrible abomination. ;)


If every keyboard and operating system had the ability to simply
generate arbitrary expressions of the form (expr-a) ** (expr-b), ad
infinitum (a ** b ** c ** d ** e) then we'd be remiss not to use it.
But they can't, so we don't.


Non sequitur. Written language prior to the printing press had
no technological reason to limit alphabet size. Some languages
developed very large pictographic representations, others
developed small alphabets with word formation rules. I have no
idea what the design pressures were that caused these different
solutions. Do you? What are the strengths and weaknesses of the
approaches? Why should we select one over the other?

- Ken




Re: Blocks and semicolons

2002-09-12 Thread Ken Fox

Luke Palmer wrote:
 This requires infinite lookahead to parse.  Nobody likes infinite 
 lookahead grammars.

Perl already needs infinite lookahead. Anyways, most people
don't care whether a grammar is ambiguous or not -- if we did,
natural human languages would look very different.

People want expressive languages. (Some people even consider
ambiguity a feature. Poets? Comedians? Lawyers?)

I don't know how we should handle code blocks, but I do know
that the answer should solve human problems, not yacc's.

Perl 5 doesn't accept either of these loops:

   if ($_) { print "$_\n" } for (qw(1 0));
   print "$_\n" if ($_) for (qw(1 0));

The code must be written as:

   for (qw(1 0)) {
     print "$_\n" if ($_)
   }

Why won't a similar solution work for user-defined syntax in
Perl 6? (It would be no worse than Perl 5...)

- Ken




Re: Blocks and semicolons

2002-09-12 Thread Ken Fox

Luke Palmer wrote:
 On Thu, 12 Sep 2002, Ken Fox wrote:
  Perl already needs infinite lookahead.
 
 Really? Where?

Indirect objects need infinite lookahead and they are
in the core language. Hyper operators may need lookahead.
Place holders may need lookahead. User defined rules
will definitely need infinite lookahead. (Remember we
are switching from LR to LL parsing. LL(1) parsing is
not as powerful as LR(1), so Perl 6 will need lookahead
even in places where Perl 5 doesn't.)

 Computers don't like ambiguous grammars ...

The dangling else problem is ambiguous. Every time you
get a shift-reduce conflict in yacc, you have an ambiguous
grammar. Computers don't care about ambiguous grammars,
but some of the common tools (yacc) disambiguate whether
we like it or not. ;)
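
For example, the classic dangling else in plain C -- just to show
the kind of ambiguity yacc quietly resolves for us:

#include <stdio.h>

/* The grammar allows the else to attach to either if. yacc resolves
   the shift-reduce conflict by shifting, which attaches it to the
   nearest (inner) if. */
void report(int a, int b) {
    if (a)
        if (b)
            printf("a and b\n");
    else                      /* indented as if it binds to the outer if... */
        printf("not a\n");    /* ...but it really binds to the inner one */
}

int main(void) {
    report(1, 0);    /* prints "not a" even though a is true */
    return 0;
}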

BTW, there are some parser generators that handle
ambiguous grammars -- they either support backtracking,
infinite lookahead, or simultaneously parse all possible
derivations. In the case of the simultaneous parse, they
can actually return multiple parse trees and let the
code generator decide how to interpret things.

 But in Perl 5, C<if> is not a sub.
 ...
 Because subs have to work as expressions.

In Perl 6 the difference between sub and syntax is
almost non-existent. Some subs will behave like built-in
syntax, some subs will behave like normal function
calls. (AFAIK, the only difference is that subs can not
provide lazy argument evaluation. Maybe the C<is rx>
property eliminates even that? BTW, does anybody else
find C<is rx> funny? "This is your argument. This is your
argument on drugs." (Rx is an abbreviation for drug
prescription in the U.S.))

I think Perl 5's solution for handling C<if> can be
applied to Perl 6 subs that look like syntax. It might
not be an improvement over Perl 5, but at least it's
no worse.

A better solution may be to continue the parse until
we see if C<for> is followed by a block. (That's really
hard to do with yacc, which might be why Perl 5 does
what it does.)

- Ken



Re: Hypothetical variables and scope

2002-09-08 Thread Ken Fox

Damian Conway wrote:
 Though leaving optimization in the hands of the programmer
 is generally a Bad Idea.

That doesn't sound like a Perl slogan.

 It's also a matter of syntactic consistency. It has to be := for
inlined bindings (i.e. rx/ $name:=<ident> /) because otherwise
 we make = meta (which is *not* a good idea). So it probably should be
:= for explicit C<let>s as well.

If let only works on bindings, it really bites into the
expressiveness of the language. For example, the natural way
to skip text within a match is to do something like:

   / (\w+) \d+ (\w+) { let $1 _= $2; let $2 = undef } /

This feels natural too:

   / (\w+ \d+ \w+) { let $1 =~ s/\d+// } /

Binding might be really fast for some implementations of Perl,
but slow for others. Just like string eval may be impossible in
some, but trivial in others.

- Ken




Re: Suggestion for perl 6 regex syntax

2002-09-07 Thread Ken Fox

Mr. Nobody wrote:
 /^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/
 
 would actually become longer:
 
 /^([+-]?)<before \d|\.\d>\d*(\.\d*)?([Ee]([+-]?\d+))?$/

Your first expression uses capturing parens, but the captures
don't bind anything useful, so you should probably compare
non-capturing versions of the regex:

/^[+-]?(?=\d|\.\d)\d*(?:\.\d*)?(?:[Ee][+-]?\d+)?$/

vs

/^[+-]?<before \d|\.\d>\d*[\.\d*]?[[Ee][+-]?\d+]?$/

The [Ee] isn't the way I'd write it in Perl 6 -- I'd shift
into case-insensitive mode temporarily because those
hand-written [Cc][Aa][Ss][Ee] insensitive matches are hard
to read.

/^[+-]?<before \d|\.\d>\d*[\.\d*]?[:i e[+-]?\d+]?$/

Now Perl 6 is just 5 characters longer. That's a horrible
pattern to read though. Can Perl 6 fix that? I think so.

I'd change the [+-] fragments to use a sub-rule because
repeated constants make things harder to read. (Not so bad
in this case, but it's a good general rule -- and you're making
generalizations about regex syntax.)

/^<sign>?<before \d|\.\d>\d*[\.\d*]?[:i e<sign>?\d+]?$/

I'd put in some white space to clarify the different logical
pieces of the rule:

/^ <sign>? <before \d | \.\d>
   \d* [\.\d*]?
   [:i e <sign>? \d+]? $/

Now it's pretty obvious that the :i can be moved outside the
rule without screwing anything up. I'd rather have modifiers
affect the whole rule rather than remembering where they begin
and end inside it.

:i /^ <sign>? <before \d | \.\d>
      \d* [\.\d*]?
      [e <sign>? \d+]? $/

That's how I'd write your Perl 5 regex in Perl 6. (Well,
actually it's probably just /^ <number> $/, but would you
call that cheating? ;)

It does have more characters than the Perl 5 regex. Looking
at it another way, it has fewer symbols. It's faster to read.
How many times are you going to write it? How many times are
you going to read it?

When I was reading A5, I was concerned about character
classes too, but mostly because of the regex style that I
learned from the Friedl book:

   opening normal* ( special normal* )* closing

which can be used to match quoted strings for example:

   /"[^"\\]*(\\.[^"\\]*)*"/

The direct Perl 6 equivalent is not very pretty:

   / " <-["\\]>* [ \\. <-["\\]>* ]* " /

It's hard to come up with a good name for the character
class used there. not_a_quote_or_slash? special_char_in_quote?

I'm not concerned about it anymore because I think the
Perl 6 style will be:

   opening ( special :: | . )*? closing

The non-greedy match makes so many things easier to write
and the backtracking control prevents the special case
from accidentally matching the normal one.

I'd write the string match in Perl 6 like this:

   / " [ \\. :: | . ] *? " /

The only possible problem with this is non-greedy
iteration is slower. It doesn't have to be though -- and
the optimizations needed to get Perl 6 rules to match
full grammars should fix this.

If the pattern is rewritten as a grammar, we can
talk about first and follow sets.

     quoted_string: " quoted_char_seq "
   quoted_char_seq: null
                  | quoted_char quoted_char_seq
       quoted_char: \ any
                  | any

The reason non-greedy matching is slow is because the
rule quoted_char_seq can be empty, i.e. it always
matches the current spot. However, the follow set of
quoted_char_seq is the quote. That means the *only*
thing that can follow quoted_char_seq is a quote.
There's no point in returning (taking the null route)
unless the rule is looking at a quote. This reduces
backtracking tremendously.

The other problem would normally be in the conflict
between the first sets of quoted_char. The slash
character is also an any character, so if the slash
alternative is taken, the system has to prepare to
backtrack to the any alternative. The :: backtracking
control eliminates the backtracking point, so it's
impossible for an escape sequence to be re-parsed
as two separated characters.

Damian wrote several good examples of Perl 5 -> Perl 6
conversions. Take a look at E5 and experiment some
more. The built-in named rules may simplify a lot of
things too -- we're going to have a much richer library
than just \d, \w, etc.

- Ken




Re: [netlabs #801] [PATCH] PerlArray in scalar context

2002-09-05 Thread Ken Fox

Brent Dax wrote:
  Keep in mind how much it could inflate the bytecode if
 we render a ton of generic, N-dimensional hyper-operator logic into
 bytecode.

The main problem I see is that you could spend minutes executing
inside that hyper op. Doesn't that screw with the plan for putting
the event handler in the dispatch loop? It's not very friendly for
debugging either -- how can the user set a breakpoint on one of
the hyped calls?

The bytecode generated for hyper can't be any larger than the
equivalent loops in Perl 5. If there isn't any problem with byte
code size now, why would there be a problem in Perl 6?

- Ken




Re: Hypothetical variables and scope

2002-09-05 Thread Ken Fox

Damian Conway wrote:
 Because what you do with a hypothetical has to be reversible.
 And binding is far more cheaply reversible than assignment.

Why not leave it in the language spec then? If it's too
hard to implement, then the first release of Perl 6 can
leave it out. Someday somebody might come up with an
implementation.

BTW, I thought the Apocalypses were moving us away from
the-tarball-defines-the-language. Am I wrong to think that
Perl 6 the spec may have more than Perl 6 the release?

- Ken





Re: Multimethod Dispatch

2002-09-04 Thread Ken Fox

Dan Sugalski wrote:
 At 9:10 AM -0400 9/4/02, [EMAIL PROTECTED] wrote:
 So, just to clarify, does that mean that multi-dispatch is (by 
 definition)
 a run-time thing, and overloading is (by def) a compile time thing?
 
 No. They can be both compile time things or runtime things, depending on 
 the characteristics of the language.

I don't think so. Those terms are well understood by the OO community.
Bertrand Meyer wrote on the subject:
http://www.inf.ethz.ch/personal/meyer/publications/joop/overloading.pdf

Prior to C++ there might have been some ambiguity (for example, some
people have talked about run-time vs compile-time overloading), but
C++ has completely stolen the term. Here's a slide from one of Damian's
presentations that states multi-methods are "Like C++ overloading,
but polymorphic".
http://www.csse.monash.edu.au/~damian/TPC/1999/MultipleDispatch/Presentation/sld010.htm

- Ken




Re: regex args and interpolation

2002-09-04 Thread Ken Fox

David Whipp wrote:
 But can I use a non-constant date?

You didn't show us the iso_date rule.

 Obviously we could put the onus on the module writer to write super-flexible
 rules/grammars. But will there be an easy way to force interpolative context
 onto this type of regex-valued subroutine arg?

With this rule you need literal numbers:

   rule iso_date { $year:=(\d{4}) -
                   $month:=(\d{2}) -
                   $day:=(\d{2}) }

This rule is more flexible:

   rule iso_date { (<Perl.term>) - { let $year := eval $1 }
                   (<Perl.term>) - { let $month := eval $2 }
                   (<Perl.term>)   { let $day := eval $3 }
                   <($year =~ /^\d{4}$/ &&
                     $month =~ /^\d{2}$/ &&
                     $day =~ /^\d{2}$/)> }

The eval is very restrictive though -- it forces the terms to
have values at compile time and breaks lexical scoping. (Unless
eval can play games with %MY and somehow evaluate in the
caller's scope. Maybe caller.{MY}.eval?)

We can tidy that up a little with another rule:

   rule value { <Perl.term>
                { let $0 := caller(2).{MY}.eval($term) } }

   rule iso_date { $year:=<value> -
                   $month:=<value> -
                   $day:=<value>
                   <($year =~ /^\d{4}$/ &&
                     $month =~ /^\d{2}$/ &&
                     $day =~ /^\d{2}$/)> }

There's still the compile-time value problem though.

What is really needed is something that converts the date syntax
to normal Perl code:

   rule iso_date { (<Perl.term>) -
                   (<Perl.term>) -
                   (<Perl.term>)
                   { use grammar Perl::AbstractSyntax;
                     $0 := (expr (invoke 'new (class 'Date) $1 $2 $3)) } }

This rule just generates code -- the $0 return result is spliced
into the syntax tree at compile time. All the valid value checking
is done at run-time in the Date object constructor. (Or maybe at
compile time if the compiler can figure out there aren't any
side-effects and the object is serializable. But that's a whole
different thread.) It's very efficient because it only parses the
source code once. All the compiler optimizations (like constant
folding) get applied to your special syntax. But you're writing
Parrot macros (IMCC on steroids maybe?) and not Perl!

Another approach is to delay the final parse until run-time. In
this case, the compiler runs the rule over the input source
code and records the matched text, but without running any of the
code blocks. When the sub is called at run-time, the matched text
is parsed by the rule again, but this time with the code blocks
enabled. The second rule above (using eval) would work fine this
way (as long as the %MY games are possible). The down-side to
this approach is that you have lots of string evals happening
at run-time.

Maybe all of these techniques will be possible?

   rule x is macro { ... }
   rule y is delayed { ... }
   rule z is syntax { ... }

- Ken




Re: Hypothetical variables and scope

2002-09-03 Thread Ken Fox

Peter Haworth wrote:
 Also the different operators used (:= inside the rule, = inside the code)
 seems a bit confusing to me; I can't see that they're really doing anything
 different:
 
 / $x := (gr\w+) /    vs    / (gr\w+) { let $x = $1 } /
 
 Shouldn't they both use C< := >?

Depends on what you want. The $x := in the rule binds the
first match to $x -- it does not copy the value. The $x =
in the code block copies the value of the first match. You
can use either binding or assignment in the code block.

Both of them will be undone during backtracking. It's more
efficient to bind, but the copy guarantees changes to $x and $1
are independent.

- Ken



Re: Request for default rule modifiers in a grammar

2002-09-02 Thread Ken Fox

Damian Conway wrote:
  One possibility is that a modifier is
 implemented via a special class:
 
 my class Decomment is RULE::Modifier
                     is invoked(:decomment) {
     method SETUP ($data, $rule) {
         ...
     }
     # etc.
 }

I'm messing around with regex code generation by
converting first to a grammar. The modifiers seem
to need intimate knowledge of regex - grammar
conversion. This may be a quirk of my approach.
People using tree traversal or generating code
directly from the regex might see something else.
I suspect modifiers will still be deeply connected
with the internals.

Is this why you're thinking of modifiers as
classes? So the modifier can guide the regex
engine (tree generator, code generator, etc.) by
hooking in at certain spots? (I don't envy the
job of standardizing those hooks -- it seems like
every modifier I've thought of needs a different
hook.)

One alternative I was thinking of is treating
modifiers as filters -- something like a tree
transformation language that runs after the core
has parsed a regex. I like filters because they
don't need much knowledge of the regex engine.
They're slower though, so maybe a combination of
the two would be best. Built-in modifiers like :w
and :i could be embedded in the core, but user
defined modifiers could be filters. (Some
modifiers are really just new rules in disguise.
Those are the easiest kind. For example :p5 could
just be an attribute checked by the regex rule:
   rule regex { <($p5)> :: <p5_regex> | <p6_regex> }
Oh. Wait. That should be 'rule' not 'regex'... ;)

BTW, what code is legal in a grammar block? Can
we introduce new lexicals? I'd like to write
something like this:

grammar foo {
   my $parsing_x;

   rule x { { let $parsing_x = true } ... }
   rule y { <($parsing_x)> ... | ... }
}

(Hypotheticals are so cool!)

It's no big deal to put the grammar state in an
outer block, but I like the idea of treating
grammars like classes.

- Ken




Request for default rule modifiers in a grammar

2002-09-01 Thread Ken Fox

The thing I'd like to do right now is turn on :w
for all rules. A Fortran grammar might want to turn
on :i for all rules.

Maybe add modifiers to the grammar declaration?

   grammar Fortran :i { ... }

It would also be convenient to allow the :w
modifier to have lexically scoped behavior so a
grammar can change the definition of a word. For
example, a C grammar might want to skip comments.

Would it be better to create a new (user-defined)
modifier instead of changing the meaning of :w?

BTW, how do we create user-defined modifiers?

- Ken




Re: Regex stuff...

2002-08-31 Thread Ken Fox

Piers Cawley wrote:
  Unless I'm very much mistaken, the order of execution will
 look like:
 
   $2:=$1; $1:=$2;

You're not binding $2:=$1. You're binding $2 to the first
capture. By default $1 is also bound to the first capture.

Assuming that numbered variables aren't special, the order
of execution is:

   $2:=$1:=first; $1:=$2:=second;

That doesn't make any sense though, so numbered variables
must be treated specially -- an explicit numbered binding
replaces the default numbered binding. So, the order of
execution is really:

   $2:=first; $1:=second;

I think this solves both of your puzzles.

One last thing though. Binding might be done at compile-time
because it changes variables, not the values of variables.
Thinking about binding as a compile-time declaration might
be easier than thinking about run-time execution order.

Thinking about binding as a compile-time thing, the rule

   / $2:=(\S+) <= $1:=(\S+) /

becomes

   / <begin $2>[\S+]<end $2> <= <begin $1>[\S+]<end $1> /

- Ken




Re: @array = %hash

2002-08-31 Thread Ken Fox

Simon Cozens wrote:
 [EMAIL PROTECTED] (Damian Conway) writes:
 
  %hash4 = ("Something", "mixing", pairs => and, "scalars");

That's perfectly okay (except you forgot the quotes around the "and"
and you have an odd number of elements initializing the hash).
 
 Urgh, no. Either a pair is an atomic entity or it isn't. Which?

"Odd" meaning "not correct"... When initializing the hash, the pair
comes off as a single element. That leaves "scalars" as a key without
a value. So there's an even, but insufficient, number of elements
initializing the hash.

- Ken




Re: atomicness and \n

2002-08-31 Thread Ken Fox

Damian Conway wrote:
 No. It will be equivalent to:
 
   <[\x0a\x0d...]>

I don't think \n can be a character class because it
is a two character sequence on some systems. Apoc 5
said \n will be the same everywhere, so won't it be
something like

   rule \n { \x0d \x0a | \x0d | \x0a }

Hmm. Now that I read that, I'm thinking some characters
will be multi-byte sequences. Is there going to be
multi-byte magic for line endings? Even in ASCII data
streams?

- Ken




Re: backtracking into { code }

2002-08-30 Thread Ken Fox

Damian Conway wrote:
 rule expr1 {
 <term> { m:cont/<operators>/ or fail } <term>
 }
 
 Backtracking would just step back over the rule as if it were atomic
 (or followed by a colon).

Ok, thanks. (The followed by a colon is just to explain the behavior,
right? It's illegal to follow a code block with a colon, isn't it?)

After talking to Aaron yesterday, I was wondering if sub-rules are
meant to backtrack at all.

Does the following example backtrack into foo?

   rule foo { b+ }
   rule bar { a <foo> b }

If it doesn't, I think we're restricted to pure LL(1) grammars.
That would suck. Apoc 5 is so close to ANTLR or precc that it would
be a shame not to be LL(k).

I've been playing around with converting regex operators into
Perl 6 attribute grammars and they look good. If backtracking into
rules doesn't work though, these conversions don't work.

   a ?      rule x { a | <null> }
   a *      rule x { a <x> | <null> }
   a +      rule x($i=0) { a <x($i+1)> | <($i>0)> <null> }
   a {n,m}  rule x($i=0) { <($i<$m)> a <x($i+1)> | <($i>=$n)> <null> }

The non-greedy versions just reverse the productions, so for example

   a *?     rule x { <null> | a <x> }

- Ken




Re: backtracking into { code }

2002-08-30 Thread Ken Fox

Larry Wall wrote:
 On Fri, 30 Aug 2002, Ken Fox wrote:
 : Ok, thanks. (The followed by a colon is just to explain the behavior,
 : right? It's illegal to follow a code block with a colon, isn't it?)
 
 I don't see why it should be illegal--it could be useful if the closure
 has played continuation games of some sort to get backtracking.

Apoc 5 has "It is an error to use : on any atom that does no
backtracking." Code blocks don't backtrack (at least that's what
I understood Damian to say). Are zero width atoms treated specially?

And can you give me an example of a continuation game? That sounds
sort of like my original question.

Great news about backtracking into sub-rules. Perl 6 is going to
be a lovely system to work with. I think it's going to suffer a bit
from the same declarative-face vs procedural-heart** that Prolog
does, but it hits the little language target perfectly.

- Ken

** Prolog uses a cut (!) operator to control backtracking just like
Perl 6. A big problem (at least for me...) is learning when ! just
makes things run faster vs. when ! gives me the wrong answer. Maybe
I just haven't used Prolog enough to get my brain wrapped around it.




backtracking into { code }

2002-08-29 Thread Ken Fox

A question: Do rules matched in a { code } block set backtrack points for
the outer rule? For example, are these rules equivalent?

  rule expr1 {
    <term> { /<operators>/ or fail } <term>
  }

  rule expr2 {
    <term> <operators> <term>
  }

And a comment: It would be nice to have procedural control over back-
tracking so that { code } can fail, succeed (not fail), or succeed and
commit. Right now we can follow { code } with ::, :::, etc. but that does
not allow much control. I'm a little afraid of what happens in an LL(Inf)
grammar if backtracking states aren't aggressively pruned.

- Ken



Re: backtracking into { code }

2002-08-29 Thread Ken Fox

Aaron Sherman wrote:
 rule { <term> { /<operators>/.commit(1) or fail } <term> }
 
 The hypothetical commit() method being one that would take a number and

That would only be useful if the outer rule can backtrack into the
inner /operators/ rule. Can it?

I agree with you that a commit method would be useful -- especially when
used on $self. I'd probably write your example as:

  rule { <term> { m/<operators> { $self.commit(1) }/ or fail } <term> }

which is of course just a complicated

  rule { <term> { m/<operators> :/ or fail } <term> }

BTW, why isn't fail a method? Then a rule could pass itself to a sub-rule
and allow the sub-rule to fail its parent, but not the entire match. Isn't
failing just invoking the last continuation on the backtrack stack?

- Ken



Re: abs_i_i but no abs_i

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
 I can write a more efficient implementation of abs_i ... things will
 go slightly faster

The law of diminishing returns is broken for a VM. Eventually you
reach a point where adding more ops actually decreases total
performance. Instead of the change in performance tending towards
zero, it actually becomes negative. No surprise there -- it's
the heart of the RISC v CISC debate.

Parrot has a RISC-like 3 operand design. IMHO if all ops are
consistent with that design, it will be easier/faster to generate
code. You can always add abs_i later if abs_i_i turns out to be a
performance problem.

- Ken




Re: On writing JITs

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
 It seems that foo & (foo - 1) is zero only for a power of 2 (or foo == 0)
 but is there a fast way once you know that foo is a power of 2, to find out
 log2 foo?

You're right about (foo & (foo - 1)).

gcc uses a repeated test and shift. That works very nicely if foo
is small -- and I bet it usually is. If you want to protect yourself
from the cases where foo is large, you can unroll the loop 2 or 3 times
and then use a binary search.

I've attached two simple implementations I wrote. Neither log2()
implementation is from gcc, so don't worry about GPL contamination.

- Ken

#include <stdio.h>
#include <assert.h>

int simple_log2(int foo) {
    int log = 0;

    assert((foo & (foo - 1)) == 0);

    while (foo) {
        ++log;
        foo >>= 1;
    }

    --log;

    return log;
}

int unrolled_log2(int foo) {
    int log = 0;
    int mid = 4 * sizeof(foo);

    assert(foo != 0);
    assert((foo & (foo - 1)) == 0);

    foo >>= 1;
    if (foo == 0) goto done;
    foo >>= 1; ++log;
    if (foo == 0) goto done;
    foo >>= 1; ++log;
    if (foo == 0) goto done;

    do {
        if (foo >= (1 << mid)) {
            foo >>= mid;
            log += mid;
        }
        mid >>= 1;
    }
    while (mid);

    ++log;

done:

    return log;
}

int main(int argc, char *argv[]) {

#if 1

    int i = 1;
    while (i < (1 << 30)) {
        printf("log2(%d) == %d\n", i, unrolled_log2(i));
        i <<= 1;
    }

#else

    /* simple timing loop */

    int i = 1000;
    volatile int j;
    while (i > 0) {
        j = unrolled_log2(1 << 29);
        j = unrolled_log2(2);
        j = simple_log2(1 << 29);
        j = simple_log2(2);
        --i;
    }

#endif

    return 0;
}




Re: abs_i_i but no abs_i

2002-08-03 Thread Ken Fox

Nicholas Clark wrote:
 But there do seem already to be arguably duplicate 2 operand versions of
 many ops. Hence I was surprised at the lack of consistency.

Right. I suspect that as people get more experience with the new
Perl 6 compiler the 2 operand ops will go away (or vice versa).

At the very least, ops will be ordered so that frequent ops are
nearby each other. If an infrequent op doesn't save more CPU time
than blowing the instruction cache wastes, it's a net loss.

- Ken




Re: Light ideas

2002-08-03 Thread Ken Fox

Dave Storrs wrote:
 why didn't you have to write:
 
   rule ugly_c_comment {
       /
       \/ \*  [ .*? <ugly_c_comment>? ]*?  \* \/
       { let $0 := " " }
       /
   }

Think of the curly braces as the regex quotes. If { is the quote
then there's nothing special about / and it doesn't need to be
escaped. Also, I don't think you want spaces between / and *
because / * isn't a comment delimiter.

 2) As written, I believe that the ugly_c_comment rule would permit nested
 comments (that is, /* /**/ */), but would break if the comments were
 improperly nested (e.g., /* /* */).  Is that correct?

It wouldn't fail, but it would scan to EOF and then back track.
Basically the inner ugly_c_comment succeeds and then the rest
of the file is scanned for '*/'. When that fails, the regex
back tracks to the inner ugly_c_comment, fails that and then
skips the unbalanced /* with .*?. I'd like to add ::: to fail
the entire comment if the inner comment fails, but I'm not sure
how to do it. Does this work?

   /\* [ .*? | <ugly_c_comment> ::: ]*? \*/

 3) The rule will replace the comment with a single, literal space.  Why is
 this replacement necessary...isn't it sufficient to simply define it as
 whitespace, as was done above?

Probably. I think it's a hold-over from thinking of parser vs lexer,
but that may not be true depending on how the rest of the grammar
uses white space. IMHO the value bound to the white space production
should be the actual text (the comment in this case).

- Ken




Re: Light ideas

2002-08-03 Thread Ken Fox

Uri Guttman wrote:
 but remember that whitespace is ignored as the /x mode is on
  all the time.

Whoops, yeah. For some reason I kept literal mode on when
reading the spaces between two literals.

The rules {foo bar} and {foobar} are the same, but some
very low level part of my brain is resisting that. I have
no trouble with {foo | bar} and {foo|bar} though. Is this
a standard issue defect or should I complain to my parents?

- Ken




Re: On writing JITs

2002-08-03 Thread Ken Fox

Jason Gloudon wrote:
 http://www.ddj.com/ftp/2001/2001_07/aa0701.txt
 
 I believe the LOOKUP method was the fastest for me on SPARC, if I recall
 correctly.

Did they really spend 64K to create a lookup table just to find
the most significant bit? Calculating log2 for a power of two is
simpler -- all you need to find is the one set bit. If you really
want to use lookup tables, just use an 8 byte table and shift
by 4 bits until a non-zero is found. (Change constants to taste.)
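
Something like this is what I mean -- an untested sketch, and I've
used a 16-entry table rather than squeezing it into 8 bytes, just to
keep the indexing obvious:

#include <assert.h>

/* log2 of a power of two: shift 4 bits at a time until the low
   nibble is non-zero, then finish with one table lookup. */
static const signed char nibble_log2[16] = {
    -1, 0, 1, -1, 2, -1, -1, -1, 3, -1, -1, -1, -1, -1, -1, -1
};

int table_log2(unsigned int foo) {
    int log = 0;

    assert(foo != 0);
    assert((foo & (foo - 1)) == 0);    /* must be a power of two */

    while ((foo & 0xF) == 0) {
        foo >>= 4;
        log += 4;
    }

    return log + nibble_log2[foo & 0xF];
}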

- Ken




Re: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)

2001-11-20 Thread Ken Fox

James Mastros wrote:
 In byteswapping the bytecode ...
 
 I propose that we make INTVAL and opcode_t the same size, and guaranteed
 to be able to hold a void*.

It sounds like you want portable byte code. Is that a goal? It seems like
we can have either mmap'able byte code or portable byte code, but not both.
Personally, I'd rather have portable byte code because memory is cheap
and self-modifying byte code opens up a lot of optimization potential. I
know others disagree.

Are we looking at two different byte code formats? Dan?

- Ken



What happened at little languages?

2001-11-19 Thread Ken Fox

Can't find any articles or notes on what happened
at the conference. What happened? I'm really curious
about the "Worse is Better" panel and the talk that
Dan and Simon gave.

- Ken



Re: What happened at little languages?

2001-11-19 Thread Ken Fox

Simon Cozens wrote:
 Us: We're working on this, that and the other.
 Them: Pshaw. We solved those problems thirty years ago.

Were Perl and Python both grouped into the same category
of re-inventing the wheel? Or is this just the academic
distaste for Perl syntax showing through? I had hoped that
the Perl 6 effort, especially Damian's influence, might
gain respect for Perl in the academic community. Doesn't
sound like it though.

What new and interesting things did the "Them" crowd
talk about?

- Ken



Re: Proper Tail Recursion

2001-11-15 Thread Ken Fox

Shlomi Fish wrote:
 Proper Tail Recursion is harder to debug, but consumes less memory and is
 faster to execute ...

It definitely consumes less memory, but performance is the same (until
the memory issue starts dominating...) I don't know what you mean by
debugging -- user code or parrot internals? Users may be confused when
caller() doesn't have the complete list of callers, but that doesn't
make debugging harder.

 For instance, we can have a ret-with-call opcode. However, isn't it
 exactly the same as a jump instruction ?

No. The current scope must be destroyed, lexicals destroyed, localized
globals restored, etc. It's basically the clean-up of a return, but the
control flow logic of a sub call.
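
In toy-VM pseudo-C it's roughly this shape (all the names are
invented for illustration -- this isn't Parrot's API):

#include <stddef.h>

/* A ret-with-call (tail call) op: do the scope clean-up a return
   would do, then transfer control like a call, reusing the caller's
   return address instead of pushing a new frame. */
typedef struct Frame {
    struct Frame *caller;      /* frame we will eventually return to */
    void         *return_pc;   /* where that caller resumes          */
    /* lexicals, localized globals, ... */
} Frame;

static void destroy_lexicals(Frame *f)          { (void)f; /* free the lexical pad */ }
static void restore_localized_globals(Frame *f) { (void)f; /* undo local()s        */ }

void *op_tailcall(Frame *current, void *target_pc) {
    destroy_lexicals(current);             /* the clean-up half of "return" */
    restore_localized_globals(current);
    /* current->caller and current->return_pc stay untouched: the callee
       returns straight to our caller, so the stack never grows. */
    return target_pc;                      /* the control-flow half of "call" */
}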

 What is the current stance on implementing proper tail recursion in perl6?

You'd have to ask perl6-language. IMHO Parrot should have tail-call
ops even if perl6 doesn't specify tail recursion.

- Ken



better parrotguts drawing

2001-11-14 Thread Ken Fox

I made a couple changes to the drawing. Stacks
and register structures are now a bit more
conceptual -- it is much easier to see how they
work.

See http://www.msen.com/~fox/parrotguts.png
for the latest. Keep in mind that the light blue
frame stuff at the bottom is experimental.

Anybody interested in a version useful for hand
simulation? I'm pretty sure that gluing a picture
to foam core board would be a pretty good tool
for simulating Parrot. Gotta love low tech push
pins and string... ;)

- Ken



Re: memory assumptions

2001-11-13 Thread Ken Fox

Michael L Maraist wrote:
 Are we allowing _any_ dynamic memory to be non-GC-managed?

Parrot must allow third party libraries to use the standard system
malloc/free. Playing linker games to hide malloc/free gets *really*
ugly.

 Can we assume that a buffer object is ONLY accessible by a single
 reverse-path-to-PMC?

IMHO whatever we want to call private shouldn't be accessible to
extensions. If singleton buffers make GC easier then do it. PMC
internals are definitely private -- vtables require it.

 I believe Dan said that we _could_ assume that PMC's were unmovable

The term PMC is used in two ways: a PMC is an opaque handle to
a Perl variable, and a PMC is a struct that holds (meta)data for
a Perl variable.

The handles are movable, but the structs aren't. This isn't a big
deal since the structs are all the same size -- it's the array of
handles concept often used to simplify GC. I don't think anybody
has decided whether PMCs nested inside another PMC (arrays/hashes)
follow the normal PMC rules. (It would be a pain to use them if
they didn't, but there are lots of optimizations that need
contiguous+movable internals.)

 Are _all_ handles (PMC's, Strings, or whatever) that serve as
 root-set elements dynamically allocated?

Nope. Part of the root set includes auto C PMC handles used by
extensions. You have to think about hardware registers too
if GC is running in another thread.

IMHO the easiest way to code this is to define an API that
extension writers use to declare every PMC handle. This code
looks ugly and encourages bugs though so a lot of people don't
like it. Many systems treat the C stack as conservative roots.
Those systems flush the hardware registers to the C stack
before running the collector. The benefit is that handles
don't need to be declared. The cost is the non-portable code
used to scan the C stack and flush the registers.
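
The declare-every-handle style looks roughly like this. It's a toy
sketch with made-up names -- the point is the shape of the code an
extension writer has to produce, not a real API:

#include <stddef.h>

typedef struct PMC PMC;              /* opaque handle type */

#define MAX_ROOTS 64
static PMC **gc_roots[MAX_ROOTS];    /* addresses of live handles */
static int   gc_root_count = 0;

static void gc_register_root(PMC **handle) { gc_roots[gc_root_count++] = handle; }
static void gc_unregister_roots(int n)     { gc_root_count -= n; }

extern PMC *make_some_pmc(void);     /* stand-ins for real allocations */
extern PMC *make_another_pmc(void);
extern void do_something(PMC *a, PMC *b);

void my_extension_func(void) {
    PMC *a = NULL;
    PMC *b = NULL;

    gc_register_root(&a);     /* tell the GC where our handles live */
    gc_register_root(&b);

    a = make_some_pmc();
    b = make_another_pmc();   /* may collect: a stays visible (and
                                 updatable) through the root list */
    do_something(a, b);

    gc_unregister_roots(2);   /* forget both before returning */
}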

Another simple solution is to suspend the collector while in
extension code (or untrusted extension code). This works for
lots of cases, but falls apart badly in callback/event-loop
extensions that never return.

One nice thing about writing Perl extensions is that the SV *
can be stuck in odd places. For example, I put them in X Toolkit
and Motif widget user data slots. Once the SV * are stuck
in the slots I have no idea what happens to them. This works fine
because the ref count will keep the SV alive until the widget
dies. A hybrid ref count GC with pinned PMCs will work exactly
the same. If we get fancier though and either use movable
PMCs or tracing collectors, then my user data slots become
roots. That will crash and burn because I don't know how to
register them. I'd have to use a separate array for holding the
PMCs and then stick the array indexes into the widgets.

It's probably not worth making my job as an extension
writer easier if it slows/complicates the rest of the GC. Most
extension writers are pretty good at this sort of thing and
a special utility library for pseudo-pinned PMCs wouldn't be
any big deal.

Hmm. I guess that argues against pinned PMCs. Maybe the best
thing is to develop a public API to PMCs that doesn't assume
pinning. That way you can use pinned PMCs as an implementation
optimization while leaving the door open for changing the
internals in the future.

Other extension writers may have a different view. ;) Asking
for comments on comp.lang.perl.modules would be a good idea.
Maybe the perl-xs mailing list too.

- Ken



Re: Lexical implementation work

2001-11-13 Thread Ken Fox

Dan Sugalski wrote:
 Nope, not stone tablet at all. More a sketch than anything else,
 since I'm not sure yet of all the things Larry's got in store.

Ok. I've made some more progress. There's a crude picture of
some of the internals at http://www.msen.com/~fox/parrotguts.png
The lexical stuff is at the bottom of the picture.

I've patched the assembler to accept syntax like:

.begin
  .reserve 1
  fetch_lex I0, 0, 0
.end

This is nice syntax for use in hand-coded assembler. I assume
that a compiler will take control of the scope definitions and
emit the enter/exit scope ops itself. (I'm throwing in the more
complicated sync_scope op too so a compiler can handle weird
jumps between scopes -- IMHO it's simpler to do this than provide
an introspective API that lets the compiler manually adjust
the scope stack.)

There's pseudo-code for the ops attached below. I'm probably
a couple evenings away from having things working.

Unless people really hate the idea, I'm going to put in "v"
operand types in ops2c.pl. They'll look like "c" types (e.g. "nc"),
but reference the current lexical scope instead of the constant
section.

QUESTIONS!

Who owns the bytecode format? How do I propose changes? I need
a scope definition section. Each scope is assigned a per-module
id. I'm not sure what info is needed yet, but certainly a size
and a code ref (opcode address) for the DESTROY sub.

The control stack isn't used for much and it would simplify my
code a lot if we combine the frame stack with the control stack.
The only down-side I think will be with hand-coded assembler.
There may be a redundant restore scope pointer on the stack
when calling a sub in the same scope. (This is impossible with
Perl code BTW.) Well, there's also a purist downside too --
mixing opcode addresses with lexical storage. This is the thing
that makes capturing a closure easier, so I see it as an
advantage.

Anybody care if I subsume the control stack?

Lastly, does anybody care if I change how assembler directives
are registered? I think there are going to be a whole bunch of
symbol table and debugging directives going in. The current if
tests are kind of clunky.

Here's the op definitions (in pseudo-code form) taken from the
copy of core.ops I'm working on:

--- parrot/core.ops Wed Nov  7 22:34:10 2001
+++ my-parrot/core.ops  Tue Nov 13 21:13:53 2001
@@ -1930,6 +1930,159 @@
 
 ###
 
+=head2 Lexical Variables
+
+These opcodes implement lexical scopes and PMC lexical variables.
+This functionality is only concerned with scoping and storage,
+not with the symbol tables used for translating variable names to
+storage. Compilers often use the same symbol table for managing
+both lexical and dynamic variables.
+
+First, the terminology:
+
+Lexical Scope: A contiguous region of text (code) bounded by
+.begin/.end scope directives. A lexical scope may contain any
+number of non-overlapping interior (child) scopes. The flow
+control of the code does not affect these scoping rules.
+
+Lexical Variable: A variable that is visible only by code in the
+same lexical scope, or an interior scope, that the variable is
+defined in.
+
+Frame: A group of lexical variables defined in a lexical scope.
+The frame does not include the lexical variables defined in
+interior scopes -- each interior scope requires its own frame.
+
+Frame ID: The position of a frame relative to either the current
+scope (non-positive frame IDs), or the outer-most scope (positive
+IDs).
+
+Variable ID: The non-negative position of a variable relative to
+the start of a frame.
+
+Scope ID: A unique positive number assigned by the assembler or
+compiler to each lexical scope. Information about the scope, such
+as how much space is required for a frame, is retrieved using the
+scope ID.
+
+=over 4
+
+=item B<fetch_lex>(out p variable, i|ic frame_id, i|ic variable_id)
+
+=item B<store_lex>(i|ic frame_id, i|ic variable_id, p variable)
+
+Note: While the PMC code is being developed, lexicals hold integers
+instead of PMCs. This changes the usage of lexicals because PMC
+lexicals will not need to be stored back to the frame.
+
+=item B<enter_scope>(ic scope_id)
+
+=item B<exit_scope>()
+
+=item B<sync_scope>(ic scope_id)
+
+=item B<.begin> [ID]
+
+B<.begin> is a pseudo-op that does two things: it begins a lexical
+scope at the current position in the code and it emits an
+B<enter_scope> op for the current scope. If ID is not provided, the
+assembler will generate one automatically.
+
+=item B<.reserve> variable_count
+
+=item B<.end>
+
+B<.end> is a pseudo-op that does two things: it ends the current
+lexical scope (returning to the enclosing lexical scope) and it
+emits an B<exit_scope> op.
+
+=back
+
+=cut
+
+AUTO_OP fetch_lex(i, i|ic, i|ic) {
+  /*
+ int frame_id = $2;
+ int variable_id = $3;
+
+ if (frame_id <= 0) {
+    frame_id += interpreter->frame_display->used;
+ }
+
+ $1 = 

Re: preferences for data structure diagrams?

2001-11-12 Thread Ken Fox

Robert Spier wrote:
 On Sun, Nov 11, 2001 at 07:38:28PM -0500, Ken Fox wrote:
 | ... Powerpoint would be a better choice since everybody
 | has to deal with that format anyway.
 
 Please, no!  Powerpoint is one of the few formats which
 cannot be easily read on a non Windows or Mac system.  Any
 raster graphics format, or PDF.  Or all of the above.  

Brent Dax wrote:
 Ken Fox:
 # Ideally we should have something with the quality of
 # PerlGuts Illustrated and the simplicity of ASCII.
 
 I'd suggest PDF or PostScript for most, and a GIF or something for
 those who can't read the main format.

Sorry I wasn't clear. Definitely PDF or HTML+PNG are the
choices for viewing.

I was wondering about *editing* them. IMHO all the data
structures in Parrot must be documented as beautifully as
PerlGuts Illustrated. Parrot is evolving quickly though
and needs documentation that is easy to update.

dia is probably a worse choice than Powerpoint -- just
count the number of people who have an editor able to make
changes to those formats. I've heard of several
non-Microsoft solutions for editing Powerpoint too. I'm
not a fan of Powerpoint, but it does seem to be a de facto
structured graphics format.

- Ken



polymorphic inline caches

2001-11-12 Thread Ken Fox

I few days ago I suggested inlining some PMC methods
would be good for performance. It turns out that this
question has been heavily studied by the OO community
for at least 10 years. A common solution is to
dynamically replace a method call with the body of the
method wrapped in an if statement.

Code like:

  object->foo()

becomes

  if (typeof object == Bar) {
 ... inlined body of Bar::foo
  }
  else {
 object->foo()
  }

There are lots of tweaks to the idea. SmallEifel
uses this concept to completely eliminate vtable
method lookups -- all of its methods are called using
inlined binary search trees. This is totally useless
for Parrot, but somebody might find it as interesting
as I did. ;)

Where this fits into Parrot's interpreter is that
languages could pre-generate ops corresponding to
dynamically generated inlined caches. All we need is a
way to replace the simple method call op with the
inlined one. Yep. You heard it -- dynamically modified
byte code.
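
Roughly like this -- a sketch only, with invented op numbers, cell
layout and PMC struct, not anything that exists in Parrot today:

    /* The generic method-call op patches its own call site the first
     * time it runs, turning it into a type-guarded (cached) op. */
    typedef struct { int type; } PMC;

    enum { OP_METH_CALL = 7, OP_METH_CALL_CACHED = 8 };

    /* assumed cell layout: [opcode, object_reg, cached_type, ...] */
    void specialize_call_site(long *pc, const PMC *obj) {
        pc[0] = OP_METH_CALL_CACHED;   /* rewrite the op in place        */
        pc[2] = obj->type;             /* guard the cached op will check */
    }

The cached op runs the inlined body when the guard in pc[2] matches
and falls back to the generic vtable call when it doesn't.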

- Ken



Re: polymorphic inline caches

2001-11-12 Thread Ken Fox

Simon Cozens wrote:
 You save one level of indirection, at a large complexity
 cost.

A lot less complexity than a JIT though. 100% portable
code too.

It's also something that can be bolted on later, so there's
no reason to reject it now. I'm just throwing it out to the
list because I know other people are experimenting with
dispatchers.

- Ken



Re: Mixing lightweight refcount and full GC

2001-11-12 Thread Ken Fox

Michael L Maraist wrote:
 No barriers for us?

Generational collectors require a write barrier because pointers
from old objects into younger generations have to be tracked. ('Course Dan
said he's starting with a simple copying collector, so we
don't need a barrier. Hmm. I guess Dan's not *reject*ing
a barrier, just rejecting a barrier *now*. ;)

Your point about XS code is a good one. I'm not sure the
write barrier is a problem though because all XS code
currently goes through the inc/dec/store API. The main
problem IMHO is that XS code caches PMC internals -- if
a GC moves the internals the XS code will crash. All
copying collectors have this problem.

Good article. I haven't gotten through the vmem article
yet and now here's another one. ;)

- Ken



Re: JIT compilation

2001-11-08 Thread Ken Fox

Dan Sugalski wrote:
 [native code regexps] There's a hugely good case for JITting.

Yes, for JITing the regexp engine. That looks like a much easier
problem to solve than JITing all of Parrot.

 If you think about it, the interpreter loop is essentially:
 
while (code) {
  code = (func_table[*code])();
}

That's *an* interpreter loop. ;)

 you pay a lot of overhead there in looping (yes, even with the computed 
 goto, that computation's loop overhead), plus all the pipeline misses from 
 the indirect branching. (And the contention for D cache with your real data)

The gcc goto-label dispatcher cuts the overhead to a single indirect
jump through a register. No table lookups. No extra branches. This is
even on the register starved x86. Yeah, there's a stall potential,
but it looks easy enough to fill with other stuff.
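
For anyone who hasn't seen the trick, here's a stripped-down sketch
(gcc only, toy ops and a made-up code stream -- not Parrot's op
table):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        static void *labels[] = { &&op_inc, &&op_print, &&op_halt };
        intptr_t code[] = { 0, 0, 1, 2 };     /* inc, inc, print, halt */
        intptr_t acc = 0;
        intptr_t *pc = code;
        int i;

        /* pre-process: replace opcode numbers with label addresses */
        for (i = 0; i < 4; i++)
            code[i] = (intptr_t)labels[code[i]];

        goto *(void *)*pc;                    /* start dispatching */

    op_inc:   acc += 1;                   pc++; goto *(void *)*pc;
    op_print: printf("%ld\n", (long)acc); pc++; goto *(void *)*pc;
    op_halt:  return 0;
    }

The entire dispatch cost is the goto *(void *)*pc at the end of each
op; no table lookup is left at run time.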

If Perl code *requires* that everything goes through vtable PMC ops,
then the cost of the vtable fetch, the method offset, the address
fetch and the function call will completely dominate the dispatcher.

 Dynamic languages have potential overheads that can't be generally factored 
 out the way you can with static languages--nature of the beast, and one of 
 the things that makes dynamic languages nice.

I know just enough to be dangerous. Lots of people have done work in
this area -- and on languages like Self and Smalltalk which are as
hard to optimize as Perl. Are we aiming high enough with our
performance goals?

I'll be happy with a clean re-design of Perl. I'd be *happier* with
an implementation that only charges me for cool features when I use
them. (Oops. That's what Bjarne said and look where that took him... ;)

- Ken



Re: Yet another switch/goto implementation

2001-11-08 Thread Ken Fox

Dan Sugalski wrote:
 Gack. Looks like a mis-placed optimization in perl 5. The list of a foreach 
 is *supposed* to flatten at loop start and be static. Apparently not. :)

Is anybody keeping a list of things that are *supposed* to be static? Is
the list changing much with Perl 6?

 Care to file the perl 5 bug report, or shall I?

I thought it was a feature. ;)

- Ken



Re: Yet another switch/goto implementation

2001-11-07 Thread Ken Fox

Dan Sugalski wrote:
 my $foo;
 $foo = 12;
 print $foo;
 $foo /= 24;
 print $foo;
 
 may well have the vtable pointer attached to the PMC for $foo change with 
 every line of code. Probably will, honestly.

Well, there's only two assignments there, so I assume that print is
going to upgrade the scalar to have a string rep. Are you thinking of
scalar representations something like:

  undef
  integer
  integer_with_string
  real_with_string (dirty)
  real_with_string

I like the principle of automatically changing representations, but
shouldn't the compiler be a little smarter? In this example every
expression has a static type. We're burning memory and CPU just to do
stuff at run-time that the compiler could have eliminated. For
example, Perl could use reps like:

  number (undef)
  number (int)
  number (int)
  number (float)
  number (float)

Building inlined number vtable ops seems like a big win if Perl
can use them.

Which brings up an interesting question. Does Perl the language
require just-in-time representation changes or is it legal for the
compiler to choose an efficient representation in advance?

- Ken



Re: Yet another switch/goto implementation

2001-11-07 Thread Ken Fox

Dan Sugalski wrote:
 At 04:29 PM 11/7/2001 -0500, James Mastros wrote:
  On Wed, Nov 07, 2001 at 10:15:07AM -0500, Ken Fox wrote:
   If Perl can keep the loop index in an integer register, then Parrot
   could use fast loop ops. IMHO there's no point in using fast loop ops
   if taking the length of @vector is expensive.
 
  Huha?  I don't see what the problem is with keeping the index in an I
  register.  The problem is with keeping the _current element_ in an I
  register.
 
 Yup. Hidden things--loop counters, iterators, stuff like that--can and
 probably will use the I registers.

If the interfaces between hidden things and PMCs are expensive
then I don't see the point in putting those things in non-PMC
registers. For example, if a loop index is compared to the
length of an array, Parrot is going to have to convert the index
to a PMC so it can be compared to the PMC array length. Might as
well just use a PMC index to begin with...

- Ken



Re: JIT compilation

2001-11-07 Thread Ken Fox

Simon Cozens wrote:
 ... Mono's work on JIT compilation ... they've got some pretty
 interesting x86 code generation stuff going on.

Mono is doing some very cool stuff, but it's kind of hard
to understand at this time. The x86 code generation macros are
easy to use, but the instruction selection is based on a
re-implementation (?) of BURG and it will take some time to
dig through the way that code works. BURG is a code-generator
generator in the style of yacc. (More trips to the library...)

I also poked around the Lightning project which is documented a
bit better. It's also more widely ported (although I'm sure Mono
is going to catch up soon -- those guys code like maniacs.)
Here's a quote from Lightning that worries me a bit:

from doc/body.texi:
| @lightning{} has been useful in practice; however, it does have
| at least four drawbacks: it has limited registers ...
|
| The low number of available registers (six) is also an important
| limitation.  However, let's take the primary application of dynamic
| code generation, that is, bytecode translators.  The underlying
| virtual machines tend to have very few general purpose registers
| (usually 0 to 2) and the translators seldom rely on sophisticated
| graph-coloring algorithms to allocate registers to temporary
| variables. ...

If all the current JIT work is focused on JVM and CIL, then Parrot's
JIT is going to break new ground.

There's a more fundamental issue though. After spending time
looking at the benefits of a JIT and thinking about the yet
another switch/goto implementation conversation, I'm starting
to think that a JIT will be almost useless for Parrot.

JITs help when the VM is focused on lots of small instructions
with well-known, static semantics. Perl's use of Parrot is going
to be focused almost completely on PMC vtable ops. A JIT has
no advantage over a threaded interpreter.

About the only place where a JIT might really win big is in
regexps.

Have other people come to the same conclusion?

Is there any interest in a less dynamic dialect of Perl that can
take advantage of a JIT? Should we feed a request to p6-language
to think about this?

- Ken



Re: vmem memory manager

2001-11-07 Thread Ken Fox

Dan Sugalski wrote:
 I doubt there'll be GC pluggbility. (Unless you consider Ripping out the
 guts of resources.c and gc.c and replacing them pluggability... :) If it
 works out that way, great, but I don't know that it's really something I'm
 shooting for.

That problem doesn't bother me too much either. If you change your mind
though and want to consider pluggability, you might look at Intel's ORP
GC interface documentation:

  http://orp.sourceforge.net/

ORP defines standard traversals, structure queries, write barriers, etc.
that allow different GC algorithms to be plugged in. The header file
defining the interface is common/include/gc_for_orp.h. It's kind of
big, 45k, so I'm not going to post it unless people really want to read
it.

Another interesting pluggable GC that I read about a long time ago is:

  http://www.usenix.org/publications/library/proceedings/c++94/attardi.html

There was source available, but I've lost the URL.

Both of these systems are C++ by the way.

- Ken



Re: Yet another switch/goto implementation

2001-11-07 Thread Ken Fox

Dan Sugalski wrote:
 No it isn't. It can get the integer length of the array and stuff it in a
 register at the beginning of the loop, or do an integer compare when it
 needs to, depending on the semantics of the loop.

Wow. Did you just come up with a place in Perl where static
behavior is guaranteed?

  my @array = ( 1 );

  foreach my $e (@array) {
 push @array, $e + 1;
  }

No, I guess not. ;)

Maybe the hyper-operators will give us more optimization room.

- Ken



Re: Yet another switch/goto implementation

2001-11-07 Thread Ken Fox

Dan Sugalski wrote:
 At 07:47 PM 11/6/2001 -0500, Ken Fox wrote:
 If the guts of a vtable implementation are ripped out and given an
 op, isn't that inlining a PMC method? There doesn't seem much point
 in replacing a dynamic vtable offset with a constant vtable offset.
 The method really needs to be inlined -- either by copying the code
 or by calling the implementation directly.
 
 We can't do that. PMCs, even statically typed ones, can change their 
 vtables as they see fit. Also for this to be at all efficient we'd need to 
 have all the common variable types vtables be available at compile time. 
 That's not likely in the general case.

Isn't this a language decision and not a Parrot decision? Shouldn't Parrot
leave optimization strategy to the compiler?

I've been thinking of ways to speed up stuff like:

  foreach my $x (@vector) {
$x *= $scale
  }

also written

  @vector ^*= $scale;

or even

  for (my $i = 0; $i < @vector; ++$i) {
@vector[$i] *= $scale
  }

If Perl can keep the loop index in an integer register, then Parrot
could use fast loop ops. IMHO there's no point in using fast loop ops
if taking the length of @vector is expensive.

It seems like Perl might have a tough time in general using non-PMC
registers. Is anybody working on the Perl code generator yet? I think
the code generator might influence op implementation.

- Ken



Re: Yet another switch/goto implementation

2001-11-06 Thread Ken Fox

Simon Cozens wrote:
 On Mon, Nov 05, 2001 at 11:35:53PM -0500, Ken Fox wrote:
  IMHO Perl is getting 
 
 Interesting construction. :)

Yeah, that should have been a disclaimer. I've heard static typing
proposed, but nothing appears finalized about anything yet. Something
like static typing might even be a bolt-on module if it doesn't make
the core.

  some static typing ability, so it should be able
  to emit bytecode that doesn't go through the PMC vtable.
 
 Yes, but that's fundamentally different from inlining vtable methods
 in the runops loop, which is what you were originally suggesting. 
 I'm now unsure what you're actually getting at.

If the guts of a vtable implementation are ripped out and given an
op, isn't that inlining a PMC method? There doesn't seem much point
in replacing a dynamic vtable offset with a constant vtable offset.
The method really needs to be inlined -- either by copying the code
or by calling the implementation directly.

For example, if you have a length op, the generic one would
dispatch through a vtable: (hand waving pseudo code)

  op_length:
 integer[pc[1]] = pmc[pc[2]]->vtable.length(pmc[pc[2]]);
 pc += 3;
 goto *pc;

the inlined one should be:

  op_length_array:
 if (pmc[pc[2]]->type == PERL_ARRAY) {
integer[pc[1]] = ((Perl_Array *)pmc[pc[2]])->length;
pc += 3;
goto *pc;
 }
 else {
goto op_length;
 }

It's probably naive to think that static typing is going to be
reliable enough to eliminate the type check. Hopefully branch
prediction will be nearly 100% accurate though.

This code would be horrible to implement and maintain by hand though.
It would be very cool if there was an option to unroll PMC methods
into inlined ops based on profiling example code. Perl might even send
enough type information to Parrot that Parrot could do some strength
reduction on its own (and replace generic PMC ops with inlined PMC
ops).

- Ken



Re: Yet another switch/goto implementation

2001-11-05 Thread Ken Fox

Dan Sugalski wrote:
 We might want to have one fast and potentially big loop (switch or computed 
 goto) with all the alternate (tracing, Safe, and debugging) loops use the 
 indirect function dispatch so we're not wedging another 250K per loop or 
 something.

Absolutely. There's no gain from doing computed goto for those anyway
because the per-op overhead makes direct threading impossible. Brent Dax
already posted an example of why this is bad.

Function calls are not slow. It's the extra jumps and table lookups
that are slow. If a mode has extra over-head it won't see any advantage
with computed goto over function calls. (At least this is what I've
found testing on Pentium III and Athlon. Most RISC systems should see
similar effects. Older CISC systems with slow function calls may be a
different story.)

BTW, 250K for the size of the inlined dispatch loop is way too big. The
goal should be to put the hot ops inline and leave the other ones out.
Ideally the dispatch loop will fit into L1 cache -- maybe 8k or so. IMHO
we'd be a lot better inlining some of the PMC methods as ops instead of
trig functions. ;)

- Ken



Re: Yet another switch/goto implementation

2001-11-05 Thread Ken Fox

Simon Cozens wrote:
 On Mon, Nov 05, 2001 at 02:08:21PM -0500, Ken Fox wrote:
  we'd be a lot better inlining some of the PMC methods as ops instead of
  trig functions. ;)
 
 Won't work. We can't predict what kind of PMCs will be coming our way, let
 alone what vtables they'll use, let alone what methods those vtables will use
 most often.

This is not appropriate for Parrot, but Perl should definitely have
some of its own PMC ops inlined.

IMHO Perl is getting some static typing ability, so it should be able
to emit bytecode that doesn't go through the PMC vtable. If somebody
adds type inferencing there would be even more opportunities for
skipping the vtable.

If Perl isn't able to infer the type of a PMC, it could still win by
inlining a type check and the common case in a custom op. If the VM is
automatically generated this could be done by looking at profiling
info and inlining the hottest PMC methods. It seems feasible to even
have multiple dispatch loops optimized for different types of apps,
say PDL number crunching vs. XML munging.

That's a lot of "ifs" and "shoulds" -- but I'd rather hear that
instead of "can't".

- Ken



Rebinding can change type? [was: Static Values and Variable Bindings]

2001-11-02 Thread Ken Fox

Garrett Goebel wrote:
 Just does compile-time typing for $foo? Not inlining the constant?

You can't assume that the value associated with the symbol is
the same each time through the code, so how can it be inlined?

 I was thinking lowercase typed variables couldn't be rebound, because
 they were compile-time optimized... Can they? Or are we back to the
 selective use of yet to be named pragmas?

Binding normally means associating a value with a symbol, so binding
to a different type depends upon whether the type information is
associated with the symbol or the value.

I can't recall what Perl 6 does. I suspect that it allows binding
to change types because binding is supposed to replace messing with
globs.

This code should work, yes?

  my int $foo;

  ... $foo is a tiny little int

  { my $bar; $foo := $bar }

  ... $foo is a big hulking scalar

Why would sticking const on $foo change anything?

- Ken



Instruction timings for Intel/AMD chips?

2001-11-01 Thread Ken Fox

After downloading a dozen PDF files I've given up.
All I need is the approximate cycle counts for
instructions and address modes.

The particular problem I've got now is deciding
which of these three is the fastest:

   movl (%edi,%eax,4),%eax
   movl (%edi,%eax),%eax
   movl (%edi),%eax

Same with:

   movl $1,(%eax,%edx,4)
   movl $1,(%eax,%edi)

According to an old 486 book I have, it claims
that complex addressing modes don't have cycle
penalties for leaving out the scale or the offset.
That seems hard to believe for the RISC-like
P3s and Athlons.

What about other processors? Is it common to
have address modes like:

   base+offset*scale

Most RISC instruction sets only provide base +
constant offset don't they?

Yeah, yeah, I know. Premature optimization is the
root of all evil. Except this isn't premature.
Getting to 50 mops was pretty easy. Getting to 100
mops is a *lot* harder!

- Ken



Re: vmem memory manager

2001-11-01 Thread Ken Fox

Michael L Maraist wrote:
[an incredible amount of detailed information that will
 take me weeks to digest...]

This looks like a malloc/free style allocator. Since the whole
GC system for Parrot is on the table, you don't have to constrain
yourself to malloc/free. IMHO free is not needed at all -- we
should be scavenging entire arenas all at once. I assume you
want to use malloc to grab an arena, but carving up the arena is
the GC's job.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-30 Thread Ken Fox

Dan Sugalski wrote:
 Hmmm. I'd like to see the two run with the same style dispatcher to get a 
 closer comparison of run speeds. When you say threaded, you're doing more 
 or less what the switch/computed-goto dispatcher does, right?

If you take a look at the op_add code I posted, you'll see that it
uses gcc's goto *address feature. What happens is the bytecode is
pre-processed and the addresses of the ops are stashed directly
into the byte code stream. There's nothing to compute -- the system
just loads the address and jumps.

I'd really like to try different dispatchers out with Parrot too. That's
why I asked if anybody did a threaded dispatcher yet. (If nobody has,
then I'll do one...)

 Parrot and Kakapo should have very similar mops when using the
 same dispatcher.
 
 In which case I'm not sure I see the win, though I'm definitely curious as 
 to how it stacks up. I think the end result will be essentially the same as 
 far as performance goes, but I'm not sure yet.

The win is in simplicity -- both the VM and the compiler. Register
VMs require the compiler to load things into registers, right? If the
register allocation is good it will be fast. Poor register allocation
will waste time with redundant loads, excessive register spills, etc.

 Ken Fox wrote:
  What happens when you goto middle depends on where you started.
 
 Personally I'm all for throwing a fatal error, but that's just me.

:)

 If you're copying things around that means you have to do a bunch of 
 pointer fixups too, otherwise you'll have code pointing to the wrong place.

Nope. The byte code holds pointers to scope definitions (think of them
like templates) and op addresses. Everything else is a [frame:offset]
pair. Entire frames can be moved around without disrupting any code. The
callee can even move around its caller's frames and nothing breaks. (This
is how taking continuations works.)

 If you're not storing pointers to things in frames, then I don't see the 
 advantage to this scheme, since you're indirect anyway, which is what we 
 are now.

Indirection is absolutely required in both our schemes. The difference
is that a register architecture copies data into a register whereas
a storage-to-storage architecture works on the data in place.

You can think of Kakapo as having a single address mode for
everything. Constants, globals, locals, temporaries, etc. are all
identified by [frame:offset]. (In-lined constants are an exception
to this rule, but that's just a performance hack.)

 Smart compilers aren't that tough to build any more--there's a lot of 
 literature for them these days, so it's not much more work to build a
 smart one than it is to build a dumb one.

Sure, agree. Smart compilers can take some time to run though. For
example, IMHO a compiler has to hoist register loads out of loops to
get decent performance. To do that it's going to have to analyze when
the local variable will change -- a pretty expensive optimization.

Maybe it won't matter because PMCs all introduce an extra layer of
indirection anyway. I have no idea whether most things will be
PMCs or if compilers will try to use strongly typed registers for
speed.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-30 Thread Ken Fox

Michael L Maraist wrote:
 The only memory storage for scalars that I currently am conceiving of is
 in name-space stashes (globals). Thus our most likely implementation of S2S
 would be to have 'add g_x, g_y, g_z' which performs two stash 
 lookups, an add, then one stash write.

Kakapo currently uses:

   .global G:$x
   .global G:$y
   .global G:$z

   add G:$x, G:$y, G:$z

The assembler translates this to:

   add [1:0], [1:1], [1:2]

The operands are in [frame:offset] notation. Global variables always
appear in frame #1. Symbol table lookup is done at compile time.
Run-time uses the same addressing mode to fetch globals as it does
to fetch locals.

There isn't any aliasing mechanism -- I'm not sure if Perl 6 bindings
are going to require VM aliasing support or not. Even without aliasing,
somebody can still do by-name lookups: (hand-waving here)

   lookup L0, $x

That doesn't alias L0 to G:$x -- it copies the value of G:$x into
L0. (Note that PMCs act like pointers anyways, so if G:$x is a PMC,
then L0 is effectively an alias. That wouldn't be true if G:$x
was a machine int.)

 How does the GC sweep across a huge VM heap (as opposed to walking
 the register / stash maps).

It simplifies things because the root set is just the frame display
and inactive regions of the display that have been pushed onto the
stack. Since there aren't any registers or register stacks, all
traversals are done on a single data structure -- a frame.

 I spoke exclusively on scalars (PMCs) because I had an interesting time 
 considering static datatypes.  It's obvious what happens when we declare:
 for my int $idx ( 0 .. 100 ) {
 }
 We get an integer register variable which maps directly to add / extraction 
 calls.

Maybe. You might get a strongly-typed PMC depending on how the body
of the code uses $idx. (Dan has said that Perl 6 functions will take
a list of PMCs as arguments. If $idx gets passed to a sub, you'll
need to create a scratch PMC out of that integer register.)

Capturing a closure inside the loop might also affect how $idx is
allocated.

 But what about:
 
 our int $g_idx;

You're asking how Parrot will do this? I don't think Dan's told us yet.

In Kakapo's case, the compiler just assigns a spot in the
scope definition. When the scope is entered, a frame is created
large enough to hold all the variables. A single frame holds
many types -- the only difference between our int $g_idx and
our $g_idx is how the space in the frame is initialized.
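
In rough C it looks something like this -- the struct and field names
are mine, not Kakapo's actual source:

    #include <stdlib.h>

    /* A scope definition is the compile-time template; a frame is the
     * run-time storage carved from it. */
    typedef union {
        long  i;      /* an "our int $g_idx" slot                */
        void *pmc;    /* an "our $g_idx" slot (points at a PMC)  */
    } Cell;

    typedef struct {
        int         n_cells;     /* variables declared in the scope  */
        const char *init_tags;   /* 'i' = int slot, 'P' = PMC slot   */
    } ScopeDef;

    typedef struct {
        Cell *cells;
    } Frame;

    Frame *enter_frame(const ScopeDef *def) {
        Frame *f = malloc(sizeof *f);
        int i;
        f->cells = calloc(def->n_cells, sizeof(Cell));
        for (i = 0; i < def->n_cells; i++)
            if (def->init_tags[i] == 'P')
                f->cells[i].pmc = NULL;  /* a real VM would allocate a PMC */
        return f;
    }

The new frame pointer then gets dropped into the display at the
scope's depth; the display is also what the GC walks as its root set.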

- Ken



Re: java vs. parrot mops

2001-10-30 Thread Ken Fox

Kevin Huber wrote:
 This is a comparison of mops running on Parrot (-O6 on an Athlon 700)
 versus Java JDK 1.4.0 beta 2 jitted and interpreted.  You can see that
 Parrot performance is very comparable to Java in interpreted mode.

I have an Athlon 700 too. With these compiler flags:

PERL-CFLAGS = -O3 -fomit-frame-pointer -pipe -s -march=pentium -ffast-math \
-fexpensive-optimizations -fno-strict-aliasing \
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

I'm seeing 24 mops which puts Parrot even closer to Java. Do those
flags improve your times?

When you run mops.pbc through pbc2c what kind of performance do
you get?

I have a feeling that Parrot will end up with higher code density
than a JVM so performance in real world cases will be better. Of
course most of the effort in Java is going into JIT, so shooting for
JVM interpreter speeds is likely to be shooting way too low.

- Ken



Anybody write a threaded dispatcher yet?

2001-10-29 Thread Ken Fox

Anybody do a gcc-specific goto *pc dispatcher
for Parrot yet? On some architectures it really
cooks.

- Ken



Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

A little while back I posted some code that
implemented a storage-to-storage architecture.
It was slow, but I tossed that off as an
implementation detail. Really. It was. :)

Well, I've tuned things up a bit. It's now
hitting 56 mops with the mops.pasm example. Parrot
turns in 24 mops on the same machine with the same
compiler options. This is not a fair comparison
because the Parrot dispatcher isn't optimal, but it
shows I'm not hand waving about the architecture
any more... ;)

Dan was right. It's a lot faster to emit explicit
scope change instructions than to include a scope
tag everywhere. Memory usage is about the same, but
the explicit instructions permit code threading
which is a *huge* win on some architectures. The
assembler does 99% of the optimizations, and it
still uses scope tagged instructions, so nothing is
really lost by ripping out the scope tags.

One thing I learned is that it's not necessary (or
desirable) to do enter/exit scope ops. I implemented
sync_scope which takes a scope id as an operand
and switches the VM into that scope, adjusting
the current lexical environment as necessary. This
works really well. The reason why sync_scope works
better than explicit enter/exit ops is because
sync_scope doesn't force any execution order on the
code. Compilers just worry about flow control and
the VM figures out how to adjust the environment
automatically. For example, Algol-style non-local
goto is very fast -- faster and cleaner than
exceptions for escaping from deep recursion.

One other thing I tested was subroutine calling.
This is an area where a storage-to-storage arch
really shines. I called a naive factorial(5) in a
loop 10 million times. Subroutine call performance
obviously dominates. Here's the code and the times:

Parrot: 237,000 fact(5)/sec

fact:   clonei
eq  I0, 1, done
set I1, I0
dec I0, 1
bsr fact
mul I0, I0, I1
done:   save    I0
popi
restore I0
ret

Kakapo: 467,000 fact(5)/sec

.begin
fact: arg   L0, 0
  cmp   L1, L0, 1
  brne  L1, else
  ret.i 1
else: sub   L2, L0, 1
  jsr   L3, fact, L2
  mul   L4, L0, L3
  ret.i L4
.end

I think the main thing that makes the storage-
to-storage architecture faster is that the callee
won't step on the caller's registers. The caller's
arguments can be fetched directly by the callee.
There's no argument stack or save/restore needed.

Here's the calling conventions for Kakapo.

On a sub call, the pc is saved in the ret_pc
register. Any frames not shared (lexically) between
the caller and callee are dumped to the stack (just
the frame pointers; the frames themselves are never
copied).

A sync_scope instruction at the start of a sub
takes care of building the callee's lexical
environment.

The caller passes arguments by reference. The arg
instruction uses the operands in the jsr instruction
as an argument list. (The jsr instruction is easy to
access because the ret_pc register points to it.)
arg works exactly like set except that it uses
the caller's lexical environment to fetch the source
value. Yes, this makes jsr a variable-size instruction,
but so what? There's no penalty on a software VM.
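
In C-ish terms it's something like this -- the instruction layout and
the names are my guesses, not Kakapo's actual encoding:

    /* Assumed jsr layout: [JSR, ret_reg, target, argc, arg0, arg1, ...]
     * where each argN is a [frame:offset] operand.  While the callee
     * runs, ret_pc points at that JSR cell. */
    typedef struct { int hi, lo; } Operand;       /* [frame:offset]      */

    typedef struct {
        long    **display;          /* callee's frame display            */
        long    **caller_display;   /* caller's display, for arg fetches */
        Operand  *ret_pc;           /* points at the caller's jsr        */
    } VM;

    /* "arg Ldst, k" -- import the k-th operand of the caller's jsr */
    void op_arg(VM *vm, Operand dst, int k) {
        Operand src = vm->ret_pc[4 + k]; /* skip JSR, ret_reg, target, argc */
        vm->display[dst.hi][dst.lo] =
            vm->caller_display[src.hi][src.lo];  /* fetch in caller's scope */
    }

Nothing gets copied onto an argument stack; the jsr just carries the
operand list along and the callee reads the values in place.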

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Dan Sugalski wrote:
 What sort of dispatch was your version using, and what sort was
 parrot using in your test?

Parrot used the standard function call dispatcher without bounds
checking.

Kakapo used a threaded dispatcher. There's a pre-processing phase
that does byte code verification because threading makes for some
outrageously unsafe code.

Parrot and Kakapo should have very similar mops when using the
same dispatcher. You all know what a Parrot add op looks like.
Here's the Kakapo add op:

op_add:
STORE(kvm_int32, pc[1]) = FETCH(kvm_int32, pc[2]) +
  FETCH(kvm_int32, pc[3]);
pc += 4;
NEXT_OP;

Ok, ok. You want to know what those macros do... ;)

op_add:
*(kvm_int32 *)(frame[pc[1].word.hi] + pc[1].word.lo) = 
   *(const kvm_int32 *)(frame[pc[2].word.hi] + pc[2].word.lo) +
   *(const kvm_int32 *)(frame[pc[3].word.hi] + pc[3].word.lo);
pc += 4;
 goto *(pc->i_addr);

I haven't counted derefs, but Parrot and Kakapo should be close.
On architectures with very slow word instructions, some code bloat
to store hi/lo offsets in native ints might be worth faster
address calculations.

 Ken Fox wrote:
  One thing I learned is that it's not necessary (or
  desirable) to do enter/exit scope ops.
 
 Don't forget that you'll need those for higher-level constructs. For
 example, this code:
 
{
   my Dog $spot is color('brindle'):breed('welsh corgi');
}
 
 will need to call Dog's constructor and attribute setting code every time
 you enter that scope.

Definitely. I didn't say Kakapo doesn't have enter/exit scope
semantics -- it does. There's no byte code enter scope op though.
What happens is more declarative. There's a sync_scope guard op
that means the VM must be in lexical scope X to properly run the
following code. If the VM is already in scope X, then it's a nop.
If the VM is in the parent of X, then it's an enter scope. If the
VM is in a child of X, then it's an exit scope.

This makes it *very* easy for a compiler to generate flow control
instructions. For example:

{
   my Dog $spot ...

   {
  my Cat $fluffy ...

middle: $spot->chases($fluffy);

   }
}

What happens when you goto middle depends on where you started.
sync_scope might have to create both Dog and Cat scopes when code
jumps to the middle. Or, code might already be in a sub-scope of
Cat, so sync_scope would just pop scopes until it gets back to Cat.

This is where sync_scope is very useful. It allows the compiler
to say "this is the environment I want here" and delegates the job
to the VM on how it happens.
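
A heavily simplified sketch of what I mean -- the names are invented
and make_frame() is an assumed helper, not Kakapo's real code:

    typedef struct Scope { struct Scope *parent; } Scope;
    typedef struct Frame Frame;

    Frame *make_frame(const Scope *s);    /* assumed helper */

    static int is_ancestor(const Scope *anc, const Scope *s) {
        for (; s; s = s->parent)
            if (s == anc) return 1;
        return 0;
    }

    /* scopes[]/frames[] form the display; *used is its current depth */
    void sync_scope(Scope **scopes, Frame **frames, int *used, Scope *target)
    {
        Scope *path[64];
        Scope *s;
        int n = 0;

        /* exit scopes until the top of the display is target or one
           of target's ancestors */
        while (*used > 0 && !is_ancestor(scopes[*used - 1], target))
            (*used)--;

        /* enter every scope on the path from there down to target */
        for (s = target;
             s && n < 64 && !(*used > 0 && s == scopes[*used - 1]);
             s = s->parent)
            path[n++] = s;

        while (n--) {
            scopes[*used] = path[n];
            frames[*used] = make_frame(path[n]);
            (*used)++;
        }
    }

When the VM is already in the target scope both loops fall straight
through and sync_scope is a nop.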

 You also potentially need to allocate a new scope object every time you
 enter a scope so you can remember it properly if any closures are created.

Closures in Kakapo are simple. All it needs to do is:

1. copy any current stack frames to the heap
2. copy the display (array of frame pointers) to the heap
3. save the pc

Step #1 can be optimized because the assembler will have a pretty
good idea which frames escape -- the run-time can scribble a note
on the scope definition if it finds one the assembler missed.
Escaping frames will just be allocated on the heap to begin with.

This means that taking a closure is almost as cheap as calling
a subroutine. Calling a closure is also almost as cheap as
calling a subroutine because we just swap in an entirely new
frame display.
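
A sketch of the capture under those assumptions (invented names, and
the escaping frames are taken to be heap-allocated already):

    #include <stdlib.h>
    #include <string.h>

    typedef struct Frame Frame;

    typedef struct {
        Frame **display;    /* copied array of frame pointers */
        int     depth;
        void   *pc;         /* where the closure body starts  */
    } Closure;

    Closure *take_closure(Frame **display, int depth, void *pc) {
        Closure *c = malloc(sizeof *c);
        c->display = malloc(depth * sizeof *c->display);
        memcpy(c->display, display, depth * sizeof *c->display);
        c->depth = depth;
        c->pc = pc;
        return c;
    }

Calling the closure later is just swapping c->display in as the
current display and jumping to c->pc.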

 How does this handle nested copies of a single scope? That's the spot a SS
 architecture needs to switch to indirect access from direct, otherwise you
 can only have a single instance of a particular scope active at any one
 time, and that won't work.

Calling a subroutine basically does this:

1. pushes previous return state on the stack
2. sets the return state registers
3. finds the deepest shared scope between caller and callee's parent
4. pushes the non-shared frames onto the stack
5. transfers control to the callee
6. sync_scope at the callee creates any frames it needs

 I'm curious as to whether the current bytecode could be translated on load
 to something a SS interpreter could handle.

Never thought of that -- I figured the advantage of an SS machine
is that brain-dead compilers can still generate fast code. Taking
a really smart compiler generating register-based code and then
translating it to an SS machine seems like a losing scenario.

I think this is why storage-to-storage architectures have lost
favor -- today's compilers are just too smart. Possibly with a
software VM the memory pressure argument favoring registers isn't
strong enough to offset the disadvantage of requiring smart
compilers.

I just put up the 0.2 version of Kakapo at
http://www.msen.com/~fox/Kakapo-0.2.tar.gz

This version has the sync_scope instruction, threaded dispatch,
immediate mode operands, and a really crappy rewrite technique
for instruction selection.

One other thing that I discovered is how sensitive the VM is
to dereferences. Adding the immediate mode versions of add and
cmp gave me 10 more mops

Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
 that is good. i wasn't disagreeing with your alternative architecture.
 i was just making sure that the priority was execution over compilation
 speed.

I use a snazzy quintuple-pass object-oriented assembler written
in equal parts spit and string (with a little RecDescent thrown in for
good measure). A real speed demon it is... ;)

The real motivation of my work is to see if a storage-to-storage
machine ends up using cache better and with less compiler effort
than a register machine. When I read about CRISP, the first thing
that came to mind was the top-of-stack-register-file could be
simulated exactly with high-speed cache in a software VM. Dropping
the stack-machine instructions in favor of Parrot's 3 operand ones
made it sound even better.

 ... then be mmap'ed in and run with hopefully impressive speed.

I'm impressed with the possibilities of the pbc-C translator. The
core modules on my system probably won't be mmap'ed byte code -- they'll
be mmap'ed executable. Reducing memory foot-print this way might take
some of the pressure off the need to share byte code. Lots of really
nice optimizations require frobbing the byte code, which definitely
hurts sharing.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
 and please don't bring in hardware comparisons again. a VM design
 cannot be compared in any way to a hardware design.

I have absolutely no idea what you are talking about. I didn't
say a single thing about hardware. My entire post was simply about
an alternative VM architecture. It's not a theory. You can go get
the code right now.

I'm just messing around on a storage-to-storage VM system I've
named Kakapo. It's a dead-end. A fat, flightless, endangered kind
of parrot. It's fun to experiment with ideas and I hope that good
ideas might make it into Parrot.

- Ken



Re: Improved storage-to-storage architecture performance

2001-10-29 Thread Ken Fox

Uri Guttman wrote:
 so my point is the the speed of the VM is a separate issue from the ease
 of code generation. an S2S VM would be easier to code generate for but
 may be slower to run. the speed difference is still an open point as dan
 has said. but since his goal is execution speed, that will determine the
 best parrot design, not ease of code generation.

Absolutely. Which is why I posted my numbers and my code.

The other thing to consider is that Perl is still a compile-on-the-fly
system. I hope Perl 6 keeps this aspect of Perl instead of moving to a
more Javaesque development environment.

This means time spent doing dataflow analysis to help with register
allocation is going to eat into perceived execution speed. Parrot might
fly, but if the Perl 6 compiler is slow, the user experience will be
poor. Hopefully the compiler will run quickly and generate decent code.
A separate super-optimizing compiler can be used for important things
like CGI.pm. ;)

- Ken



Interesting experiment with lexical scoped VM

2001-10-21 Thread Ken Fox

A while back I wondered if a higher-level VM might be
useful for implementing higher-level languages. I
proposed a lexically scoped machine without registers
or stacks. The response wasn't encouraging.

A quick tour through the library turned up a few
machine designs that sounded very similar to what I
was thinking. There's even a name for them: storage
to storage architectures. The one I found the most
literature on was CRISP. AT&T based the Hobbit chips
off the CRISP design.

CRISP is a RISC architecture that does not have
named registers. It maps the top of the stack to a
large register file and provides a few addressing modes
to access the stack. This hugely simplifies a compiler
because it doesn't have to worry about register
conventions -- especially it doesn't have to worry
about saving/restoring register sets in function calls.
The effect is sort of like custom-fit register windows.

One thing that makes these particularly well suited
for a software VM is that they never copy data.
A software register architecture needs to be *very*
smart about register usage, otherwise all it's doing
is loading two copies of everything into L2.

I became interested enough that I implemented a
throw-away VM based on the concept. It's a Perl 5
module that implements a lexically scoped, storage
to storage VM. Parts of the system ended up being more
of a framework -- there's a parser, assembler, byte-code
generator and VM all integrated together. It's very
easy to hack on.

I'm calling the module Kakapo. The Kakapo is the
heaviest parrot in the world. It's critically endangered
because the *flightless* bird uses a stand and freeze
defense to evade predators.

As you can see I have high hopes for this code. ;)

I realize I'm too slow to have any direct impact on
Parrot, but maybe some of the ideas will find a home.

Here are a couple factorial examples in Kakapo:

; code a stupid compiler might generate

.begin
fact1:arg   L0, 0
  cmp   L1, L0, 1
  brne  L1, else
  ret   1
else: sub   L2, L0, 1
  jsr   L3, fact1, L2
  mul   L4, L0, L3
  ret   L4
.end

; code a smart compiler might generate

.begin
fact2:arg   L0, 0
  sub   L1, L0, 1
  bre   L1, done
  jsr   L1, fact2, L1
  mul   L0, L0, L1
done: ret   L0
.end

The .begin and .end commands are assembly
directives that introduce a lexical scope. Every
time a scope is entered (via a subroutine call
or branch) the VM compares the current lexical
scope state to the one the instruction requires.
The VM will automatically sync up. This works
almost exactly like make-your-own register
windows. Here's an example:

  .begin
 set L0, 1
 .begin
set L0, L0@1
 .end
  .end

The inner scope has its own register set -- it
only uses L0 so the set only has one register
in it. The L0@1 syntax asks for the L0
register in the parent scope. (This isn't the
calling scope like the way register windows work
on the Sparc -- this is lexical scoping. The
arg op is used to reference the caller's scope.
It treats the argument list of the jsr op as
an export statement. The arg op imports from
that list.)

Using scopes allows nested subroutines to be
written very easily:

.begin
tens: arg   L0, 0
  set   L1, 10
  set   L2, 0

  jmp   test
next: jsr   L2, add, L2
  sub   L0, L0, 1
test: cmp   L3, L0, 0
  brg   L3, next
  ret   L2

  .begin
add:arg L0, 0
add L0, L0, L1@1
ret L0
  .end
.end

Here tens is the address of a subroutine that
takes one input argument and return 10x the value.
It calls the add subroutine in a loop. add
takes one argument and adds L1@1 -- or the L1
register in the parent lexical scope. If you
could write nested subroutines in Perl, the code
would look about like this:

 sub tens {
 my $n = $_[0];
 my $i = 10;
 my $r = 0;

 sub add {
return $_[0] + $i
 }

 while ($n > 0) {
$r = add($r);
$n--
 }

 return $r
 }

Another cool thing about scopes is that they can
be marked static. (This is how global and module
global scopes work.) When a scope is static -- even
if it's in the middle of a bunch of dynamic scopes --
the VM will re-use a single instance of the scope.
Here's a simple example:

  .begin
 set L0, 1
 .begin static
add L0, L0, L0@1
 .end
  .end

The inner scope is static so L0 is actually an
accumulator that will keep incrementing by the value
of the parent's L0. This is pretty handy for
implementing class and function static variables.

Performance is slower than Parrot on some things, but
faster on others. For example, Kakapo is about 50%
slower on tight loops like:

  ; Kakapo
  loop1:  add   L0, L0, 1
          cmp   L1, L0, 1
          brl   L1, loop1

 # Parrot
  REDO:   eq I2, I4, DONE
  

Re: String API

2001-09-11 Thread Ken Fox

Simon Cozens wrote:
 On Mon, Sep 10, 2001 at 08:38:43PM -0400, Ken Fox wrote:
  Have you guys seen Topaz?
 
 I may have heard of it, yes.

That's it? You're rejecting all of that work without
learning anything from it? Building strings on buffers
looked like a really good idea.

In general I think Parrot is missing critical abstractions.
The string/buffer relation is one. Others are the use of
stacks in the interpreter and the dispatch of opcodes in
runops(). This code is going to be very difficult to work
with because assumptions of the data structures are made
in lots of different places.

IMHO this is the main reason why Perl 5 is difficult to
understand -- and Parrot is repeating it.

  The other major suggestion I have is to avoid void *
  interfaces.
 
 I'm using void * to avoid char *. :)
 ...
 Look at the code for string_make. If you're already
 passing *in* a UTF32 buffer, why do we need any special
 processing for UTF32?

My point is that a void * API turns off all compiler
type checking -- it's unsafe as a public API. I have looked
at string_make() and it doesn't do any special processing
based on encoding. It *requires* that the raw bits are
compatible with the encoding.

The only reason I'm bringing this up is that the whole
intent behind handling platform encodings is to integrate
better with user environments. A really nice interface
would type check the arguments when making strings so
that a user sees a type mismatch error when compiling.
For example, on Win32, Parrot might have string_make_utf16
take only an LPWSTR.

I don't know much about encodings, but I do know that
type casts make perl really difficult to understand. Casts
suck. They're the goto for data structures.

 No, that's a really bad way to do it. We can special-case
 things like UTF16 to UTF32.

Ok. Are you planning on compiling in all the encodings
or can a module dynamically load one? You might want a
slow-but-standard encoding conversion algorithm anyways.

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-11 Thread Ken Fox

Bryan C. Warnock wrote:
 On Monday 10 September 2001 09:30 pm, Dan Sugalski wrote:
  gotos into scopes might not be allowed.
 
 That's how it currently is for most scopes, and it certainly saves a
 whole lot of trouble and inconsistencies.

I'm not sure I understand what you mean. Perl 5 allows us to enter
and leave scopes with goto:

  sub foo {
my($x, @y) = @_;

if ($x) {
  goto START;
}

while (@y) {
  $x = shift @y;
  START:
  print "$x\n";
}
  }

And of course the keywords last, next and redo are just restricted
gotos. Those are able to leave one or more scopes.

It isn't clear to me that explicit ops are the way to go
for scope management -- that seems to shift the burden of scope
management onto the compiler. A declarative interface between
the compiler and interpreter might be best. The compiler will
probably need that for communicating debugging info anyways.

- Ken



Re: String API

2001-09-11 Thread Ken Fox

Dan Sugalski wrote:
 If you're speaking of multiple buffers for a string or something like that,
 you're looking at too low a level. That's something that should go in the
 variables, not in the string bits. (We do *not* want all string ops slow to
 support flexibility of this sort. Only the bits that need it)

I think of buffers as more primitive than strings. They are segments of
bits. They can be copied, extended, garbage collected, etc. They don't
have character encodings. A string is just a buffer with a few more
semantics. PMCs might implement split strings or some higher abstraction
using several parrot_strings.

Perhaps you intend to make parrot_string.bufstart hold a buffer structure.
That'd be good, except that now parrot_string.buflen is not really owned
by the string anymore -- the buffer structure is probably going to need to
mess around with the string's internals. That increases the coupling between
buffers and strings. Coupling is bad. ;)
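
Something like this is the shape I have in mind -- not Parrot's
actual layout, just the direction of the dependency:

    #include <stddef.h>

    typedef struct {
        void  *start;    /* raw bits                         */
        size_t size;     /* bytes allocated                  */
    } Buffer;            /* copied, grown and GC'd as a unit */

    struct encoding;     /* per-encoding vtable, defined elsewhere */

    typedef struct {
        Buffer                *buf;   /* storage lives here           */
        size_t                 len;   /* length in characters         */
        const struct encoding *enc;   /* how to interpret buf->start  */
    } String;

The string never allocates, grows or frees memory itself; it asks its
buffer to, so the allocator and the GC only ever deal with buffers.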

 Others are the use of
 stacks in the interpreter and the dispatch of opcodes in
 runops().
 
 So what's wrong with this?

Lots of duplicated code in register.h. We really need a stack template.

The interpreter knows the internals of the stack structure and is
responsible for managing it. To change the stack implementation, we'll
have to carefully examine all the interpreter code. The semantics of
the stack aren't real clear either. For example, when changing the
stack code what invariants must be preserved? If the stack had an API
the interpreter used I think this would be much clearer.

runops() is hard-wired with assumptions on the ops. What if we want
to generate a switch-based runops()? How much code is going to make
the same assumption that runops() does now? runops() should be generated
similarly to the opcode table. There should be an API for calling ops
so we don't have to jump through the same hoops required for perl 5.

 Compiler type checking isn't going to work for us at this level--we'd need
 runtime checking for it to work properly.

I'm talking about the interface between Parrot and user code -- the
XS layer in Perl 5. Sometimes compile-time checking is possible. We
should try to take as much advantage of those situations as possible.
For example, in Win32 we *know* that certain things are 16 bit Unicode.
If we try to build a utf8 parrot_string with them, the compiler should
barf.

 String encodings will be dynamically loadable. I'm going to add
 documentation for those ops in soon. (I was hoping today or tomorrow, but
 things are kinda screwy here at the moment)

Simon just said the opposite. Does this mean that the C and Unicode
encodings will be built-in and everything else dynamically loadable?

- Ken



PMC vtable vs other overloading techniques

2001-09-10 Thread Ken Fox

Simon Cozens wrote:
 FWIW, it's just dawned on me that if we want all of these things to be
 overloadable by PMCs, they need to have vtable entries. The PMC vtable
 is going to be considerably bigger than we anticipated.

Surely you don't expect the PMC vtable to be the only mechanism
for overloading?!

Perl 6 has the option of:

* emiting Perl 6 specific opcodes
* over-riding built-in opcodes in a lexical scope
* generating method calls instead of ops
* replacing the dispatcher in a lexical scope

We've talked about all of those options. Have you eliminated some
of them?

One of the things that confuses me is how compiler writers will
decide to use one code generation technique over another. For example,
if the Perl back-end generates a method call for sin() and Python
generates a custom op, the two won't interoperate. IMHO this needs to
be nailed down soon or we'll end up with a messy architecture.

Just because Perl is TMTOWTDI doesn't mean Parrot should be.

- Ken



Re: String API

2001-09-10 Thread Ken Fox

Simon Cozens wrote:
 =head1 The Parrot String API

Have you guys seen Topaz? One of many things I think Chip
did right was to build strings from a low-level buffer
concept. This moves memory management (and possibly raw-io)
out of the string class and into the buffer class.

The other major suggestion I have is to avoid void *
interfaces. Do we really need a string_make() that takes
the encoding enum? Aren't encodings different enough that
we'd want string_make_ascii(), string_make_utf32(), ...
Maybe I've been doing C++ too long, but taking advantage
of compiler type checks as much as possible seems like an
excellent goal -- especially since the various string_*
functions are going to be visible in user code.

The use of an encoding enum seems a little weird, but once
you explain why it will probably make sense. Right now the
only thing it seems good for is the transcoding system --
everything else is slower and vtables are more cumbersome
because you have to manage a vtable[num_encodings] array.

I'd make a string's encoding a direct pointer to its
vtable struct. The transcoding algorithm seems like it
could be implemented using a string iterator on the source
with a character-by-character append to the destination.
We would need an abstract character object, but that
seems necessary anyway. (Perl doesn't have characters,
but does Perl 6 or Python?) The destination can be
pre-extended to the length of the source to avoid
fragmenting memory. How many encoding specific
optimizations are there? Is it worth having a table of
custom transcoders instead of using a generic algorithm?
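
The generic algorithm I'm picturing is roughly this -- the iterator
and append calls are hypothetical, nothing with these names exists in
Parrot:

    #include <stddef.h>

    typedef struct String String;
    typedef struct { const String *src; size_t pos; } StringIter;

    /* hypothetical string API -- assumed for the sketch */
    extern int    iter_more(const StringIter *it);
    extern long   iter_next(StringIter *it);         /* abstract character */
    extern size_t string_length(const String *s);
    extern void   string_reserve(String *dst, size_t chars);
    extern void   string_append_char(String *dst, long c);

    void transcode(String *dst, const String *src) {
        StringIter it = { src, 0 };
        string_reserve(dst, string_length(src));   /* pre-extend the target */
        while (iter_more(&it))
            string_append_char(dst, iter_next(&it));
    }

Specific pairs (UTF-16 to UTF-32, say) could still override this with
a custom entry in the transcoder table when it's worth it.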

- Ken



Re: Patch to assembler/disassembler + parrot asm inconsistancies

2001-09-10 Thread Ken Fox

Bryan C. Warnock wrote:
 On Monday 10 September 2001 06:23 pm, Dan Sugalski wrote:
  When we run out, we repeat the innermost type.
 
 Why are you doing right-to-left instead of left-to-right?

Because it would be harder to repeat the innermost type then? ;)
Most binary ops will take identical inner args. This is just
a readability optimization to avoid things like _i_i.

Avoiding type extensions completely seems best if the compiler
and disassembler are smart enough. I'm not sure we'll be able
to do that though -- more complex addressing modes may lose
type info. (But that assumes someday we get more complex
addressing modes... ;)

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-10 Thread Ken Fox

Dan Sugalski wrote:
 At 05:41 PM 9/10/2001 -0400, Ken Fox wrote:
  You're expecting the current lexical scope to be carried implicitly
  via the PC?
 
 No, it'll be in the interpreter struct.

But how does the interpreter know where a lexical scope begins
and ends in the bytecode? For example, a jump FOO might change
scopes. How is the scope discovered?

 This only lets us use multi-methods where arg0 is an object. Is
 this sufficient for implementing Perl 6?
 
 Ummm... how on earth do you plan on calling a method of something that's
 not an object? Methods sorta require them... :)

sub add(int x, int y) : multi { ... }
sub add(int x, num y) : multi { ... }
sub add(num x, num y) : multi { ... }

That makes sense to me. My syntax is probably wrong, but the intention
is clear. I'd be surprised if Damian doesn't want this.

- Ken



Re: Parrot 0.0.1 is released.

2001-09-10 Thread Ken Fox

Jeffrey Coleman Carlyle wrote:
 Am I missing something (well, clearly I am), but are test.pasm and
 test2.pasm missing from the CVS repository?

Look in ./t

- Ken



Re: PDD 6: Parrot Assembly Language

2001-09-10 Thread Ken Fox

Dan Sugalski wrote:
jump FOO
 
 doesn't change scope.
 
newscope scope_template_in_fixup_section
 
 does. And
 
exitscope
 
 leaves one. :)

Ok. That clears it up a little. The current scope is part of
the VM internal state and compilers need to generate state
change instructions if they use lexicals. (Actually, I hope
they only need state change instructions if they need the
scope's symbol table...)

Just to check my understanding, Perl 6 will compile

  { bar { foo } blah }

into

   newscope
   bar
   newscope
   foo
   exitscope
   blah
   exitscope

What happens with:

   goto FOO; { bar { FOO: foo } blah }

Is goto responsible for figuring out it has entered bar's
scope and setting the VM state so that the exitscopes are
properly balanced?

- Ken



Re: What's up with %MY?

2001-09-06 Thread Ken Fox

Dan Sugalski wrote:
 ... you have to take into account the possibility that a
 variable outside your immediate scope (because it's been defined in an
 outer level of scope) might get replaced by a variable in some intermediate
 level, things get tricky.

Other things get tricky too. How about when the compiler
optimizes away a lexical with a constant? How about when
the compiler optimizes away a sub call because it's side-
effect-free and called with a constant? What about dead
code elimination? What about when the compiler selects a
register-based call op because of the prototype and then
the sub gets replaced with an incompatible sub at run-time?
What about inlining?

We're not just talking symbol table frobbing. The whole ball
of wax is on the table.

 Personally, I'd argue that it should print 'B'.

I totally agree. What's the point in injecting *broken* lexicals?!

Yeah, I can see it now. Perl 6 has three kinds of variables:
dynamically scoped package variables, statically scoped lexical
variables and Magical Disappearing Reappearing Surprise Your
Friends Every Time variables. Oh, and by the way, lexicals
are really implemented using Magical Disappearing Reappearing
Surprise Your Friends Every Time variables, so I guess we only
have two kinds of variables...

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
 The Perl equivalent $a = $a + $a*$b requires a
 temporary PMC to store the intermediate result ($a*$b). I'm asking
 where this tmp PMC comes from.

The PMC will be stashed in a register. The PMC's value will be
stored either on the heap or in a special memory pool reserved
for temps. (I'm guessing we won't have a real generational
garbage collector, but we will be able to know when/if a temp
is destroyed at block exit.)

Dan can say for sure since he's read the code. (nudge, nudge ;).

BTW, I think we will be able to optimize this code in some
instances to use the floating point registers instead of the
PMC registers. (This is the main reason I'm totally against
run-time modification of the current scope -- essentially
we'd have to treat *everything* as a PMC and we lose all of
our optimization possibilities.)

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
 So how does that all work then? What does the parrot assembler for
 
   foo($x+1, $x+2, , $x+65)

The arg list will be on the stack. Parrot just allocates new PMCs and
pushes the PMC on the stack.

I assume it will look something like

  new_pmc pmc_register[0]
  add pmc_register[0], $x, 1
  push pmc_register[0]

  new_pmc pmc_register[0]
  add pmc_register[0], $x, 2
  push pmc_register[0]

  ...

  call foo, 65

It would be nice if we knew the lifetime of those temps so that
we could optimize the allocation. In Perl 5, closures don't capture
@_ -- I hope Perl 6 won't capture them either. So the only thing
we need to worry about is code taking a reference to @_. That
should be something the compiler can catch.

Hmm. It didn't occur to me that raw values might go on the call
stack. Is the call stack going to store PMCs only? That would
simplify things a lot.

- Ken



Re: An overview of the Parrot interpreter

2001-09-06 Thread Ken Fox

Simon Cozens wrote:
 I want to get on with writing all the other documents like this one, but
 I don't want the questions raised in this thread to go undocumented and
 unanswered. I would *love* it if someone could volunteer to send me a patch
 to the original document tightening it up in the light of this thread.

Sure. I can do that while *waiting patiently* for Parrot to be
released. ;)

- Ken



Re: An overview of the Parrot interpreter

2001-09-06 Thread Ken Fox

Paolo Molaro wrote:
 If anyone has any
 evidence that coding a stack-based virtual machine or a register one
 provides for better instructions scheduling in the dispatch code,
 please step forward.

I think we're going to have some evidence in a few weeks. I'm not
sure which side the evidence is going to support though... ;)

Eric Raymond posted on python-dev that he's doubtful, but has
a wait-and-see approach. Seems sensible. I think even Dan would
say he's just hopeful, not committed.

 I believe that a stack-based machine will have roughly the same
 performance when interpreted as a register-based machine, but
 it easily allows to take a step further and JIT compile the bytecode
 to machine code.

I don't really see much difference. You don't have to map the VM
registers to hardware registers to pick up a lot of speed. If the
JIT wanted to, it could have an optional peep-hole optimization
pass. That actually sounds easier to do with a register-based VM
than a stack-based one. (Of course the run-time environment is a
big wild-card here. We need a fast interface between the dispatcher
and the run-time. That's going to want registers too.)

 With the difference that the registers are malloc()ed while the eval
 stack in a stack machine is in the actual cpu stack.

Is there really a difference in memory access between the heap
and the stack? I've always thought a page is a page and it doesn't
matter where in memory the page is. I'm not a hardware guy though...

Allocating register pools on the heap (making sure that malloc()
is used sensibly), might be faster if you want your VM to handle
continuations and co-routines. Check out Stackless Python for
a good example. I'm not sure if Appel was the first, but he has
written quite a bit about the advantages of allocating activation
records on the heap. (He also points out that a garbage collector
can make heap allocation as fast as stack allocation.)

- Ken



Re: pads and lexicals

2001-09-06 Thread Ken Fox

Dan Sugalski wrote:
  Dan Sugalski [EMAIL PROTECTED] wrote:
   Where do they come from? Leave a plate of milk and cookies on your back
   porch and the Temp PMC Gnomes will bring them. :)

 Bad Dan! No cookie for me.

You aren't fooling anybody anymore... You might just as well stop the
charade and write Dan The Temp PMC Gnome Sugalski in your sig. ;)

At least we know where temps *really* come from now...

- Ken



Re: What's up with %MY?

2001-09-06 Thread Ken Fox

Dan Sugalski wrote:
 At 02:05 PM 9/6/2001 -0400, Ken Fox wrote:
 You wrote on perl6-internals:
 
 get_lex P1, $x  # Find $x
 get_type I0, P1 # Get $x's type
 
 [ loop using P1 and I0 ]
 
 That code isn't safe! If %MY is changed at run-time, the
 type and location of $x will change. You really need to put
 the code for finding $x *inside the loop*.
 
 Only if $x is active. (i.e. tied) In that case we need to do some other
 things as well. I was assuming the passive case, for which the code was
 valid since there wasn't any way for it to be changed.

Could you compile the following for us with the assumption that
g() does not change its caller?

  sub f {
my $sum = 0;
for (0..9) {
  $sum += g()
}
$sum
  }

Now what if g() is:

  sub g {
my $parent = caller().{MY};
my $zero = 0;
$parent{'$sum'} = \$zero;
1
  }

What if g() *appears* to be safe when perl compiles the loop, but
later on somebody replaces its definition with the scope-changing
one? Does perl go back and re-compile the loop?

The compiler could watch for uses of %MY, but I bet that most
modules will eventually use %MY to export symbols. Can the
compiler tell the difference between run-time and compile-time
usage of %MY?

 Now, granted, it might be such that a single uses string eval or uses
 MY in the program shuts down optimization the same way that $ kills RE
 performance in perl 5, but we are in the position of tracking that.

To quote Timon: And you're okay with that?

- Ken



Re: What's up with %MY?

2001-09-06 Thread Ken Fox

Dan Sugalski wrote:
 I think you're also overestimating the freakout factor.

Probably. I'm not really worried about surprising programmers
when they debug their code. Most of the time they've requested
the surprise and will at least have a tiny clue about what
happened.

I'm worried a little about building features with global effects.
Part of Perl 6 is elimination of action-at-a-distance, but now
we're building the swiss-army-knife-of-action-at-a-distance.

What worries me the most is that allowing %MY to change at run-time
slows down code that doesn't do it. Maybe we can figure out how
to reduce the impact, but that's time IMHO better spent making
existing code run faster.

You wrote on perl6-internals:

   get_lex P1, $x  # Find $x
   get_type I0, P1 # Get $x's type

   [ loop using P1 and I0 ]

That code isn't safe! If %MY is changed at run-time, the
type and location of $x will change. You really need to put
the code for finding $x *inside the loop*.

Maybe we can detect a few cases when it's safe to move
get_lex out of a loop, but if the loop calls any subs or
non-core ops we're stuck.

- Ken



Re: what lexicals do?

2001-09-06 Thread Ken Fox

Dave Mitchell wrote:
 Can anyone think of anything else?

You omitted the most important property of lexical variables:

  [From perlsub.pod]

  Unlike dynamic variables created by the Clocal operator, lexical
  variables declared with Cmy are totally hidden from the outside
  world, including any called subroutines.  This is true if it's the
  same subroutine called from itself or elsewhere--every call gets
  its own copy.

- Ken



Re: What's up with %MY?

2001-09-05 Thread Ken Fox

[EMAIL PROTECTED] wrote:
 Clearly caller() isn't what we want here, but I'm not
 quite sure what would be the correct incantation.

I've always assumed that a BEGIN block's caller() will
be the compiler. This makes it easy for the compiler to
lie about %MY:: and use the lexical scope being compiled
instead of the compiler's lexical scope.

- Ken



Re: What's up with %MY?

2001-09-04 Thread Ken Fox

Damian Conway wrote:
 It would seem *very* odd to allow every symbol table *except*
 %MY:: to be accessed at run-time.

Well, yeah, that's true. How about we make it really
simple and don't allow any modifications at run-time to
any symbol table?

Somehow I get the feeling that *very* odd can't be
fixed by making the system more normal. ;)

 Is stuff like:

   %MY::{'$lexical_var'} = \$other_var;

 supposed to be a compile-time or run-time feature?
 
 Run-time.

A definition of run-time would help too since we have
things like BEGIN blocks. I consider it run-time if the
compiler has already built a symbol table and finished
compiling code for a given scope. Is that an acceptable
definition of run-time? This allows BEGIN blocks to
modify their caller's symbol tables even if we prohibit
changes at run-time.

Can we have an example of why you want run-time
symbol table manipulation? Aliases are interesting,
but symbol table aliases don't seem very friendly.
It would be simple to write:

  %MY::{'@point'} = [ $x, $y ];

But that probably won't work and using [ \$x, \$y ]
doesn't make sense either. What seems necessary is:

  %MY::{'$x'} = \$point[0];
  %MY::{'$y'} = \$point[1];

If the alias gets more complicated, I'm not sure the
symbol table approach works well at all.

 Modifying the caller's environment:
 
   $lexscope = caller().{MY};
   $lexscope{'die'} = die_hard;

This only modifies the caller's scope? It doesn't modify
all instances of the caller's scope, right? For example,
if I have a counter generator, and one of the generated
closures somehow has its symbol table modified, only that
*one* closure is affected even though all the closures
were cloned from the same symbol table.
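
By counter generator I mean the usual Perl 5 idiom (sketch only):

  sub make_counter {
      my $count = 0;
      return sub { ++$count };    # each call to make_counter clones a
  }                               # fresh pad holding its own $count

  my $a = make_counter();
  my $b = make_counter();
  $a->(); $a->();
  print $a->(), ' ', $b->(), "\n";    # prints "3 1" -- separate copies

If somebody frobs the symbol table of $a's closure, I'd expect $b's
closure to be left alone.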

What if the symbol doesn't exist in the caller's scope
and the caller is not in the process of being compiled? Can
the new symbol be ignored since there obviously isn't any
code in the caller's scope referring to a lexical with that
name?

 Between source filters and Inline I can do pretty much whatever I like
 to your lexicals without your knowledge. ;-)

Those seem more obvious. There will be a use declaration that
I wrote, and I already know that use can have side effects on
my current namespace. IMHO this could become a significant problem
as we continue to make Perl more expressive. Macros, filters,
self-modifying code, mini-languages ... they all make expressing
a solution easier, and auditing code harder. Do we favor
expression too much over verification? I'm not qualified to
answer because I know I'm biased towards expression. (I'm raising
the %MY issues mostly because of the optimization potential at stake.)

 I would envisage that mucking about with symbol tables would be no more
 common in Perl 6 than it is in Perl 5. But I certainly wouldn't want to
 restrict the ability to do so.

We also want Perl 6 to be fast and cleanly implemented.

This particular issue is causing trouble because it has a big
impact on local variable analysis -- which then causes problems
with optimization. I'd hate to see lots of pragmas for turning
features on/off because it seems like we'll end up with a more
fragmented language that way.

 How am I expected to produce fresh wonders if you won't let me warp the
 (new) laws of the Perl universe to my needs?

You constantly amaze me and everyone else. That's never
been a problem.

One of the things that I haven't been seeing is the exchange
of ideas between the implementation side and the language side.
I've been away for a while, so maybe it's just me.

It vaguely worries me though that we'll be so far down the
language side when implementation troubles arise that it will
be hard to change the language. Are we going to end up with
hacks in the language because certain Very Cool And Important
Features turned out too hard to implement?

- Ken



Re: What's up with %MY?

2001-09-04 Thread Ken Fox

Damian wrote:
 Dan wept:
 I knew there was something bugging me about this.
 
 Allowing lexically scoped subs to spring into existence (and
 variables, for that matter) will probably slow down sub and
 variable access, since we can't safely resolve at compile time what
 variable or sub is being accessed. 
 
 Understood. And that's why you get the big bucks. ;-)

Efficiency is a real issue! I've got 30,000 lines of *.pm in my
latest application -- another 40,000 come from CPAN. The count of
actual code lines is a good deal lower, but it's still a pretty big
chunk of Perl.

The thought of my app suddenly running slower (possibly *much*
slower after seeing the semantics of Perl 6 lexicals) doesn't
make me real happy. IMHO it would fork the language, even if
the fork was only done with pragmas.

- Ken



Re: What's up with %MY?

2001-09-04 Thread Ken Fox

Damian wrote:
 In other words, everything that Exporter does, only with lexical
 referents not package referents. This in turn gives us the ability to
 easily write proper lexically-scoped modules.

Great! Then we don't need run-time lexical symbol table
frobbing. A BEGIN block can muck with its caller's symbol
table at compile time.

 I can image a Lexically::Verbose module, that modifies all variables and/or
 subroutines in a scope to report their own activity:
 
 while (whatever) {
 use Lexically::Verbose 'vars';

Another compile time example.

 In fact, you just invented a new pattern, in
 which a set of subroutines called within a scope can communicate invisibly
 but safely through that scope's lexical symbol table.

Hey, don't make me an accomplice in this... ;)

 Accessing lexicals will be no slower than accessing package variables
 is today.

Actually I'm not sure about that. Package variables only work well
because they have global definitions. Lexicals don't. IMHO in order
to have the speed of package variables, we'll have to make lexical
scope changes trigger a re-compile (at least a re-link) of the
affected code. Besides, I was hoping for Perl 6 lexicals to be a
great deal *faster* than package variables...

How much stuff currently depends on dynamic lexicals? (Ugh. Why
are we even *talking* about something that horrible?) If there were
a pragma to eliminate them, would it break much?

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
 Ken Fox:
 # Lexicals are fundamentally different from Perl's package (dynamically
 # scoped) variables.
 
 *How* are they fundamentally different?

Perl's local variables are dynamically scoped. This means that
they are *globally visible* -- you never know where the actual
variable you're using came from. If you set a local variable,
all the subroutines you call see *your* definition.

Perl's my variables are lexically scoped. This means that they
are *not* globally visible. Lexicals can only be seen in the scope
where they are introduced, and they are not visible to subroutines you
call. This is safer and a bit easier to use because you can tell
what code does just by reading it.
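
A quick Perl 5 sketch of the difference (the names are made up):

  our $x = 'global';

  sub show { print "callee sees: $x\n" }

  sub with_local {
      local $x = 'localized';    # dynamic scope: the callee sees our value
      show();                    # prints "callee sees: localized"
  }

  sub with_my {
      my $x = 'lexical';         # lexical scope: the callee never sees it
      show();                    # prints "callee sees: global"
  }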

 But in this case the pad is actually a full symbol table.  The
 concept is the same, the data structure is different.

The concept isn't the same. local variables are globals. You
only have *one* of them no matter how many times you call a sub.
For example, threading or recursion might cause the same sub to
have several copies running at the same time. With global
variables (aka local variables) all the copies share the same
global variables. With my variables each copy of the sub gets
a brand new set of variables. This is known as an activation
record. THIS IS COMPLETELY UNRELATED TO A SYMBOL TABLE!

In both cases you have one symbol table. However, in
the case of my variables you have *many* values for each
variable. The particular value being used depends upon which
activation record is being used.
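
Recursion shows it most clearly (Perl 5 sketch):

  sub count_down {
      my ($n) = @_;       # a fresh $n in every activation record
      return if $n == 0;
      count_down($n - 1);
      print "$n\n";       # still sees its own $n, not the inner call's
  }

  count_down(3);          # prints 1, 2, 3 -- one $n per call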

 There *is* run-time lookup in some contexts, such as a string eval.

String eval isn't a run-time lookup. The code is *compiled* and
then run. Also notice that string eval can't change the current
lexical scope. It can create a new inner scope, but it can't
introduce variables into the outer scope.
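
In Perl 5 terms (sketch):

  my $x = 'outer';
  eval 'my $x = "inner"; print "inside: $x\n"';   # new inner scope only
  print "after: $x\n";    # still 'outer' -- the eval can't inject a
                          # new lexical into the enclosing scope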

Basically anything that breaks scoping barriers goes against
the grain of lexical scoping. If an inner scope can modify its
parent, you've just destroyed one of the major advantages of
lexical scoping.

We tolerate symbol table globs with local variables because
we've already admitted there's no hope of understanding what
something does just by reading the code. We haven't corrupted
my yet -- and I don't want to start!

 In the end, all I'm doing is suggesting an alternate implementation
 which should reduce our workload and make many concepts which currently
 don't work with lexicals work correctly.

Your proposal to use temp with flags to implement my doesn't
even work, let alone achieve either of your goals.

- Ken



Re: Should MY:: be a real symbol table?

2001-09-03 Thread Ken Fox

Brent Dax wrote:
 What I'm suggesting is that, instead of the padlist's AV containing
 arrays, it should contain stashes, otherwise indistinguishable from
 the ones used for global variables.

Lexicals are fundamentally different from Perl's package (dynamically
scoped) variables. Even if you could somehow hack package variables
to implement lexicals, it would waste space duplicating a symbol table
for each instance of a lexical scope.

 The simple way to emulate this is to make sure that no subroutine
 can see another's MY:: stash.

Right. Sounds a lot like a pad to me -- each instance of a scope (sub)
gets its own copy of the variables. (By instance I mean activation
record, not the symbol table definition.)

 There is a possible caveat with inner blocks--how does an outer block
 get, er, blocked from accessing an inner block's my() variables?
 However, I think this isn't really that big a problem, and can easily be
 solved with properties:

You don't understand lexically scoped variables. There isn't
any run-time name lookup on a variable -- the compiler resolves all
access to a specific memory location (or offset). All your fancy
symbol table flag twiddling is slow, unreliable and unnecessary.
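
Roughly what I mean by resolved by the compiler (pseudo-code only --
the pad slots are illustrative, not a real Perl 5 data layout):

  my $x = 1;
  my $y = $x + 2;

  # conceptually compiles down to something like:
  #   pad[0] = 1             # $x was assigned slot 0 at compile time
  #   pad[1] = pad[0] + 2    # $y was assigned slot 1 at compile time
  # no name is looked up when this runs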

- Ken



Re: An overview of the Parrot interpreter

2001-09-03 Thread Ken Fox

Dan Sugalski wrote:
 For those of you worrying that parrot will be *just* low-level ops,
 don't. There will be medium and high level ops in the set as well.

I was going to cite http://citeseer.nj.nec.com/romer96structure.html,
but you guys have already read that, eh? ;)

One thing I was expecting was that bytecode subroutines would be given opcodes.
Will there be a difference? Can we replace built-in opcodes with Perl
subs?

- Ken


