Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Jeremy Howard

Leon Brocard wrote:
 Bradley M. Kuhn sent the following bits through the ether:

  It should be noted that in Larry's speech on Friday, he said that he wanted
  to write the Lexer and Parser for Perl in some subset of Perl.  :)

 Is there a writeup somewhere for those who couldn't attend?

 Hmmm, I wonder what kind of subset would be necessary - surely the
 most useful constructs are also the most complicated...

We could learn quite a bit by looking through the code from
Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing
(including parsing Perl) with Perl, so this would be a good place to start.

In terms of bootstrapping, however, we either need to:
 - Write the Perl subset in C (or some other portable language), or
 - Use Perl 5 as the 'Perl subset', and distribute that with Perl 6.

The 2nd of these options seems unlikely to be practical... Maybe however the
bootstrapper could be a subset of Perl 5 stolen fairly directly from the
existing code. Maybe this would then also become the Perl for small/embedded
devices.





Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 03:56:20AM -0400, Adam Turoff wrote:
  We could learn quite a bit by looking through the code from
  Parse::RecDescent, switch.pm, and friends. Damian's done a lot of parsing
  (including parsing Perl) with Perl, so this would be a good place to start.

It's time to drag out my quote of the week:

Recursive-descent, or predictive, parsing ONLY works on grammars
where the first terminal symbol of each subexpression provides
enough information to choose which production to use.

(Appel, emphasis mine.)

 Gisle and I were talking about this tonight, and it *might* be possible
 to write the Perl tokenizer in a Perl[56] regex, which is more easily 
 parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
 would be much more legible to this community (which is more of a strike
 against toke.c instead of a benefit of some toke.re).  That would certainly
 qualify as implementing the Perl grammar in Perl, and might even be
 achievable.   (*gasp!*)

This would have to take account of the fact that Perl's tokeniser is
aware of what's going on in the rest of perl. Consider

print foo;

What should the tokeniser return for "foo"? Is it a bareword? Is it a
subroutine call? Is it a class? Is it - heaven forbid - a filehandle? 
Well, it could be any of these things. You have to choose.
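
A minimal sketch of the point above, assuming a tokeniser that can see what has been declared so far. The `%known` tables and `classify_bareword` are hypothetical names made up for illustration; this is not how toke.c is organised.

```perl
use strict;
use warnings;

# Hypothetical symbol tables; in a real perl this state lives in the
# interpreter, not in a hash like this.
my %known = (
    filehandle => { STDERR => 1, LOG => 1 },
    sub        => { foo => 1 },
    package    => { 'My::Class' => 1 },
);

# Decide what kind of token a bare identifier is, given what has been
# declared so far. The order of the checks is an arbitrary choice here.
sub classify_bareword {
    my ($word) = @_;
    for my $kind (qw(filehandle sub package)) {
        return $kind if $known{$kind}{$word};
    }
    return 'bareword';
}

print "$_ => ", classify_bareword($_), "\n" for qw(foo LOG My::Class bar);
```

The same spelling yields a different token type depending on earlier declarations, which is why the tokeniser cannot be written without some view of the rest of the compile.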

So, while I don't doubt that, with the state of Perl's regexes these
days, it's possible to create something with enough sentience to
tokenize Perl, I've really got to wonder whether it's sane.

-- 
BEWARE!  People acting under the influence of human nature.



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Nicholas Clark

On Tue, Oct 17, 2000 at 11:00:35AM +0100, Simon Cozens wrote:
 On Tue, Oct 17, 2000 at 10:37:24AM +0100, Simon Cozens wrote:
  What should the tokeniser return for "foo"? 
 
 Uh, tokenizer != lexer. Insert coffee. Yes, writing a tokeniser in a regexp
 should be very doable.
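
As a rough illustration of a regex-driven tokeniser, here is a sketch for a tiny made-up subset. It assumes nothing beyond Perl 5's `\G` assertion and the `/gc` modifiers; the token classes and the `tokenize` function are hypothetical, and real Perl needs far more than this.

```perl
use strict;
use warnings;

# Token classes for a tiny, made-up subset; the first rule that matches wins.
my @rules = (
    [ ws     => qr/\s+/               ],
    [ number => qr/\d+(?:\.\d+)?/     ],
    [ var    => qr/\$[A-Za-z_]\w*/    ],
    [ ident  => qr/[A-Za-z_]\w*/      ],
    [ string => qr/"(?:[^"\\]|\\.)*"/ ],
    [ op     => qr/[-+*\/=;(),]/      ],
);

sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    TOKEN: while (pos($src) < length $src) {
        for my $rule (@rules) {
            my ($type, $re) = @$rule;
            # \G anchors at the current position; /c keeps pos() on failure.
            if ($src =~ /\G($re)/gc) {
                push @tokens, [$type, $1] unless $type eq 'ws';
                next TOKEN;
            }
        }
        die sprintf "Cannot tokenize at offset %d\n", pos($src);
    }
    return @tokens;
}

print join(' ', map { "$_->[0]:$_->[1]" } tokenize('$x = foo(42, "hi");')), "\n";
```

Note that `foo` comes back as a plain `ident` here; deciding what it actually is would be left to a later stage.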

To allow the lexer to influence the tokeniser, what characters are we
going to use in (? ) for smoke and mirrors extensions? (?s) and (?m) are
already taken.

[Seriously, I was under the impression that the perl tokenizer was
influenced by the state of the lexer]

Nicholas Clark



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 11:22:02AM +0100, Nicholas Clark wrote:
 [Seriously, I was under the impression that the perl tokenizer was
 influenced by the state of the lexer]

Currently, the tokeniser and the lexer are a combined entity. It doesn't have
to be this way, though. At least, I don't think it does, until you're allowed
to define your own special variables which I sincerely hope won't happen.

(Quick, how would you parse ""?)

To be perfectly honest, my preferred solution would be to have the tokenizer,
lexer and parser as a single, hand-crafted LR(k) monstrosity.

-- 
"So i get the chance to reread my postings to asr at times, with a
corresponding conservation of the almighty leviam00se, Kai Henningsen."
-- Megahal (trained on asr), 1998-11-06



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread John Porter

Simon Cozens wrote:
 
 Currently, the tokeniser and the lexer are a combined entity. 

Yes, in the vast majority of languages; so people get used to thinking
that it has to be this way.


 my preferred solution would be to have the tokenizer,
 lexer and parser as a single, hand-crafted LR(k) monstrosity.

This is a case of me agreeing with Simon 1000%.
I was going to just let it go by, but I thought it might 
be nice to add my <aol/> for a change.

-- 
John Porter




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Ken Fox

Simon Cozens wrote:
 It's time to drag out my quote of the week:
 
 Recursive-descent, or predictive, parsing ONLY works on grammars
 where the first terminal symbol of each subexpression provides
 enough information to choose which production to use.

Recursive-descent parsers are nice because it is *much* easier to
generate good error messages with them. It is also much easier to write
segmented grammars, which is nice for something like Perl because there are
so many quiet shifts into several different sub-languages.

The only real problem is prediction and that is *easily* solved with
look-ahead and/or back-tracking. IMHO back-tracking is preferable,
especially if there are cut-points where the search tree can be pruned.
I think it's very powerful to think of a grammar as a declarative
program which searches for the best-fit between itself and the input
stream.
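
A minimal sketch of back-tracking with a cut-point, assuming a made-up two-alternative grammar for `print`. The token list is pre-split, and `accept_tok`, `list`, `stmt` and `parse` are hypothetical helpers; nothing here reflects an actual Perl 6 design.

```perl
use strict;
use warnings;

# Grammar (invented for illustration):
#   stmt := 'print' FH list ';'     # print to a filehandle
#         | 'print' list ';'        # print to the default handle
#   list := WORD (',' WORD)*
#   FH   := an all-upper-case WORD

my (@tok, $i);

sub accept_tok {                    # consume one token if it matches
    my ($re) = @_;
    return undef unless $i < @tok && $tok[$i] =~ /\A(?:$re)\z/;
    return $tok[$i++];
}

sub list {
    my @items;
    defined(my $w = accept_tok(qr/\w+/)) or return;
    push @items, $w;
    while (defined accept_tok(qr/,/)) {
        # Cut-point: after a comma no other alternative can succeed,
        # so fail hard instead of back-tracking.
        defined($w = accept_tok(qr/\w+/)) or die "expected item after ','\n";
        push @items, $w;
    }
    return \@items;
}

sub stmt {
    my $start = $i;
    if (defined accept_tok(qr/print/)) {          # alternative 1
        my $fh = accept_tok(qr/[A-Z]+/);
        my $l  = defined $fh ? list() : undef;
        return { fh => $fh, args => $l } if $l && defined accept_tok(qr/;/);
    }
    $i = $start;                                  # back-track
    if (defined accept_tok(qr/print/)) {          # alternative 2
        my $l = list();
        return { fh => 'STDOUT', args => $l } if $l && defined accept_tok(qr/;/);
    }
    $i = $start;
    return;
}

sub parse { @tok = @_; $i = 0; return stmt() }

my $r = parse(split ' ', 'print FOO ;');
print "$r->{fh}: @{ $r->{args} }\n";
```

For `print FOO ;` the first alternative takes `FOO` as a filehandle, finds no list, and the parser rewinds and reads `FOO` as the sole list item instead.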

 So, while I don't doubt that, with the state of Perl's regexes these
 days, it's possible to create something with enough sentience to
 tokenize Perl, I've really got to wonder whether it's sane.

I think the goal would be to increase the power of the regexes to
handle Perl grammar. This could be the coolest language tool since
yacc. (I'm intentionally not comparing Perl's regex to lex. We shouldn't
make the same stupid mistake as lex/yacc by splitting a language into
a token specification and a grammar with incompatible syntax.)

- Ken



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Dan Sugalski

At 10:22 AM 10/17/00 -0400, John Porter wrote:
Simon Cozens wrote:
 
  Currently, the tokeniser and the lexer are a combined entity.

Yes, in the vast majority of languages; so people get used to thinking
that it has to be this way.

I'd just as soon we thought a bit differently. I'm not sure we want to 
split the lexer and tokenizer out, but I don't want to rule out the 
possibility. It's looking like a goodly portion of the 
lexing/tokenizing/parsing bit of perl 6 will be written in perl, so I'm not 
sure how things are going to split out just yet.

I don't suppose anyone's got code to translate a perl regex into C?

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Nicholas Clark

On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote:
 The other down-side is that we'd be doing a whole lot of custom work designed
 just for parsing Perl instead of creating something more general and powerful
 that can be used for other problems as well. For example, I'd imagine the PDL
 folks would much rather extend a recursive-descent parser with back-tracking
 than an LR(k) monstrosity.

Not that I know anything about how to write a parser. But anecdotal evidence
from all the syntax highlighting editors etc. is that perl is the hardest
thing to parse. Hence if perl6 contains a generic parser powerful enough to
parse perl (if this can be done), to me this would suggest that it would
allow a lot of other people to use it to rapidly implement parsers for just
about anything else.

Nicholas Clark



Re: A tentative list of vtable functions

2000-10-17 Thread Ken Fox

"ye, wei" wrote:
 One C++ problem I just found out is memory management.  It seems
 that it's impossible to 'new' an object from a specified memory block.
 So it's impossible to put free'd objects in memory pool and re-allocate
 them next time.

Stuff like that isn't the problem with using C++. (In fact a class can
provide its own allocator. You can even provide a global allocator if
you like to live dangerously.)

The trouble is that the object model for C++ isn't exactly the object
model (I mean at the internals level like "SV") for Perl. That means we
still have to do a lot of work to get Perl to work right (and even more
work to defeat some C++ compiler assumptions to get Perl to work fast).

Another big problem with C++ is lack of internal documentation and
object code standards. Some of Perl's dynamic module loading capability
would be complex using C++ -- and possibly impossible with code built
by different compilers.

I think the general idea is that the advantages of C++ don't move us
far enough out of our comfortable local minimum to make it worthwhile.

- Ken



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Simon Cozens

On Tue, Oct 17, 2000 at 01:18:39PM -0400, Ken Fox wrote:
 Those are hard to understand because so much extra work has to be done to
 compensate for lack of top-down state when doing a bottom-up match.

I haven't found this to be true.

 Since Perl is much more difficult than C++ to parse...

Perl is essentially a natural language processing problem, and so I'd think
twice before hitting it with a pure computer science solution.

Come on, guys, take a look at toke.c; there's a *probabilistic* part-of-speech
tagger in there. This is NLP, and we do things differently.

Have you read this?
ftp://ftp.cs.titech.ac.jp/pub/TR/93/TR93-0003.ps.gz

-- 
This process can check if this value is zero, and if it is, it does
something child-like.
-- Forbes Burkowski, CS 454, University of Washington



Re: A tentative list of vtable functions

2000-10-17 Thread Dan Sugalski

At 01:34 PM 10/17/00 -0400, Ken Fox wrote:
I think the general idea is that the advantages of C++ don't move us
far enough out of our comfortable local minimum to make it worthwhile.

Yup, that pretty much covers it. C++ also has an awful lot of stuff in it 
that, while interesting, is too likely to be misused, or be really, really 
unfamiliar to too many people. (The perl guts are going to present enough 
problems for folks without adding all of C++'s potential idiosyncrasies to 
the mix)

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Larry's ALS talk

2000-10-17 Thread Jeff Okamoto

For just a split second, I thought Larry was talking about amyotrophic
lateral sclerosis, commonly known as "Lou Gehrig's disease" in America
and "motor neuron disease" in Great Britain.  Dr. Stephen Hawking of
Cambridge is certainly among the most famous sufferers of this disease.

Okay Larry, for the next Perl Conference use the theme of diseases,
bacterial, viral, fungal, and so on. :-)

Jeff



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Bradley M. Kuhn

Adam Turoff wrote:

 to write the Perl tokenizer in a Perl[56] regex, which is more easily 
 parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
 would be much more legible to this community (which is more of a strike
 against toke.c instead of a benefit of some toke.re).

Larry brought this up in his talk.  Of course, I believe that Larry was
sleep-deprived at the time, too.  ;)

 It was late though.  Might have been sleep deprivation talking.


-- 
Bradley M. Kuhn  -  http://www.ebb.org/bkuhn



Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Adam Turoff

On Tue, Oct 17, 2000 at 07:18:54PM -0400, Bradley M. Kuhn wrote:
 Adam Turoff wrote:
  to write the Perl tokenizer in a Perl[56] regex, which is more easily 
  parsable in C.  All of a sudden, toke.c is replaced by toke.re, which
  would be much more legible to this community (which is more of a strike
  against toke.c instead of a benefit of some toke.re).
 
 Larry brought this up in his talk.  Of course, I believe that Larry was
 sleep-deprived at the time, too.  ;)
 
  It was late though.  Might have been sleep deprivation talking.

Dammit, I'm not finding the message in the thread, but someone casually
mentioned writing the important bits of parsing Perl in Perl5, generating
bytecode, and starting Perl6 by writing the bytecode loader.  (Apologies
for not finding the attribution.  Please stand up and elucidate if
you've had this idea as well.)

That approach does have a significant amount of merit.  Smalltalk, FORTH,
Lisp (etc.), and Java work in that manner.  That would pose a bootstrapping 
problem if there were no Perl5 to start with.  That should also aid in the
testing effort.

Z.




Re: Perl's parser and lexer will likely be in Perl (was Re: RFC 334 (v1) I'm {STILL} trying to understand this...)

2000-10-17 Thread Adam Turoff

On Tue, Oct 17, 2000 at 08:57:43PM -0400, Dan Sugalski wrote:
 On Tue, 17 Oct 2000, Adam Turoff wrote:
  Dammit, I'm not finding the message in the thread, but someone casually
  mentioned writing the important bits of parsing Perl in Perl5, generating
  bytecode, and starting Perl6 by writing the bytecode loader.  (Apologies
  for not finding the attribution.  Please stand up and elucidate if
  you've had this idea as well.)
 
 That would be me. I wasn't necessarily thinking of emitting p6 bytecode,
 though that's certainly possible. 

What's wrong with bootstrapping Perl6 with the Perl5 bytecode (or the
most interesting subset of Perl5 bytecode)?  That lets Perl6 start out
where the parser is read in as bytecode (or compiled to C from Perl or
bytecode) and modify those bytecodes as the need progresses.  Voila.  
No bootstrapping problem.  (e.g. start writing the Perl6 parser in Perl5).

  That should also aid in the testing effort.
 
 I hadn't thought about that, but it would. I'm thinking we need to set up
 a bunch of performance benchmarks for p6 development too, though that can
 go in as part of the general QA. (Not much Q there if we run slower...)

This came up before in another thread many moons ago (don't have
time to find the reference). IIRC, this was deemed an exercise in
premature optimization (i.e. EVIL!).

Z.