Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Randal L. Schwartz
 Luke == Luke Palmer [EMAIL PROTECTED] writes:

Luke But you don't really need to parse to syntax highlight, either.  You
Luke just need to tokenize.

Unfortunately, to tokenize, you also have to know the state of the parse.
As long as / is both divide and begin regex, you're toasted.

Please see my long post on parsing Perl at PerlMonks,
http://www.perlmonks.org/index.pl?node_id=44722, for examples of
*why* you need to notice whether you have a divide or a regex match.

Perl is fundamentally resistant to lexing.  As in the beginning of
this thread, one of the RFCs suggested the possibility of making Perl
lexable, but apparently the designers said no, we think the / duality
is worth keeping.  And that seals the fate for Perl6 just like all
Perl before it.

To properly lex a Perl program (Perl6 included), you *must* execute
BEGIN blocks.  That's the end of that tune.  Anything else is just an
approximation.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!


Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Matthew Walton
Randal L. Schwartz wrote:
> Luke == Luke Palmer [EMAIL PROTECTED] writes:
>
> Luke But you don't really need to parse to syntax highlight, either.  You
> Luke just need to tokenize.
>
> Unfortunately, to tokenize, you also have to know the state of the parse.
> As long as / is both divide and begin regex, you're toasted.

So you're saying that in Perl 6 it will be entirely impossible to 
determine if / appears as the division operator or as the beginning of a 
regex from a purely syntactic examination of the source code?

I'm finding that very, very hard to believe. Regexps aren't valid where 
/-the-operator is, after all.

Please correct me if I'm wrong, but I've got the impression that Perl 6 
is tokenisable without requiring BEGIN blocks to be run - provided no 
grammars which the tokeniser doesn't already know about are used, of 
course, that one will never be avoidable.



Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Randal L. Schwartz
 Matthew == Matthew Walton [EMAIL PROTECTED] writes:

Matthew So you're saying that in Perl 6 it will be entirely impossible to
Matthew determine if / appears as the division operator or as the beginning of
Matthew a regex from a purely syntactic examination of the source code?

Yes.

Matthew I'm finding that very, very hard to believe. Regexps aren't valid
Matthew where /-the-operator is, after all.

And that's precisely why Perl can work as it does.  If an operator is
expected, / is divide.  If a term is expected, / is the beginning of a
regex.  This has been true since Perl1 (maybe 0).  There are a few
other characters that also work similarly, but / is the most frequent
and most troublesome.  And it got worse for Perl5, because of
user-defined prototypes, which as far as I can tell, are still present
in Perl6.
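
The term-versus-operator rule described above can be sketched as a toy tokenizer. This is only an illustration in Python over a made-up mini-language (scalar variables, integers, a few operators), not real Perl lexing; the names `tokenize`, `TERM`, `OPERATOR` and `REGEX` are assumptions introduced for this sketch.

```python
import re

# Toy token patterns for a hypothetical mini-language, not real Perl.
TERM = re.compile(r"\$\w+|\d+")            # variables and integer literals
OPERATOR = re.compile(r"[-+*=,;(]")        # after any of these, a term is expected
REGEX = re.compile(r"/(?:[^/\\]|\\.)*/")   # /.../ with backslash escapes


def tokenize(src):
    """Return (kind, text) pairs, tracking whether a term or an operator is expected."""
    tokens = []
    pos = 0
    expect_term = True  # at the start of input, a term is expected
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if src[pos] == "/":
            if expect_term:
                # Term position: "/" opens a regex, so scan to its closing "/".
                m = REGEX.match(src, pos)
                if m is None:
                    raise SyntaxError(f"unterminated regex at offset {pos}")
                tokens.append(("regex", m.group()))
                pos = m.end()
                expect_term = False  # a regex literal is itself a term
            else:
                # Operator position: "/" is a standalone divide token.
                tokens.append(("divide", "/"))
                pos += 1
                expect_term = True
            continue
        m = TERM.match(src, pos)
        if m:
            tokens.append(("term", m.group()))
            pos = m.end()
            expect_term = False
            continue
        if src[pos] == ")":
            tokens.append(("close", ")"))
            pos += 1
            expect_term = False  # a parenthesised expression acts as a term
            continue
        m = OPERATOR.match(src, pos)
        if m:
            tokens.append(("op", m.group()))
            pos = m.end()
            expect_term = True
            continue
        raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
    return tokens


print(tokenize("$x / 2"))
print(tokenize("$y = /a+b/"))
```

In the first call the `/` follows a term and comes out as a divide token; in the second it follows `=` and the whole `/a+b/` comes out as one regex token. The single flag `expect_term` is the "state of the parse" the tokenizer cannot do without, and this sketch only works because the mini-language has no user-defined subs.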

Matthew Please correct me if I'm wrong, but I've got the impression that Perl
Matthew 6 is tokenisable without requiring BEGIN blocks to be run - provided
Matthew no grammars which the tokeniser doesn't already know about are used,
Matthew of course, that one will never be avoidable.

Your impression is wrong.  In the presence of user-defined prototypes,
you *must* execute the code that might alter a prototype in order to
determine whether / is a divide (and therefore standalone token) or
the beginning of a regex (and therefore must locate the end of the
regex to properly be a token).

Please see the referenced perlmonks article.

All the handwaving in the world won't fix this.  As long as we have
dual-natured characters like /, and user-defined prototypes, Perl
cannot be lexed without also parsing, and therefore without also
running BEGIN blocks.



Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Matthew Walton
Randal L. Schwartz wrote:
> Matthew == Matthew Walton [EMAIL PROTECTED] writes:
>
> Matthew So you're saying that in Perl 6 it will be entirely impossible to
> Matthew determine if / appears as the division operator or as the beginning of
> Matthew a regex from a purely syntactic examination of the source code?
>
> Yes.
>
> Matthew I'm finding that very, very hard to believe. Regexps aren't valid
> Matthew where /-the-operator is, after all.
>
> And that's precisely why Perl can work as it does.  If an operator is
> expected, / is divide.  If a term is expected, / is the beginning of a
> regex.  This has been true since Perl1 (maybe 0).  There are a few
> other characters that also work similarly, but / is the most frequent
> and most troublesome.  And it got worse for Perl5, because of
> user-defined prototypes, which as far as I can tell, are still present
> in Perl6.

Perl 6 has formal parameters for subs, methods etc. I don't see any 
mention of Perl 5-style prototypes in S6, and I honestly can't see how 
they could possibly fit with formal parameters. Hopefully Larry or 
someone can clarify whether they still exist or not.

If they don't still exist, this eases the problem somewhat, but not
entirely, as I understand it. Being able to call subs and methods without
parentheses around the argument lists causes problems; a quick scan of
the updated Synopses failed to reveal the rules for that in Perl 6.

> Your impression is wrong.  In the presence of user-defined prototypes,
> you *must* execute the code that might alter a prototype in order to
> determine whether / is a divide (and therefore standalone token) or
> the beginning of a regex (and therefore must locate the end of the
> regex to properly be a token).

Since Perl 5 style prototypes don't appear to exist anymore, this may be
easier. I don't believe that the addition of the // operator compounds
the problem any further, because hopefully by that point it is possible to
determine that you've seen an operator.

The Perlmonks article throws up a lot of very nasty cases. Not knowing 
the entire current language definition by heart, I can't say this with 
absolute certainty, but I retain the belief that Perl 6 is at least
*easier* to deal with than Perl 5.

It is also possible that telling the difference between /-as-divide and 
/-as-regex becomes much easier if lookahead is employed in the 
tokeniser. Unfortunately, that makes the tokeniser much more 
complicated, and it's just a vague and random idea.




Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Randal L. Schwartz
 Matthew == Matthew Walton [EMAIL PROTECTED] writes:

Matthew Perl 6 has formal parameters for subs, methods etc. I don't see any
Matthew mention of Perl 5-style prototypes in S6, and I honestly can't see how
Matthew they could possibly fit with formal parameters. Hopefully Larry or
Matthew someone can clarify whether they still exist or not.

As long as you can have a user-defined null-prototyped subroutine (one
that doesn't need parens following), you have the problem.  See the
sin/time examples in the monk article, and then consider user-defined
functions that have no args (like time) and those that do (like sin).

Matthew The Perlmonks article throws up a lot of very nasty cases. Not knowing
Matthew the entire current language definition by heart, I can't say this with
Matthew absolute certainty, but I retain the belief that Perl 6 is at least
Matthew *easier* to deal with than Perl 5.

I believe you have a false belief.  I don't know anything in the new
prototypes-which-became-full-formal-arguments that made it any
*easier* to recognize the ending of a subroutine argument list without
knowing its precise definition.

In Perl6:

sub no_args () { ... }
sub list_args (*@args) { ... }  # slurpy parameter; its name was garbled in the archive, *@args is assumed

no_args / # this is a divide
list_args / # this is the start of a regex

See, it's still there. :)

Matthew It is also possible that telling the difference between /-as-divide
Matthew and /-as-regex becomes much easier if lookahead is employed in the
Matthew tokeniser.

No, not possible at all.  The entire rest of the program may be valid
either way.  You *must* know by the time you're done with /, or
/-and-more.  The rest of the code cannot be a hint.  Again, see my
article.
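
A rough way to see why lookahead cannot settle this: the toy Python tokenizer below takes the same source string and a table saying whether each sub accepts arguments, and produces two different, equally well-formed token streams. It is a sketch over a made-up mini-language, not Perl; the sub name `f`, the `takes_args` table and the sample line are assumptions chosen for illustration.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w*")
NUMBER = re.compile(r"\d+")
REGEX = re.compile(r"/(?:[^/\\]|\\.)*/")
COMMENT = re.compile(r"#[^\n]*")


def tokenize(src, takes_args):
    """Tokenize src; takes_args maps each sub name to True if it accepts arguments."""
    tokens, pos, expect_term = [], 0, True
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif ch == "#":
            pos = COMMENT.match(src, pos).end()  # skip to end of line
        elif ch == "/" and expect_term:
            m = REGEX.match(src, pos)
            if m is None:
                raise SyntaxError(f"unterminated regex at offset {pos}")
            tokens.append(("regex", m.group()))
            pos, expect_term = m.end(), False
        elif ch == "/":
            tokens.append(("divide", "/"))
            pos, expect_term = pos + 1, True
        elif ch == ";":
            tokens.append(("semi", ";"))
            pos, expect_term = pos + 1, True
        elif NUMBER.match(src, pos):
            m = NUMBER.match(src, pos)
            tokens.append(("number", m.group()))
            pos, expect_term = m.end(), False
        elif WORD.match(src, pos):
            m = WORD.match(src, pos)
            name = m.group()
            if name not in takes_args:
                raise LookupError(f"cannot tokenize past undeclared sub {name!r}")
            tokens.append(("call", name))
            # A sub that takes arguments expects a term next; one that takes
            # none is already a complete term, so an operator comes next.
            pos, expect_term = m.end(), takes_args[name]
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {pos}")
    return tokens


SOURCE = "f / 25 ; # / ; die"
as_no_args = tokenize(SOURCE, {"f": False, "die": False})
as_list_args = tokenize(SOURCE, {"f": True, "die": False})
print(as_no_args)
print(as_list_args)
```

With `f` declared as taking no arguments, the line is a division followed by a comment. With `f` declared as taking arguments, `/ 25 ; # /` is a regex and `die` is live code. Both readings consume the whole line without error, so nothing after the `/` tells the tokenizer which one was meant; only the declaration of `f` does.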



Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Matthew Walton
Randal L. Schwartz wrote:
> Matthew == Matthew Walton [EMAIL PROTECTED] writes:
>
> Matthew Perl 6 has formal parameters for subs, methods etc. I don't see any
> Matthew mention of Perl 5-style prototypes in S6, and I honestly can't see how
> Matthew they could possibly fit with formal parameters. Hopefully Larry or
> Matthew someone can clarify whether they still exist or not.
>
> As long as you can have a user-defined null-prototyped subroutine (one
> that doesn't need parens following), you have the problem.  See the
> sin/time examples in the monk article, and then consider user-defined
> functions that have no args (like time) and those that do (like sin).
>
> Matthew The Perlmonks article throws up a lot of very nasty cases. Not knowing
> Matthew the entire current language definition by heart, I can't say this with
> Matthew absolute certainty, but I retain the belief that Perl 6 is at least
> Matthew *easier* to deal with than Perl 5.
>
> I believe you have a false belief.  I don't know anything in the new
> prototypes-which-became-full-formal-arguments that made it any
> *easier* to recognize the ending of a subroutine argument list without
> knowing its precise definition.
>
> In Perl6:
>
> sub no_args () { ... }
> sub list_args (*@args) { ... }  # slurpy parameter; its name was garbled in the archive, *@args is assumed
>
> no_args / # this is a divide
> list_args / # this is the start of a regex
>
> See, it's still there. :)

I believe I did mention that being able to call functions without parens 
is a problem.

> Matthew It is also possible that telling the difference between /-as-divide
> Matthew and /-as-regex becomes much easier if lookahead is employed in the
> Matthew tokeniser.
>
> No, not possible at all.  The entire rest of the program may be valid
> either way.  You *must* know by the time you're done with /, or
> /-and-more.  The rest of the code cannot be a hint.  Again, see my
> article.

I read the article. I believe I mentioned that as well.
But I will have to concede that it is impossible to correctly determine 
the structure of an arbitrary Perl 6 program without having to hand the 
definitions of all functions used and also any grammars and macros used. 
Sometimes you will be able to do it, sometimes you won't, but you can't 
operate on the assumption that you can.

It's quite a disappointment in some ways, but we've lived with it in 
Perl 5, and I'm sure we can live with it in Perl 6.

And I still think Perl 6 will have fewer cases in which it's completely 
impossible for not-Perl to parse it. Unfortunately, fewer still implies 
some, and some is still a problem.



Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread James Mastros
Randal L. Schwartz wrote:
> All the handwaving in the world won't fix this.  As long as we have
> dual-natured characters like /, and user-defined prototypes, Perl
> cannot be lexed without also parsing, and therefore without also
> running BEGIN blocks.

And user-defined prototypes that change when the argument list of a
function ends, that is.  If we forced the argument list for all
functions to have parens (including empty parens for argumentless
functions), then we'd be OK, I'm fairly certain.

For that matter, if we stick to declaration syntax for declarations, and 
not BEGIN blocks and reflection, then we're OK -- you have to do some 
execution, but of a minilanguage that can't express concepts that you 
wouldn't be OK running... though you do still have to descend through 
require/use, and thus have to have the files being required or used (or 
at least a description of their declarations).
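
The declaration-only approach can be sketched as a pre-pass that reads `sub` declarations as plain text and records whether each sub takes arguments, without executing anything. This Python sketch assumes a simplified `sub NAME (PARAMS)` form modeled on the two declarations quoted earlier in the thread; the `*@args` spelling, the `DECL` pattern and the `scan_declarations` helper are assumptions, not a real Perl 6 grammar.

```python
import re

# Matches toy declarations of the form `sub NAME (PARAMS)`. Hypothetical syntax,
# modeled on the declarations quoted in this thread, not a real Perl 6 grammar.
DECL = re.compile(r"\bsub\s+([A-Za-z_]\w*)\s*\(([^)]*)\)")


def scan_declarations(src):
    """Map each declared sub name to True if its parameter list is non-empty."""
    return {m.group(1): bool(m.group(2).strip()) for m in DECL.finditer(src)}


SOURCE = """
sub no_args () { ... }
sub list_args (*@args) { ... }
"""
table = scan_declarations(SOURCE)
print(table)
```

A table like this is what a tokenizer would need before it meets a bare `/` after a sub name. The sketch sees only declarations written out literally in the text it is given: anything installed from a BEGIN block, through reflection, or by a macro is invisible to it, and files pulled in by require/use would have to be scanned the same way.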

-=- James Mastros,
theorbtwo


Re: Lexing requires execution (was Re: Will _anything_ be able to truly parse and understand perl?)

2004-11-26 Thread Juerd
James Mastros skribis 2004-11-26 14:36 (+0100):
> And user-defined prototypes that change when the argument list of a
> function ends, that is.  If we forced the argument list for all
> functions to have parens (including empty parens for argumentless
> functions), then we'd be OK, I'm fairly certain.

While that is true, please realise that many people like that in Perl,
parens are optional. I am one of those people who dislike typing and
counting too many balanced symbol sets.

If only method and function syntax could be the same, and methods would
also not require parens... Ah well, that's what we have mutable grammar
for.

> For that matter, if we stick to declaration syntax for declarations, and
> not BEGIN blocks and reflection

Macros are somewhat like BEGIN blocks and may be needed to turn invalid
syntax into something that is valid.


Juerd