!!/nor and ??!! vs. ??::
Pugs currently implements infix:<!!> as an ugly version of the infix:<nor> operator. Are these in the spec? If so, how does !! interact with the second part of the new ??!! replacement for ??:: ? -- Benjamin integral Smith [EMAIL PROTECTED], [EMAIL PROTECTED]
Parsing indent-sensitive languages
If I want to parse a language that is sensitive to whitespace indentation (e.g. Python, Haskell), how do I do it using P6 rules/grammars? The way I'd usually handle it is to have a lexer that examines leading whitespace and converts it into indent and unindent tokens. The grammar can then use these tokens in the same way that it would any other block-delimiter. This requires a stateful lexer, because to work out the number of unindent tokens on a line, it needs to know what the indentation positions are. How would I write a P6 rule that defines indent and unindent tokens? Alternatively (if a different approach is needed) how would I use P6 to parse such a language?
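The stateful lexer described above can be sketched outside Perl 6. The following is a minimal Python illustration, not a P6 rule: the function name `tokenize`, the token names `INDENT`, `DEDENT` and `LINE`, and the choice of 8-column tab stops are all assumptions made for the example.

```python
def tokenize(source):
    """Yield (kind, value) tokens, turning leading whitespace into INDENT/DEDENT."""
    indents = [0]  # stack of the indentation columns that are currently open
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.expandtabs(8)       # assumption: tabs expand to 8-column stops
        text = line.lstrip(" ")
        if not text:                   # blank lines do not affect block structure
            continue
        col = len(line) - len(text)
        if col > indents[-1]:
            indents.append(col)
            yield ("INDENT", col)
        else:
            # One DEDENT per closed level; this is why the lexer needs state.
            while col < indents[-1]:
                indents.pop()
                yield ("DEDENT", col)
            if col != indents[-1]:
                raise SyntaxError(f"line {lineno}: inconsistent unindent")
        yield ("LINE", text)
    while len(indents) > 1:            # close blocks still open at end of input
        indents.pop()
        yield ("DEDENT", 0)
```

A grammar consuming this stream could treat `INDENT` and `DEDENT` like any other pair of block delimiters. Note that a single line can produce several `DEDENT` tokens.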
Re: !!/nor and ??!! vs. ??::
On 9/8/05, Benjamin Smith [EMAIL PROTECTED] wrote:
> Pugs currently implements infix:<!!> as an ugly version of the infix:<nor> operator. Are these in the spec?

No they are not. Destroy!

Luke
Re: Packages, Modules and Classes
On Wed, Sep 07, 2005 at 03:00:29PM -0400, Stevan Little wrote:
: If methods and subs are in the same namespace, and both have the &
: sigil, what about instance attributes and class attributes? Is this
: legal?
:
: class Foo {
:     my $.bar;
:     has $.bar;
: }
:
: Part of me thinks that it should be since my $.bar is an attribute of
: the Foo class, and has $.bar is an attribute of instances of Foo.

I don't think that should be any more legal than

    { my $foo; our $foo; }

since both declarations are trying to install the same symbol in a scope regardless of the actual scope/lifetime of the variable in question. The difference is that with the latter, the collision is in the lexical scope, while in the former, the collision is in the package's method name space. The problem is you're making two .bar accessors in the same scope, even if the things being accessed are in different places.

It's also a problem under my current view that Foo (the non-meta class) is primarily just the undefined prototype instance, in which case you don't want to make artificial distinctions between the type of Foo itself and the type of its instances. So I think that if .can is true for an instance method it should also be true of the class method, even if it would fail when used on the class object (because it's undefined, not because it's a different type). This fits with the policy that we want to be able to use class names to reason about class behavior generically even in the absence of an actual instance. Foo.can('bar') needs to be true regardless of whether you said my or has.

: Also, is there any way to iterate over the keys in the namespace? The
: old way would be to do something like keys(%Foo::). Is something like
: this possible with the new way?

Sure, it's still just a hash, basically, so Foo.keys() works fine.
All we've changed is that we've removed a special syntactic case by allowing a type/package object to pretend to be a hash when used that way, just as we allow it to pretend to be an undef when used as an instance. Tagmemics strikes again...

Of course, this is not entirely free, insofar as the Hash role has to figure out when it's being composed into a class, and pick an entirely different behavior when used as a class name than when used as an instance of the class. That is, Foo{$x} wants to do a symbol table lookup, while (new Foo){$x} has to access the Hash interface of the instance. But I still think I'd rather hide the semantic fudge down there than bake it into special syntax, even though it means being a wee bit inconsistent about the type relationship of Foo and (new Foo).

Or if that ends up causing too much indigestion, we could still back off to a Foo::{$x} syntax without requiring % on the front. After all, the main point of the change is to get rid of the sigil on the front of things like $CALLER::_ so that we end up with CALLER<$_> instead. So CALLER::<$_> wouldn't be all that bad if we decide the bare package name is too ambiguous in its semantics. Then we'd probably see things like

    ::($pkg)::<$var>
    ::($pkg)::{$varname}

Hmm, unless we actually spring for infix:<::> to get

    $pkg :: '$var'
    $pkg :: $varname

But I think that makes symbolic references a little too easy to use, and a little too hard to spot. Though I suppose that's no worse than the current proposal allowing

    $pkg<$var>
    $pkg{$varname}

Maybe what we really have is a class of *postfix* operators that all begin with ::, as in

    ::(...)
    ::<...>
    ::{...}

Basically steal the :: from the old %Foo:: syntax, and analyze it the other way. Then a name is simply not allowed to end with ::. Postfix operators are recognized where an operator is expected.

But if a term can start with ::(), maybe it can start with ::<> or ::{} as well. Not quite sure if that makes sense. We could make it mean something. Doubtless term ::<$foo> should mean the same as $foo though. Term ::{$varname} is probably just MY::{$varname}, and also likely to be confused with ::($sym). So that approach doesn't really buy us much.

So let's just try to make the simple Foo<$var> form stick for now, and back off later if forced to.

Larry
Re: Packages, Modules and Classes
Larry, On Sep 8, 2005, at 2:30 PM, Larry Wall wrote: On Wed, Sep 07, 2005 at 03:00:29PM -0400, Stevan Little wrote: : Also, is there any way to iterate over the keys in the namespace? The : old way would be to do something like keys(%Foo::). Is something like : this possible with the new way? Sure, it's still just a hash, basically, so Foo.keys() works fine. All we've changed is that we've removed a special syntactic case by allowing a type/package object to pretend to be a hash when used that way, just as we allow it to pretend to be an undef when used as an instance. Tagmemics strikes again... But what if I want to do this? class Foo { my %:stuff; method keys (Class $c:) { %:stuff.keys(); } } How can I get at my namespace now? How would I disambiguate that call? Doing something like Foo.Package::keys() seems to me to be exposing too much of the meta-level (the Package class). I can see lots of potential conflict between class methods and methods to access the contents of a namespace (methods defined in the Hash role I assume). This means that Foo is getting even more and more magical. It's now a type annotation, a special undef value, the invocant in class methods and the gatekeeper of the namespace. Stevan
Re: Parsing indent-sensitive languages
On Thu, Sep 08, 2005 at 08:37:21AM -0700, Dave Whipp wrote:
: If I want to parse a language that is sensitive to whitespace
: indentation (e.g. Python, Haskell), how do I do it using P6 rules/grammars?
:
: The way I'd usually handle it is to have a lexer that examines leading
: whitespace and converts it into indent and unindent tokens. The
: grammar can then use these tokens in the same way that it would any
: other block-delimiter.

This is the multi-pass approach, which even in Perl 6 is still certainly one way to do it. Or actually, two ways, one of which is to use source filters to mung the input text, and the other way of which is to make one lexer pass to transform into a list or tree of tokens, and then do a list/tree transformation on that. Given that tree transformation is something that a number of us are currently thinking about for various other reasons, I suspect that can be made to work pretty well. But we will have to describe how rule matching can be extended to lists and trees, and what happens when you intermix text elements with non-textual objects. But essentially, just think what it would take to match against the tree of match objects returned by a prior match instead of against a string.

: This requires a stateful lexer, because to work out the number of
: unindent tokens on a line, it needs to know what the indentation
: positions are. How would I write a P6 rule that defines indent and
: unindent tokens? Alternatively (if a different approach is needed) how
: would I use P6 to parse such a language?

I can think of two other approaches as well. One would be to allow pushback on the queued input so that when we hit a line transition, we can immediately analyze the leading whitespace and replace it conceptually with the appropriate number of indent/unindent objects. It's not yet clear whether that is a good approach. It might be rather inefficient to splice faked up objects into the middle of the input stream.
On the other hand, we don't actually have to splice the input, only pretend that we did. And certainly, the Perl 5 lexer makes heavy use of this kind of we'll-fake-the-next-N-tokens queueing (though I actually botched the Perl 5 implementation of it by making it a stack instead of a queue). My final idea is that you can treat it as a fancy kind of lookbehind assertion, provided you have some landmark that will stop the normal analysis from running over the newline boundary. With the other approaches, you have a precomputed positive unindent token to function as a stopper, but with this approach, the only thing you can depend on is the newline itself. So when you first hit a newline, you don't progress beyond it, but instead look ahead at the indentation of the following line and queue up the right number of indent/unindent objects in your own data structure (as well as pushing or popping your stack of current indents appropriately so that the new indent is on top). Then as long as there are queued up objects, the newline doesn't report itself as a newline, but as the next object, until you run out, and then the newline allows itself to be skipped, along with the subsequent whitespace. (The difference between this and the previous approach is that you're not relying on the rules engine to keep track of the queueing for you.) The very fact that I had to use a phrase like positive unindent token says a lot about why indentation as syntax is problematic in general. Larry
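The last approach (the newline "reporting itself" as queued indent/unindent objects until the queue drains) can be sketched as follows. This is an illustrative Python model rather than anything from the Perl 6 rules engine; the class name `NewlineScanner`, its methods and the token names are hypothetical, only spaces count as indentation, and the first line is assumed to start at column 0.

```python
from collections import deque

class NewlineScanner:
    """Scanner whose newlines pose as queued INDENT/DEDENT objects."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.indents = [0]        # stack of current indentation columns
        self.pending = deque()    # our own queue of faked-up objects
        self.analysed_at = None   # newline position whose indent was already queued

    def _look_ahead(self):
        # Peek at the following line's indentation without consuming anything.
        j = self.pos + 1
        while j < len(self.text) and self.text[j] == " ":
            j += 1
        if j >= len(self.text) or self.text[j] == "\n":
            return                # blank line or end of input: nothing to queue
        col = j - (self.pos + 1)
        if col > self.indents[-1]:
            self.indents.append(col)
            self.pending.append(("INDENT", ""))
            return
        while col < self.indents[-1]:
            self.indents.pop()
            self.pending.append(("DEDENT", ""))
        if col != self.indents[-1]:
            raise SyntaxError("inconsistent unindent")

    def next_token(self):
        text = self.text
        while True:
            if self.pos >= len(text):
                if len(self.indents) > 1:     # close blocks left open at the end
                    self.indents.pop()
                    return ("DEDENT", "")
                return None
            if text[self.pos] != "\n":
                end = text.find("\n", self.pos)
                if end == -1:
                    end = len(text)
                line = text[self.pos:end]
                self.pos = end                # stop *at* the newline, not beyond it
                return ("LINE", line)
            if self.analysed_at != self.pos:
                self._look_ahead()
                self.analysed_at = self.pos
            if self.pending:
                return self.pending.popleft() # the newline poses as the next object
            # Queue drained: let the newline and the following whitespace be skipped.
            self.pos += 1
            while self.pos < len(text) and text[self.pos] == " ":
                self.pos += 1
```

The scanner never splices anything into the input; the position simply stays on the newline while the queue is non-empty.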
Re: Packages, Modules and Classes
On Thu, Sep 08, 2005 at 04:52:52PM -0400, Stevan Little wrote: : But what if I want to do this? : : class Foo { : my %:stuff; : method keys (Class $c:) { : %:stuff.keys(); : } : } : : How can I get at my namespace now? How would I disambiguate that call? : Doing something like Foo.Package::keys() seems to me to be exposing too : much of the meta-level (the Package class). : : I can see lots of potential conflict between class methods and methods : to access the contents of a namespace (methods defined in the Hash role : I assume). This means that Foo is getting even more and more magical. : It's now a type annotation, a special undef value, the invocant in : class methods and the gatekeeper of the namespace. Well, like I said, we can require the extra :: in cases of ambiguity. It's really only the misplaced sigil I'm trying to get rid of. Larry
Re: Parsing indent-sensitive languages
That's something I've been thinking about, too. There are a lot of interesting languages that cannot be described by context-free grammars (such as {empty, 012, 001122, 000111222, ...}), but very simple enhancements do make them easy to recognize. In the case of the indentation grammar, the (one) stack in a push-down automaton is basically used up keeping track of the indentation level. But you don't need a whole stack to keep track of the indentation level, just a register that can be used to track the current level. BTW, I'm new to this list. I haven't done that much with Perl recently, but of late, I've become a lot more interested in the language. I recently picked up the Perl 6/Parrot book from O'Reilly but had really meant to finish reading the book before jumping in. It's just that this topic is too interesting! === Gregory Woodhouse [EMAIL PROTECTED] Without the requirement of mathematical aesthetics a great many discoveries would not have been made. -- Albert Einstein
Re: Parsing indent-sensitive languages
On Thu, Sep 08, 2005 at 02:16:33PM -0700, Greg Woodhouse wrote: : In the case of the : indentation grammar, the (one) stack in a push-down automaton is : basically used up keeping track of the indentation level. But you don't : need a whole stack to keep track of the indentation level, just a register : that can be used to track the current level. It seems to me you need a stack of levels so you know how many indentation levels to pop off. Otherwise you can't parse this: if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 Larry
Re: Parsing indent-sensitive languages
What I had in mind is really no different from the stateful lexer previously proposed. Unless I'm mistaken, an abstract model might be a language over {0, 1, 2} where each 1 or 2 must be preceded by a run of 1 or more 0's, but each run must differ in length from the preceding one by 0, 1 or -1. But that's only a local constraint. You also want to eventually get back to a run of 1 as the string ends. (You don't want the program to end in the middle of a nested block.) So, maybe a single register would work locally (so long as transitions can be conditioned on its value), but you still need a stack for global correctness. --- Larry Wall [EMAIL PROTECTED] wrote: On Thu, Sep 08, 2005 at 02:16:33PM -0700, Greg Woodhouse wrote: : In the case of the : indentation grammar, the (one) stack in a push-down automaton is : basically used up keeping track of the indentation level. But you don't : need a whole stack to keep track of the indentation level, just a register : that can be used to track the current level. It seems to me you need a stack of levels so you know how many indentation levels to pop off. Otherwise you can't parse this: if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 Larry === Gregory Woodhouse [EMAIL PROTECTED] Without the requirement of mathematical aesthetics a great many discoveries would not have been made. -- Albert Einstein
Re: Parsing indent-sensitive languages
Come to think of it...I had in mind a sequence of skip statements, that would back out of a level one at a time, until you finally reached the desired level. But, I think maybe these skip statements essentially play the role of what you called positive unindent tokens (I like that term). I agree that simply using terms like this means indentation grammars are problematic -- or does it? One thing that bothers me is that *people* don't seem to have a great deal of difficulty with them. Why not? === Gregory Woodhouse [EMAIL PROTECTED] Without the requirement of mathematical aesthetics a great many discoveries would not have been made. -- Albert Einstein
Re: Parsing indent-sensitive languages
On Thu, 2005-09-08 at 14:59 -0700, Greg Woodhouse wrote: I agree that simply using terms like this means indentation grammars are problematic -- or does it? One thing that bothers me is that *people* don't seem to have a great deal of difficulty with them. Why not? People can parse multi-dimensionally. Computers cannot... yet. -- c
Re: Proposal: split ternary ?? :: into binary ?? and //
[EMAIL PROTECTED] (Larry Wall) writes: So let's go ahead and make it ??!!. (At least this week...) I hereby christen this the interrobang operator. (http://en.wikipedia.org/wiki/Interrobang) -- Your fault: core dumped -- MegaHAL
Re: Packages, Modules and Classes
Larry, On Sep 8, 2005, at 5:07 PM, Larry Wall wrote: On Thu, Sep 08, 2005 at 04:52:52PM -0400, Stevan Little wrote: : But what if I want to do this? : : class Foo { : my %:stuff; : method keys (Class $c:) { : %:stuff.keys(); : } : } : : How can I get at my namespace now? How would I disambiguate that call? : Doing something like Foo.Package::keys() seems to me to be exposing too : much of the meta-level (the Package class). : : I can see lots of potential conflict between class methods and methods : to access the contents of a namespace (methods defined in the Hash role : I assume). This means that Foo is getting even more and more magical. : It's now a type annotation, a special undef value, the invocant in : class methods and the gatekeeper of the namespace. Well, like I said, we can require the extra :: in cases of ambiguity. It's really only the misplaced sigil I'm trying to get rid of. So it would be Foo::.keys() then? Would this be possible? my $pkg = Foo::; # or maybe this ... my $pkg = Foo::; Would $pkg be an instance of the Package class? I would assume given this code: package Foo { ... package Foo::Bar { ... } } I can do this: my $pkg = Foo::{'::Bar'} And get back a Package reference of some kind. Do we even have first class packages? h Stevan
Re: Parsing indent-sensitive languages
To solve Dave's particular problem, you don't need any new features. Just:

    rule indentation {
        ^^ $<token>:=(\h*)
        { state @indents = 0;
          my $new_indent = expand_tabs($<token>).chars;
          let @indents = @indents;
          pop @indents while @indents && $new_indent <= @indents[-1];
          push @indents, $new_indent;
          $/<nesting> = [EMAIL PROTECTED];
        }
    }

where every line has an indentation and the subsequent $<indentation><nesting> value tells you how deep it is.

Alternatively, you could define separate rules for the three cases:

    {
        state @indents = 0;

        rule indent {
            ^^ $<token>:=(\h*)
            { $new_indent = expand_tabs($<token>).chars }
            <( $new_indent > @indents[-1] )>
            { let @indents = (@indents, $new_indent) }
        }

        rule outdent {
            ^^ $<token>:=(\h*)
            { $new_indent = expand_tabs($<token>).chars }
            <( $new_indent < @indents[-1] )>
            { pop @indents while @indents && $new_indent < @indents[-1];
              let @indents = (@indents, $new_indent);
            }
        }

        rule samedent {
            ^^ $<token>:=(\h*)
            { $new_indent = expand_tabs($<token>).chars }
            <( $new_indent == @indents[-1] )>
        }
    }

Damian
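For readers who want to trace the bookkeeping of the first rule, here is a rough Python rendering of it. It is a sketch under stated assumptions, not a translation of the rules engine: `nesting_depths` is a hypothetical name, tabs are assumed to expand to 8-column stops, the nesting value is assumed to be the number of entries left in the indent list, and the backtracking behaviour of `let` is not modelled.

```python
def nesting_depths(source):
    """Return the nesting depth of every line, using a stack of indent widths."""
    indents = [0]   # mirrors the initial state of @indents in the rule
    depths = []
    for line in source.splitlines():
        expanded = line.expandtabs(8)                     # assumed tab width
        new_indent = len(expanded) - len(expanded.lstrip(" "))
        # Discard every recorded level that is at least as deep as this line...
        while indents and new_indent <= indents[-1]:
            indents.pop()
        # ...then record this line's own level on top.
        indents.append(new_indent)
        depths.append(len(indents))   # assumption: nesting = number of levels
    return depths
```

Because the stack stores the actual column of each level, a line that drops back several levels at once gets the right depth in a single step.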
Re: Parsing indent-sensitive languages
On 9/8/05, Larry Wall [EMAIL PROTECTED] wrote: It seems to me you need a stack of levels so you know how many indentation levels to pop off. Otherwise you can't parse this: if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 Sure you can. Since each item on the stack would just mean add another indentation level, you could use the same pop-triggering logic to decrement a simple counter. Collin
Re: Parsing indent-sensitive languages
On Thu, Sep 08, 2005 at 07:57:43PM -0400, Collin Winter wrote: : On 9/8/05, Larry Wall [EMAIL PROTECTED] wrote: : It seems to me you need a stack of levels so you know how many : indentation levels to pop off. Otherwise you can't parse this: : : if foo1 : bar1 : if foo2 : bar2 : if foo3 : bar3 : else : baz2 : : Sure you can. Since each item on the stack would just mean add : another indentation level, you could use the same pop-triggering : logic to decrement a simple counter. Okay, how do you tell the difference between if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 and if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 unless you remember the exact columns that if foo1 and if foo2 were at? We're counting spaces here, not tabs. Sure, you could throw in extra indent/outdent tokens for every column, but that's just silly--you'd have to do massive tree rewriting when you find an unattached else, and you'd get zillions of useless blocks. (And don't even begin to think about allowing Unicode space characters. It's bad enough when people remap their tabs to something other than multiples of 8 and don't tell you...) Larry
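The point about remembering exact columns can be illustrated with a small Python sketch. The layouts used below are hypothetical (the indentation of the two examples above did not survive in this archive), and `else_partner` is an invented helper that only understands `if` lines, `else` lines and plain statements.

```python
def else_partner(lines):
    """Return the `if` line that the last `else` lines up with."""
    open_ifs = []      # stack of (column, text) for the ifs still enclosing us
    partner = None
    for line in lines:
        col = len(line) - len(line.lstrip(" "))
        text = line.strip()
        # Leaving deeper blocks: forget every `if` indented further than this line.
        while open_ifs and open_ifs[-1][0] > col:
            open_ifs.pop()
        if text == "else":
            if not open_ifs or open_ifs[-1][0] != col:
                raise SyntaxError(f"unattached else at column {col}")
            partner = open_ifs[-1][1]
        else:
            # Any other statement in the same column ends the previous `if` there.
            if open_ifs and open_ifs[-1][0] == col:
                open_ifs.pop()
            if text.startswith("if "):
                open_ifs.append((col, text))
    return partner

# Deliberately uneven indentation: the levels sit at columns 0, 2, 6 and 9.
body = ["if foo1", "  bar1", "  if foo2", "      bar2", "      if foo3", "         bar3"]
inner = else_partner(body + ["  else", "      baz2"])
outer = else_partner(body + ["else", "  baz2"])
```

With uneven indentation widths, a bare count of open levels cannot say whether an `else` in a given column closes one block, two blocks, or lines up with nothing at all; the recorded columns can.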
Re: Parsing indent-sensitive languages
On 9/8/05, Larry Wall [EMAIL PROTECTED] wrote: Okay, how do you tell the difference between if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 and if foo1 bar1 if foo2 bar2 if foo3 bar3 else baz2 unless you remember the exact columns that if foo1 and if foo2 were at? I had misunderstood what you were intending to do with the stack. Unfortunately it dawned on me a little too late to unclick Send : ) Collin
Re: Parsing indent-sensitive languages
Damian Conway wrote:
> Alternatively, you could define separate rules for the three cases:
>
>     {
>         state @indents = 0;
>
>         rule indent {
>             ^^ $<token>:=(\h*)
>             { $new_indent = expand_tabs($<token>).chars }
>             <( $new_indent > @indents[-1] )>
>             { let @indents = (@indents, $new_indent) }
>         }
>
>         rule outdent {
>             ^^ $<token>:=(\h*)
>             { $new_indent = expand_tabs($<token>).chars }
>             <( $new_indent < @indents[-1] )>
>             { pop @indents while @indents && $new_indent < @indents[-1];
>               let @indents = (@indents, $new_indent);
>             }
>         }
>
>         rule samedent { ... }
>     }

I have a couple of questions about this:

1. It's quite possible that a code-block in a parser could call a function that reads a different file (e.g. for an include file statement). How does the state, @indents, get associated with a particular match? (Sure, I could do an explicit save/restore; but things might get harder if I was using coroutines to get concurrent matches to implement, say, a smart-diff script)

2. How does the outdent rule work in the case where a line does 2 outdents? It looks to me as if I'd only get one match of outdent: the /\h*/ match will advance the match pos, so /^^/ won't match for the second outdent on the same line, which would cause problems if I'm trying to match up nested blocks.

Dave.