Re: Parsing indent-sensitive languages

2005-09-09 Thread Peri Hankey
Many apologies for triple posting - short-circuit between ears. Peri Hankey -- http://languagemachine.sourceforge.net - The language machine

Re: Parsing indent-sensitive languages

2005-09-09 Thread Peri Hankey
Dave Whipp wrote: If I want to parse a language that is sensitive to whitespace indentation (e.g. Python, Haskell), how do I do it using P6 rules/grammars? The way I'd usually handle it is to have a lexer that examines leading whitespace and converts it into "indent" and "unindent" tokens. The
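
A minimal sketch, in Python rather than P6 rules, of the kind of lexer the question describes: it measures each line's leading whitespace and turns changes into "indent" and "unindent" tokens. A stack of open widths is kept so that one line can close several levels at once. The names indent_tokens, INDENT, UNINDENT and LINE, and the tab width of 8, are assumptions made for illustration and do not come from the thread.

```python
def indent_tokens(lines, tab_width=8):
    """Yield ("INDENT",), ("UNINDENT",) and ("LINE", text) tuples."""
    stack = [0]  # widths of the currently open indentation levels
    for raw in lines:
        text = raw.expandtabs(tab_width)
        body = text.lstrip(" ")
        if not body.strip():
            continue  # blank lines do not change the level
        width = len(text) - len(body)
        if width > stack[-1]:
            # deeper than the current level: open exactly one new level
            stack.append(width)
            yield ("INDENT",)
        else:
            # shallower: close levels until we are back at a known width
            while width < stack[-1]:
                stack.pop()
                yield ("UNINDENT",)
            if width != stack[-1]:
                raise ValueError(f"inconsistent unindent to column {width}")
        yield ("LINE", body.rstrip())
    # close whatever is still open at end of input
    while len(stack) > 1:
        stack.pop()
        yield ("UNINDENT",)
```

Fed the lines "if foo1", "  bar1", "  if foo2", "      bar2", "baz", the sketch emits two UNINDENT tokens in a row before "baz", which is the case a single level counter cannot handle when indentation widths are arbitrary.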

Re: Parsing indent-sensitive languages

2005-09-08 Thread Dave Whipp
Damian Conway wrote: Alternatively, you could define separate rules for the three cases: { state @indents = 0; rule indent { ^^ $:=(\h*) { $ = expand_tabs($).chars } <( $ > @indents[-1] )> { let @indents = (@indents, $) }

Re: Parsing indent-sensitive languages

2005-09-08 Thread Collin Winter
On 9/8/05, Larry Wall <[EMAIL PROTECTED]> wrote: > Okay, how do you tell the difference between > > if foo1 > bar1 > if foo2 > bar2 > if foo3 > bar3 > else >

Re: Parsing indent-sensitive languages

2005-09-08 Thread Larry Wall
On Thu, Sep 08, 2005 at 07:57:43PM -0400, Collin Winter wrote: : On 9/8/05, Larry Wall <[EMAIL PROTECTED]> wrote: : > It seems to me you need a stack of levels so you know how many : > indentation levels to pop off. Otherwise you can't parse this: : > : > if foo1 : > bar1

Re: Parsing indent-sensitive languages

2005-09-08 Thread Collin Winter
On 9/8/05, Larry Wall <[EMAIL PROTECTED]> wrote: > It seems to me you need a stack of levels so you know how many > indentation levels to pop off. Otherwise you can't parse this: > > if foo1 > bar1 > if foo2 > bar2 >

Re: Parsing indent-sensitive languages

2005-09-08 Thread Damian Conway
To solve Dave's particular problem, you don't need any new features. Just: rule indentation { ^^ $:=(\h*) { state @indents = 0; my $new_indent = expand_tabs($).chars; let @indents = @indents; pop @indents while @indents && $new_indent <= @

Re: Parsing indent-sensitive languages

2005-09-08 Thread chromatic
On Thu, 2005-09-08 at 14:59 -0700, Greg Woodhouse wrote: > I agree that simply using terms like this means indentation grammars > are problematic -- or does it? One thing that bothers me is that > *people* don't seem to have a great deal of difficulty with them. Why > not? People can parse multi-

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
Come to think of it...I had in mind a sequence of "skip" statements, that would back out of a level one at a time, until you finally reached the desired level. But, I think maybe these "skip" statements essentially play the role of what you called "positive unindent tokens" (I like that term). I a

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
What I had in mind is really no different from the stateful lexer previously proposed. Unless I'm mistaken, an abstract model might be a language over {0, 1, 2} where each 1 or 2 must be preceded by a run of one or more 0's, but each run must differ in length from the preceding one by 0, 1 or -1. But that

Re: Parsing indent-sensitive languages

2005-09-08 Thread Larry Wall
On Thu, Sep 08, 2005 at 02:16:33PM -0700, Greg Woodhouse wrote: : In the case of the : "indentation grammar", then the (one) stack in a push-down automaton is : basically used up keeping track of the indentation level. But you don't : need a whole stack to keep track of indentation level, just a reg

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
That's something I've been thinking about, too. There are a lot of "interesting" languages that cannot be described by context-free grammars (such as {empty, 012, 001122, 000111222, ...}), but very simple enhancements do make them easy to recognize. In the case of the "indentation grammar", then the
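
As a small illustration of how little extra machinery the example language needs, here is a Python sketch that recognizes {empty, 012, 001122, 000111222, ...} by counting. The function name is_balanced_012 is hypothetical, and this is only one possible reading of "very simple enhancements", not something proposed in the thread.

```python
def is_balanced_012(s):
    """Return True if s is n zeros, then n ones, then n twos (n >= 0)."""
    n, rem = divmod(len(s), 3)
    # the length must split into three equal runs, in the order 0, 1, 2
    return rem == 0 and s == "0" * n + "1" * n + "2" * n
```

A single integer n is enough here; no context-free grammar generates this language, but one counter plus a comparison does the job.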

Re: Parsing indent-sensitive languages

2005-09-08 Thread Larry Wall
On Thu, Sep 08, 2005 at 08:37:21AM -0700, Dave Whipp wrote: : If I want to parse a language that is sensitive to whitespace : indentation (e.g. Python, Haskell), how do I do it using P6 rules/grammars? : : The way I'd usually handle it is to have a lexer that examines leading : whitespace and co