Re: zip: stop when and where?

2005-10-04 Thread Greg Woodhouse
I see your point. Option b does suggest that you can read ahead in a
"blocked" list and get undefs. If I had to choose just one, I think
I'd opt for d, but having two zips, one acting like c and one like d,
might be useful. Then, of course, my first thought was wrong. This one
may well be, too.

--- Eric <[EMAIL PROTECTED]> wrote:

> Hey,
> I'd just like to say that I find b a bit misleading, because you
> couldn't tell that the first list ended -- it could just have undefs
> at the end. I like a because it doesn't add any data that wasn't
> there; of course, that could be a reason to dislike it too. On the
> other hand, c makes a good option when you want to work with
> infinite lists. Is this something that could be modified on a
> per-use basis, with one chosen now as the default ("they didn't
> request a specific one, so use this one")?
> 
> After all that, I think I agree on c, specifically because you can
> provide a good code use of it and it doesn't add any data that
> wasn't there before. I don't think it should ever lean towards b,
> but then I bet someone else will have an equally good use of that.
> ;) So in the end I think some way of choosing would be good, with
> one option picked as standard.
> 
> --
> Eric Hodges
> 
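To make Eric's objection to b concrete: padding with undef is ambiguous, because the padded result is indistinguishable from zipping a list that genuinely ends in undef. A quick sketch in Python (an analogy only, with None standing in for Perl's undef):

```python
from itertools import zip_longest

# Option (b): pad the shorter list with a "missing" marker (None here,
# standing in for undef). The result is identical to zipping a list
# that really does end in None -- the consumer can't tell which it was.
ended_early = list(zip_longest([1, 2, 3], [1, 2, 3, 4]))
trailing_none = list(zip([1, 2, 3, None], [1, 2, 3, 4]))

print(ended_early == trailing_none)  # True: both end with (None, 4)
```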



===
Gregory Woodhouse  <[EMAIL PROTECTED]>



"Without the requirement of mathematical aesthetics a great many discoveries 
would not have been made."

-- Albert Einstein

Re: zip: stop when and where?

2005-10-04 Thread Greg Woodhouse
That (b) certainly seems like the sensible option to me. My second
choice would be d.

A nice thing about c is that it leaves open the possibility of lazy
evaluation (zip as much of the lists as you can, picking up the
process later). But I still prefer b. Maybe there could be a separate
"lazy zip" (lzip?).

--- Juerd <[EMAIL PROTECTED]> wrote:

> What should zip do given 1..3 and 1..6?
> 
> (a) 1 1 2 2 3 3 4 5 6
> (b) 1 1 2 2 3 3 undef 4 undef 5 undef 6
> (c) 1 1 2 2 3 3
> (d) fail
> 
> I'd want c, mostly because of code like
> 
> for @foo Y 0... -> $foo, $i { ... }
> 
> Pugs currently does b.
> 
> 
> Juerd
> -- 
> http://convolution.nl/maak_juerd_blij.html
> http://convolution.nl/make_juerd_happy.html 
> http://convolution.nl/gajigu_juerd_n.html
> 

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
Come to think of it... I had in mind a sequence of "skip" statements
that would back out one level at a time, until you finally reached
the desired level. But I think maybe these "skip" statements
essentially play the role of what you called "positive unindent
tokens" (I like that term).

I agree that simply using terms like this means indentation grammars
are problematic -- or does it? One thing that bothers me is that
*people* don't seem to have a great deal of difficulty with them. Why
not?

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
What I had in mind is really no different from the stateful lexer
previously proposed. Unless I'm mistaken, an abstract model might be a
language over {0, 1, 2} where each 1 or 2 must be preceded by a run of
one or more 0's, and each run must differ in length from the preceding
one by 0, 1 or -1. But that's only a local constraint. You also want
to eventually get back to a run of length 1 as the string ends (you
don't want the program to end in the middle of a nested block). So
maybe a single register would work locally (so long as transitions can
be conditioned on its value), but you still need a stack for global
correctness.

--- Larry Wall <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 08, 2005 at 02:16:33PM -0700, Greg Woodhouse wrote:
> : In the case of the "indentation grammar", then, the (one) stack in
> : a push-down automaton is basically used up keeping track of the
> : indentation level. But you don't need a whole stack to keep track
> : of indentation level, just a register that can be used to track
> : the current level.
> 
> It seems to me you need a stack of levels so you know how many
> indentation levels to pop off.  Otherwise you can't parse this:
> 
>   if foo1
>       bar1
>       if foo2
>           bar2
>           if foo3
>               bar3
>       else
>           baz2
>  
> Larry
> 
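Larry's point can be made concrete with a small sketch (Python, illustrative names only): a register can hold the current indentation width, but deciding how many levels an `else` closes requires remembering every enclosing width.

```python
def dedents(indent_widths):
    """For each line's indentation width, count how many levels close."""
    stack = [0]                # stack of currently open indentation widths
    out = []
    for w in indent_widths:
        n = 0
        while w < stack[-1]:   # pop every level deeper than this line
            stack.pop()
            n += 1
        if w > stack[-1]:
            stack.append(w)
        out.append(n)
    return out

# Larry's example, as indentation widths per line:
# if foo1=0, bar1=4, if foo2=4, bar2=8, if foo3=8, bar3=12, else=4, baz2=8
counts = dedents([0, 4, 4, 8, 8, 12, 4, 8])
print(counts)  # [0, 0, 0, 0, 0, 0, 2, 0] -- the else closes two levels
```

Knowing only the current width (12) and the new width (4) does not tell you whether that transition closes one, two, or three blocks; the answer depends on which widths were previously pushed, which is exactly the stack.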

Re: Parsing indent-sensitive languages

2005-09-08 Thread Greg Woodhouse
That's something I've been thinking about, too. There are a lot of
"interesting" languages that cannot be described by context-free
grammars (such as {empty, 012, 001122, 000111222, ...}), but very
simple enhancements do make them easy to recognize. In the case of the
"indentation grammar", then, the (one) stack in a push-down automaton
is basically used up keeping track of the indentation level. But you
don't need a whole stack to keep track of indentation level, just a
register that can be used to track the current level.

BTW, I'm new to this list. I haven't done that much with Perl
recently, but of late I've become a lot more interested in the
language. I recently picked up the Perl 6/Parrot book from O'Reilly
and had really meant to finish reading it before jumping in. It's just
that this topic is too interesting!

--- Larry Wall <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 08, 2005 at 08:37:21AM -0700, Dave Whipp wrote:
> : If I want to parse a language that is sensitive to whitespace
> : indentation (e.g. Python, Haskell), how do I do it using P6
> : rules/grammars?
> : 
> : The way I'd usually handle it is to have a lexer that examines
> : leading whitespace and converts it into "indent" and "unindent"
> : tokens. The grammar can then use these tokens in the same way that
> : it would any other block-delimiter.
> 
> This is the multi-pass approach, which even in Perl 6 is still
> certainly one way to do it.  Or actually, two ways, one of which is
> to use source filters to mung the input text, and the other way of
> which is to make one lexer pass to transform into a list or tree of
> tokens, and then do a list/tree transformation on that.  Given that
> tree transformation is something that a number of us are currently
> thinking about for various other reasons, I suspect that can be made
> to work pretty well.  But we will have to describe how rule matching
> can be extended to lists and trees, and what happens when you
> intermix text elements with non-textual objects.  But essentially,
> just think what it would take to match against the tree of match
> objects returned by a prior match instead of against a string.
> 
> : This requires a stateful lexer, because to work out the number of
> : "unindent" tokens on a line, it needs to know what the indentation
> : positions are. How would I write a P6 rule that defines <indent>
> : and <unindent> tokens? Alternatively (if a different approach is
> : needed) how would I use P6 to parse such a language?
> 
> I can think of two other approaches as well.  One would be to allow
> "pushback" on the queued input so that when we hit a line transition,
> we can immediately analyze the leading whitespace and replace it
> conceptually with the appropriate number of indent/unindent objects.
> It's not yet clear whether that is a good approach.  It might be
> rather inefficient to splice faked up objects into the middle of the
> input stream.  On the other hand, we don't actually have to splice
> the input, only pretend that we did.  And certainly, the Perl 5 lexer
> makes heavy use of this kind of we'll-fake-the-next-N-tokens queueing
> (though I actually botched the Perl 5 implementation of it by making
> it a stack instead of a queue).
> 
> My final idea is that you can treat it as a fancy kind of lookbehind
> assertion, provided you have some landmark that will stop the
> normal analysis from running over the newline boundary.  With the
> other approaches, you have a precomputed positive unindent token
> to function as a "stopper", but with this approach, the only thing
> you can depend on is the newline itself.  So when you first hit a
> newline, you don't progress beyond it, but instead look ahead at the
> indentation of the following line and queue up the right number of
> indent/unindent objects in your own data structure (as well as
> pushing or popping your stack of current indents appropriately so
> that the new indent is on top).  Then as long as there are queued up
> objects, the newline doesn't report itself as a newline, but as the
> next object, until you run out, and then the newline allows itself
> to be skipped, along with the subsequent whitespace.  (The
> difference between this and the previous approach is that you're not
> relying on the rules engine to keep track of the queueing for you.)
> 
> The very fact that I had to use a phrase like "positive unindent
> token" says a lot about why indentation as syntax is problematic in
> general.
> 
> Larry
> 
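The stateful first pass Dave describes might look like this in Python (a sketch; token names and the space-only indentation rule are illustrative assumptions, not anyone's actual implementation):

```python
def indent_tokens(lines):
    """First pass: turn leading whitespace into INDENT/UNINDENT tokens,
    which a later grammar pass can treat like ordinary block delimiters."""
    stack = [0]                        # open indentation widths (the state)
    for line in lines:
        text = line.lstrip(' ')
        width = len(line) - len(text)
        if width > stack[-1]:          # deeper than before: open one block
            stack.append(width)
            yield ('INDENT', None)
        while width < stack[-1]:       # shallower: may close several blocks
            stack.pop()
            yield ('UNINDENT', None)
        yield ('LINE', text)
    while stack[-1] > 0:               # close blocks still open at EOF
        stack.pop()
        yield ('UNINDENT', None)

toks = list(indent_tokens(["if foo", "    bar", "baz"]))
print(toks)
# [('LINE', 'if foo'), ('INDENT', None), ('LINE', 'bar'),
#  ('UNINDENT', None), ('LINE', 'baz')]
```

Note that a single dedented line can emit several UNINDENT tokens in a row, which is exactly the stack-of-levels behavior discussed above; a real lexer would also want to reject a dedent that lands between two recorded widths.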
