Re: S05 question

2004-12-10 Thread John Macdonald
On Thu, Dec 09, 2004 at 11:18:34AM -0800, Larry Wall wrote:
 On Wed, Dec 08, 2004 at 08:24:20PM -0800, Ashley Winters wrote:
 : I'm still going to prefer using :=, simply as a good programming
 : practice. My mind sees a big difference between building a parse-tree
 : object and just grepping for some word I want in a string. Within a
 : rule{} block, there is no place except the rule object to keep your
 : data (hypothetically -- haha), so it makes sense to have everything
 : capture unless otherwise specified. There's no such limitation in a
 : regular code block, so I don't see the need.
 
 Since regex results are lexically scoped in Perl 6, in a regular
 code block we can do static analysis and determine whether there's
 any possibility that $foo is referenced at all, and optimize it
 away in many cases, if it turns out to be high overhead.  But as Patrick
 points out, so far capture seems pretty cheap.

It might turn out to be worth optimizing only when ALL of the
capture blocks are unused - the saving from avoiding setup
costs together with avoiding the (too small to be a bother
by themselves) incremental costs, might be significant when
taken together.

-- 


Re: S05 question

2004-12-09 Thread Patrick R. Michaud
On Thu, Dec 09, 2004 at 10:52:54AM +, Matthew Walton wrote:
 Of course, it then begs the question about
 
   word ws $foo ws number
 
 if we're thinking of parallels with qw//-like constructs, which I 
 certainly am. I'm not quite sure what that would do, as it collides 
 slightly with the existing rule match syntax (which I quite like), and 
 thus it may already have a meaning.

This already has a meaning: it calls the word assertion with the
(rule) expression /ws $foo ws number/ as an argument.  At least it's
that way unless/until Larry changes (changed?) it.

Pm


Re: S05 question

2004-12-09 Thread Patrick R. Michaud
On Wed, Dec 08, 2004 at 08:24:20PM -0800, Ashley Winters wrote:
 
 I was working on the (possibly misguided) assumption that there's a
 cost to capturing, and that perhaps aggressive capturing isn't worth
 having on in a one-liner. Some deep part of my mind remembers $`
 being bad, I think. If there's no consequence to having capture being
 on, then ignoring it is fine. I don't have a problem with that. As I
 said before, <?foo> reads fine to me.

At least in the current implementation of PGE there's not a big
cost to capturing of any sort.  Each capture is held as a pair of
(start,end) offsets into the target string, so there's no string 
copying or other overhead until the captured item is actually
referred to.  It's even easy to determine $` as being the start 
of the string up to the beginning offset of the $0 capture.  (Yes,
the Perl 5 docs indicate there's a cost to $` that's incurred for all
regexps in a program once $` is used, but I don't think that will 
translate over to PGE.)
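
As a rough illustration of the offset-pair idea, here is a sketch in Python rather than PGE code. The LazyCapture class and its method names are hypothetical and only model the description above: a capture is two integers, and no substring is built until it is asked for.

```python
import re


class LazyCapture:
    """Hypothetical model of a capture held as (start, end) offsets."""

    def __init__(self, target, start, end):
        self.target = target  # reference to the target string, not a copy
        self.start = start
        self.end = end

    def text(self):
        # The captured substring is only materialized on request.
        return self.target[self.start:self.end]

    def prematch(self):
        # Analogue of $`: everything before the start offset of the match.
        return self.target[:self.start]


def capture_whole_match(pattern, target):
    """Return a LazyCapture for the first match of pattern, or None."""
    m = re.search(pattern, target)
    if m is None:
        return None
    start, end = m.span(0)
    return LazyCapture(target, start, end)


cap = capture_whole_match(r"\d+", "abc 123 def")
print(cap.start, cap.end)
print(repr(cap.prematch()), repr(cap.text()))
```

Under this model the prematch costs nothing until it is read, which is the point being made about $`.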

That might change, of course, especially as we add the ability 
to modify the target string in the middle of the match.  But even
then we may be able to keep the offset pairs as a useful optimization.

At the moment the bigger cost is calling the subrule itself -- and
even here it's basically the equivalent of a method or subroutine call
(actually, coroutine calls), since a called rule maintains its own
match state just like any other match.

Pm


Re: S05 question

2004-12-09 Thread Larry Wall
On Wed, Dec 08, 2004 at 08:24:20PM -0800, Ashley Winters wrote:
: I'm still going to prefer using :=, simply as a good programming
: practice. My mind sees a big difference between building a parse-tree
: object and just grepping for some word I want in a string. Within a
: rule{} block, there is no place except the rule object to keep your
: data (hypothetically -- haha), so it makes sense to have everything
: capture unless otherwise specified. There's no such limitation in a
: regular code block, so I don't see the need.

Since regex results are lexically scoped in Perl 6, in a regular
code block we can do static analysis and determine whether there's
any possibility that $foo is referenced at all, and optimize it
away in many cases, if it turns out to be high overhead.  But as Patrick
points out, so far capture seems pretty cheap.

Larry


Re: S05 question

2004-12-08 Thread Larry Wall
On Tue, Dec 07, 2004 at 10:36:53PM -0800, Larry Wall wrote:
: But somehow I expect that when someone writes (<foo>) they probably
: usually meant (<«foo»>).

If we're going to stick with the notion that <foo> captures and something
else doesn't, I'm beginning to think that the other thing isn't <«foo»> for
a couple of reasons.  First, if other languages are going to borrow this
notation, they're probably not going to buy into the French quotes.  Second,
I can think of several other possible uses for the French quotes to cure
perceived ills such as the (...) vs {...} confusion.  Third, it now
bothers me to have a ! without a ?.  So what if <«foo»> is instead written
<?foo>, meaning you only want to evaluate its success.  (Unlike <!foo>,
it's not zero-width, but that's just how success/failure works.)  So we'd
get things like

    / $bar := [ (<?ident>) = (\N+) ]* /

And people would have to get used to seeing ? as non-capturing assertions:

    <?before ...>
    <?after ...>
    <?ws>
    <?sp>
    <?null>

This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
I like it.  It's pretty easy to type, at least on my keyboard.

Now suppose that we extend that "I am a boolean" feeling to

    <?{ code }>

which might take the place of the confusing <(...)>, and make consistent
the notion that we always use {...} to invoke real code.

: : Or is it that hypotheticals only bind to things captured by parens?
: : If so, it might need clarification (or perhaps I'm overlooking the part
: : that makes it clear).
: 
: No, I think you just found a blind spot in the design.

I think I'm leaning toward the idea that anything in angles that
begins alpha is a capture to just the alpha part, so the ? prefix is 
merely a no-op that happens to make the assertion not start with an
alpha.  Interestingly, that gives these implicit bindings:

    <after ...>     $after    $`
    <before ...>    $before   $'

Though that's an argument for changing them to <pre ...> and <post ...>,
I suppose, since if users are going to refer to $after in their main
program, it doesn't look like a declarative assertion anymore.

Another problem we've run into is naming if there are multiple assertions
of the same name.  If the capture name is just the alpha part of the
assertion, then we could allow an optional number, and still recognize
it as a <ws>:

    <ws1> <ws2> <ws3>

Except I can well imagine people wanting numbered rules.  Drat.  Could
force people to say <ws_1> if they want that, I suppose.

Or we could use some standard delim for that:

    <ws-1> <ws-2> <ws-3>

which is vaguely reminiscent of our version syntax.  Indeed, if we
had quantifications, you might well want to have wildcards <ws-*> and
let the name be filled in rather than autogenerating a list.  But maybe
we just stick with lists in that case.

For captures of non-alpha assertions, we could say that ? is the same
as true (just as with regular operators), and so

true-3 +alpha-[aeiou]

would capture to $true-3.  (And one could always do an explicit binding
for a different name.)

Actually, I think people would find $match-3 more meaningful than
$true-3.

I'm still thinking about what «...» might mean, if anything.  Bonus points
for interpolative and/or word-splitty.

Anyway, that's where I am this week/day/hour/minute/second.

Larry


Re: S05 question

2004-12-08 Thread Austin Hastings
Larry Wall wrote:
Another problem we've run into is naming if there are multiple assertions
of the same name.  If the capture name is just the alpha part of the
assertion, then we could allow an optional number, and still recognize
it as a <ws>:
   <ws1> <ws2> <ws3>
Except I can well imagine people wanting numbered rules.  Drat.  Could
force people to say <ws_1> if they want that, I suppose.
Or we could use some standard delim for that:
   <ws-1> <ws-2> <ws-3>
which is vaguely reminiscent of our version syntax.  Indeed, if we
had quantifications, you might well want to have wildcards <ws-*> and
let the name be filled in rather than autogenerating a list.  But maybe
we just stick with lists in that case.
For captures of non-alpha assertions, we could say that ? is the same
as true (just as with regular operators), and so
   true-3 +alpha-[aeiou]
would capture to $true-3.  (And one could always do an explicit binding
for a different name.)
Actually, I think people would find $match-3 more meaningful than
$true-3.
 

PHP's use of $array[] as push might work for this:
true[] +alpha-[aeiou]
or
@true +alpha-[aeiou]
or
true=1.. +alpha-[aeiou]
or
true@ +alpha-[aeiou]
I like the idea of being able to continue versus chunk patterns. How
do you say "This is a continuation of the other thing" versus "This
is a separate thing"?

=Austin


Re: S05 question

2004-12-08 Thread Patrick R. Michaud
On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
 And people would have to get used to seeing ? as non-capturing assertions:
 <?before ...>
 <?after ...>
 <?ws>
 <?sp>
 <?null>
 This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
 I like it.  It's pretty easy to type, at least on my keyboard.

FWIW, for some reason in rule contexts I tend to conflate
"I am a boolean" feelings with "zero-width assertion", so that each
of those look vaguely to me as though I'm testing a zero-width
proposition and not consuming any text.  And I still tend to think of
'?' in its "zero or one matches" or "minimal match" connotations.
Oh well, I suppose I could get used to that.

 Now suppose that we extend that "I am a boolean" feeling to
 <?{ code }>
 which might take the place of the confusing <(...)>, and make consistent
 the notion that we always use {...} to invoke real code.

Hmm, this is nice, however.

 Another problem we've run into is naming if there are multiple assertions
 of the same name.  If the capture name is just the alpha part of the
 assertion, then we could allow an optional number, and still recognize
 it as a <ws>:
 <ws1> <ws2> <ws3>
 Except I can well imagine people wanting numbered rules.  Drat.  Could
 force people to say <ws_1> if they want that, I suppose.

I had been thinking that

    /<ws> foo <ws> bar/

would simply cause $ws to be a list of captured elements, similar to
what might happen for $1 in

    / [ (.*?) , ]* /

If someone really needs the contents of the first and second <ws>, they
could do

   (<ws>) foo (<ws>)

and get them as $1 and $2.  But, seeing this tells me that perhaps
<(rule)> should be used for capturing rules, analogous to the
capturing parens, and leave <rule> to be the non-capturing version.
But maybe that's anti-Huffman overall.  Maybe the parens could also
help for disambiguating

   <(ws)> foo <(ws)>

so that we end up with $/ws[1], $/ws[2], etc.  But then we might
have to always subscript our named captures, which is icky, or maybe
we'd only make $/ws act like a list when there's more than one
capturing <(ws)> in the rule.

I dunno.  I kinda like <(rule)> for capturing, but maybe it just
doesn't work.

Pm


Re: S05 question

2004-12-08 Thread Luke Palmer
Larry Wall writes:
 If we're going to stick with the notion that <foo> captures and
 something else doesn't, I'm beginning to think that the other thing
 isn't <«foo»> for a couple of reasons.

I just sat down to say the exact same thing.  I'm glad you beat me to
it.

 And people would have to get used to seeing ? as non-capturing assertions:
 
 <?before ...>
 <?after ...>
 <?ws>
 <?sp>
 <?null>
 
 This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
 I like it.  It's pretty easy to type, at least on my keyboard.

Yeah, I like it pretty well too.  Better than the French quotes for
sure.

 Now suppose that we extend that "I am a boolean" feeling to
 
 <?{ code }>
 
 which might take the place of the confusing <(...)>, and make consistent
 the notion that we always use {...} to invoke real code.

Hmm...  I'm just so attached to <(...)>.  I find it quite beautiful.  It
also somehow communicates the feeling "you shouldn't be putting
side-effects here".

 I think I'm leaning toward the idea that anything in angles that
 begins alpha is a capture to just the alpha part, so the ? prefix is
 merely a no-op that happens to make the assertion not start with an
 alpha.  Interestingly, that gives these implicit bindings:
 
 <after ...>     $after    $`
 <before ...>    $before   $'

I don't quite follow.  Wouldn't that mean that these guys would get
clobbered if you used lookaheads or lookbehinds in your rules?

 Or we could use some standard delim for that:
 
 <ws-1> <ws-2> <ws-3>
 
 which is vaguely reminiscent of our version syntax.  Indeed, if we
 had quantifications, you might well want to have wildcards <ws-*> and
 let the name be filled in rather than autogenerating a list.  But
 maybe we just stick with lists in that case.

I can imagine this being a lot cleaner if the thing after the dash can
be any sort of identifier:

ws-indent if ?ws condition ws-comment

On the other hand, it could be misleading, since the standard naming of
BNF uses dashes instead of underscores.  I don't think it should be a
big problem though.

 I'm still thinking about what «...» might mean, if anything.  Bonus
 points for interpolative and/or word-splitty.

Yeah... umm... nope.  I got nothin.

Luke


Re: S05 question

2004-12-08 Thread Ashley Winters
On Wed, 8 Dec 2004 08:19:17 -0800, Larry Wall wrote:
 / $bar := [ (<?ident>) = (\N+) ]* /

You know, to be honest I don't know that I want rules in one-liners to
capture by default. I certainly want them to capture in rules, though.

 And people would have to get used to seeing ? as non-capturing assertions:
 
 <?before ...>
 <?after ...>
 <?ws>
 <?sp>
 <?null>
 
 This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
 I like it.  It's pretty easy to type, at least on my keyboard.

I like it. It reads to me as "if before ...", "if null". Sounds good.

 I think I'm leaning toward the idea that anything in angles that
 begins alpha is a capture to just the alpha part, so the ? prefix is
 merely a no-op that happens to make the assertion not start with an
 alpha.  Interestingly, that gives these implicit bindings:
 
 <after ...>     $after    $`
 <before ...>    $before   $'

Again, I don't see the utility of that in a one-liner. In a grammar,
you would create a real rule which would assert <after ...> and
capture the result in a reasonable name.

 Anyway, that's where I am this week/day/hour/minute/second.

I'm thinking capturing rules should be default in rules, where they're
downright useful. Your hour/minute/second comment brings up parsing
ISO time:

grammar ISO8601::DateTime {
    rule year { \d<4> }
    rule month { \d<2> }
    rule day { \d<2> }
    rule hour { \d<2> }
    rule minute { \d<2> }
    rule second { \d<2> }
    rule fraction { \d+ }

    rule date { <year> -? <month> -? <day> }
    rule time { <hour> \:? <minute> \:? <second> [\. <fraction>]? }
    rule datetime { <date> T <time> }
}

For a grammar, that works perfectly!

In a one-liner, I'd rather just use:

$datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

and specify the vars I want to save directly in my own scope.
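
For comparison, the same structure can be sketched with ordinary named groups in Python's re module. This is only an analogy, not Perl 6 semantics: it assumes each of the year/month/... rules matches a fixed number of digits, and the constant names (YEAR, DATE, DATETIME, and so on) are made up for the example.

```python
import re

# One named group per "rule"; each fragment is a plain string so that the
# larger patterns can be composed from the smaller ones.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>\d{2})"
DAY = r"(?P<day>\d{2})"
HOUR = r"(?P<hour>\d{2})"
MINUTE = r"(?P<minute>\d{2})"
SECOND = r"(?P<second>\d{2})"
FRACTION = r"(?P<fraction>\d+)"

DATE = rf"{YEAR}-?{MONTH}-?{DAY}"
TIME = rf"{HOUR}:?{MINUTE}:?{SECOND}(?:\.{FRACTION})?"
DATETIME = re.compile(rf"{DATE}T{TIME}")

m = DATETIME.fullmatch("2004-12-08T08:19:17.25")
if m is not None:
    # Every named piece is captured by default, keyed by its name.
    print(m.groupdict())
```

Here every subpattern captures under its own name without any explicit binding, which is the grammar-style behaviour; the one-liner style corresponds to pulling out only m.group("year") and m.group("month") and ignoring the rest.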

Ashley Winters


Re: S05 question

2004-12-08 Thread Luke Palmer
Ashley Winters writes:
 I'm thinking capturing rules should be default in rules, where they're
 downright useful. Your hour/minute/second comment brings up parsing
 ISO time:
 
 grammar ISO8601::DateTime {
     rule year { \d<4> }
     rule month { \d<2> }
     rule day { \d<2> }
     rule hour { \d<2> }
     rule minute { \d<2> }
     rule second { \d<2> }
     rule fraction { \d+ }
 
     rule date { <year> -? <month> -? <day> }
     rule time { <hour> \:? <minute> \:? <second> [\. <fraction>]? }
     rule datetime { <date> T <time> }
 }
 
 For a grammar, that works perfectly!

Yep. 

 In a one-liner, I'd rather just use:
 
 $datetime ~~ /$year := (\d+) -? $month := (\d+) -? ./

Then go ahead and use that.  If you're going to use subrules, you can
either use the <?subrule> form or just the regular old <subrule> form
and ignore the result.  There's nothing forcing you to pay attention to
those.  The number variables only get incremented when you use
parentheses.  I'd suspect that the return value of a rule only accounts
for parenthesized captures as well.

Or are you asking something different than that?

Luke


Re: S05 question

2004-12-08 Thread Larry Wall
On Wed, Dec 08, 2004 at 11:09:30AM -0700, Patrick R. Michaud wrote:
: On Wed, Dec 08, 2004 at 08:19:17AM -0800, Larry Wall wrote:
:  And people would have to get used to seeing ? as non-capturing assertions:
:  <?before ...>
:  <?after ...>
:  <?ws>
:  <?sp>
:  <?null>
:  This has a rather Ruby-esque "I am a boolean" feeling to it.  I think
:  I like it.  It's pretty easy to type, at least on my keyboard.
: 
: FWIW, for some reason in rule contexts I tend to conflate 
: "I am a boolean" feelings with "zero-width assertion", so that each
: of those look vaguely to me as though I'm testing a zero-width
: proposition and not consuming any text.  And I still tend to think of
: '?' in its "zero or one matches" or "minimal match" connotations.
: Oh well, I suppose I could get used to that.

Yes, there are those interferences, which was one of the reasons for
removing ? the last time we had it in that position (albeit on the
captures rather than the non-captures).  I think we'll have to let
it set a while to see how it feels in this role.  For the purpose of
being a non-alpha no-op, any other non-alpha character would do as well,
so maybe the "I am a boolean" feeling is not that useful.

:  Now suppose that we extend that "I am a boolean" feeling to
:  <?{ code }>
:  which might take the place of the confusing <(...)>, and make consistent
:  the notion that we always use {...} to invoke real code.
: 
: Hmm, this is nice, however.

In some ways, and not so nice in others, as Luke pointed out.

:  Another problem we've run into is naming if there are multiple assertions
:  of the same name.  If the capture name is just the alpha part of the
:  assertion, then we could allow an optional number, and still recognize
:  it as a <ws>:
:  <ws1> <ws2> <ws3>
:  Except I can well imagine people wanting numbered rules.  Drat.  Could
:  force people to say <ws_1> if they want that, I suppose.
: 
: I had been thinking that 
: 
: /<ws> foo <ws> bar/
: 
: would simply cause $ws to be a list of captured elements, similar to 
: what might happen for $1 in 
: 
: / [ (.*?) , ]* /

That's what happens by default whenever there is a name conflict.  This
would just be a way of giving a rule a long name as well as a short one,
much like abscomplex is the long name of abs when dispatched on a
complex number, whereas abs is just the set of all abs() multis, if
there is such a beastie.
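
To make that default concrete, here is a small sketch in Python rather than in rule syntax. The collect_named helper is hypothetical; it only models the behaviour described above, where a second capture under an already-used name extends a list instead of overwriting the first value.

```python
import re
from collections import defaultdict


def collect_named(pattern, target):
    """Gather every named capture across all matches into per-name lists."""
    captures = defaultdict(list)
    for m in re.finditer(pattern, target):
        for name, value in m.groupdict().items():
            if value is not None:
                # A repeated name appends to the list rather than clobbering.
                captures[name].append(value)
    return dict(captures)


result = collect_named(r"(?P<ws>\s+)|(?P<word>\w+)", "foo  bar baz")
print(result["ws"])
print(result["word"])
```

Individual elements are then reached by position, for example result["ws"][0], which is the subscripting that the longer hyphenated names are meant to avoid.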

: If someone really needs the contents of the first and second ws, they
: could do
: 
:    (<ws>) foo (<ws>)
: 
: and get them as $1 and $2.  But, seeing this tells me that perhaps
: <(rule)> should be used for capturing rules, analogous to the
: capturing parens, and leave <rule> to be the non-capturing version.
: But maybe that's anti-Huffman overall.  Maybe the parens could also
: help for disambiguating
: 
:    <(ws)> foo <(ws)>
: 
: so that we end up with $/ws[1], $/ws[2], etc.  But then we might
: have to always subscript our named captures, which is icky, or maybe
: we'd only make $/ws act like a list when there's more than one
: capturing <(ws)> in the rule.
: 
: I dunno.  I kinda like <(rule)> for capturing, but maybe it just
: doesn't work.

I thought about that a long time, which was part of the reason I also
thought about freeing up <(...)>.  But it just seems a little icky
to mix together the named captures and numbered captures visually if
not semantically.  It starts not being at all clear which parentheses
count and which ones not.  Which is perhaps another reason for changing
current <(...)> to <?{...}>.

We could, I suppose use a subscript inside:

    <ws[0]> foo <ws[1]>
    <ws«first»> foo <ws«second»>

but then you'd reference it as

$ws[0]
$wsfirst

which is a gratuitous difference, and suffers the same problem as
the parentheses in confusing real arrays/hashes with sorta fake ones.
So I think we'll stick with the hyphen names for now, which have the
benefit of looking the same and not sending us to bracket heaven.

    <ws-1> foo <ws-2>
    <ws-first> foo <ws-second>

$ws-1
$ws-first

Larry


Re: S05 question

2004-12-07 Thread Larry Wall
On Tue, Dec 07, 2004 at 12:11:18PM -0700, Patrick R. Michaud wrote:
: I'm reviewing the updated S05 (2 Dec 2004) and ran across this
: in the Hypothetical Variables section:
: 
: # Pairs of repeated captures can be bound to hashes:
: 
: / %options := [ (<ident>) = (\N+) ]* /
: 
: Actually, I see three captures there, so should this instead read...?
: 
: / %options := [ (<«ident»>) = (\N+) ]* /

Probably--that was the intent.  Or maybe that style of capture
ignores rule captures.  (If we do allow rule captures, we have to
worry about setting up a hash that is indexed by object rather than
by string, which seems not terribly useful for grammar reductions.)
Or maybe if there are more than two captures in the brackets, the
first becomes the key and the rest of them become a list value, in
which case ident's $/ could be the first element and \N+ the second.

But somehow I expect that when someone writes (<foo>) they probably
usually meant (<«foo»>).  On the other hand, it's an interesting idiom
if you really mean that you want the key to be the text value of the
match, and the value to be the object value of the match.

: Or is it that hypotheticals only bind to things captured by parens?
: If so, it might need clarification (or perhaps I'm overlooking the part
: that makes it clear).

No, I think you just found a blind spot in the design.

: A similar question arises a bit later, with
: 
: And this puts a list of lists:
: 
: / $bar := [ (<ident>) = (\N+) ]* /
: 
: Is the <ident> capture part of the list of lists that goes into $bar?

Wasn't intended to be.  I think I just missed changing it to <«ident»> in
the GBS.

Larry