Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Patrick R. Michaud
On Fri, Sep 07, 2007 at 04:05:55PM -0600, Paul Seamons wrote:
> I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing 
> parens (?:).
> 
> It (<:ws>) also bears little similarity to any other regex construct - 
> although it looks a bit like a Perl 6 pair.

For completeness it may be worth pointing out that :i, :s, and :Perl5
are in fact valid regex constructs.  :-)

Pm


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Patrick R. Michaud
On Fri, Sep 07, 2007, Larry Wall writes:
> If we stick with +, one approach might be to simply disallow whitespace
> in composite character classes.

Of the choices presented thus far, I like this one the best.
Although I did like being able to stick whitespace in the
character classes for readability, such that losing the whitespace
in <+foo - [Jj] > would be a disappointment -- I still like <+foo>
as much as the other alternatives.

Even if we decide that <+foo> isn't the official non-capturing syntax,
we still have the case that <+foo> is effectively a non-capturing
form of <foo>.  I sorta liked that we were reducing two syntaxes
for the same thing ( <?foo> and <+foo> ) down to one, so adding
one back in feels funny.

I do agree that we may be getting a few too many +'s in our
patterns.  However, having just converted several grammars in Parrot 
languages to use the new <+foo> syntax, I was surprised at how 
few there actually were.  And many of the existing cases where 
I had previously used <?foo> didn't really change (or need to
change), because they were already zero-width things such as
, , , etc., and I felt it made
more sense to keep the <?foo> syntax anyway.

Of the non-<+foo> options given thus far, I like <~foo> and <.foo> 
(in that order).  I don't find ~ all that hard to type -- after 
all, we use the tilde quite frequently in things like Unix's 
"~username" syntax, in Perl 5's =~ operator, and even in Perl 
6 with the ~~ smart match operator.  Perhaps I would feel 
differently about tilde if I were on a non-US keyboard.

I agree that <:foo> should probably be reserved for something
having to do with pairs or adverbs.

I'm not at all a fan of <\ws>.

Anyway, those are my reactions, for whatever they're worth.

Pm


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Larry Wall
On Sat, Sep 08, 2007 at 12:12:10AM +0100, Nicholas Clark wrote:
: On Fri, Sep 07, 2007 at 03:50:09PM -0700, Larry Wall wrote:
: > I dunno, maybe <\ws> isn't so bad...
: 
: But as soon as I saw it I thought the same as you say in the paragraph above -
: in the context of a regexp (or string) \ makes me think that one character is
: being back-whacked, rather than it applying to the entire token.
: 
: I suspect my brain will think of rules like regexps. (But I could be wrong,
: and unlike quite a few people on this list, I've not written any yet, so my
: opinion might be of little value)

Well, we could go off in a TeXish direction and say that \foo is a
non-capturing <foo>, and \w, \d, etc. are just <w>, <d>, etc.  Then
your whitespace is just \ws, and your word boundary is just \wb.

That would simplify how you define your own \w sequences as well.

\xfe gets a little problematic under that view though, unless we
require all rules starting with x to be called .
Or require people to use \x[fe], which also kinda sux.

Larry


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Nicholas Clark
On Fri, Sep 07, 2007 at 03:50:09PM -0700, Larry Wall wrote:

> The first list is the ones I'm really considering, and of those, <.ws>
> is the easiest to type and gets out of the way of identifier visually.
> It also looks like a method call, which in fact it is.  <~ws> is hard
> to type, and <\ws> can be confused with \w.  The problem with <=foo>
> I already mentioned.  The only strangeness about <.foo> I see is that
> arguments would presumably continue to parse like ordinary
> assertions: <.foo bar> and <.foo: bar> might be misread.
> 
> I dunno, maybe <\ws> isn't so bad...

But as soon as I saw it I thought the same as you say in the paragraph above -
in the context of a regexp (or string) \ makes me think that one character is
being back-whacked, rather than it applying to the entire token.

I suspect my brain will think of rules like regexps. (But I could be wrong,
and unlike quite a few people on this list, I've not written any yet, so my
opinion might be of little value)

Nicholas Clark


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Larry Wall
On Fri, Sep 07, 2007 at 04:05:55PM -0600, Paul Seamons wrote:
: > Other available chars:
: >
: > <`ws>
: > <^ws>
: > <&ws>
: > <*ws>
: > <-ws>

I forgot we're using - already, so scratch that one...

: > <|ws>
: > <:ws>
: > <;ws>
: > 
: 
: I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing 
: parens (?:).

I'm not sure a resemblance to P5 syntax is really a recommendation... :)

: It (<:ws>) also bears little similarity to any other regex construct - 
: although it looks a bit like a Perl 6 pair.

Which might be a good argument for reserving the syntax for real
pairs somehow.  Also, pairs have special arguments, and people would
wonder what <:foo(...)> <:foo[...]>, <:foo{...}> and <:foo<...>> mean.
Not to mention <:!foo>.

I should have pointed out that I think all the candidates from the
last list are long shots for various reasons.  </ws> looks like a
closing tag. <*ws> is visually confusing with other * usages, and while
<^ws> implies some kind of negation culturally, it's a form of negation
we're trying to get away from, in favor of consistently using !.

The first list is the ones I'm really considering, and of those, <.ws>
is the easiest to type and gets out of the way of identifier visually.
It also looks like a method call, which in fact it is.  <~ws> is hard
to type, and <\ws> can be confused with \w.  The problem with <=foo>
I already mentioned.  The only strangeness about <.foo> I see is that
arguments would presumably continue to parse like ordinary
assertions: <.foo bar> and <.foo: bar> might be misread.

I dunno, maybe <\ws> isn't so bad...

Larry


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Paul Seamons
> Other available chars:
>
> <`ws>
> <^ws>
> <&ws>
> <*ws>
> <-ws>
> <|ws>
> <:ws>
> <;ws>
> 

I'd vote for <:ws> which is vaguely reminiscent of the former non-capturing 
parens (?:).

It (<:ws>) also bears little similarity to any other regex construct - 
although it looks a bit like a Perl 6 pair.

Paul


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Larry Wall
On Fri, Sep 07, 2007 at 02:45:52AM -0500, Patrick R. Michaud wrote:
: On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
: > Log:
: > old <?foo> is now <+foo> to suppress capture
: > new <?foo> now is zero-width like <!foo>
: 
: I really like the change from <?foo> to <+foo>, but I think there's
: a conflict (or at least some confusion) in the way the new spec is
: worded, especially as it relates to character class sets.

I'm actually still of two minds whether it's proper to overload <+foo>
like that, and what we end up with may well depend on revisions to the
binding syntax.  But it can be <+foo> for now, assuming we can deal with
the ambiguities you point out.  'Course, by the time we're done with
that, we might well decide <+foo> is a bad plan...

: Both old and new versions of S05 say:
: 
: If the first character after the identifier is whitespace, the
: subsequent text (following any whitespace) is passed as a regex, 
: so <foo bar> is more or less equivalent to <foo(/bar/)>.
: 
: In the previous version of S05, the non-capturing form of <foo bar>
: would be <?foo bar>.  Here, the whitespace after "foo" indicated
: that "bar" was to be parsed and passed to foo as a regex.
: 
: In the new version of S05, the non-capturing form of <foo bar>
: would seem to be <+foo bar>.  Okay, I can handle that.  However,
: S05 also says that " <foo+bar-baz> can be written as <+ foo + bar - baz> ".
: Presumably this second form would also allow "<+foo + bar - baz>",
: which seems to conflict slightly with the notion that <+foo bar>
: is the non-capturing form of <foo bar>.  In other words, the
: whitespace character following "<+foo" doesn't seem to be
: sufficient to indicate how the remainder is to be processed --
: we have to look beyond the whitespace for a leading plus or minus.

If we stick with +, one approach might be to simply disallow whitespace
in composite character classes.

: Perhaps S05 is addressing this when it says 
: 
: An initial identifier is taken as a character class, so the 
: first character after the identifier doesn't matter in this 
: case, and you can use whitespace however you like.
: 
: Here I find this wording very unclear -- it doesn't tell me 
: what is distinguishing the "doesn't matter in this case" part
: between <+foo + bar> and <+foo bar>.

What, me unclear?  How could that happen?  :-)

[Don't answer that...]

: Since the S05 spec has changed so that all punctuation is meta, 
: I'm thinking we may be able to simplify the spec altogether.
: Previously the "whitespace following the identifier" was
: used to distinguish <foo+bar> from <foo +bar>, or <foo-bar>
: from <foo -bar>.  Since it's now effectively impossible for
: a regex to begin with a bare plus or minus character, we may be
: able to alter the "whitespace following identifier" wording such
: that <foo+bar> and <foo +bar> are identical.  Perhaps
: something like:
: 
:   - if the character following the identifier is a left paren,
: it's a call
: 
: 
: <+foo('bar')>
: 
: 
:   - if the character following the identifier is a colon, the rest
: of the text (following any whitespace) is passed as a string
: 
: <foo: bar> # same as <foo('bar')>
: <+foo: bar>
: 
: 
:   - if the identifier is followed by a plus or minus (with optional
: intervening whitespace), it's a set of character classes
: 
: 
:   # same thing
: <+foo + baz - bar> # also the same
: 
:   - anything else following whitespace is a regex to be passed
: 
: <foo bar>  # same as <foo(/bar/)>
: <+foo bar> # same as <+foo(/bar/)>
:  # same as 

That's assuming we don't define any metasyntax that starts with + or
- in the future, such as bare +[ a..z ], or +[ ...] as a variant of
[...]+.  And while we could resolve the ambiguity of the second +
by fiat, it would probably be better if the ambiguity didn't arise
in the first place.  If <+foo ...> is going to change the parsing
of ...  at all, then it should probably do so consistently, which
means <+foo> is really a bad plan.  (Also, there are already too
many +'s in patterns.)  So while it's cute to generalize <+foo> to
"establish the initial universal set of matches", I suspect it's
likely to change to something else.  Possibilities I've been mulling:

<~ws>   # "I just want to match as a string"
<\ws>   # "Don't do the normal thing with the following"
<.ws>   # "Just call the ws method"
<=ws>   # "Bind to nothing", assuming  binds $

Damian points out that it's a little strange for = to enable binding
in the  case but disable it in the <=ws> case.  It would be
possible to make <=ws> mean  and  not capture at all.
Offhand I'd say that would be bad huffmanization, but I need to look
at STD some more.  It also depends on any post-binding syntax
resembling:

 -> $foo {...}

and whether that is deemed preferable to  or $foo= or
whatever.  (One nice thing about the post syntax is that we could know
for sure that we're creating a new 

Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Smylers
[EMAIL PROTECTED] writes:

> -A leading C<[> or C<+> indicates an enumerated character class.  Ranges
> +A leading C<[> indicates an enumerated character class.  Ranges
>  in enumerated character classes are indicated with "C<..>" rather than 
> "C<->".
>  
>   / <[a..z_]>* /
> - / <+[a..z_]>* /
> - / <+[ a..z _ ]>* /
> - / <+ [ a .. z _ ] >* /
>  
>  Whitespace is ignored within square brackets and after the initial C<+>.

Did you mean to remove "and after the initial C<+>" as well?

> + / <[ a..z _ ]>* /
> +

Simon
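
To make the quoted rule concrete (ranges written with ".." and whitespace ignored inside the square brackets), here is a rough Python sketch that translates a simple enumerated class into the equivalent Python character class. The function name p6_class_to_python is made up for illustration, and the sketch assumes only plain characters and ".." ranges appear in the class; it is not how any Perl 6 implementation works.

```python
import re

def p6_class_to_python(src: str) -> str:
    """Translate a simple class such as '<[ a..z _ ]>' into '[a-z_]'.

    Hypothetical helper: handles only literal characters and '..' ranges.
    """
    m = re.fullmatch(r"<\[(.*)\]>", src.strip())
    if m is None:
        raise ValueError(f"not an enumerated character class: {src!r}")
    body = re.sub(r"\s+", "", m.group(1))  # whitespace inside [...] is ignored
    body = body.replace("-", r"\-")        # a literal '-' must not become a range
    body = body.replace("..", "-")         # Perl 6 range '..' -> Python range '-'
    return "[" + body + "]"

pattern = p6_class_to_python("<[ a..z _ ]>") + "*"
```

With this, "<[a..z_]>" and "<[ a..z _ ]>" both come out as the same Python class, which is the point of the whitespace rule.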


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Patrick R. Michaud
Some other minor notes about the S05.pod update:

> +In particular, <?> also matches the null string, and <!> always fails.

Perhaps these should be quoted with "C<< ... >>" so that it's
clear that "<?>" and "<!>" are the tokens?  When looking at the
.pod file I had to think about it a couple of times to make sure
that it wasn't intending C<?> and C<!>.

> +Any atom that is quantified with a minimally match (using the C<?> modifier).

s/minimally/minimal/

> +Greedy quantifiers and characters classes do not terminate a token pattern.

s/characters/character/

Thanks,

Pm



Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-07 Thread Patrick R. Michaud
On Thu, Sep 06, 2007 at 05:12:03PM -0700, [EMAIL PROTECTED] wrote:
> Log:
> old <?foo> is now <+foo> to suppress capture
> new <?foo> now is zero-width like <!foo>

I really like the change from <?foo> to <+foo>, but I think there's
a conflict (or at least some confusion) in the way the new spec is
worded, especially as it relates to character class sets.

Both old and new versions of S05 say:

If the first character after the identifier is whitespace, the
subsequent text (following any whitespace) is passed as a regex, 
so <foo bar> is more or less equivalent to <foo(/bar/)>.

In the previous version of S05, the non-capturing form of <foo bar>
would be <?foo bar>.  Here, the whitespace after "foo" indicated
that "bar" was to be parsed and passed to foo as a regex.

In the new version of S05, the non-capturing form of <foo bar>
would seem to be <+foo bar>.  Okay, I can handle that.  However,
S05 also says that " <foo+bar-baz> can be written as <+ foo + bar - baz> ".
Presumably this second form would also allow "<+foo + bar - baz>",
which seems to conflict slightly with the notion that <+foo bar>
is the non-capturing form of <foo bar>.  In other words, the
whitespace character following "<+foo" doesn't seem to be
sufficient to indicate how the remainder is to be processed --
we have to look beyond the whitespace for a leading plus or minus.

Perhaps S05 is addressing this when it says 

An initial identifier is taken as a character class, so the 
first character after the identifier doesn't matter in this 
case, and you can use whitespace however you like.

Here I find this wording very unclear -- it doesn't tell me 
what is distinguishing the "doesn't matter in this case" part
between <+foo + bar> and <+foo bar>.

Since the S05 spec has changed so that all punctuation is meta, 
I'm thinking we may be able to simplify the spec altogether.
Previously the "whitespace following the identifier" was
used to distinguish <foo+bar> from <foo +bar>, or <foo-bar>
from <foo -bar>.  Since it's now effectively impossible for
a regex to begin with a bare plus or minus character, we may be
able to alter the "whitespace following identifier" wording such
that <foo+bar> and <foo +bar> are identical.  Perhaps
something like:

  - if the character following the identifier is a left paren,
it's a call


<+foo('bar')>


  - if the character following the identifier is a colon, the rest
of the text (following any whitespace) is passed as a string

<foo: bar> # same as <foo('bar')>
<+foo: bar>


  - if the identifier is followed by a plus or minus (with optional
intervening whitespace), it's a set of character classes


  # same thing
<+foo + baz - bar> # also the same

  - anything else following whitespace is a regex to be passed

<foo bar>  # same as <foo(/bar/)>
<+foo bar> # same as <+foo(/bar/)>
 # same as 
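
For illustration, the four rules above could be prototyped roughly as follows. This is only a sketch in Python, not PGE or any real Perl 6 parser: classify and its return labels are made-up names, and the input is assumed to be the text between the angle brackets, with an optional leading "+".

```python
import re

# Hypothetical helper: optional '+', optional whitespace, then an identifier.
_IDENT = re.compile(r"\+?\s*([A-Za-z_]\w*)")

def classify(body: str) -> str:
    """Decide how the text after the identifier would be treated."""
    m = _IDENT.match(body)
    if m is None:
        raise ValueError(f"no identifier in {body!r}")
    rest = body[m.end():]
    if rest.startswith("("):
        return "call"                    # <+foo('bar')>
    if rest.startswith(":"):
        return "string argument"         # <+foo: bar>
    stripped = rest.lstrip()
    if stripped[:1] in ("+", "-"):
        return "character class set"     # <+foo + baz - bar>
    if stripped and rest[0].isspace():
        return "regex argument"          # <+foo bar>
    if not stripped:
        return "plain subrule"           # <+foo>
    raise ValueError(f"unrecognized text after identifier: {rest!r}")
```

Note that the character-class check has to look past any whitespace before it can fall through to the regex case, so in this sketch the whitespace after the identifier no longer decides anything by itself.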

Pm


Re: [svn:perl6-synopsis] r14449 - doc/trunk/design/syn

2007-09-06 Thread jerry gay
On 9/6/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> @@ -1254,6 +1273,17 @@
>
>  =item *
>
> +A leading C<?> indicates a positive zero-width assertion, and like C<!>
> +merely reparses the rest of the assertion recursively as if the C<?>
> +were not there.  In addition to forcing zero-width, it also suppresses
> +any named capture:
> +
> +<alpha>  # match a letter and capture in $<alpha>
> +<+alpha> # match a letter, don't capture
> +<?alpha> # much null before a letter, don't capture
> +
> +=item *
> +
>  A leading C<~~> indicates a recursive call back into some or all of
>  the current rule.  An optional argument indicates which subpattern
>  to re-use, and if provided must resolve to a single subpattern.

s/much/match/
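
As a rough analogy only, the three cases in the quoted diff (capture, match without capture, zero-width match without capture) have familiar counterparts in Python's re module. The sketch below uses [A-Za-z] as an ASCII stand-in for the alpha rule, which is an assumption; Python groups are not Perl 6 subrules and the semantics differ in other ways.

```python
import re

text = "a1"

# Capturing: the letter is consumed and stored under the name "alpha".
m = re.match(r"(?P<alpha>[A-Za-z])", text)
captured = (m.end(), m.groupdict())

# Non-capturing: the letter is consumed, but nothing is stored.
m = re.match(r"(?:[A-Za-z])", text)
non_capturing = (m.end(), m.groupdict())

# Zero-width: succeeds before a letter, consumes nothing, stores nothing.
m = re.match(r"(?=[A-Za-z])", text)
zero_width = (m.end(), m.groupdict())
```

Here m.end() shows how much input each form consumed and groupdict() shows what, if anything, was captured.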