Re: pattern alternation (was Re: How are ...)

2010-08-06 Thread Moritz Lenz


Darren Duncan wrote:
 David Green wrote:
 On 2010-08-05, at 8:27 am, Aaron Sherman wrote:
 On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
 I see this particular thinko a lot, though. Maybe some Perl 6 lint tool
 or another will detect when you have a regex containing ^ at its start, $
 at the end, | somewhere in the middle, and no [] to disambiguate.
 
 I think conceptually the beginning and the end of a string feels like a
 bracketing construct (only without symmetrical symbols).  At least that seems
 to be my instinct.  Well, it doesn't in / ^foo | ^bar | ^qux /, but in
 something like /^ foo|bar $/, the context immediately implies a higher
 precedence for ^ and $.  Maybe something like // foo|bar // could work as a
 bracketing version?
 
 Personally, I had always considered the ^ and $ to be the lowest precedence 
 things in a pattern. 

Metacharacters don't have a precedence of their own - concatenation does.

Cheers,
Moritz


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Patrick R. Michaud
On Thu, Aug 05, 2010 at 12:29:38AM -0700, Darren Duncan wrote:
 Carl Mäsak wrote:
 Darren ():
 Read what I said again.  I was proposing that the namespace comprised of
 names matching a pattern like this:
 
  /^ <[A..Z]>+ | <[a..z]>+ $/
 
 /^ [<[A..Z]>+ | <[a..z]>+] $/
 
 Are the square brackets necessary when the pattern doesn't contain
 anything other than the alternatives?

In this case yes -- the original pattern without the square brackets
would act like:

/ [^ <[A..Z]>+] | [<[a..z]>+ $] /

In other words, the original pattern says starting with uppercase
or ending with lowercase.
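
The same precedence trap exists in most regex dialects, so it can be checked outside Perl 6. What follows is only a sketch in Python's `re` module, used as a stand-in: `(?:...)` plays the role of Perl 6's non-capturing `[...]`, and the sample strings are made up for illustration.

```python
import re

# Without grouping, alternation splits the whole pattern in two:
# "uppercase run at the start" OR "lowercase run at the end".
loose = re.compile(r"^[A-Z]+|[a-z]+$")

# With grouping, both anchors apply to whichever alternative matches.
tight = re.compile(r"^(?:[A-Z]+|[a-z]+)$")

# Hypothetical sample names, chosen to separate the two readings.
samples = ["FOO", "foo", "Foo9", "9foo", "FooBar"]

loose_hits = [s for s in samples if loose.search(s)]
tight_hits = [s for s in samples if tight.search(s)]

print(loose_hits)
print(tight_hits)
```

The ungrouped pattern accepts every sample, because each one either starts with an uppercase letter or ends with a lowercase one; the grouped pattern accepts only the all-uppercase and all-lowercase names.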

Pm



Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Carl Mäsak
Darren (), Carl (), Darren (), Patrick ():
 Read what I said again.  I was proposing that the namespace comprised of
 names matching a pattern like this:
 
  /^ <[A..Z]>+ | <[a..z]>+ $/
 
 /^ [<[A..Z]>+ | <[a..z]>+] $/

 Are the square brackets necessary when the pattern doesn't contain
 anything other than the alternatives?

 In this case yes -- the original pattern without the square brackets
 would act like:

    / [^ <[A..Z]>+] | [<[a..z]>+ $] /

 In other words, the original pattern says starting with uppercase
 or ending with lowercase.

I see this particular thinko a lot, though. Maybe some Perl 6 lint
tool or another will detect when you have a regex containing ^ at its
start, $ at the end, | somewhere in the middle, and no [] to
disambiguate.
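
A rough idea of what such a lint check could look like, sketched in Python against Python-style pattern strings rather than Perl 6 syntax. The function name `looks_ambiguous` is hypothetical, and the check is a heuristic: it does not handle every corner of the syntax (an escaped `$` at the end, or a `]` appearing first inside a character class, for instance).

```python
def looks_ambiguous(pattern: str) -> bool:
    """Heuristic: ^ first, $ last, and a | outside any group or class."""
    if not (pattern.startswith("^") and pattern.endswith("$")):
        return False
    depth = 0          # nesting depth of (...) groups
    in_class = False   # currently inside a [...] character class
    i = 1
    body_end = len(pattern) - 1   # index of the trailing $
    while i < body_end:
        ch = pattern[i]
        if ch == "\\":
            i += 2     # skip the backslash and the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True   # top-level alternation between the anchors
        i += 1
    return False


print(looks_ambiguous(r"^[A-Z]+|[a-z]+$"))
print(looks_ambiguous(r"^(?:[A-Z]+|[a-z]+)$"))
```

A pattern like `^foo|^bar` is deliberately not flagged, since there each alternative carries its own anchor.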

// Carl


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:

 Darren (), Carl (), Darren (), Patrick ():

  In this case yes -- the original pattern without the square brackets
  would act like:
 
 / [^ <[A..Z]>+] | [<[a..z]>+ $] /
 
  In other words, the original pattern says starting with uppercase
  or ending with lowercase.

 I see this particular thinko a lot, though. Maybe some Perl 6 lint
 tool or another will detect when you have a regex containing ^ at its
 start, $ at the end, | somewhere in the middle, and no [] to
 disambiguate.



You know, this problem would go away, almost entirely, if we had a :f[ull]
adverb for regex matching that imposed ^[...]$ around the entire match. Then
your code becomes:

  m:f/<[A..Z]>+|<[a..z]>+/

for grins, :f[ull]l[ine] could use ^^ and $$.

I suspect :full would almost always be associated with TOP, in fact. Boy am
I tired of typing ^ and $ in TOP ;-)

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Jon Lang
Aaron Sherman wrote:
 You know, this problem would go away, almost entirely, if we had a :f[ull]
 adverb for regex matching that imposed ^[...]$ around the entire match. Then
 your code becomes:

  m:f/<[A..Z]>+|<[a..z]>+/

 for grins, :f[ull]l[ine] could use ^^ and $$.

 I suspect :full would almost always be associated with TOP, in fact. Boy am
 I tired of typing ^ and $ in TOP ;-)

The regex counterpart of C<say $x> vs. C<print "$x\n">.  Yes,
this would indeed solve a lot of problems.  It also reflects a
tendency in some regular expression engines out there to automatically
impose full string matching (i.e., an implicit ^ at the start and $ at
the end).

That said: for mnemonic purposes, I'd be inclined to have :f do
/^[$pattern]$/, while :ff does /^^[$pattern]$$/.
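
For comparison, here is roughly how that wrapping looks in an engine that already exists. This is a sketch in Python's `re`, not Perl 6; the helper names `full` and `full_line` are made up to mirror the proposed :f and :ff, and `(?:...)` stands in for the `[...]` group.

```python
import re

def full(pattern: str) -> re.Pattern:
    """Anchor a pattern to the whole string, like the proposed :f."""
    # \A and \Z are strict string anchors; the group keeps any | inside them.
    return re.compile(r"\A(?:" + pattern + r")\Z")

def full_line(pattern: str) -> re.Pattern:
    """Anchor a pattern to a whole line, like the proposed :ff."""
    # With re.MULTILINE, ^ and $ also match at line boundaries.
    return re.compile(r"^(?:" + pattern + r")$", re.MULTILINE)

word = r"[A-Z]+|[a-z]+"
text = "12\nfoo\n34"

print(bool(full(word).search("foo")))
print(bool(full(word).search("Foo9")))
print(bool(full(word).search(text)))
print(bool(full_line(word).search(text)))
```

Python also ships `re.fullmatch`, which is one instance of the "implicit ^ and $" behaviour mentioned above.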

-- 
Jonathan Dataweaver Lang


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread David Green
On 2010-08-05, at 8:27 am, Aaron Sherman wrote:
 On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
 
 I see this particular thinko a lot, though. Maybe some Perl 6 lint tool or 
 another will detect when you have a regex containing ^ at its start, $ at 
 the end, | somewhere in the middle, and no [] to disambiguate.

I think conceptually the beginning and the end of a string feels like a 
bracketing construct (only without symmetrical symbols).  At least that seems 
to be my instinct.  Well, it doesn't in / ^foo | ^bar | ^qux /, but in 
something like /^ foo|bar $/, the context immediately implies a higher 
precedence for ^ and $.  Maybe something like // foo|bar // could work as a 
bracketing version?

 You know, this problem would go away, almost entirely, if we had a :f[ull] 
 adverb for regex matching that imposed ^[...]$ around the entire match. 

I was thinking of that too.

 I suspect :full would almost always be associated with TOP, in fact. Boy am
 I tired of typing ^ and $ in TOP ;-)

Does it make sense for ^[...]$ to be assumed in TOP by default?  (Though not 
necessary if there's a shortcut like //...//.)


-David



Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Patrick R. Michaud
On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
 On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
  I see this particular thinko a lot, though. Maybe some Perl 6 lint
  tool or another will detect when you have a regex containing ^ at its
  start, $ at the end, | somewhere in the middle, and no [] to
  disambiguate.
 
 You know, this problem would go away, almost entirely, if we had a :f[ull]
 adverb for regex matching that imposed ^[...]$ around the entire match. Then
 your code becomes:
 
   m:f/<[A..Z]>+|<[a..z]>+/

There's a version of this already.  Matching against an explicit 'regex', 
'token', or 'rule' automatically anchors it on both ends.  Thus:

$string ~~ regex { <[A..Z]>+ | <[a..z]>+ }

is equivalent to

$string ~~ regex { ^ [ <[A..Z]>+ | <[a..z]>+ ] $ }

Pm


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 11:09 AM, Patrick R. Michaud pmich...@pobox.com wrote:

 On Thu, Aug 05, 2010 at 10:27:50AM -0400, Aaron Sherman wrote:
  On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:
   I see this particular thinko a lot, though. Maybe some Perl 6 lint
   tool or another will detect when you have a regex containing ^ at its
   start, $ at the end, | somewhere in the middle, and no [] to
   disambiguate.
 
  You know, this problem would go away, almost entirely, if we had a
  :f[ull] adverb for regex matching that imposed ^[...]$ around the entire
  match. Then your code becomes:
 
    m:f/<[A..Z]>+|<[a..z]>+/

 There's a version of this already.  Matching against an explicit 'regex',
 'token', or 'rule' automatically anchors it on both ends.  Thus:

$string ~~ regex { <[A..Z]>+ | <[a..z]>+ }

 is equivalent to

$string ~~ regex { ^ [ <[A..Z]>+ | <[a..z]>+ ] $ }


While that's a nifty special case (I'm sure it will surprise me someday, and
I'll spend a half hour debugging before I remember this mail), it doesn't
help in the general case (see my example grammar, below).

After doing some more thinking and comparing this to other languages
(Python, for example, has re.match, which matches only at the start of a
string), it seems to me that there is a sort of out-of-band need to have a
more general solution at match time. Here's my second-pass suggestion:

 m:r / m:rooted -- Match is rooted on both ends (^...$)
 m:rs / m:rootedstart -- Match is rooted at the start of string (^, a la
     Python re.match)
 m:re / m:rootedend -- Match is rooted at the end of string ($)
 m:rn / m:rootednone -- Match is not rooted (default)
 m:o / m:oneline -- Modify :r and friends to use ^^/$$
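
To make the intended semantics concrete, here is a sketch of the same switches emulated in Python's `re`. Nothing here is Perl 6: the function `rooted_match` and its `root`/`oneline` parameters are hypothetical names that only mirror the adverbs listed above.

```python
import re

def rooted_match(pattern, text, root="none", oneline=False):
    """Search with the anchoring selected by `root`.

    root:    "both", "start", "end" or "none"
             (mirrors the proposed :r, :rs, :re and :rn)
    oneline: use line anchors instead of string anchors (the proposed :o)
    """
    if root not in ("both", "start", "end", "none"):
        raise ValueError("unknown root mode: " + repr(root))
    start, end = ("^", "$") if oneline else (r"\A", r"\Z")
    prefix = start if root in ("both", "start") else ""
    suffix = end if root in ("both", "end") else ""
    flags = re.MULTILINE if oneline else 0
    # The group keeps any alternation in `pattern` inside the anchors.
    return re.search(prefix + "(?:" + pattern + ")" + suffix, text, flags)


print(bool(rooted_match(r"GET ", "GET /index.html", root="start")))
print(bool(rooted_match(r"GET ", "FORGET me", root="start")))
print(bool(rooted_match(r"\.txt", "notes.txt.bak", root="end")))
print(bool(rooted_match(r"[a-z]+", "ab\ncd", root="both", oneline=True)))
```

The last call succeeds only because of `oneline`; without it, "both" would require the whole two-line string to match.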

Here's one way I can see that being routinely used:

 # Simplistic shell scripts
 rule TOP :r {<stmt>*} # Match the whole script
 rule stmt :r :o { <cmd> <arg>* } # One statement per line

The other way to go about that would be with parameterized adverbs. I'm not
sure how comfy people are with those, but they're in the spec. So this:

 m:r / m:rooted -- Match is rooted (default is ^...$)
    Parameters:
    :s / :start -- Match is rooted only at start (^)
    :e / :end -- Match is rooted only at end ($)
    [note: :s :e should produce a warning]
    :n / :none -- Match is not rooted (null modifier)
    [note: combining :n with :s or :e should warn]
    :o / :oneline -- Use ^^ and $$ instead of ^ and $
    [note: combining :o with :n should warn?]

So our statement matching grammar becomes:

 rule TOP :r {<stmt>*}
 rule stmt :r(:o) { <cmd> <arg>* }

The clown nose is just a side benefit ;-)

Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
opposite of :rw (there is no :ro, as far as I know).

PS: I see no reason that any of this is needed for 6.0.0

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Tyler Curtis
On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman a...@ajs.com wrote:
 While that's a nifty special case (I'm sure it will surprise me someday, and
 I'll spend a half hour debugging before I remember this mail), it doesn't
 help in the general case (see my example grammar, below).

In the general case, no. In the case of your grammar, and all
grammars, it does help.

All regex routines, when called standalone, are anchored to the
beginning and end of the string. So, having ^ and $ at the
beginning and end of your TOP is a no-op unless some other rule calls
it as a subrule.

S05 says: "In general, the anchoring of any subrule call is controlled
by its calling context. When a regex, token, or rule method is called
as a subrule, the front is anchored to the current position (as with
:p), while the end is not anchored, since the calling context will
likely wish to continue parsing. However, when such a method is
smartmatched directly, it is automatically anchored on both ends to
the beginning and end of the string." and that "The basic rule of
thumb is that the keyword-defined methods never do implicit .*?-like
scanning, while the m// and s// quotelike forms do such scanning in
the absence of explicit anchoring."
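
The "front anchored at the current position, end left open" behaviour has a close analogue in Python, where a compiled pattern's `match(text, pos)` must start exactly at `pos` but may stop anywhere. The sketch below is only an illustration of that idea, not of Perl 6 itself; `WORD`, `SPACE` and `parse_words` are made-up names, and the outer function plays the part of a directly matched TOP by insisting that the whole string is consumed.

```python
import re

WORD = re.compile(r"[a-z]+")
SPACE = re.compile(r"\s+")

def parse_words(text):
    """Parse 'word (space word)*'; return the words, or None on failure."""
    pos, words = 0, []
    while True:
        m = WORD.match(text, pos)    # "subrule": anchored at pos, end open
        if m is None:
            return None
        words.append(m.group())
        pos = m.end()
        if pos == len(text):         # top level: must reach end of string
            return words
        m = SPACE.match(text, pos)   # caller continues from where WORD stopped
        if m is None:
            return None
        pos = m.end()


print(parse_words("foo bar"))
print(parse_words("foo bar!"))
```

Neither pattern contains ^ or $, yet no scanning happens anywhere: each piece either matches at the current position or the parse fails.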

Given that Grammar.parse is specified to create a new Grammar
object and directly match its TOP (or the value of the :rule adverb)
method, without any specification that it does implicit .*?-like
scanning, I think that Grammar.parse should always anchor. This
doesn't appear to work quite properly in Rakudo currently: it anchors
to the beginning but not to the end. I'm about to check whether there's
a rakudobug for this already, and submit one if not.

 After doing some more thinking and comparing this to other languages
 (python, for example has match which matches only at the start of a
 string), it seems to me that there is a sort of out-of-band need to have a
 more general solution at match time. Here's my second pass suggestion:

  m:r / m:rooted -- Match is rooted on both ends (^...$)
  m:rs / m:rootedstart - Match is rooted at the start of string (^, ala
 Python re.match)
  m:re / m:rootedend - Match is rooted at the end of string ($)
  m:rn / m:rootednone - Match is not rooted (default)
  m:o / m:oneline - Modify :r and friends to use ^^/$$

 Here's one way I can see that being routinely used:

  # Simplistic shell scripts
  rule TOP :r {<stmt>*} # Match the whole script
  rule stmt :r :o { <cmd> <arg>* } # One statement per line

:oneline or similar might be useful. I'm not sure about :rootedend and
:rootedstart. :rooted is useful only in one situation: when implicitly
matching against the topic. You could do m:r/ foo /; to match
against the topic, but regex { foo }; would not do what you want (I
think). I don't know if doing an anchored match against the topic is
really important enough to justify an adverb just so you don't have to
do $_ ~~ regex { foo }.


 The other way to go about that would be with parameterized adverbs. I'm not
 sure how comfy people are with those, but they're in the spec. So this:

  m:r / m:rooted -- Match is rooted (default is ^...$)
    Parameters:
    :s / :start -- Match is rooted only at start (^)
    :e / :end -- Match is rooted only at end ($)
    [note: :s :e should produce a warning]
    :n / :none -- Match is not rooted (null modifier)
    [note: combining :n with :s or :e should warn]
    :o / :oneline -- Use ^^ and $$ instead of ^ and $
    [note: combining :o with :n should warn?]

 So our statement matching grammar becomes:

  rule TOP :r {<stmt>*}
  rule stmt :r(:o) { <cmd> <arg>* }

 The clown nose is just a side benefit ;-)

 Seriously, though, I prefer :r(:o) because :r:o looks like it should be the
 opposite of :rw (there is no :ro, as far as I know).

 PS: I see no reason that any of this is needed for 6.0.0

 --
 Aaron Sherman
 Email or GTalk: a...@ajs.com
 http://www.ajs.com/~ajs




-- 
Tyler Curtis


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Darren Duncan

David Green wrote:

On 2010-08-05, at 8:27 am, Aaron Sherman wrote:

On Thu, Aug 5, 2010 at 7:55 AM, Carl Mäsak cma...@gmail.com wrote:

I see this particular thinko a lot, though. Maybe some Perl 6 lint tool
or another will detect when you have a regex containing ^ at its start, $
at the end, | somewhere in the middle, and no [] to disambiguate.


I think conceptually the beginning and the end of a string feels like a
bracketing construct (only without symmetrical symbols).  At least that seems
to be my instinct.  Well, it doesn't in / ^foo | ^bar | ^qux /, but in
something like /^ foo|bar $/, the context immediately implies a higher
precedence for ^ and $.  Maybe something like // foo|bar // could work as a
bracketing version?


Personally, I had always considered the ^ and $ to be the lowest precedence 
things in a pattern.  But I can understand the flexibility one gains from that 
not being so, having seen David's example here, which it never occurred to me 
before was possible. -- Darren Duncan


Re: pattern alternation (was Re: How are ...)

2010-08-05 Thread Aaron Sherman
On Thu, Aug 5, 2010 at 2:43 PM, Tyler Curtis ekir...@gmail.com wrote:

 On Thu, Aug 5, 2010 at 12:28 PM, Aaron Sherman a...@ajs.com wrote:
  While that's a nifty special case (I'm sure it will surprise me someday,
  and I'll spend a half hour debugging before I remember this mail), it
  doesn't help in the general case (see my example grammar, below).

 In the general case, no. In the case of your grammar, and all
 grammars, it does help.

 All regex routines, when called standalone, are anchored to the
 beginning and end of the string. So, having ^ and $ at the
 beginning and end of your TOP is a no-op unless some other rule calls
 it as a subrule.


There's something deeply disturbing to me in that... but I can't fully
express what it is. It just feels like I'm going to end up debugging
mountains of code, written by people who didn't understand that that was the
case.

Several times over the past few weeks, I've mentioned something on this list
only to find that, buried somewhere deep in a synopsis, there was a special
case I was unaware of.

The sheer volume of silent special cases in Perl 6 appears to be dwarfing
that of Perl 5, but perhaps that's just because I know Perl 5 far better
than I know Perl 6.

Mind you, I'm not complaining so much as working out how I feel out
loud. Am I the only one who feels this way at this point?



 :oneline or similar might be useful. I'm not sure about :rootedend and
 :rootedstart.


Are you saying that you can't think of examples of where you want to root a
regex only to the start or end, or that you just don't think you need an
adverb to do it? If the former, then I submit the 1536 examples of matching
only at the end of strings in my local Perl library (mostly for matching
whitespace or filename extensions it looks like) and the 3199 examples of
matching only at the start which includes headers of all types (RFC2822 and
friends, HTTP, CPAN configs, etc.), whitespace, command sequence matching
(e.g. /^GET /) and so on.

If the latter, then I guess you and I just have a different take, here, and
that's fine. I respect your opinion, but in this case, I happen to disagree.

PS: You can also search through any typical python install for \.match
which will yield quite a lot of additional examples. I don't know Ruby or
Java very well, or I'd go looking for examples there too.

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs