Re: C:: in rules
On Fri, May 13, 2005 at 01:07:20PM -0700, Larry Wall wrote:
> On Fri, May 13, 2005 at 11:54:47AM -0500, Patrick R. Michaud wrote:
> :     $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
> :     $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
> :     $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;
>
> I would prefer that $r1 work like $r3, not like $r2, for two reasons.

Now implemented as such in Parrot r8103. And yes, it now means that

    rx :w /foo/;
    rx /:w::foo/;
    rx /[:w::foo]/;

are all identical, which is very nice.

> By the way, I still think of it as a group of alternatives even if
> there's only one alternative, and no |. But I can see how that can be
> misread to imply at least two alternatives. [...] And if there's no
> alternative, you only have one alternative. Ain't English wonderful?

...and this last bit means we can strike the "It is illegal to use C<::> outside of an alternation" from S05, since we're always inside of an alternation (group of alternatives), even if there's only one alternative.

That sentence has now been struck. Many thanks for the clarification and discussion.

Pm
Re: C:: in rules
Larry Wall wrote:
> Speaking of which, it seems to me that :p and :c should allow an
> argument that says where to start relative to the current position.
> In other words, :p means :p(0) and :c means :c(0). I could also see
> uses for :p(-1) and :p(+1).

Isn't that slightly inconsistent with :p meaning :p(1), the so-called real winner for passing boolean options of A12?

--
TSa
Re: C:: in rules
TSa (Thomas Sandlaß) wrote:
> Larry Wall wrote:
> > Speaking of which, it seems to me that :p and :c should allow an
> > argument that says where to start relative to the current position.
> > In other words, :p means :p(0) and :c means :c(0). I could also see
> > uses for :p(-1) and :p(+1).
>
> Isn't that slightly inconsistent with :p meaning :p(1), the so-called
> real winner for passing boolean options of A12?

Perhaps the spec should be changed so that :p means :p(bool::true) or :p(?1) and not :p(1).

--
Markus Laire
Jam. 1:5-6
Re: C:: in rules
Markus Laire wrote on 2005-05-13 11:43 (+0300):
> Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
> and not :p(1)

aol Agreed

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html
Re: C:: in rules
On Fri, 2005-05-13 at 00:26, Patrick R. Michaud wrote: On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote: On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote: : Also, A05 proposes incorrect alternatives to the above : : /[:w[]foo bar]/ I would just like to point out that you are misreading those. I've been looking at patterns too long You know, this is going to be a problem for a lot of people... Think of this case: /:w[foo bar|bar foo]/ I may be in the minority here, but I think we should try to avoid having [] and () mean different things in different parts of a rule, especially where one use is VERY common, and the other is obscure at best. I'd even be ok with only allowing this inside our already highly magical : /:w[foo bar|bar foo]/ and /:p(false)/ and / :p5['ponie'] (?{die;}) / I checked, and while ::... has a meaning in S05, :... does not, so as long as we never allow a modifier called ::, this would work. In fact, Larry, I think it's safe to say that is actually more sought-after than that : everyone wants ;-) -- Aaron Sherman [EMAIL PROTECTED] Senior Systems Engineer and Toolsmith It's the sound of a satellite saying, 'get me down!' -Shriekback
Re: C:: in rules
On 5/12/05, Patrick R. Michaud wrote:
> I have a couple of questions regarding C<::> in perl 6 rules.
> First, a question of verification -- in
>
>     $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
>     "travel by plane jet train tgv today" ~~ $rule
>
> I think the match should fail outright, as opposed to matching
> "train tgv". In other words, it acts as though one had written
>
>     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
>
> and not
>
>     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Those both do the same thing (which is the same as your example). When you fail over the :: after plane, it skips out of the alternation looking for something to backtrack before it. Since there is nothing, the rule fails.

> Does this sound right?
>
> Next on my list, S05 says "It is illegal to use :: outside of an
> alternation", but A05 has
>
>     /[:w::foo bar]/
>
> which leads me to believe that :: isn't illegal here even though there's
> no alternation. I'd like to strike that sentence from S05

Yeah, I think using :: to break out of the innermost bracketing group is helpful even without an alternation present.

> Also, A05 proposes incorrect alternatives to the above
>
>     /[:w[]foo bar]/    # null pattern illegal, use <null>
>     /[:w()foo bar]/    # null capture illegal, and probably undesirable
>     /[:w\bfoo bar]/    # not exactly the same as above
>
> I'd like to remove those from A05, or at least put an "Update:"
> note there that doesn't lead people astray. One option not
> mentioned in A05 that we can add there is
>
>     /[:w<?null>foo bar]/
>
> which is admittedly ugly.
>
> So, now then, on to the item that got me here in the first place.
> The upshot of all of the above is that
>
>     rx :w /foo bar/
>
> is not equivalent to
>
>     rx /:w::foo bar/

Yeah, but it is. So no problem. :-)

> which may surprise a few people. The :: at the beginning of the
> pattern effectively anchors the match to the beginning of the string
> or the current position -- i.e., it eliminates the implicit C<.*?>
> at the start of the match.

Ohhh, ohh. There isn't an implicit .*? at the beginning of the match. It's more like there's an implicit .*? followed by a rule call to the match. Think of it as that we're trying to match the pattern at any position rather than there being an implicit .*?.

Luke
Re: C:: in rules
On 5/13/05, Patrick R. Michaud wrote:
> To use the phrase from later in your message, there's still the
> implicit .*? followed by the rule call. Since the rule itself
> hasn't failed (only the group failed), we're still free to try to
> match the pattern at later positions.

I'm basically saying that you should treat your:

    $str ~~ /abc :: def | ghi :: jkl | mn :: op/;

As:

    $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
    $str ~~ /^ .*? $rule/;

Which means that you fail the rule, your .*? advances to the next character and tries the rule again.

Maybe I'm misunderstanding your interpretation (when in doubt, explain with code).

Luke
Re: C:: in rules
On Fri, May 13, 2005 at 03:36:50PM +, Luke Palmer wrote:
> I'm basically saying that you should treat your:
>
>     $str ~~ /abc :: def | ghi :: jkl | mn :: op/;
>
> As:
>
>     $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
>     $str ~~ /^ .*? $rule/;
>
> Which means that you fail the rule, your .*? advances to the next
> character and tries the rule again.

Taking this explanation literally, this would mean that

    $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
    $rule = rx/abc ::: def | ghi ::: jkl | mn ::: op/;

both succeed against "xyzabc---ghijkl". But even just considering the :: instance, this interpretation doesn't match what you said in your original message that :: would fail the rule without further advancing:

Pm> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
Pm> "travel by plane jet train tgv today" ~~ $rule
LP> When you fail over the :: after plane, it skips out of the alternation
LP> looking for something to backtrack before it. Since there is nothing,
LP> the rule fails.

> Maybe I'm misunderstanding your interpretation (when in doubt, explain
> with code).

One of us is misunderstanding the other. I'll explain with code, but first let's clarify the difference. I read your first message as claiming that

    $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
    $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
    $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

are equivalent. I believe $r2 and $r3 are not equivalent. For comparison, let's first look at a slightly different example, and let's avoid subrules; they don't provide the auto-advance of unanchored patterns that forms the crux of my question.

First, I'm quite certain that $r2 and $r3 are different. For illustration, let's use a variation like:

    $q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
    $q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;

    "xyzabc---xyzghijklmno" ~~ $q2    # fails after seeing "zabc"
    "xyzabc---xyzghijklmno" ~~ $q3    # matches "zghijkl"

The difference is precisely the difference between ::: and :: -- the former fails the rule entirely, while the latter simply fails the current group (of alternations) and tries again. With :::, an unanchored rule should also stop its process of advancing to the next character and trying again. (Otherwise, "abefgh" ~~ rx / [ ab ::: cd | ef ::: gh ] / succeeds.)

So, by analogy

    $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
    $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

    "xyzabc---xyzghijklmno" ~~ $r2    # fails after seeing "abc"
    "xyzabc---xyzghijklmno" ~~ $r3    # matches "ghijkl"

The :: in $r3 doesn't cause the entire rule to fail, just the group, so the match is free to backtrack and continue its advance to the next character and try again. (What the :: in $r3 *does* do is to tell the matching engine to not bother trying the remaining alternatives once it has seen an "abc" at this point.)

So, going back to the original

    $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;

does it work like $r2 or $r3? My gut feeling is that it should work like $r2 -- i.e., that once we find an "abc" we'll fail the rule if there's not a "def" following. This also accords with what others have written in reply, when they say that all three of my expressions fail in the same way (even though they do not).

However, *if* we say that :: at the top level fails the rule, that means that as things currently stand

    $z1 = rx :w /foo/;
    $z2 = rx /:w::foo/;
    $z3 = rx /[:w::foo]/;

can be a little surprising:

    "hello foo" ~~ $z1    # matches "foo"
    "hello foo" ~~ $z2    # fails immediately upon the 'h' != 'f'
    "hello foo" ~~ $z3    # matches "foo"

which was the point of my original post. And as I said there, I don't have a problem with this, I just wanted to make sure this result didn't surprise too many others.

I hope this was clear enough -- if not, explain counter examples in code. :-)

Pm
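The difference argued in this message can be sketched in Python. This is only a toy model, not PGE or S05: it assumes a rule is a flat list of (prefix, suffix) alternatives with the cut sitting between the two halves, and the function name `match_unanchored` and its `cut` parameter are made up for illustration. The ":::" branch follows the reading given above (fail the rule and stop advancing), which was still under discussion in the thread.

```python
def match_unanchored(alternatives, text, cut):
    """Scan text left to right and return the first matched substring, or None.

    alternatives: list of (prefix, suffix) pairs modelling `prefix CUT suffix`.
    cut: "::" fails only the group; ":::" fails the whole rule (assumed reading).
    """
    for start in range(len(text) + 1):
        for prefix, suffix in alternatives:
            if not text.startswith(prefix, start):
                continue  # failed before reaching the cut: try next alternative
            if text.startswith(suffix, start + len(prefix)):
                return text[start:start + len(prefix) + len(suffix)]
            # The suffix failed, so we are backtracking over the cut.
            if cut == ":::":
                return None  # the rule fails; no further advancing
            break  # "::" fails the group here; later start positions are still tried
    return None


alts = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]
subject = "xyzabc---xyzghijklmno"
print(match_unanchored(alts, subject, "::"))   # ghijkl
print(match_unanchored(alts, subject, ":::"))  # None
```

With the "abefgh" example from the parenthetical, the same function returns "efgh" for "::" and None for ":::".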
Re: C:: in rules
On Fri, May 13, 2005 at 11:43:42AM +0300, Markus Laire wrote:
: Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
: and not :p(1)

I'm still not sure I believe in booleans to that extent. I suppose we could go as far as to make it :p(0 but true). Actually, it's more like "undef but true", if you want to be able to distinguish

    sub foo (+$p = 0) {      # no :p at all
        say "true" if $p;    # :p with no argument
        $p //= 42;           # :p with no argument
        ...
    }

Or maybe it's something more like "1 but assumed". In any event, it'd be nice to be able to distinguish :p from :p(1) somehow. Maybe the Bool type is good enough for that. The bool type probably isn't unless we depend on autoboxing to turn it into a Bool consistently.

Larry
Re: C:: in rules
Larry wrote:
> I'm still not sure I believe in booleans to that extent. I suppose
> we could go as far as to make it :p(0 but true). Actually, it's more
> like "undef but true", if you want to be able to distinguish
>
>     sub foo (+$p = 0) {      # no :p at all
>         say "true" if $p;    # :p with no argument
>         $p //= 42;           # :p with no argument
>         ...
>     }

Yes, I was thinking along the same lines. C<undef but true> as a default seems to be more accurate and useful than C<Bool::true>.

Damian
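As a loose analogy only (Python has no "undef but true"), the three cases being distinguished here can be sketched with a sentinel object. The names `ASSUMED` and `foo` and the returned strings are hypothetical, chosen just to show that "flag absent", "bare flag", and "flag with an explicit value" can be told apart while the bare flag still tests as true.

```python
class _Assumed:
    """Sentinel for a flag given with no argument: truthy, but not equal to 1."""

    def __bool__(self):
        return True

    def __repr__(self):
        return "ASSUMED"


ASSUMED = _Assumed()  # hypothetical stand-in for "undef but true"


def foo(p=None):
    """Report how the option p was supplied."""
    if p is None:
        return "no :p at all"
    if p is ASSUMED:
        return "bare :p"  # true, but the callee is free to pick its own default
    return f"explicit :p({p})"


print(foo())         # no :p at all
print(foo(ASSUMED))  # bare :p
print(foo(1))        # explicit :p(1)
```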
Re: C:: in rules
On 5/13/05, Patrick R. Michaud wrote:
> First, I'm quite certain that $r2 and $r3 are different. For
> illustration, let's use a variation like:
>
>     $q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
>     $q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;
>
>     "xyzabc---xyzghijklmno" ~~ $q2    # fails after seeing "zabc"
>     "xyzabc---xyzghijklmno" ~~ $q3    # matches "zghijkl"

Okay, I know where the misunderstanding is. When we use these kinds of examples, let's not rely on the implicit matching semantic. I'm saying that the above code is equivalent to:

    # the following is a rule, so ::: backtracks out of it and no further
    rule q2 { \w [ abc ::: def | ghi ::: jkl | mn ::: op ] }
    rule q3 { \w [ [ abc :: def | ghi :: jkl | mn :: op ] ] }

    "xyzabc---xyzghijklmno" ~~ /^ .*? <q2>/;    # ::: backtracks into the .*?
    "xyzabc---xyzghijklmno" ~~ /^ .*? <q3>/;

The presence of the \w does nothing, because \w doesn't backtrack. Alternations and quantifiers backtrack when you fail beyond them, \w just fails. You never enter the same subpattern (meant in the most general case: .* is a subpattern, for instance) in the same state. Something had to change behind you in order for a subpattern to be re-entered.

I think the misunderstanding is rather simple. You keep talking like you prepend a .*? to the rule we're matching. I think that's wrong (and this is where I'm making a design call, so we can dispute on this once we're clear that it's this that is being disputed). I think there is a special rule:

    rule matchanywhere($rx) { .*? $rx }

Which makes a *subrule call* to the rule we're matching. Therefore ::: just breaks out of that subrule, and backtracks into the .*? again.

Because of this, I think there will be a difference between ::: and <commit> at the top level, but not :: and :::.

Luke
Re: C:: in rules
On Sat, May 14, 2005 at 01:15:36AM +, Luke Palmer wrote:
: I think the misunderstanding is rather simple. You keep talking like
: you prepend a .*? to the rule we're matching. I think that's wrong
: (and this is where I'm making a design call, so we can dispute on this
: once we're clear that it's this that is being disputed). I think
: there is a special rule:
:
:     rule matchanywhere($rx) { .*? $rx }
:
: Which makes a *subrule call* to the rule we're matching. Therefore
: ::: just breaks out of that subrule, and backtracks into the .*?
: again.

I want ::: to break out of *that* dynamic scope (or the equivalent "matchrighthere" scope), but not ::.

Larry
Re: C:: in rules
On 5/14/05, Larry Wall wrote:
> On Sat, May 14, 2005 at 01:15:36AM +, Luke Palmer wrote:
> : I think the misunderstanding is rather simple. You keep talking like
> : you prepend a .*? to the rule we're matching. I think that's wrong
> : (and this is where I'm making a design call, so we can dispute on this
> : once we're clear that it's this that is being disputed). I think
> : there is a special rule:
> :
> :     rule matchanywhere($rx) { .*? $rx }
> :
> : Which makes a *subrule call* to the rule we're matching. Therefore
> : ::: just breaks out of that subrule, and backtracks into the .*?
> : again.
>
> I want ::: to break out of *that* dynamic scope (or the equivalent
> "matchrighthere" scope), but not ::.

I'm not sure that's such a good idea. When you say:

    rule foo() { a* ::: b }

You know precisely where that ::: is going to take you: right out of the rule. That's the way it works in grammars, and there's no implicit anything else that you're breaking out of. But you're saying that when we use a bare // matching a string, that's no longer the case? In other words, this:

    $str ~~ / a* ::: b /

Is different from:

    $str ~~ / <foo> /

That seems like a pretty obvious indirection, and a mistake to break it. There's nothing there except <foo>, how could it act differently?

Luke
Re: C:: in rules
On Sat, May 14, 2005 at 04:26:44AM +, Luke Palmer wrote:
> On 5/14/05, Larry Wall wrote:
> > I want ::: to break out of *that* dynamic scope (or the equivalent
> > "matchrighthere" scope), but not ::.
>
> I'm not sure that's such a good idea. When you say:
>
>     rule foo() { a* ::: b }
>
> You know precisely where that ::: is going to take you: right out of
> the rule. [...] But you're saying that when we use a bare // matching
> a string, that's no longer the case? In other words, this:
>
>     $str ~~ / a* ::: b /
>
> Is different from:
>
>     $str ~~ / <foo> /
>
> That seems like a pretty obvious indirection, and a mistake to break
> it. There's nothing there except <foo>, how could it act differently?

Because

    $str ~~ / <foo> /

puts the ::: in a subrule, whereas

    $str ~~ / a* ::: b /

does not. It's the same sort of difference that one gets between

    { return if $a; }

and

    sub foo() { return if $a; }
    { foo() }

It's clear that the C<return> in the first case affects control flow in the current sub, while the nested C<return> of foo() in the second case does not.

Pm
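The return analogy carries over directly to most languages. A small Python sketch, with illustrative function names that are not from the thread: a return written inline ends the enclosing function, while the same return wrapped in a helper only ends the helper.

```python
def inline_return(a):
    # The return sits directly in this function, so it ends this function.
    if a:
        return "left early"
    return "ran to the end"


def helper(a):
    # This return only leaves helper, not whoever called it.
    if a:
        return


def nested_return(a):
    helper(a)  # helper's return does not end nested_return
    return "ran to the end"


print(inline_return(True))  # left early
print(nested_return(True))  # ran to the end
```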
C:: in rules
I have a couple of questions regarding C<::> in perl 6 rules.
First, a question of verification -- in

    $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
    "travel by plane jet train tgv today" ~~ $rule

I think the match should fail outright, as opposed to matching "train tgv". In other words, it acts as though one had written

    $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;

and not

    $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Does this sound right?

Next on my list, S05 says "It is illegal to use :: outside of an alternation", but A05 has

    /[:w::foo bar]/

which leads me to believe that :: isn't illegal here even though there's no alternation. I'd like to strike that sentence from S05.

Also, A05 proposes incorrect alternatives to the above

    /[:w[]foo bar]/    # null pattern illegal, use <null>
    /[:w()foo bar]/    # null capture illegal, and probably undesirable
    /[:w\bfoo bar]/    # not exactly the same as above

I'd like to remove those from A05, or at least put an "Update:" note there that doesn't lead people astray. One option not mentioned in A05 that we can add there is

    /[:w<?null>foo bar]/

which is admittedly ugly.

So, now then, on to the item that got me here in the first place. The upshot of all of the above is that

    rx :w /foo bar/

is not equivalent to

    rx /:w::foo bar/

which may surprise a few people. The :: at the beginning of the pattern effectively anchors the match to the beginning of the string or the current position -- i.e., it eliminates the implicit C<.*?> at the start of the match. To put the :w inside the rule (e.g., in a variable or subrule), one would have to write it as

    rx /[:w::foo bar]/
    rx /:w<null>foo bar/

Now then, I don't have a problem at all with this outcome -- but I wanted to let p6l verify my interpretation of things and make sure it's okay for me to adjust S05/A05 accordingly.

Pm
Re: C:: in rules
My take, based on S05:

On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> I have a couple of questions regarding C<::> in perl 6 rules.
> First, a question of verification -- in
>
>     $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
>     "travel by plane jet train tgv today" ~~ $rule
>
> I think the match should fail outright, as opposed to matching
> "train tgv".

Correct, that's the meaning of ::

S05: "Backtracking over a double colon causes the surrounding group of alternations to immediately fail:"

Your surrounding group is the entire rule, and thus you fail at that point.

> In other words, it acts as though one had written
>
>     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
>
> and not
>
>     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

Your two examples fail in the same way because of the fact that the group IS the whole rule.

> Next on my list, S05 says "It is illegal to use :: outside of an
> alternation", but A05 has
>
>     /[:w::foo bar]/

I can't even figure out what that means. :w turns on word mode (lexically scoped per S05) and :: is a group-level commit. What are we committing exactly? Looks like a noop to me, which actually might not be so bad. However, you're right: this is an error as there are no alternations.

> which leads me to believe that :: isn't illegal here even though there's
> no alternation. I'd like to strike that sentence from S05.

I don't think it should be removed. You can always use ::: if that's what you wanted.

> Also, A05 proposes incorrect alternatives to the above
>
>     /[:w[]foo bar]/    # null pattern illegal, use <null>

Correct.

>     /[:w()foo bar]/    # null capture illegal, and probably undesirable

Correct.

>     /[:w\bfoo bar]/    # not exactly the same as above

No, I think that's exactly the same.

> So, now then, on to the item that got me here in the first place.
> The upshot of all of the above is that
>
>     rx :w /foo bar/
>
> is not equivalent to
>
>     rx /:w::foo bar/

If we feel strongly, it could be special-cased, but your null solution seems fine to me.

--
Aaron Sherman
Re: C:: in rules
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote:
> > Next on my list, S05 says "It is illegal to use :: outside of an
> > alternation", but A05 has
> >
> >     /[:w::foo bar]/
>
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and :: is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad. However, you're right: this is an error as there are no
> alternations.

I think the definition of :: needs to be changed slightly. You even used a phrase that isn't exactly true according to spec but would be if :: meant what I think it should mean. That phrase is ":: is a group-level commit". This isn't how I read S05 (and apparently how you and others read it as well, hence your comment to Pm that there are no alternations). S05 says:

    Backtracking over a double colon causes the surrounding group
    of alternations to immediately fail:

I think it should simply read:

    Backtracking over a double colon causes the surrounding group
    to immediately fail:

In other words, the phrase "of alternations" is a red herring.

> > which leads me to believe that :: isn't illegal here even though there's
> > no alternation. I'd like to strike that sentence from S05.
>
> I don't think it should be removed. You can always use ::: if that's
> what you wanted.

I too think it should be stricken.

> >     /[:w\bfoo bar]/    # not exactly the same as above
>
> No, I think that's exactly the same.

What does \b mean again? I assume it's no longer backspace?

> > So, now then, on to the item that got me here in the first place.
> > The upshot of all of the above is that
> >
> >     rx :w /foo bar/
> >
> > is not equivalent to
> >
> >     rx /:w::foo bar/
>
> If we feel strongly, it could be special-cased, but your null solution
> seems fine to me.

If :: were to fail the surrounding group we can say that a rule without [] or () is an implicit group for :: purposes.

-Scott
--
Jonathan Scott Duff
Re: C:: in rules
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> My take, based on S05:
> > In other words, it acts as though one had written
> >
> >     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> >
> > and not
> >
> >     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
>
> Your two examples fail in the same way because of the fact that the
> group IS the whole rule.

False. In the first case the group is the whole rule. In the second case the group would not include the (implied) '.*?' at the start of the rule. Perhaps it helps to see the difference if I write it this way:

    $rule = rx :w /<null>[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/;

Note that the rule is *unanchored*, thus it tries at the first character, if it fails then it goes to the second character, if that fails it goes to the third, etc. Thus, given:

    $rule1 = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
    $rule2 = rx :w /<null>[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

    "travel by plane jet train tgv today" ~~ $rule1;    # fails
    "travel by plane jet train tgv today" ~~ $rule2;    # matches "train tgv"

They're not equivalent.

> > Next on my list, S05 says "It is illegal to use :: outside of an
> > alternation", but A05 has
> >
> >     /[:w::foo bar]/
>
> I can't even figure out what that means. :w turns on word mode
> (lexically scoped per S05) and :: is a group-level commit. What are we
> committing exactly? Looks like a noop to me, which actually might not be
> so bad.

Yes, the point is that it's a no-op, because /[:wfoo bar:]/ is something entirely different.

> >     /[:w\bfoo bar]/    # not exactly the same as above
>
> No, I think that's exactly the same.

Nope. Consider:

    $foo = rx /[:w::foo bar]/
    $baz = rx /[:w\bfoo bar]/

    "myfoo bar" ~~ $foo    # matches
    "myfoo bar" ~~ $baz    # fails, foo is not on a word boundary

Pm
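The word-boundary point at the end of this message is easy to check with an ordinary regex engine. A rough Python sketch, assuming :w can be approximated for this one pattern by requiring whitespace between the two words (that approximation is mine, not S05's definition of :w):

```python
import re

# Assumption: ":w /foo bar/" is approximated here by foo, whitespace, bar.
plain = re.compile(r"foo\s+bar")       # stands in for /[:w::foo bar]/
bounded = re.compile(r"\bfoo\s+bar")   # stands in for /[:w\bfoo bar]/

print(bool(plain.search("myfoo bar")))      # True: "foo bar" is found inside "myfoo bar"
print(bool(bounded.search("myfoo bar")))    # False: no word boundary between "y" and "f"
print(bool(bounded.search("my foo bar")))   # True: boundary after the space
```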
Re: C:: in rules
On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
> > >     /[:w\bfoo bar]/    # not exactly the same as above
> >
> > No, I think that's exactly the same.
>
> What does \b mean again? I assume it's no longer backspace?

For as long as I can remember \b has meant "word boundary" in regular expressions. :-) :-)

Pm
Re: C:: in rules
"PRM" == Patrick R Michaud writes:

PRM> On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote:
>> > >     /[:w\bfoo bar]/    # not exactly the same as above
>> >
>> > No, I think that's exactly the same.
>>
>> What does \b mean again? I assume it's no longer backspace?

PRM> For as long as I can remember \b has meant "word boundary" in
PRM> regular expressions. :-) :-)

except in char classes where it gets its backspace meaning back. :-)

uri

--
Uri Guttman http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs http://jobs.perl.org
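Python's `re` module happens to follow the same convention as the Perl 5 behavior described here, so it makes a quick illustration: outside a character class `\b` is a zero-width word boundary, inside one it is the backspace character.

```python
import re

# Outside a character class: zero-width word boundary.
print(bool(re.search(r"\bfoo", "a foo")))    # True
print(bool(re.search(r"\bfoo", "afoo")))     # False

# Inside a character class: the backspace character (0x08).
print(bool(re.fullmatch(r"[\b]", "\x08")))   # True
print(bool(re.fullmatch(r"[\b]", "b")))      # False
```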
Re: C:: in rules
On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> > > In other words, it acts as though one had written
> > >
> > >     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > >
> > > and not
> > >
> > >     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
> >
> > Your two examples fail in the same way because of the fact that the
> > group IS the whole rule.
>
> False. In the first case the group is the whole rule. In the second
> case the group would not include the (implied) '.*?' at the start of
> the rule.

That cannot be true. If it were, then:

    s/[a]//

and

    s/a//

would replace different things, and they MUST NOT. If I've missed some fundamental way in which rx:p5/(?:...)/ is different from rx/[...]/, then please let me know. Otherwise, we can simply demonstrate this with P5:

    perl -le '"abcaabbcc" =~ /(?:aa)/; print $&'

and unshockingly, that prints "aa", not "abcaa".

> Note that the rule is *unanchored*, thus it tries at the first
> character, if it fails then it goes to the second character, if
> that fails it goes to the third, etc.

Yes, you're correct, but when you step forward over input in order to find a start for your unanchored expression, you do NOT consume that input, grouping or not. To say:

    $foo ~~ /unanchored/

is something like

    for 0..length($foo)-1 -> $i {
        substr($foo,$i) ~~ /^unanchored/;
    }

and always has been. Unless I'm unaware of some subtlety of [], it is just the same as P5's (?:...), which behaves exactly this way.

I'll skip the rest of your post for now, except for the last bit, since I think we need to resolve which universe we're in before we can give each other street directions ;-)

> > >     /[:w\bfoo bar]/    # not exactly the same as above
> >
> > No, I think that's exactly the same.
>
> Nope. Consider:
>
>     $foo = rx /[:w::foo bar]/
>     $baz = rx /[:w\bfoo bar]/
>
>     "myfoo bar" ~~ $foo    # matches
>     "myfoo bar" ~~ $baz    # fails, foo is not on a word boundary

You're correct, sorry about that.

--
Aaron Sherman
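The "try an anchored match at each offset" description of unanchored matching can be written out in Python. This is only a sketch of the loop given in the message, with a made-up function name; it uses `re.match` on successive tails of the string in place of `substr($foo, $i) ~~ /^.../`, and it also tries the empty tail so that patterns able to match the empty string behave as `re.search` does.

```python
import re


def unanchored_search(pattern, text):
    """Try an anchored match at each offset; return (offset, matched text) or None."""
    compiled = re.compile(pattern)
    for i in range(len(text) + 1):
        m = compiled.match(text[i:])  # anchored at the start of the tail
        if m:
            return i, m.group(0)      # skipped input is not part of the match
    return None


print(unanchored_search(r"(?:aa)", "abcaabbcc"))  # (3, 'aa')
print(unanchored_search(r"aa", "abcaabbcc"))      # (3, 'aa')
```

The grouped and ungrouped patterns give the same result, and the skipped prefix "abc" is not consumed into the match, which mirrors the Perl 5 one-liner in the message.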
Re: C:: in rules
> > >     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > >     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> > On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> > > Your two examples fail in the same way because of the fact that the
> > > group IS the whole rule.
> >
> > False. In the first case the group is the whole rule. In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
>
> That cannot be true. If it were, then:
>     s/[a]//
> and
>     s/a//
> would replace different things, and they MUST NOT.

No, /[a]/ is still the same as /a/ here -- I'm not discussing that at all, nor am I implying any special [] or rule semantics. I'm simply referring to the fact that the rule is free to step across the characters in the string, same as you pointed out.

Let me backtrack(!) and try a slightly different example, first using a group and (::)

    $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;

    "abcdef" ~~ $r1          # matches "abcdef"
    "xyzghijkl" ~~ $r1       # matches "ghijkl"
    "xyzabcghijkl" ~~ $r1    # matches "ghijkl"

Why does the last one match? Because it fails the group but doesn't fail the rule -- i.e., the rule is still free to advance its initial pointer to the next character and try again.

Contrast this with:

    $r2 = rx /abc ::: def | ghi ::: jkl | mn ::: op/;

    "abcdef" ~~ $r2          # matches "abcdef"
    "xyzghijkl" ~~ $r2       # matches "ghijkl"
    "xyzabcghijkl" ~~ $r2    # fails!

This one fails, because once we match the "abc", we're committed to completing the match or failing the rule altogether.

Does this work to convince you that the two expressions are indeed different?

Pm
Re: C:: in rules
On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> > > >     $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
> > > >     $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;
>
> On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> > On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> > > False. In the first case the group is the whole rule. In the second
> > > case the group would not include the (implied) '.*?' at the start of
> > > the rule.

This was a very unfortunate choice of explanations, since an implied .*? would change the semantics of the match deeply. However, your later explanation:

>     $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;
>
>     "abcdef" ~~ $r1          # matches "abcdef"
>     "xyzghijkl" ~~ $r1       # matches "ghijkl"
>     "xyzabcghijkl" ~~ $r1    # matches "ghijkl"
>
> Why does the last one match? Because it fails the group but doesn't
> fail the rule -- i.e., the rule is still free to advance its initial
> pointer to the next character and try again.

... is very understandable. Now I'm just left with a vague sense that I never want to see anyone use :: :-)

--
Aaron Sherman
Re: C:: in rules
On Thu, May 12, 2005 at 05:15:55PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote:
> > False. In the first case the group is the whole rule. In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
>
> This was a very unfortunate choice of explanations, since an implied .*?
> would change the semantics of the match deeply.

I agree, my wording on this wasn't all that clear -- I haven't found a good phrase for the stepping that takes place at the beginning of an unanchored match. And in earlier versions of PGE, the stepping was actually performed by a '.*?' node at the beginning of the expression tree that didn't participate in the captured result.

Anyway, we're in agreement as to what :: and ::: do, so I'll propose changes to S05/A05 and we can go from there.

Thanks! :-)

Pm
Re: C:: in rules
On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
: Also, A05 proposes incorrect alternatives to the above
:
:     /[:w[]foo bar]/    # null pattern illegal, use <null>
:     /[:w()foo bar]/    # null capture illegal, and probably undesirable
:     /[:w\bfoo bar]/    # not exactly the same as above
:
: I'd like to remove those from A05, or at least put an "Update:"
: note there that doesn't lead people astray. One option not
: mentioned in A05 that we can add there is
:
:     /[:w<?null>foo bar]/
:
: which is admittedly ugly.

I would just like to point out that you are misreading those. The [] and () above are part of pair syntax, not rule syntax. Likewise your :w<?null> should be taken to :w('?null'). We used to try to distinguish modifiers like :w that don't take an argument, but that's a bad plan. All colon pairs parse alike wherever they occur. That's why we've required space before bracket delimiters outside, but the same constraint holds inside rules.

Which means, of course, that we should probably try to figure what :w($x) actually means... :-)

Speaking of which, it seems to me that :p and :c should allow an argument that says where to start relative to the current position. In other words, :p means :p(0) and :c means :c(0). I could also see uses for :p(-1) and :p(+1).

We could also pass positions as opaque objects, which is another reason not to consider positions as mere numbers.

Larry
Re: C:: in rules
On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote:
> On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote:
> : Also, A05 proposes incorrect alternatives to the above
> :
> :     /[:w[]foo bar]/    # null pattern illegal, use <null>
> :     /[:w()foo bar]/    # null capture illegal, and probably undesirable
> :     /[:w\bfoo bar]/    # not exactly the same as above
> :
>
> I would just like to point out that you are misreading those.

Ouch, you're right! I've been looking at patterns too long, I guess -- thanks for the correction.

> Speaking of which, it seems to me that :p and :c should allow an
> argument that says where to start relative to the current position.
> In other words, :p means :p(0) and :c means :c(0). I could also
> see uses for :p(-1) and :p(+1).

Sounds good to me.

Pm