Re: C<::> in rules
On Fri, May 13, 2005 at 01:07:20PM -0700, Larry Wall wrote:
> On Fri, May 13, 2005 at 11:54:47AM -0500, Patrick R. Michaud wrote:
> : $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
> : $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
> : $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;
>
> I would prefer that $r1 work like $r3, not like $r2, for two reasons.

Now implemented as such in Parrot r8103. And yes, it now means that

    rx :w /foo/;
    rx /:w::foo/;
    rx /[:w::foo]/;

are all identical, which is very nice.

> By the way, I still think of it as "a group of alternatives" even
> if there's only one alternative, and no |. But I can see how that
> can be misread to imply at least two alternatives. [...]
> And if there's no alternative, you only have one alternative.
> Ain't English wonderful?

...and this last bit means we can strike the "It is illegal to use C<::> outside of an alternation" from S05, since we're always inside of an alternation (group of alternatives), even if there's only one alternative. That sentence has now been struck.

Many thanks for the clarification and discussion.

Pm
Re: C<::> in rules
On Sat, May 14, 2005 at 04:26:44AM, Luke Palmer wrote:
> On 5/14/05, Larry Wall <[EMAIL PROTECTED]> wrote:
> > I want ::: to break out of *that* dynamic scope (or the equivalent
> > "matchrighthere" scope), but not ::.
>
> I'm not sure that's such a good idea. When you say:
>
>     rule foo() { a* ::: b }
>
> You know precisely where that ::: is going to take you: right out of
> the rule. [...] But you're saying that when we use a bare //
> matching a string, that's no longer the case? In other words, this:
>
>     $str ~~ / a* ::: b /
>
> Is different from:
>
>     $str ~~ / <foo> /
>
> That seems like a pretty obvious indirection, and a mistake to break
> it. There's nothing there except <foo>, how could it act differently?

Because

    $str ~~ / <foo> /

puts the ::: in a subrule, whereas

    $str ~~ / a* ::: b /

does not. It's the same sort of difference that one gets between

    { return if $a; }

and

    sub foo() { return if $a; }
    { foo() }

It's clear that the C<return> in the first case affects control flow in the current sub, while the nested C<return> of foo() in the second case does not.

Pm
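The return analogy above can be tried out in any language with nested functions. Here is a minimal Python sketch of it; the function names `inline_return` and `nested_return` are made up for illustration, and this models only the control-flow analogy, not rule matching itself.

```python
def inline_return(a):
    # Models { return if $a; }: the return sits directly in the
    # enclosing sub, so it ends that sub.
    if a:
        return "left early"
    return "ran to the end"


def nested_return(a):
    # Models sub foo() { return if $a; } followed by { foo() }:
    # the return only leaves foo(), and the caller carries on.
    def foo():
        if a:
            return
    foo()
    return "ran to the end"


print(inline_return(True))
print(nested_return(True))
```

In the same way, a ::: written directly in the pattern acts on the pattern being matched, while a ::: inside a called subrule only ends that subrule.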
Re: C<::> in rules
On 5/14/05, Larry Wall <[EMAIL PROTECTED]> wrote:
> On Sat, May 14, 2005 at 01:15:36AM, Luke Palmer wrote:
> : I think the misunderstanding is rather simple. You keep talking like
> : you prepend a .*? to the rule we're matching. I think that's wrong
> : (and this is where I'm making a design call, so we can dispute on this
> : once we're clear that it's this that is being disputed). I think
> : there is a special rule:
> :
> :     rule matchanywhere($rx) { .*? <$rx> }
> :
> : Which makes a *subrule call* to the rule we're matching. Therefore
> : ::: just breaks out of that subrule, and backtracks into the .*?
> : again.
>
> I want ::: to break out of *that* dynamic scope (or the equivalent
> "matchrighthere" scope), but not ::.

I'm not sure that's such a good idea. When you say:

    rule foo() { a* ::: b }

You know precisely where that ::: is going to take you: right out of the rule. That's the way it works in grammars, and there's no implicit anything else that you're breaking out of. But you're saying that when we use a bare // matching a string, that's no longer the case? In other words, this:

    $str ~~ / a* ::: b /

Is different from:

    $str ~~ / <foo> /

That seems like a pretty obvious indirection, and a mistake to break it. There's nothing there except <foo>, how could it act differently?

Luke
Re: C<::> in rules
On Sat, May 14, 2005 at 01:15:36AM, Luke Palmer wrote:
: I think the misunderstanding is rather simple. You keep talking like
: you prepend a .*? to the rule we're matching. I think that's wrong
: (and this is where I'm making a design call, so we can dispute on this
: once we're clear that it's this that is being disputed). I think
: there is a special rule:
:
:     rule matchanywhere($rx) { .*? <$rx> }
:
: Which makes a *subrule call* to the rule we're matching. Therefore
: ::: just breaks out of that subrule, and backtracks into the .*?
: again.

I want ::: to break out of *that* dynamic scope (or the equivalent "matchrighthere" scope), but not ::.

Larry
Re: C<::> in rules
On 5/13/05, Patrick R. Michaud <[EMAIL PROTECTED]> wrote:
> First, I'm quite certain that $r2 and $r3 are different. For
> illustration, let's use a variation like:
>
>     $q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
>     $q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;
>
>     "xyzabc---xyzghijklmno" ~~ $q2   # fails after seeing "zabc"
>     "xyzabc---xyzghijklmno" ~~ $q3   # matches "zghijkl"

Okay, I know where the misunderstanding is. When we use these kinds of examples, let's not rely on the implicit matching semantic. I'm saying that the above code is equivalent to:

    # the following is a rule, so ::: backtracks out of it and no further
    rule q2 { \w [ abc ::: def | ghi ::: jkl | mn ::: op ] }
    rule q3 { \w [ [ abc :: def | ghi :: jkl | mn :: op ] ] }

    "xyzabc---xyzghijklmno" ~~ /^ .*? <q2> /;   # ::: backtracks into the .*?
    "xyzabc---xyzghijklmno" ~~ /^ .*? <q3> /;

The presence of the \w does nothing, because \w doesn't backtrack. Alternations and quantifiers backtrack when you fail beyond them, \w just fails. You never enter the same subpattern (meant in the most general case: .* is a subpattern, for instance) in the same state. Something had to change behind you in order for a subpattern to be re-entered.

I think the misunderstanding is rather simple. You keep talking like you prepend a .*? to the rule we're matching. I think that's wrong (and this is where I'm making a design call, so we can dispute on this once we're clear that it's this that is being disputed). I think there is a special rule:

    rule matchanywhere($rx) { .*? <$rx> }

Which makes a *subrule call* to the rule we're matching. Therefore ::: just breaks out of that subrule, and backtracks into the .*? again.

Because of this, I think there will be a difference between ::: and <commit> at the top level, but not :: and :::.

Luke
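The two readings of ::: in an unanchored match can be put side by side in a small Python model. This is only a sketch under stated assumptions: the names `RuleFailure`, `rule_at` and `matchanywhere` are invented here, each alternative is reduced to a literal head and tail around the cut, the \w prefix is left out, and the scan over start positions stands in for the lazy .*? in `matchanywhere`.

```python
class RuleFailure(Exception):
    """Raised when matching backtracks over a ::: cut."""


# Models: abc ::: def | ghi ::: jkl | mn ::: op
ALTERNATIVES = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]


def rule_at(text, pos):
    """Try the rule anchored at pos; return the end offset or None."""
    for head, tail in ALTERNATIVES:
        if text.startswith(head, pos):
            if text.startswith(tail, pos + len(head)):
                return pos + len(head) + len(tail)
            raise RuleFailure()  # we passed the cut and then failed
    return None


def matchanywhere(text, cut_escapes_caller):
    """Scan start positions, calling the rule as a subrule each time."""
    for start in range(len(text) + 1):
        try:
            end = rule_at(text, start)
        except RuleFailure:
            if cut_escapes_caller:
                return None  # the cut also ends the scan over positions
            end = None       # the cut stops at the subrule boundary
        if end is not None:
            return text[start:end]
    return None


print(matchanywhere("xyzabc---xyzghijklmno", cut_escapes_caller=False))
print(matchanywhere("xyzabc---xyzghijklmno", cut_escapes_caller=True))
```

With `cut_escapes_caller=False` the scan resumes after the failed "abc" and finds "ghijkl", which is the subrule-call reading; with `True` the whole match fails, which is the reading where ::: breaks out of the enclosing dynamic scope as well.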
Re: C<::> in rules
Larry wrote:
> I'm still not sure I believe in booleans to that extent. I suppose
> we could go as far as to make it :p(0 but true). Actually, it's more
> like "undef but true", if you want to be able to distinguish
>
>     sub foo (+$p = 0) {        # no :p at all
>         say "true" if $p;      # :p with no argument
>         $p //= 42;             # :p with no argument
>         ...
>     }

Yes, I was thinking along the same lines. C<undef but true> as a default seems to be more accurate and useful than C<0 but true>.

Damian
Re: C<::> in rules
On Fri, May 13, 2005 at 11:54:47AM -0500, Patrick R. Michaud wrote:
: $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
: $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
: $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

I would prefer that $r1 work like $r3, not like $r2, for two reasons. First, it gives a useful distinction of meaning. Second, and more importantly, outer lexical scopes are often delimited by something other than what the inner scopes are delimited by. A file scope or an eval string is no less a block because it's not delimited by curlies. In the same way, the outer delimiters of an rx/.../ should function as if you'd said rx/[...]/.

: However, *if* we say that :: at the top level fails the rule, that
: means that as things currently stand
:
:     $z1 = rx :w /foo/;
:     $z2 = rx /:w::foo/;
:     $z3 = rx /[:w::foo]/;
:
: can be a little surprising:
:
:     "hello foo" ~~ $z1    # matches "foo"
:     "hello foo" ~~ $z2    # fails immediately upon the 'h' != 'f'
:     "hello foo" ~~ $z3    # matches "foo"
:
: which was the point of my original post.

And that's the third reason. A :: at the beginning of a "group" should essentially be a no-op.

By the way, I still think of it as "a group of alternatives" even if there's only one alternative, and no |. But I can see how that can be misread to imply at least two alternatives. (We're also hampered by the linguistic fact that "alternative" can mean either how many choices you have to make or how many paths are open to you. In other words, one alternative of the first sort presents you two alternatives of the second sort. And if there's no alternative, you only have one alternative. Ain't English wonderful?)

Anyway, :: fails the current lexical scope, not the current rule. ::: fails the current rule in a more dynamically scoped way, which is why it also fails the engine applying the implicit .*?. And of course, failure to <commit> is almost completely dynamic in scoping. It's more like unwinding an exception till you find a handler that knows it's the outer rule.

Larry
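The $z2 case can be modelled in a few lines of Python to show why a leading :: is harmless when it only fails the enclosing group. This is a sketch under assumptions: the function name `match_leading_cut` and its flag are invented, the pattern is reduced to a single literal word, and :w whitespace handling is ignored.

```python
def match_leading_cut(word, text, top_level_cut_fails_rule):
    """Model rx /::word/ applied as an unanchored match against text."""
    for start in range(len(text) + 1):
        if text.startswith(word, start):
            return word
        # The word failed here, so we backtrack over the leading ::.
        if top_level_cut_fails_rule:
            return None  # old reading: the whole match is abandoned
        # lexical reading: only the group fails; try the next position
    return None


print(match_leading_cut("foo", "hello foo", top_level_cut_fails_rule=True))
print(match_leading_cut("foo", "hello foo", top_level_cut_fails_rule=False))
```

Under the first setting the match gives up at the 'h', as in the quoted table; under the second it goes on to find "foo", so the leading :: behaves as a no-op.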
Re: C<::> in rules
On Fri, May 13, 2005 at 11:43:42AM +0300, Markus Laire wrote: : Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1) : and not :p(1) I'm still not sure I believe in booleans to that extent. I suppose we could go as far as to make it :p(0 but true). Actually, it's more like "undef but true", if you want to be able to distinguish sub foo (+$p = 0) { # no :p at all say "true" if $p; # :p with no argument $p //= 42; # :p with no argument ... } Or maybe it's something more like "1 but assumed". In any event, it'd be nice to be able to distinguish :p from :p(1) somehow. Maybe the Bool type is good enough for that. The bool type probably isn't unless we depend on autoboxing to turn it into a Bool consistently. Larry
Re: C<::> in rules
On Fri, May 13, 2005 at 03:36:50PM, Luke Palmer wrote:
> I'm basically saying that you should treat your:
>     $str ~~ /abc :: def | ghi :: jkl | mn :: op/;
> As:
>     $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
>     $str ~~ /^ .*? <$rule>/;
> Which means that you fail the rule, your .*? advances to the next
> character and tries the rule again.

Taking this explanation literally, this would mean that

    $rule = rx/abc :: def | ghi :: jkl | mn :: op/;
    $rule = rx/abc ::: def | ghi ::: jkl | mn ::: op/;

both succeed against "xyzabc---ghijkl". But even just considering the :: instance, this interpretation doesn't match what you said in your original message that :: would fail the rule without further advancing:

Pm> $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ;
Pm> "travel by plane jet train tgv today" ~~ $rule
LP> When you fail over the :: after plane, it skips out of the alternation
LP> looking for something to backtrack before it. Since there is nothing,
LP> the rule fails.

> Maybe I'm misunderstanding your interpretation (when in doubt, explain
> with code).

One of us is misunderstanding the other. I'll explain with code, but first let's clarify the difference. I read your first message as claiming that

    $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;
    $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
    $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

are equivalent. I believe $r2 and $r3 are not equivalent. For comparison, let's first look at a slightly different example, and let's avoid subrules, since they don't provide the auto-advance of unanchored patterns that forms the crux of my question.

First, I'm quite certain that $r2 and $r3 are different. For illustration, let's use a variation like:

    $q2 = rx / \w [ abc ::: def | ghi ::: jkl | mn ::: op ] /;
    $q3 = rx / \w [ [ abc :: def | ghi :: jkl | mn :: op ] ]/;

    "xyzabc---xyzghijklmno" ~~ $q2   # fails after seeing "zabc"
    "xyzabc---xyzghijklmno" ~~ $q3   # matches "zghijkl"

The difference is precisely the difference between ::: and :: -- the former fails the rule entirely, while the latter simply fails the current group (of alternations) and tries again. With :::, an unanchored rule should also stop its process of "advancing to the next character and trying again". (Otherwise, "abefgh" ~~ rx / [ ab ::: cd | ef ::: gh ] / succeeds.)

So, by analogy

    $r2 = rx / abc ::: def | ghi ::: jkl | mn ::: op /;
    $r3 = rx / [ abc :: def | ghi :: jkl | mn :: op ] /;

    "xyzabc---xyzghijklmno" ~~ $r2   # fails after seeing "abc"
    "xyzabc---xyzghijklmno" ~~ $r3   # matches "ghijkl"

The :: in $r3 doesn't cause the entire rule to fail, just the group, so the match is free to backtrack and continue its "advance to the next character and try again". (What the "::" in $r3 *does* do is to tell the matching engine to not bother trying the remaining alternatives once it has seen an "abc" at this point.)

So, going back to the original

    $r1 = rx / abc :: def | ghi :: jkl | mn :: op /;

does it work like $r2 or $r3? My gut feeling is that it should work like $r2 -- i.e., that once we find an "abc" we'll fail the rule if there's not a "def" following. This also accords with what others have written in reply, when they say that all three of my expressions fail in the same way (even though they do not).

However, *if* we say that :: at the top level fails the rule, that means that as things currently stand

    $z1 = rx :w /foo/;
    $z2 = rx /:w::foo/;
    $z3 = rx /[:w::foo]/;

can be a little surprising:

    "hello foo" ~~ $z1    # matches "foo"
    "hello foo" ~~ $z2    # fails immediately upon the 'h' != 'f'
    "hello foo" ~~ $z3    # matches "foo"

which was the point of my original post. And as I said there, I don't have a problem with this, I just wanted to make sure this result didn't surprise too many others.

I hope this was clear enough -- if not, explain counter examples in code. :-)

Pm
Re: C<::> in rules
On 5/13/05, Patrick R. Michaud <[EMAIL PROTECTED]> wrote: > To use the phrase from later in your message, there's still > the "implicit .*? followed by the rule call." Since the rule > itself hasn't failed (only the group failed), we're still free to > try to match the pattern at later positions. I'm basically saying that you should treat your: $str ~~ /abc :: def | ghi :: jkl | mn :: op/; As: $rule = rx/abc :: def | ghi :: jkl | mn :: op/; $str ~~ /^ .*? <$rule>/; Which means that you fail the rule, your .*? advances to the next character and tries the rule again. Maybe I'm misunderstanding your interpretation (when in doubt, explain with code). Luke
Re: C<::> in rules
On 5/12/05, Patrick R. Michaud <[EMAIL PROTECTED]> wrote: > I have a couple of questions regarding C< :: > in perl 6 rules. > First, a question of verification -- in > > $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ; > > "travel by plane jet train tgv today" ~~ $rule > > I think the match should fail outright, as opposed to matching "train tgv". > In other words, it acts as though one had written > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; > > and not > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; Those both do the same thing (which is the same as your example). When you fail over the :: after plane, it skips out of the alternation looking for something to backtrack before it. Since there is nothing, the rule fails. > Does this sound right? > > Next on my list, S05 says "It is illegal to use :: outside of > an alternation", but A05 has > > /[:w::foo bar]/ > > which leads me to believe that :: isn't illegal here even though there's > no alternation. I'd like to strike that sentence from S05 Yeah, I think using :: to break out of the innermost bracketing group is helpful even without an alternation present. > Also, A05 proposes incorrect alternatives to the above > > /[:w[]foo bar]/# null pattern illegal, use > /[:w()foo bar]/# null capture illegal, and probably undesirable > /[:w\bfoo bar]/# not exactly the same as above > > I'd like to remove those from A05, or at least put an "Update:" > note there that doesn't lead people astray. One option not > mentioned in A05 that we can add there is > > /[:wfoo bar]/ > > which is admittedly ugly. > > So, now then, on to the item that got me here in the first place. > The upshot of all of the above is that > > rx :w /foo bar/ > > is not equivalent to > > rx /:w::foo bar/ Yeah, but it is. So no problem. :-) > which may surprise a few people. 
The :: at the beginning of > the pattern effectively anchors the match to the beginning of > the string or the current position -- i.e., it eliminates the > implicit C< .*? > at the start of the match. Ohhh, ohh. There isn't an implicit .*? at the beginning of the match. It's more like there's an implicit .*? followed by a rule call to the match. Think of it as that we're trying to match the pattern at any position rather than there being an implicit .*?. Luke
Re: C<::> in rules
On Fri, 2005-05-13 at 00:26, Patrick R. Michaud wrote: > On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote: > > On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote: > > : Also, A05 proposes incorrect alternatives to the above > > : > > : /[:w[]foo bar]/ > > I would just like to point out that you are misreading those. > I've been looking at patterns too long You know, this is going to be a problem for a lot of people... Think of this case: /:w[foo bar|bar foo]/ I may be in the minority here, but I think we should try to avoid having [] and () mean different things in different parts of a rule, especially where one use is VERY common, and the other is obscure at best. I'd even be ok with only allowing this inside our already highly magical <>: /<:w>[foo bar|bar foo]/ and /<:p(false)>/ and / <:p5['ponie']> (?{die;}) / I checked, and while <::...> has a meaning in S05, <:...> does not, so as long as we never allow a modifier called "::", this would work. In fact, Larry, I think it's safe to say that <> is actually more sought-after than that : everyone wants ;-) -- Aaron Sherman <[EMAIL PROTECTED]> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback
Re: C<::> in rules
Markus Laire wrote 2005-05-13 11:43 (+0300):
> Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1)
> and not :p(1)

Agreed

Juerd
--
http://convolution.nl/maak_juerd_blij.html
http://convolution.nl/make_juerd_happy.html
http://convolution.nl/gajigu_juerd_n.html
Re: C<::> in rules
TSa (Thomas Sandlaß) wrote:
> Larry Wall wrote:
> > Speaking of which, it seems to me that :p and :c should allow an
> > argument that says where to start relative to the current position.
> > In other words, :p means :p(0) and :c means :c(0). I could also see
> > uses for :p(-1) and :p(+1).
>
> Isn't that slightly inconsistent with :p meaning :p(1) the so-called
> "real winner for passing boolean options" of A12?

Perhaps spec should be changed so that :p means :p(bool::true) or :p(?1) and not :p(1)

--
Markus Laire
Re: C<::> in rules
Larry Wall wrote: Speaking of which, it seems to me that :p and :c should allow an argument that says where to start relative to the current position. In other words, :p means :p(0) and :c means :c(0). I could also see uses for :p(-1) and :p(+1). Isn't that slightly inconsistent with :p meaning :p(1) the so-called "real winner for passing boolean options" of A12? -- TSa
Re: C<::> in rules
On Thu, May 12, 2005 at 08:56:39PM -0700, Larry Wall wrote: > On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote: > : Also, A05 proposes incorrect alternatives to the above > : > : /[:w[]foo bar]/# null pattern illegal, use > : /[:w()foo bar]/# null capture illegal, and probably undesirable > : /[:w\bfoo bar]/# not exactly the same as above > : > > I would just like to point out that you are misreading those. Ouch, you're right! I've been looking at patterns too long, I guess -- thanks for the correction. > Speaking of which, it seems to me that :p and :c should allow an > argument that says where to start relative to the current position. > In other words, :p means :p(0) and :c means :c(0). I could also see > uses for :p(-1) and :p(+1). Sounds good to me. Pm
Re: C<::> in rules
On Thu, May 12, 2005 at 09:33:37AM -0500, Patrick R. Michaud wrote: : Also, A05 proposes incorrect alternatives to the above : : /[:w[]foo bar]/# null pattern illegal, use : /[:w()foo bar]/# null capture illegal, and probably undesirable : /[:w\bfoo bar]/# not exactly the same as above : : I'd like to remove those from A05, or at least put an "Update:" : note there that doesn't lead people astray. One option not : mentioned in A05 that we can add there is : : /[:wfoo bar]/ : : which is admittedly ugly. I would just like to point out that you are misreading those. The [] and () above are part of pair syntax, not rule syntax. Likewise your :w should be taken to :w('?null'). We used to try to distinguish modifiers like :w that don't take an argument, but that's a bad plan. All colon pairs parse alike wherever they occur. That's why we've required space before bracket delimiters outside, but the same constraint holds inside rules. Which means, of course, that we should probably try to figure what :w($x) actually means... :-) Speaking of which, it seems to me that :p and :c should allow an argument that says where to start relative to the current position. In other words, :p means :p(0) and :c means :c(0). I could also see uses for :p(-1) and :p(+1). We could also pass positions as opaque objects, which is another reason not to consider positions as mere numbers. Larry
Re: C<::> in rules
On Thu, May 12, 2005 at 05:15:55PM -0400, Aaron Sherman wrote: > On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote: > > False. In the first case the group is the whole rule. In the second > > case the group would not include the (implied) '.*?' at the start of > > the rule. > > This was a very unfortunate choice of explanations, since an implied > ".*?" would change the semantics of the match deeply. I agree, my wording on this wasn't all that clear--I haven't found a good phrase for "the stepping that takes place at the beginning of an unanchored match". And in earlier versions of PGE, the stepping was actually performed by a '.*?' node at the beginning of the expression tree that didn't participate in the captured result. Anyway, we're in agreement as to what :: and ::: do, so I'll propose changes to S05/A05 and we can go from there. Thanks! :-) Pm
Re: C<::> in rules
On Thu, 2005-05-12 at 15:41, Patrick R. Michaud wrote: > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; > > On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote: > > On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote: > > > False. In the first case the group is the whole rule. In the second > > > case the group would not include the (implied) '.*?' at the start of > > > the rule. This was a very unfortunate choice of explanations, since an implied ".*?" would change the semantics of the match deeply. However, your later explanation: > $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/; > > "abcdef" ~~ $r1 # matches "abcdef" > "xyzghijkl" ~~ $r1 # matches "ghijkl" > "xyzabcghijkl" ~~ $r1# matches "ghijkl" > > Why does the last one match? Because it fails the group but > doesn't fail the rule -- i.e., the rule is still free to advance > its initial pointer to the next character and try again. ... is very understandable. Now I'm just left with a vague sense that I never want to see anyone use :: :-) -- Aaron Sherman <[EMAIL PROTECTED]> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback
Re: C<::> in rules
    $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ;
    $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ;

On Thu, May 12, 2005 at 02:29:24PM -0400, Aaron Sherman wrote:
> On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote:
> > On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote:
> > > Your two examples fail in the same way because of the fact that the
> > > group IS the whole rule.
> >
> > False. In the first case the group is the whole rule. In the second
> > case the group would not include the (implied) '.*?' at the start of
> > the rule.
>
> That cannot be true. If it were, then:
>     s/[a]//
> and
>     s/a//
> would replace different things, and they MUST NOT.

No, /[a]/ is still the same as /a/ here -- I'm not discussing that at all, nor am I implying any special [] or rule semantics. I'm simply referring to the fact that the rule is free to step across the characters in the string, same as you pointed out.

Let me backtrack(!) and try a slightly different example, first using a group and (::)

    $r1 = rx /[abc :: def | ghi :: jkl | mn :: op]/;

    "abcdef" ~~ $r1         # matches "abcdef"
    "xyzghijkl" ~~ $r1      # matches "ghijkl"
    "xyzabcghijkl" ~~ $r1   # matches "ghijkl"

Why does the last one match? Because it fails the group but doesn't fail the rule -- i.e., the rule is still free to advance its initial pointer to the next character and try again. Contrast this with:

    $r2 = rx /abc ::: def | ghi ::: jkl | mn ::: op/;

    "abcdef" ~~ $r2         # matches "abcdef"
    "xyzghijkl" ~~ $r2      # matches "ghijkl"
    "xyzabcghijkl" ~~ $r2   # fails!

This one fails, because once we match the "abc", we're committed to completing the match or failing the rule altogether.

Does this work to convince you that the two expressions are indeed different?

Pm
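The contrast between the two tables above can be reproduced with a toy matcher in Python. This is a sketch, not PGE's algorithm: `match_with_cut` is a made-up name, each alternative is simplified to a literal head and tail around the cut, and the `cut` argument selects whether backtracking over the cut fails the group ("group", modelling ::) or the whole match ("rule", modelling :::).

```python
def match_with_cut(alternatives, text, cut):
    """Unanchored match of `head CUT tail | ...` against text.

    alternatives: list of (head, tail) literal pairs.
    cut: "group" models ::, "rule" models :::.
    Returns the matched substring, or None on failure.
    """
    for start in range(len(text) + 1):
        for head, tail in alternatives:
            if not text.startswith(head, start):
                continue  # ordinary failure: try the next alternative
            if text.startswith(tail, start + len(head)):
                return head + tail
            # The tail failed, so we are backtracking over the cut.
            if cut == "rule":
                return None  # ::: fails the whole match
            break            # :: fails this group; advance the start
    return None


ALTS = [("abc", "def"), ("ghi", "jkl"), ("mn", "op")]
for subject in ("abcdef", "xyzghijkl", "xyzabcghijkl"):
    print(subject,
          match_with_cut(ALTS, subject, "group"),
          match_with_cut(ALTS, subject, "rule"))
```

The first two subjects match either way; only "xyzabcghijkl" tells the two apart, because it is the only one where a head matches and its tail does not.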
Re: C<::> in rules
On Thu, 2005-05-12 at 13:44, Patrick R. Michaud wrote: > On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote: > > > In other words, it acts as though one had written > > > > > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; > > > > > > and not > > > > > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; > > > > Your two examples fail in the same way because of the fact that the > > group IS the whole rule. > > False. In the first case the group is the whole rule. In the second > case the group would not include the (implied) '.*?' at the start of > the rule. That cannot be true. If it were, then: s/[a]// and s/a// would replace different things, and they MUST NOT. If I've missed some fundamental way in which rx:p5/(?:...)/ is different from rx/[...]/, then please let me know. Otherwise, we can simply demonstrate this with P5: perl -le '"abcaabbcc" =~ /(?:aa)/;print $&' and unshockingly, that prints "aa", not "abcaa" > Note that the rule is *unanchored*, thus it tries at the first character, > if it fails then it goes to the second character, if that fails it goes > to the third, etc. Yes, you're correct, but when you step forward over input in order to find a start for your unanchored expression, you do NOT consume that input, grouping or not. To say: $foo ~~ /unanchored/ is something like for 0..length($foo)-1 -> $i { substr($foo,$i) ~~ /^unanchored/; } and always has been. Unless I'm unaware of some subtlety of [], it is just the same as P5's (?:...), which behaves exactly this way. I'll skip the rest of your post for now, except for the last bit, since I think we need to resolve which universe we're in before we can give each other street directions ;-) > > > /[:w\bfoo bar]/# not exactly the same as above > > > > No, I think that's exactly the same. > > Nope. 
Consider: > > $foo = rx /[:w::foo bar]/ > $baz = rx /[:w\bfoo bar]/ > > "myfoo bar" ~~ $foo # matches > "myfoo bar" ~~ $baz # fails, foo is not on a word boundary You're correct, sorry about that. -- Aaron Sherman <[EMAIL PROTECTED]> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback
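The "try an anchored match at each offset" description of unanchored matching can be checked directly. Below is a minimal Python sketch using the standard `re` module; `unanchored_search` is a hypothetical helper written for this illustration, and it scans one offset further than the loop in the message so that an empty match at the end of the string is also possible.

```python
import re


def unanchored_search(pattern, text):
    """Try an anchored match at each start offset in turn.

    Returns (start, matched_text) for the first success, else None.
    """
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        found = compiled.match(text, start)  # match() anchors at `start`
        if found:
            return start, found.group()
    return None


print(unanchored_search(r"(?:aa)", "abcaabbcc"))
print(re.search(r"(?:aa)", "abcaabbcc").group())
```

Both calls report "aa": the characters stepped over while looking for a starting point are not part of the match, with or without the non-capturing group.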
Re: C<::> in rules
> "PRM" == Patrick R Michaud <[EMAIL PROTECTED]> writes: PRM> On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote: >> >> > > /[:w\bfoo bar]/# not exactly the same as above >> > >> > No, I think that's exactly the same. >> >> What does \b mean again? I assume it's no longer backspace? PRM> For as long as I can remember \b has meant "word boundary" in PRM> regular expressions. :-) :-) except in char classes where it gets its backspace meaning back. :-) uri -- Uri Guttman -- [EMAIL PROTECTED] http://www.stemsystems.com --Perl Consulting, Stem Development, Systems Architecture, Design and Coding- Search or Offer Perl Jobs http://jobs.perl.org
Re: C<::> in rules
On Thu, May 12, 2005 at 12:48:16PM -0500, Patrick R. Michaud wrote: > On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote: > > > > > > /[:w\bfoo bar]/# not exactly the same as above > > > > > > No, I think that's exactly the same. > > > > What does \b mean again? I assume it's no longer backspace? > > For as long as I can remember \b has meant "word boundary" in > regular expressions. :-) :-) Doh! See how the shiny new perl6 confuses? ;-) -Scott -- Jonathan Scott Duff [EMAIL PROTECTED]
Re: C<::> in rules
On Thu, May 12, 2005 at 12:33:59PM -0500, Jonathan Scott Duff wrote: > > > > /[:w\bfoo bar]/# not exactly the same as above > > > > No, I think that's exactly the same. > > What does \b mean again? I assume it's no longer backspace? For as long as I can remember \b has meant "word boundary" in regular expressions. :-) :-) Pm
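For readers who want to see the two meanings of \b side by side, Python's `re` module happens to follow the same convention as Perl 5 here: word boundary in a pattern, backspace inside a character class. A small sketch follows; the variable names are only illustrative.

```python
import re

# \b outside a character class: zero-width word boundary.
at_boundary = re.search(r"\bfoo", "a foo")
inside_word = re.search(r"\bfoo", "myfoo")

# \b inside a character class: the backspace character (0x08).
backspace = re.fullmatch(r"[\b]", "\x08")

print(at_boundary is not None)
print(inside_word is None)
print(backspace is not None)
```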
Re: C<::> in rules
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote: > My take, based on S05: > > > In other words, it acts as though one had written > > > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; > > > > and not > > > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; > > Your two examples fail in the same way because of the fact that the > group IS the whole rule. False. In the first case the group is the whole rule. In the second case the group would not include the (implied) '.*?' at the start of the rule. Perhaps it helps to see the difference if I write it this way: $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/; Note that the rule is *unanchored*, thus it tries at the first character, if it fails then it goes to the second character, if that fails it goes to the third, etc. Thus, given: $rule1 = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; $rule2 = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; "travel by plane jet train tgv today" ~~ $rule1; # fails "travel by plane jet train tgv today" ~~ $rule2; # matches "train tgv" They're not equivalent. > > Next on my list, S05 says "It is illegal to use :: outside of > > an alternation", but A05 has > > > > /[:w::foo bar]/ > > I can't even figure out what that means. :w turns on word mode > (lexically scoped per S05) and "::" is a group-level commit. What are we > committing exactly? Looks like a noop to me, which actually might not be > so bad. Yes, the point is that it's a no-op, because /[:wfoo bar:]/ is something entirely different. > > /[:w\bfoo bar]/# not exactly the same as above > > No, I think that's exactly the same. Nope. Consider: $foo = rx /[:w::foo bar]/ $baz = rx /[:w\bfoo bar]/ "myfoo bar" ~~ $foo # matches "myfoo bar" ~~ $baz # fails, foo is not on a word boundary Pm
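The "myfoo bar" case is easy to reproduce with an ordinary regex engine. In this Python sketch the effect of :w between "foo" and "bar" is approximated by `\s+`, which is an assumption made for illustration and not the full :w semantics.

```python
import re

text = "myfoo bar"

# No boundary requirement: "foo bar" is found inside "myfoo bar".
plain = re.search(r"foo\s+bar", text)

# With \b in front, "foo" must start at a word boundary, so this fails.
bounded = re.search(r"\bfoo\s+bar", text)

print(plain.group() if plain else None)
print(bounded)
```

So a leading \b is not a neutral separator after :w in the way a no-op :: is, because it adds a constraint of its own.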
Re: C<::> in rules
On Thu, May 12, 2005 at 12:53:46PM -0400, Aaron Sherman wrote: > On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote: > > Next on my list, S05 says "It is illegal to use :: outside of > > an alternation", but A05 has > > > > /[:w::foo bar]/ > > I can't even figure out what that means. :w turns on word mode > (lexically scoped per S05) and "::" is a group-level commit. What are we > committing exactly? Looks like a noop to me, which actually might not be > so bad. However, you're right: this is an error as there are no > alternations. I think the definition of :: needs to be changed slightly. You even used a phrase that isn't exactly true according to spec but would be if :: meant what I think it should mean. That phrase is ":: is a group-level commit". This isn't how I read S05 (and apparently how you and others read it as well, hence your comment to Pm that there are no alternations). S05 says: Backtracking over a double colon causes the surrounding group of alternations to immediately fail: I think it should simply read: Backtracking over a double colon causes the surrounding group to immediately fail: In other words, the phrase "of alternations" is a red herring. > > which leads me to believe that :: isn't illegal here even though there's > > no alternation. I'd like to strike that sentence from S05. > > I don't think it should be removed. You can always use ::: if that's > what you wanted. I too think it should be stricken. > > /[:w\bfoo bar]/# not exactly the same as above > > No, I think that's exactly the same. What does \b mean again? I assume it's no longer backspace? > > So, now then, on to the item that got me here in the first place. > > The upshot of all of the above is that > > > > rx :w /foo bar/ > > > > is not equivalent to > > > > rx /:w::foo bar/ > > If we feel strongly, it could be special-cased, but your solution > seems fine to me. 
If :: were to fail the surrounding group we can say that a rule without [] or () is an implicit group for :: purposes. -Scott -- Jonathan Scott Duff [EMAIL PROTECTED]
Re: C<::> in rules
My take, based on S05: On Thu, 2005-05-12 at 10:33, Patrick R. Michaud wrote: > I have a couple of questions regarding C< :: > in perl 6 rules. > First, a question of verification -- in > > $rule = rx :w / plane :: (\d+) | train :: (\w+) | auto :: (\S+) / ; > > "travel by plane jet train tgv today" ~~ $rule > > I think the match should fail outright, as opposed to matching "train tgv". Correct, that's the meaning of :: S05: "Backtracking over a double colon causes the surrounding group of alternations to immediately fail:" Your surrounding group is the entire rule, and thus you fail at that point. > In other words, it acts as though one had written > > $rule = rx :w / plane ::: (\d+) | train ::: (\w+) | auto ::: (\S+) / ; > > and not > > $rule = rx :w /[ plane :: (\d+) | train :: (\w+) | auto :: (\S+) ]/ ; Your two examples fail in the same way because of the fact that the group IS the whole rule. > Next on my list, S05 says "It is illegal to use :: outside of > an alternation", but A05 has > > /[:w::foo bar]/ I can't even figure out what that means. :w turns on word mode (lexically scoped per S05) and "::" is a group-level commit. What are we committing exactly? Looks like a noop to me, which actually might not be so bad. However, you're right: this is an error as there are no alternations. > which leads me to believe that :: isn't illegal here even though there's > no alternation. I'd like to strike that sentence from S05. I don't think it should be removed. You can always use ::: if that's what you wanted. > Also, A05 proposes incorrect alternatives to the above > > /[:w[]foo bar]/# null pattern illegal, use Correct. > /[:w()foo bar]/# null capture illegal, and probably undesirable Correct. > /[:w\bfoo bar]/# not exactly the same as above No, I think that's exactly the same. > So, now then, on to the item that got me here in the first place. 
> The upshot of all of the above is that > > rx :w /foo bar/ > > is not equivalent to > > rx /:w::foo bar/ If we feel strongly, it could be special-cased, but your solution seems fine to me. -- Aaron Sherman <[EMAIL PROTECTED]> Senior Systems Engineer and Toolsmith "It's the sound of a satellite saying, 'get me down!'" -Shriekback