# New Ticket Created by "Carl Mäsak"
# Please include the string: [perl #72440]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=72440 >
This be Rakudo a609d7 on Parrot r43600.

    $ perl6 -e 'say "1ab2ab3c" ~~ /^ \d ** abc $/ ?? "OH NOES" !! "oh phew"'
    OH NOES

This is a PGE bug. Here follows a brief explanation.

S05 states that unquoted literals like C<abc> are actually three distinct atoms, each of which can be quantified separately. Thus, C<abc*> means C<ab[c]*>, not C<[abc]*>. By that reasoning, C<\d ** abc> means C<\d ** [a] bc>.

However (though S05, to my knowledge, does not mention it), one might temporarily lift the rule that each unquoted alphanumeric character is its own atom in "** separator context". In that case, C<\d ** abc> could be made to mean C<\d ** [abc]>. (I'm not saying this exception would be a good idea, language-wise.)

In PGE, as we see above, C<\d ** abc> currently means C<\d ** [ab] c>. This is due to an internal optimization that's usually invisible to the user. When parsing C<abc>, PGE reads it as C<'ab' c>. More generally, it reads all characters of an unquoted literal except the last as one string, and the last character as a separate atom. This optimization makes a lot of sense if it turns out that C<c> has a quantifier on it. If it doesn't, later steps in the regex compilation merge C<ab> and C<c> back into one literal string.

In the case of the separator in C<**>, this optimization produces the wrong result. By the time C<ab> and C<c> would be merged, C<ab> has already been bound as the separator of the C<**> operator.

I probably wouldn't submit this as a rakudobug, were it not for the fact that, according to my reading of <http://github.com/perl6/nqp-rx/blob/eb9c75a9b6bf144808ca6d24f31b606e9e8adba8/src/Regex/P6Regex/Grammar.pm> (lines 47 and 67), this problem persists in nqp-rx, and thus will also appear in the ng branch of Rakudo once it supports regex matching.

For what it's worth, I suggest that C</\d ** abc/> actually be interpreted as C</\d ** [a] bc/>, but that a (suppressible) warning be emitted whenever an atom follows a quantifier separator with no whitespace in between.

// Carl
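
The three readings of C<\d ** abc> discussed above can be compared with a small sketch. It uses Python's C<re> module as a stand-in, not PGE or nqp-rx. The helper names C<separated> and C<split_literal> are hypothetical and exist only for this illustration. C<split_literal> models the "all characters but the last" optimization as described in the report, under the assumption that the separator is bound before the two pieces are merged.

```python
import re

SUBJECT = "1ab2ab3c"  # the string from the one-liner in the report


def separated(atom, sep, tail=""):
    """Build an anchored Python regex equivalent to 'atom ** sep' followed by a literal tail.

    'atom' is a raw regex fragment; 'sep' and 'tail' are treated as literal text.
    """
    return re.compile(rf"^{atom}(?:{re.escape(sep)}{atom})*{re.escape(tail)}$")


def split_literal(literal):
    """Model of the described optimization: all characters but the last, then the last one."""
    return literal[:-1], literal[-1]


def matches(pattern):
    return pattern.match(SUBJECT) is not None


# Reading 1 (S05, each character its own atom): \d ** [a] bc
s05_reading = separated(r"\d", "a", "bc")

# Reading 2 (hypothetical "separator context" exception): \d ** [abc]
whole_literal_reading = separated(r"\d", "abc")

# Reading 3 (observed behaviour): the separator is bound to 'ab', leaving 'c' behind
head, last = split_literal("abc")
observed_reading = separated(r"\d", head, last)

print("S05 reading matches:          ", matches(s05_reading))            # False
print("whole-literal reading matches:", matches(whole_literal_reading))  # False
print("observed reading matches:     ", matches(observed_reading))       # True
```

Only the third reading accepts C<1ab2ab3c>, which is consistent with the "OH NOES" output in the report.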
