RFC 110 counting matches (post Hugo)
This list has gone a little quiet...

Hugo wrote:
: I like this too. I'd suggest /t should mean a) return a scalar of the number of matches and b) don't set any special variables. Then /t without /g would return 0 or 1, but be faster since no extra information need be captured (except internally for (.)\1 type matching - compile time checks could determine if these are needed, though (?{..}) and (??{..}) patterns would require disabling of that optimisation). /tg would give a scalar count of the total number of matches. \G would retain its meaning.
:
: Any which way, implementation should be fairly straightforward, though ensuring that optimisations occurred precisely when they are safe would probably involve a few bug-chasing cycles.

I propose adding this note. His preference for the working of /t and /g seems the most appropriate. Unless I hear any further discussion, I propose moving this RFC to frozen this week.

Richard
--
[EMAIL PROTECTED]
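For illustration only, the return values Hugo describes for /t and /tg can be emulated as below. This is a minimal sketch in Python rather than Perl, the helper names count_one and count_all are hypothetical, and it shows only the intended results, not the optimisation of skipping capture bookkeeping.

```python
import re

def count_one(pattern: str, text: str) -> int:
    """Emulate the proposed /t without /g: 1 if the pattern matches anywhere, else 0."""
    return 1 if re.search(pattern, text) else 0

def count_all(pattern: str, text: str) -> int:
    """Emulate the proposed /tg: total number of non-overlapping matches."""
    return sum(1 for _ in re.finditer(pattern, text))

# "caaandaab" contains two runs of "a": "aaa" and "aa".
print(count_one(r"a+", "caaandaab"))  # 1
print(count_all(r"a+", "caaandaab"))  # 2
```

In current Perl the same count is usually obtained with the list-assignment idiom, my $n = () = $text =~ /a+/g; which still sets the special variables that /t is meant to avoid.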
RFC 166 (postHugo)
This RFC had three concepts. I propose dropping "Not a pattern" (it is now in RFC 198) and the null element from here. The List expansion might benefit from a slight enhancement.

Hugo:
: (?@foo) and (?Q@foo) are both things I've wanted before now. I'm not sure if this is the right syntax, particularly if RFC 112 is adopted: it would be confusing to have (?@foo) to have so different a meaning from (?$foo=...), and even more so if the latter is ever extended to allow (?@foo=...). I see no reason that implementation should cause any problems since this is purely a regexp-compile time issue.

I don't have any problem with the (?@foo) syntax; does anybody else? I can't imagine a (?@foo=...) style syntax (yet).

Thinking further about what I defined for (?Q@foo) as adding the list as quoted alternatives, is there a case for (?Q$foo) to match the contents of $foo quoted in a similar way? (I think it is at least probable.)

Feedback desirable.

Richard (Still thinking on scoping in assignment and boolean regexes)
--
[EMAIL PROTECTED]
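The idea of (?Q@foo), a list expanded into quoted alternatives, can be approximated by building the alternation at pattern-construction time. The following is a rough sketch in Python, not the RFC's implementation; the function name quoted_alternation is hypothetical, and ordering the items longest first is an assumption of this sketch that the RFC text above does not specify.

```python
import re
from typing import Iterable

def quoted_alternation(items: Iterable[str]) -> str:
    """Build a non-capturing group that matches any item literally."""
    # Assumption of this sketch: try longer items first so that a shorter
    # item cannot shadow a longer one that shares its prefix.
    ordered = sorted(items, key=len, reverse=True)
    # re.escape neutralises regex metacharacters such as "." and "+".
    return "(?:" + "|".join(re.escape(item) for item in ordered) + ")"

foo = ["a.b", "a+b", "a.b.c"]
pattern = re.compile("^" + quoted_alternation(foo) + "$")
print(bool(pattern.match("a.b")))  # True: matched as a literal string
print(bool(pattern.match("aXb")))  # False: "." is not a wildcard here
```

Because the expansion happens when the pattern string is built, this matches Hugo's remark that the feature is purely a regexp-compile time issue.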
Re: RFC 166 (postHugo)
Sorry, I can't help but read the subject as an abbreviation of post Hugo, ergo propter Hugo and then I wonder why you're naming an RFC after a logical fallacy involving a perl5-porter. I am seeking treatment, though :-) Nat
special character to match a valid expression
Thinking about the brace matching problem, regarding the specific problem of writing a regex to match any valid specification of a scalar written like

    ${expression returning name or reference goes here}

I realized that no amount of lookahead is going to be without possible problems. So why not give up and try another direction?

A special backslash assertion that matches ___ a valid perl expression which appears as if it would return a value ___ would be just the thing. We could call it \v for valid, and it would match as far as it could get in validity.

    ($Name_Of_The_First_Interpolable_Scalar) = m/\$(\w+|(\{\v\}))/;

Its opposite, \V, is something I'd like to know more about before submitting an RFC on this idea. Would it be greedy? Would it require a quantifier, matching dot for that many, and then validate the results? Maybe it should be left undefined: can anyone come up with a situation in which you'd want to match all the characters that were not syntactically valid, or match up to the last token that would match all the previous brackets?

--
David Nicol 816.235.1187 [EMAIL PROTECTED]
perl -e'map{sleep print$w[rand@w]}@w=<>' ~/nsmail/Inbox
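To make the brace matching problem concrete, here is a deliberately simplified sketch in Python of the part that lookahead cannot express: finding where a ${...} block ends when braces nest. It is not the proposed \v, which would need the Perl parser to judge validity; the function name balanced_brace_end is hypothetical, and the scanner ignores quoting and escapes.

```python
def balanced_brace_end(text: str, start: int) -> int:
    """Return the index just past the '}' closing the '{' at text[start], or -1."""
    if start >= len(text) or text[start] != "{":
        return -1
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1  # input ended with braces still open

source = 'print "${ $h{name} } and more";'
start = source.index("{")
end = balanced_brace_end(source, start)
print(source[start:end])  # { $h{name} }
```

A real implementation would also have to decide what counts as a valid expression inside the braces, which is exactly the question the message raises.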
Re: RFC 72 (v1) The regexp engine should go backward as well as forward.
: Simply put, I want variable-length lookbehind.

Why didn't you simply propose that the (?<=...) operator be fixed to support variable-length expressions? Why so much additional machinery?
Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)
: in any case, i think we have a fair agreement on rfc 158 and i will freeze it if there is no further comments on it.

I think you should remove the parts of your proposal about making $& be autolocalized. If you're not planning to revise your RFC, let me know so that I can ask the librarian to mark it as withdrawn.
Re: RFC 158 (v1) Regular Expression Special Variables
Mark-Jason Dominus writes:
: There's also long been talk/thought about making $& and $1
: and friends magic aliases into the original string, which would
: save that cost.
:
:Please correct me if I'm mistaken, but I believe that that's the way
:they are implemented now. A regex match populates the ->startp and
:->endp parts of the regex structure, and the elements of these items
:are byte offsets into the original string.

I went on a briefish trawl for this the other day, and as far as I can tell what happens is this:
- during matching, the startp/endp pairs are populated with offsets into the target string
- immediately after matching, the target string is copied if needed, and the PL_curpm object is updated to refer to the copy
- the copy is needed if any of the special variables can be referred to: $`, $&, $', $1, $2, ...

The result of that is that if there are backreferences in the regexp, the copy is always needed; if not, the copy is needed only if $& or her kin have been seen. So regexps with backrefs should suffer no slowdown from use of $& in the same program, but regexps without backrefs will get a (potentially) unnecessary copy.

The other problem with this, of course, is that the compiler may not yet have seen the $& we intend to use:

    crypt% perl -wle '$_="foo"; /.*/; $_="bar"; print eval q{$&}'
    bar
    crypt%

.. and I think coredumps may be possible from this. (Hmm, perlbug upcoming.)

Hugo
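The difference between keeping offsets into the live target and copying the target after the match can be modelled in a few lines. This is a toy model in Python under stated assumptions, not Perl's internals: the class LastMatch and its copy_target flag are hypothetical, and a bytearray stands in for the target string.

```python
class LastMatch:
    """Toy match record: start/end offsets plus a reference to the target."""

    def __init__(self, target: bytearray, start: int, end: int, copy_target: bool):
        # With copy_target=True the record snapshots the target, like the
        # copy step described above; otherwise it aliases the live buffer.
        self.target = bytes(target) if copy_target else target
        self.start = start
        self.end = end

    def matched(self) -> bytes:
        return bytes(self.target[self.start:self.end])

buf = bytearray(b"foo")
aliased = LastMatch(buf, 0, 3, copy_target=False)
copied = LastMatch(buf, 0, 3, copy_target=True)

buf[:] = b"bar"           # the target changes after the match
print(aliased.matched())  # b'bar': stale offsets read the new contents
print(copied.matched())   # b'foo': the snapshot still holds the matched text
```

The aliased case mirrors the one-liner above, where no copy was made and the later assignment to $_ shows through.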
Re: XML/HTML-specific ? and ? operators?
: it looks worse and dumps core. That's because the first non-paren forces it to recurse into the second branch until you hit REG_INFTY or overflow the stack. Swap second and third branches and you have a better chance:

I think something else goes wrong there too.

    $re = qr{...}

(I haven't checked that there aren't other problems with it, though.)

Try this:

    "(x)(y)" =~ /^$re$/;

This should match, but it dumps core. I don't think there is infinite recursion, although I might be mistaken.

Anyway, Snobol has a nice heuristic to prevent infinite recursion in cases like this, but I'm not sure it's applicable to the way the Perl regex engine works. I will think about it.
Re: XML/HTML-specific ? and ? operators?
:Anyway, Snobol has a nice heuristic to prevent infinite recursion in
:cases like this, but I'm not sure it's applicable to the way the Perl
:regex engine works. I will think about it.

It is probably worth adding the heuristic above: anytime you recurse into the same re at the same position, there is an infinite loop.

That is basically it, except that in Snobol it is inside out: each recursively interpolated pattern is assumed to match a string of at least length 1, and if the remaining part of the target string isn't sufficiently long to match the rest of the pattern after recursion, then the recursion is skipped.
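The length-based heuristic described above can be sketched on a small left-recursive pattern. This is an illustrative Python sketch, not Snobol's or Perl's implementation: the pattern P := P "a" | "a", the function name match_p, and the owed parameter are all assumptions made for the example.

```python
from typing import Iterator

def match_p(text: str, pos: int, owed: int = 0) -> Iterator[int]:
    """Yield end positions for P := P "a" | "a", starting at pos.

    owed is the minimum number of characters that enclosing calls still
    need to match after this call returns.
    """
    remaining = len(text) - pos
    # Guard: the recursive P is assumed to consume at least 1 character,
    # the following "a" needs 1 more, and the callers need `owed`.
    # If the target is too short for all of that, skip the recursion.
    if remaining >= 1 + 1 + owed:
        for mid in match_p(text, pos, owed + 1):
            if text.startswith("a", mid):
                yield mid + 1
    # Non-recursive alternative.
    if text.startswith("a", pos):
        yield pos + 1

print(list(match_p("aaa", 0)))  # [3, 2, 1]
print(list(match_p("ab", 0)))   # [1]
```

Without the guard the first branch would recurse into P at the same position forever; with it, the recursion depth is bounded by the length of the target, since owed grows by one at each level while remaining stays fixed.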