Re: RFC 110 (v3) counting matches
(mystery: how can filling in $& be a lot slower than filling in $1?)

It isn't. It's the same. $1 might even be more expensive than $&.

It appears that many people don't understand the problem with $&. I will try to explain.

Maintaining the information required by $1 or $& slows down the regex match, possibly by as much as forty to sixty percent, or more. (How much depends on details of the regex and the target string.)

For this reason, Perl has an optimization in it so that if you never use $& anywhere in your program, Perl never maintains the information, and every regex in your program runs faster. But if you do use $& somewhere, Perl cannot apply the optimization, and it must compute the $& information for every regex in the program. Every regex becomes much slower.

In particular, if you load a module whose author happened to use $&, all your regexes get slower, which might be an unpleasant surprise, since you might not be aware of the cause.

A regex with backreferences is *also* slow. But using backreferences in one regex does not make all the *other* regexes slow. If you have

    /(...)/     # regex 1
    /.../       # regex 2

Perl knows that it must compute the backreference information for regex 1, and knows that it can skip computing the backreference information for regex 2, because regex 2 contains no parentheses.

If you use a module that contains regexes that use backreferences, those regexes run slowly, but there is no effect on *your* regexes.

The cost is just as high for backreferences as for $&, but the backreference cost is paid only by regexes that actually need it. The $& cost is paid by every regex in the entire program, whether they used it or not. This is because Perl has no way to tell which regexes use $& and which do not.

One of Uri's suggestions in RFC 158 was to compute $& only for regexes that have a /k modifier. This would solve the $& problem because Perl would compute $& only when asked to, and not for every other regex in the rest of the program.
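As a rough illustration of paying the cost only where it is needed, the whole matched text can be recovered without ever mentioning the whole-match variable, by using the offset arrays @- and @+ that Perl 5.6 provides. This is only a sketch: the helper name whole_match and the sample string are made up for the example and are not part of any RFC.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by $re in $string
# without referring to the whole-match variable anywhere.
sub whole_match {
    my ($string, $re) = @_;
    return unless $string =~ $re;
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    return substr($string, $-[0], $+[0] - $-[0]);
}

my $text = "order 2000-08-31 shipped";
my $date = whole_match($text, qr/\d{4}-\d\d-\d\d/);
print "$date\n";    # prints 2000-08-31
```

Wrapping the entire pattern in one pair of parentheses and reading $1 gives the same text, and, as described above, only that one regex then pays for the capture.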
Re: RFC 110 (v3) counting matches
Jonathan Scott Duff wrote:

> How about something like this?
>
>     $re = qr/(\d\d)-(\d\d)-(\d\d)/g;
>     $re->onmatch_callback(push @list, makedate(^0,^1,^2));
>     $string =~ $re;

It's not bad, but it loses one thing that I was trying to keep from the SNOBOL model. If you have (again, improvised syntax - I *know* you want to use the $ variables, OK? This is just for discussion):

    /($pat1)($pat2)($pat3)(?{sub1(@\)$pat4|?{sub2(@\)}$pat5|?{sub3(@\)})/

This would translate to "if pat1pat2pat3 matches, call sub1 with all the matches to that point if pat4 matches afterward, otherwise call sub2 with all the matches if pat5 matches, else just call sub3."

The key bit here is that you pass over the sub call, deferring it until you've decided if the whole match worked, then picking the one that succeeded and calling it.

If you don't like the syntax, please feel free to propose another. @\ seemed a good mnemonic for "the array of backreferences I already matched".

And, of course, if you assume that @\ keeps growing when you use /g, then doing a scalar @\ and dividing by the number of backreferences would give you a match count:

    $string =~ /(\d\d)-(\d\d)-(\d\d)/g;
    $hits = scalar(@\)/3;

Of course, multiple alternatives with different numbers of backreferences lead to a problem, so maybe this is all academic. Oh well.

--- Joe M.
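For comparison, the divide-by-group-count idea can already be tried in Perl 5, where a /g match in list context returns every captured group from every match. The sketch below uses an ordinary array, @captures, as a stand-in for the proposed @\ (which does not exist); the sample string is invented.

```perl
use strict;
use warnings;

my $string = "01-02-03 and 04-05-06, then 07-08-09";

# In list context, a /g match returns all groups from all matches, in order.
my @captures = $string =~ /(\d\d)-(\d\d)-(\d\d)/g;

# Three groups per match, so dividing gives the number of matches.
my $hits = @captures / 3;

print "captures: @captures\n";   # captures: 01 02 03 04 05 06 07 08 09
print "hits: $hits\n";           # hits: 3
```

The same caveat applies here as in the message: once alternatives carry different numbers of groups, a fixed divisor no longer gives a meaningful count.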
Re: RFC 72 (v2) The regexp engine should go backward as well as forward.
From: "Peter Heslin" [EMAIL PROTECTED]
Sent: Thursday, August 31, 2000 10:51 PM

> I would propose that your version of the syntax might also function in the middle of a regexp:
>
>     /GHI(?`=DEF)JKL(?`=^ABC)MNO/
>
> would match the start of the alphabet (fixed-length example used for simplicity).

That's not what I had in mind; I would have the new look-behind look (in terms of left-to-right placement) and act like the existing (?<=pat), except that it would have non-zero width and create a back-reference. The example above would (if we remove the ^ ) match

    GHIDEFJKLABCMNO

Hmm, that non-zero width thing is screwy. The zero-width analog, /GHI(?<=DEF)JKL(?<=^ABC)MNO/, would NEVER match. That asks for a GHI immediately followed by a JKL immediately preceded by a DEF. Without some better motivating examples, I'd rather keep the old and proposed look-behinds working the same.

So let me retract what I said above about matching GHIDEFJKLABCMNO. A match would be had with

    /GHI.*(?`=DEF)JKL.*(?`=ABC)MNO/

mike mulligan
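The "would NEVER match" observation can be checked with Perl 5's existing zero-width look-behind, assuming the zero-width analog in the message means the (?<=pat) construct. The sketch below uses only that existing syntax, not the proposed (?`=pat) form, and the test strings are invented for the example.

```perl
use strict;
use warnings;

my $s = "GHIDEFJKL";

# Zero-width look-behind on its own: JKL must be immediately preceded by DEF.
my $plain = ($s =~ /(?<=DEF)JKL/) ? 1 : 0;

# Directly after a literal GHI, the three characters behind the current
# position are always "GHI", so the DEF assertion cannot succeed.
my $after_literal = ($s =~ /GHI(?<=DEF)JKL/) ? 1 : 0;

# With .* in between, the assertion has room to find DEF before JKL.
my $with_gap = ($s =~ /GHI.*(?<=DEF)JKL/) ? 1 : 0;

print "plain=$plain after_literal=$after_literal with_gap=$with_gap\n";
# prints: plain=1 after_literal=0 with_gap=1
```

The third pattern mirrors the shape of the corrected regex at the end of the message, with .* separating the literal from the look-behind.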