Re: $& and copying: rfc 158 (was Re: RFC 110 (v3) counting matches)
> in any case, i think we have a fair agreement on rfc 158 and i will freeze it if there are no further comments on it.

I think you should remove the parts of your proposal about making $& be autolocalized. If you're not planning to revise your RFC, let me know so that I can ask the librarian to mark it as withdrawn.
Re: RFC 110 (v3) counting matches
> (mystery: how can filling in $& be a lot slower than filling in $1?)

It isn't. It's the same. $1 might even be more expensive than $&.

It appears that many people don't understand the problem with $&. I will try to explain.

Maintaining the information required by $1 or $& slows down the regex match, possibly by as much as forty to sixty percent, or more. (How much depends on details of the regex and the target string.)

For this reason, Perl has an optimization in it so that if you never use $& anywhere in your program, Perl never maintains the information, and every regex in your program runs faster. But if you do use $& somewhere, Perl cannot apply the optimization, and it must compute the $& information for every regex in the program. Every regex becomes much slower. In particular, if you load a module whose author happened to use $&, all your regexes get slower, which might be an unpleasant surprise, since you might not be aware of the cause.

A regex with backreferences is *also* slow. But using backreferences in one regex does not make all the *other* regexes slow. If you have

    /(...)/     # regex 1
    /.../       # regex 2

Perl knows that it must compute the backreference information for regex 1, and knows that it can skip computing the backreference information for regex 2, because regex 2 contains no parentheses. If you use a module that contains regexes that use backreferences, those regexes run slowly, but there is no effect on *your* regexes.

The cost is just as high for backreferences as for $&, but the backreference cost is paid only by regexes that actually need it. The $& cost is paid by every regex in the entire program, whether it uses $& or not. This is because Perl has no way to tell which regexes use $& and which do not.

One of Uri's suggestions in RFC 158 was to compute $& only for regexes that have a /k modifier. This would solve the $& problem because Perl would compute $& only when asked to, and not for every other regex in the rest of the program.
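As a rough illustration of paying the cost only where it is wanted, the sketch below reads the whole-match text without ever mentioning $&. It assumes a Perl new enough to have the @- and @+ offset arrays, and, for the second half, Perl 5.10 or later, where the /p modifier and ${^MATCH} provide a per-regex opt-in that is close in spirit to the /k idea (it is not the RFC's own syntax). The sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = "order 12-25-00 shipped";

# Whole-match text without $&: @- and @+ hold the start and end
# offsets of the last successful match (element 0 is the whole match).
my $whole_from_offsets;
if ($string =~ /\d\d-\d\d-\d\d/) {
    $whole_from_offsets = substr($string, $-[0], $+[0] - $-[0]);
}

# Per-regex opt-in (Perl 5.10+): /p asks Perl to keep the match text
# for this one regex, and ${^MATCH} reads it back.
my $whole_from_p;
if ($string =~ /\d\d-\d\d-\d\d/p) {
    $whole_from_p = ${^MATCH};
}

print "$whole_from_offsets\n";
print "$whole_from_p\n";
```

Both variables end up holding the same substring; neither approach requires a program-wide use of $&.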
Re: RFC 110 (v3) counting matches
Jonathan Scott Duff wrote:
> How about something like this?
>
>     $re = qr/(\d\d)-(\d\d)-(\d\d)/g;
>     $re->onmatch_callback(push @list, makedate(^0,^1,^2));
>     $string =~ $re;

It's not bad, but it loses one thing that I was trying to keep from the SNOBOL model. If you have (again, improvised syntax - I *know* you want to use the $ variables, OK? This is just for discussion):

    /($pat1)($pat2)($pat3)(?{sub1(@\)$pat4|?{sub2(@\)}$pat5|?{sub3(@\)})/

This would translate to "if pat1pat2pat3 matches, call sub1 with all the matches to that point if pat4 matches afterward, otherwise call sub2 with all the matches if pat5 matches, else just call sub3."

The key bit here is that you pass over the sub call, deferring it until you've decided if the whole match worked, then picking the one that succeeded and calling it.

If you don't like the syntax, please feel free to propose another. @\ seemed a good mnemonic for "the array of backreferences I already matched".

And, of course, if you assume that @\ keeps growing when you use /g, then doing a scalar @\ and dividing by the number of backreferences would give you a match count:

    $string =~ /(\d\d)-(\d\d)-(\d\d)/g;
    $hits = scalar(@\)/3;

Of course, multiple alternatives with different numbers of backreferences lead to a problem, so maybe this is all academic. Oh well.

--- Joe M.
Re: RFC 110 (v3) counting matches
> On Mon, 28 Aug 2000, Mark-Jason Dominus wrote:
>> But there is no convenient way to run the loop once for each date and split the dates into pieces:
>>
>>     # WRONG
>>     while (($mo, $dy, $yr) = ($string =~ /(\d\d)-(\d\d)-(\d\d)/g)) {
>>         ...
>>     }
>
> What I use in a script of mine is:
>
>     while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
>         ($mo, $dy, $yr) = ($1, $2, $3);
>     }
>
> Although this, of course, also requires that you know the number of backreferences.

The real problem I was trying to discuss was not this particular application. I was trying to point out a larger problem, which is that there are several regex features that are enabled or disabled depending on what context the match is in, so that if you want one scalar-context feature and one list-context feature at the same time, there is no direct way to do it.

> Nicer would be to be able to assign from @matchdata or something like that :)

I agree. There are many operations that would be simpler if there was a magic array that contained ($1, $2, $3, ...). If anyone wants to write an RFC on this, I will help.
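To make the context difference concrete, here is a small sketch with a made-up input string. In list context, m//g returns every capture from every match as one flat list, and it does so afresh on each evaluation, which is why the loop marked WRONG never advances. In scalar context, m//g matches once per call and remembers its position, so the loop body can read $1, $2, $3 for one date at a time.

```perl
use strict;
use warnings;

my $string = "01-02-03 and 04-05-06";

# List context: one flat list of every capture from every match.
my @flat = $string =~ /(\d\d)-(\d\d)-(\d\d)/g;

# Scalar context: one match per iteration, captures read from $1..$3.
my @dates;
while ($string =~ /(\d\d)-(\d\d)-(\d\d)/g) {
    my ($mo, $dy, $yr) = ($1, $2, $3);
    push @dates, [$mo, $dy, $yr];
}

print scalar(@flat), " captures, ", scalar(@dates), " dates\n";
```

The flat list has lost the grouping into dates, while the loop keeps it at the price of naming each capture by hand.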
Re: RFC 110 (v3) counting matches
On Tue, 29 Aug 2000 08:51:29 -0400, Mark-Jason Dominus wrote:
> There are many operations that would be simpler if there was a magic array that contained ($1, $2, $3, ...). If anyone wants to write an RFC on this, I will help.

Heh. I once complained about the lack of such an array, in comp.lang.perl.misc, *years* ago.

My practical problem was something like this, in a translation program. $phrase is one of many patterns in a table, to look for English phrases; %translate contains the French translations. interpolate() is a sub that fills in the parameters (the numbers in the string):

    $_ = "It is 5 past 10.";
    $phrase = 'it is (\d+) past (\d+)';
    s/^$phrase/interpolate($translate{$phrase}, $1, $2)/ie;

The problem is that with variable patterns, you *don't know* how many paren groups there are.

The solution they came up with was @+ and @-. I still can't work with those. An array of matches (e.g. @&) would be a lot easier. It could also be a lot slower; see the discussion on $& for this. (mystery: how can filling in $& be a lot slower than filling in $1?)

-- Bart.
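For what it is worth, the wished-for array can be assembled from @- and @+ when the number of groups is not known in advance. The sketch below is only one way to do it: the translation entry, the %1/%2 placeholder syntax, and this body of interpolate() are all invented for the example, since the original versions are not shown.

```perl
use strict;
use warnings;

# Hypothetical translation table and placeholder syntax (%1, %2).
my %translate = ('it is (\d+) past (\d+)' => 'il est %2 heures %1');

# Hypothetical interpolate(): fills %N with the Nth argument.
sub interpolate {
    my ($template, @args) = @_;
    $template =~ s/%(\d+)/$args[$1 - 1]/g;
    return $template;
}

my $text   = "It is 5 past 10.";
my $phrase = 'it is (\d+) past (\d+)';

if ($text =~ /^$phrase/i) {
    # Save the whole-match offsets before any other regex runs.
    my ($start, $end) = ($-[0], $+[0]);
    # $#+ is the number of groups in the pattern that just matched;
    # @- and @+ give each group's start and end offsets. A group that
    # did not take part in the match would need a defined() check.
    my @groups = map { substr($text, $-[$_], $+[$_] - $-[$_]) } 1 .. $#+;
    # Replace the matched span with the filled-in translation.
    substr($text, $start, $end - $start,
           interpolate($translate{$phrase}, @groups));
}

print "$text\n";
```

Nothing here depends on the pattern having exactly two groups; a phrase with three or four would fill @groups accordingly.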
Re: RFC 110 (v3) counting matches
> That empty list to force the proper context irks me. How about a modifier to the RE that forces it (this would solve the "counting matches" problem too).
>
>     $string =~ m{
>         (\d\d) - (\d\d) - (\d\d)
>         (?{ push @dates, makedate($1,$2,$3) })
>     }gxl;
>
>     $count = $string =~ m/foo/gl;    # always list context

The reason why not is that you're adding a special-case hack to one particular place, rather than promoting a general mechanism that can be used everywhere.

Tell me: which is better and why.

1) A regex switch to specify scalar context, as in a mythical /r:

    push(@got, /bar/r)

2) A general mechanism, say for example, "scalar":

    push(@got, scalar /bar/)

Obviously the "scalar" is better, because it does not require that a new switch be learnt, nor is its use restricted to pattern matching. Furthermore, it's inarguably more mnemonic for the sense of "match this scalarishly".

Likewise, to force list context (a far less common operation, mind you), it is a bad idea to have what amounts to a special argument to just one function to do this. What happens to the next function you want to do this to? How about if I want to force getpwnam() into list context and get back a scalar result?

    $count = getpwnam("tchrist")/l;
    $count = getpwnam("tchrist", LIST);
    $count = getpwnam("tchrist")->as_list;

All of those, frankly, suck. This is much better:

    $count = () = getpwnam("tchrist");

It's better because

* You don't have to invent anything new, whether syntactically or mnemonically. The sucky solutions all require modification of Perl's very syntax. With the list assignment, you just need to learn how to use what you *already have*. I could say as much for (?{...}). Think how many of the suggestions on these lists can be dealt with simply through using existing features that the suggesting party was unaware of.

* It's a general mechanism that isn't tailored for this particular function call. Special-purpose solutions are often inferior to general-purpose ones, because the latter are more likely to be creatively usable in a fashion unforeseen by the author.

* What could possibly be more intuitive for the action of acting as though one were assigning to a list than doing that very thing itself? Since () is the canonical list (it's empty, after all), this follows directly and requires no special knowledge whatsoever.

--tom
Re: RFC 110 (v3) counting matches
>>> p.s. Has anybody already suggested that we ought to have a nicer solution to execute perl code inside a string, replacing "${\(...)}" and "@{[...]}", which also won't ever win a beauty contest?
>>>
>>> Oops, wrong mailing list.

>> The first one doesn't work, and never did. You want @{[]} and @{[scalar ]} instead.

> "Doesn't work"?
>
>     print "The sum of 1 + 2 is ${\(1+2)}.\n";
>     -->
>     The sum of 1 + 2 is 3.
>
> I'm surprised you wouldn't have known this. The principle is the same: "${...}" expects a scalar reference inside the block, and '\' provides one. Of course, there shouldn't be a real multi-element list inside the parens, but just one scalar. And often, the parens aren't needed.

I'm surprised that you still don't understand. Notice what I showed you for the replacement above: @{[scalar ]}. Using ${\(...)} doesn't work in the sense that, contrary to popular belief, it fails to provide a scalar context to the contents of those parens. Thus ${ \( fn() ) } is still calling fn() in list context, not scalar context. Witness:

    sub fn {
        sprintf "called in %s context",
            wantarray ? "list" : "scalar"
    }

    print "Test 1: "; print "@{ [fn()] }\n";
    print "Test 2: "; print "${ \(fn()) }\n";
    print "Test 3: "; print "@{ [scalar fn()] }\n";

That, when executed, yields:

    Test 1: called in list context
    Test 2: called in list context
    Test 3: called in scalar context

*That's* why test 2 "doesn't work".

--tom
Re: RFC 110 (v3) counting matches
> Have you ever wanted to count the number of matches of a pattern? s///g returns the number of matches it finds. m//g just returns 1 for matching. Counts can be made using s//$&/g but this is wasteful, or by putting some counting loop round a m//g. But this all seems rather messy.

It's really much easier than all that:

    $count = () = $string =~ /pattern/g;

--tom
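A short sketch of the idiom next to the counting loop it replaces, using a made-up string. The list assignment works because a list assignment evaluated in scalar context yields the number of elements on its right-hand side, and m//g in list context returns one element per match when the pattern has no capture groups.

```perl
use strict;
use warnings;

my $string = "foo bar foo baz foo";

# Assigning the list-context match to the empty list, then reading that
# assignment in scalar context, gives the number of matches.
my $count = () = $string =~ /foo/g;

# The counting loop around m//g, for comparison.
my $loop_count = 0;
$loop_count++ while $string =~ /foo/g;

print "$count $loop_count\n";
```

One caveat: if the pattern contains capture groups, list-context m//g returns every capture, so the result is matches times groups rather than matches, which is the same division by the number of backreferences mentioned earlier in the thread.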