Re: RFC 158 (v1) Regular Expression Special Variables
Mark-Jason Dominus writes:
: > There's also long been talk/thought about making $& and $1
: > and friends magic aliases into the original string, which would
: > save that cost.
:
: Please correct me if I'm mistaken, but I believe that that's the way
: they are implemented now. A regex match populates the ->startp and
: ->endp parts of the regex structure, and the elements of these items
: are byte offsets into the original string.

I went on a briefish trawl for this the other day, and as far as I can tell what happens is this:

- during matching, the startp/endp pairs are populated with offsets into the target string
- immediately after matching, the target string is copied if needed, and the PL_curpm object is updated to refer to the copy
- the copy is needed if any of the special variables can be referred to: $`, $&, $', $1, $2, ...

The result of that is that if there are backreferences in the regexp, the copy is always needed; if not, the copy is needed only if $& or her kin have been seen. So regexps with backrefs should suffer no slowdown from use of $& in the same program, but regexps without backrefs will get a (potentially) unnecessary copy.

The other problem with this, of course, is that the compiler may not yet have seen the $& we intend to use:

    crypt% perl -wle '$_="foo"; /.*/; $_="bar"; print eval q{$&}'
    bar
    crypt%

.. and I think coredumps may be possible from this. (Hmm, perlbug upcoming.)

Hugo
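For readers who want to see the offset idea from the Perl side, here is a minimal sketch. It assumes the documented @- and @+ arrays expose the same start/end offsets that the message describes for startp/endp. The helper name match_parts is hypothetical, and nothing here shows the internal copy itself.

```perl
use strict;
use warnings;

# Hypothetical helper: recover the pre-match, match and post-match text
# from the offset arrays @- and @+ instead of reading $`, $& and $'.
sub match_parts {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;       # empty list when nothing matches
    my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
    return (
        substr($string, 0, $start),              # what $` would hold
        substr($string, $start, $end - $start),  # what $& would hold
        substr($string, $end),                   # what $' would hold
    );
}

my ($pre, $match, $post) = match_parts("hello world", qr/o w/);
print "[$pre] [$match] [$post]\n";   # prints "[hell] [o w] [orld]"
```

Because the offsets are applied to the caller's own string, this only gives the right answer while that string is unchanged since the match, which is the aliasing hazard discussed in this thread.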
Re: RFC 158 (v1) Regular Expression Special Variables
"TC" == Tom Christiansen [EMAIL PROTECTED] writes:

  > $`, $& and $' are useful variables which are never used by any
  > experienced Perl hacker since they have well known problems with
  > efficiency.

  TC> That's hardly true. I could show you plenty of code from
  TC> inexperienced Perl hackers like lwall that use them. But
  TC> the cost is understood. :-)

those early perl3 scripts by lwall floating around in /etc were poorly written. i am glad they are finally out of the distribution.

  TC> The rest of what you said probably is reasonable, however.
  TC> The (.*?)(blah)(.*) solution kind of works sometimes, but is
  TC> hardly pleasant. Likewise the @+ and @- stuff.

i would like to see the @+ and @- stuff made to work faster or better or something. they have merit but not practicality.

another related grabbing issue is grabbing repeated groups like

    @all_words = /(\w+\s+)+/ ;

we only get the last match from that. but that should be a separate rfc.

  TC> There's also long been talk/thought about making $& and $1
  TC> and friends magic aliases into the original string, which would
  TC> save that cost.

but if you modify that string with s/// you lose unless you make a copy. in fact $`, $& and $' should just be aliases if the op was m///. it is the s/// case that is the problem. that brings up the question of how often $& is needed after a s///. it almost makes little sense since you are matching and modifying. maybe we can also remove support for them with s/// and thereby remove the copy penalty. but my idea would work in both cases and puts it under program control so we could just use that.

uri

-- 
Uri Guttman - [EMAIL PROTECTED] -- http://www.sysarch.com
SYStems ARCHitecture, Software Engineering, Perl, Internet, UNIX Consulting
The Perl Books Page --- http://www.sysarch.com/cgi-bin/perl_books
The Best Search Engine on the Net -- http://www.northernlight.com
Re: RFC 158 (v1) Regular Expression Special Variables
> those early perl3 scripts by lwall floating around in /etc were poorly
> written. i am glad they are finally out of the distribution.

Those weren't the scripts I was thinking about, and it is *NOT* ipso facto true that something which uses $& or $` is poorly written.

--tom
Re: RFC 158 (v1) Regular Expression Special Variables
Tom Christiansen wrote:

> There's also long been talk/thought about making $& and $1
> and friends magic aliases into the original string, which would
> save that cost.

I was distressed to discover that s///g does not rebuild the old string between matches, but only at the end. It broke my random anagram generator which was depending on instant updates.

If STRING was a linked list of partially full blocks rather than a big piece of contiguous space, we could do length-altering substitutions without copying.

-- 
David Nicol 816.235.1187 [EMAIL PROTECTED]
safety first: Republicans for Nader in 2000
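One visible consequence of this can be sketched as follows: with /g, later matches never see the text produced by earlier replacements. The example strings are assumptions, and the loop is just one possible way to get "instant updates"; it says nothing about how the string is stored internally.

```perl
use strict;
use warnings;

# With /g, every match is located in the original text, so the
# output of one replacement is never rescanned by the next match.
my $once = "aaaa";
$once =~ s/aa/a/g;

# Repeating a single substitution rescans the updated string on
# each pass, at the cost of restarting the search every time.
my $repeat = "aaaa";
1 while $repeat =~ s/aa/a/;

print "s///g once: $once\n";      # prints "s///g once: aa"
print "looped s///: $repeat\n";   # prints "looped s///: a"
```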
Re: RFC 158 (v1) Regular Expression Special Variables
> Please correct me if I'm mistaken, but I believe that that's the way
> they are implemented now. A regex match populates the ->startp and
> ->endp parts of the regex structure, and the elements of these items
> are byte offsets into the original string.

I haven't looked at it at all, and perhaps that's something Ilya did when creating @+ etc. So you might be right.

As far as I know it's the same in 5.000.

I thought the problem with $& was that the regex engine has to adjust the offsets in the startp/endp arrays every time it scans forward a character or backtracks a character. But maybe the effect of $& is greatly exaggerated or is a relic from perl4?

Has anyone actually benchmarked this recently?