Mark-Jason Dominus writes:
:> There's also long been talk/thought about making $& and $1
:> and friends magic aliases into the original string, which would
:> save that cost.
:
:Please correct me if I'm mistaken, but I believe that that's the way
:they are implemented now. A regex match populates the ->startp and
:->endp parts of the regex structure, and the elements of these items
:are byte offsets into the original string.
I went on a briefish trawl for this the other day, and as far as I
can tell what happens is this:

 - during matching, the startp/endp pairs are populated with offsets
   into the target string
 - immediately after matching, the target string is copied if needed,
   and the PL_curpm object is updated to refer to the copy
 - the copy is needed if any of the special variables can be referred
   to: $`, $&, $', $1, $2, ...
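
To make those three steps concrete, here is a toy model in C. It is only
a sketch of the idea as described above, not perl's actual code: the
struct layout and every toy_* name are invented for illustration, and
only the whole-match offsets (the analogue of $&) are modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the regex structure.
 * This is NOT perl's real layout; the field names are illustrative. */
typedef struct {
    size_t startp;       /* byte offset where the whole match starts */
    size_t endp;         /* byte offset just past the end of the match */
    const char *subbeg;  /* string the offsets are resolved against */
    char *saved;         /* private copy of the target, or NULL */
} toy_match;

/* Record a match of target[start..end) and copy the target only if
 * the caller says a copy is needed. */
void toy_record(toy_match *m, const char *target,
                size_t start, size_t end, int need_copy)
{
    assert(start <= end && end <= strlen(target));
    m->startp = start;
    m->endp = end;
    m->saved = NULL;
    m->subbeg = target;          /* no copy: offsets refer to the live buffer */
    if (need_copy) {
        size_t len = strlen(target);
        m->saved = malloc(len + 1);
        if (m->saved == NULL)
            abort();
        memcpy(m->saved, target, len + 1);
        m->subbeg = m->saved;    /* offsets now refer to the private copy */
    }
}

/* Write the text of the whole match into out, truncating to fit. */
void toy_whole_match(const toy_match *m, char *out, size_t outsz)
{
    size_t len = m->endp - m->startp;
    if (outsz == 0)
        return;
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, m->subbeg + m->startp, len);
    out[len] = '\0';
}

/* Free the private copy, if any. */
void toy_release(toy_match *m)
{
    free(m->saved);
    m->saved = NULL;
    m->subbeg = NULL;
}
```

In this model, calling toy_record() on a buffer holding "foo" with
need_copy set, then overwriting the buffer with "bar", leaves
toy_whole_match() returning "foo". With need_copy clear, the same
sequence returns "bar", because the offsets are resolved against
whatever the buffer holds at the time of the read.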
The result of that is that if the regexp contains capturing parens
(backrefs), the copy is always needed; if not, the copy is needed only
if $& or her kin have been seen. So regexps with backrefs should suffer
no slowdown from use of $& in the same program, but once $& has been
seen, regexps without backrefs will get a (potentially) unnecessary
copy on every successful match.
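
Spelled out as a predicate, the rule looks something like the following.
Again this is just a sketch of my reading above: the function and
parameter names are made up, and the real test in the perl source may
well be structured differently.

```c
#include <stdbool.h>

/* Hypothetical predicate: must the target string be copied after a
 * successful match?
 *   capture_groups  - number of capturing paren pairs in the regexp
 *   seen_match_vars - true if the compiler has already seen one of
 *                     the whole-match variables ($`, $& or $') */
bool toy_copy_needed(unsigned capture_groups, bool seen_match_vars)
{
    /* Any capture group means $1 and friends may be read later. */
    if (capture_groups > 0)
        return true;
    /* Otherwise the copy matters only if $& or her kin are in use. */
    return seen_match_vars;
}
```

So a regexp with capture groups gets the same answer whether or not $&
appears in the program, while one without them flips from no copy to
copy as soon as $& has been seen.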
The other problem with this, of course, is that the compiler may not
yet have seen the $& we intend to use by the time the match runs, for
example when it is hidden inside a string eval:

    crypt% perl -wle '$_="foo"; /.*/; $_="bar"; print eval q{$&}'
    bar
    crypt%

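Here $& yields "bar" rather than the "foo" that was actually matched,
since no copy was taken at match time. I think coredumps may be
possible from this as well. (Hmm, perlbug upcoming.)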
Hugo