This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Replace =~, !~, m//, and s/// with match() and subst()

=head1 VERSION

   Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
   Date: 27 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 164

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//> and C<s///>
with two new builtins, C<match()> and C<subst()>.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as RFCs
135 and 138, attempt to fix some stuff with the current pattern-matching
syntax. Most proposals center around minor modifications to C<m//> and
C<s///>.

This RFC proposes that C<m//> and C<s///> be dropped from the language
altogether, and instead be replaced with new C<match> and C<subst>
builtins, with the following syntaxes:

   $res = match /pattern/flags, $string
   $new = subst /pattern/newpattern/flags, $string

These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving C<$string> alone. (Unless they are called in a void
context, in which case they act on and modify C<$_> consistent with
current behavior).

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                        # all defaults (pattern is /\w+/?)
   match /pat/;                  # match $_
   match /pat/, $str;            # match $str
   match /pat/, @strs;           # match any of @strs

   subst;                        # like s///, pretty useless :-)
   subst /pat/new/;              # sub on $_
   subst /pat/new/, $str;        # sub on $str
   subst /pat/new/, @strs;       # return array of modified strings

These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on.

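The copy-returning behavior described above can be roughly approximated
in Perl 5 with ordinary subs. This is only a sketch under stated
assumptions, not the proposed implementation: the names C<match_any> and
C<subst_copy> are hypothetical, patterns are passed as C<qr//> objects
because Perl 5 cannot parse a bare C</pat/> as a function argument, the
replacement is a plain string that is always applied globally, and the
void-context "modify C<$_>" behavior is not emulated.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed match(): true if any of the
# given strings matches $re. Falls back to $_ when no strings are given.
sub match_any {
    my ($re, @strs) = @_;
    @strs = ($_) unless @strs;
    for my $s (@strs) {
        return 1 if $s =~ $re;
    }
    return 0;
}

# Hypothetical stand-in for the proposed subst(): returns modified copies
# and leaves the caller's strings untouched. Assumption: the replacement
# is a plain string and is applied globally (as if /g were given).
sub subst_copy {
    my ($re, $new, @strs) = @_;
    @strs = ($_) unless @strs;   # @strs already holds copies of the args
    s/$re/$new/g for @strs;      # edits the copies only
    return wantarray ? @strs : $strs[0];
}

my $old = "hello world, Hello again";
my $new = subst_copy(qr/hello/i, "X", $old);

print "$old\n";   # prints: hello world, Hello again
print "$new\n";   # prints: X world, X again
```

Passing a list, as in C<subst_copy(qr/hello/i, "X", @old)>, returns a
list of modified copies in list context, which is roughly what the
proposed C<subst /pat/new/, @strs> form describes.
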
Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                             Perl 6
   --------------------------------   ----------------------------------
   if ( /\w+/ ) { }                   if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );     die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;               ($res) = match #^(.*)$#g;
   next if /\s+/ || /\w+/;            next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||         next if match /\s+/, $str or
           ($str =~ /\w+/)                    match /\w+/, $str;
   next unless $str =~ /^N/;          next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;             $str = subst /\w+/$bob/gi, $str;
   ($str = $_) =~ s/\d+/&func/ge;     $str = subst /\d+/&func/ge;
   s/\w+/this/;                       subst /\w+/this/;

   # These are pretty cool...
   foreach (@old) {                   @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                   print "Got it" if match /\w+/, @str;
      print "Got it" if (/\w+/);
   }

This gives us a cleaner, more consistent syntax. In addition, it makes
several things easier and is more easily extensible:

   &callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise. However, it requires a little too
much typing. See below.

=head2 Concerns

This should be carefully considered. It's good because it replaces "yet
another oddity" with a more standard syntax that I would argue is more
powerful and consistent. However, it also forces everyone to relearn how
to match and substitute patterns. This must be a careful, conscious
decision, lest we really screw stuff up.

That being said, since my initial post I have received several personal
emails endorsing this, which is why I decided to RFC it. So it's an
option, it just has to be powerful enough for people to see the "big
win".

Finally, it requires a little too much typing still for my tastes.
Perhaps we should make "m" and "s" at least shortcuts to the names,
possibly allowing users to bind them to the front of the pattern
(similar to some of RFC 138's suggestions).
Maybe these two could be equivalent:

   $new = subst /old/new/i, $old;   ==   $new = s/old/new/i, $old;

And then it doesn't look that radical anymore. This is similar to RFC
138, only C<$old> is not modified.

=head1 IMPLEMENTATION

Hold your horses

=head1 MIGRATION

This would be huge. Every pattern match would have to be translated,
every Perl hacker would have to relearn patterns, and every Perl 5
book's regexp section would be instantly out of date.

Like I said, this is not a simple decision. But if there are obvious
increases in power, I think people will appreciate the change, not dread
it. At the very least it makes Perl much more consistent.

=head1 REFERENCES

This is a synthesis of several ideas from myself, Ed Mills, and Tom C

RFC 138: Eliminate =~ operator.

RFC 135: Require explicit m on matches, even with ?? and // as delimiters.
