This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()

=head1 VERSION

   Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
   Date: 27 Aug 2000
   Last-Modified: 29 Aug 2000
   Version: 2
   Mailing List: [EMAIL PROTECTED]
   Number: 164
   Status: Developing

=head1 CHANGES

   1. Added 100% backwards-compatible syntax

   2. Included C<trade> replacement for C<tr///>

   3. Expanded examples and contexts

=head1 ABSTRACT

Several people (including Larry) have expressed a desire to get rid of
C<=~> and C<!~>. This RFC proposes a way to replace C<m//>, C<s///>,
and C<tr///> with three new builtins, C<match>, C<subst>, and C<trade>.
It also proposes a way to allow full backwards-compatible syntax.

=head1 DESCRIPTION

=head2 Overview

Everyone knows how C<=~> and C<!~> work. Several proposals, such as
RFCs 135 and 138, attempt to fix some stuff with the current
pattern-matching syntax. Most proposals center around minor
modifications to C<m//> and C<s///>.

This RFC proposes that C<m//>, C<s///>, and C<tr///> be dropped from
the language, and instead be replaced with new C<match>, C<subst>, and
C<trade> builtins, with the following syntaxes:

   $res [, $res] = match /pat/flags, $str [, $str];
   $res [, $res] = subst /pat/new/flags, $str [, $str];
   $res [, $res] = trade /pat/new/flags, $str [, $str];

These subs are designed to mirror the format of C<split>, making them
more consistent. Unlike the current forms, these return the modified
string, leaving the input C<$str> alone.

Context modifies the return values just as Perl 5 context does, with
some extensions:

   1. If called in a void context, they act on and modify C<$_>,
      consistent with current behavior.

   2. If called in a scalar context, C<match> returns the number of
      matches (like now), and the rest return the first (or only)
      string.

   3. If called in a list context, a list of the modified strings
      will be returned.

   4. If called in a numeric context, they all return the number of
      substitutions made.

Extra arguments can be dropped, consistent with C<split> and many other
builtins:

   match;                         # all defaults (pattern is /\w+/)
   match /pat/;                   # match $_
   match /pat/, $str;             # match $str
   match /pat/, @strs;            # match any of @strs

   subst;                         # strip leading/trailing whitespace
   subst /pat/new/;               # sub on $_
   subst /pat/new/, $str;         # sub on $str
   subst /pat/new/, @strs;        # return array of modified strings

   trade;                         # nothing
   trade /pat/new/;               # tr on $_
   trade /pat/new/, $str;         # tr on $str
   trade /pat/new/, @str;         # return array of modified strings

These new builtins eliminate the need for C<=~> and C<!~> altogether,
since they are functions just like C<split>, C<join>, C<splice>, and so
on. There are also shortcut forms, see below.

Sometimes examples are easiest, so here are some examples of the new
syntax:

   Perl 5                              Perl 6
   --------------------------------    ----------------------------------
   if ( /\w+/ ) { }                    if ( match ) { }
   die "Bad!" if ( $_ !~ /\w+/ );      die "Bad!" if ( ! match );
   ($res) = m#^(.*)$#g;                ($res) = match #^(.*)$#g;
   next if /\s+/ || /\w+/;             next if match /\s+/ or match /\w+/;
   next if ($str =~ /\s+/) ||          next if match /\s+/, $str or
           ($str =~ /\w+/)                     match /\w+/, $str;
   next unless $str =~ /^N/;           next unless match /^N/, $str;

   $str =~ s/\w+/$bob/gi;              $str = subst /\w+/$bob/gi, $str;
   s/\w+/this/;                        subst /\w+/this/;
   tr/a-z/Z-A/;                        trade /a-z/Z-A/;
   $new =~ tr/a/b/;                    $new = trade /a/b/, $new;

   # Some become easier and more consistent...

   ($str = $_) =~ s/\d+/&func/ge;      $str = subst /\d+/&func/ge;
   ($new = $old) =~ tr/a/z/;           $new = trade /a/z/, $old;

   # And these are pretty cool...

   foreach (@old) {                    @new = subst /hello/X/gi, @old;
      s/hello/X/gi;
      push @new, $_;
   }

   foreach (@str) {                    @new = trade /a-z/A-Z/, @str;
      tr/a-z/A-Z/;
      push @new, $_;
   }

   foreach (@str) {                    print "Got it" if match /\w+/, @str;
      if (/\w+/) { $gotit = 1 };
   }
   print "Got it" if $gotit;

This gives us a cleaner, more consistent syntax.
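
To make the copy-returning behavior in the examples above concrete, here
is a rough Perl 5 sketch of how C<subst> and the list form of C<match>
might behave. The names C<subst_copy> and C<match_any> are hypothetical
helpers invented for this illustration, not part of this RFC or of Perl.
They take a compiled C<qr//> pattern and a plain replacement string,
since Perl 5 cannot parse the proposed C</pat/new/> argument form, and
they assume a global substitution. The void-context and numeric-context
rules are not emulated.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed subst(): returns modified copies
# and leaves the caller's strings untouched. Always substitutes globally
# (an assumption made to keep the sketch short).
sub subst_copy {
    my ($pat, $new, @strs) = @_;
    my @out = map { my $copy = $_; $copy =~ s/$pat/$new/g; $copy } @strs;
    # List context: all modified strings; scalar context: the first one.
    return wantarray ? @out : $out[0];
}

# Hypothetical stand-in for match() on a list: returns how many of the
# strings match, which is true if any of them does.
sub match_any {
    my ($pat, @strs) = @_;
    return scalar grep { /$pat/ } @strs;
}

my @old = ('hello world', 'Hello again');
my @new = subst_copy(qr/hello/i, 'X', @old);    # list context
my $one = subst_copy(qr/\d+/, '#', 'room 101'); # scalar context

print "Got it\n" if match_any(qr/\w+/, @old);
```

Here C<@old> is left unchanged while C<@new> receives the modified
copies, which is the behavior the C<foreach> loops in the Perl 5 column
have to build by hand.
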
In addition, it makes several things easier, is more easily extensible:

   &callsomesub(subst(/old/new/gi, $mystr));
   $str = subst /old/new/i, $r->getsomeval;

and is easier to read English-wise.

However, it requires too much typing. For that reason, we include the
shortcut form as well:

=head2 Shortcut Form

RFC 139 describes a way that the C<//> syntax can be expanded to any
function. So, to gain backwards compatibility, we simply allow this
syntax along with the shortcut function names C<s>, C<m>, and C<tr> [1]:

   Shortcut Form                       Builtin
   --------------------------------    ----------------------------------
   s/\w+/W/g;                          subst /\w+/W/g;
   /\w+/;                              match /\w+/;
   tr/ae/io/;                          trade /ae/io/;
   $new = s/\s+/X/, $old;              $new = subst /\s+/X/, $old;
   $num = m/\w+/, $str;                $num = match /\w+/, $str;
   $new = tr/a-z/z-a/, $str;           $new = trade /a-z/z-a/, $str;

Note C<//> can still be used as a shortcut to C<m//>. This is the form
most people will use, I would imagine. Starting to look like Perl 5...

=head2 Use of C<=~> Syntax

Another RFC I submitted (not posted yet) shows how C<=~> can be used as
a more generic assignment operator / rvalue duplicator. With this
ability, we can now write all our Perl 5 regex syntaxes still, even
though they're actually new Perl 6 builtins:

   Shortcut Form + C<=~>               Builtin
   --------------------------------    ----------------------------------
   $str =~ s/\w+/W/g;                  $str = subst /\w+/W/g, $str;
   $str =~ tr/a-z/z-a/;                $str = trade /a-z/z-a/, $str;
   $str =~ /\w+/;                      match /\w+/, $str;    # See [2]
   ($match) = /^(.*)$/g;               ($match) = match /^(.*)$/g;

   # Can't do these in Perl 5

   @str =~ s/$foo/$bar/gi;             @str = subst /$foo/$bar/gi, @str;
   @str =~ tr/a-z/A-Z/;                @str = trade /a-z/A-Z/, @str;
   @str =~ m/^Pass:/;                  match /^Pass:/, @str;

So, why all the bother if it looks just like Perl 5? Well, these last
two sections are based on more general mechanisms for Perl 6. That is,
allowing the generalization of C<=~> and the C<//> syntax allows us to
write these expressions in a way that is backwards compatible.
However, there is no explicit relationship between the Perl 5
backwards-compat syntax and the new Perl 6 syntax, even though there
appears to be. In fact, these mechanisms - which are covered in other
RFCs - would allow us to write stuff like:

   $str =~ quotemeta;             # $str = quotemeta($str);
   @a =~ sort { $a <=> $b };      # @a = sort { $a <=> $b } @a;

So you can see how this general purpose mechanism allows us backwards
compatibility.

Finally, note how we have a good amount of flexible, parallel syntax
because of this:

   $str =~ s/$foo/$bar/gi;        # just a general shortcut
   $new = s/$bar/$baz/g, $str;    # more consistent when $new != $str

=head2 Concerns

Because this proposal can be 100% backwards compatible, it doesn't
strike me as problematic anymore. However, it should still be a
conscious decision to change pattern matching at all. I have no
interest in breaking Perl 5 regexes. At all. None.

Still, I have received many personal emails in favor of this idea. So,
if implemented correctly, I think it could be a benefit for Perl 6.

Finally, note that C<trade> was chosen because "transliterate" is way
too long and "trans" looks to be taken by transactional variables. And
C<trade> seems to connote the action pretty well still. However, the
issue is still open for debate. Alternatives to C<subst> are welcomed.

=head1 IMPLEMENTATION

Hold your horses.

=head1 MIGRATION

There are no longer any syntax changes as of v2. No migration path is
required.

=head1 NOTES

[1] If most people are going to continue using the shortcut form and
names, it might be wise just to make the functions be named C<m>,
C<s>, and C<tr>, even though these are silly function names.

[2] C<match> is a bit of a special case, just like C<m//> is when
compared to C<s///> and C<tr///>. The support of C<!~> and C<m//> will
have to be explored some more, but I'll leave that for a subsequent
version.

=head1 REFERENCES

This is a synthesis of several ideas from myself, MJD, Ed Mills, and
Tom C.

   RFC 138: Eliminate =~ operator.

   RFC 139: Allow Calling Any Function With A Syntax Like s///

   RFC 170: Generalize =~ to a special-purpose assignment operator
