In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/f5b885cd5cef50d401e2785fc9cd1f5ab1859f48?hp=bcf53fae4530fa696a6f26dd100207c224a13e89>
- Log -----------------------------------------------------------------
commit f5b885cd5cef50d401e2785fc9cd1f5ab1859f48
Author: Father Chrysostomos <[email protected]>
Date:   Sun Mar 6 16:58:46 2011 -0800

    More perlretut tweaks

    In particular:
    • The word “substitute” was misused. I changed it to substitution,
      but then realised that it was actually wordy and redundant, so I
      removed it.
    • The /e modifier does not do an eval{...} or eval '...' or anything
      of the sort. s/foo/die/e demonstrates this clearly. (/ee is a
      different matter, but is not covered in perlretut.)

M pod/perlretut.pod

commit 7862260706f75f5fa2a2e54ba2f431089f598303
Author: Father Chrysostomos <[email protected]>
Date:   Sun Mar 6 15:59:14 2011 -0800

    perlretut: Mention /p

M pod/perlretut.pod

commit 8ccb14777ea99ad182277063e258c28132f12d9a
Author: Father Chrysostomos <[email protected]>
Date:   Sun Mar 6 15:57:46 2011 -0800

    perlretut tweaks

    In particular, remove the obsolete mention of new features “in 5.6.0”.

M pod/perlretut.pod
-----------------------------------------------------------------------

Summary of changes:
 pod/perlretut.pod | 45 +++++++++++++++++++++++++--------------------
 1 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/pod/perlretut.pod b/pod/perlretut.pod
index 293683c..2e3ae39 100644
--- a/pod/perlretut.pod
+++ b/pod/perlretut.pod
@@ -41,7 +41,7 @@ you master the first part, you will have all the tools needed to solve
 about 98% of your needs. The second part of the tutorial is for those
 comfortable with the basics and hungry for more power tools. It
 discusses the more advanced regular expression operators and
-introduces the latest cutting edge innovations in 5.6.0.
+introduces the latest cutting-edge innovations.
 
 A note: to save time, 'regular expression' is often abbreviated as
 regexp or regex. Regexp is a more natural abbreviation than regex, but
@@ -60,7 +60,7 @@ contains that word:
     "Hello World" =~ /World/; # matches
 
 What is this Perl statement all about?
C<"Hello World"> is a simple
-double quoted string. C<World> is the regular expression and the
+double-quoted string. C<World> is the regular expression and the
 C<//> enclosing C</World/> tells Perl to search a string for a match.
 The operator C<=~> associates the string with the regexp match and
 produces a true value if the regexp matched, or false if the regexp
@@ -287,7 +287,7 @@ Although one can already do quite a lot with the literal string regexps
 above, we've only scratched the surface of regular expression
 technology. In this and subsequent sections we will introduce regexp
 concepts (and associated metacharacter notations) that will allow a
-regexp to not just represent a single character sequence, but a I<whole
+regexp to represent not just a single character sequence, but a I<whole
 class> of them.
 
 One such concept is that of a I<character class>. A character class
@@ -742,7 +742,7 @@ all 3-letter doubles with a space in between:
 
     /\b(\w\w\w)\s\g1\b/;
 
-The grouping assigns a value to \g1, so that the same 3 letter sequence
+The grouping assigns a value to \g1, so that the same 3-letter sequence
 is used for both parts.
 
 A similar task is to find words consisting of two identical parts:
@@ -773,7 +773,7 @@ preceding capture group one now may write C<\g{-1}>, the next but
 last is available via C<\g{-2}>, and so on.
 
 Another good reason in addition to readability and maintainability
-for using relative backreferences is illustrated by the following example,
+for using relative backreferences is illustrated by the following example,
 where a simple pattern for matching peculiar strings is used:
 
     $a99a = '([a-z])(\d)\g2\g1'; # matches a11a, g22g, x33x, etc.
@@ -897,6 +897,9 @@ C<@+> instead:
     $& is the same as substr( $x, $-[0], $+[0]-$-[0] )
     $' is the same as substr( $x, $+[0] )
 
+As of Perl 5.10, the C<${^PREMATCH}>, C<${^MATCH}> and C<${^POSTMATCH}>
+variables may be used. These are only set if the C</p> modifier is present.
+Consequently they do not penalize the rest of the program.
 
 =head2 Non-capturing groupings
 
@@ -981,7 +984,7 @@ Here are some examples:
     /y(es)?/i; # matches 'y', 'Y', or a case-insensitive 'yes'
     $year =~ /^\d{2,4}$/; # make sure year is at least 2 but not more
                           # than 4 digits
-    $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3 digit dates
+    $year =~ /^\d{4}$|^\d{2}$/; # better match; throw out 3-digit dates
     $year =~ /^\d{2}(\d{2})?$/; # same thing written differently. However,
                                 # this captures the last two digits in $1
                                 # and the other does not.
@@ -1019,9 +1022,9 @@ stop there, but that wouldn't give the longest possible string to the
 first quantifier C<.*>. Instead, the first quantifier C<.*> grabs as
 much of the string as possible while still having the regexp match. In
 this example, that means having the C<at> sequence with the final C<at>
-in the string. The other important principle illustrated here is that
+in the string. The other important principle illustrated here is that,
 when there are two or more elements in a regexp, the I<leftmost>
-quantifier, if there is one, gets to grab as much the string as
+quantifier, if there is one, gets to grab as much of the string as
 possible, leaving the rest of the regexp to fight over scraps. Thus in
 our example, the first quantifier C<.*> grabs most of the string, while
 the second quantifier C<.*> gets the empty string. Quantifiers that
@@ -1419,7 +1422,7 @@ we can rewrite our 'extended' regexp in the more pleasing form
 If whitespace is mostly irrelevant, how does one include space
 characters in an extended regexp? The answer is to backslash it
 S<C<'\ '>> or put it in a character class S<C<[ ]>>. The same thing
-goes for pound signs, use C<\#> or C<[#]>. For instance, Perl allows
+goes for pound signs: use C<\#> or C<[#]>.
For instance, Perl allows
 a space between the sign and the mantissa or integer, and we could add
 this to our regexp as follows:
 
@@ -1548,7 +1551,7 @@ The final two modifiers C<//g> and C<//c> concern multiple matches.
 The modifier C<//g> stands for global matching and allows the
 matching operator to match within a string as many times as possible.
 In scalar context, successive invocations against a string will have
-`C<//g> jump from match to match, keeping track of position in the
+C<//g> jump from match to match, keeping track of position in the
 string as it goes along. You can get or set the position with the
 C<pos()> function.
 
@@ -1615,7 +1618,7 @@ bit at a time and use arbitrary Perl logic to decide what to do next.
 Currently, the C<\G> anchor is only fully supported when used to anchor
 to the start of the pattern.
 
-C<\G> is also invaluable in processing fixed length records with
+C<\G> is also invaluable in processing fixed-length records with
 regexps. Suppose we have a snippet of coding region DNA, encoded as
 base pair letters C<ATCGTTGAAT...> and we want to find all the stop
 codons C<TGA>. In a coding region, codons are 3-letter sequences, so
@@ -1666,11 +1669,11 @@ operations in Perl.
 Search and replace is accomplished with the C<s///> operator. The
 general form is C<s/regexp/replacement/modifiers>, with everything we
 know about regexps and modifiers applying in this case as well. The
-C<replacement> is a Perl double quoted string that replaces in the
+C<replacement> is a Perl double-quoted string that replaces in the
 string whatever is matched with the C<regexp>. The operator C<=~> is
 also used here to associate a string with C<s///>. If matching against
 C<$_>, the S<C<$_ =~>> can be dropped. If there is a match,
-C<s///> returns the number of substitutions made, otherwise it returns
+C<s///> returns the number of substitutions made; otherwise it returns
 false. Here are a few examples:
 
     $x = "Time to feed the cat!";
@@ -1684,7 +1687,7 @@ false.
Here are a few examples:
 
 In the last example, the whole string was matched, but only the part
 inside the single quotes was grouped. With the C<s///> operator, the
-matched variables C<$1>, C<$2>, etc. are immediately available for use
+matched variables C<$1>, C<$2>, etc. are immediately available for use
 in the replacement expression, so we use C<$1> to replace the quoted
 string with just what was quoted. With the global modifier, C<s///g>
 will search and replace all occurrences of the regexp in the string:
@@ -1725,7 +1728,7 @@ behavior so that C<s///r> returns the final substituted string:
     print "$x $y\n";
 
 That example will print "I like dogs. I like cats". Notice the original
-C<$x> variable has not been affected by the substitute. The overall
+C<$x> variable has not been affected. The overall
 result of the substitution is instead stored in C<$y>. If the
 substitution doesn't affect anything then the original string is
 returned:
@@ -1742,8 +1745,9 @@ substitutions:
     # prints "Hedgehogs are great."
 
 A modifier available specifically to search and replace is the
-C<s///e> evaluation modifier. C<s///e> wraps an C<eval{...}> around
-the replacement string and the evaluated result is substituted for the
+C<s///e> evaluation modifier. C<s///e> treats the
+replacement text as Perl code, rather than a double-quoted
+string. The value that the code returns is substituted for the
 matched substring. C<s///e> is useful if you need to do a bit of
 computation in the process of replacing text. This example counts
 character frequencies in a line:
@@ -1767,8 +1771,9 @@ This prints
 
 As with the match C<m//> operator, C<s///> can use other delimiters,
 such as C<s!!!> and C<s{}{}>, and even C<s{}//>. If single quotes are
-used C<s'''>, then the regexp and replacement are treated as single
-quoted strings and there are no substitutions. C<s///> in list context
+used C<s'''>, then the regexp and replacement are
+treated as single-quoted strings and there are no
+variable substitutions.
C<s///> in list context
 returns the same thing as in scalar context, i.e., the number of
 matches.
 
@@ -1810,7 +1815,7 @@ an empty initial element to the list.
 If you have read this far, congratulations! You now have all the basic
 tools needed to use regular expressions to solve a wide range of text
 processing problems. If this is your first time through the tutorial,
-why not stop here and play around with regexps a while... S<Part 2>
+why not stop here and play around with regexps a while.... S<Part 2>
 concerns the more esoteric aspects of regular expressions and those
 concepts certainly aren't needed right at the start.
 
--
Perl5 Master Repository
