irc-html        Mon Jan 21 14:28:39 2002 EDT

  Modified files:
    /phpdoc/en/functions    pcre.xml
  Log:
  whitespace correction
Index: phpdoc/en/functions/pcre.xml diff -u phpdoc/en/functions/pcre.xml:1.68 phpdoc/en/functions/pcre.xml:1.69 --- phpdoc/en/functions/pcre.xml:1.68 Fri Dec 28 07:44:29 2001 +++ phpdoc/en/functions/pcre.xml Mon Jan 21 14:28:39 2002 @@ -1,5 +1,5 @@ <?xml version="1.0" encoding="iso-8859-1"?> -<!-- $Revision: 1.68 $ --> +<!-- $Revision: 1.69 $ --> <reference id="ref.pcre"> <title>Regular Expression Functions (Perl-Compatible)</title> <titleabbrev>PCRE</titleabbrev> @@ -36,17 +36,17 @@ <itemizedlist> <listitem> <simpara> - /href='(.*)' - missing ending delimiter + /href='(.*)' - missing ending delimiter </simpara> </listitem> <listitem> <simpara> - /\w+\s*\w+/J - unknown modifier 'J' + /\w+\s*\w+/J - unknown modifier 'J' </simpara> </listitem> <listitem> <simpara> - 1-\d3-\d3-\d4| - missing starting delimiter + 1-\d3-\d3-\d4| - missing starting delimiter </simpara> </listitem> </itemizedlist> @@ -65,7 +65,7 @@ </simpara> </note> </partintro> - + <refentry id="function.preg-match"> <refnamediv> <refname>preg_match</refname> @@ -145,13 +145,13 @@ ]]> </programlisting> <para> - This example will produce: - <screen> + This example will produce: + <screen> <![CDATA[ domain name is: php.net ]]> - </screen> -</para> + </screen> + </para> </example> See also <function>preg_match_all</function>, <function>preg_replace</function>, and @@ -194,12 +194,12 @@ <varlistentry> <term>PREG_PATTERN_ORDER</term> <listitem> - <para> - Orders results so that $matches[0] is an array of full - pattern matches, $matches[1] is an array of strings matched by - the first parenthesized subpattern, and so on. - <informalexample> - <programlisting role="php"> + <para> + Orders results so that $matches[0] is an array of full + pattern matches, $matches[1] is an array of strings matched by + the first parenthesized subpattern, and so on. 
+ <informalexample> + <programlisting role="php"> <![CDATA[ preg_match_all ("|<[^>]+>(.*)</[^>]+>|U", "<b>example: </b><div align=left>this is a test</div>", @@ -220,18 +220,18 @@ and $out[1] contains array of strings enclosed by tags. </para> </informalexample> - </para> + </para> </listitem> </varlistentry> <varlistentry> <term>PREG_SET_ORDER</term> <listitem> - <para> - Orders results so that $matches[0] is an array of first set - of matches, $matches[1] is an array of second set of matches, - and so on. - <informalexample> - <programlisting role="php"> + <para> + Orders results so that $matches[0] is an array of first set + of matches, $matches[1] is an array of second set of matches, + and so on. + <informalexample> + <programlisting role="php"> <![CDATA[ preg_match_all ("|<;[^>]+>(.*)</[^>]+>|U", "<b>example: </b><div align=left>this is a test</div>", @@ -239,25 +239,26 @@ print $out[0][0].", ".$out[0][1]."\n"; print $out[1][0].", ".$out[1][1]."\n"; ]]> - </programlisting> - </informalexample> - This example will produce: - <informalexample> - <programlisting role="php"> + </programlisting> + </informalexample> + This example will produce: + <informalexample> + <programlisting role="php"> <![CDATA[ <b>example: </b>, example: <div align=left>this is a test</div>, this is a test ]]> - </programlisting> - </informalexample> - In this case, $matches[0] is the first set of matches, and - $matches[0][0] has text matched by full pattern, $matches[0][1] - has text matched by first subpattern and so on. Similarly, - $matches[1] is the second set of matches, etc. - </para> + </programlisting> + </informalexample> + In this case, $matches[0] is the first set of matches, and + $matches[0][0] has text matched by full pattern, $matches[0][1] + has text matched by first subpattern and so on. Similarly, + $matches[1] is the second set of matches, etc. 
+ </para> </listitem> </varlistentry> - </variablelist></para> + </variablelist> + </para> <para> If <parameter>order</parameter> is not specified, it is assumed to be PREG_PATTERN_ORDER. @@ -475,7 +476,7 @@ <para> Parameter <parameter>limit</parameter> was added after PHP 4.0.1pl2. </para> - </note> + </note> <para> See also <function>preg_match</function>, <function>preg_match_all</function>, and @@ -550,35 +551,35 @@ </para> <para> - If <parameter>limit</parameter> is specified, then only substrings up to - <parameter>limit</parameter> are returned, and if - <parameter>limit</parameter> is -1, it actually means "no limit", which is - useful for specifying the <parameter>flags</parameter>. + If <parameter>limit</parameter> is specified, then only substrings up to + <parameter>limit</parameter> are returned, and if + <parameter>limit</parameter> is -1, it actually means "no limit", which is + useful for specifying the <parameter>flags</parameter>. </para> <para> - <parameter>flags</parameter> can be any combination of the following flags - (combined with bitwise | operator): + <parameter>flags</parameter> can be any combination of the following flags + (combined with bitwise | operator): <variablelist> <varlistentry> - <term>PREG_SPLIT_NO_EMPTY</term> - <listitem> - <simpara> - If this flag is set, only non-empty pieces will be returned by - <function>preg_split</function>. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term>PREG_SPLIT_DELIM_CAPTURE</term> - <listitem> - <simpara> - If this flag is set, parenthesized expression in the delimiter pattern - will be captured and returned as well. This flag was added for 4.0.5. - </simpara> - </listitem> + <term>PREG_SPLIT_NO_EMPTY</term> + <listitem> + <simpara> + If this flag is set, only non-empty pieces will be returned by + <function>preg_split</function>. 
+ </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term>PREG_SPLIT_DELIM_CAPTURE</term> + <listitem> + <simpara> + If this flag is set, parenthesized expression in the delimiter pattern + will be captured and returned as well. This flag was added for 4.0.5. + </simpara> + </listitem> </varlistentry> - </variablelist> + </variablelist> </para> <para> <example> @@ -739,159 +740,159 @@ <blockquote> <variablelist> <varlistentry> - <term><emphasis>i</emphasis> (PCRE_CASELESS)</term> - <listitem> - <simpara> - If this modifier is set, letters in the pattern match both - upper and lower case letters. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>m</emphasis> (PCRE_MULTILINE)</term> - <listitem> - <simpara> - By default, PCRE treats the subject string as consisting of a - single "line" of characters (even if it actually contains - several newlines). The "start of line" metacharacter (^) - matches only at the start of the string, while the "end of - line" metacharacter ($) matches only at the end of the - string, or before a terminating newline (unless - <emphasis>D</emphasis> modifier is set). This is the same as - Perl. - </simpara> - <simpara> - When this modifier is set, the "start of line" and "end of - line" constructs match immediately following or immediately - before any newline in the subject string, respectively, as - well as at the very start and end. This is equivalent to - Perl's /m modifier. If there are no "\n" characters in a - subject string, or no occurrences of ^ or $ in a pattern, - setting this modifier has no effect. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>s</emphasis> (PCRE_DOTALL)</term> - <listitem> - <simpara> - If this modifier is set, a dot metacharater in the pattern - matches all characters, including newlines. Without it, - newlines are excluded. This modifier is equivalent to Perl's - /s modifier. 
A negative class such as [^a] always matches a - newline character, independent of the setting of this - modifier. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>x</emphasis> (PCRE_EXTENDED)</term> - <listitem> - <simpara> - If this modifier is set, whitespace data characters in the - pattern are totally ignored except when escaped or inside a - character class, and characters between an unescaped # - outside a character class and the next newline character, - inclusive, are also ignored. This is equivalent to Perl's /x - modifier, and makes it possible to include comments inside - complicated patterns. Note, however, that this applies only - to data characters. Whitespace characters may never appear - within special character sequences in a pattern, for example - within the sequence (?( which introduces a conditional - subpattern. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>e</emphasis></term> - <listitem> - <simpara> - If this modifier is set, <function>preg_replace</function> - does normal substitution of backreferences in the - replacement string, evaluates it as PHP code, and uses the - result for replacing the search string. - </simpara> - <simpara> - Only <function>preg_replace</function> uses this modifier; - it is ignored by other PCRE functions. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>A</emphasis> (PCRE_ANCHORED)</term> - <listitem> - <simpara> - If this modifier is set, the pattern is forced to be - "anchored", that is, it is constrained to match only at the - start of the string which is being searched (the "subject - string"). This effect can also be achieved by appropriate - constructs in the pattern itself, which is the only way to - do it in Perl. 
- </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>D</emphasis> (PCRE_DOLLAR_ENDONLY)</term> - <listitem> - <simpara> - If this modifier is set, a dollar metacharacter in the pattern - matches only at the end of the subject string. Without this - modifier, a dollar also matches immediately before the final - character if it is a newline (but not before any other - newlines). This modifier is ignored if <emphasis>m</emphasis> - modifier is set. There is no equivalent to this modifier in - Perl. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>S</emphasis></term> - <listitem> - <simpara> - When a pattern is going to be used several times, it is - worth spending more time analyzing it in order to speed up - the time taken for matching. If this modifier is set, then - this extra analysis is performed. At present, studying a - pattern is useful only for non-anchored patterns that do not - have a single fixed starting character. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>U</emphasis> (PCRE_UNGREEDY)</term> - <listitem> - <simpara> - This modifier inverts the "greediness" of the quantifiers so - that they are not greedy by default, but become greedy if - followed by "?". It is not compatible with Perl. It can also - be set by a (?U) modifier setting within the pattern. - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>X</emphasis> (PCRE_EXTRA)</term> - <listitem> - <simpara> - This modifier turns on additional functionality of PCRE that - is incompatible with Perl. Any backslash in a pattern that - is followed by a letter that has no special meaning causes - an error, thus reserving these combinations for future - expansion. By default, as in Perl, a backslash followed by a - letter with no special meaning is treated as a literal. - There are at present no other features controlled by this - modifier. 
- </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>u</emphasis> (PCRE_UTF8)</term> - <listitem> - <simpara> - This modifier turns on additional functionality of PCRE that - is incompatible with Perl. Pattern strings are treated as - UTF-8. This modifier is available from PHP 4.1.0 or greater. - </simpara> - </listitem> + <term><emphasis>i</emphasis> (PCRE_CASELESS)</term> + <listitem> + <simpara> + If this modifier is set, letters in the pattern match both + upper and lower case letters. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>m</emphasis> (PCRE_MULTILINE)</term> + <listitem> + <simpara> + By default, PCRE treats the subject string as consisting of a + single "line" of characters (even if it actually contains + several newlines). The "start of line" metacharacter (^) + matches only at the start of the string, while the "end of + line" metacharacter ($) matches only at the end of the + string, or before a terminating newline (unless + <emphasis>D</emphasis> modifier is set). This is the same as + Perl. + </simpara> + <simpara> + When this modifier is set, the "start of line" and "end of + line" constructs match immediately following or immediately + before any newline in the subject string, respectively, as + well as at the very start and end. This is equivalent to + Perl's /m modifier. If there are no "\n" characters in a + subject string, or no occurrences of ^ or $ in a pattern, + setting this modifier has no effect. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>s</emphasis> (PCRE_DOTALL)</term> + <listitem> + <simpara> + If this modifier is set, a dot metacharacter in the pattern + matches all characters, including newlines. Without it, + newlines are excluded. This modifier is equivalent to Perl's + /s modifier. A negative class such as [^a] always matches a + newline character, independent of the setting of this + modifier. 
+ </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>x</emphasis> (PCRE_EXTENDED)</term> + <listitem> + <simpara> + If this modifier is set, whitespace data characters in the + pattern are totally ignored except when escaped or inside a + character class, and characters between an unescaped # + outside a character class and the next newline character, + inclusive, are also ignored. This is equivalent to Perl's /x + modifier, and makes it possible to include comments inside + complicated patterns. Note, however, that this applies only + to data characters. Whitespace characters may never appear + within special character sequences in a pattern, for example + within the sequence (?( which introduces a conditional + subpattern. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>e</emphasis></term> + <listitem> + <simpara> + If this modifier is set, <function>preg_replace</function> + does normal substitution of backreferences in the + replacement string, evaluates it as PHP code, and uses the + result for replacing the search string. + </simpara> + <simpara> + Only <function>preg_replace</function> uses this modifier; + it is ignored by other PCRE functions. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>A</emphasis> (PCRE_ANCHORED)</term> + <listitem> + <simpara> + If this modifier is set, the pattern is forced to be + "anchored", that is, it is constrained to match only at the + start of the string which is being searched (the "subject + string"). This effect can also be achieved by appropriate + constructs in the pattern itself, which is the only way to + do it in Perl. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>D</emphasis> (PCRE_DOLLAR_ENDONLY)</term> + <listitem> + <simpara> + If this modifier is set, a dollar metacharacter in the pattern + matches only at the end of the subject string. 
Without this + modifier, a dollar also matches immediately before the final + character if it is a newline (but not before any other + newlines). This modifier is ignored if <emphasis>m</emphasis> + modifier is set. There is no equivalent to this modifier in + Perl. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>S</emphasis></term> + <listitem> + <simpara> + When a pattern is going to be used several times, it is + worth spending more time analyzing it in order to speed up + the time taken for matching. If this modifier is set, then + this extra analysis is performed. At present, studying a + pattern is useful only for non-anchored patterns that do not + have a single fixed starting character. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>U</emphasis> (PCRE_UNGREEDY)</term> + <listitem> + <simpara> + This modifier inverts the "greediness" of the quantifiers so + that they are not greedy by default, but become greedy if + followed by "?". It is not compatible with Perl. It can also + be set by a (?U) modifier setting within the pattern. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>X</emphasis> (PCRE_EXTRA)</term> + <listitem> + <simpara> + This modifier turns on additional functionality of PCRE that + is incompatible with Perl. Any backslash in a pattern that + is followed by a letter that has no special meaning causes + an error, thus reserving these combinations for future + expansion. By default, as in Perl, a backslash followed by a + letter with no special meaning is treated as a literal. + There are at present no other features controlled by this + modifier. + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>u</emphasis> (PCRE_UTF8)</term> + <listitem> + <simpara> + This modifier turns on additional functionality of PCRE that + is incompatible with Perl. Pattern strings are treated as + UTF-8. 
This modifier is available from PHP 4.1.0 or greater. + </simpara> + </listitem> </varlistentry> </variablelist> </blockquote> @@ -922,31 +923,31 @@ The differences described here are with respect to Perl 5.005. <orderedlist> - <listitem> - <simpara> - By default, a whitespace character is any character that - the C library function isspace() recognizes, though it is - possible to compile PCRE with alternative character type - tables. Normally isspace() matches space, formfeed, newline, - carriage return, horizontal tab, and vertical tab. Perl 5 no - longer includes vertical tab in its set of whitespace char- - acters. The \v escape that was in the Perl documentation for - a long time was never in fact recognized. However, the char- - acter itself was treated as whitespace at least up to 5.002. - In 5.004 and 5.005 it does not match \s. - </simpara> - </listitem> - <listitem> - <simpara> + <listitem> + <simpara> + By default, a whitespace character is any character that + the C library function isspace() recognizes, though it is + possible to compile PCRE with alternative character type + tables. Normally isspace() matches space, formfeed, newline, + carriage return, horizontal tab, and vertical tab. Perl 5 no + longer includes vertical tab in its set of whitespace characters. + The \v escape that was in the Perl documentation for + a long time was never in fact recognized. However, the character + itself was treated as whitespace at least up to 5.002. + In 5.004 and 5.005 it does not match \s. + </simpara> + </listitem> + <listitem> + <simpara> PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits them, but they do not mean what you might think. For example, (?!a){3} does not assert that the next three characters are not "a". It just asserts that the next character is not "a" three times. 
- </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> Capturing subpatterns that occur inside negative looka- head assertions are counted, but their entries in the offsets vector are never set. Perl sets its numerical vari- @@ -954,39 +955,39 @@ assertion fails to match something (thereby succeeding), but only if the negative lookahead assertion contains just one branch. - </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> Though binary zero characters are supported in the sub- ject string, they are not allowed in a pattern string because it is passed as a normal C string, terminated by zero. The escape sequence "\0" can be used in the pattern to represent a binary zero. - </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> The following Perl escape sequences are not supported: \l, \u, \L, \U, \E, \Q. In fact these are implemented by Perl's general string-handling and are not part of its pat- tern matching engine. - </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> The Perl \G assertion is not supported as it is not relevant to single pattern matches. - </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> Fairly obviously, PCRE does not support the (?{code}) construction. - </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> There are at the time of writing some oddities in Perl 5.005_02 concerned with the settings of captured strings when part of a pattern is repeated. For example, matching @@ -997,23 +998,23 @@ In Perl 5.004 $2 is set in both cases, and that is also &true; of PCRE. If in the future Perl changes to a consistent state that is different, PCRE may change to follow. 
- </simpara> - </listitem> - <listitem> - <simpara> + </simpara> + </listitem> + <listitem> + <simpara> Another as yet unresolved discrepancy is that in Perl 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string "a", whereas in PCRE it does not. However, in both Perl and PCRE /^(a)?a/ matched against "a" leaves $1 unset. - </simpara> - </listitem> - <listitem> - <para> + </simpara> + </listitem> + <listitem> + <para> PCRE provides some extensions to the Perl regular expression facilities: - <orderedlist> - <listitem> - <simpara> + <orderedlist> + <listitem> + <simpara> Although lookbehind assertions must match fixed length strings, each alternative branch of a lookbehind assertion can match a different length of string. Perl 5.005 requires @@ -1042,9 +1043,9 @@ </simpara> </listitem> </orderedlist> - </para> - </listitem> - </orderedlist> + </para> + </listitem> + </orderedlist> </para> </refsect1> @@ -1070,8 +1071,8 @@ itself. </para> </refsect2> - <refsect2 id="regexp.reference.meta"> - <title>Meta-caracters</title> + <refsect2 id="regexp.reference.meta"> + <title>Meta-caracters</title> <para> The power of regular expressions comes from the ability to include alternatives and repetitions in the pat- @@ -1086,116 +1087,116 @@ Outside square brackets, the meta-characters are as follows: <variablelist> <varlistentry> - <term><emphasis>\</emphasis></term> - <listitem> - <simpara> - general escape character with several uses - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>^</emphasis></term> - <listitem> - <simpara> - assert start of subject (or line, in multiline mode) - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>$</emphasis></term> - <listitem> - <simpara> - assert end of subject (or line, in multiline mode) - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>.</emphasis></term> - <listitem> - <simpara> - match any character except newline (by default) - </simpara> - 
</listitem> - </varlistentry> - <varlistentry> - <term><emphasis>[</emphasis></term> - <listitem> - <simpara> - start character class definition - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>]</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\</emphasis></term> + <listitem> + <simpara> + general escape character with several uses + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>^</emphasis></term> + <listitem> + <simpara> + assert start of subject (or line, in multiline mode) + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>$</emphasis></term> + <listitem> + <simpara> + assert end of subject (or line, in multiline mode) + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>.</emphasis></term> + <listitem> + <simpara> + match any character except newline (by default) + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>[</emphasis></term> + <listitem> + <simpara> + start character class definition + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>]</emphasis></term> + <listitem> + <simpara> end character class definition - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>|</emphasis></term> - <listitem> - <simpara> + <term><emphasis>|</emphasis></term> + <listitem> + <simpara> start of alternative branch - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>(</emphasis></term> - <listitem> - <simpara> + <term><emphasis>(</emphasis></term> + <listitem> + <simpara> start subpattern - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>)</emphasis></term> - <listitem> - <simpara> + <term><emphasis>)</emphasis></term> + <listitem> + <simpara> end subpattern - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> 
<varlistentry> - <term><emphasis>?</emphasis></term> - <listitem> - <simpara> + <term><emphasis>?</emphasis></term> + <listitem> + <simpara> extends the meaning of (, also 0 or 1 quantifier, also quantifier minimizer - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>*</emphasis></term> - <listitem> - <simpara> + <term><emphasis>*</emphasis></term> + <listitem> + <simpara> 0 or more quantifier - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>+</emphasis></term> - <listitem> - <simpara> + <term><emphasis>+</emphasis></term> + <listitem> + <simpara> 1 or more quantifier - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>{</emphasis></term> - <listitem> - <simpara> + <term><emphasis>{</emphasis></term> + <listitem> + <simpara> start min/max quantifier - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>}</emphasis></term> - <listitem> - <simpara> + <term><emphasis>}</emphasis></term> + <listitem> + <simpara> end min/max quantifier - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> </variablelist> @@ -1204,36 +1205,36 @@ characters are: <variablelist> <varlistentry> - <term><emphasis>\</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\</emphasis></term> + <listitem> + <simpara> general escape character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>^</emphasis></term> - <listitem> - <simpara> + <term><emphasis>^</emphasis></term> + <listitem> + <simpara> negate the class, but only if the first character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>-</emphasis></term> - <listitem> - <simpara> + <term><emphasis>-</emphasis></term> + <listitem> + <simpara> indicates character range - </simpara> - </listitem> + </simpara> + 
</listitem> </varlistentry> <varlistentry> - <term><emphasis>]</emphasis></term> - <listitem> - <simpara> + <term><emphasis>]</emphasis></term> + <listitem> + <simpara> terminates the character class - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> </variablelist> The following sections describe the use of each of the @@ -1277,76 +1278,76 @@ <para> <variablelist> <varlistentry> - <term><emphasis>\a</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\a</emphasis></term> + <listitem> + <simpara> alarm, that is, the BEL character (hex 07) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\cx</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\cx</emphasis></term> + <listitem> + <simpara> "control-x", where x is any character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\e</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\e</emphasis></term> + <listitem> + <simpara> escape (hex 1B) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\f</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\f</emphasis></term> + <listitem> + <simpara> formfeed (hex 0C) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\n</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\n</emphasis></term> + <listitem> + <simpara> newline (hex 0A) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\r</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\r</emphasis></term> + <listitem> + <simpara> carriage return (hex 0D) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\t</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\t</emphasis></term> + <listitem> + <simpara> tab (hex 09) - </simpara> - 
</listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\xhh</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\xhh</emphasis></term> + <listitem> + <simpara> character with hex code hh - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\ddd</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\ddd</emphasis></term> + <listitem> + <simpara> character with octal code ddd, or backreference - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> </variablelist> </para> @@ -1389,80 +1390,80 @@ <para> <variablelist> <varlistentry> - <term><emphasis>\040</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\040</emphasis></term> + <listitem> + <simpara> is another way of writing a space - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\40</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\40</emphasis></term> + <listitem> + <simpara> is the same, provided there are fewer than 40 previous capturing subpatterns - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\7</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\7</emphasis></term> + <listitem> + <simpara> is always a back reference - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\11</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\11</emphasis></term> + <listitem> + <simpara> might be a back reference, or another way of writing a tab - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\011</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\011</emphasis></term> + <listitem> + <simpara> is always a tab - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\0113</emphasis></term> - <listitem> - 
<simpara> + <term><emphasis>\0113</emphasis></term> + <listitem> + <simpara> is a tab followed by the character "3" - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\113</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\113</emphasis></term> + <listitem> + <simpara> is the character with octal code 113 (since there can be no more than 99 back references) - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\377</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\377</emphasis></term> + <listitem> + <simpara> is a byte consisting entirely of 1 bits - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\81</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\81</emphasis></term> + <listitem> + <simpara> is either a back reference, or a binary zero followed by the two characters "8" and "1" - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> </variablelist> </para> @@ -1485,52 +1486,52 @@ <para> <variablelist> <varlistentry> - <term><emphasis>\d</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\d</emphasis></term> + <listitem> + <simpara> any decimal digit - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\D</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\D</emphasis></term> + <listitem> + <simpara> any character that is not a decimal digit - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\s</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\s</emphasis></term> + <listitem> + <simpara> any whitespace character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\S</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\S</emphasis></term> + <listitem> + <simpara> any 
character that is not a whitespace character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\w</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\w</emphasis></term> + <listitem> + <simpara> any "word" character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> <varlistentry> - <term><emphasis>\W</emphasis></term> - <listitem> - <simpara> + <term><emphasis>\W</emphasis></term> + <listitem> + <simpara> any "non-word" character - </simpara> - </listitem> + </simpara> + </listitem> </varlistentry> </variablelist> </para> @@ -1565,49 +1566,49 @@ backslashed assertions are </para> <para> - <variablelist> - <varlistentry> - <term><emphasis>\b</emphasis></term> - <listitem> - <simpara> - word boundary - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>\B</emphasis></term> - <listitem> - <simpara> - not a word boundary - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>\A</emphasis></term> - <listitem> - <simpara> - start of subject (independent of multiline mode) - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>\Z</emphasis></term> - <listitem> - <simpara> - end of subject or newline at end (independent of - multiline mode) - </simpara> - </listitem> - </varlistentry> - <varlistentry> - <term><emphasis>\z</emphasis></term> - <listitem> - <simpara> - end of subject (independent of multiline mode) - </simpara> - </listitem> - </varlistentry> - </variablelist> + <variablelist> + <varlistentry> + <term><emphasis>\b</emphasis></term> + <listitem> + <simpara> + word boundary + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>\B</emphasis></term> + <listitem> + <simpara> + not a word boundary + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>\A</emphasis></term> + <listitem> + <simpara> + start of subject (independent of multiline mode) + 
</simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>\Z</emphasis></term> + <listitem> + <simpara> + end of subject or newline at end (independent of + multiline mode) + </simpara> + </listitem> + </varlistentry> + <varlistentry> + <term><emphasis>\z</emphasis></term> + <listitem> + <simpara> + end of subject (independent of multiline mode) + </simpara> + </listitem> + </varlistentry> + </variablelist> </para> <para> These assertions may not appear in character classes (but @@ -1634,8 +1635,9 @@ string, whereas <literal>\z</literal> matches only at the end. </para> </refsect2> - <refsect2 id="regexp.reference.circudollar"> - <title>Circumflex and dollar</title> + + <refsect2 id="regexp.reference.circudollar"> + <title>Circumflex and dollar</title> <literallayout> Outside a character class, in the default matching mode, the circumflex character is an assertion which is true only if @@ -1684,8 +1686,9 @@ whether <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not. </literallayout> </refsect2> - <refsect2 id="regexp.reference.dot"> - <title>FULL STOP</title> + + <refsect2 id="regexp.reference.dot"> + <title>FULL STOP</title> <literallayout> Outside a character class, a dot in the pattern matches any one character in the subject, including a non-printing @@ -1697,8 +1700,9 @@ in a character class. </literallayout> </refsect2> - <refsect2 id="regexp.reference.squarebrackets"> - <title>Square brackets</title> + + <refsect2 id="regexp.reference.squarebrackets"> + <title>Square brackets</title> <literallayout> An opening square bracket introduces a character class, ter- minated by a closing square bracket. A closing square @@ -1776,8 +1780,9 @@ classes, but it does no harm if they are escaped. 
</literallayout> </refsect2> - <refsect2 id="regexp.reference.verticalbar"> - <title>Vertical bar</title> + + <refsect2 id="regexp.reference.verticalbar"> + <title>Vertical bar</title> <literallayout> Vertical bar characters are used to separate alternative patterns. For example, the pattern @@ -1794,8 +1799,9 @@ subpattern. </literallayout> </refsect2> - <refsect2 id="regexp.reference.internal-options"> - <title>Internal option setting</title> + + <refsect2 id="regexp.reference.internal-options"> + <title>Internal option setting</title> <literallayout> The settings of <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> , <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> , @@ -1866,8 +1872,9 @@ even when it is at top level. It is best put at the start. </literallayout> </refsect2> - <refsect2 id="regexp.reference.subpatterns"> - <title>subpatterns</title> + + <refsect2 id="regexp.reference.subpatterns"> + <title>subpatterns</title> <literallayout> Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpat- @@ -1929,8 +1936,9 @@ the above patterns match "SUNDAY" as well as "Saturday". </literallayout> </refsect2> - <refsect2 id="regexp.reference.repetition"> - <title>Repetition</title> + + <refsect2 id="regexp.reference.repetition"> + <title>Repetition</title> <literallayout> Repetition is specified by quantifiers, which can follow any of the following items: @@ -2069,8 +2077,9 @@ "b". </literallayout> </refsect2> - <refsect2 id="regexp.reference.back-references"> - <title>BACK REFERENCES</title> + + <refsect2 id="regexp.reference.back-references"> + <title>BACK REFERENCES</title> <literallayout> Outside a character class, a backslash followed by a digit greater than 0 (and possibly further digits) is a back @@ -2136,8 +2145,9 @@ example above, or by a quantifier with a minimum of zero. 
</literallayout> </refsect2> - <refsect2 id="regexp.reference.assertions"> - <title>Assertions</title> + + <refsect2 id="regexp.reference.assertions"> + <title>Assertions</title> <literallayout> An assertion is a test on the characters following or preceding the current matching point that does not actually @@ -2257,8 +2267,9 @@ subpatterns. </literallayout> </refsect2> - <refsect2 id="regexp.reference.onlyonce"> - <title>Once-only subpatterns</title> + + <refsect2 id="regexp.reference.onlyonce"> + <title>Once-only subpatterns</title> <literallayout> With both maximizing and minimizing repetition, failure of what follows normally causes the repeated item to be re- @@ -2367,8 +2378,9 @@ pens quickly. </literallayout> </refsect2> - <refsect2 id="regexp.reference.conditional"> - <title>Conditional subpatterns</title> + + <refsect2 id="regexp.reference.conditional"> + <title>Conditional subpatterns</title> <literallayout> It is possible to cause the matching process to obey a sub- pattern conditionally or to choose between two alternative @@ -2426,8 +2438,9 @@ letters and dd are digits. </literallayout> </refsect2> - <refsect2 id="regexp.reference.comments"> - <title>Comments</title> + + <refsect2 id="regexp.reference.comments"> + <title>Comments</title> <literallayout> The sequence (?# marks the start of a comment which continues up to the next closing parenthesis. Nested @@ -2439,8 +2452,9 @@ ues up to the next newline character in the pattern. </literallayout> </refsect2> - <refsect2 id="regexp.reference.recursive"> - <title>Recursive patterns</title> + + <refsect2 id="regexp.reference.recursive"> + <title>Recursive patterns</title> <literallayout> Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without the use @@ -2499,8 +2513,9 @@ recursion. 
</literallayout> </refsect2> - <refsect2 id="regexp.reference.performances"> - <title>Performances</title> + + <refsect2 id="regexp.reference.performances"> + <title>Performances</title> <literallayout> Certain items that may appear in patterns are more efficient than others. It is more efficient to use a character class