aidan Mon Dec 6 22:29:17 2004 EDT
Modified files:
/phpdoc/en/reference/pcre pattern.syntax.xml
Log:
whitespace fixes
http://cvs.php.net/diff.php/phpdoc/en/reference/pcre/pattern.syntax.xml?r1=1.4&r2=1.5&ty=u
Index: phpdoc/en/reference/pcre/pattern.syntax.xml
diff -u phpdoc/en/reference/pcre/pattern.syntax.xml:1.4 phpdoc/en/reference/pcre/pattern.syntax.xml:1.5
--- phpdoc/en/reference/pcre/pattern.syntax.xml:1.4 Wed Aug 11 16:15:29 2004
+++ phpdoc/en/reference/pcre/pattern.syntax.xml Mon Dec 6 22:29:16 2004
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.4 $ -->
+<!-- $Revision: 1.5 $ -->
<!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
<refentry id="reference.pcre.pattern.syntax">
<refnamediv>
@@ -38,109 +38,105 @@
</listitem>
<listitem>
<simpara>
- PCRE does not allow repeat quantifiers on lookahead
- assertions. Perl permits them, but they do not mean what you
- might think. For example, (?!a){3} does not assert that the
- next three characters are not "a". It just asserts that the
- next character is not "a" three times.
+ PCRE does not allow repeat quantifiers on lookahead
+ assertions. Perl permits them, but they do not mean what you
+ might think. For example, (?!a){3} does not assert that the
+ next three characters are not "a". It just asserts that the
+ next character is not "a" three times.
</simpara>
</listitem>
<listitem>
<simpara>
- Capturing subpatterns that occur inside negative
- lookahead assertions are counted, but their entries in the
- offsets vector are never set. Perl sets its numerical
- variables from any such patterns that are matched before the
- assertion fails to match something (thereby succeeding), but
- only if the negative lookahead assertion contains just one
- branch.
+ Capturing subpatterns that occur inside negative
+ lookahead assertions are counted, but their entries in the
+ offsets vector are never set. Perl sets its numerical
+ variables from any such patterns that are matched before the
+ assertion fails to match something (thereby succeeding), but
+ only if the negative lookahead assertion contains just one
+ branch.
</simpara>
</listitem>
<listitem>
<simpara>
- Though binary zero characters are supported in the subject string,
- they are not allowed in a pattern string because it is passed as a
- normal C string, terminated by zero. The escape sequence "\\x00" can
- be used in the pattern to represent a binary zero.
+ Though binary zero characters are supported in the subject string,
+ they are not allowed in a pattern string because it is passed as a
+ normal C string, terminated by zero. The escape sequence "\\x00" can
+ be used in the pattern to represent a binary zero.
</simpara>
</listitem>
<listitem>
<simpara>
- The following Perl escape sequences are not supported:
- \l, \u, \L, \U, \E, \Q. In fact these are implemented by
- Perl's general string-handling and are not part of its
- pattern matching engine.
+ The following Perl escape sequences are not supported:
+ \l, \u, \L, \U, \E, \Q. In fact these are implemented by
+ Perl's general string-handling and are not part of its
+ pattern matching engine.
</simpara>
</listitem>
<listitem>
<simpara>
- The Perl \G assertion is not supported as it is not
- relevant to single pattern matches.
+ The Perl \G assertion is not supported as it is not
+ relevant to single pattern matches.
</simpara>
</listitem>
<listitem>
<simpara>
- Fairly obviously, PCRE does not support the (?{code})
- construction.
+ Fairly obviously, PCRE does not support the (?{code})
+ construction.
</simpara>
</listitem>
<listitem>
<simpara>
- There are at the time of writing some oddities in Perl
- 5.005_02 concerned with the settings of captured strings
- when part of a pattern is repeated. For example, matching
- "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
- "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
- unset. However, if the pattern is changed to
- /^(aa(b(b))?)+$/ then $2 (and $3) get set.
- In Perl 5.004 $2 is set in both cases, and that is also &true;
- of PCRE. If in the future Perl changes to a consistent state
- that is different, PCRE may change to follow.
+ There are at the time of writing some oddities in Perl
+ 5.005_02 concerned with the settings of captured strings
+ when part of a pattern is repeated. For example, matching
+ "aba" against the pattern /^(a(b)?)+$/ sets $2 to the value
+ "b", but matching "aabbaa" against /^(aa(bb)?)+$/ leaves $2
+ unset. However, if the pattern is changed to
+ /^(aa(b(b))?)+$/ then $2 (and $3) get set.
+ In Perl 5.004 $2 is set in both cases, and that is also &true;
+ of PCRE. If in the future Perl changes to a consistent state
+ that is different, PCRE may change to follow.
</simpara>
</listitem>
<listitem>
<simpara>
- Another as yet unresolved discrepancy is that in Perl
- 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
- "a", whereas in PCRE it does not. However, in both Perl and
- PCRE /^(a)?a/ matched against "a" leaves $1 unset.
+ Another as yet unresolved discrepancy is that in Perl
+ 5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
+ "a", whereas in PCRE it does not. However, in both Perl and
+ PCRE /^(a)?a/ matched against "a" leaves $1 unset.
</simpara>
</listitem>
<listitem>
<para>
- PCRE provides some extensions to the Perl regular
- expression facilities:
+ PCRE provides some extensions to the Perl regular
+ expression facilities:
<orderedlist>
<listitem>
<simpara>
- Although lookbehind assertions must match fixed length
- strings, each alternative branch of a lookbehind assertion
- can match a different length of string. Perl 5.005 requires
- them all to have the same length.
+ Although lookbehind assertions must match fixed length
+ strings, each alternative branch of a lookbehind assertion
+ can match a different length of string. Perl 5.005 requires
+ them all to have the same length.
</simpara>
</listitem>
<listitem>
<simpara>
- If <link
- linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
- is set and <link
- linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
+ If <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ is set and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
not set, the $ meta-character matches only at the very end of the
string.
</simpara>
</listitem>
<listitem>
<simpara>
- If <link
- linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
+ If <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
set, a backslash followed by a letter with no special meaning is
faulted.
</simpara>
</listitem>
<listitem>
<simpara>
- If <link
- linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
+ If <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
set, the greediness of the repetition quantifiers is inverted,
that is, by default they are not greedy, but if followed by a
question mark they are.
@@ -155,307 +151,202 @@
<refsect1 id="regexp.reference">
<title>Regular Expression Details</title>
- <refsect2 id="regexp.introduction">
- <title>Introduction</title>
- <para>
- The syntax and semantics of the regular expressions
- supported by PCRE are described below. Regular expressions are
- also described in the Perl documentation and in a number of
- other books, some of which have copious examples. Jeffrey
- Friedl's "Mastering Regular Expressions", published by
- O'Reilly (ISBN 1-56592-257-3), covers them in great detail.
- The description here is intended as reference documentation.
- </para>
- <para>
- A regular expression is a pattern that is matched against a
- subject string from left to right. Most characters stand for
- themselves in a pattern, and match the corresponding
- characters in the subject. As a trivial example, the pattern
- <literal>The quick brown fox</literal>
- matches a portion of a subject string that is identical to
- itself.
- </para>
+ <refsect2 id="regexp.introduction">
+ <title>Introduction</title>
+ <para>
+ The syntax and semantics of the regular expressions
+ supported by PCRE are described below. Regular expressions are
+ also described in the Perl documentation and in a number of
+ other books, some of which have copious examples. Jeffrey
+ Friedl's "Mastering Regular Expressions", published by
+ O'Reilly (ISBN 1-56592-257-3), covers them in great detail.
+ The description here is intended as reference documentation.
+ </para>
+ <para>
+ A regular expression is a pattern that is matched against a
+ subject string from left to right. Most characters stand for
+ themselves in a pattern, and match the corresponding
+ characters in the subject. As a trivial example, the pattern
+ <literal>The quick brown fox</literal>
+ matches a portion of a subject string that is identical to
+ itself.
+ </para>
</refsect2>
<refsect2 id="regexp.reference.meta">
<title>Meta-characters</title>
<para>
- The power of regular expressions comes from the
- ability to include alternatives and repetitions in the
- pattern. These are encoded in the pattern by the use of
- <emphasis>meta-characters</emphasis>, which do not stand for themselves but instead
- are interpreted in some special way.
- </para>
- <para>
- There are two different sets of meta-characters: those that
- are recognized anywhere in the pattern except within square
- brackets, and those that are recognized in square brackets.
- Outside square brackets, the meta-characters are as follows:
+ The power of regular expressions comes from the
+ ability to include alternatives and repetitions in the
+ pattern. These are encoded in the pattern by the use of
+ <emphasis>meta-characters</emphasis>, which do not stand for themselves but instead
+ are interpreted in some special way.
+ </para>
+ <para>
+ There are two different sets of meta-characters: those that
+ are recognized anywhere in the pattern except within square
+ brackets, and those that are recognized in square brackets.
+ Outside square brackets, the meta-characters are as follows:
<variablelist>
<varlistentry>
<term><emphasis>\</emphasis></term>
- <listitem>
- <simpara>
- general escape character with several uses
- </simpara>
- </listitem>
+ <listitem><simpara>general escape character with several uses</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>^</emphasis></term>
- <listitem>
- <simpara>
- assert start of subject (or line, in multiline mode)
- </simpara>
- </listitem>
+ <listitem><simpara>assert start of subject (or line, in multiline mode)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>$</emphasis></term>
- <listitem>
- <simpara>
- assert end of subject (or line, in multiline mode)
- </simpara>
- </listitem>
+ <listitem><simpara>assert end of subject (or line, in multiline mode)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>.</emphasis></term>
- <listitem>
- <simpara>
- match any character except newline (by default)
- </simpara>
- </listitem>
+ <listitem><simpara>match any character except newline (by default)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>[</emphasis></term>
- <listitem>
- <simpara>
- start character class definition
- </simpara>
- </listitem>
+ <listitem><simpara>start character class definition</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>]</emphasis></term>
- <listitem>
- <simpara>
- end character class definition
- </simpara>
- </listitem>
+ <listitem><simpara>end character class definition</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>|</emphasis></term>
- <listitem>
- <simpara>
- start of alternative branch
- </simpara>
- </listitem>
+ <listitem><simpara>start of alternative branch</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>(</emphasis></term>
- <listitem>
- <simpara>
- start subpattern
- </simpara>
- </listitem>
+ <listitem><simpara>start subpattern</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>)</emphasis></term>
- <listitem>
- <simpara>
- end subpattern
- </simpara>
- </listitem>
+ <listitem><simpara>end subpattern</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>?</emphasis></term>
- <listitem>
- <simpara>
- extends the meaning of (, also 0 or 1 quantifier, also quantifier minimizer
- </simpara>
- </listitem>
+ <listitem><simpara>extends the meaning of (, also 0 or 1 quantifier, also quantifier minimizer</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>*</emphasis></term>
- <listitem>
- <simpara>
- 0 or more quantifier
- </simpara>
- </listitem>
+ <listitem><simpara>0 or more quantifier</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>+</emphasis></term>
- <listitem>
- <simpara>
- 1 or more quantifier
- </simpara>
- </listitem>
+ <listitem><simpara>1 or more quantifier</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>{</emphasis></term>
- <listitem>
- <simpara>
- start min/max quantifier
- </simpara>
- </listitem>
+ <listitem><simpara>start min/max quantifier</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>}</emphasis></term>
- <listitem>
- <simpara>
- end min/max quantifier
- </simpara>
- </listitem>
+ <listitem><simpara>end min/max quantifier</simpara></listitem>
</varlistentry>
</variablelist>
- Part of a pattern that is in square brackets is called a
- "character class". In a character class the only
- meta-characters are:
+ Part of a pattern that is in square brackets is called a
+ "character class". In a character class the only
+ meta-characters are:
+
<variablelist>
<varlistentry>
<term><emphasis>\</emphasis></term>
- <listitem>
- <simpara>
- general escape character
- </simpara>
- </listitem>
+ <listitem><simpara>general escape character</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>^</emphasis></term>
- <listitem>
- <simpara>
- negate the class, but only if the first character
- </simpara>
- </listitem>
+ <listitem><simpara>negate the class, but only if the first character</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>-</emphasis></term>
- <listitem>
- <simpara>
- indicates character range
- </simpara>
- </listitem>
+ <listitem><simpara>indicates character range</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>]</emphasis></term>
- <listitem>
- <simpara>
- terminates the character class
- </simpara>
- </listitem>
+ <listitem><simpara>terminates the character class</simpara></listitem>
</varlistentry>
</variablelist>
- The following sections describe the use of each of the
- meta-characters.
- </para>
+
+ The following sections describe the use of each of the
+ meta-characters.
+ </para>
</refsect2>
- <refsect2 id="regexp.reference.backslash">
- <title>backslash</title>
+
+ <refsect2 id="regexp.reference.backslash">
+ <title>backslash</title>
+ <para>
+ The backslash character has several uses. Firstly, if it is
+ followed by a non-alphanumeric character, it takes away any
+ special meaning that character may have. This use of
+ backslash as an escape character applies both inside and
+ outside character classes.
+ </para>
+ <para>
+ For example, if you want to match a "*" character, you write
+ "\*" in the pattern. This applies whether or not the
+ following character would otherwise be interpreted as a
+ meta-character, so it is always safe to precede a non-alphanumeric
+ with "\" to specify that it stands for itself. In
+ particular, if you want to match a backslash, you write "\\".
+ </para>
+ <para>
+ If a pattern is compiled with the
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
+ whitespace in the pattern (other than in a character class) and
+ characters between a "#" outside a character class and the next newline
+ character are ignored. An escaping backslash can be used to include a
+ whitespace or "#" character as part of the pattern.
+ </para>
+ <para>
+ A second use of backslash provides a way of encoding
+ non-printing characters in patterns in a visible manner. There
+ is no restriction on the appearance of non-printing characters,
+ apart from the binary zero that terminates a pattern,
+ but when a pattern is being prepared by text editing, it is
+ usually easier to use one of the following escape sequences
+ than the binary character it represents:
+ </para>
<para>
- The backslash character has several uses. Firstly, if it is
- followed by a non-alphanumeric character, it takes away any
- special meaning that character may have. This use of
- backslash as an escape character applies both inside and
- outside character classes.
- </para>
- <para>
- For example, if you want to match a "*" character, you write
- "\*" in the pattern. This applies whether or not the
- following character would otherwise be interpreted as a
- meta-character, so it is always safe to precede a non-alphanumeric
- with "\" to specify that it stands for itself. In
- particular, if you want to match a backslash, you write "\\".
- </para>
- <para>
- If a pattern is compiled with the <link
- linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
- whitespace in the pattern (other than in a character class) and
- characters between a "#" outside a character class and the next newline
- character are ignored. An escaping backslash can be used to include a
- whitespace or "#" character as part of the pattern.
- </para>
- <para>
- A second use of backslash provides a way of encoding
- non-printing characters in patterns in a visible manner. There
- is no restriction on the appearance of non-printing characters,
- apart from the binary zero that terminates a pattern,
- but when a pattern is being prepared by text editing, it is
- usually easier to use one of the following escape sequences
- than the binary character it represents:
- </para>
- <para>
<variablelist>
<varlistentry>
<term><emphasis>\a</emphasis></term>
- <listitem>
- <simpara>
- alarm, that is, the BEL character (hex 07)
- </simpara>
- </listitem>
+ <listitem><simpara>alarm, that is, the BEL character (hex 07)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\cx</emphasis></term>
- <listitem>
- <simpara>
- "control-x", where x is any character
- </simpara>
- </listitem>
+ <listitem><simpara>"control-x", where x is any character</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\e</emphasis></term>
- <listitem>
- <simpara>
- escape (hex 1B)
- </simpara>
- </listitem>
+ <listitem><simpara>escape (hex 1B)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\f</emphasis></term>
- <listitem>
- <simpara>
- formfeed (hex 0C)
- </simpara>
- </listitem>
+ <listitem><simpara>formfeed (hex 0C)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\n</emphasis></term>
- <listitem>
- <simpara>
- newline (hex 0A)
- </simpara>
- </listitem>
+ <listitem><simpara>newline (hex 0A)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\r</emphasis></term>
- <listitem>
- <simpara>
- carriage return (hex 0D)
- </simpara>
- </listitem>
+ <listitem><simpara>carriage return (hex 0D)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\t</emphasis></term>
- <listitem>
- <simpara>
- tab (hex 09)
- </simpara>
- </listitem>
+ <listitem><simpara>tab (hex 09)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\xhh</emphasis></term>
- <listitem>
- <simpara>
- character with hex code hh
- </simpara>
- </listitem>
+ <listitem><simpara>character with hex code hh</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\ddd</emphasis></term>
- <listitem>
- <simpara>
- character with octal code ddd, or backreference
- </simpara>
- </listitem>
+ <listitem><simpara>character with octal code ddd, or backreference</simpara></listitem>
</varlistentry>
</variablelist>
- </para>
+ </para>
<para>
The precise effect of "<literal>\cx</literal>" is as follows:
if "<literal>x</literal>" is a lower case letter, it is converted
@@ -496,83 +387,63 @@
stand for themselves. For example:
</para>
<para>
- <variablelist>
- <varlistentry>
- <term><emphasis>\040</emphasis></term>
- <listitem>
- <simpara>
- is another way of writing a space
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\40</emphasis></term>
- <listitem>
- <simpara>
- is the same, provided there are fewer than 40
- previous capturing subpatterns
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\7</emphasis></term>
- <listitem>
- <simpara>
- is always a back reference
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\11</emphasis></term>
- <listitem>
- <simpara>
- might be a back reference, or another way of
- writing a tab
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\011</emphasis></term>
- <listitem>
- <simpara>
- is always a tab
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\0113</emphasis></term>
- <listitem>
- <simpara>
- is a tab followed by the character "3"
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\113</emphasis></term>
- <listitem>
- <simpara>
- is the character with octal code 113 (since there
- can be no more than 99 back references)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\377</emphasis></term>
- <listitem>
- <simpara>
- is a byte consisting entirely of 1 bits
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\81</emphasis></term>
- <listitem>
- <simpara>
- is either a back reference, or a binary zero
- followed by the two characters "8" and "1"
- </simpara>
- </listitem>
- </varlistentry>
+ <variablelist>
+ <varlistentry>
+ <term><emphasis>\040</emphasis></term>
+ <listitem><simpara>is another way of writing a space</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\40</emphasis></term>
+ <listitem>
+ <simpara>
+ is the same, provided there are fewer than 40
+ previous capturing subpatterns
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\7</emphasis></term>
+ <listitem><simpara>is always a back reference</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\11</emphasis></term>
+ <listitem>
+ <simpara>
+ might be a back reference, or another way of
+ writing a tab
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\011</emphasis></term>
+ <listitem><simpara>is always a tab</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\0113</emphasis></term>
+ <listitem><simpara>is a tab followed by the character "3"</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\113</emphasis></term>
+ <listitem>
+ <simpara>
+ is the character with octal code 113 (since there
+ can be no more than 99 back references)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\377</emphasis></term>
+ <listitem><simpara>is a byte consisting entirely of 1 bits</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\81</emphasis></term>
+ <listitem>
+ <simpara>
+ is either a back reference, or a binary zero
+ followed by the two characters "8" and "1"
+ </simpara>
+ </listitem>
+ </varlistentry>
</variablelist>
</para>
<para>
@@ -592,56 +463,32 @@
character types:
</para>
<para>
- <variablelist>
- <varlistentry>
- <term><emphasis>\d</emphasis></term>
- <listitem>
- <simpara>
- any decimal digit
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\D</emphasis></term>
- <listitem>
- <simpara>
- any character that is not a decimal digit
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\s</emphasis></term>
- <listitem>
- <simpara>
- any whitespace character
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\S</emphasis></term>
- <listitem>
- <simpara>
- any character that is not a whitespace character
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\w</emphasis></term>
- <listitem>
- <simpara>
- any "word" character
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\W</emphasis></term>
- <listitem>
- <simpara>
- any "non-word" character
- </simpara>
- </listitem>
- </varlistentry>
- </variablelist>
+ <variablelist>
+ <varlistentry>
+ <term><emphasis>\d</emphasis></term>
+ <listitem><simpara>any decimal digit</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\D</emphasis></term>
+ <listitem><simpara>any character that is not a decimal digit</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\s</emphasis></term>
+ <listitem><simpara>any whitespace character</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\S</emphasis></term>
+ <listitem><simpara>any character that is not a whitespace character</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\w</emphasis></term>
+ <listitem><simpara>any "word" character</simpara></listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\W</emphasis></term>
+ <listitem><simpara>any "non-word" character</simpara></listitem>
+ </varlistentry>
+ </variablelist>
</para>
<para>
Each pair of escape sequences partitions the complete set of
@@ -677,44 +524,28 @@
<variablelist>
<varlistentry>
<term><emphasis>\b</emphasis></term>
- <listitem>
- <simpara>
- word boundary
- </simpara>
- </listitem>
+ <listitem><simpara>word boundary</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\B</emphasis></term>
- <listitem>
- <simpara>
- not a word boundary
- </simpara>
- </listitem>
+ <listitem><simpara>not a word boundary</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\A</emphasis></term>
- <listitem>
- <simpara>
- start of subject (independent of multiline mode)
- </simpara>
- </listitem>
+ <listitem><simpara>start of subject (independent of multiline mode)</simpara></listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\Z</emphasis></term>
- <listitem>
+ <listitem>
<simpara>
- end of subject or newline at end (independent of
- multiline mode)
+ end of subject or newline at end (independent of
+ multiline mode)
</simpara>
</listitem>
</varlistentry>
<varlistentry>
<term><emphasis>\z</emphasis></term>
- <listitem>
- <simpara>
- end of subject(independent of multiline mode)
- </simpara>
- </listitem>
+ <listitem><simpara>end of subject(independent of multiline mode)</simpara></listitem>
</varlistentry>
</variablelist>
</para>
@@ -738,8 +569,7 @@
ever match at the very start and end of the subject string,
whatever options are set. They are not affected by the
<link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> or
- <link
- linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
options. The difference between <literal>\Z</literal> and
<literal>\z</literal> is that <literal>\Z</literal> matches before a
newline that is the last character of the string as well as at the end of
@@ -750,60 +580,59 @@
<refsect2 id="regexp.reference.circudollar">
<title>Circumflex and dollar</title>
<para>
- Outside a character class, in the default matching mode, the
- circumflex character is an assertion which is true only if
- the current matching point is at the start of the subject
- string. Inside a character class, circumflex has an entirely
- different meaning (see below).
- </para>
- <para>
- Circumflex need not be the first character of the pattern if
- a number of alternatives are involved, but it should be the
- first thing in each alternative in which it appears if the
- pattern is ever to match that branch. If all possible
- alternatives start with a circumflex, that is, if the pattern is
- constrained to match only at the start of the subject, it is
- said to be an "anchored" pattern. (There are also other
- constructs that can cause a pattern to be anchored.)
- </para>
- <para>
- A dollar character is an assertion which is &true; only if the
- current matching point is at the end of the subject string,
- or immediately before a newline character that is the last
- character in the string (by default). Dollar need not be the
- last character of the pattern if a number of alternatives
- are involved, but it should be the last item in any branch
- in which it appears. Dollar has no special meaning in a
- character class.
- </para>
- <para>
- The meaning of dollar can be changed so that it matches only
- at the very end of the string, by setting the
- <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
- option at compile or matching time. This
- does not affect the \Z assertion.
- </para>
- <para>
- The meanings of the circumflex and dollar characters are
- changed if the <link
- linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> option
- is set. When this is the case, they match immediately after and
- immediately before an internal "\n" character, respectively, in addition
- to matching at the start and end of the subject string. For example, the
- pattern /^abc$/ matches the subject string "def\nabc" in multiline mode,
- but not otherwise. Consequently, patterns that are anchored in single
- line mode because all branches start with "^" are not anchored in
- multiline mode. The <link
- linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
- option is ignored if <link
- linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
- set.
- </para>
- <para>
- Note that the sequences \A, \Z, and \z can be used to match
- the start and end of the subject in both modes, and if all
- branches of a pattern start with \A is it always anchored,
- whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
+ Outside a character class, in the default matching mode, the
+ circumflex character is an assertion which is true only if
+ the current matching point is at the start of the subject
+ string. Inside a character class, circumflex has an entirely
+ different meaning (see below).
+ </para>
+ <para>
+ Circumflex need not be the first character of the pattern if
+ a number of alternatives are involved, but it should be the
+ first thing in each alternative in which it appears if the
+ pattern is ever to match that branch. If all possible
+ alternatives start with a circumflex, that is, if the pattern is
+ constrained to match only at the start of the subject, it is
+ said to be an "anchored" pattern. (There are also other
+ constructs that can cause a pattern to be anchored.)
+ </para>
+ <para>
+ A dollar character is an assertion which is &true; only if the
+ current matching point is at the end of the subject string,
+ or immediately before a newline character that is the last
+ character in the string (by default). Dollar need not be the
+ last character of the pattern if a number of alternatives
+ are involved, but it should be the last item in any branch
+ in which it appears. Dollar has no special meaning in a
+ character class.
+ </para>
+ <para>
+ The meaning of dollar can be changed so that it matches only
+ at the very end of the string, by setting the
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ option at compile or matching time. This does not affect the \Z assertion.
+ </para>
+ <para>
+ The meanings of the circumflex and dollar characters are
+ changed if the
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> option
+ is set. When this is the case, they match immediately after and
+ immediately before an internal "\n" character, respectively, in addition
+ to matching at the start and end of the subject string. For example, the
+ pattern /^abc$/ matches the subject string "def\nabc" in multiline mode,
+ but not otherwise. Consequently, patterns that are anchored in single
+ line mode because all branches start with "^" are not anchored in
+ multiline mode. The
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ option is ignored if
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
+ set.
+ </para>
+ <para>
+ Note that the sequences \A, \Z, and \z can be used to match
+ the start and end of the subject in both modes, and if all
+ branches of a pattern start with \A it is always anchored,
+ whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
</para>
</refsect2>
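
The anchoring rules described in this hunk can be tried out with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that it agrees with PCRE for `^`, `$`, `\A` and multiline mode; it is not the PHP API. `\Z` is left out on purpose because Python gives it a different meaning from PCRE. The variable names are illustrative only.

```python
import re

subject = "def\nabc"

# Default mode: ^ and $ match only at the very start and end of the subject.
default_match = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches just after, and $ just before, an internal "\n".
multiline_match = re.search(r"^abc$", subject, re.MULTILINE)

# By default $ also matches immediately before a newline that ends the subject.
before_final_newline = re.search(r"abc$", "abc\n")

# \A stays tied to the start of the subject even in multiline mode.
anchored_multiline = re.search(r"\Aabc", subject, re.MULTILINE)

print(default_match, multiline_match, before_final_newline, anchored_multiline)
```

The second and third searches succeed; the first and last return `None`, which is the difference between a pattern anchored with `^` and one anchored with `\A`.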
@@ -812,8 +641,8 @@
<para>
Outside a character class, a dot in the pattern matches any
one character in the subject, including a non-printing
- character, but not (by default) newline. If the <link
- linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
+ character, but not (by default) newline. If the
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
option is set, then dots match newlines as well. The
handling of dot is entirely independent of the handling of
circumflex and dollar, the only relationship being that they
@@ -825,90 +654,90 @@
<refsect2 id="regexp.reference.squarebrackets">
<title>Square brackets</title>
<para>
- An opening square bracket introduces a character class,
- terminated by a closing square bracket. A closing square
- bracket on its own is not special. If a closing square
- bracket is required as a member of the class, it should be
- the first data character in the class (after an initial
- circumflex, if present) or escaped with a backslash.
- </para>
- <para>
- A character class matches a single character in the subject;
- the character must be in the set of characters defined by
- the class, unless the first character in the class is a
- circumflex, in which case the subject character must not be in
- the set defined by the class. If a circumflex is actually
- required as a member of the class, ensure it is not the
- first character, or escape it with a backslash.
- </para>
- <para>
- For example, the character class [aeiou] matches any lower
- case vowel, while [^aeiou] matches any character that is not
- a lower case vowel. Note that a circumflex is just a
- convenient notation for specifying the characters which are in
- the class by enumerating those that are not. It is not an
- assertion: it still consumes a character from the subject
- string, and fails if the current pointer is at the end of
- the string.
- </para>
- <para>
- When caseless matching is set, any letters in a class
- represent both their upper case and lower case versions, so
- for example, a caseless [aeiou] matches "A" as well as "a",
- and a caseless [^aeiou] does not match "A", whereas a
- caseful version would.
- </para>
- <para>
- The newline character is never treated in any special way in
- character classes, whatever the setting of the <link
- linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
- or <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>
- options is. A class such as [^a] will always match a newline.
- </para>
- <para>
- The minus (hyphen) character can be used to specify a range
- of characters in a character class. For example, [d-m]
- matches any letter between d and m, inclusive. If a minus
- character is required in a class, it must be escaped with a
- backslash or appear in a position where it cannot be
- interpreted as indicating a range, typically as the first or last
- character in the class.
- </para>
- <para>
- It is not possible to have the literal character "]" as the
- end character of a range. A pattern such as [W-]46] is
- interpreted as a class of two characters ("W" and "-")
- followed by a literal string "46]", so it would match "W46]" or
- "-46]". However, if the "]" is escaped with a backslash it
- is interpreted as the end of range, so [W-\]46] is
- interpreted as a single class containing a range followed by two
- separate characters. The octal or hexadecimal representation
- of "]" can also be used to end a range.
- </para>
- <para>
- Ranges operate in ASCII collating sequence. They can also be
- used for characters specified numerically, for example
- [\000-\037]. If a range that includes letters is used when
- caseless matching is set, it matches the letters in either
- case. For example, [W-c] is equivalent to [][\^_`wxyzabc],
- matched caselessly, and if character tables for the "fr"
- locale are in use, [\xc8-\xcb] matches accented E characters
- in both cases.
- </para>
- <para>
- The character types \d, \D, \s, \S, \w, and \W may also
- appear in a character class, and add the characters that
- they match to the class. For example, [\dABCDEF] matches any
- hexadecimal digit. A circumflex can conveniently be used
- with the upper case character types to specify a more
- restricted set of characters than the matching lower case type.
- For example, the class [^\W_] matches any letter or digit,
- but not underscore.
- </para>
- <para>
- All non-alphanumeric characters other than \, -, ^ (at the
- start) and the terminating ] are non-special in character
- classes, but it does no harm if they are escaped.
+ An opening square bracket introduces a character class,
+ terminated by a closing square bracket. A closing square
+ bracket on its own is not special. If a closing square
+ bracket is required as a member of the class, it should be
+ the first data character in the class (after an initial
+ circumflex, if present) or escaped with a backslash.
+ </para>
+ <para>
+ A character class matches a single character in the subject;
+ the character must be in the set of characters defined by
+ the class, unless the first character in the class is a
+ circumflex, in which case the subject character must not be in
+ the set defined by the class. If a circumflex is actually
+ required as a member of the class, ensure it is not the
+ first character, or escape it with a backslash.
+ </para>
+ <para>
+ For example, the character class [aeiou] matches any lower
+ case vowel, while [^aeiou] matches any character that is not
+ a lower case vowel. Note that a circumflex is just a
+ convenient notation for specifying the characters which are in
+ the class by enumerating those that are not. It is not an
+ assertion: it still consumes a character from the subject
+ string, and fails if the current pointer is at the end of
+ the string.
+ </para>
+ <para>
+ When caseless matching is set, any letters in a class
+ represent both their upper case and lower case versions, so
+ for example, a caseless [aeiou] matches "A" as well as "a",
+ and a caseless [^aeiou] does not match "A", whereas a
+ caseful version would.
+ </para>
+ <para>
+ The newline character is never treated in any special way in
+ character classes, whatever the setting of the <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
+ or <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>
+ options is. A class such as [^a] will always match a newline.
+ </para>
+ <para>
+ The minus (hyphen) character can be used to specify a range
+ of characters in a character class. For example, [d-m]
+ matches any letter between d and m, inclusive. If a minus
+ character is required in a class, it must be escaped with a
+ backslash or appear in a position where it cannot be
+ interpreted as indicating a range, typically as the first or last
+ character in the class.
+ </para>
+ <para>
+ It is not possible to have the literal character "]" as the
+ end character of a range. A pattern such as [W-]46] is
+ interpreted as a class of two characters ("W" and "-")
+ followed by a literal string "46]", so it would match "W46]" or
+ "-46]". However, if the "]" is escaped with a backslash it
+ is interpreted as the end of range, so [W-\]46] is
+ interpreted as a single class containing a range followed by two
+ separate characters. The octal or hexadecimal representation
+ of "]" can also be used to end a range.
+ </para>
+ <para>
+ Ranges operate in ASCII collating sequence. They can also be
+ used for characters specified numerically, for example
+ [\000-\037]. If a range that includes letters is used when
+ caseless matching is set, it matches the letters in either
+ case. For example, [W-c] is equivalent to [][\\^_`wxyzabc],
+ matched caselessly, and if character tables for the "fr"
+ locale are in use, [\xc8-\xcb] matches accented E characters
+ in both cases.
+ </para>
+ <para>
+ The character types \d, \D, \s, \S, \w, and \W may also
+ appear in a character class, and add the characters that
+ they match to the class. For example, [\dABCDEF] matches any
+ hexadecimal digit. A circumflex can conveniently be used
+ with the upper case character types to specify a more
+ restricted set of characters than the matching lower case type.
+ For example, the class [^\W_] matches any letter or digit,
+ but not underscore.
+ </para>
+ <para>
+ All non-alphanumeric characters other than \, -, ^ (at the
+ start) and the terminating ] are non-special in character
+ classes, but it does no harm if they are escaped.
</para>
</refsect2>
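
The character class rules in this hunk can be checked with a small sketch. It again uses Python's `re` module as a stand-in, on the assumption that it treats these particular classes the same way as PCRE; the names are illustrative and `re.ASCII` is used only to keep `\d` and `\W` to ASCII.

```python
import re
import string

# [aeiou] matches a lower case vowel; [^aeiou] matches any other single character.
vowel = re.fullmatch(r"[aeiou]", "e")
negated_caseful = re.fullmatch(r"[^aeiou]", "A")
negated_caseless = re.fullmatch(r"[^aeiou]", "A", re.IGNORECASE)

# A negated class still consumes one character, and newline is not special.
newline_ok = re.fullmatch(r"[^a]", "\n")
empty_fails = re.fullmatch(r"[^a]", "")

# [W-]46] is the class [W-] followed by the literal string "46]".
literal_tail = [s for s in ("W46]", "-46]", "X") if re.fullmatch(r"[W-]46]", s)]

# [W-\]46] is one class: the range W..] plus the characters 4 and 6.
range_class = [c for c in "WZ[\\]46-" if re.fullmatch(r"[W-\]46]", c)]

# Caselessly, the range [W-c] covers the same ASCII characters as the
# explicit class that lists ], [, backslash, ^, _, ` and the letters.
ascii_chars = [chr(i) for i in range(128)]
by_range = {c for c in ascii_chars if re.fullmatch(r"[W-c]", c, re.IGNORECASE)}
by_list = {c for c in ascii_chars
           if re.fullmatch(r"[][\\^_`wxyzabc]", c, re.IGNORECASE)}

# [\dABCDEF] adds the digits to the listed letters; [^\W_] drops underscore.
hex_digits = "".join(c for c in string.printable
                     if re.fullmatch(r"[\dABCDEF]", c, re.ASCII))
word_no_underscore = [c for c in "a9_" if re.fullmatch(r"[^\W_]", c, re.ASCII)]

print(literal_tail, range_class, by_range == by_list, hex_digits, word_no_underscore)
```

Comparing `by_range` with `by_list` over the ASCII range is a quick way to confirm which characters a range such as `[W-c]` really spans.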
@@ -917,9 +746,7 @@
<para>
Vertical bar characters are used to separate alternative
patterns. For example, the pattern
-
- <literal>gilbert|sullivan</literal>
-
+ <literal>gilbert|sullivan</literal>
matches either "gilbert" or "sullivan". Any number of alternatives
may appear, and an empty alternative is permitted
(matching the empty string). The matching process tries
@@ -934,104 +761,105 @@
<refsect2 id="regexp.reference.internal-options">
<title>Internal option setting</title>
<para>
- The settings of <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link>,
- <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>,
- <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
- <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>,
- and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> can be changed from within the pattern by
- a sequence of Perl option letters enclosed between "(?" and
- ")". The option letters are
+ The settings of <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link>,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link>,
+ and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
+ can be changed from within the pattern by
+ a sequence of Perl option letters enclosed between "(?" and
+ ")". The option letters are:
+
+ <table>
+ <title>Internal option letters</title>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry><literal>i</literal></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link></entry>
+ </row>
+ <row>
+ <entry><literal>m</literal></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link></entry>
+ </row>
+ <row>
+ <entry><literal>s</literal></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link></entry>
+ </row>
+ <row>
+ <entry><literal>x</literal></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link></entry>
+ </row>
+ <row>
+ <entry><literal>U</literal></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link></entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+ </para>
+ <para>
+ For example, (?im) sets caseless, multiline matching. It is
+ also possible to unset these options by preceding the letter
+ with a hyphen, and a combined setting and unsetting such as
+ (?im-sx), which sets <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> while
+ unsetting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>, is also permitted.
+ If a letter appears both before and after the hyphen, the
+ option is unset.
+ </para>
+ <para>
+ The scope of these option changes depends on where in the
+ pattern the setting occurs. For settings that are outside
+ any subpattern (defined below), the effect is the same as if
+ the options were set or unset at the start of matching. The
+ following patterns all behave in exactly the same way:
+ </para>
- <table>
- <title>Internal option letters</title>
- <tgroup cols="2">
- <tbody>
- <row>
- <entry><literal>i</literal></entry>
- <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link></entry>
- </row>
- <row>
- <entry><literal>m</literal></entry>
- <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link></entry>
- </row>
- <row>
- <entry><literal>s</literal></entry>
- <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link></entry>
- </row>
- <row>
- <entry><literal>x</literal></entry>
- <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link></entry>
- </row>
- <row>
- <entry><literal>U</literal></entry>
- <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link></entry>
- </row>
- </tbody>
- </tgroup>
- </table>
- </para>
- <para>
- For example, (?im) sets caseless, multiline matching. It is
- also possible to unset these options by preceding the letter
- with a hyphen, and a combined setting and unsetting such as
- (?im-sx), which sets <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> while
- unsetting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>, is also permitted.
- If a letter appears both before and after the hyphen, the
- option is unset.
- </para>
- <para>
- The scope of these option changes depends on where in the
- pattern the setting occurs. For settings that are outside
- any subpattern (defined below), the effect is the same as if
- the options were set or unset at the start of matching. The
- following patterns all behave in exactly the same way:
- </para>
-
- <literallayout>
- (?i)abc
- a(?i)bc
- ab(?i)c
- abc(?i)
- </literallayout>
-
- <para>
- which in turn is the same as compiling the pattern abc with
- <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> set.
- In other words, such "top level" settings apply to the whole
- pattern (unless there are other changes inside subpatterns).
- If there is more than one setting of the same option at top level,
- the rightmost setting is used.
- </para>
- <para>
- If an option change occurs inside a subpattern, the effect
- is different. This is a change of behaviour in Perl 5.005.
- An option change inside a subpattern affects only that part
- of the subpattern that follows it, so
-
- <literal>(a(?i)b)c</literal>
-
- matches abc and aBc and no other strings (assuming
- <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> is not used). By this means, options can be
- made to have different settings in different parts of the
- pattern. Any changes made in one alternative do carry on
- into subsequent branches within the same subpattern. For
- example,
-
- <literal>(a(?i)b|c)</literal>
-
- matches "ab", "aB", "c", and "C", even though when matching
- "C" the first branch is abandoned before the option setting.
- This is because the effects of option settings happen at
- compile time. There would be some very weird behaviour otherwise.
- </para>
- <para>
- The PCRE-specific options <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
- <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> can
- be changed in the same way as the Perl-compatible options by
- using the characters U and X respectively. The (?X) flag
- setting is special in that it must always occur earlier in
- the pattern than any of the additional features it turns on,
- even when it is at top level. It is best put at the start.
+ <literallayout>
+ (?i)abc
+ a(?i)bc
+ ab(?i)c
+ abc(?i)
+ </literallayout>
+
+ <para>
+ which in turn is the same as compiling the pattern abc with
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> set.
+ In other words, such "top level" settings apply to the whole
+ pattern (unless there are other changes inside subpatterns).
+ If there is more than one setting of the same option at top level,
+ the rightmost setting is used.
+ </para>
+ <para>
+ If an option change occurs inside a subpattern, the effect
+ is different. This is a change of behaviour in Perl 5.005.
+ An option change inside a subpattern affects only that part
+ of the subpattern that follows it, so
+
+ <literal>(a(?i)b)c</literal>
+
+ matches abc and aBc and no other strings (assuming
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> is not used). By this means, options can be
+ made to have different settings in different parts of the
+ pattern. Any changes made in one alternative do carry on
+ into subsequent branches within the same subpattern. For
+ example,
+
+ <literal>(a(?i)b|c)</literal>
+
+ matches "ab", "aB", "c", and "C", even though when matching
+ "C" the first branch is abandoned before the option setting.
+ This is because the effects of option settings happen at
+ compile time. There would be some very weird behaviour otherwise.
+ </para>
+ <para>
+ The PCRE-specific options <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> can
+ be changed in the same way as the Perl-compatible options by
+ using the characters U and X respectively. The (?X) flag
+ setting is special in that it must always occur earlier in
+ the pattern than any of the additional features it turns on,
+ even when it is at top level. It is best put at the start.
</para>
</refsect2>
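
A rough sketch of the internal option letters follows, again using Python's `re` module rather than PCRE. It covers only the cases where the two are assumed to agree: option letters placed at the very start of the pattern. Python handles settings placed later in a pattern differently and has no `U` or `X` letters, so those cases are not shown. The scoped form `(?i:...)` is offered as the nearest Python spelling of `(a(?i)b)c`, not as PCRE syntax taken from the text above.

```python
import re

# A leading (?i) behaves like compiling the pattern with the caseless flag.
inline = re.fullmatch(r"(?i)abc", "ABC")
flagged = re.fullmatch(r"abc", "ABC", re.IGNORECASE)

# (?im) sets caseless and multiline matching together.
combined = re.search(r"(?im)^abc$", "def\nABC")

# Nearest Python analogue of (a(?i)b)c: only the "b" is matched caselessly.
candidates = ("abc", "aBc", "Abc", "abC")
scoped = [s for s in candidates if re.fullmatch(r"(a(?i:b))c", s)]

print(inline is not None, flagged is not None, combined is not None, scoped)
```

Only "abc" and "aBc" survive the scoped pattern, matching the behaviour the text describes for an option change inside a subpattern.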