irc-html Mon Jan 21 14:28:39 2002 EDT
Modified files:
/phpdoc/en/functions pcre.xml
Log:
whitespace correction
Index: phpdoc/en/functions/pcre.xml
diff -u phpdoc/en/functions/pcre.xml:1.68 phpdoc/en/functions/pcre.xml:1.69
--- phpdoc/en/functions/pcre.xml:1.68 Fri Dec 28 07:44:29 2001
+++ phpdoc/en/functions/pcre.xml Mon Jan 21 14:28:39 2002
@@ -1,5 +1,5 @@
<?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.68 $ -->
+<!-- $Revision: 1.69 $ -->
<reference id="ref.pcre">
<title>Regular Expression Functions (Perl-Compatible)</title>
<titleabbrev>PCRE</titleabbrev>
@@ -36,17 +36,17 @@
<itemizedlist>
<listitem>
<simpara>
- /href='(.*)' - missing ending delimiter
+ /href='(.*)' - missing ending delimiter
</simpara>
</listitem>
<listitem>
<simpara>
- /\w+\s*\w+/J - unknown modifier 'J'
+ /\w+\s*\w+/J - unknown modifier 'J'
</simpara>
</listitem>
<listitem>
<simpara>
- 1-\d3-\d3-\d4| - missing starting delimiter
+ 1-\d3-\d3-\d4| - missing starting delimiter
</simpara>
</listitem>
</itemizedlist>
@@ -65,7 +65,7 @@
</simpara>
</note>
</partintro>
-
+
<refentry id="function.preg-match">
<refnamediv>
<refname>preg_match</refname>
@@ -145,13 +145,13 @@
]]>
</programlisting>
<para>
- This example will produce:
- <screen>
+ This example will produce:
+ <screen>
<![CDATA[
domain name is: php.net
]]>
- </screen>
-</para>
+ </screen>
+ </para>
</example>
See also <function>preg_match_all</function>,
<function>preg_replace</function>, and
@@ -194,12 +194,12 @@
<varlistentry>
<term>PREG_PATTERN_ORDER</term>
<listitem>
- <para>
- Orders results so that $matches[0] is an array of full
- pattern matches, $matches[1] is an array of strings matched by
- the first parenthesized subpattern, and so on.
- <informalexample>
- <programlisting role="php">
+ <para>
+ Orders results so that $matches[0] is an array of full
+ pattern matches, $matches[1] is an array of strings matched by
+ the first parenthesized subpattern, and so on.
+ <informalexample>
+ <programlisting role="php">
<![CDATA[
preg_match_all ("|<[^>]+>(.*)</[^>]+>|U",
"<b>example: </b><div align=left>this is a test</div>",
@@ -220,18 +220,18 @@
and $out[1] contains an array of strings enclosed by tags.
</para>
</informalexample>
- </para>
+ </para>
</listitem>
</varlistentry>
<varlistentry>
<term>PREG_SET_ORDER</term>
<listitem>
- <para>
- Orders results so that $matches[0] is an array of first set
- of matches, $matches[1] is an array of second set of matches,
- and so on.
- <informalexample>
- <programlisting role="php">
+ <para>
+ Orders results so that $matches[0] is an array of first set
+ of matches, $matches[1] is an array of second set of matches,
+ and so on.
+ <informalexample>
+ <programlisting role="php">
<![CDATA[
preg_match_all ("|<[^>]+>(.*)</[^>]+>|U",
"<b>example: </b><div align=left>this is a test</div>",
@@ -239,25 +239,26 @@
print $out[0][0].", ".$out[0][1]."\n";
print $out[1][0].", ".$out[1][1]."\n";
]]>
- </programlisting>
- </informalexample>
- This example will produce:
- <informalexample>
- <programlisting role="php">
+ </programlisting>
+ </informalexample>
+ This example will produce:
+ <informalexample>
+ <programlisting role="php">
<![CDATA[
<b>example: </b>, example:
<div align=left>this is a test</div>, this is a test
]]>
- </programlisting>
- </informalexample>
- In this case, $matches[0] is the first set of matches, and
- $matches[0][0] has text matched by full pattern, $matches[0][1]
- has text matched by first subpattern and so on. Similarly,
- $matches[1] is the second set of matches, etc.
- </para>
+ </programlisting>
+ </informalexample>
+ In this case, $matches[0] is the first set of matches, and
+ $matches[0][0] has text matched by full pattern, $matches[0][1]
+ has text matched by first subpattern and so on. Similarly,
+ $matches[1] is the second set of matches, etc.
+ </para>
</listitem>
</varlistentry>
- </variablelist></para>
+ </variablelist>
+ </para>
<para>
If <parameter>order</parameter> is not specified, it is assumed
to be PREG_PATTERN_ORDER.
@@ -475,7 +476,7 @@
<para>
Parameter <parameter>limit</parameter> was added after PHP 4.0.1pl2.
</para>
- </note>
+ </note>
<para>
See also <function>preg_match</function>,
<function>preg_match_all</function>, and
@@ -550,35 +551,35 @@
</para>
<para>
- If <parameter>limit</parameter> is specified, then only substrings up to
- <parameter>limit</parameter> are returned, and if
- <parameter>limit</parameter> is -1, it actually means "no limit", which is
- useful for specifying the <parameter>flags</parameter>.
+ If <parameter>limit</parameter> is specified, then only substrings up to
+ <parameter>limit</parameter> are returned, and if
+ <parameter>limit</parameter> is -1, it actually means "no limit", which is
+ useful for specifying the <parameter>flags</parameter>.
</para>
<para>
- <parameter>flags</parameter> can be any combination of the following flags
- (combined with bitwise | operator):
+ <parameter>flags</parameter> can be any combination of the following flags
+ (combined with bitwise | operator):
<variablelist>
<varlistentry>
- <term>PREG_SPLIT_NO_EMPTY</term>
- <listitem>
- <simpara>
- If this flag is set, only non-empty pieces will be returned by
- <function>preg_split</function>.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term>PREG_SPLIT_DELIM_CAPTURE</term>
- <listitem>
- <simpara>
- If this flag is set, parenthesized expression in the delimiter pattern
- will be captured and returned as well. This flag was added for 4.0.5.
- </simpara>
- </listitem>
+ <term>PREG_SPLIT_NO_EMPTY</term>
+ <listitem>
+ <simpara>
+ If this flag is set, only non-empty pieces will be returned by
+ <function>preg_split</function>.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term>PREG_SPLIT_DELIM_CAPTURE</term>
+ <listitem>
+ <simpara>
+ If this flag is set, parenthesized expression in the delimiter pattern
+ will be captured and returned as well. This flag was added for 4.0.5.
+ </simpara>
+ </listitem>
</varlistentry>
- </variablelist>
+ </variablelist>
</para>
<para>
<example>
@@ -739,159 +740,159 @@
<blockquote>
<variablelist>
<varlistentry>
- <term><emphasis>i</emphasis> (PCRE_CASELESS)</term>
- <listitem>
- <simpara>
- If this modifier is set, letters in the pattern match both
- upper and lower case letters.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>m</emphasis> (PCRE_MULTILINE)</term>
- <listitem>
- <simpara>
- By default, PCRE treats the subject string as consisting of a
- single "line" of characters (even if it actually contains
- several newlines). The "start of line" metacharacter (^)
- matches only at the start of the string, while the "end of
- line" metacharacter ($) matches only at the end of the
- string, or before a terminating newline (unless
- <emphasis>D</emphasis> modifier is set). This is the same as
- Perl.
- </simpara>
- <simpara>
- When this modifier is set, the "start of line" and "end of
- line" constructs match immediately following or immediately
- before any newline in the subject string, respectively, as
- well as at the very start and end. This is equivalent to
- Perl's /m modifier. If there are no "\n" characters in a
- subject string, or no occurrences of ^ or $ in a pattern,
- setting this modifier has no effect.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>s</emphasis> (PCRE_DOTALL)</term>
- <listitem>
- <simpara>
- If this modifier is set, a dot metacharater in the pattern
- matches all characters, including newlines. Without it,
- newlines are excluded. This modifier is equivalent to Perl's
- /s modifier. A negative class such as [^a] always matches a
- newline character, independent of the setting of this
- modifier.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>x</emphasis> (PCRE_EXTENDED)</term>
- <listitem>
- <simpara>
- If this modifier is set, whitespace data characters in the
- pattern are totally ignored except when escaped or inside a
- character class, and characters between an unescaped #
- outside a character class and the next newline character,
- inclusive, are also ignored. This is equivalent to Perl's /x
- modifier, and makes it possible to include comments inside
- complicated patterns. Note, however, that this applies only
- to data characters. Whitespace characters may never appear
- within special character sequences in a pattern, for example
- within the sequence (?( which introduces a conditional
- subpattern.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>e</emphasis></term>
- <listitem>
- <simpara>
- If this modifier is set, <function>preg_replace</function>
- does normal substitution of backreferences in the
- replacement string, evaluates it as PHP code, and uses the
- result for replacing the search string.
- </simpara>
- <simpara>
- Only <function>preg_replace</function> uses this modifier;
- it is ignored by other PCRE functions.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>A</emphasis> (PCRE_ANCHORED)</term>
- <listitem>
- <simpara>
- If this modifier is set, the pattern is forced to be
- "anchored", that is, it is constrained to match only at the
- start of the string which is being searched (the "subject
- string"). This effect can also be achieved by appropriate
- constructs in the pattern itself, which is the only way to
- do it in Perl.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>D</emphasis> (PCRE_DOLLAR_ENDONLY)</term>
- <listitem>
- <simpara>
- If this modifier is set, a dollar metacharacter in the pattern
- matches only at the end of the subject string. Without this
- modifier, a dollar also matches immediately before the final
- character if it is a newline (but not before any other
- newlines). This modifier is ignored if <emphasis>m</emphasis>
- modifier is set. There is no equivalent to this modifier in
- Perl.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>S</emphasis></term>
- <listitem>
- <simpara>
- When a pattern is going to be used several times, it is
- worth spending more time analyzing it in order to speed up
- the time taken for matching. If this modifier is set, then
- this extra analysis is performed. At present, studying a
- pattern is useful only for non-anchored patterns that do not
- have a single fixed starting character.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>U</emphasis> (PCRE_UNGREEDY)</term>
- <listitem>
- <simpara>
- This modifier inverts the "greediness" of the quantifiers so
- that they are not greedy by default, but become greedy if
- followed by "?". It is not compatible with Perl. It can also
- be set by a (?U) modifier setting within the pattern.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>X</emphasis> (PCRE_EXTRA)</term>
- <listitem>
- <simpara>
- This modifier turns on additional functionality of PCRE that
- is incompatible with Perl. Any backslash in a pattern that
- is followed by a letter that has no special meaning causes
- an error, thus reserving these combinations for future
- expansion. By default, as in Perl, a backslash followed by a
- letter with no special meaning is treated as a literal.
- There are at present no other features controlled by this
- modifier.
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>u</emphasis> (PCRE_UTF8)</term>
- <listitem>
- <simpara>
- This modifier turns on additional functionality of PCRE that
- is incompatible with Perl. Pattern strings are treated as
- UTF-8. This modifier is available from PHP 4.1.0 or greater.
- </simpara>
- </listitem>
+ <term><emphasis>i</emphasis> (PCRE_CASELESS)</term>
+ <listitem>
+ <simpara>
+ If this modifier is set, letters in the pattern match both
+ upper and lower case letters.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>m</emphasis> (PCRE_MULTILINE)</term>
+ <listitem>
+ <simpara>
+ By default, PCRE treats the subject string as consisting of a
+ single "line" of characters (even if it actually contains
+ several newlines). The "start of line" metacharacter (^)
+ matches only at the start of the string, while the "end of
+ line" metacharacter ($) matches only at the end of the
+ string, or before a terminating newline (unless
+ <emphasis>D</emphasis> modifier is set). This is the same as
+ Perl.
+ </simpara>
+ <simpara>
+ When this modifier is set, the "start of line" and "end of
+ line" constructs match immediately following or immediately
+ before any newline in the subject string, respectively, as
+ well as at the very start and end. This is equivalent to
+ Perl's /m modifier. If there are no "\n" characters in a
+ subject string, or no occurrences of ^ or $ in a pattern,
+ setting this modifier has no effect.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>s</emphasis> (PCRE_DOTALL)</term>
+ <listitem>
+ <simpara>
+ If this modifier is set, a dot metacharacter in the pattern
+ matches all characters, including newlines. Without it,
+ newlines are excluded. This modifier is equivalent to Perl's
+ /s modifier. A negative class such as [^a] always matches a
+ newline character, independent of the setting of this
+ modifier.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>x</emphasis> (PCRE_EXTENDED)</term>
+ <listitem>
+ <simpara>
+ If this modifier is set, whitespace data characters in the
+ pattern are totally ignored except when escaped or inside a
+ character class, and characters between an unescaped #
+ outside a character class and the next newline character,
+ inclusive, are also ignored. This is equivalent to Perl's /x
+ modifier, and makes it possible to include comments inside
+ complicated patterns. Note, however, that this applies only
+ to data characters. Whitespace characters may never appear
+ within special character sequences in a pattern, for example
+ within the sequence (?( which introduces a conditional
+ subpattern.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>e</emphasis></term>
+ <listitem>
+ <simpara>
+ If this modifier is set, <function>preg_replace</function>
+ does normal substitution of backreferences in the
+ replacement string, evaluates it as PHP code, and uses the
+ result for replacing the search string.
+ </simpara>
+ <simpara>
+ Only <function>preg_replace</function> uses this modifier;
+ it is ignored by other PCRE functions.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>A</emphasis> (PCRE_ANCHORED)</term>
+ <listitem>
+ <simpara>
+ If this modifier is set, the pattern is forced to be
+ "anchored", that is, it is constrained to match only at the
+ start of the string which is being searched (the "subject
+ string"). This effect can also be achieved by appropriate
+ constructs in the pattern itself, which is the only way to
+ do it in Perl.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>D</emphasis> (PCRE_DOLLAR_ENDONLY)</term>
+ <listitem>
+ <simpara>
+ If this modifier is set, a dollar metacharacter in the pattern
+ matches only at the end of the subject string. Without this
+ modifier, a dollar also matches immediately before the final
+ character if it is a newline (but not before any other
+ newlines). This modifier is ignored if <emphasis>m</emphasis>
+ modifier is set. There is no equivalent to this modifier in
+ Perl.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>S</emphasis></term>
+ <listitem>
+ <simpara>
+ When a pattern is going to be used several times, it is
+ worth spending more time analyzing it in order to speed up
+ the time taken for matching. If this modifier is set, then
+ this extra analysis is performed. At present, studying a
+ pattern is useful only for non-anchored patterns that do not
+ have a single fixed starting character.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>U</emphasis> (PCRE_UNGREEDY)</term>
+ <listitem>
+ <simpara>
+ This modifier inverts the "greediness" of the quantifiers so
+ that they are not greedy by default, but become greedy if
+ followed by "?". It is not compatible with Perl. It can also
+ be set by a (?U) modifier setting within the pattern.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>X</emphasis> (PCRE_EXTRA)</term>
+ <listitem>
+ <simpara>
+ This modifier turns on additional functionality of PCRE that
+ is incompatible with Perl. Any backslash in a pattern that
+ is followed by a letter that has no special meaning causes
+ an error, thus reserving these combinations for future
+ expansion. By default, as in Perl, a backslash followed by a
+ letter with no special meaning is treated as a literal.
+ There are at present no other features controlled by this
+ modifier.
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>u</emphasis> (PCRE_UTF8)</term>
+ <listitem>
+ <simpara>
+ This modifier turns on additional functionality of PCRE that
+ is incompatible with Perl. Pattern strings are treated as
+ UTF-8. This modifier is available from PHP 4.1.0 or greater.
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
</blockquote>
@@ -922,31 +923,31 @@
The differences described here are with respect to Perl
5.005.
<orderedlist>
- <listitem>
- <simpara>
- By default, a whitespace character is any character that
- the C library function isspace() recognizes, though it is
- possible to compile PCRE with alternative character type
- tables. Normally isspace() matches space, formfeed, newline,
- carriage return, horizontal tab, and vertical tab. Perl 5 no
- longer includes vertical tab in its set of whitespace char-
- acters. The \v escape that was in the Perl documentation for
- a long time was never in fact recognized. However, the char-
- acter itself was treated as whitespace at least up to 5.002.
- In 5.004 and 5.005 it does not match \s.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ <listitem>
+ <simpara>
+ By default, a whitespace character is any character that
+ the C library function isspace() recognizes, though it is
+ possible to compile PCRE with alternative character type
+ tables. Normally isspace() matches space, formfeed, newline,
+ carriage return, horizontal tab, and vertical tab. Perl 5 no
+ longer includes vertical tab in its set of whitespace characters.
+ The \v escape that was in the Perl documentation for
+ a long time was never in fact recognized. However, the character
+ itself was treated as whitespace at least up to 5.002.
+ In 5.004 and 5.005 it does not match \s.
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
PCRE does not allow repeat quantifiers on lookahead
assertions. Perl permits them, but they do not mean what you
might think. For example, (?!a){3} does not assert that the
next three characters are not "a". It just asserts that the
next character is not "a" three times.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
Capturing subpatterns that occur inside negative lookahead
assertions are counted, but their entries in the
offsets vector are never set. Perl sets its numerical vari-
@@ -954,39 +955,39 @@
assertion fails to match something (thereby succeeding), but
only if the negative lookahead assertion contains just one
branch.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
Though binary zero characters are supported in the subject
string, they are not allowed in a pattern string
because it is passed as a normal C string, terminated by
zero. The escape sequence "\0" can be used in the pattern to
represent a binary zero.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
The following Perl escape sequences are not supported:
\l, \u, \L, \U, \E, \Q. In fact these are implemented by
Perl's general string-handling and are not part of its pattern
matching engine.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
The Perl \G assertion is not supported as it is not
relevant to single pattern matches.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
Fairly obviously, PCRE does not support the (?{code})
construction.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
There are at the time of writing some oddities in Perl
5.005_02 concerned with the settings of captured strings
when part of a pattern is repeated. For example, matching
@@ -997,23 +998,23 @@
In Perl 5.004 $2 is set in both cases, and that is also true
of PCRE. If in the future Perl changes to a consistent state
that is different, PCRE may change to follow.
- </simpara>
- </listitem>
- <listitem>
- <simpara>
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
Another as yet unresolved discrepancy is that in Perl
5.005_02 the pattern /^(a)?(?(1)a|b)+$/ matches the string
"a", whereas in PCRE it does not. However, in both Perl and
PCRE /^(a)?a/ matched against "a" leaves $1 unset.
- </simpara>
- </listitem>
- <listitem>
- <para>
+ </simpara>
+ </listitem>
+ <listitem>
+ <para>
PCRE provides some extensions to the Perl regular
expression facilities:
- <orderedlist>
- <listitem>
- <simpara>
+ <orderedlist>
+ <listitem>
+ <simpara>
Although lookbehind assertions must match fixed length
strings, each alternative branch of a lookbehind assertion
can match a different length of string. Perl 5.005 requires
@@ -1042,9 +1043,9 @@
</simpara>
</listitem>
</orderedlist>
- </para>
- </listitem>
- </orderedlist>
+ </para>
+ </listitem>
+ </orderedlist>
</para>
</refsect1>
@@ -1070,8 +1071,8 @@
itself.
</para>
</refsect2>
- <refsect2 id="regexp.reference.meta">
- <title>Meta-characters</title>
+ <refsect2 id="regexp.reference.meta">
+ <title>Meta-characters</title>
<para>
The power of regular expressions comes from the
ability to include alternatives and repetitions in the pat-
@@ -1086,116 +1087,116 @@
Outside square brackets, the meta-characters are as follows:
<variablelist>
<varlistentry>
- <term><emphasis>\</emphasis></term>
- <listitem>
- <simpara>
- general escape character with several uses
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>^</emphasis></term>
- <listitem>
- <simpara>
- assert start of subject (or line, in multiline mode)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>$</emphasis></term>
- <listitem>
- <simpara>
- assert end of subject (or line, in multiline mode)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>.</emphasis></term>
- <listitem>
- <simpara>
- match any character except newline (by default)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>[</emphasis></term>
- <listitem>
- <simpara>
- start character class definition
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>]</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\</emphasis></term>
+ <listitem>
+ <simpara>
+ general escape character with several uses
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>^</emphasis></term>
+ <listitem>
+ <simpara>
+ assert start of subject (or line, in multiline mode)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>$</emphasis></term>
+ <listitem>
+ <simpara>
+ assert end of subject (or line, in multiline mode)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>.</emphasis></term>
+ <listitem>
+ <simpara>
+ match any character except newline (by default)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>[</emphasis></term>
+ <listitem>
+ <simpara>
+ start character class definition
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>]</emphasis></term>
+ <listitem>
+ <simpara>
end character class definition
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>|</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>|</emphasis></term>
+ <listitem>
+ <simpara>
start of alternative branch
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>(</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>(</emphasis></term>
+ <listitem>
+ <simpara>
start subpattern
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>)</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>)</emphasis></term>
+ <listitem>
+ <simpara>
end subpattern
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>?</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>?</emphasis></term>
+ <listitem>
+ <simpara>
extends the meaning of (, also 0 or 1 quantifier, also quantifier minimizer
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>*</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>*</emphasis></term>
+ <listitem>
+ <simpara>
0 or more quantifier
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>+</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>+</emphasis></term>
+ <listitem>
+ <simpara>
1 or more quantifier
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>{</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>{</emphasis></term>
+ <listitem>
+ <simpara>
start min/max quantifier
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>}</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>}</emphasis></term>
+ <listitem>
+ <simpara>
end min/max quantifier
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
@@ -1204,36 +1205,36 @@
characters are:
<variablelist>
<varlistentry>
- <term><emphasis>\</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\</emphasis></term>
+ <listitem>
+ <simpara>
general escape character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>^</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>^</emphasis></term>
+ <listitem>
+ <simpara>
negate the class, but only if the first character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>-</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>-</emphasis></term>
+ <listitem>
+ <simpara>
indicates character range
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>]</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>]</emphasis></term>
+ <listitem>
+ <simpara>
terminates the character class
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
The following sections describe the use of each of the
@@ -1277,76 +1278,76 @@
<para>
<variablelist>
<varlistentry>
- <term><emphasis>\a</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\a</emphasis></term>
+ <listitem>
+ <simpara>
alarm, that is, the BEL character (hex 07)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\cx</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\cx</emphasis></term>
+ <listitem>
+ <simpara>
"control-x", where x is any character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\e</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\e</emphasis></term>
+ <listitem>
+ <simpara>
escape (hex 1B)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\f</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\f</emphasis></term>
+ <listitem>
+ <simpara>
formfeed (hex 0C)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\n</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\n</emphasis></term>
+ <listitem>
+ <simpara>
newline (hex 0A)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\r</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\r</emphasis></term>
+ <listitem>
+ <simpara>
carriage return (hex 0D)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\t</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\t</emphasis></term>
+ <listitem>
+ <simpara>
tab (hex 09)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\xhh</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\xhh</emphasis></term>
+ <listitem>
+ <simpara>
character with hex code hh
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\ddd</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\ddd</emphasis></term>
+ <listitem>
+ <simpara>
character with octal code ddd, or backreference
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
</para>
@@ -1389,80 +1390,80 @@
<para>
<variablelist>
<varlistentry>
- <term><emphasis>\040</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\040</emphasis></term>
+ <listitem>
+ <simpara>
is another way of writing a space
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\40</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\40</emphasis></term>
+ <listitem>
+ <simpara>
is the same, provided there are fewer than 40
previous capturing subpatterns
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\7</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\7</emphasis></term>
+ <listitem>
+ <simpara>
is always a back reference
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\11</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\11</emphasis></term>
+ <listitem>
+ <simpara>
might be a back reference, or another way of
writing a tab
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\011</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\011</emphasis></term>
+ <listitem>
+ <simpara>
is always a tab
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\0113</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\0113</emphasis></term>
+ <listitem>
+ <simpara>
is a tab followed by the character "3"
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\113</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\113</emphasis></term>
+ <listitem>
+ <simpara>
is the character with octal code 113 (since there
can be no more than 99 back references)
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\377</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\377</emphasis></term>
+ <listitem>
+ <simpara>
is a byte consisting entirely of 1 bits
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\81</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\81</emphasis></term>
+ <listitem>
+ <simpara>
is either a back reference, or a binary zero
followed by the two characters "8" and "1"
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
</para>
@@ -1485,52 +1486,52 @@
<para>
<variablelist>
<varlistentry>
- <term><emphasis>\d</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\d</emphasis></term>
+ <listitem>
+ <simpara>
any decimal digit
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\D</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\D</emphasis></term>
+ <listitem>
+ <simpara>
any character that is not a decimal digit
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\s</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\s</emphasis></term>
+ <listitem>
+ <simpara>
any whitespace character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\S</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\S</emphasis></term>
+ <listitem>
+ <simpara>
any character that is not a whitespace character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\w</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\w</emphasis></term>
+ <listitem>
+ <simpara>
any "word" character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
<varlistentry>
- <term><emphasis>\W</emphasis></term>
- <listitem>
- <simpara>
+ <term><emphasis>\W</emphasis></term>
+ <listitem>
+ <simpara>
any "non-word" character
- </simpara>
- </listitem>
+ </simpara>
+ </listitem>
</varlistentry>
</variablelist>
</para>
@@ -1565,49 +1566,49 @@
backslashed assertions are
</para>
<para>
- <variablelist>
- <varlistentry>
- <term><emphasis>\b</emphasis></term>
- <listitem>
- <simpara>
- word boundary
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\B</emphasis></term>
- <listitem>
- <simpara>
- not a word boundary
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\A</emphasis></term>
- <listitem>
- <simpara>
- start of subject (independent of multiline mode)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\Z</emphasis></term>
- <listitem>
- <simpara>
- end of subject or newline at end (independent of
- multiline mode)
- </simpara>
- </listitem>
- </varlistentry>
- <varlistentry>
- <term><emphasis>\z</emphasis></term>
- <listitem>
- <simpara>
- end of subject (independent of multiline mode)
- </simpara>
- </listitem>
- </varlistentry>
- </variablelist>
+ <variablelist>
+ <varlistentry>
+ <term><emphasis>\b</emphasis></term>
+ <listitem>
+ <simpara>
+ word boundary
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\B</emphasis></term>
+ <listitem>
+ <simpara>
+ not a word boundary
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\A</emphasis></term>
+ <listitem>
+ <simpara>
+ start of subject (independent of multiline mode)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\Z</emphasis></term>
+ <listitem>
+ <simpara>
+ end of subject or newline at end (independent of
+ multiline mode)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ <varlistentry>
+ <term><emphasis>\z</emphasis></term>
+ <listitem>
+ <simpara>
+ end of subject (independent of multiline mode)
+ </simpara>
+ </listitem>
+ </varlistentry>
+ </variablelist>
</para>
<para>
These assertions may not appear in character classes (but
@@ -1634,8 +1635,9 @@
string, whereas <literal>\z</literal> matches only at the end.
</para>
</refsect2>
- <refsect2 id="regexp.reference.circudollar">
- <title>Circumflex and dollar</title>
+
+ <refsect2 id="regexp.reference.circudollar">
+ <title>Circumflex and dollar</title>
<literallayout>
Outside a character class, in the default matching mode, the
circumflex character is an assertion which is true only if
@@ -1684,8 +1686,9 @@
whether <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or
not.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.dot">
- <title>FULL STOP</title>
+
+ <refsect2 id="regexp.reference.dot">
+ <title>FULL STOP</title>
<literallayout>
Outside a character class, a dot in the pattern matches any
one character in the subject, including a non-printing
@@ -1697,8 +1700,9 @@
in a character class.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.squarebrackets">
- <title>Square brackets</title>
+
+ <refsect2 id="regexp.reference.squarebrackets">
+ <title>Square brackets</title>
<literallayout>
An opening square bracket introduces a character class, ter-
minated by a closing square bracket. A closing square
@@ -1776,8 +1780,9 @@
classes, but it does no harm if they are escaped.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.verticalbar">
- <title>Vertical bar</title>
+
+ <refsect2 id="regexp.reference.verticalbar">
+ <title>Vertical bar</title>
<literallayout>
Vertical bar characters are used to separate alternative
patterns. For example, the pattern
@@ -1794,8 +1799,9 @@
subpattern.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.internal-options">
- <title>Internal option setting</title>
+
+ <refsect2 id="regexp.reference.internal-options">
+ <title>Internal option setting</title>
<literallayout>
The settings of <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> ,
<link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> ,
@@ -1866,8 +1872,9 @@
even when it is at top level. It is best put at the start.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.subpatterns">
- <title>subpatterns</title>
+
+ <refsect2 id="regexp.reference.subpatterns">
+ <title>subpatterns</title>
<literallayout>
Subpatterns are delimited by parentheses (round brackets),
which can be nested. Marking part of a pattern as a subpat-
@@ -1929,8 +1936,9 @@
the above patterns match "SUNDAY" as well as "Saturday".
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.repetition">
- <title>Repetition</title>
+
+ <refsect2 id="regexp.reference.repetition">
+ <title>Repetition</title>
<literallayout>
Repetition is specified by quantifiers, which can follow any
of the following items:
@@ -2069,8 +2077,9 @@
"b".
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.back-references">
- <title>BACK REFERENCES</title>
+
+ <refsect2 id="regexp.reference.back-references">
+ <title>BACK REFERENCES</title>
<literallayout>
Outside a character class, a backslash followed by a digit
greater than 0 (and possibly further digits) is a back
@@ -2136,8 +2145,9 @@
example above, or by a quantifier with a minimum of zero.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.assertions">
- <title>Assertions</title>
+
+ <refsect2 id="regexp.reference.assertions">
+ <title>Assertions</title>
<literallayout>
An assertion is a test on the characters following or
preceding the current matching point that does not actually
@@ -2257,8 +2267,9 @@
subpatterns.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.onlyonce">
- <title>Once-only subpatterns</title>
+
+ <refsect2 id="regexp.reference.onlyonce">
+ <title>Once-only subpatterns</title>
<literallayout>
With both maximizing and minimizing repetition, failure of
what follows normally causes the repeated item to be re-
@@ -2367,8 +2378,9 @@
pens quickly.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.conditional">
- <title>Conditional subpatterns</title>
+
+ <refsect2 id="regexp.reference.conditional">
+ <title>Conditional subpatterns</title>
<literallayout>
It is possible to cause the matching process to obey a sub-
pattern conditionally or to choose between two alternative
@@ -2426,8 +2438,9 @@
letters and dd are digits.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.comments">
- <title>Comments</title>
+
+ <refsect2 id="regexp.reference.comments">
+ <title>Comments</title>
<literallayout>
The sequence (?# marks the start of a comment which
continues up to the next closing parenthesis. Nested
@@ -2439,8 +2452,9 @@
ues up to the next newline character in the pattern.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.recursive">
- <title>Recursive patterns</title>
+
+ <refsect2 id="regexp.reference.recursive">
+ <title>Recursive patterns</title>
<literallayout>
Consider the problem of matching a string in parentheses,
allowing for unlimited nested parentheses. Without the use
@@ -2499,8 +2513,9 @@
recursion.
</literallayout>
</refsect2>
- <refsect2 id="regexp.reference.performances">
- <title>Performances</title>
+
+ <refsect2 id="regexp.reference.performances">
+ <title>Performances</title>
<literallayout>
Certain items that may appear in patterns are more efficient
than others. It is more efficient to use a character class