nlopess Thu Jul 29 06:15:26 2004 EDT
Modified files:
  /phpdoc/en/reference/pcre pattern.modifiers.xml pattern.syntax.xml reference.xml
Log:
  fix IDs: now livedocs correctly links the pattern syntax/modifiers
  some WS
http://cvs.php.net/diff.php/phpdoc/en/reference/pcre/pattern.modifiers.xml?r1=1.1&r2=1.2&ty=u
Index: phpdoc/en/reference/pcre/pattern.modifiers.xml
diff -u phpdoc/en/reference/pcre/pattern.modifiers.xml:1.1 phpdoc/en/reference/pcre/pattern.modifiers.xml:1.2
--- phpdoc/en/reference/pcre/pattern.modifiers.xml:1.1 Wed Mar 3 00:06:14 2004
+++ phpdoc/en/reference/pcre/pattern.modifiers.xml Thu Jul 29 06:15:26 2004
@@ -1,7 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.1 $ -->
+<!-- $Revision: 1.2 $ -->
 <!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
- <refentry id="pcre.pattern.modifiers">
+ <refentry id="reference.pcre.pattern.modifiers">
 <refnamediv>
 <refname>Pattern Modifiers</refname>
 <refpurpose>Describes possible modifiers in regex
http://cvs.php.net/diff.php/phpdoc/en/reference/pcre/pattern.syntax.xml?r1=1.1&r2=1.2&ty=u
Index: phpdoc/en/reference/pcre/pattern.syntax.xml
diff -u phpdoc/en/reference/pcre/pattern.syntax.xml:1.1 phpdoc/en/reference/pcre/pattern.syntax.xml:1.2
--- phpdoc/en/reference/pcre/pattern.syntax.xml:1.1 Wed Mar 3 00:06:14 2004
+++ phpdoc/en/reference/pcre/pattern.syntax.xml Thu Jul 29 06:15:26 2004
@@ -1,7 +1,7 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.1 $ -->
+<!-- $Revision: 1.2 $ -->
 <!-- splitted from ./en/functions/pcre.xml, last change in rev 1.2 -->
- <refentry id="pcre.pattern.syntax">
+ <refentry id="reference.pcre.pattern.syntax">
 <refnamediv>
 <refname>Pattern Syntax</refname>
 <refpurpose>Describes PCRE regex syntax</refpurpose>
@@ -121,23 +121,29 @@
 </listitem>
 <listitem>
 <simpara>
- If <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> is set and
- <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is not
- set, the $ meta-character matches only at the very end of
- the string.
+ If <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ is set and <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
+ not set, the $ meta-character matches only at the very end of the
+ string.
 </simpara>
 </listitem>
 <listitem>
 <simpara>
- If <link linkend="pcre.pattern.modifiers">PCRE_EXTRA</link> is set, a backslash followed by a letter
- with no special meaning is faulted.
+ If <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> is
+ set, a backslash followed by a letter with no special meaning is
+ faulted.
 </simpara>
 </listitem>
 <listitem>
 <simpara>
- If <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> is set, the greediness of the
- repetition quantifiers is inverted, that is, by default they are
- not greedy, but if followed by a question mark they are.
+ If <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> is
+ set, the greediness of the repetition quantifiers is inverted,
+ that is, by default they are not greedy, but if followed by a
+ question mark they are.
 </simpara>
 </listitem>
 </orderedlist>
@@ -358,12 +364,12 @@
 particular, if you want to match a backslash, you write "\\".
 </para>
 <para>
- If a pattern is compiled with the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
+ If a pattern is compiled with the <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option,
 whitespace in the pattern (other than in a character class) and
- characters between a "#" outside a character class and the
- next newline character are ignored. An escaping backslash
- can be used to include a whitespace or "#" character as part
- of the pattern.
+ characters between a "#" outside a character class and the next newline
+ character are ignored. An escaping backslash can be used to include a
+ whitespace or "#" character as part of the pattern.
 </para>
 <para>
 A second use of backslash provides a way of encoding
@@ -731,13 +737,13 @@
 circumflex and dollar (described below) in that they only
 ever match at the very start and end of the subject string,
 whatever options are set. They are not affected by the
- <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> or
- <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> options.
- The difference between <literal>\Z</literal> and
- <literal>\z</literal> is that <literal>\Z</literal>
- matches before a newline that is the
- last character of the string as well as at the end of the
- string, whereas <literal>\z</literal> matches only at the end.
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> or
+ <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ options. The difference between <literal>\Z</literal> and
+ <literal>\z</literal> is that <literal>\Z</literal> matches before a
+ newline that is the last character of the string as well as at the end of
+ the string, whereas <literal>\z</literal> matches only at the end.
 </para>
 </refsect2>
@@ -773,28 +779,31 @@
 <para>
 The meaning of dollar can be changed so that it matches only
 at the very end of the string, by setting the
- <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
 option at compile or matching time. This does not affect the
 \Z assertion.
 </para>
 <para>
 The meanings of the circumflex and dollar characters are
- changed if the <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> option is set. When this is
- the case, they match immediately after and immediately
- before an internal "\n" character, respectively, in addition
- to matching at the start and end of the subject string. For
- example, the pattern /^abc$/ matches the subject string
- "def\nabc" in multiline mode, but not otherwise.
- Consequently, patterns that are anchored in single line mode
- because all branches start with "^" are not anchored in
- multiline mode. The <link linkend="pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link> option is ignored if
- <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set.
+ changed if the <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> option
+ is set. When this is the case, they match immediately after and
+ immediately before an internal "\n" character, respectively, in addition
+ to matching at the start and end of the subject string. For example, the
+ pattern /^abc$/ matches the subject string "def\nabc" in multiline mode,
+ but not otherwise. Consequently, patterns that are anchored in single
+ line mode because all branches start with "^" are not anchored in
+ multiline mode. The <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOLLAR_ENDONLY</link>
+ option is ignored if <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is
+ set.
 </para>
 <para>
 Note that the sequences \A, \Z, and \z can be used to match
 the start and end of the subject in both modes, and if all
 branches of a pattern start with \A is it always anchored,
- whether <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
+ whether <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> is set or not.
 </para>
 </refsect2>
@@ -803,7 +812,8 @@
 <para>
 Outside a character class, a dot in the pattern matches any
 one character in the subject, including a non-printing
- character, but not (by default) newline. If the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
+ character, but not (by default) newline. If the <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
 option is set, then dots match newlines as well. The
 handling of dot is entirely independent of the handling of
 circumflex and dollar, the only relationship being that they
@@ -850,9 +860,10 @@
 </para>
 <para>
 The newline character is never treated in any special way in
- character classes, whatever the setting of the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
- or <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> options is. A class such as [^a] will
- always match a newline.
+ character classes, whatever the setting of the <link
+ linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
+ or <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>
+ options is. A class such as [^a] will always match a newline.
 </para>
 <para>
 The minus (hyphen) character can be used to specify a range
@@ -923,10 +934,10 @@
 <refsect2 id="regexp.reference.internal-options">
 <title>Internal option setting</title>
 <para>
- The settings of <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link>,
- <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link>,
- <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>,
- and <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> can be changed from within the pattern by
+ The settings of <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link>,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link>,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>,
+ and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> can be changed from within the pattern by
 a sequence of Perl option letters enclosed between "(?" and
 ")". The option letters are
@@ -936,19 +947,19 @@
 <tbody>
 <row>
 <entry><literal>i</literal></entry>
- <entry>for <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link></entry>
 </row>
 <row>
 <entry><literal>m</literal></entry>
- <entry>for <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link></entry>
 </row>
 <row>
 <entry><literal>s</literal></entry>
- <entry>for <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link></entry>
 </row>
 <row>
 <entry><literal>x</literal></entry>
- <entry>for <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link></entry>
+ <entry>for <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link></entry>
 </row>
 </tbody>
 </tgroup>
@@ -958,8 +969,8 @@
 For example, (?im) sets caseless, multiline matching. It is
 also possible to unset these options by preceding the letter
 with a hyphen, and a combined setting and unsetting such as
- (?im-sx), which sets <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> and <link linkend="pcre.pattern.modifiers">PCRE_MULTILINE</link> while
- unsetting <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link> and <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link>, is also permitted.
+ (?im-sx), which sets <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_MULTILINE</link> while
+ unsetting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> and <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>, is also permitted.
 If a letter appears both before and after the hyphen, the
 option is unset.
 </para>
@@ -980,7 +991,7 @@
 <para>
 which in turn is the same as compiling the pattern abc with
- <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> set.
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> set.
 In other words, such "top level" settings apply to the whole
 pattern (unless there are other changes inside subpatterns).
 If there is more than one setting of the same option at top level,
@@ -995,7 +1006,7 @@
 <literal>(a(?i)b)c</literal>
 matches abc and aBc and no other strings (assuming
- <link linkend="pcre.pattern.modifiers">PCRE_CASELESS</link> is not used). By this means, options can be
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_CASELESS</link> is not used). By this means, options can be
 made to have different settings in different parts of the
 pattern. Any changes made in one alternative do carry on
 into subsequent branches within the same subpattern. For
@@ -1009,8 +1020,8 @@
 compile time. There would be some very weird behaviour otherwise.
 </para>
 <para>
- The PCRE-specific options <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
- <link linkend="pcre.pattern.modifiers">PCRE_EXTRA</link> can
+ The PCRE-specific options <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> and
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTRA</link> can
 be changed in the same way as the Perl-compatible options by
 using the characters U and X respectively. The (?X) flag
 setting is special in that it must always occur earlier in
@@ -1218,7 +1229,7 @@
 that is the only way the rest of the pattern matches.
 </para>
 <para>
- If the <link linkend="pcre.pattern.modifiers">PCRE_UNGREEDY</link> option is set (an option which is not
+ If the <link linkend="reference.pcre.pattern.modifiers">PCRE_UNGREEDY</link> option is set (an option which is not
 available in Perl) then the quantifiers are not greedy by
 default, but individual ones can be made greedy by following
 them with a question mark. In other words, it inverts the
@@ -1231,7 +1242,7 @@
 proportion to the size of the minimum or maximum.
 </para>
 <para>
- If a pattern starts with .* or .{0,} and the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>
+ If a pattern starts with .* or .{0,} and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>
 option (equivalent to Perl's /s) is set, thus allowing the .
 to match newlines, then the pattern is implicitly anchored,
 because whatever follows will be tried against every character
@@ -1239,7 +1250,7 @@
 retrying the overall match at any position after the first.
 PCRE treats such a pattern as though it were preceded by \A.
 In cases where it is known that the subject string contains
- no newlines, it is worth setting <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link> when the pattern begins with .* in order to
+ no newlines, it is worth setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> when the pattern begins with .* in order to
 obtain this optimization, or alternatively using ^ to
 indicate anchoring explicitly.
 </para>
@@ -1311,7 +1322,7 @@
 following the backslash are taken as part of a potential back
 reference number. If the pattern continues with a digit
 character, then some delimiter must be used to terminate the
- back reference. If the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, this can
+ back reference. If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, this can
 be whitespace. Otherwise an empty comment can be used.
 </para>
 <para>
@@ -1603,7 +1614,7 @@
 condition is satisfied if the capturing subpattern of that
 number has previously matched. Consider the following pattern,
 which contains non-significant white space to make it
- more readable (assume the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option) and to
+ more readable (assume the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option) and to
 divide it into three parts for ease of discussion:
 <literal>( \( )? [^()]+ (?(1) \) )</literal>
@@ -1655,7 +1666,7 @@
 comment play no part in the pattern matching at all.
 </para>
 <para>
- If the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
+ If the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link> option is set, an unescaped # character
 outside a character class introduces a comment that continues
 up to the next newline character in the pattern.
 </para>
@@ -1673,7 +1684,7 @@
 expressions to recurse (among other things). The special
 item (?R) is provided for the specific case of recursion.
 This PCRE pattern solves the parentheses problem (assume
- the <link linkend="pcre.pattern.modifiers">PCRE_EXTENDED</link>
+ the <link linkend="reference.pcre.pattern.modifiers">PCRE_EXTENDED</link>
 option is set so that white space is ignored):
@@ -1737,10 +1748,10 @@
 regular expressions for efficient performance.
 </para>
 <para>
- When a pattern begins with .* and the <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link> option is
+ When a pattern begins with .* and the <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> option is
 set, the pattern is implicitly anchored by PCRE, since it can
 match only at the start of a subject string. However, if
- <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link> is not set, PCRE cannot make this optimization,
+ <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link> is not set, PCRE cannot make this optimization,
 because the . metacharacter does not then match a newline,
 and if the subject string contains newlines, the pattern may
 match from the character immediately following one of them
@@ -1756,7 +1767,7 @@
 <para>
 If you are using such a pattern with subject strings that do
 not contain newlines, the best performance is obtained by
- setting <link linkend="pcre.pattern.modifiers">PCRE_DOTALL</link>, or starting the pattern with ^.* to
+ setting <link linkend="reference.pcre.pattern.modifiers">PCRE_DOTALL</link>, or starting the pattern with ^.* to
 indicate explicit anchoring. That saves PCRE from having to
 scan along the subject looking for a newline to restart at.
 </para>
http://cvs.php.net/diff.php/phpdoc/en/reference/pcre/reference.xml?r1=1.11&r2=1.12&ty=u
Index: phpdoc/en/reference/pcre/reference.xml
diff -u phpdoc/en/reference/pcre/reference.xml:1.11 phpdoc/en/reference/pcre/reference.xml:1.12
--- phpdoc/en/reference/pcre/reference.xml:1.11 Wed Mar 3 00:06:14 2004
+++ phpdoc/en/reference/pcre/reference.xml Thu Jul 29 06:15:26 2004
@@ -1,5 +1,5 @@
 <?xml version="1.0" encoding="iso-8859-1"?>
-<!-- $Revision: 1.11 $ -->
+<!-- $Revision: 1.12 $ -->
 <reference id="ref.pcre">
 <title>Regular Expression Functions (Perl-Compatible)</title>
 <titleabbrev>PCRE</titleabbrev>
@@ -15,13 +15,14 @@
 the delimiter character has to be used in the expression itself,
 it needs to be escaped by backslash. Since PHP 4.0.4, you can also use
 Perl-style (), {}, [], and <> matching delimiters.
- See <link linkend="pcre.pattern.syntax">Pattern Syntax</link>
+ See <link linkend="reference.pcre.pattern.syntax">Pattern Syntax</link>
 for detailed explanation.
 </para>
 <para>
 The ending delimiter may be followed by various modifiers that
 affect the matching.
- See <link linkend="pcre.pattern.modifiers">Pattern Modifiers</link>.
+ See <link linkend="reference.pcre.pattern.modifiers">Pattern
+ Modifiers</link>.
 </para>
 <para>
 PHP also supports regular expressions using a POSIX-extended syntax