On 01 October 2003 21:02, [EMAIL PROTECTED] wrote:

> 1. The PHP manual sais to escape the escape char, it has to
> be written
> twice*, but:


Yes, it does.  But it also says that to put a \ into a string, you need to
write it twice ("escape" it) ***. So:

> $term = preg_replace('/(\\)/', 'backslash $1', $term);
> causes an error** while using three backslashes (see 2.) works.

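PHP's string processing replaces that \\ with a single \, so preg_replace
sees the pattern as /(\)/ which is invalid: the regex engine reads \) as an
escaped literal parenthesis, so the group opened by ( is never closed.  To
match a literal \, the pattern itself has to contain the \ "escaped", or
doubled.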

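A minimal sketch of that first step, using only the pattern from your mail
(the variable names are mine, and the @ is there just to keep the compile
warning out of the output):

```php
<?php
// Single-quoted PHP string: the \\ collapses to one backslash.
$pattern = '/(\\)/';

echo $pattern, "\n";          // /(\)/
echo strlen($pattern), "\n";  // 5

// The regex compiler rejects this pattern, so preg_replace gives up.
$result = @preg_replace($pattern, 'backslash $1', 'beg \ end');
var_dump($result);            // NULL
```
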
> 2.1.
> $term = "beg \ end";

Here, because backslash-space is not a valid (PHP) escape sequence, PHP
passes it unchanged.

> print preg_replace('/(\\\)/', 'backslash $1', $term);

But this gets its number of backslashes reduced by one -- the initial \\ is
replaced by \, but then PHP's string processor sees \), which is not a valid
PHP escape sequence so is passed unchanged -- i.e. as \) .  So preg_replace
sees a pattern of /(\\)/, which now contains a (preg) escape sequence of \\,
validly representing a single \ in the pattern.  And your string contains a
single \ (remember?), so it matches.
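
If you want to check this for yourself, something along these lines should
do it; it just echoes the pattern as preg_replace receives it ($pattern and
$out are names I've made up for the illustration):

```php
<?php
$term    = "beg \ end";   // backslash-space is not an escape: one backslash
$pattern = '/(\\\)/';     // \\ becomes \, and \) is passed through as \)

echo $pattern, "\n";      // /(\\)/

$out = preg_replace($pattern, 'backslash $1', $term);
echo $out, "\n";          // beg backslash \ end
```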

> returns: beg backslash \ end
> 
> 2.2.
> $term = "beg \\ end";

This string contains the (PHP) escape sequence \\, which is reduced to a
single \ by PHP's string processing -- so this $term is, in fact, identical
to the one in 2.1.
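
A quick way to convince yourself the two strings really are the same (again
only a sketch, with $a and $b as throwaway names):

```php
<?php
$a = "beg \ end";    // as in 2.1
$b = "beg \\ end";   // as in 2.2

var_dump($a === $b);     // bool(true)
echo strlen($a), "\n";   // 9
echo strlen($b), "\n";   // 9
```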

> print preg_replace('/(\\\)/', 'backslash $1', $term);
> returns: beg backslash \ end (the same as 2.1.)

QED

> 
> 2.3.
> $term = "beg \\\ end";
> print preg_replace('/(\\\)/', 'backslash $1', $term);
> returns: beg backslash \backslash \ end

And here, the string in $term has its triple \\\ reduced to double \\ (by
the same reasoning as before), and the pattern is (as before) matching a
single backslash.  So each of the two backslashes in $term is replaced by
"backslash", a space, and the matched \, giving your result.  Again, QED.

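Counting the backslashes makes the same point; this is only an illustrative
sketch, and $out is my own name:

```php
<?php
$term = "beg \\\ end";   // three typed, two stored

echo $term, "\n";                       // beg \\ end
echo substr_count($term, '\\'), "\n";   // 2

$out = preg_replace('/(\\\)/', 'backslash $1', $term);
echo $out, "\n";                        // beg backslash \backslash \ end
```
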
The real trick here is that there are *two* levels of \-escaping going on --
PHP's string processing does one, and the regex interpreter does a second.
So to write an absolutely guaranteed cast-iron regex fragment that will
match a single backslash, you actually have to include *four* backslashes in
your PHP script:

    preg_replace('/(\\\\)/', ...)

PHP's string processing will reduce the four backslashes to two in the
actual string passed to preg_replace, and then the regex interpreter will
treat the \\ so passed as a valid escape sequence to match a single \.  This
is called backslash proliferation, or "leaning toothpick" syndrome.
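
Here is the four-backslash form run against the string from 2.1, as a
sketch (the subject is single-quoted here, which also leaves the lone \
alone; $pattern and $out are illustrative names):

```php
<?php
$pattern = '/(\\\\)/';   // four typed, two reach the regex engine

echo $pattern, "\n";     // /(\\)/

$out = preg_replace($pattern, 'backslash $1', 'beg \ end');
echo $out, "\n";         // beg backslash \ end
```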

Why is it important to use all four backslashes, even when three seems to do
it?  Well, consider this:

    preg_replace("/\\\test/", ...)

Here, in a double-quoted string, the first two backslashes will be reduced
to \ (ok), but the next two characters, \t, are the PHP escape sequence for
a tab character (oops!), so your string will end up containing a \, a tab,
and the three characters est -- probably not what was intended!
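
To see the difference side by side, a sketch like this compares the three-
and four-backslash versions against a subject containing a literal \test
($three, $four and $subject are names I've invented for the example):

```php
<?php
$three   = "/\\\test/";    // becomes: / \ TAB e s t /
$four    = "/\\\\test/";   // becomes: / \ \ t e s t /
$subject = 'a\test';       // single quotes: a real backslash, then "test"

var_dump(strpos($three, "\t") !== false);  // bool(true): a tab crept in
echo strlen($three), "\n";                 // 7
echo strlen($four), "\n";                  // 8

echo preg_match($three, $subject), "\n";   // 0
echo preg_match($four, $subject), "\n";    // 1
```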

Hope this helps you build your regexes better.  Personally, my approach is
often to write my regex without bothering to escape any backslashes, then go
through putting in the required regex escapes, and then go through again
putting in the PHP string escapes.  And if I need to decipher a regex which
has leaning toothpick syndrome, I tend to cut'n'paste it into a text editor,
and then go through it manually doing the replaces in the way that I've
described them above.
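
As a made-up example of that three-pass approach, suppose the aim is to
match the text a\b followed by one digit (the target text and $pattern are
my own invention, purely for illustration):

```php
<?php
// Pass 1, what should match:   a\b followed by a digit
// Pass 2, regex escapes added: a\\b\d
// Pass 3, PHP string escapes:  every backslash from pass 2 doubled
$pattern = '/a\\\\b\\d/';

echo $pattern, "\n";                       // /a\\b\d/
echo preg_match($pattern, 'a\b7'), "\n";   // 1
echo preg_match($pattern, 'ab7'), "\n";    // 0
```

Echoing the pattern is also a handy way of deciphering one that has
already been escaped: what gets printed is what the regex engine sees.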

Cheers!

Mike

---------------------------------------------------------------------
Mike Ford,  Electronic Information Services Adviser,
Learning Support Services, Learning & Information Services,
JG125, James Graham Building, Leeds Metropolitan University,
Beckett Park, LEEDS,  LS6 3QS,  United Kingdom
Email: [EMAIL PROTECTED]
Tel: +44 113 283 2600 extn 4730      Fax:  +44 113 283 3211 
