On 01 October 2003 21:02, [EMAIL PROTECTED] wrote:

> 1. The PHP manual says to escape the escape char, it has to be written
> twice*, but:

Yes, it does. But it also says that to put a \ into a string, you need to write it twice ("escape" it) ***. So:

> $term = preg_replace('/(\\)/', 'backslash $1', $term);
> causes an error** while using three backslashes (see 2.) works.

PHP's string processing replaces that \\ with a single \, so preg_replace sees the pattern as /(\)/, which is invalid: the lone \ escapes the closing parenthesis, so the group is never closed. To match a literal backslash, the regex itself needs the \ to be "escaped", or doubled.

> 2.1.
> $term = "beg \ end";

Here, because backslash-space is not a valid (PHP) escape sequence, PHP passes it unchanged.

> print preg_replace('/(\\\)/', 'backslash $1', $term);

But this gets its number of backslashes reduced by one -- the initial \\ is replaced by \, but then PHP's string processor sees \), which is not a valid PHP escape sequence so is passed unchanged -- i.e. as \) . So preg_replace sees a pattern of /(\\)/, which now contains a (preg) escape sequence of \\, validly representing a single \ in the pattern. And your string contains a single \ (remember?), so it matches.

> returns: beg backslash \ end
>
> 2.2.
> $term = "beg \\ end";

This string contains the (PHP) escape sequence \\, which is reduced to a single \ by PHP's string processing -- so this $term is, in fact, identical to the one in 2.1.

> print preg_replace('/(\\\)/', 'backslash $1', $term);
> returns: beg backslash \ end (the same as 2.1.)

QED

> 2.3.
> $term = "beg \\\ end";
> print preg_replace('/(\\\)/', 'backslash $1', $term);
> returns: beg backslash \backslash \ end

And here, the string in $term has its triple \\\ reduced to double \\ (by the same reasoning as before), and the pattern is (as before) matching a single backslash. So each of the two backslashes in $term is replaced by "backslash", a space, and the matched \, giving your result. Again, QED.

The real trick here is that there are *two* levels of \-escaping going on -- PHP's string processing does one, and the regex interpreter does a second.

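The two levels can be made visible by printing what PHP actually stores before the regex engine ever sees it. The following is only a minimal sketch of cases 2.1 to 2.3 above; the variable names ($two, $three, $a, $b, $c) are made up for illustration and are not from the original question.

```php
<?php
// Level 1: PHP string parsing. In single quotes only \\ and \' are escapes.
$two   = '/(\\)/';    // hypothetical name; stored as /(\)/  (5 characters)
$three = '/(\\\)/';   // hypothetical name; stored as /(\\)/ (6 characters)
echo $two, ' has length ', strlen($two), "\n";
echo $three, ' has length ', strlen($three), "\n";

// The three subject strings from 2.1 to 2.3.
$a = "beg \ end";     // backslash-space is not an escape, so one backslash
$b = "beg \\ end";    // \\ collapses to one backslash: identical to $a
$c = "beg \\\ end";   // \\ then backslash-space: two backslashes
var_dump($a === $b);  // bool(true)

// Level 2: the regex engine reads \\ in the stored pattern as one literal backslash.
echo preg_replace($three, 'backslash $1', $a), "\n";
echo preg_replace($three, 'backslash $1', $c), "\n";

// The two-backslash pattern reaches the regex engine as /(\)/ : the \) is an
// escaped parenthesis, the group never closes, and compilation fails.
// The @ only hides the warning so the NULL return value is easy to see.
var_dump(@preg_replace($two, 'backslash $1', $a));
```

The two echo lines for preg_replace should reproduce the "returns:" lines quoted in 2.1 and 2.3, and the last call should dump NULL rather than a string.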
So to write an absolutely guaranteed cast-iron regex fragment that will match a single backslash, you actually have to include *four* backslashes in your PHP script:

    preg_replace('/(\\\\)/', ...)

PHP's string processing will reduce the four backslashes to two in the actual string passed to preg_replace, and then the regex interpreter will treat the \\ so passed as a valid escape sequence to match a single \. This is called backslash proliferation, or "leaning toothpick" syndrome.

Why is it important to use all four backslashes, even when three seems to do it? Well, consider this:

    preg_replace("/\\\test/", ...)

Here, in a double-quoted string, the first two \\ will be reduced to \ (ok), but the next two characters, \t, are the PHP escape sequence for a tab character (oops!), so your string will end up containing a \, a tab, and the three characters est -- probably not what was intended!

Hope this helps you build your regexes better. Personally, my approach is often to write my regex without bothering to escape any backslashes, then go through putting in the required regex escapes, and then go through again putting in the PHP string escapes. And if I need to decipher a regex which has leaning toothpick syndrome, I tend to cut'n'paste it into a text editor, and then go through it manually doing the replaces in the way that I've described them above.

Cheers!

Mike

---------------------------------------------------------------------
Mike Ford, Electronic Information Services Adviser,
Learning Support Services, Learning & Information Services,
JG125, James Graham Building, Leeds Metropolitan University,
Beckett Park, LEEDS, LS6 3QS, United Kingdom
Email: [EMAIL PROTECTED]
Tel: +44 113 283 2600 extn 4730 Fax: +44 113 283 3211