Re: [PHP] perl regex in php and multiple escape rules

2004-09-14 Thread Wouter van Vliet
On Tue, 14 Sep 2004 11:18:33 +0200, Christophe Chisogne
[EMAIL PROTECTED] wrote:
 
 In a word:
 
 I'm looking for more detailed information about preg_replace
 (and other perl regex functions) than in the php manual,
 specifically about different escape rules interaction.
 
 In more words:
 
 PHP has its own way of escaping strings [2]
 Ex \ within '' is '\' (or '\\' if at the end or before ' )
 \ within "" is "\" (or "\\" if at the end or before " )
 So  \\  can be written '\\\' or '\\\\' or "\\\" or "\\\\"
 and \\\ can be written '\\\\\' or '\\\\\\' (same with "" )
 (rule 1)
 
 Perl regexes are powerful and come with their own escape rules [3]
 Ex   to match...        the regex is ...
      \   (backslash)    /\\/
      \n  (newline)      /\n/
      \n  (2 chars)      /\\n/
 (rule 2)
 
 My problem is about the preg_replace function, because its entry in
 the php manual [1] is not specific enough -- I mean, writing
 a real specification seems impossible without more details
 
 The 'pattern' argument is a string, but how does php process it?
 I guess it first uses rule1 then rule2, ie php string escape rule
 (for '  and \ ) then perl regex rule (via verbatim use in perlre C library?)
 
 This means that to match \n (the 2 chars), the perl re is \\n
 so the correct php pattern is '\\\n' or '\\\\n' or "\\\n" or "\\\\n".
 (see comment 29-Mar-2004 05:46 on [1]). Is this right?
 /me think using perl regex is easier in perl than in php ;-)
 
 Is it the same for the 'replacement' argument?
 
 Another comment (steven -a-t- acko dot net, 08-Feb-2004 12:45) says
 "To make this easier, the data in a backreference with /e is run through
   addslashes() before being inserted in your replacement expression."
 Is that user right?
 
 Ok, I can try to guess answers to my questions by probing things.
 But that doesn't tell me if my guesses are wrong, or if what I guess
 is exactly what php pcre functions are supposed to do
 (not only now with php x.y.z but in the future too).
 And I prefer specifications over guesses.
 (think about ppl using alt attribute instead of title
   on img html tags : they guessed wrong by not reading html spec)
 
 In other words, are there any details about escape rules
 in pcre php functions? I feel much better when I can use
 a stable, reliable and precise API.
 
 Christophe
 
 [1] preg_replace in php manual
 http://www.php.net/manual/en/function.preg-replace.php
 
 [2] strings in php manual
 http://www.php.net/manual/en/language.types.string.php
 
 [3] pcre syntax in php manual
 http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
 
 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php
 
 

It's all very easy, actually. You take the real regex, and convert
that to a php string.

for example:

to match the two chars \n, in perl you'd do: 

  /\\n/

php requires each backslash to be escaped with another backslash, so you'd get

 $regex = '/\\\\n/';

Whenever you're in doubt, put the regex into a var and print that var. If
what you get is exactly the regex you'd use in perl, you're good. And yes,
I do agree with anybody who'd state that it's a bit confusing. Cuz it is!
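
A minimal sketch of that check, assuming a command-line PHP run; the variable names and the sample subject string are made up for illustration:

```php
<?php
// The regex as it would be written in Perl: /\\n/
// In a single-quoted PHP string, every backslash is doubled.
$regex = '/\\\\n/';

// Printing the variable shows what the pcre library actually receives.
echo $regex, "\n";                        // prints /\\n/

// Hypothetical subject: the two characters backslash and n, not a newline.
$subject = 'a\nb';
var_dump(preg_match($regex, $subject));   // int(1)
```

If the printed line differs from the Perl regex, the PHP literal has the wrong number of backslashes.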




RE: [PHP] perl regex in php and multiple escape rules

2004-09-14 Thread Ford, Mike
On 14 September 2004 10:19, Christophe Chisogne wrote:

 I'm looking for more detailed information about preg_replace
 (and other perl regex functions) than in the php manual,
 specifically about different escape rules interaction.

[...]

 The 'pattern' argument is a string, but how does php process it?
 I guess it first uses rule1 then rule2, ie php string escape rule
 (for '  and \ ) then perl regex rule (via verbatim use in
 perlre C library?)

It's really very simple: (1) PHP processes the string, doing all PHP escape 
substitutions, and hands the result off to the pcre library; (2) pcre applies its 
escapes to the received string and then performs the match.
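
The two stages can be made visible with a short sketch like the following; the pattern and subject are only illustrative, and the character counts in the comments refer to this particular literal:

```php
<?php
// Stage 1: PHP turns the string literal into the actual characters.
$pattern = '/\\\\n/';          // source code: 7 characters between the quotes
echo strlen($pattern), "\n";   // 5, i.e. the characters / \ \ n /

// Stage 2: pcre applies its own escapes to those 5 characters:
// \\ is a literal backslash, n is a literal n.
$subject = 'line1\nline2';     // single quotes: backslash and n, no newline
echo preg_replace($pattern, ' | ', $subject), "\n";   // line1 | line2
```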

 This means that to match \n (the 2 chars), the perl re is \\n
 so the correct php pattern is '\\\n' or '\\\\n' or "\\\n" or "\\\\n".
 (see comment 29-Mar-2004 05:46 on [1]). Is this right?

Not quite, since \n is also a valid escape sequence in PHP when used in double-quoted
strings; so "\\\n" will be escape-processed by PHP to give the two-character sequence
backslash-newline, whereas "\\\\n" will yield the 3-character sequence
backslash-backslash-n.  The difference matters: pcre treats backslash-newline as a
literal newline, but treats backslash-backslash-n as a literal backslash followed by
n, so only the latter matches the two characters \n.  In single-quoted strings \n is
not a PHP escape, so '\\\n' and '\\\\n' both reach pcre as backslash-backslash-n.
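
To see what each double-quoted form ends up matching, a small test along these lines can help. It is only a sketch; the variable names are hypothetical and the slashes are ordinary pattern delimiters:

```php
<?php
$a = "/\\\n/";    // PHP yields: / \ newline /
$b = "/\\\\n/";   // PHP yields: / \ \ n /

$twoChars = '\n';   // backslash followed by n
$newline  = "\n";   // a real newline character

var_dump(preg_match($a, $twoChars));  // int(0)
var_dump(preg_match($a, $newline));   // int(1)
var_dump(preg_match($b, $twoChars));  // int(1)
var_dump(preg_match($b, $newline));   // int(0)
```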

 In other words, are there any details about escape rules
 in pcre php functions? I feel much better when I can use
 a stable, reliable and precise API.

Well, the way I usually do it is to use the reverse order of the two rules above, so:

- construct your pcre regex, including any necessary pcre escapes

- then run through it inserting PHP escapes; this is a lot easier if you use
single-quoted PHP strings, since the only valid PHP escape sequences in that case are
\\ and \', so only backslashes preceding a ' or a \ (or ending the string) actually
require doubling.
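
The second step can also be left to PHP itself. The sketch below assumes PHP 5.3 or later for the nowdoc syntax, and the sample regex is made up. It writes the regex verbatim and lets var_export() produce the single-quoted literal:

```php
<?php
// Step 1: the regex exactly as pcre should see it. A nowdoc keeps every
// character verbatim, so no PHP escaping is involved here.
$raw = <<<'RE'
/it's \\n/
RE;

// Step 2: let PHP insert its own escapes. var_export() returns a
// single-quoted literal with every \ and ' escaped.
echo var_export($raw, true), "\n";   // prints '/it\'s \\\\n/'
```

var_export() doubles every backslash, which is more than the minimum described above but always safe in a single-quoted string.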
