On 28/08/07, Brian Rue <[EMAIL PROTECTED]> wrote:
> Sure, I'll break it apart a little:
Er, wow, thanks. Lots of material here...
> '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'
>
> $regex = '{' . // opening delimiter
> '(?=' . // positive lookahead: succeeds at a position where the
> // following pattern matches, without consuming any text
> '<p' . // first part of an opening <p> tag
> '(?:' . // non-capturing parentheses (same as normal
> // parentheses, but a bit faster since we don't
> // need to capture what they match for use later)
> '>|\s' . // match a closing > or a whitespace character
> ')' . // end non-capturing parentheses
> '(?!' . // negative lookahead: the match will fail if the
> // following pattern matches from the current position
> '.*' . // match until the end of the string
> '<p(?:>|\s)' . // same as above - look for another <p> tag
> ')' . // end negative lookahead
> ')' . // end positive lookahead
> '}is'; // closing delimiter, and use modifiers s and i
It was the negative lookahead that confused me, I see. The rest seems
pretty straightforward. Difficult, but straightforward.
>
> About the modifiers: i makes it case-insensitive, and s turns on
> dot-matches-all-mode (including newlines)--otherwise, the . would only match
> until the next newline.
Yes, this I know.
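
For anyone reading along in the archives, here is a minimal sketch of what the s modifier changes for this particular pattern. The sample string and variable names are made up for illustration. Without s, the .* inside the negative lookahead stops at the first newline, so a <p> tag on a later line goes unnoticed and every <p> tag that has no other <p> after it on its own line looks like the "last" one.

```php
<?php
// Hypothetical two-line input: one <p> tag per line.
$html = "<p>one</p>\n<p>two</p>";

$withS    = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is'; // dot also matches newlines
$withoutS = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}i';  // dot stops at a newline

// preg_match_all() returns the number of full pattern matches.
$countWithS    = preg_match_all($withS, $html, $m1);
$countWithoutS = preg_match_all($withoutS, $html, $m2);

echo "with s: $countWithS, without s: $countWithoutS\n";
// prints "with s: 1, without s: 2"
```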
> The regex has two parts: matching a <p> tag, and then making sure there
> aren't any more <p> tags in the string following it. The positive lookahead
> is (hopefully) pretty straightforward. The negative lookahead works by using
> a greedy (regular) .*, which forces the regex engine to match all the way to
> the end of the haystack. Then it encounters the <p(?:>|\s) part, forcing it
> to backtrack until it finds a <p> tag. If it doesn't find one before
> returning to the 'current' position (directly after the <p> tag we just
> matched), then we know we have found the last <p> tag.
Nice. Very nice.
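
To see the two lookaheads working together, here is a small sketch that runs the pattern through preg_replace(). The input string and the inserted marker are assumptions for illustration, not part of the original question. Only the last opening <p> tag passes the negative lookahead, so only one insertion happens.

```php
<?php
// Hypothetical sample input with three paragraphs, one in upper case
// and with an attribute, to exercise the i modifier and the \s branch.
$html = "<p>first</p>\n<P class=\"x\">second</p>\n<p>last</p>";

$regex = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

// The marker text is arbitrary; it is inserted in front of the last <p> tag.
$result = preg_replace($regex, '<!-- last -->', $html);

echo $result, "\n";
// prints:
// <p>first</p>
// <P class="x">second</p>
// <!-- last --><p>last</p>
```

Note that the closing </p> tags never match, because the pattern requires the letter p directly after the < character.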
> The positive and negative lookahead are 'zero-width' requirements, which
> means they don't advance the regex engine's pointer in the haystack string.
> Since the entire regex is zero-width, the replacement string gets inserted
> at the matched position.
Hmm.
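
The zero-width point is perhaps easier to see by asking PCRE where it matched. The following is a sketch with a made-up input string, using preg_match() with the PREG_OFFSET_CAPTURE flag: the matched text is empty, and the offset points at the start of the last <p> tag, which is where a replacement string would be inserted.

```php
<?php
// Hypothetical input: two paragraphs on one line.
$html = '<p>one</p><p>two</p>';

$regex = '{(?=<p(?:>|\s)(?!.*<p(?:>|\s)))}is';

// With PREG_OFFSET_CAPTURE, $m[0] is array(matched text, byte offset).
preg_match($regex, $html, $m, PREG_OFFSET_CAPTURE);

$matchedText = $m[0][0]; // empty string: nothing was consumed
$offset      = $m[0][1]; // position of the last <p> tag

var_dump($matchedText);          // string(0) ""
var_dump($offset);               // int(10)
echo substr($html, $offset), "\n"; // prints "<p>two</p>"
```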
> I hope that made at least a little bit of sense :) If you're doing a lot of
> regex work, I would strongly recommend reading the book Mastering Regular
> Expressions by Jeffrey Friedl... it's very well written and very helpful.
I don't do a lot, but it's a great tool to know when one needs it!
Thank you for the patient explanations.
On a general note, both of these addresses currently return 404:
http://il.php.net/manual/en/pcre.pattern.modifiers.php
http://uk.php.net/manual/en/pcre.pattern.syntax.php
Dotan Cohen
http://lyricslist.com/
http://what-is-what.com/
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php