On Wed, Jan 12, 2011 at 8:24 AM, Randal Rust <randalr...@gmail.com> wrote:
> I am admittedly not very good with regular expressions. I am trying to
> pull all of the paragraphs out of an article, so that I can create
> inline links. Here is my script:
>
> $blockpattern='/<p*[^>]*>.*?<\/p>/';
> $blocks=preg_match_all($blockpattern, $txt, $blockmatches);
>
You really don't want the * after that first p, because this:

    /<p*[^>]*>/

means, essentially, "Match a `<` character, then any number of `p` characters (including zero), then a bunch of characters that aren't `>`, then a `>`". Because zero `p`s is allowed, this regex will match any `<...>` tag, i.e. any opening or closing HTML tag in your document.

Dropping the first * will get you closer:

    /<p[^>]*>/

But that's still not right, as it'll get false positives on `<pre>` and `<param>` tags. Instead use this:

    /<p(\s+[^>]*)?>/

which only matches that "bunch of characters that aren't `>`" if there's whitespace between the `p` and whatever comes next. So a bare `<p>` or a `<p class="...">` matches, but `<pre>` doesn't.

The second half of your regex is right, but it does have the newline problem you mentioned. To get `.` to match newline characters, use the "dotall" flag by adding `s` after the final slash:

    /<p(\s+[^>]*)?>.*?<\/p>/s

So that leaves us with:

    $blockpattern = '/<p(\s+[^>]*)?>.*?<\/p>/s';

--
http://justinhileman.com

_______________________________________________
New York PHP Users Group Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

http://www.nyphp.org/Show-Participation
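A quick way to see the difference between the two patterns is to run both against some sample markup. This is a minimal sketch: the `$txt` string is made-up HTML for illustration, and `$oldpattern`, `$oldcount` and `$oldmatches` are hypothetical names; `$blockpattern`, `$blocks` and `$blockmatches` follow the original script.

```php
<?php
// Hypothetical sample markup standing in for the article text ($txt in the
// original script). It includes <pre> and <param> tags and a paragraph that
// spans two lines.
$txt = "<h1>Title</h1>\n"
     . "<p>First paragraph.</p>\n"
     . "<pre>code</pre>\n"
     . "<p class=\"intro\">Second paragraph\nspans two lines.</p>\n"
     . "<param name=\"x\"><p>Third.</p>";

// Pattern from the original question: `p*` allows zero p's, so a match can
// start at any tag, and without /s the `.` stops at newlines.
$oldpattern = '/<p*[^>]*>.*?<\/p>/';

// Corrected pattern: `<p` followed directly by `>`, or by whitespace plus
// attributes; the /s (dotall) modifier lets `.` match newlines.
$blockpattern = '/<p(\s+[^>]*)?>.*?<\/p>/s';

$oldcount = preg_match_all($oldpattern, $txt, $oldmatches);
$blocks   = preg_match_all($blockpattern, $txt, $blockmatches);

// $oldmatches[0] misses the two-line paragraph and wrongly swallows the
// <param> tag; $blockmatches[0] holds exactly the three <p> elements.
print_r($oldmatches[0]);
print_r($blockmatches[0]);
```

With this input the old pattern finds two matches, one of which starts at `<param name="x">`, while the corrected pattern finds all three paragraphs, including the one with a newline inside it.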