On 1/11/2011 11:13 AM, Richard S. Crawford wrote:
> I'm retrieving CLOB data from an Oracle database, and cleaning up the HTML
> in it. I'm using the following commands:
>     $content =
> strip_tags($description->fields['CONTENT'],'<p><ol><ul><li>');
>     $content = preg_replace("/<p.*>/","<p>",$content);
> The second line is necessary because the <p> tag frequently comes with class
> or style descriptions that must be eliminated.
> This works on the whole except where the <p> tag with the style definition
> is broken up over two or more lines. In other words, something like:
> <p class = "bullettext" style = "line-height: normal
> border: 3;">
> In this case, the second line of my code does not strip the class or style
> definitions from the paragraph tag. I've tried:
> $content = nl2br($content)
> and
> $content = str_replace(chr(13),$content)
> and
> $content = preg_replace("/[".chr(10)."|".chr(13)."]/","",$content)
> (I've read that Oracle uses chr(10) or chr(13) to represent line breaks
> internally, so I decided to give those a try as well.)
> and
> $content = str_replace(array('\n','\r','\r\n'),$content)
> all to no avail; these all leave the line break intact, which means my
> preg_replace('/<p.*>/','<p>',$content) line still breaks.
> Anyone have any ideas?


Looks like you need to read up on the modifiers for preg_* functions.  Start
here:  http://us3.php.net/manual/en/reference.pcre.pattern.modifiers.php

I would change your "second line" regex to the following.

$content = preg_replace("/<p.*>/is", "<p>", $content);

The modifiers after the second / are

i = case-insensitive
s = let the '.' character match newlines as well.
    By default '.' matches any character except a newline.
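
To see what the 's' modifier changes, here is a small sketch using the tag from your message. The variable names ($tag, $without, $with) are only for illustration, and I am assuming the line break in your data is a plain "\n".

```php
<?php
// The <p> tag from your message, split across two lines.
$tag = "<p class = \"bullettext\" style = \"line-height: normal\nborder: 3;\">";

// Without 's', '.' stops at the newline, there is no '>' on the first
// line, so nothing matches and the string comes back unchanged.
$without = preg_replace("/<p.*>/i", "<p>", $tag);

// With 's', '.' also matches the newline, so the whole tag is replaced.
$with = preg_replace("/<p.*>/is", "<p>", $tag);

var_dump($without === $tag); // bool(true)
var_dump($with === "<p>");   // bool(true)
```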

I don't have the time to test this right now, but with 's' turned on the
greedy '.*' will run all the way to the last '>' in the string, so you will
most likely also need to invert the greediness of the match by adding a 'U'
after the second /.


$content = preg_replace("/<p.*>/isU", "<p>", $content);
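
Here is a rough sketch of the difference the 'U' makes when the content holds more than one paragraph. The sample string and variable names are made up for illustration. The last pattern is an alternative of my own rather than anything you need: a negated character class [^>] matches newlines on its own, so it works without either 's' or 'U'.

```php
<?php
// Hypothetical sample content with two paragraphs.
$html = "<p class=\"a\">one</p>\n<p style=\"b\">two</p>";

// Greedy with 's': '.*' swallows everything up to the LAST '>'.
$greedy = preg_replace("/<p.*>/is", "<p>", $html);

// Ungreedy with 'U': each match stops at the first '>' after '<p'.
$lazy = preg_replace("/<p.*>/isU", "<p>", $html);

// Alternative: match anything except '>' inside the tag.
// \b keeps the pattern from matching tags that merely start with 'p'.
$classy = preg_replace("/<p\b[^>]*>/i", "<p>", $html);

var_dump($greedy); // string(3) "<p>"
var_dump($lazy === "<p>one</p>\n<p>two</p>"); // bool(true)
var_dump($classy === $lazy);                  // bool(true)
```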


Let us know how this works out for you.

Jim Lucas

PS: you might want to swap the order of the two statements in your original
code.

