I'm retrieving CLOB data from an Oracle database, and cleaning up the HTML
in it. I'm using the following commands:

    $content = strip_tags($description->fields['CONTENT'],'<p><ol><ul><li>');
    $content = preg_replace("/<p.*>/","<p>",$content);

The second line is necessary because the <p> tag frequently comes with class
or style descriptions that must be eliminated.

This works on the whole except where the <p> tag with the style definition
is broken up over two or more lines. In other words, something like:

    <p class = "bullettext" style = "line-height: normal
    border: 3;">

In this case, the second line of my code does not strip the class or style
definitions from the paragraph tag. I've tried:

    $content = nl2br($content)

and

    $content = str_replace(chr(13),$content)

and

    $content = preg_replace("/[".chr(10)."|".chr(13)."]/","",$content)

(I've read that Oracle uses chr(10) or chr(13) to represent line breaks
internally, so I decided to give those a try as well.)

and

    $content = str_replace(array('\n','\r','\r\n'),$content)

all to no avail; these all leave the line break intact, which means my
preg_replace('/<p.*>/','<p>',$content) line still breaks.
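
For reference, here is a minimal sketch that reproduces the problem outside
the database. The sample string and the variable names ($sample, $unchanged,
$cleaned) are made up for illustration, and it assumes the break inside the
tag is a plain "\n"; the real CLOB data may contain something else. It also
tries a pattern built on [^>]* instead of .*, since "." does not match a
newline by default in PCRE while a negated character class does.

```php
<?php
// Hypothetical sample; assumes the break inside the tag is a plain "\n".
$sample = "<p class = \"bullettext\" style = \"line-height: normal\nborder: 3;\">Some text</p>";

// Original pattern: "." stops at the newline, no ">" is found on the
// first line, so the multi-line tag is left untouched.
$unchanged = preg_replace("/<p.*>/", "<p>", $sample);

// Alternative pattern: [^>]* matches any character except ">", line
// breaks included; \b keeps it from matching tags such as <pre>.
$cleaned = preg_replace("/<p\b[^>]*>/", "<p>", $sample);

echo ($unchanged === $sample) ? "original pattern: no change\n" : "original pattern: changed\n";
echo $cleaned, "\n";
```

With this sample the first pattern leaves the string as it was, and the
second one reduces the opening tag to a bare <p>. Whether that carries over
to the actual Oracle data depends on what the line break really is.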

Anyone have any ideas?

-- 
Sláinte,
Richard S. Crawford (rich...@underpope.com)
http://www.underpope.com
