>My site accepts HTML files by upload. A lot of these files are written in MS
>Word and then saved as HTML files from that. MS Word likes to put a bunch of
>garbage at the beginning of the file. Now, when users upload their HTML
>files, my script goes and striptags all of the unnecessary junk in the
>beginning of the HTML file.
But those are all enclosed in HTML tags, even with something as sucky as MS Word.
>Some of these tags span multiple lines, and my
>script goes through line-by-line, so it won't identify these as tags. Is
>there a simpler fashion?
There's your true problem.
An HTML tag can span multiple lines, regardless of where it comes from.
Even my hand-coded HTML will occasionally end up with a multi-line HTML
tag... Well, okay, maybe not, but I could if I wanted to :-)
You need to http://php.net/implode all your HTML into one big long string
*before* you strip_tags:
$html = implode('', $html); // $html: the array of lines you already read, e.g. via file()
$html = strip_tags($html);
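Here's a small self-contained sketch of the difference. The sample lines are made up (Word-style markup with one tag split across two lines), and I'm assuming the upload was read with file(), which keeps the "\n" on each line. That's why implode('', ...) with an empty glue string is enough.

```php
<?php
// Hypothetical sample: what file() might return for an upload where
// one tag is split across two lines (each element keeps its "\n").
$lines = [
    "<p class=\"MsoNormal\"\n",
    "   style=\"margin:0\">Hello</p>\n",
];

// Line by line: strip_tags() never sees the opening "<" and the closing ">"
// in the same string, so the tail of the tag survives as plain text.
$lineByLine = '';
foreach ($lines as $line) {
    $lineByLine .= strip_tags($line);
}

// Whole document: join the lines first, then strip the tags in one pass.
$whole = strip_tags(implode('', $lines));

echo $whole; // prints "Hello" followed by a newline
```

With the line-by-line loop, the second line comes out with `style="margin:0">` still stuck in front of "Hello"; joining first gets rid of it.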
If you really need the stripped text turned back into an array of lines after that, you can explode() it:
$html = explode("\n", $html);
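One thing to watch with explode(): unlike file(), it removes the "\n" separators. If the text ends with a newline, you also get one empty element at the end. A minimal sketch with made-up text follows. The rtrim() call is my own addition for dropping that empty element, not something you have to do.

```php
<?php
// Hypothetical stripped text that ends with a newline.
$text = "Hello\nWorld\n";

// explode() drops the "\n" separators; the trailing newline leaves
// an empty string as the last element.
$lines = explode("\n", $text);

// Optional: trim the trailing newline first if you don't want that element.
$trimmed = explode("\n", rtrim($text, "\n"));
```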
But you're probably storing this stuff in a file or database anyway, and it's just
as easy to fwrite() the one large string as to mess with it as an array.
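A minimal sketch of writing the whole string out in one go. The temp-file path is only for illustration, so substitute wherever you actually keep uploads. file_put_contents($path, $text) would do the same thing in a single call.

```php
<?php
// Hypothetical stripped text and a throwaway destination file.
$text = "Hello\nWorld\n";
$path = tempnam(sys_get_temp_dir(), 'upl');

$fp = fopen($path, 'w');
if ($fp === false) {
    die("Could not open $path for writing\n");
}
// One fwrite() of the whole string; no per-line loop needed.
$written = fwrite($fp, $text);
fclose($fp);
```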
PHP General Mailing List (http://www.php.net/)