> My site accepts HTML files by upload. A lot of these files are written in
> MS Word and then saved as HTML from Word. MS Word likes to put a bunch of
> garbage at the beginning of the file. When users upload their HTML files,
> my script runs strip_tags on them to remove all of the unnecessary junk,
> except it can't get rid of all the junk (HTML, XML, CSS, JavaScript) at
> the beginning of the HTML file.
But those are all enclosed in HTML tags, even with something as sucky as MS Word involved.

> Some of these tags span multiple lines, and my script goes through
> line-by-line, so it won't identify these as tags. Is there a simpler
> way to do this?

There's your true problem. An HTML tag can span multiple lines, regardless of where it comes from. Even my hand-coded HTML will occasionally end up with a multi-line HTML tag... Well, okay, maybe not, but I could if I wanted to :-)

You need to implode (http://php.net/implode) all your HTML into one big long string *before* you call strip_tags:

    $html = implode('', $html);
    $html = strip_tags($html);

If you really need the result split back into an array of lines after that, you can do:

    $html = explode("\n", $html);

But you are probably storing this stuff in a file or database, and it's just as easy to fwrite the one large string as to mess with it as an array.

--
PHP General Mailing List (http://www.php.net/)
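To see why joining first matters, here's a minimal PHP sketch comparing the two approaches. It assumes the upload was read with file(), which keeps the trailing newline on each element, so implode('', ...) rebuilds the original text exactly (file_get_contents() would give you the single string directly). The helper names strip_joined() and strip_per_line() and the sample Word-style markup are made up for illustration; this isn't a drop-in cleaner for every kind of Word output.

```php
<?php
// Minimal sketch, not a drop-in solution. strip_joined() and
// strip_per_line() are hypothetical helper names for illustration.

// Join the lines first, then strip: a tag that spans several lines
// is seen whole by strip_tags(), so it is removed completely.
// Assumes $lines came from file(), so each element keeps its "\n".
function strip_joined(array $lines): string
{
    $html = implode('', $lines);
    return strip_tags($html);
}

// The line-by-line approach, for comparison: each line is stripped on
// its own, so the tail of a multi-line tag (the text before its ">")
// looks like plain text and survives.
function strip_per_line(array $lines): string
{
    return implode('', array_map('strip_tags', $lines));
}

// Made-up sample of a tag split across two lines, as Word can produce.
$lines = [
    "<p class=\"MsoNormal\"\n",
    "style=\"margin:0\">Hello</p>\n",
];

echo strip_per_line($lines); // attribute text leaks through
echo strip_joined($lines);   // only the text "Hello" remains
```

The per-line version leaves `style="margin:0">` behind in the output, which is exactly the kind of leftover junk described above.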