On 8 July 2010 16:15, Gary . <php-gene...@garydjones.name> wrote:
> Okay. At least one of the problems with this so called HTML seems to
> be that the body tag looks like
> <BODY vlink=#ffffff ...>
> and xml_parse complains that "> required" on that line (i.e. it is
> claiming it can't find the end of the tag!).
> I'm guessing that those attributes "must" be quoted in XML and
> "should" be in HTML (but patently aren't)? Is there any way to get
> xml_parse to ignore that? My element_handler functions never even get
> a chance to see that line.
> Regex to insert quotes or remove the attributes entirely, perhaps?
> *gulp* I hope there's a better way than that.

So, essentially, you want to parse some plain text which may or may
not be well-formed XML.

In short ... good luck.

How badly formed is the file going to be?

If it is just things like missing quotes around attribute values,
then a regex could manage it. Essentially, though, you would be doing
by hand the clean-up that Tidy could do for you.
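
As a rough illustration of the regex route, here is a minimal sketch that
wraps unquoted attribute values in double quotes before the text is handed
to xml_parse. The function name quote_unquoted_attributes is made up for
this example, and the sketch assumes no quoted attribute value contains a
literal ">". It does nothing about unclosed elements, bad nesting or
undefined entities, so it only helps if missing quotes are the whole problem.

```php
<?php
// Hypothetical helper (not part of PHP): quote bare attribute values.
// Assumption: no quoted attribute value contains a literal ">".
function quote_unquoted_attributes(string $html): string
{
    // Step 1: find each start tag, so ordinary text such as "x=1" is left alone.
    return preg_replace_callback(
        '/<[A-Za-z][^<>]*>/',
        function (array $tag): string {
            // Step 2: inside the tag, match either an already quoted value
            // (group 1) or "=" followed by a bare value (group 2).
            return preg_replace_callback(
                '/("[^"]*"|\'[^\']*\')|=\s*([^\s"\'<>]+)/',
                function (array $m): string {
                    if ($m[1] !== '') {
                        return $m[0];           // already quoted, keep as is
                    }
                    return '="' . $m[2] . '"';  // bare value, add the quotes
                },
                $tag[0]
            );
        },
        $html
    );
}
```

The result could then be passed to xml_parse in the usual way. If the tidy
extension happens to be installed, tidy_repair_string() with the output-xml
option is probably the more robust choice, since it also repairs nesting
and unclosed tags.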

PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
