[PHP] Re: Simple XML - problem with errors

2010-07-08 Thread Gary .
Okay. At least one of the problems with this so-called HTML seems to
be that the body tag looks like
<BODY vlink=#ff ...>
and xml_parse complains that ">" is required on that line (i.e. it is
claiming it can't find the end of the tag!).

I'm guessing that those attributes must be quoted in XML and
should be in HTML (but patently aren't)? Is there any way to get
xml_parse to ignore that? My element_handler functions never even get
a chance to see that line.

Regex to insert quotes or remove the attributes entirely, perhaps?
*gulp* I hope there's a better way than that.

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Re: Simple XML - problem with errors

2010-07-08 Thread Richard Quadling
On 8 July 2010 16:15, Gary . php-gene...@garydjones.name wrote:
> Okay. At least one of the problems with this so-called HTML seems to
> be that the body tag looks like
> <BODY vlink=#ff ...>
> and xml_parse complains that ">" is required on that line (i.e. it is
> claiming it can't find the end of the tag!).
>
> I'm guessing that those attributes must be quoted in XML and
> should be in HTML (but patently aren't)? Is there any way to get
> xml_parse to ignore that? My element_handler functions never even get
> a chance to see that line.
>
> Regex to insert quotes or remove the attributes entirely, perhaps?
> *gulp* I hope there's a better way than that.

So. Essentially, you want to parse some plain text which may or may
not be well formed XML.

In short ... good luck.

How badly formed is the file going to be?

If it is things like missing , then this could be managed with regex.
Essentially you are going to have to do the clean up that Tidy could
do for you.
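
For what it's worth, here is a minimal sketch of the Tidy route, assuming the tidy extension is installed. The sample markup and the repair_to_xhtml() helper are invented for illustration; they are not taken from the device in question.

```php
<?php
// Hypothetical helper: ask Tidy to turn tag soup into XHTML.
// Returns null when the tidy extension is missing or the repair fails.
function repair_to_xhtml(string $html): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = [
        'output-xhtml'     => true, // emit well-formed XHTML
        'numeric-entities' => true, // named entities become numeric references
    ];
    $clean = tidy_repair_string($html, $config, 'utf8');
    return is_string($clean) ? $clean : null;
}

// Made-up input: no doctype, unquoted attribute value, unclosed <P> tags.
$dirty = '<HTML><BODY vlink=#336699><P>Status: OK<P>Uptime: 42</BODY></HTML>';
$clean = repair_to_xhtml($dirty);

if ($clean === null) {
    echo "Tidy is not available here\n";
} else {
    // The repaired markup can now be handed to an XML parser.
    $xml = simplexml_load_string($clean);
    echo $xml === false ? "still not well-formed\n" : "parsed as XML\n";
}
```

Whether Tidy's guesses about the missing tags match what the device meant is something you would have to check against real output.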




[PHP] Re: Simple XML - problem with errors

2010-07-08 Thread Nisse Engström
On Thu, 8 Jul 2010 17:15:02 +0200, Gary . wrote:

> Okay. At least one of the problems with this so-called HTML seems to
> be that the body tag looks like
> <BODY vlink=#ff ...>
> and xml_parse complains that ">" is required on that line (i.e. it is
> claiming it can't find the end of the tag!).
>
> I'm guessing that those attributes must be quoted in XML and
> should be in HTML (but patently aren't)?

For that attribute value, it's a must in both cases.
And for strict versions of (X)HTML, the attribute does
not exist at all.


/Nisse
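
To illustrate the XML half of that, here is a small sketch, assuming PHP's xml extension, that feeds a made-up tag to xml_parse() with and without quotes. The helper name is_well_formed() and the colour value are invented for the example. (In HTML 4 an unquoted value may only contain letters, digits, hyphens, periods, underscores and colons, so the "#" rules it out there too.)

```php
<?php
// Hypothetical helper: true if PHP's XML parser accepts $markup.
function is_well_formed(string $markup): bool
{
    $parser = xml_parser_create();
    // Third argument true = this is the final (and only) chunk.
    return xml_parse($parser, $markup, true) === 1;
}

var_dump(is_well_formed('<BODY vlink=#336699></BODY>'));   // unquoted value
var_dump(is_well_formed('<BODY vlink="#336699"></BODY>')); // quoted value
```

The first call should report false and the second true, since XML requires every attribute value to be quoted.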




Re: [PHP] Re: Simple XML - problem with errors

2010-07-08 Thread Gary .
On 7/8/10, Richard Quadling wrote:
> On 8 July 2010 16:15, Gary wrote:
>> Okay. At least one of the problems with this so-called HTML seems to
>> be that the body tag looks like
>> <BODY vlink=#ff ...>
>> and xml_parse complains that ">" is required on that line (i.e. it is
>> claiming it can't find the end of the tag!).
>>
>> I'm guessing that those attributes must be quoted in XML and
>> should be in HTML (but patently aren't)? Is there any way to get
>> xml_parse to ignore that? My element_handler functions never even get
>> a chance to see that line.
>>
>> Regex to insert quotes or remove the attributes entirely, perhaps?
>> *gulp* I hope there's a better way than that.
>
> So. Essentially, you want to parse some plain text which may or may
> not be well formed XML.

No. I don't *want* to. And it isn't plain text, it's just sh*t HTML
(no doctype, missing closing tags in some cases, etc. It's an
absolute mess). Browsers are pretty good at handling it. XML
parsers... less so.

> How badly formed is the file going to be?

It's not a file. It comes from an embedded web server on a device. I
could ask them to change it. I can hear the laughter already.

> If it is things like missing , then this could be managed with regex.
> Essentially you are going to have to do the clean up that Tidy could
> do for you.

Yeah :(
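
One more option that sidesteps the regex: PHP's DOM extension has an HTML parser (DOMDocument::loadHTML) that is far more forgiving than xml_parse, much as browsers are. A rough sketch, with a made-up sample standing in for the device's output:

```php
<?php
// Invented sample: no doctype, unquoted attribute value, unclosed <P> tags.
$html = '<HTML><BODY vlink=#336699><P>Status: OK<P>Uptime: 42</BODY></HTML>';

// Collect parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// The HTML parser lower-cases element names, so look up 'body', not 'BODY'.
$body = $doc->getElementsByTagName('body')->item(0);
echo $body->getAttribute('vlink'), "\n";

foreach ($doc->getElementsByTagName('p') as $p) {
    echo trim($p->textContent), "\n";
}
```

This swaps the SAX-style element handlers for a tree you walk afterwards, so it only helps if the pages are small enough to load in one go.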




Re: [PHP] Re: Simple XML - problem with errors

2010-07-08 Thread Gary .
On 7/8/10, Nisse Engström wrote:
> On Thu, 8 Jul 2010 17:15:02 +0200, Gary . wrote:
>
>> I'm guessing that those attributes must be quoted in XML and
>> should be in HTML (but patently aren't)?
>
> For that attribute value, it's a must in both cases.

Okay. Please tell L**! :)
