Heather,

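You can look into the HTML Tidy utility: http://www.w3.org/People/Raggett/tidy/

It can clean up HTML and write it out as well-formed XML, which Xerces should then be able to parse.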

Partially supported by HP, BTW.
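
To illustrate why the Tidy step matters, here is a minimal C++ sketch of the kind of tag-balance rule an XML parser enforces. It is only an illustration under simplifying assumptions: tagsBalanced is a hypothetical helper, not part of Xerces or Tidy, and it ignores most of what a real parser checks (for example '>' inside attribute values or comments, entities, and attribute syntax).

```cpp
#include <cctype>
#include <cstddef>
#include <stack>
#include <string>

// Hypothetical helper: returns true if every start tag is closed, in order,
// by an end tag with exactly the same name (XML names are case-sensitive).
// This is a simplified stand-in for a well-formedness check, not a parser.
bool tagsBalanced(const std::string& doc) {
    std::stack<std::string> open;
    std::size_t pos = 0;
    while ((pos = doc.find('<', pos)) != std::string::npos) {
        std::size_t end = doc.find('>', pos);
        if (end == std::string::npos) return false;    // unterminated tag
        std::string tag = doc.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;                 // "<>" is never valid
        if (tag[0] == '!' || tag[0] == '?') continue;  // declarations and PIs are skipped
        if (tag.back() == '/') continue;               // empty element, e.g. <img ... />

        // The element name runs up to the first whitespace character.
        bool closing = (tag[0] == '/');
        std::size_t start = closing ? 1 : 0;
        std::size_t stop = start;
        while (stop < tag.size() &&
               !std::isspace(static_cast<unsigned char>(tag[stop]))) {
            ++stop;
        }
        std::string name = tag.substr(start, stop - start);

        if (!closing) {
            open.push(name);                           // remember the open element
            continue;
        }
        if (open.empty() || open.top() != name) return false;  // mismatched end tag
        open.pop();
    }
    return open.empty();                               // nothing may be left open
}
```

With this check, a typical HTML fragment such as <p><IMG SRC="logo.gif"></p> is rejected because the IMG element is never closed, while the tidied form <p><img src="logo.gif" /></p> is accepted.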

Thanks,

Dmitry Volpyansky

----- Original Message -----
From: "Cox Andy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, March 13, 2000 1:07 PM
Subject: RE: HTML parsing


> If the HTML is not well-formed XML (which most is not), you are correct.
>
> Andy
>
> | -----Original Message-----
> | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> | Sent: Monday, March 13, 2000 10:32 AM
> | To: [EMAIL PROTECTED]
> | Subject: HTML parsing
> |
> |
> | From what I can tell, I cannot expect to be able to parse an HTML doc with
> | the Xerces parser?  I was hoping to use the C++ SAX parser to find <IMG>
> | tags but I don't think I will be able to do that.  Can someone confirm this
> | dreadful fact?
>
>
