Heather,

You can look into the HTML Tidy utility: http://www.w3.org/People/Raggett/tidy/
Partially supported by HP, BTW.

Thanks,
Dmitry Volpyansky

----- Original Message -----
From: "Cox Andy" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, March 13, 2000 1:07 PM
Subject: RE: HTML parsing

> If the HTML is not well-formed XML (which most is not), you are correct.
>
> Andy
>
> | -----Original Message-----
> | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> | Sent: Monday, March 13, 2000 10:32 AM
> | To: [EMAIL PROTECTED]
> | Subject: HTML parsing
> |
> | For what I can tell, I cannot expect to be able to parse an HTML doc with
> | the xerces parser? I was hoping to use the C++ SAX parser to find <IMG>
> | tags but I don't think I will be able to do that. Can someone confirm this
> | dreadful fact?
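
To illustrate the point made in the thread, here is a minimal sketch of why a SAX parser rejects typical HTML but can find the <IMG> tags once the markup has been cleaned up into well-formed XML. It makes several assumptions: Python's standard xml.sax module stands in for the Xerces C++ SAX parser, the names ImgCollector, find_img_sources and try_find_img_sources are hypothetical, and the "tidied" string is written by hand to resemble the kind of output a cleanup tool such as HTML Tidy aims for (Tidy itself is not invoked here).

```python
import xml.sax


class ImgCollector(xml.sax.ContentHandler):
    """SAX handler that records the src attribute of every img element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def startElement(self, name, attrs):
        # XML is case-sensitive, so normalise the tag name before comparing.
        if name.lower() == "img":
            self.sources.append(attrs.get("src"))


def find_img_sources(markup):
    """Parse markup as XML; raises SAXParseException if it is not well-formed."""
    handler = ImgCollector()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return handler.sources


def try_find_img_sources(markup):
    """Same as find_img_sources, but returns None when parsing fails."""
    try:
        return find_img_sources(markup)
    except xml.sax.SAXParseException:
        return None


# Typical HTML: <br> and <img> are never closed, so this is not well-formed XML.
LOOSE_HTML = '<html><body><p>Logo:<br><img src="logo.gif"></body></html>'

# Hand-written well-formed equivalent (assumed to resemble tidied output).
TIDIED_XHTML = '<html><body><p>Logo:<br /><img src="logo.gif" /></p></body></html>'

if __name__ == "__main__":
    print(try_find_img_sources(LOOSE_HTML))
    print(try_find_img_sources(TIDIED_XHTML))
```

The same two-step idea (clean up the HTML first, then run the SAX parser over the result) should carry over to the C++ parser, although the handler class and API calls will differ.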