I was too, but this is the only thing I can come up with, and I'm hoping someone might be able to correct me:
The DOM parser is built on top of the SAX parser. SAX by itself wouldn't seem to need well-formedness, but because the DOM parser needs proper end tags, etc., does the SAX parser require them too? I was assuming that with the SAX parser I could simply handle startElement() and grab all attributes associated with IMG. This doesn't work, though, because my sample HTML doc is not well-formed: certain end tags are left out (which is acceptable in HTML land).

-Heather

-----Original Message-----
From: Ward D. Cannon [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 13, 2000 10:57 AM
To: [EMAIL PROTECTED]
Subject: RE: HTML parsing

Well, I hope it can be done. Couldn't you just trap the elements with the tag IMG as you parse the instance? You know, using startElement and endElement. I would be blown away if the SAX parser couldn't handle this.

Regards,
Ward

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 13, 2000 10:32 AM
To: [EMAIL PROTECTED]
Subject: HTML parsing

From what I can tell, I cannot expect to be able to parse an HTML doc with the xerces parser? I was hoping to use the C++ SAX parser to find <IMG> tags, but I don't think I will be able to do that. Can someone confirm this dreadful fact?

Thanks,
Heather Matthews
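
If the goal is only to pull the attributes off IMG tags, one possible workaround is to skip the XML parser and scan the text directly, since a scanner like this does not care whether end tags are present. The following is a minimal sketch under that assumption, not anything taken from the Xerces API: the names `findImgTags` and `Attributes` are made up for illustration, and the code does not handle comments, script content, or character entities.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Attribute name (lowercased) -> attribute value for one IMG tag.
using Attributes = std::map<std::string, std::string>;

// Hypothetical helper (not part of Xerces): scans raw HTML for <IMG ...> tags
// without requiring well-formed input. Comments and <script> content are not
// treated specially, so an IMG tag inside them would also be reported.
std::vector<Attributes> findImgTags(const std::string& html) {
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    auto lower = [](char c) { return static_cast<char>(std::tolower(static_cast<unsigned char>(c))); };

    std::vector<Attributes> result;
    const std::size_t n = html.size();
    std::size_t i = 0;
    while ((i = html.find('<', i)) != std::string::npos) {
        ++i;  // step past '<'

        // Read the tag name; end tags such as </P> yield an empty name.
        std::string name;
        while (i < n && std::isalnum(static_cast<unsigned char>(html[i]))) {
            name += lower(html[i++]);
        }
        if (name != "img") continue;

        // Read attributes up to the closing '>' (or the end of the input).
        Attributes attrs;
        while (i < n && html[i] != '>') {
            if (isSpace(html[i]) || html[i] == '/') { ++i; continue; }

            std::string key;
            while (i < n && html[i] != '=' && html[i] != '>' &&
                   html[i] != '/' && !isSpace(html[i])) {
                key += lower(html[i++]);
            }
            while (i < n && isSpace(html[i])) ++i;

            std::string value;
            if (i < n && html[i] == '=') {
                ++i;
                while (i < n && isSpace(html[i])) ++i;
                if (i < n && (html[i] == '"' || html[i] == '\'')) {
                    const char quote = html[i++];
                    while (i < n && html[i] != quote) value += html[i++];
                    if (i < n) ++i;  // step past the closing quote
                } else {
                    // Unquoted value: runs until whitespace or '>'.
                    while (i < n && !isSpace(html[i]) && html[i] != '>') value += html[i++];
                }
            }
            if (!key.empty()) attrs[key] = value;
        }
        result.push_back(attrs);
    }
    return result;
}
```

Calling `findImgTags` on the contents of the HTML file returns one map per IMG tag, with attribute names lowercased, so `tags[0]["src"]` would hold the first image's source. It works on input where end tags such as `</P>` are left out because it never tries to match start tags against end tags.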