SAX requires well-formed input. SAX is equivalent to DOM in that both expose the same InfoSet: DOM as a tree, SAX as a stream of events. Every SAX startElement event must have a matching endElement event to balance it.
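To see the balance rule in action, here is a minimal sketch using Python's standard xml.sax module rather than Xerces-C++ (an assumption, made only so the example runs without extra libraries; ImgCollector and collect are made-up names for illustration). A self-closed <img/> should be reported as a start/end pair, while an HTML-style unclosed <img> should make the parser raise a well-formedness error.

```python
import xml.sax


class ImgCollector(xml.sax.ContentHandler):
    """Records every start/end event and the attributes of each IMG element."""

    def __init__(self):
        super().__init__()
        self.events = []   # ("start" | "end", element name) in document order
        self.images = []   # one attribute dict per IMG element seen

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name.lower() == "img":
            self.images.append(dict(attrs.items()))

    def endElement(self, name):
        self.events.append(("end", name))


def collect(markup):
    """Parse a markup string with SAX and return the populated handler."""
    handler = ImgCollector()
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return handler


well_formed = '<p><img src="a.png"/></p>'   # XML-style empty element
html_style = '<p><img src="a.png"></p>'     # HTML-style, no end tag for IMG

ok = collect(well_formed)
print(ok.events)
print(ok.images)

try:
    collect(html_style)
    html_error = None
except xml.sax.SAXParseException as exc:
    # The XML parser sees </p> while <img> is still open and gives up.
    html_error = exc
print(html_error)
```

The same handler logic carries over to any SAX binding; what changes for real-world HTML is the parser feeding the handler, not the handler itself.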
IMG is an empty element by HTML definition, which by SAX definition means startElement(IMG) followed immediately by endElement(IMG). HTML parsers know to handle IMG that way; XML parsers do not. There are a thousand other problems with HTML that will never allow an XML parser to read it, such as entity references (HTML defines a different set, and also tolerates a bare &), implied CDATA sections (SCRIPT/STYLE are CDATA-like, but without having to spell it out), elements with optional end tags (P, LI), boolean attributes, and so on.

arkin

[EMAIL PROTECTED] wrote:
>
> I was too, but this is the only thing I can come up with and I'm hoping
> someone might be able to correct me:
>
> The DOM parser is built off the SAX parser which in itself wouldn't need
> well-formedness but because the DOM parser needs proper end tags, etc. the
> SAX parser does also???? I was assuming that with the SAX parser I could
> simply handle startElement() and grab all attributes associated with IMG --
> this doesn't work though because my sample HTML doc is not well-formed --
> certain end tags are left out (which is acceptable in HTML land).
>
> -Heather
>
> -----Original Message-----
> From: Ward D. Cannon [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 13, 2000 10:57 AM
> To: [EMAIL PROTECTED]
> Subject: RE: HTML parsing
>
> Well, I hope it can be done. Couldn't you just trap elements that contain
> the tag IMG as you parse the instance? You know, like using startElement and
> endElement. I would be blown away if the SAX parser couldn't handle this.
> Regards,
> Ward
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 13, 2000 10:32 AM
> To: [EMAIL PROTECTED]
> Subject: HTML parsing
>
> From what I can tell, I cannot expect to be able to parse an HTML doc with
> the xerces parser? I was hoping to use the C++ SAX parser to find <IMG> tags
> but I don't think I will be able to do that. Can someone confirm this
> dreadful fact?
>
> Thanks,
> Heather Matthews
