RE: HTML parsing

heather_matthews 13 Mar 2000 21:47:21 -0000

I was too, but this is the only thing I can come up with, and I'm hoping
someone might be able to correct me:


The DOM parser is built on top of the SAX parser, which by itself wouldn't
seem to need well-formedness, but because the DOM parser needs proper end
tags, etc., the SAX parser requires them too?  I was assuming that with the
SAX parser I could simply handle startElement() and grab all attributes
associated with IMG -- this doesn't work, though, because my sample HTML doc
is not well-formed -- certain end tags are left out (which is acceptable in
HTML land).
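
If the parser really does insist on well-formed input, one fallback would be
to skip XML parsing altogether and scan the raw text for IMG tags. Below is a
minimal sketch of that idea in plain standard C++. It is a different technique
from SAX and does not use Xerces at all; the names findImgTags, AttrMap,
toLower and isSpace are made up for illustration. It assumes ASCII-compatible
markup, and it will also pick up IMG tags that sit inside comments or scripts.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical names for illustration; not part of Xerces or any SAX API.
using AttrMap = std::map<std::string, std::string>;

// Lower-case an ASCII string so tag and attribute names compare
// case-insensitively (IMG, img and Img are all the same tag in HTML).
static std::string toLower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

static bool isSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Scan raw HTML text and return the attributes of every <IMG ...> tag found.
// Nesting and end tags are never checked, so omitted end tags do not matter.
std::vector<AttrMap> findImgTags(const std::string& html) {
    std::vector<AttrMap> result;
    const std::size_t n = html.size();
    std::size_t i = 0;
    while ((i = html.find('<', i)) != std::string::npos) {
        ++i;  // step past '<'
        const std::size_t nameStart = i;
        while (i < n && std::isalnum(static_cast<unsigned char>(html[i]))) ++i;
        if (toLower(html.substr(nameStart, i - nameStart)) != "img") continue;

        AttrMap attrs;
        while (i < n && html[i] != '>') {
            // Skip separators between attributes, including a trailing '/'.
            if (isSpace(html[i]) || html[i] == '/') { ++i; continue; }

            // Attribute name runs up to whitespace, '=', '/' or '>'.
            const std::size_t attrStart = i;
            while (i < n && !isSpace(html[i]) && html[i] != '=' &&
                   html[i] != '>' && html[i] != '/') ++i;
            const std::string name = toLower(html.substr(attrStart, i - attrStart));
            while (i < n && isSpace(html[i])) ++i;

            std::string value;  // stays empty for attributes without a value
            if (i < n && html[i] == '=') {
                ++i;
                while (i < n && isSpace(html[i])) ++i;
                if (i < n && (html[i] == '"' || html[i] == '\'')) {
                    const char quote = html[i++];
                    const std::size_t valueStart = i;
                    while (i < n && html[i] != quote) ++i;
                    value = html.substr(valueStart, i - valueStart);
                    if (i < n) ++i;  // step past the closing quote
                } else {
                    // Unquoted value runs up to whitespace or '>'.
                    const std::size_t valueStart = i;
                    while (i < n && !isSpace(html[i]) && html[i] != '>') ++i;
                    value = html.substr(valueStart, i - valueStart);
                }
            }
            attrs[name] = value;
        }
        result.push_back(attrs);
    }
    return result;
}
```

Calling findImgTags() on the contents of the HTML file gives back one map per
IMG tag, keyed by lower-cased attribute name, so the SRC of the first image
would be result[0]["src"]. This is only a tolerant scan, not a real HTML
parser, but it does not care whether end tags are present.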

-Heather

-----Original Message-----
From: Ward D. Cannon [mailto:[EMAIL PROTECTED]
Sent: Monday, March 13, 2000 10:57 AM
To: [EMAIL PROTECTED]
Subject: RE: HTML parsing




Well, I hope it can be done. Couldn't you just trap elements that contain
the tag IMG as you parse the instance? You know, like using startElement and
endElement. I would be blown away if the SAX parser couldn't handle this.
Regards,
Ward

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, March 13, 2000 10:32 AM
To: [EMAIL PROTECTED]
Subject: HTML parsing


From what I can tell, I cannot expect to be able to parse an HTML doc with
the Xerces parser?  I was hoping to use the C++ SAX parser to find <IMG>
tags, but I don't think I will be able to do that.  Can someone confirm this
dreadful fact?


Thanks,
Heather Matthews
