The XHTML parser from Sun is currently an internal-only version; it will be made available to Apache as soon as the licensing issues are cleared.
- Rajiv

On Mon, 13 Mar 2000, Mike Pogue wrote:

> Note that we have a couple of people who would like to donate an
> HTML parser to xml.apache.org, to be added to Xerces. The ones I know of
> are:
>
> ExOffice (extremely well tested, used for web spiders),
> Sun (I haven't seen it yet), and
> IBM (I haven't seen it yet either).
>
> I suspect that if people are interested in this, we ought to have people
> look at all three, and figure out whether one is better, or whether they
> should be merged somehow before being checked in...assuming there's
> interest in this!
>
> Any volunteers?
>
> Mike
>
> Cox Andy wrote:
> >
> > If the HTML is not well-formed XML (which most is not), you are correct.
> >
> > Andy
> >
> > | -----Original Message-----
> > | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > | Sent: Monday, March 13, 2000 10:32 AM
> > | To: [EMAIL PROTECTED]
> > | Subject: HTML parsing
> > |
> > |
> > | For what I can tell, I cannot expect to be able to parse an HTML doc with
> > | the xerces parser? I was hoping to use the C++ SAX parser to find <IMG>
> > | tags but I don't think I will be able to do that. Can someone confirm
> > | this dreadful fact?
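
To illustrate the well-formedness point in the thread, here is a minimal sketch. It uses Python's standard xml.sax module rather than the Xerces C++ SAX parser, purely to keep the example short. The names ImgCollector and find_images and the two sample documents are made up for illustration. The behavior should be analogous with any conforming XML parser: the well-formed document yields its image tags, and the typical "tag soup" HTML is rejected.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ImgCollector(ContentHandler):
    """Hypothetical handler: records the src attribute of each <img> element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def startElement(self, name, attrs):
        # XML names are case-sensitive; lower-case the name so <IMG> matches too.
        if name.lower() == "img":
            self.sources.append(attrs.get("src"))


def find_images(markup):
    """Return the img src values, or None if markup is not well-formed XML."""
    handler = ImgCollector()
    try:
        xml.sax.parseString(markup.encode("utf-8"), handler)
    except xml.sax.SAXParseException:
        # A conforming XML parser must stop at the first well-formedness error.
        return None
    return handler.sources


# Sample inputs (assumptions for this sketch, not taken from the thread).
well_formed = '<html><body><img src="a.png"/><p>text</p></body></html>'
tag_soup = '<html><body><IMG SRC=a.png><p>text</body></html>'

print(find_images(well_formed))  # ['a.png']
print(find_images(tag_soup))     # None
```

The second document fails on the unquoted attribute value before the unclosed tags are even reached, which is why a dedicated HTML parser is needed for real-world pages.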
