Re: HTML parsing

Rajiv Mordani 14 Mar 2000 00:05:40 -0000

The XHTML parser from Sun is an internal-only version, which will be made
available to Apache as soon as the licensing issues are cleared.


- Rajiv

On Mon, 13 Mar 2000, Mike Pogue wrote:

> Note that we have a couple of people who would like to donate an 
> HTML parser to xml.apache.org, to be added to Xerces.  The ones I know of
> are:  
> 
>       ExOffice (extremely well tested, used for web spiders), 
>       Sun (I haven't seen it yet), and
>       IBM (I haven't seen it yet either).
> 
> I suspect that if people are interested in this, we ought to have people look
> at all three and figure out whether one is better, or whether they should be
> merged somehow before being checked in... assuming there's interest in this!
> 
> Any volunteers?
> 
> Mike
> 
> Cox Andy wrote:
> > 
> > If the HTML is not well-formed XML (which most is not), you are correct.
> > 
> > Andy
> > 
> > | -----Original Message-----
> > | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > | Sent: Monday, March 13, 2000 10:32 AM
> > | To: [EMAIL PROTECTED]
> > | Subject: HTML parsing
> > |
> > |
> > | From what I can tell, I cannot expect to be able to parse an HTML doc with
> > | the Xerces parser?  I was hoping to use the C++ SAX parser to find <IMG>
> > | tags, but I don't think I will be able to do that.  Can someone confirm
> > | this dreadful fact?
>
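Andy's point (an XML parser only handles HTML that happens to be well-formed XML) can be illustrated with a minimal sketch. It uses Python's standard-library xml.sax module rather than the Xerces-C++ SAX API the original poster asked about; that language swap and the helper names ImgFinder and find_images are illustrative assumptions, not part of Xerces. A SAX handler sees every img element in well-formed markup, but typical HTML (unclosed <p>, <img> with no end tag) makes the parser stop with a fatal error.

```python
import xml.sax


class ImgFinder(xml.sax.ContentHandler):
    """Hypothetical SAX handler that records the src attribute of each img element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def startElement(self, name, attrs):
        # XML is case-sensitive; compare lowercased names so that a
        # well-formed <IMG .../> is matched as well as <img .../>.
        if name.lower() == "img":
            self.sources.append(attrs.get("src"))


def find_images(markup):
    """Return img src values, or None if the markup is not well-formed XML."""
    handler = ImgFinder()
    try:
        xml.sax.parseString(markup.encode("utf-8"), handler)
    except xml.sax.SAXParseException:
        # The default error handler raises on fatal (well-formedness) errors.
        return None
    return handler.sources


well_formed = '<html><body><img src="a.gif"/><p><IMG src="b.gif"/></p></body></html>'
# Typical HTML: the <p> is never closed and <img> has no end tag.
tag_soup = '<html><body><p>Hello<img src="c.gif"></body></html>'

print(find_images(well_formed))  # ['a.gif', 'b.gif']
print(find_images(tag_soup))     # None
```

For real-world pages, a dedicated HTML parser (such as the ones Mike lists as donation candidates) is the better tool, since it is designed to recover from markup like tag_soup instead of rejecting it.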
