I propose we wait and see what Sun has before we pull the OpenXML one in. After talking with the IBM folks in Tokyo, I suspect that the IBM HTML parser will *not* be suitable for our needs. At this point, I think OpenXML is better, because it supports HTML 4.0 instead of 3.2 and handles badly formed input better than the IBM one.
I haven't seen the Sun one, though, so I think we should take a look before we start checking in the OpenXML one. Let's look at all the possibilities, before we choose one.

Mike

Pierpaolo Fumagalli wrote:
>
> Mike Pogue wrote:
> >
> > Note that we have a couple of people who would like to donate an
> > HTML parser to xml.apache.org, to be added to Xerces. The ones I know of
> > are:
> >
> > ExOffice (extremely well tested, used for web spiders),
> > Sun (I haven't seen it yet), and
> > IBM (I haven't seen it yet either).
> >
> > I suspect that if people are interested in this, we ought to have people
> > look at all three, and figure out whether one is better, or whether they
> > should be merged somehow before being checked in...assuming there's
> > interest in this!
> >
> > Any volunteers?
>
> If Assaf doesn't do it... I'll make the package change and pull OpenXML
> into the CVS...
>
> Pier
>
> --
> ----------------------------------------------------------------------
> pier: stable structure erected over water to allow docking of seacraft
> <mailto:[EMAIL PROTECTED]> <http://www.betaversion.org/~pier/>
> ----------------------------------------------------------------------
