Re: HTML parsing

Mike Pogue 17 Mar 2000 16:54:04 -0000

I propose we wait and see what Sun has, before we pull the OpenXML one in.

After talking with the IBM folks in Tokyo, I suspect that the IBM HTML
parser will *not* be suitable for our needs (at this point, I think
OpenXML is better, because it supports HTML 4.0, instead of 3.2, and it
handles the badly formed case better than the IBM one).


I haven't seen the Sun one, though, so I think we should take a look
before we start checking in the OpenXML one.  Let's look at all the
possibilities, before we choose one.

Mike

Pierpaolo Fumagalli wrote:
> 
> Mike Pogue wrote:
> >
> > Note that we have a couple of people who would like to donate an
> > HTML parser to xml.apache.org, to be added to Xerces.  The ones I know of
> > are:
> >
> >         ExOffice (extremely well tested, used for web spiders),
> >         Sun (I haven't seen it yet), and
> >         IBM (I haven't seen it yet either).
> >
> > I suspect that if people are interested in this, we ought to have
> > people look at all three, and figure out whether one is better, or
> > whether they should be merged somehow before being checked
> > in...assuming there's interest in this!
> >
> > Any volunteers?
> 
> If Assaf doesn't do it... I'll make the package change and pull OpenXML
> into the CVS...
> 
>         Pier
> 
> --
> ----------------------------------------------------------------------
> pier: stable structure erected over water to allow docking of seacraft
> <mailto:[EMAIL PROTECTED]>      <http://www.betaversion.org/~pier/>
> ----------------------------------------------------------------------
