The IBM HTML parser code isn't public. I talked to the IBM group who wrote
it; it's in Java, and I think it does less than what Sun and ExOffice have.
So I don't think it's an option. When we get the HTML parser into the Java
code base, it would be great to get it ported to work with the C++ parser!

Mike

Michael Mason wrote:
> 
> There was lots of discussion on this, but not really much of a
> conclusion. We're currently using Tidy as a preprocessor for nasty
> random HTML pages, but it seems to be overkill. It does lots of stuff so
> we get perfect HTML out the other end, rather than just creating
> something that's well formed.
> 
> I had a look at OpenXML, but it's Java and we need something in C or
> C++. I can't seem to find the IBM parser that was mentioned. Does anyone
> have further pointers they can give me?
> 
> Cheers,
> Mike.
> 
> --
> Mike Mason, Software Engineer
> XML Script Development Team                    Office: 44-1865-203192
> http://www.xmlscript.org/                      Mobile: 44-7050-288923
