The IBM HTML parser code isn't public. I talked to the IBM group that wrote it: it's in Java, and I think it does less than what Sun and ExOffice have, so I don't think it's an option. When we get the HTML parser into the Java code base, it would be great to get it ported to work with the C++ parser!

Mike

Michael Mason wrote:
>
> There was lots of discussion on this, but not really much of a
> conclusion. We're currently using Tidy as a preprocessor for nasty
> random HTML pages, but it seems to be overkill. It does lots of stuff so
> we get perfect HTML out the other end, rather than just creating
> something that's well formed.
>
> I had a look at OpenXML, but it's Java and we need something in C or
> C++. I can't seem to find the IBM parser that was mentioned. Does anyone
> have further pointers they can give me?
>
> Cheers,
> Mike.
>
> --
> Mike Mason, Software Engineer
> XML Script Development Team     Office: 44-1865-203192
> http://www.xmlscript.org/       Mobile: 44-7050-288923