There was lots of discussion on this, but not really much of a conclusion. We're currently using Tidy as a preprocessor for nasty random HTML pages, but it seems to be overkill: it does a lot of work to give us perfect HTML out the other end, when all we need is something that's well-formed.
I had a look at OpenXML, but it's Java and we need something in C or C++. I can't seem to find the IBM parser that was mentioned. Does anyone have further pointers they can give me?

Cheers,
Mike.

-- 
Mike Mason, Software Engineer
XML Script Development Team
http://www.xmlscript.org/
Office: 44-1865-203192
Mobile: 44-7050-288923
