Good idea! If I do that, I can extract pieces with XPath and forgo J parsing altogether. In fact, I have a shell script which does exactly that, somewhere....
Since you are willing to forgo J for this part, I will mention that I have always had excellent luck with Tidy, which originated at the W3C and is now independently maintained. The output will be compliant HTML. I have used it to convert random HTML into XHTML and then used an XML parser to turn it into a DOM. It fixes a surprisingly wide range of markup errors.

http://tidy.sourceforge.net/

 - michael

--
 - michael dykman
 - [EMAIL PROTECTED]

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
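
For illustration, the pipeline described above (Tidy to XHTML, then an XML parser, then XPath-style extraction) might look roughly like the following Python sketch. It assumes the `tidy` command-line program is installed and on the PATH. The function names `tidy_to_xhtml` and `link_targets` are made up for this example, and pulling out link targets is just one arbitrary choice of what to extract.

```python
import subprocess
import xml.etree.ElementTree as ET

# Tidy's XHTML output puts every element in the XHTML namespace.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def tidy_to_xhtml(html):
    """Run the external `tidy` program to turn messy HTML into XHTML.

    Assumes `tidy` is on the PATH. -numeric writes entities as numeric
    references so a plain XML parser will not choke on names like &nbsp;.
    """
    result = subprocess.run(
        ["tidy", "-quiet", "-asxhtml", "-numeric", "--show-warnings", "no"],
        input=html,
        capture_output=True,
        text=True,
    )
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return result.stdout


def link_targets(xhtml):
    """Return the href of every <a> element, using ElementTree's XPath subset."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.findall(".//x:a[@href]", XHTML_NS)]
```

With Tidy installed, something like `link_targets(tidy_to_xhtml(raw_html))` would chain the two steps. ElementTree only supports a limited subset of XPath, so more involved queries would need a fuller XPath engine.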
