Hi,
We tried integrating Tika into our product in place of our own parsing library, and everything works well except for one problem. We run in an OSGi environment, and the Xerces library used by NekoHTML is causing us real classloading problems. So we decided to drop NekoHTML and use HTMLParser [1] instead. HTMLParser's SAX implementation has some bugs, though, so we subclassed it in Tika's HtmlParser class. If there is any interest, I can create a JIRA issue and attach the patch there.

Another minor problem we encountered is that the tests cannot be run without first copying the contents of src/main/resources to src/main/resources/org/apache/tika.

Daan