Hi,

 

We tried integrating Tika into our product to replace our own parsing
library. Everything works well except for one problem: we run in an OSGi
environment, and the Xerces library used by NekoHTML causes serious
classloading problems there. So we decided to drop NekoHTML and use
HTMLParser [1] instead. HTMLParser's SAX implementation has some bugs,
though, so we subclassed it in Tika's HtmlParser class. If there is any
interest, I can create a JIRA issue and attach the patch there.

Another minor problem we encountered is that the tests cannot be run
without first copying the contents of src/main/resources to
src/main/resources/org/apache/tika.

 

Daan
