On a side note, we only have the classloading problems when running on Java 5. Java 6 works just fine, so it seems the Java XML library has changed its implementation-loading mechanism. Also, I forgot to include the link to HTMLParser, so here it is:
[1] htmlparser.sourceforge.net

> -----Original Message-----
> From: Daan de Wit [mailto:d.de....@wis.nl]
> Sent: Monday, 23 March 2009 10:42
> To: tika-dev@lucene.apache.org
> Subject: classloading problems with Xerces
>
> Hi,
>
> We tried to integrate Tika in our product instead of using our own
> parsing library, and all goes well except for one problem. We use an OSGi
> environment, and the Xerces library used by NekoHTML is causing us real
> problems with classloading. So we decided to ditch NekoHTML and use
> HTMLParser [1] instead. HTMLParser's SAX implementation has some bugs,
> though, so we sub-classed it in Tika's HtmlParser class. If there is any
> interest, I can create a JIRA issue and attach the patch there.
>
> Another minor problem we encountered is that the tests cannot be run
> without first copying the contents of src/main/resources to
> src/main/resources/org/apache/tika.
>
> Daan
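Regarding the Java 5 classloading issue: one workaround that is often used for JAXP lookups inside OSGi is to point the thread context class loader (TCCL) at the right loader while the factory is created. That this fixes our case is an assumption on my part. We have not verified it against Tika. Here is a minimal sketch. `TcclSaxFactory` and `newFactory` are hypothetical names and are not part of Tika or HTMLParser:

```java
import javax.xml.parsers.SAXParserFactory;

/**
 * Hypothetical helper (not part of Tika): creates a JAXP SAXParserFactory
 * while the thread context class loader is temporarily set to a chosen
 * loader, then restores the caller's original context class loader.
 */
public final class TcclSaxFactory {

    private TcclSaxFactory() {
    }

    public static SAXParserFactory newFactory(ClassLoader loader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        try {
            // Assumption: JAXP's META-INF/services lookup consults the TCCL,
            // so point it at the loader that can see the desired parser.
            current.setContextClassLoader(loader);
            return SAXParserFactory.newInstance();
        } finally {
            // Always restore the caller's TCCL, even if the lookup fails.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory =
                newFactory(TcclSaxFactory.class.getClassLoader());
        // Print which implementation class JAXP actually selected.
        System.out.println(factory.getClass().getName());
    }
}
```

Inside a bundle, you would presumably pass a class loader from that bundle, so that the parser packaged with it is the one found. On Java 6 and later, the `SAXParserFactory.newInstance(String factoryClassName, ClassLoader classLoader)` overload lets you name the implementation directly. That overload does not exist on Java 5.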