Hi,

In Nutch we have a copy of tika-core. Yet with just that library we also have
access to the Tika parser API from the other module. How does this all work? I
have had confusing results in the past (and now).

Right now we've added a class to org.apache.tika.parser.html, but with a newly
compiled Tika we get a ClassNotFound error. Our code compiles when we add
tika-parsers to the classpath, but at runtime we get an obscure exception:

Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at sun.misc.Service$LazyIterator.next(Service.java:271)
        at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
        at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
        at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:255)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
        at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
        at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)

When we previously patched Tika in the core module, everything went perfectly
well, but patching the parser module and getting it all compiled into
tika-core.jar seems tricky. Any advice? What am I missing? How do the parser
libs end up in the core jar?

Thanks
