Hi,
In Nutch we have a copy of tika-core. But with just that lib we also have
access to the Tika parser API from the other module. How does this all work?
I have had confusing results in the past (and now).
Right now we've added a class to org.apache.tika.parser.html, but we get a
ClassNotFound with a newly compiled Tika. Our code compiles when we add
tika-parsers to the classpath, but when we run it we get an obscure exception:
Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.tika.parser.dwg.DWGParser
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at sun.misc.Service$LazyIterator.next(Service.java:271)
    at org.apache.nutch.parse.tika.TikaConfig.<init>(TikaConfig.java:149)
    at org.apache.nutch.parse.tika.TikaConfig.getDefaultConfig(TikaConfig.java:211)
    at org.apache.nutch.parse.tika.TikaParser.setConf(TikaParser.java:255)
    at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:162)
    at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:132)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
    at org.apache.nutch.parse.ParserChecker.run(ParserChecker.java:101)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:138)
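As far as I understand, "Could not initialize class" means the class was found but its static initializer failed on an earlier attempt, so the real cause is probably hidden. To narrow this down, a small check along the following lines could show which jar each class is actually loaded from when run with the same classpath. This is only a sketch: the ClassOrigin class and its locate method are hypothetical helper names, not part of Nutch or Tika.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /** Returns the location a class is loaded from, or a note on why loading failed. */
    static String locate(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // classes loaded by the bootstrap loader have no code source
                return "bootstrap/JDK";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not loadable: " + e;
        } catch (LinkageError e) {
            // covers NoClassDefFoundError, e.g. a missing dependency of the class
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Default names are taken from the stack trace above; pass others as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.tika.parser.dwg.DWGParser",
            "org.apache.nutch.parse.tika.TikaConfig"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If DWGParser resolves to an unexpected jar, or is reported as not loadable, that would at least tell us whether the problem is the classpath or the initialization of the class itself.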
When we previously patched Tika in the core module, all went perfectly well, but
patching the parser module and getting it all compiled into tika-core.jar seems
tricky. Any advice? What am I missing? How do the parser libs end up in the
core jar?
Thanks