Added: https://issues.apache.org/jira/browse/TIKA-2491
To be clear, it really works well outside of Nutch.

Thanks again!
Markus

-----Original message-----
> From:Allison, Timothy B. <[email protected]>
> Sent: Friday 3rd November 2017 16:40
> To: [email protected]
> Subject: RE: Using TikaConfig troubles
>
> Ugh. Sorry. I'll take a look. Can you share your custom config file? This
> sounds like a bug, so please hang it on a new issue. ☹
>
> -----Original Message-----
> From: Markus Jelsma [mailto:[email protected]]
> Sent: Friday, November 3, 2017 11:12 AM
> To: [email protected]
> Subject: Using TikaConfig troubles
>
> Hello,
>
> I need to use a custom tika-config.xml in Nutch, which has support for it but
> i can't get it to work.
>
> This is how Nutch gets the parser:
>   Parser parser = tikaConfig.getParser(MediaType.parse(mimeType));
>
> When no custom config is specified config is:
>   new TikaConfig(this.getClass().getClassLoader());
>
> When i specify a custom config, it is:
>   tikaConfig = new TikaConfig(conf.getResource(customConfFile));
>
> getParser always returns null with a custom config file. There are no errors
> or exceptions. The config is fine, it fixed the encoding problem in a parser
> outside of Nutch (thanks again Timothy) but i need to get it to work in Nutch
> too.
>
> Our external project does:
>   AutoDetectParser parser = new AutoDetectParser(tikaConfig);
>   parser.parse(..);
>
> and it just works! If i do this in Nutch, however, nothing is passed through
> the content handlers, the parser result is completely empty? HUH?!?
>
> Any tips would be great!
>
> Many thanks,
> Markus
