On Tue, 14 Feb 2012, Stephan Mühlstrasser wrote:
https://issues.apache.org/jira/browse/TIKA-527

Is there any documentation of the syntax of the configuration file available?

You could look at the code that processes the file, but the example in that JIRA ought to cover most use cases.


The problem is that using the proposed method does not work for me. Any use of the configuration file apparently sends Tika into an endless recursion, even without overriding a built-in parser in the configuration file.

Are you able to produce a unit test that shows the problem?


If I understand it correctly, the following configuration file should have the same effect as the built-in configuration:

$ cat tika-config.xml
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
</parsers>
</properties>

Ah, I'm not sure that's correct. I think you also need to specify the mimetypes and a detector. Looking at lines 145 to 172 of TikaConfig, it seems that you either get the defaults with no config, or specify them all with your own config.
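
As a quick sanity check before involving Tika at all, the parser entries in a config file like the one quoted above can be listed with the plain JDK XML APIs. This is only a rough sketch: the names ConfigParserLister and parserClasses are made up for illustration, this is not how TikaConfig itself reads the file, and it only looks at the <parser class="..."/> entries shown in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: lists the parser classes
// declared in a tika-config.xml style document.
public class ConfigParserLister {

    /** Returns the "class" attribute of every <parser> element, in document order. */
    public static List<String> parserClasses(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("parser");
            List<String> classes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                classes.add(((Element) nodes.item(i)).getAttribute("class"));
            }
            return classes;
        } catch (Exception e) {
            // Malformed XML or parser setup failure: report it as a bad argument.
            throw new IllegalArgumentException("Could not read config XML", e);
        }
    }

    public static void main(String[] args) {
        // Same content as the tika-config.xml quoted in this thread.
        String xml = "<properties><parsers>"
                + "<parser class=\"org.apache.tika.parser.DefaultParser\"/>"
                + "</parsers></properties>";
        System.out.println(parserClasses(xml));
    }
}
```

If the file is well-formed and the expected class names come back, the config syntax itself is unlikely to be the cause, which would point back at how the config is loaded.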

Nick
