On Tue, 14 Feb 2012, Stephan Mühlstrasser wrote:
https://issues.apache.org/jira/browse/TIKA-527
Is there any documentation of the syntax of the configuration file
available?
You could look at the code that processes the file, but the example in that
JIRA ought to cover most use cases.
The problem is that using the proposed method does not work for me. Any
use of the configuration file apparently sends Tika into an endless
recursion, even without overriding a built-in parser in the
configuration file.
Are you able to produce a unit test that shows the problem?
If I understand it correctly, the following configuration file should have
the same effect as the built-in configuration:
$ cat tika-config.xml
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
</parsers>
</properties>
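A minimal, self-contained reproduction might look something like the sketch below. It assumes a Tika jar on the classpath. The class name TikaConfigRepro, the temporary file and the "hello world" sample are hypothetical. It writes the configuration above to a temporary file, loads it with TikaConfig(File), and parses a short plain-text input through an AutoDetectParser built from that config. If the configuration really sends Tika into an endless recursion, the parse call should blow up (most likely with a StackOverflowError) instead of returning the text. Behaviour may well differ between Tika versions.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical reproduction harness; not part of Tika itself.
public class TikaConfigRepro {

    // The configuration from the message: only DefaultParser, no mimetypes or detector.
    private static final String CONFIG =
        "<properties>\n"
        + "  <parsers>\n"
        + "    <parser class=\"org.apache.tika.parser.DefaultParser\"/>\n"
        + "  </parsers>\n"
        + "</properties>\n";

    // Writes CONFIG to a temp file, loads it, and parses a short text sample.
    // Returns the extracted body text.
    public static String run() throws Exception {
        File configFile = File.createTempFile("tika-config", ".xml");
        configFile.deleteOnExit();
        Files.write(configFile.toPath(), CONFIG.getBytes(StandardCharsets.UTF_8));

        TikaConfig config = new TikaConfig(configFile);
        AutoDetectParser parser = new AutoDetectParser(config);

        byte[] sample = "hello world".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(sample)) {
            BodyContentHandler handler = new BodyContentHandler();
            // An endless recursion in the configured parser chain would surface here.
            parser.parse(in, handler, new Metadata(), new ParseContext());
            return handler.toString().trim();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

Turned into a JUnit test, the check would simply be that run() returns the sample text without throwing.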
Ah, I'm not sure that's correct. I think you also need to specify the mimetypes
and a detector. Looking at lines 145 to 172 of TikaConfig, it seems that
you either get all the defaults with no config, or have to specify all of
them in your own config.
Nick