On Mon, 21 Sep 2015, Brian Young wrote:
> I originally wanted to avoid having to specify a new config because I
> thought that supplying my own Tika XML config meant that I had to
> redefine everything that would be in the default file. However, after
> some testing, it appears that, as in your example, the default Tika
> config is simply modified with the single exclusion that is provided in
> the custom config?
If you give Tika no config, it'll find what it can to present a sensible
default.
If you give Tika an explicit config, it'll use that, and won't go hunting
for anything else.
If you ask the Tika app, it'll spit out a static Tika config based on the
current dynamic one, ready for you to customise. (There's talk of some
improvements / new options here.)
If you give Tika a config with elements missing, it'll use the defaults for
those. You can also say "default except" and then define additional
excludes / includes on top of the default.
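A rough sketch of what a "default except" config looks like, following the
example on the configuring page: keep DefaultParser (the full default set)
and name the parser to leave out with parser-exclude. It's built here with
the JDK's DOM API only so it runs without a Tika jar; writing the XML by hand
is just as good. The class name DefaultExceptConfig and the file name
tika-config.xml are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds a Tika config that keeps the default
// parsers but excludes one of them.
public class DefaultExceptConfig {

    static String build(String excludedParserClass) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

        Element properties = doc.createElement("properties");
        doc.appendChild(properties);
        Element parsers = doc.createElement("parsers");
        properties.appendChild(parsers);

        // "default": start from DefaultParser, i.e. everything Tika would
        // have found on its own
        Element parser = doc.createElement("parser");
        parser.setAttribute("class", "org.apache.tika.parser.DefaultParser");
        parsers.appendChild(parser);

        // "except": the one parser to leave out; nothing else needs restating
        Element exclude = doc.createElement("parser-exclude");
        exclude.setAttribute("class", excludedParserClass);
        parser.appendChild(exclude);

        // Serialise the DOM back to indented XML text
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
                build("org.apache.tika.parser.executable.ExecutableParser"));
        // Save the output as e.g. tika-config.xml, then with Tika on the
        // classpath:
        //   TikaConfig config = new TikaConfig(new File("tika-config.xml"));
        //   Parser parser = new AutoDetectParser(config);
    }
}
```

Everything not mentioned in that file (detectors, other parsers) falls back
to the defaults, which is the behaviour you saw in your testing.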
The Tika site has more information on this:
http://tika.apache.org/1.10/configuring.html
Nick