On Mon, 21 Sep 2015, Brian Young wrote:
I originally wanted to avoid having to specify a new config because I thought that supplying my own tika XML config meant that I had to redefine everything that would be in the default file. However after some testing it appears that, as in your example, the default tika config is simply modified with the single exclusion that is provided in the custom config?

If you give Tika no config, it'll find what it can on the classpath to present a sensible default.

If you give Tika an explicit config, it'll use that, and hunt for nothing.

If you ask the Tika app, it'll spit out a static Tika config based on the current dynamic one, ready for you to customise. (There's talk of some improvements / new options here)

If you give Tika a config with elements missing, it'll use the defaults for those. You can also say "default except" and then define additional excludes / includes on top of the default.
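
As a rough sketch of the "default except" case, a config along these lines keeps every default parser apart from one. The excluded class here (ExecutableParser) is only an illustrative choice, not necessarily the one from your config, and because the file has no detector or other sections, those stay at their defaults:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Start from the default parser set... -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- ...minus this one parser (illustrative choice) -->
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
  </parsers>
</properties>
```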

There's more information on the site on this:
http://tika.apache.org/1.10/configuring.html

Nick
