On Tue, 25 Aug 2015, Mikhail Titov wrote:
The following will break automatic parser calling for text/toa5 in 1.10,
but not in 1.9:
,----[ tika config xml ]
| <?xml version="1.0" encoding="UTF-8"?>
| <properties>
|   <parsers>
|     <!-- <parser class="org.apache.tika.parser.DefaultParser"> -->
|     <!--   <mime>text/toa5</mime> -->
|     <!-- </parser> -->
|     <parser class="my.tika.parser.ExcelParser">
|       <parser class="org.apache.tika.parser.DefaultParser" />
|     </parser>
|   </parsers>
| </properties>
You probably shouldn't be defining additional mimetypes on DefaultParser.
Instead, give it child parsers that support those additional mimetypes. If
there's no child parser registered for a given mimetype, then binding
another mimetype to DefaultParser won't help.
You probably shouldn't be wrapping your own parser around DefaultParser in
config. If you really need to do that, e.g. to decorate it somehow, do it
in code.
If you want DefaultParser and your own one, do something like:
<parsers>
  <parser class="org.apache.tika.parser.DefaultParser" />
  <parser class="my.tika.parser.ExcelParser">
    <!-- any mimetypes special to this -->
  </parser>
</parsers>
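Filled in, a complete config along those lines might look roughly like the
sketch below. It assumes your custom parser is the one that should be given
text/toa5; that pairing is only a guess based on the commented-out lines in
your config, so swap in whatever type(s) your parser really handles:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- DefaultParser stays top-level and unwrapped, with no extra mimetypes bound to it -->
    <parser class="org.apache.tika.parser.DefaultParser" />
    <!-- Assumption: text/toa5 is the type this custom parser should be bound to -->
    <parser class="my.tika.parser.ExcelParser">
      <mime>text/toa5</mime>
    </parser>
  </parsers>
</properties>
```

The point of this layout is that the two parsers sit side by side under
<parsers> rather than one being nested inside the other.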
Nick