On Tue, 25 Aug 2015, Mikhail Titov wrote:
The following will break automatic parser calling for text/toa5 in 1.10
but not in 1.9

,----[ tika config xml ]
| <?xml version="1.0" encoding="UTF-8"?>
| <properties>
|       <parsers>
| <!--               <parser class="org.apache.tika.parser.DefaultParser"> -->
| <!--                       <mime>text/toa5</mime> -->
| <!--               </parser> -->
|               <parser class="my.tika.parser.ExcelParser">
|                       <parser class="org.apache.tika.parser.DefaultParser" />
|               </parser>
|       </parsers>
| </properties>

You probably shouldn't be binding additional mime types to DefaultParser. Instead, give it child parsers that support those additional mime types. If there's no child parser registered for a given mime type, then binding another mime type to DefaultParser won't help.

You probably shouldn't be wrapping your own parser around DefaultParser in config. If you really need to do that, say to decorate it somehow, do it in code.


If you want both DefaultParser and your own parser, do something like:

<parsers>
  <parser class="org.apache.tika.parser.DefaultParser" />
  <parser class="my.tika.parser.ExcelParser">
    <!-- any mimetypes special to this -->
  </parser>
</parsers>
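
As a rough illustration of the same layout with the placeholder comment filled in: the sketch below assumes a hypothetical parser class, my.tika.parser.Toa5Parser, that handles text/toa5. Neither that class name nor the idea that a separate parser owns text/toa5 comes from the original report, so adjust both to whatever parser actually supports the type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- DefaultParser as a sibling, left unwrapped and with no extra mime types bound to it -->
    <parser class="org.apache.tika.parser.DefaultParser" />
    <!-- Hypothetical custom parser, bound explicitly to the extra mime type -->
    <parser class="my.tika.parser.Toa5Parser">
      <mime>text/toa5</mime>
    </parser>
  </parsers>
</properties>
```

The point of this shape is that the extra mime type is attached to the parser that can actually handle it, and DefaultParser is not nested inside anything.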

Nick
