Hello, I am trying to omit headers/footers from doc/docx files. I tried the following XML configuration file with "tika-server-standard-2.4.0.jar", and I have also attached a sample file. Any help would be appreciated.
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
    <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
      <params>
        <param name="includeHeadersAndFooters" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
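For comparison, here is an untested sketch of an alternative layout. It assumes that "includeHeadersAndFooters" can be set as a parameter directly on org.apache.tika.parser.microsoft.OfficeParser (for .doc) and org.apache.tika.parser.microsoft.ooxml.OOXMLParser (for .docx), since OfficeParserConfig may not be loadable as a parser in its own right. It also assumes both parsers have to be excluded from DefaultParser so that the configured instances are the ones actually used, following the same pattern as the PDFParser entry. None of this has been confirmed against 2.4.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Exclude the parsers that are configured individually below -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
    <!-- Assumption: the .doc parser accepts this parameter -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="includeHeadersAndFooters" type="bool">false</param>
      </params>
    </parser>
    <!-- Assumption: the .docx parser accepts the same parameter -->
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="includeHeadersAndFooters" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
```

If the parameter is not recognized on these classes, I would expect the server to report a configuration error at startup, which would at least show whether the setting is being read.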
Regards
Inzamam
<<attachment: sample_doc.doc>>
