Hello,

I am trying to omit headers and footers from the text extracted from doc/docx
files. I tried the following XML configuration file with
"tika-server-standard-2.4.0.jar", and I have also attached a sample file.
Any help would be appreciated.

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
        </parser>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
            <params>
                <param name="sortByPosition" type="bool">true</param>
            </params>
        </parser>
        <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
            <params>
                <param name="includeHeadersAndFooters" type="bool">false</param>
            </params>
        </parser>
    </parsers>
</properties>
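
One possible cause, offered as an assumption rather than a confirmed diagnosis: OfficeParserConfig is a configuration holder, not a parser, so declaring it inside <parsers> may have no effect. Below is a minimal sketch of a variant that sets the flag on the parsers themselves. It assumes that OfficeParser (for .doc) and OOXMLParser (for .docx) in Tika 2.4.0 accept includeHeadersAndFooters as a param, and it has not been tested against the attached sample.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <!-- Exclude every parser that is configured explicitly below,
             so DefaultParser does not also load it with default settings. -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
            <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
        </parser>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
            <params>
                <param name="sortByPosition" type="bool">true</param>
            </params>
        </parser>
        <!-- Assumption: handles legacy .doc files and accepts this param. -->
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <param name="includeHeadersAndFooters" type="bool">false</param>
            </params>
        </parser>
        <!-- Assumption: handles .docx files and accepts this param. -->
        <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
                <param name="includeHeadersAndFooters" type="bool">false</param>
            </params>
        </parser>
    </parsers>
</properties>
```

If this variant is used, the file would be passed to the server in the same way as the configuration above.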

Regards
Inzamam

<<attachment: sample_doc.doc>>
