While looking into this, I found
https://issues.apache.org/jira/browse/TIKA-3796: that parameter was not
settable via tika-config.  I've fixed this now, and the fix will be in the
next release.  I'm not sure yet when that will be, but in the meantime you
can build locally or pull a build from Jenkins.

The example config that shows how to turn this on/off is here:
https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/org/apache/tika/parser/microsoft/tika-config-headers-footers.xml
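
For orientation, here is a minimal sketch of what such a config might look like. It is not copied from the linked file, which remains authoritative. It assumes the parameter keeps the name includeHeadersAndFooters used in the config below, and that it is set on the parsers themselves (OfficeParser for .doc, OOXMLParser for .docx) rather than on OfficeParserConfig, which is a config object and not a parser. It only applies to a build that contains the TIKA-3796 fix.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <!-- Keep the default parsers, but exclude the two configured explicitly below -->
        <parser class="org.apache.tika.parser.DefaultParser">
            <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
            <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
        </parser>
        <!-- .doc and other OLE2 formats; parameter name assumed from the original message -->
        <parser class="org.apache.tika.parser.microsoft.OfficeParser">
            <params>
                <param name="includeHeadersAndFooters" type="bool">false</param>
            </params>
        </parser>
        <!-- .docx and other OOXML formats; same assumption -->
        <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
            <params>
                <param name="includeHeadersAndFooters" type="bool">false</param>
            </params>
        </parser>
    </parsers>
</properties>
```

If you also need the PDFParser settings from your original config, the PDFParser exclude and its params block would go alongside these entries.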

On Mon, Jun 20, 2022 at 3:56 AM Inzamam Anwar <[email protected]>
wrote:

> Hello,
>
> I am trying to omit headers/footers from doc/docx files. I have tried the
> following XML configuration file with "tika-server-standard-2.4.0.jar". I
> have attached a sample file also. Any help in this regard would be
> appreciated.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>     <parsers>
>         <parser class="org.apache.tika.parser.DefaultParser">
>             <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
>         </parser>
>         <parser class="org.apache.tika.parser.pdf.PDFParser">
>             <params>
>                 <param name="sortByPosition" type="bool">true</param>
>             </params>
>         </parser>
>         <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
>             <params>
>                 <param name="includeHeadersAndFooters" type="bool">false</param>
>             </params>
>         </parser>
>     </parsers>
> </properties>
>
> Regards
> Inzamam
>
>
