While looking into this, I discovered https://issues.apache.org/jira/browse/TIKA-3796 : it looks like that parameter was not settable via tika-config. I've fixed this now, and the fix will be in the next release. I'm not sure yet when that will be, but in the meantime you can build locally or pull a build from Jenkins.
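
Roughly, once you are on a build that includes the fix, the parameter goes on the Office parsers themselves rather than on OfficeParserConfig, which is a config object and not a parser. The following is only a sketch under that assumption: it assumes OfficeParser (doc) and OOXMLParser (docx) both accept includeHeadersAndFooters, and it leaves out your PDFParser settings for brevity. Treat the test config in the repository as the authoritative version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Exclude the Office parsers from the default set so they can be configured separately -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
    <!-- Assumption: handles .doc and other OLE2 formats -->
    <parser class="org.apache.tika.parser.microsoft.OfficeParser">
      <params>
        <param name="includeHeadersAndFooters" type="bool">false</param>
      </params>
    </parser>
    <!-- Assumption: handles .docx and other OOXML formats -->
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="includeHeadersAndFooters" type="bool">false</param>
      </params>
    </parser>
  </parsers>
</properties>
```
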
The example config that shows how to turn this on/off is here:
https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/org/apache/tika/parser/microsoft/tika-config-headers-footers.xml

On Mon, Jun 20, 2022 at 3:56 AM Inzamam Anwar <[email protected]> wrote:

> Hello,
>
> I am trying to omit headers/footers from doc/docx files. I have tried the
> following XML configuration file with "tika-server-standard-2.4.0.jar". I
> have attached a sample file also. Any help in this regard would be
> appreciated.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>       <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
>     </parser>
>     <parser class="org.apache.tika.parser.pdf.PDFParser">
>       <params>
>         <param name="sortByPosition" type="bool">true</param>
>       </params>
>     </parser>
>     <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
>       <params>
>         <param name="includeHeadersAndFooters" type="bool">false</param>
>       </params>
>     </parser>
>   </parsers>
> </properties>
>
> Regards
> Inzamam
