Thank you, Tim, for the quick response. I was wondering whether it would be possible to provide a detailed 'default' XML configuration file covering all settable parameters. This would help people from different backgrounds control the behavior of Apache Tika.

Regards
Inzamam

On Tue, Jun 21, 2022 at 12:16 AM Tim Allison <[email protected]> wrote:

> In looking into this, I discovered:
> https://issues.apache.org/jira/browse/TIKA-3796 . It looks like that
> parameter was not settable via tika-config. I've fixed this now, and the
> fix will be in the next release. I'm not sure, yet, when that will be, but
> you can build locally or pull a build from Jenkins.
>
> The example config that shows how to turn this on/off is here:
> https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/test/resources/org/apache/tika/parser/microsoft/tika-config-headers-footers.xml
>
> On Mon, Jun 20, 2022 at 3:56 AM Inzamam Anwar <[email protected]>
> wrote:
>
>> Hello,
>>
>> I am trying to omit headers/footers from doc/docx files. I have tried the
>> following XML configuration file with "tika-server-standard-2.4.0.jar". I
>> have attached a sample file also. Any help in this regard would be
>> appreciated.
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <properties>
>>   <parsers>
>>     <parser class="org.apache.tika.parser.DefaultParser">
>>       <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
>>     </parser>
>>     <parser class="org.apache.tika.parser.pdf.PDFParser">
>>       <params>
>>         <param name="sortByPosition" type="bool">true</param>
>>       </params>
>>     </parser>
>>     <parser class="org.apache.tika.parser.microsoft.OfficeParserConfig">
>>       <params>
>>         <param name="includeHeadersAndFooters" type="bool">false</param>
>>       </params>
>>     </parser>
>>   </parsers>
>> </properties>
>>
>> Regards
>> Inzamam
