Great, I'm glad that you solved your problem.

That said, I recommend not using apache/tika:latest-full as the
Docker image tag for Tika; please pin it to apache/tika:3.0.0.0-full.
Otherwise, when you rebuild your Docker container after a new version
is published (for example 3.1.0.0), Docker could pull the newer
version instead of the one you have tested (3.0.0.0).
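
For example, a minimal sketch of the pinned tag, assuming you build
your own image from a Dockerfile (I have not seen yours, so treat the
layout as hypothetical):

```dockerfile
# Pin the exact release you tested instead of the moving "latest-full" tag.
FROM apache/tika:3.0.0.0-full

# To upgrade later (for example to 3.1.0.0), change the tag above on
# purpose and re-test before you deploy.
```

With the tag pinned, a rebuild keeps requesting 3.0.0.0 until you edit
that line yourself.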

Cheers,
PG

On Mon, 20 Jan 2025 at 08:52, Bisonti Mario
<mario.biso...@vimar.com> wrote:
>
> I solved with tika 3.0 and this tika-config.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <server>
>     <taskTimeoutMillis>120000</taskTimeoutMillis>
>     <minimumTimeoutMillis>10</minimumTimeoutMillis>
>     <port>9998</port>
>     <maxFiles>20000</maxFiles>
>     <forkedJvmArgs>
>       <arg>-Xmx512m</arg>
>     </forkedJvmArgs>
>   </server>
>
>   <service-loader dynamic="true"/>
>   <service-loader loadErrorHandler="WARN"/>
>
>   <parsers>
>     <parser class="org.apache.tika.parser.DefaultParser">
>       <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
>       <parser-exclude class="org.apache.tika.parser.microsoft.OfficeParser"/>
>       <params>
>         <param name="byteArrayMaxOverride" type="int">30000000</param>
>         <param name="suppressExceptions" type="bool">true</param>
>         <param name="ignoreTikaErrors" type="bool">true</param>
>       </params>
>     </parser>
>
>     <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
>       <params>
>         <param name="includeShapeBasedContent" type="bool">false</param>
>         <param name="suppressExceptions" type="bool">true</param>
>       </params>
>     </parser>
>
>     <parser class="org.apache.tika.parser.pdf.PDFParser">
>       <params>
>         <param name="pdfbox.enableAutoSpace" type="bool">true</param>
>         <param name="suppressExceptions" type="bool">true</param>
>       </params>
>     </parser>
>   </parsers>
> </properties>
>
>
>
> Thanks a lot
>
>
> Mario



-- 
Piergiorgio
