We should handle this more gracefully (and I think we do in our main
branch, Tika 2.0.0), but the problem is that you're only loading the
PDFParser, not the TesseractOCRParser, so the PDFParser throws an NPE
when it can't find Tesseract.

Make sure to include the DefaultParser, which will also load Tesseract.

<properties>
    <parsers>
        <parser class="org.apache.tika.parser.DefaultParser"/>
        <parser class="org.apache.tika.parser.pdf.PDFParser">
...
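
For reference, a fuller version might look something like the config
below. This is an untested sketch that assumes Tika 1.x config syntax and
carries over the PDFParser params from your original config. The
<parser-exclude> line is optional. It keeps the DefaultParser from loading
its own PDFParser alongside the one you configure explicitly. You'll still
need tesseract installed where Tika can find it (e.g. on the PATH).

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <service-loader loadErrorHandler="WARN"/>
  <parsers>
    <!-- DefaultParser loads parsers via the service loader, incl. TesseractOCRParser -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- optional: leave PDFs to the explicitly configured PDFParser below -->
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="ocrStrategy" type="string">ocr_only</param>
        <param name="ocrImageType" type="string">rgb</param>
        <param name="ocrDPI" type="int">300</param>
      </params>
    </parser>
  </parsers>
</properties>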


On Fri, Mar 12, 2021 at 12:06 PM Subhajit Das <[email protected]> wrote:
>
> I am getting this in the console output:
>
> org.apache.tika.config.InitializableProblemHandler$3
> handleInitializableProblem
>
> But nothing in the logs.
>
> When a PUT to /tika is sent for a PDF, I get a NullPointerException in
> AbstractPDF2XHTML.java at line 434.
>
> Using this Tika config:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <properties>
>   <service-loader loadErrorHandler="WARN"/>
>   <parsers>
>     <parser class="org.apache.tika.parser.pdf.PDFParser">
>       <params>
>         <param name="ocrStrategy" type="string">ocr_only</param>
>         <param name="ocrImageType" type="string">rgb</param>
>         <param name="ocrDPI" type="int">300</param>
>       </params>
>     </parser>
>   </parsers>
> </properties>
