Hey team

I'm wondering what's wrong with my config.
I'm running this very basic piece of code:
@Test
public void testTika() throws TikaException, IOException, SAXException {
   BodyContentHandler handler = new BodyContentHandler(new WriteOutContentHandler(1000));
   new AutoDetectParser().parse(getBinaryContent("test-ocr.png"), handler, new Metadata(), new ParseContext());
   System.out.println("handler = " + handler);
}

Here are my logs:

16:31:13,089 DEBUG [o.a.t.p.o.TesseractOCRParser] hasTesseract (path: [tesseract]): true
16:31:13,560 DEBUG [o.a.t.p.o.TesseractOCRParser] hasTesseract (path: [tesseract]): true
16:31:13,564 DEBUG [o.a.t.p.o.TesseractOCRParser] ImageMagick does not appear to be installed (commandline: convert)
16:31:13,591 DEBUG [o.a.t.p.o.TesseractOCRParser] hasTesseract (path: [tesseract]): true
16:31:13,595 DEBUG [o.a.t.p.o.TesseractOCRParser] ImageMagick does not appear to be installed (commandline: convert)
handler =


The content is not extracted although Tesseract is detected.

When I run Tesseract manually:

tesseract test-ocr.png tess.out
cat tess.out.txt

I'm getting:

This file contains some words.
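
One way to check whether the JVM running the test sees the same Tesseract as the shell is to launch the same manual command from Java. This is a minimal sketch using only the JDK (Java 11+). The class name TesseractCheck and its two methods are hypothetical and are not part of Tika. It assumes test-ocr.png is in the working directory and that "tesseract" resolves on the JVM's PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TesseractCheck {

    // Builds the same command as the manual run: tesseract <image> <outputBase>
    static List<String> buildCommand(String tesseractPath, Path image, Path outputBase) {
        return List.of(tesseractPath, image.toString(), outputBase.toString());
    }

    // Runs Tesseract and returns the recognized text.
    // Tesseract appends ".txt" to the output base name.
    static String runOcr(String tesseractPath, Path image, Path outputBase)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(tesseractPath, image, outputBase))
                .inheritIO() // forward Tesseract's own messages to the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        return Files.readString(Path.of(outputBase + ".txt"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Print the PATH the JVM actually sees, which may differ from the shell's.
        System.out.println("PATH = " + System.getenv("PATH"));
        System.out.println(runOcr("tesseract", Path.of("test-ocr.png"), Path.of("tess.out")));
    }
}
```

If this prints the expected sentence when run from the same environment as the unit test, the Tesseract installation itself is probably not the problem, and the Tika side (parser configuration or the input stream) would be the next place to look.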

tesseract --version gives:

tesseract 5.2.0
 leptonica-1.82.0
 libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.3) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.11 : libwebp 1.2.4 : libopenjp2 2.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.6.1 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.2
 Found libcurl/7.79.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.11 nghttp2/1.45.1


What am I missing here?


David
