Try the latest Tesseract and Leptonica versions; there were some improvements
for large images.
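
For what it's worth, both errors look consistent with the raw size of the
decoded image rather than with the size of the file on disk. The sketch below
is only a back-of-the-envelope check, not Leptonica's actual code: it takes
d = 32 from the quoted error message and assumes rows are padded to 32-bit
words. The helper name pix_bytes and the LIMIT constant are made up for
illustration.

```python
# Rough estimate of the uncompressed raster size for the two reported images.
# Assumptions: depth 32 (from the quoted error message) and rows padded to
# 32-bit words. pix_bytes and LIMIT are illustrative names, not library API.

LIMIT = 2**31  # threshold named in "requested bytes >= 2^31"


def pix_bytes(width, height, depth=32):
    """Return the approximate number of bytes needed for the pixel data."""
    words_per_line = (width * depth + 31) // 32  # round each row up to whole words
    return 4 * words_per_line * height


for name, w, h in [("Image1.png", 25500, 44738), ("Image2.png", 35700, 6599)]:
    n = pix_bytes(w, h)
    print(f"{name}: {n:,} bytes, at or over 2^31: {n >= LIMIT}")
```

Under those assumptions Image1 needs 4,563,276,000 bytes, which is above 2^31
and matches the pixCreateHeader message. Image2 needs 942,337,200 bytes, which
is below that threshold, so it gets past the header check and then fails in
pix_malloc, presumably because that much memory could not be allocated on
that machine.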

On Thu, 28 Jul 2022, 10:48 Gaurav Verma, <[email protected]>
wrote:

> Hi,
> I am trying to extract text from some PNG images on Windows Server 2019
> Standard using Tesseract OCR 5.0.1, but I am getting some image validation errors.
>
> Test Image 1 : Image1.png
> Properties :      Dimensions 25500 x 44738
>                           Width            25500 pixels
>                           Height           44738 pixels
>                           Bit depth       24
>                           Size               42.4 MB
>
> ! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser
> bad exit value 1 err msg: *Error in pixCreateHeader: requested w = 25500,
> h = 44738, d = 32*
> ! Error in pixCreateHeader: requested bytes >= 2^31
> ! Error in pixCreateNoInit: pixd not made
> ! Error in pixCreate: pixd not made
> ! Error in pixSetInputFormat: pix not defined
> ! Error in pixReadStreamJpeg: rowbuffer or pix not made
> ! Error in pixReadStream: jpeg: no pix returned
> ! Error in pixRead: pix not read
> ! Error during processing.
>
> Test Image 2 : Image2.png
> Properties :      Dimensions   35700 x 6599
>                           Width               35700 pixels
>                           Height             6599 pixels
>                           Bit depth         24
>                           Size                 50.6 MB
>                           Resolution     608 DPI
>
> ! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser
> bad exit value 1 err msg: *Error in pixCreateNoInit: pix_malloc fail for
> data*
> ! Error in pixCreate: pixd not made
> ! Error in pixReadStreamPng: pix not made
> ! Error in pixReadStream: png: no pix returned
> ! Error in pixRead: pix not read
> ! Error during processing.
>
> Thanks in advance for any help or hint.
>
> - Gaurav
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bd4f257f-1303-4f11-9c3b-879e521c86fdn%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/bd4f257f-1303-4f11-9c3b-879e521c86fdn%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
