Try the latest Tesseract and Leptonica versions; there were some improvements for large images.
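
For reference, the first error quoted below ("requested bytes >= 2^31") seems consistent with simple arithmetic on the image dimensions. Here is a rough sketch in Python, assuming rows are padded to 32-bit words and that the 2^31-byte threshold applies as the message states. The helper names `raster_bytes` and `fits` are hypothetical, not part of Tesseract or Leptonica:

```python
# Threshold taken from the quoted error message ("requested bytes >= 2^31").
LIMIT_BYTES = 2**31


def raster_bytes(width, height, depth=32):
    """Estimate the uncompressed raster size in bytes.

    Assumption: each row is padded to a whole number of 32-bit words.
    The default depth of 32 matches "d = 32" in the quoted error.
    """
    words_per_line = (width * depth + 31) // 32
    return 4 * words_per_line * height


def fits(width, height, depth=32):
    """Return True if the estimated raster stays under the 2^31-byte threshold."""
    return raster_bytes(width, height, depth) < LIMIT_BYTES


if __name__ == "__main__":
    # Dimensions as reported in the quoted message.
    for name, w, h in [("Image1.png", 25500, 44738), ("Image2.png", 35700, 6599)]:
        size = raster_bytes(w, h)
        status = "under" if fits(w, h) else "over"
        print(f"{name}: {size} bytes at d=32 ({status} the 2^31 threshold)")
```

Under these assumptions Image1.png needs about 4.56 GB uncompressed at d = 32, which is above the threshold. Image2.png needs about 0.94 GB, which is under it, so its pix_malloc failure looks more like a memory allocation problem on the machine than a size-limit problem.
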

On Thu, 28 Jul 2022, 10:48 Gaurav Verma, <[email protected]> wrote:

> Hi,
> I am trying extracting text from some PNG images on windows Server 2019
> Standard using Tesseract OCR 5.0.1 but getting some image validation errors.
>
> Test Image 1 : Image1.png
> Properties : Dimensions 25500 x 44738
> Width 25500 pixels
> Height 44738 pixels
> Bit depth 24
> Size 42.4 MB
>
> ! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser
> bad exit value 1 err msg: *Error in pixCreateHeader: requested w = 25500,
> h = 44738, d = 32*
> ! Error in pixCreateHeader: requested bytes >= 2^31
> ! Error in pixCreateNoInit: pixd not made
> ! Error in pixCreate: pixd not made
> ! Error in pixSetInputFormat: pix not defined
> ! Error in pixReadStreamJpeg: rowbuffer or pix not made
> ! Error in pixReadStream: jpeg: no pix returned
> ! Error in pixRead: pix not read
> ! Error during processing.
>
> Test Image 2 : Image2.png
> Properties : Dimensions 35700 x 6599
> Width 35700 pixels
> Height 6599 pixels
> Bit depth 24
> Size 50.6 MB
> Resolution 608 DPI
>
> ! Caused by: org.apache.tika.exception.TikaException: TesseractOCRParser
> bad exit value 1 err msg: *Error in pixCreateNoInit: pix_malloc fail for
> data*
> ! Error in pixCreate: pixd not made
> ! Error in pixReadStreamPng: pix not made
> ! Error in pixReadStream: png: no pix returned
> ! Error in pixRead: pix not read
> ! Error during processing.
>
> Thanks in Advance for any help or hint.
>
> - Gaurav
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/bd4f257f-1303-4f11-9c3b-879e521c86fdn%40googlegroups.com

