I'm developing an open-source Android app that uses Tesseract 3.01 for OCR, passing it images captured by a phone or tablet camera.
The OCR works adequately on small segments of text (a few words), but uneven illumination seems to lower recognition quality on larger blocks of text. Because the input comes from the device camera, there are a lot of shadows and glare. Right now I'm not doing any pre-processing such as binarization/thresholding; I just pass a grayscale Pix to Tesseract.

In general, how should I pre-process images to improve recognition quality? And is there a way to tune Tesseract's internal thresholding via TessBaseAPI::SetVariable()?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
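For context, the kind of pre-processing I have in mind is local adaptive thresholding, where each pixel is binarized against the mean of its own neighborhood rather than one global cutoff, so uneven illumination across the page matters less. A rough sketch of the idea (plain Python for illustration, not the app's actual Java/C++ code; the function name and parameters are just mine):

```python
def adaptive_threshold(image, width, height, block=3, offset=10):
    """Binarize a grayscale image (flat row-major list of 0-255 values)
    by comparing each pixel to the mean of its (2*block+1)^2 neighborhood.
    `offset` guards against noise: a pixel must be clearly darker than
    its surroundings to count as foreground."""
    out = []
    for y in range(height):
        for x in range(width):
            total, count = 0, 0
            # Average the neighborhood, clipping at the image borders.
            for dy in range(-block, block + 1):
                for dx in range(-block, block + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny * width + nx]
                        count += 1
            local_mean = total / count
            # Foreground (black, 0) if darker than the local mean by more
            # than `offset`; otherwise background (white, 255).
            out.append(0 if image[y * width + x] < local_mean - offset else 255)
    return out
```

Because the threshold follows the local mean, a dark letter in a shadowed corner and one in a glare spot can both end up as foreground, which a single global threshold would miss. (In practice I'd presumably use something like Leptonica's or OpenCV's adaptive thresholding rather than a hand-rolled loop.)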

