[tesseract-ocr] Re: Tesseract joins characters that are not touching

2018-11-27 Thread Mohit Jain
Hi, can you tell me how you extracted the binary intermediate image created by Tesseract? On Saturday, June 18, 2016 at 10:16:57 PM UTC+5:30, Julian Einhaus wrote: > Hi, > I am trying to read three lines of text on a well defined image (pretty much no background noise, characters
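
One way to get at the image Tesseract actually recognizes is the `GetThresholdedImage()` method of `TessBaseAPI`. The sketch below assumes the third-party Python binding `tesserocr`, whose `PyTessBaseAPI.GetThresholdedImage()` returns a PIL image; the function name `save_thresholded` is hypothetical, and this is not necessarily how the original poster did it. `tesserocr` is imported inside the function so the snippet loads even where it is not installed. From the command line, setting the `tessedit_write_images` variable (e.g. `-c tessedit_write_images=true`) makes Tesseract write the thresholded page to `tessinput.tif` in the versions I know of; check it against your version.

```python
def save_thresholded(image_path: str, out_path: str) -> None:
    """Save the binary image Tesseract produces internally before recognition.

    Assumes the third-party `tesserocr` binding (and Pillow) are installed.
    """
    # Imported lazily so this module can be loaded without tesserocr present.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        # Runs Tesseract's own thresholding (if not done yet) and returns a copy
        # of the result as a PIL image, or None if thresholding failed.
        binary = api.GetThresholdedImage()
        if binary is None:
            raise RuntimeError(f"Tesseract could not threshold {image_path!r}")
        binary.save(out_path)
```

Usage, with hypothetical file names: `save_thresholded("page.png", "page_binary.png")`.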

[tesseract-ocr] Re: How to recognize text in images with blue background and boxed

2018-11-15 Thread Mohit Jain
You can try a local Otsu thresholding step on this image and then pass the resulting binary image to Tesseract. If you need something more sophisticated, try colour-invariant text binarization (https://github.com/jasonlfunk/ocr-text-extraction) instead of local Otsu. On Saturday, June 9, 2018 at
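
A minimal sketch of the local Otsu step, in pure Python so it has no dependencies: the image is split into tiles, Otsu's threshold (the grey level that maximizes the between-class variance of the tile's histogram) is computed per tile, and each pixel becomes 0 (text) or 255 (background). It assumes an 8-bit greyscale image given as a list of rows, with text darker than its local background. The tile size and the `min_contrast` fallback to the global threshold for near-uniform tiles are illustrative choices, not part of any Tesseract API, and the colour-invariant method from the linked repository is not implemented here.

```python
from typing import List

Image = List[List[int]]  # 8-bit greyscale image as a list of rows (0 = black)


def otsu_threshold(values: List[int]) -> int:
    """Otsu's threshold t: pixels <= t form one class, pixels > t the other."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w_bg, sum_bg = 0, 0
    best_t, best_score = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def local_otsu(img: Image, tile: int = 32, min_contrast: int = 20) -> Image:
    """Binarize tile by tile: 0 for dark (text) pixels, 255 for background."""
    h, w = len(img), len(img[0])
    global_t = otsu_threshold([p for row in img for p in row])
    out = [[255] * w for _ in range(h)]
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            ys = range(y0, min(y0 + tile, h))
            xs = range(x0, min(x0 + tile, w))
            block = [img[y][x] for y in ys for x in xs]
            # A near-uniform tile (e.g. pure background) has no meaningful split,
            # so fall back to the whole-image threshold there.
            if max(block) - min(block) < min_contrast:
                t = global_t
            else:
                t = otsu_threshold(block)
            for y in ys:
                for x in xs:
                    out[y][x] = 0 if img[y][x] <= t else 255
    return out
```

The per-tile threshold is what helps with coloured or boxed backgrounds: a single global threshold can classify a mid-grey background region as text, while each tile only has to separate its own text from its own background. Hard tile borders can leave seams; a sliding-window or interpolated variant smooths them at some extra cost. To try it on a real file, the rows can be built from Pillow's `Image.convert("L").getdata()`.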

[tesseract-ocr] Paragraph/Block Reading Order of Text

2018-07-06 Thread Mohit Jain
I'd like to know what algorithm or heuristics Tesseract follows to determine the order in which blocks of text are read. Analysing Tesseract's output on documents with complex layouts, I see that it's not a simple row order or column order, but rather some sort of hybrid of the two. Can someone

[tesseract-ocr] Extract Header and Footer text separately from document image

2018-04-09 Thread Mohit Jain
Is there a way to extract the header and footer content of a document page separately using Tesseract OCR? I tried the hOCR output, but it doesn't seem to have any such tags associated with the output. Regards, Mohit
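
Since the hOCR output has no header/footer classes, one workaround is a position heuristic: treat lines whose bounding box falls in a thin band at the top of the page as header and in a band at the bottom as footer. The sketch assumes Tesseract's hOCR layout (an `ocr_page` element and text-line spans whose `title` carries `bbox x1 y1 x2 y2`); the 8% band (`margin`) and the names `HocrLines` and `split_header_footer` are hypothetical. It will misfire on pages with marginal notes, multi-line headers taller than the band, or no header at all.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
# Assumption: classes Tesseract uses for text lines in hOCR (ocr_line everywhere;
# some versions also emit ocr_header, meaning a heading, plus ocr_caption and
# ocr_textfloat). None of these mark a page header or footer.
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}


class HocrLines(HTMLParser):
    """Collect the page bbox and a (bbox, text) pair for every hOCR text line."""

    def __init__(self):
        super().__init__()
        self.page_bbox = None
        self.lines = []        # list of ((x1, y1, x2, y2), text)
        self._spans = []       # one flag per open <span>: True if it is a text line
        self._current = None   # [bbox, text chunks] while inside a text line

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        m = BBOX_RE.search(a.get("title") or "")
        if "ocr_page" in classes and m:
            self.page_bbox = tuple(map(int, m.groups()))
        if tag == "span":
            is_line = bool(classes & LINE_CLASSES) and m is not None
            self._spans.append(is_line)
            if is_line:
                self._current = [tuple(map(int, m.groups())), []]

    def handle_endtag(self, tag):
        if tag == "span" and self._spans:
            if self._spans.pop() and self._current is not None:
                bbox, chunks = self._current
                # Collapse the whitespace between word spans into single spaces.
                self.lines.append((bbox, " ".join(" ".join(chunks).split())))
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)


def split_header_footer(hocr: str, margin: float = 0.08):
    """Return (header, body, footer) line texts based on vertical position only."""
    parser = HocrLines()
    parser.feed(hocr)
    parser.close()
    if parser.page_bbox is None:
        raise ValueError("no ocr_page bbox found in hOCR input")
    _, top, _, bottom = parser.page_bbox
    band = (bottom - top) * margin
    header, body, footer = [], [], []
    for (_, y1, _, y2), text in parser.lines:
        if y2 <= top + band:
            header.append(text)
        elif y1 >= bottom - band:
            footer.append(text)
        else:
            body.append(text)
    return header, body, footer
```

Only the standard library is used, so the same code can post-process any hOCR file Tesseract has already written.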

[tesseract-ocr] Figure, Graph, Image detection/classification using Tesseract OCR

2018-04-06 Thread Mohit Jain
I'd like to know if it's possible to use Tesseract OCR to automatically detect figures, graphs or images that occur in the image. From reviewing the code documentation, I can see that it's possible to expose the AnalyseLayout
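
A sketch of the `AnalyseLayout` route: it runs only page layout analysis and returns a page iterator whose `BlockType()` is a `PolyBlockType`, where image regions are reported as `PT_FLOWING_IMAGE`, `PT_HEADING_IMAGE` or `PT_PULLOUT_IMAGE`. The code assumes the third-party `tesserocr` binding (`PyTessBaseAPI`, `RIL`, `PT`, `iterate_level`); the function names `layout_blocks` and `image_blocks` are hypothetical. Tesseract only labels a region as an image; it does not tell a graph from a photo, so any finer figure/graph classification would need a separate step.

```python
from typing import Iterable, List, Optional, Set, Tuple

BBox = Tuple[int, int, int, int]
Block = Tuple[int, Optional[BBox]]  # (PolyBlockType value, bounding box)


def layout_blocks(image_path: str) -> List[Block]:
    """Run only Tesseract's layout analysis and list every block it finds.

    Assumes the third-party `tesserocr` binding is installed; it is imported
    here so the rest of the module can be used without it.
    """
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    blocks: List[Block] = []
    # tesserocr's default page segmentation mode (PSM.AUTO) includes layout analysis.
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        it = api.AnalyseLayout()  # a page iterator, or None if no blocks were found
        if it is None:
            return blocks
        for block in iterate_level(it, RIL.BLOCK):
            blocks.append((block.BlockType(), block.BoundingBox(RIL.BLOCK)))
    return blocks


def image_blocks(blocks: Iterable[Block], image_types: Set[int]) -> List[Optional[BBox]]:
    """Bounding boxes of the blocks whose type is one of `image_types`."""
    return [bbox for btype, bbox in blocks if btype in image_types]
```

Usage, with a hypothetical file name: `from tesserocr import PT` and then `image_blocks(layout_blocks("page.png"), {PT.FLOWING_IMAGE, PT.HEADING_IMAGE, PT.PULLOUT_IMAGE})`. Keeping the filter separate from the Tesseract call makes it easy to experiment with other block types, such as `PT.TABLE`.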