On Friday, May 27, 2016 at 8:29:02 AM UTC-4, Mika Koistinen wrote:
>
> Looks like i have related problem when trying to create HOCR files for a
> single word images. The result for single word is disappearing, however I
> can find it from txt files without HOCR parameter.
> ...
> ERROR message:
>
> Too few characters. Skipping this page
>
> OSD: Weak margin (0.00) for 1 blob text block, but using orientation
> anyway: 0
>
> Empty page!!
>

The "Too few characters. Skipping this page" message explains what's going on. How are you requesting hOCR output? If you are using the default `hocr` config file, it not only enables hOCR output but also changes the page segmentation mode to 1, which is what's causing the problem.

You can remove this line:

    tessedit_pageseg_mode 1

or change it to a more appropriate page segmentation mode, like:

    tessedit_pageseg_mode 6

Tom
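If you would rather not edit the stock `hocr` file, one option is to generate your own small config and pass that instead. Below is a rough sketch in Python. It rests on assumptions: `tessedit_create_hocr` is, as far as I know, the variable the stock `hocr` config uses to turn on hOCR output. The helper names and the `my_hocr` file name are made up for illustration. Whether your Tesseract build accepts a config file outside `tessdata/configs` is worth checking locally.

```python
def render_hocr_config(psm=6):
    """Return the text of a minimal Tesseract config file.

    Assumption: 'tessedit_create_hocr 1' is what enables hOCR output.
    'tessedit_pageseg_mode' is the variable discussed above; 6 treats
    the image as a single uniform block of text.
    """
    return (
        "tessedit_create_hocr 1\n"
        f"tessedit_pageseg_mode {psm}\n"
    )


def build_command(image, output_base, config_name):
    """Build the argument list: tesseract <image> <outputbase> <config>.

    Hypothetical helper: it only assembles the arguments and does not
    run Tesseract.
    """
    return ["tesseract", image, output_base, config_name]
```

To use it, write `render_hocr_config()` to a file (for example `my_hocr`) and run the list from `build_command("word.png", "out", "my_hocr")` with `subprocess.run`. For single-word images, mode 8 (single word) may be a better fit than 6.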

