Re: [tesseract-ocr] lines dissappear in resulting file

Tom Morris Thu, 02 Jun 2016 15:49:52 -0700

On Friday, May 27, 2016 at 8:29:02 AM UTC-4, Mika Koistinen wrote:
>
> Looks like I have a related problem when trying to create hOCR files for 
> single-word images. The result for the single word disappears; however, I 
> can find it in the txt files produced without the hOCR parameter.
>
 ...


> ERROR message:
>
> Too few characters. Skipping this page
>
> OSD: Weak margin (0.00) for 1 blob text block, but using orientation 
> anyway: 0
>
> Empty page!!
>

The "Too few characters. Skipping this page" message explains what's going 
on.

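How are you requesting hOCR output? If you are using the default `hocr` 
config file, it not only enables hOCR output but also changes the page 
segmentation mode to 1 (automatic page segmentation with orientation and 
script detection, OSD). OSD needs more characters than a single-word image 
provides, which is what's causing the problem.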

You can remove this line from the `hocr` config file:

tessedit_pageseg_mode 1

or change it to a more appropriate page segmentation mode, such as 6 
(a single uniform block of text):

tessedit_pageseg_mode 6
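
If you would rather script the change than edit the file by hand, here is a 
minimal Python sketch of the same edit. It only rewrites config text; the 
function name `set_pageseg_mode` and the sample config contents are made up 
for illustration, so check them against the `hocr` file in your own tessdata 
directory before relying on it.

```python
def set_pageseg_mode(config_text: str, mode: int) -> str:
    """Return config_text with tessedit_pageseg_mode set to `mode`.

    An existing tessedit_pageseg_mode line is replaced; if there is none,
    the setting is appended at the end.
    """
    setting = f"tessedit_pageseg_mode {mode}"
    out = []
    replaced = False
    for line in config_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "tessedit_pageseg_mode":
            out.append(setting)  # swap in the new mode
            replaced = True
        else:
            out.append(line)     # keep every other setting untouched
    if not replaced:
        out.append(setting)
    return "\n".join(out) + "\n"


# Sample config text, assumed for illustration only.
sample = "tessedit_create_hocr 1\ntessedit_pageseg_mode 1\n"
print(set_pageseg_mode(sample, 6), end="")
```

Writing the result to a separate config file, rather than overwriting the 
stock `hocr` one, keeps the original behaviour available for full pages.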


Tom
