On Tuesday, 15 November 2016 16:29:16 UTC+1, Tom Morris wrote:
>
> How are you specifying the output format? For example, if you use the 
> default pdf config file, it includes the line:
>
> tessedit_pageseg_mode 1
>
> which may override your intended -psm flag.
>

Thanks for the hint.
But I've also tried with my own config (named leohocr) that contains only:

load_system_dawg 0
load_freq_dawg 0
tessedit_create_hocr 1

and called it like this:
tesseract clean01.tif t01_3 -c tessedit_pageseg_mode=3 leohocr
tesseract clean01.tif t01_5 -c tessedit_pageseg_mode=5 leohocr
[...]
tesseract clean01.tif t01_11 -c tessedit_pageseg_mode=11 leohocr
tesseract clean01.tif t01_12 -c tessedit_pageseg_mode=12 leohocr
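
The same runs can be scripted as a loop. This is only a minimal sketch: the helper name psm_cmd is hypothetical, and the mode list covers just the modes spelled out in the commands above, so adjust it as needed. It prints each command and runs it only if tesseract and clean01.tif are actually present.

```shell
#!/bin/sh
# Hypothetical helper: print the tesseract command for one page segmentation mode.
# Arguments: input image, output base prefix, PSM value, config file name.
psm_cmd() {
    printf 'tesseract %s %s_%s -c tessedit_pageseg_mode=%s %s\n' "$1" "$2" "$3" "$3" "$4"
}

# The mode list is an assumption; extend it with the other modes to compare.
for psm in 3 5 11 12; do
    cmd=$(psm_cmd clean01.tif t01 "$psm" leohocr)
    echo "$cmd"
    # Only execute when tesseract and the input image are available.
    if command -v tesseract >/dev/null 2>&1 && [ -f clean01.tif ]; then
        $cmd
    fi
done
```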

PSM 1, 3, 6, 11 and 12 produce very good results; the only remaining problem 
is the missing single-digit cells. :-(
 

> Having said that, you probably have more information than tesseract about 
> the page layout, so you may want to try doing page segmentation yourself 
> and feeding the resulting columns or cells to tesseract for recognition 
> individually.
>

I tried to feed a single column (see the attached input file single_col1.tif) 
but got the same results:
PSM 1, 3, 6, 11 and 12 produce usable results, but again the single-digit 
cells are missing.
See the attached screenshot.

I'd greatly appreciate any pointers!

Thanks,
--leo
