On Tuesday, 15 November 2016 at 16:29:16 UTC+1, Tom Morris wrote:
>
> How are you specifying the output format? For example, if you use the
> default pdf config file, it includes the line:
>
> tessedit_pageseg_mode 1
>
> which may override your intended -psm flag.
>
Thanks for the hint. But I've also tried with my own config (named leohocr) that contains only:

    load_system_dawg 0
    load_freq_dawg 0
    tessedit_create_hocr 1

and called it like this:

    tesseract clean01.tif t01_3 -c tessedit_pageseg_mode=3 leohocr
    tesseract clean01.tif t01_5 -c tessedit_pageseg_mode=5 leohocr
    [...]
    tesseract clean01.tif t01_11 -c tessedit_pageseg_mode=11 leohocr
    tesseract clean01.tif t01_12 -c tessedit_pageseg_mode=12 leohocr

PSM 1, 3, 6, 11 and 12 produce very good results, but the one remaining problem is the missing single-digit cells. :-(

> Having said that, you probably have more information than tesseract about
> the page layout, so you may want to try doing page segmentation yourself
> and feeding the resulting columns or cells to tesseract for recognition
> individually.
>

I tried feeding it a single column (see the attached input file single_col1.tif) but got the same results: PSM 1, 3, 6, 11 and 12 produce usable results, but again the single-digit cells are missing. See the attached screenshot.

I'd greatly appreciate any pointers!

Thanks,
--leo
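
The repeated invocations above could be scripted rather than typed by hand. Here is a minimal sketch, assuming Python 3 is available and tesseract is on the PATH. The helper names build_command and run_modes are hypothetical, and the mode list covers only the four commands written out above (the ones elided by [...] are not guessed). By default it only builds and prints the commands and runs nothing.

```python
import shutil
import subprocess


def build_command(image, psm, config="leohocr", prefix="t01"):
    """Return the argv list for one tesseract run with the given page segmentation mode.

    Mirrors the command shape from the message:
    tesseract <image> <prefix>_<psm> -c tessedit_pageseg_mode=<psm> <config>
    """
    return [
        "tesseract",
        image,
        f"{prefix}_{psm}",
        "-c",
        f"tessedit_pageseg_mode={psm}",
        config,
    ]


def run_modes(image, modes, dry_run=True):
    """Build one command per mode; execute them only when dry_run is False."""
    commands = [build_command(image, psm) for psm in modes]
    if not dry_run:
        if shutil.which("tesseract") is None:
            raise RuntimeError("tesseract not found on PATH")
        for cmd in commands:
            # check=True raises CalledProcessError if tesseract exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Only the modes spelled out in the message; extend the list as needed.
    for cmd in run_modes("clean01.tif", [3, 5, 11, 12]):
        print(" ".join(cmd))
```

Calling run_modes("clean01.tif", [3, 5, 11, 12], dry_run=False) would actually run tesseract once per mode, which makes it easier to compare the hOCR output for each mode side by side.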

