I have a problem using Tesseract with German Fraktur. The text to be OCR'd is real printed text from about 1930. The printing is a little dirty, i.e. there are small dots and strokes between the letters. Although these are far smaller than the actual letters, they are interpreted as normal letters.
Is there a way to pass parameters to Tesseract so that it

- ignores glyphs that do not match the size of the majority of the other letters, or
- only accepts letters within a given size range, or
- first produces the boxes, lets me correct them by hand or by program, and then recognizes the text using the corrected boxes?

A dictionary-based solution is not possible, because the text consists only of names of persons and places.

Another thing I wonder about: when I OCR an image from .tiff to .txt and also run makebox on the same image, a few letters are recognized differently!

Thanks in advance for any help.
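To make the "correct the boxes by program" idea concrete, here is a rough sketch of the kind of filtering I have in mind. It assumes the usual box file layout of one glyph per line, `symbol left bottom right top page`, with the origin at the bottom-left of the image. The function names and the `min_ratio` threshold are made up for illustration and are not part of Tesseract. It only cleans the box file; whether Tesseract can then recognize from the corrected boxes is exactly what I am asking.

```python
from statistics import median


def parse_box_line(line):
    """Split one box-file line into (symbol, left, bottom, right, top, page).

    Assumed layout: "symbol left bottom right top page".
    Splitting from the right keeps the symbol field intact.
    """
    parts = line.rsplit(" ", 5)
    symbol = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return symbol, left, bottom, right, top, page


def drop_small_boxes(box_lines, min_ratio=0.4):
    """Return the box lines whose height is at least min_ratio times
    the median box height; smaller boxes are treated as specks.

    min_ratio is a hypothetical tuning knob, not a Tesseract parameter.
    """
    lines = [ln.strip() for ln in box_lines if ln.strip()]
    if not lines:
        return []
    boxes = [parse_box_line(ln) for ln in lines]
    # Box height is top minus bottom because the origin is bottom-left.
    heights = [top - bottom for (_, _, bottom, _, top, _) in boxes]
    typical = median(heights)
    return [ln for ln, h in zip(lines, heights) if h >= min_ratio * typical]
```

A filter by height alone would also throw away real punctuation such as periods and commas, so in practice the threshold would have to be chosen carefully, or combined with the box position relative to the text line.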

