Once again, with more information:

I have a problem using Tesseract with German Fraktur.

I work with Tesseract 3.02.02 on SUSE Linux 13.2.

First, the text to be OCR'd is real printed text from about 1930.
The printing is a little dirty, i.e. there are small dots and strokes
between the letters.
Although these are far smaller than the actual letters, they are
interpreted as normal letters.

oes-frak.frak.exp017

Is there a way to pass parameters to Tesseract so that it
. either ignores letters that do not match the size of the majority of the
  other letters,
. or only accepts letters within a given size range,
. or first generates the boxes,
  which are then corrected by hand or by a program,
  and finally recognizes the text using the corrected boxes?
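
To make the third option more concrete, here is a rough Python sketch of
what "correcting the boxes by program" could mean. It assumes the usual box
file layout (glyph, left, bottom, right, top, page, with the origin at the
bottom-left of the image). The function names and the 0.5 threshold are only
illustrative, and the sketch does not answer whether Tesseract can then
recognize text from the corrected boxes.

```python
from statistics import median


def parse_box_line(line):
    """Parse one box-file line: '<glyph> <left> <bottom> <right> <top> <page>'."""
    glyph, *numbers = line.rsplit(None, 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return glyph, left, bottom, right, top, page


def filter_small_boxes(lines, min_fraction=0.5):
    """Drop boxes whose height is below min_fraction of the median box height.

    min_fraction is a hypothetical tuning value. Note that this also drops
    genuine small glyphs such as periods, commas and hyphens.
    """
    boxes = [parse_box_line(line) for line in lines if line.strip()]
    if not boxes:
        return []
    # Box coordinates have their origin at the bottom-left, so top > bottom.
    heights = [top - bottom for _, _, bottom, _, top, _ in boxes]
    limit = median(heights) * min_fraction
    return [box for box, height in zip(boxes, heights) if height >= limit]


def format_box(box):
    """Turn a parsed box back into a box-file line."""
    return " ".join(str(field) for field in box)
```

The kept boxes can be written back with format_box, one per line. A filter
on height alone is crude; width or area could be checked in the same way.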

I have already tried modifying the following parameters in a config file:
  textord_min_xheight 24
  textord_xheight_mode_fraction 0.9
  textord_xheight_error_margin 0.1
  textord_descx_ratio_min 0.3
  tessedit_redo_xheight FALSE
This changes some things, but it does not make Tesseract ignore the dots and strokes.
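
Since the parameters do not help, another thing that could perhaps be tried
is cleaning the image before Tesseract sees it, by deleting very small
connected components. The following is only a sketch under assumptions: the
image is taken to be already binarized and is represented as a nested list
(1 = ink, 0 = background) to keep the example self-contained, and min_pixels
is a made-up threshold that would have to be tuned. Real dots, e.g. on i, ö
or a period, would be deleted too if the threshold is too high.

```python
from collections import deque


def remove_small_components(image, min_pixels):
    """Return a copy of a binary image without components smaller than min_pixels.

    image is a list of rows of 0/1 values. Pixels are grouped using
    8-connectivity (diagonal neighbours count as connected).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect all ink pixels connected to (y, x) by breadth-first search.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Erase the component if it is too small to be a letter.
            if len(component) < min_pixels:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

In practice the image would be loaded and saved with an image library; that
part is left out here.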

Here is an example:
the attached image is recognized as the text
  15 Ellser Exdmsund Mögsgzerg

A dictionary-based solution is not possible, because the text consists
only of names of persons and places.

Another thing I wonder about:
when I OCR an image from .tiff to .txt
and run makebox on the same image,
a few letters are recognized differently!
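
To pin down exactly which letters differ, the two outputs can be compared
with a small script. This is a sketch using Python's difflib; it assumes the
box file has one glyph per line and no entries for spaces, so whitespace is
stripped from the .txt output before comparing. The function names are my
own. It only locates the differences, it does not explain them.

```python
import difflib


def glyphs_from_box_lines(box_lines):
    """Concatenate the glyph column of box-file lines into one string."""
    return "".join(line.rsplit(None, 5)[0] for line in box_lines if line.strip())


def differing_spans(text, box_lines):
    """Return (from_text, from_boxes) pairs where the two outputs disagree."""
    text_glyphs = "".join(text.split())  # drop spaces and newlines
    box_glyphs = glyphs_from_box_lines(box_lines)
    matcher = difflib.SequenceMatcher(None, text_glyphs, box_glyphs, autojunk=False)
    return [
        (text_glyphs[i1:i2], box_glyphs[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

The first argument is the content of the .txt file, the second the lines of
the box file; an empty result means both runs produced the same letters.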

Thanks in advance for any help.
