Tess 3.01 has a lot of trouble recognizing curly double quotes, and unfortunately my scans contain lots of dialog that uses them.
My Irish font has diacritics: accents over vowels and dots over consonants. In addition, the uppercase letters are just larger versions of the lowercase letters, differing only in size. All this means that the quote marks sit quite high above the text line.

Even after extensive training, Tess is not getting it right, and even with -psm 6 it still fails. But if I use GIMP to lower the quotes by about 4 or 5 pixels on the 600 dpi scans, Tess works.

When it screws up, it groups the curly quotes and diacritics into one line of garbage, followed by a second line containing the letters without their diacritics. On a page with, say, 12 lines starting with quotes, it will miss 4 of them. It seems more likely to screw up the first line of a page if that line starts with a left double curly quote.

Is there a configuration parameter that would help? Or can anyone point me to the section of code that would be relevant? Is there any way to tell Tess to tolerate these high double curly quotes (left and right, of course)?
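The manual GIMP workaround amounts to moving each quote mark's pixels down a few rows. Below is a minimal sketch of that one operation, assuming the page is a binary bitmap stored as a list of rows (1 = ink, 0 = paper) and that the bounding box of each quote is already known and contains nothing but the quote. The function name `lower_region` and the box coordinates are hypothetical, and finding the quotes automatically is not shown.

```python
def lower_region(bitmap, top, left, bottom, right, shift):
    """Move the ink inside rows [top, bottom) and columns [left, right) down by `shift` rows.

    Hypothetical helper. `bitmap` is a list of rows of 0 (paper) / 1 (ink) and is
    modified in place. Vacated pixels become paper; ink pushed past the bottom edge
    of the image is dropped; ink already present below the box is kept.
    """
    height = len(bitmap)
    # Save the region first so overlapping source and destination rows are safe.
    region = [row[left:right] for row in bitmap[top:bottom]]
    # Clear the original box to paper (assumes the box holds only the quote mark).
    for y in range(top, bottom):
        for x in range(left, right):
            bitmap[y][x] = 0
    # Paste the saved ink `shift` rows lower, only ever setting pixels to ink.
    for dy, row in enumerate(region):
        y = top + dy + shift
        if y >= height:
            break
        for dx, value in enumerate(row):
            if value:
                bitmap[y][left + dx] = 1
    return bitmap


# Tiny example: a 2x2 "quote" in the top rows is moved down by 2 pixels.
page = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
lower_region(page, top=0, left=1, bottom=2, right=3, shift=2)
# page is now [[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
```

On a real 600 dpi scan, a `shift` of 4 or 5 would correspond to the adjustment described above.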

