Ever since I started working with Tesseract, a strange phenomenon has been keeping me awake at night.
Often, but not always, the output strings have additional characters at the beginning and at the end. These characters are clearly not in the images I fed into the engine, and there is no noise or anything else in the images that could explain an extra character. I am using a language I trained myself with the training images from the download page.

Here are a few examples to illustrate the problem (image URL, then the output I get):

http://people.ee.ethz.ch/~bknecht/tesseract/img_01.tif - marrBsysrems |
http://people.ee.ethz.ch/~bknecht/tesseract/img_02.tif - Sun Microsystems (Schweiz) AG Z
http://people.ee.ethz.ch/~bknecht/tesseract/img_03.tif - Fax: N
http://people.ee.ethz.ch/~bknecht/tesseract/img_04.tif - Department of Management, N 8
http://people.ee.ethz.ch/~bknecht/tesseract/img_05.tif - Z Technology,and Economics Z

I do not care that "microsystems" was not recognized correctly in the first example, since that is a rather small image, but I do care about the pipe character "|" at the end. As you can see, the extra characters occur more often at the end of the output, but also at the beginning. Mostly it is just one character, and it is mostly separated from the rest by a space or a newline character. In example 4 there are even two characters added at the end. The characters are mostly upper-case letters, usually Z, N or I, but symbols and digits also occur sometimes. They are also not always the same: for the same picture I sometimes get an N, a Z, an 8 or a Y.

I already tried to work around the problem by cleaning the output string, simply deleting all single characters at the beginning and the end of the output. Of course this is not always a good solution.

Is there a way to keep those ghost letters from appearing?
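For reference, the cleanup workaround mentioned above can be sketched roughly like this. It is only an illustration in Python, not the exact code I run: the function name strip_ghost_chars is made up for this example, and it assumes that a "single character" means a whitespace-separated token of length one.

```python
def strip_ghost_chars(text: str) -> str:
    """Drop single-character tokens from the start and end of an OCR string.

    Hypothetical helper for illustration. Tokens are split on any
    whitespace, so spaces and newlines in the input are collapsed to
    single spaces in the result.
    """
    tokens = text.split()
    start, end = 0, len(tokens)

    # Skip leading one-character tokens such as a stray "Z".
    while start < end and len(tokens[start]) == 1:
        start += 1

    # Skip trailing one-character tokens such as "N", "8" or "|".
    while end > start and len(tokens[end - 1]) == 1:
        end -= 1

    return " ".join(tokens[start:end])


if __name__ == "__main__":
    for raw in ["marrBsysrems |",
                "Department of Management, N 8",
                "Z Technology,and Economics Z"]:
        print(repr(raw), "->", repr(strip_ghost_chars(raw)))
```

The weakness is easy to see: a legitimate one-letter word at either end, for example the "I" in "I agree", is removed just like a ghost letter, which is why this is not always a good solution.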

