Re: No treatment for touching letters?

Eugene Reimer Thu, 12 Aug 2010 02:24:44 -0700

You could probably improve its ability to recognize "00" as two 0's by training it on such paired symbols.

Mind you, I have also been surprised by cases where a perfectly clear and flawless symbol gets subdivided, like an N becoming |\| or an H becoming I-I, which indicates that tesseract has code to subdivide blobs on some basis other than there being "space" between them. However, that code seems to behave in erratic ways.
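
To make the idea of subdividing a blob without any "space" concrete, here is a minimal sketch of one textbook heuristic: cut the blob at the column with the least ink. This is an illustration only and is not tesseract's actual code; the function names, the `margin` parameter and the toy bitmap are all made up for the example.

```python
# Illustrative sketch only: NOT tesseract's actual chopping code.
# All names here (column_profile, find_cut, split_blob, margin) are hypothetical.

def column_profile(bitmap):
    """Count ink pixels ('#') in each column of a list of equal-length strings."""
    return [sum(ch == "#" for ch in col) for col in zip(*bitmap)]


def find_cut(bitmap, margin=2):
    """Return the column with the least ink, ignoring `margin` columns at
    each edge, or None if the blob is too narrow to split."""
    profile = column_profile(bitmap)
    candidates = range(margin, len(profile) - margin)
    if not candidates:
        return None
    return min(candidates, key=lambda x: profile[x])


def split_blob(bitmap, cut):
    """Split the bitmap into left and right parts, dropping the cut column."""
    left = [row[:cut] for row in bitmap]
    right = [row[cut + 1:] for row in bitmap]
    return left, right


# Two crude "0" shapes joined by a one-pixel bridge in the third row.
blob = [
    ".##...##.",
    "#..#.#..#",
    "#..###..#",
    "#..#.#..#",
    "#..#.#..#",
    ".##...##.",
]

cut = find_cut(blob)
left, right = split_blob(blob, cut)
print("cut at column", cut)
print("\n".join(left))
print()
print("\n".join(right))
```

A heuristic this simple will also cut through a thin stroke inside a single letter, so any real implementation needs extra checks before accepting a split.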



patrickq wrote, On 2010-08-12 02:01:

See http://www.scanbizcards.com/touchingdigits.jpg
Includes a tel number where "00" appears twice with no spacing, i.e.
touching. Tesseract fails on both sets, returning:
(65)81W6W instead of (65)8100 6002
("00" -> "W" and "002" -> "W")

I have hardly ever seen Tesseract do well when two letters were
touching, yet ironically I have seen plenty of examples where a
letter got chopped up into 2 or 3 pieces, for example:
|\| instead of N

Any idea what's going on and why Tesseract doesn't attempt to
recognize "00" as two 0's?


