Hi everyone,
We have a set of images taken from bus head signs, which display the bus ID and route details on LED panels. Our goal is to "*use Tesseract to extract the text written in the cropped images*".

When we ran it on the first image linked below, which reads "*30 ROYAL OAK EX*", we got "*30 RIWHL 0fl|( EX*" as the output. As you can see, Tesseract only detected some of the characters correctly.
<https://lh4.googleusercontent.com/-hFOIsEuVsUw/UdztzLbnqUI/AAAAAAAAAGw/OdNG99jkr3s/s1600/30_bus.jpg>

We also tested Tesseract with another head-sign image, linked below, which reads "*26 UVIC*". In this case, however, Tesseract returned an empty string!
<https://lh4.googleusercontent.com/-tVeJU0Hyjis/Udzu19sURfI/AAAAAAAAAG8/Zme6iJHd_sA/s1600/bus_26_headsign.jpg>

So we have two questions:

1. Can we use Tesseract for such a task, i.e. passing it images like the ones above that contain English text and expecting it to extract that text?
2. If that assumption is valid, why does Tesseract fail to detect the right text? Do we need to train Tesseract on the fonts used in the bus head signs? If so, how can we do that?

Finally, are there any wiki pages we can read that explain Tesseract's internal algorithms and how it extracts text from images?

Any help would be really appreciated.

Kazem
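
P.S. For reference, here is a minimal sketch of one way we could preprocess a crop and call Tesseract (assuming pytesseract and OpenCV; the file name, the inverted Otsu binarization, and the single-line page-segmentation mode are only our guesses, not necessarily what we actually ran):

    import cv2
    import pytesseract

    # Load the cropped head-sign image (file name is just an example)
    img = cv2.imread("30_bus.jpg")

    # LED signs show bright text on a dark background, so convert to
    # grayscale and apply inverted Otsu thresholding to get the
    # black-on-white text Tesseract prefers
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Treat the crop as a single line of text and run OCR
    # (--psm 7; older Tesseract 3.x builds use -psm 7 instead)
    text = pytesseract.image_to_string(binary, config="--psm 7")
    print(text.strip())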

