Hello Matthew,

Thanks for the info regarding eMOP.

I had seen the Prima Research web page some time back, but I don't have access to their tools. Is Aletheia available for download? Does it work with complex scripts such as Hindi?

I look forward to Franken+. I hope I'll be able to use it for Hindi/Sanskrit.

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Thu, Jul 11, 2013 at 7:20 PM, matthew christy <[email protected]> wrote:

> If you do find a font with whatthefont, then use the directions here:
> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 to train
> tesseract on the font. These directions aren't great though, so you can
> also look at some notes I created on training tesseract:
> http://emop.tamu.edu/node/47. You should also search this forum for a lot
> of information that isn't in the official google docs on Tesseract.
>
> If you don't find a font you can use, the IDHMC <http://idhmc.tamu.edu> is
> about to release an open source tool, as part of our
> eMOP <http://emop.tamu.edu> project, that will let you create training
> pages for Tesseract using your own image files. We should be releasing that
> tool in beta in a week or two.
>
> On Wednesday, July 10, 2013 12:29:48 AM UTC-5, Kazem Jahanbakhsh wrote:
>>
>> Hi everyone,
>>
>> We have a set of images taken from buses head signs which displays bus id
>> and its route details displayed by LEDs. Our goal is to "*USE Tesseract
>> to Extract Texts Written in the Cropped Images*". When we selected the
>> first image shown below which reads as "*30 ROYAL OAK EX*", we got "*30
>> RIWHL 0fl|( EX*" as the output. As you see, tesseract only detected some
>> of the characters correctly.
>>
>> <https://lh4.googleusercontent.com/-hFOIsEuVsUw/UdztzLbnqUI/AAAAAAAAAGw/OdNG99jkr3s/s1600/30_bus.jpg>
>>
>> We also tested tesseract with another headsign image input shown below
>> which reads as "*26 UVIC*". However, in this case tesseract returned
>> an empty string!
>>
>> <https://lh4.googleusercontent.com/-tVeJU0Hyjis/Udzu19sURfI/AAAAAAAAAG8/Zme6iJHd_sA/s1600/bus_26_headsign.jpg>
>>
>> So, we have two questions:
>>
>> 1- Can we use Tesseract for such a task: specifically passing above image
>> with an english text inside and expecting to extract the text?
>> 2- If the above assumption is valid, what's the reason that tesseract
>> fails detecting the right text? Do we need to train tesseract with fonts
>> used in the bus head signs? If so, how can we do such a task? Finally, are
>> there any wiki pages that we can read which explains the internal
>> algorithms of tesseract and how it extracts texts from images?
>>
>> Any help would be really appreciated.
>>
>> Kazem
>>
>> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

