Re: How to train tesseract for new fonts?

Shree Devi Kumar Thu, 11 Jul 2013 09:21:28 -0700

Hello Matthew,

Thanks for the info regarding eMOP.

I had seen the PRImA Research web page some time back but don't have access
to their tools. Is Aletheia available for download? Does it work with complex
scripts such as Hindi?

I look forward to Franken+. I hope I'll be able to use it for Hindi/Sanskrit.

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Thu, Jul 11, 2013 at 7:20 PM, matthew christy <[email protected]> wrote:

> If you do find a font with WhatTheFont, then use the directions here:
> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 to train
> Tesseract on the font. These directions aren't great, though, so you can
> also look at some notes I created on training Tesseract:
> http://emop.tamu.edu/node/47. You should also search this forum, which has
> a lot of information that isn't in the official Google docs on Tesseract.
>
> If you don't find a font you can use, the IDHMC <http://idhmc.tamu.edu> is
> about to release an open source tool, as part of our
> eMOP <http://emop.tamu.edu> project, that will let you create training
> pages for Tesseract using your own image files. We should be releasing that
> tool in beta in a week or two.
>
> On Wednesday, July 10, 2013 12:29:48 AM UTC-5, Kazem Jahanbakhsh wrote:
>>
>> Hi everyone,
>>
>> We have a set of images of bus head signs, each of which displays the bus
>> ID and its route details in LEDs. Our goal is to "*USE Tesseract
>> to Extract Texts Written in the Cropped Images*". When we ran it on the
>> first image shown below, which reads "*30 ROYAL OAK EX*", we got "*30
>> RIWHL 0ﬂ|( EX*" as the output. As you can see, Tesseract detected only
>> some of the characters correctly.
>>
>> <https://lh4.googleusercontent.com/-hFOIsEuVsUw/UdztzLbnqUI/AAAAAAAAAGw/OdNG99jkr3s/s1600/30_bus.jpg>
>>
>> We also tested Tesseract with another head sign image, shown below,
>> which reads "*26    UVIC*". In this case, however, Tesseract returned
>> an empty string!
>>
>>
>> <https://lh4.googleusercontent.com/-tVeJU0Hyjis/Udzu19sURfI/AAAAAAAAAG8/Zme6iJHd_sA/s1600/bus_26_headsign.jpg>
>>
>> So, we have two questions:
>>
>> 1- Can we use Tesseract for such a task, specifically passing in an image
>> like those above containing English text and expecting to extract the text?
>> 2- If so, why does Tesseract fail to detect the right text? Do we need to
>> train Tesseract with the fonts used in the bus head signs? If so, how can
>> we do that? Finally, are there any wiki pages we can read that explain the
>> internal algorithms of Tesseract and how it extracts text from images?
>>
>> Any help would be really appreciated.
>>
>> Kazem
>>
>>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
