Hi everyone,
We have a set of images taken from bus head signs, which display the bus ID and route details on LED panels. Our goal is to "*use Tesseract to extract the text written in the cropped images*".

When we ran it on the first image linked below, which reads "*30 ROYAL OAK EX*", we got "*30 RIWHL 0fl|( EX*" as the output. As you can see, Tesseract only detected some of the characters correctly.
<https://lh4.googleusercontent.com/-hFOIsEuVsUw/UdztzLbnqUI/AAAAAAAAAGw/OdNG99jkr3s/s1600/30_bus.jpg>

We also tested Tesseract with another head-sign image, linked below, which reads "*26 UVIC*". In this case, however, Tesseract returned an empty string!
<https://lh4.googleusercontent.com/-tVeJU0Hyjis/Udzu19sURfI/AAAAAAAAAG8/Zme6iJHd_sA/s1600/bus_26_headsign.jpg>

So we have two questions:

1. Can we use Tesseract for such a task, i.e. passing it images like the ones above that contain English text and expecting it to extract that text?
2. If that assumption is valid, why does Tesseract fail to detect the right text? Do we need to train Tesseract on the fonts used in the bus head signs? If so, how can we do that?

Finally, are there any wiki pages we can read that explain Tesseract's internal algorithms and how it extracts text from images?

Any help would be really appreciated.

Kazem
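
P.S. For reference, here is a minimal sketch of one way we could preprocess a crop and call Tesseract (assuming pytesseract and OpenCV; the file name, the inverted Otsu binarization, and the single-line page-segmentation mode are only our guesses, not necessarily what we actually ran):

    import cv2
    import pytesseract

    # Load the cropped head-sign image (file name is just an example)
    img = cv2.imread("30_bus.jpg")

    # LED signs show bright text on a dark background, so convert to
    # grayscale and apply inverted Otsu thresholding to get the
    # black-on-white text Tesseract prefers
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Treat the crop as a single line of text and run OCR
    # (--psm 7; older Tesseract 3.x builds use -psm 7 instead)
    text = pytesseract.image_to_string(binary, config="--psm 7")
    print(text.strip())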

