I'm working on a prototype that OCRs beverage labels and pulls the
description off them. The problem is that the fonts vary widely, and I
may or may not know which font a label uses. I want to script this as
much as possible. Is there a way to train Tesseract without needing to
know the name of the font? Can I supply an image and train it myself
without the font name? I have attached a couple of examples. One idea I
have is to automatically crop out the description text so that the OCR
doesn't have to figure out where the text is.

<https://lh4.googleusercontent.com/-MNqGffADKJE/UzAuOvIBpiI/AAAAAAAAG8g/jPiQ_46UpIQ/s1600/Revolver.png>
<https://lh4.googleusercontent.com/-DKQmOHDtYGc/UzAuTwVMmDI/AAAAAAAAG8o/1LC3om8xlIc/s1600/CigarCity.png>


The first image (Revolver Brewing) works pretty well when I crop out
the right-hand side description:
A full-flavored bock finished with
Northern Brewer and Saphir hops.
Brewed with an abundance of
Munich and caramel malts for a
hearty biscuit and toffee choracter.

The second image (Cigar City), not so much. I cropped out the middle
description and this is what I got:

WMNF 88.5Fm IS 3
I1s'rener-supporreo
communrru l'aDi0 s1'a11on
TH3'l' cetesrares Cl.IlT|.Il'al
DiVel’SiT9 am: is commmeb
T0 GQUHIH9. Peace ano
GCOn0miC JUSTICE. WMNF in
Tampa Has Been Sel'VinG
THE communrru since 1979,
ano is Cel9Bl‘aTil1G THE
33]‘ D H|1|1|'Vel'Sal‘9 OF THe
WMNF Tl‘0PiCal Hearwave.

T0 Learn more asour WMNF,
GO TO lUl‘I1I1F.0l' G.

T|"0PiCal Heatwave WH9aT
ate IS an American WHGHT
Ale. Generousw HOPPGD
UJi'I' H Kouaru HOPS Fl'0I'n
New zealano. THE KOHHTU
HOPS Pl‘0ViDe 3 very
Tl‘0PiCal FLaV0f mar F1’
perrecns WIT H THi$
summer ate.

I know this is because it's not sure of the font.
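On the question of training without a known font name, consider the Tesseract 3.x box/tif training workflow. There, the font name appears as a label in the training file names (`<lang>.<fontname>.exp<N>.tif` and the matching `.box`) and in the `font_properties` file. In that workflow it does not have to be an installed font, so a made-up label such as `cigarcity` should work as long as it is used consistently. The helper below is a small, hypothetical sketch that generates those names under that assumption. The function names and the `cigarcity` label are illustrative only.

```python
def training_basename(lang: str, font_label: str, exp: int) -> str:
    """Base name shared by a 3.x training image and its box file.

    Convention: <lang>.<fontname>.exp<N>; append ".tif" or ".box".
    font_label is a free-form (here hypothetical) label, not an installed font.
    """
    # Dots separate the fields, so a dot inside either part would be ambiguous.
    if "." in lang or "." in font_label:
        raise ValueError("lang and font_label must not contain '.'")
    return f"{lang}.{font_label}.exp{exp}"


def font_properties_line(font_label: str, italic: bool = False,
                         bold: bool = False, fixed: bool = False,
                         serif: bool = False, fraktur: bool = False) -> str:
    """One font_properties entry: <fontname> <italic> <bold> <fixed> <serif> <fraktur>."""
    flags = (italic, bold, fixed, serif, fraktur)
    # Each flag is written as 0 or 1 after the label, separated by spaces.
    return " ".join([font_label] + [str(int(flag)) for flag in flags])
```

For example, `training_basename("eng", "cigarcity", 0)` gives `eng.cigarcity.exp0`. The training pair would then be `eng.cigarcity.exp0.tif` and `eng.cigarcity.exp0.box`, with the line `cigarcity 0 0 0 0 0` in `font_properties`.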

Most common fonts work pretty well, but does anyone have suggestions
on how to go about this?

Cheers!

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
