I'm working on a prototype that OCRs beverage labels and pulls the description off them. The problem is that the fonts vary a lot, and I may or may not know which font is used. I want to script this as much as possible. Is there a way to train Tesseract without needing to know the name of the font? Can I supply an image and train it myself without the font name? I have attached a couple of examples. One idea I have is to automatically crop out the description text so that the OCR doesn't have to figure out where the text is.
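On the training question: as far as I understand the Tesseract 3.x training process, the "font name" is only a label. Training images are named `<lang>.<fontname>.exp<N>.tif`, and the same name has to appear in the `font_properties` file, but it does not have to be a real font. So a made-up name for text cropped from the labels should work. Below is a minimal sketch that builds the file names, the `makebox`/`box.train` command lines and the `font_properties` line, assuming 3.x conventions. The label `beerlabel` and the helper functions are hypothetical, and nothing here actually runs Tesseract.

```python
# Sketch: in Tesseract 3.x training the "font name" is just an identifier.
# ASSUMPTION: Tesseract 3.x file naming (<lang>.<fontname>.exp<N>.tif) and a
# font_properties file; "beerlabel" is a hypothetical label, not a real font.
from typing import Dict, List, Tuple


def training_names(lang: str, fontname: str, num: int) -> Dict[str, str]:
    """Return the image name and output base name for one training image."""
    if " " in fontname:
        raise ValueError("font name must not contain spaces")
    base = f"{lang}.{fontname}.exp{num}"
    return {"image": base + ".tif", "base": base}


def font_properties_line(fontname: str, italic: int = 0, bold: int = 0,
                         fixed: int = 0, serif: int = 0, fraktur: int = 0) -> str:
    """One font_properties line: <fontname> <italic> <bold> <fixed> <serif> <fraktur>."""
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([fontname] + [str(int(f)) for f in flags])


def box_commands(lang: str, fontname: str, num: int) -> Tuple[List[str], List[str]]:
    """Argument lists for generating a box file, then training on it."""
    n = training_names(lang, fontname, num)
    makebox = ["tesseract", n["image"], n["base"], "batch.nochop", "makebox"]
    train = ["tesseract", n["image"], n["base"], "box.train"]
    return makebox, train


if __name__ == "__main__":
    label = "beerlabel"  # hypothetical: any unique, space-free name will do
    for cmd in box_commands("eng", label, 0):
        print(" ".join(cmd))
    print(font_properties_line(label))
```

Between the `makebox` and `box.train` steps the generated `.box` file has to be corrected by hand (or in a box editor) so each box holds the right character; that correction, not the font name, is what teaches Tesseract the unknown font.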
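On the auto-cropping idea: one simple, common way to find text blocks is a projection profile (one step of an X-Y cut). You count the dark pixels in each column, treat wide runs of empty columns as separators, and crop each block. Below is a minimal pure-Python sketch. It assumes the label has already been binarized into a 0/1 grid (loading and thresholding the image is left out). `split_blocks`, `ink_runs` and the `min_gap` default of 20 pixels are hypothetical names and values. Labels with artwork behind the text will need more than this.

```python
from typing import List, Tuple

Grid = List[List[int]]  # ASSUMPTION: already binarized, 1 = ink, 0 = background


def ink_runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Return [start, end) runs of non-zero entries in a projection profile.

    Runs separated by fewer than min_gap empty positions are merged, so the
    small gaps between letters and words do not split a text block."""
    runs: List[Tuple[int, int]] = []
    for i, count in enumerate(profile):
        if count == 0:
            continue
        if runs and i - runs[-1][1] < min_gap:
            runs[-1] = (runs[-1][0], i + 1)  # gap too small: extend current run
        else:
            runs.append((i, i + 1))  # first ink, or a wide enough gap: new run
    return runs


def split_blocks(img: Grid, min_gap: int = 20) -> List[Grid]:
    """Split the image at wide blank column gaps, then trim each block
    to the rows that actually contain ink."""
    if not img or not img[0]:
        return []
    col_profile = [sum(col) for col in zip(*img)]  # ink count per column
    blocks: List[Grid] = []
    for left, right in ink_runs(col_profile, min_gap):
        band = [row[left:right] for row in img]
        ink_rows = [y for y, row in enumerate(band) if any(row)]
        blocks.append(band[ink_rows[0]:ink_rows[-1] + 1])
    return blocks


if __name__ == "__main__":
    demo = [
        [1, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    for block in split_blocks(demo, min_gap=2):
        print(block)  # [[1, 1]] then [[1, 0], [1, 1]]
```

Each block could then be saved as its own image and passed to Tesseract separately. Repeating the same cut on rows inside each block gives the usual recursive X-Y cut.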
<https://lh4.googleusercontent.com/-MNqGffADKJE/UzAuOvIBpiI/AAAAAAAAG8g/jPiQ_46UpIQ/s1600/Revolver.png>
<https://lh4.googleusercontent.com/-DKQmOHDtYGc/UzAuTwVMmDI/AAAAAAAAG8o/1LC3om8xlIc/s1600/CigarCity.png>

The first image (Revolver Brewing) does a pretty good job when I crop out the right-hand side description:

A full-flavored bock finished with Northern Brewer and Saphir hops. Brewed with an abundance of Munich and caramel malts for a hearty biscuit and toffee choracter.

The second image (Cigar City), not so much. I cropped out the middle description and this is what I got:

WMNF 88.5Fm IS 3 I1s'rener-supporreo communrru l'aDi0 s1'a11on TH3'l' cetesrares Cl.IlT|.Il'al DiVel’SiT9 am: is commmeb T0 GQUHIH9. Peace ano GCOn0miC JUSTICE. WMNF in Tampa Has Been Sel'VinG THE communrru since 1979, ano is Cel9Bl‘aTil1G THE 33]‘ D H|1|1|'Vel'Sal‘9 OF THe WMNF Tl‘0PiCal Hearwave. T0 Learn more asour WMNF, GO TO lUl‘I1I1F.0l' G. T|"0PiCal Heatwave WH9aT ate IS an American WHGHT Ale. Generousw HOPPGD UJi'I' H Kouaru HOPS Fl'0I'n New zealano. THE KOHHTU HOPS Pl‘0ViDe 3 very Tl‘0PiCal FLaV0f mar F1’ perrecns WIT H THi$ summer ate.

I know this is because it's not sure of the font. Most common fonts work pretty well, but does anyone have any suggestions on how one might go about this?

Cheers!

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

