On Fri, May 15, 2009 at 10:18, Pierpaolo Monaco <[email protected]> wrote:
> Using tesseract I can limit the output with a shell command.
>
> I just need to create a file in the tesseract-ocr/tessdata/configs/ that,
> for example, I call myletters.
> In the file I define the whitelist in this way, writing in the file:
>
> tessedit_char_whitelist QWERTYUIOPASDFGHJKLZXCVBNM
>
> After that I can process an image by writing:
>
> $ tesseract prova.tif out nobatch myletters
>
> I will have just upper case letters as result (letters from my whitelist).
>
> Can I do something like that in ocropus, or do I need to do that with a
> language model?
>

You need a language model for that, but a pretty simple one. The language model you need is the equivalent of "[A-Z]*". You can even create something as simple as that by hand; you just need one or two states, plus a transition for each permitted letter. See the OpenFST documentation (you do not need to use OpenFST, but OCRopus uses the same representation).

If you want good recognition performance, you should also retrain the classifier on just your target character set.

I've written an overview paper describing how all the bits and pieces of OCRopus fit together; I'll try to put that up publicly in a couple of weeks. After that, I'll revise the tutorial to conform to OCRopus 0.4.

Tom
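
As a rough illustration of the "[A-Z]*" model described above, here is a minimal Python sketch that writes it out in the OpenFST text format: a single state that is both start and final, with one self-loop per permitted letter. The function names, the choice of one state rather than two, and the symbol numbering are assumptions made for this example; whether OCRopus expects a particular label encoding is not covered here. The small `accepts` helper only simulates the machine to check that the lines describe the intended language.

```python
import string


def build_letter_loop_fst(alphabet=string.ascii_uppercase):
    """Return (fst_lines, symbol_lines) describing alphabet* in OpenFST text format.

    Arc lines are "src dst ilabel olabel"; a line holding only a state
    number marks that state as final. The start state is the source
    state of the first line.
    """
    # One self-loop on state 0 for every permitted letter.
    fst_lines = [f"0 0 {ch} {ch}" for ch in alphabet]
    # State 0 is also final, so the empty string is accepted.
    fst_lines.append("0")
    # Symbol table: index 0 is reserved for epsilon by convention.
    symbol_lines = ["<eps> 0"]
    symbol_lines += [f"{ch} {i}" for i, ch in enumerate(alphabet, start=1)]
    return fst_lines, symbol_lines


def accepts(fst_lines, text):
    """Simulate the (deterministic) machine described by fst_lines on text."""
    arcs = {}
    finals = set()
    for line in fst_lines:
        fields = line.split()
        if len(fields) == 1:
            finals.add(fields[0])
        else:
            src, dst, ilabel = fields[0], fields[1], fields[2]
            arcs[(src, ilabel)] = dst
    state = fst_lines[0].split()[0]
    for ch in text:
        state = arcs.get((state, ch))
        if state is None:
            return False  # no transition for this character
    return state in finals


if __name__ == "__main__":
    fst_lines, symbol_lines = build_letter_loop_fst()
    # Hypothetical output file names, chosen for this example only.
    with open("letters.fst.txt", "w") as f:
        f.write("\n".join(fst_lines) + "\n")
    with open("letters.syms", "w") as f:
        f.write("\n".join(symbol_lines) + "\n")
```

If you do use OpenFST, text files like these can be turned into a binary FST with something like `fstcompile --isymbols=letters.syms --osymbols=letters.syms letters.fst.txt letters.fst`. To restrict the output to a different character set, pass a different `alphabet` string.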
