On Fri, May 15, 2009 at 10:18, Pierpaolo Monaco
<[email protected]> wrote:

> Using Tesseract, I can limit the output with a shell command.
>
> I just need to create a file in the tesseract-ocr/tessdata/configs/
> directory, called, for example, myletters.
> In that file I define the whitelist by writing:
>
> tessedit_char_whitelist QWERTYUIOPASDFGHJKLZXCVBNM
>
> After that I can process an image by running:
>
> $ tesseract prova.tif out nobatch myletters
>
> The result then contains only uppercase letters (the letters from my whitelist).
>
> Can I do something like that in OCRopus, or do I need to do it with a
> language model?
>

You need a language model for that, but a pretty simple one.  The language
model you need is the equivalent of the regular expression "[A-Z]*".  You
can even create something that simple by hand; you just need one or two
states, plus a transition for each permitted letter.  See the OpenFST
documentation (you do not need to use OpenFST, but OCRopus uses the same
representation).
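To give a feel for how small such a model is, here is a rough sketch that
writes the "[A-Z]*" acceptor in the OpenFST text format: a single state that
is both the start state and a final state, with one self-loop per permitted
letter.  It reuses the whitelist from the Tesseract config above.  The file
names letters.syms and upper.txt are hypothetical, and whether OCRopus 0.4
reads such a file directly or needs a compiled FST is an assumption not
checked here.

```shell
#!/bin/sh
# Sketch only: writes an "[A-Z]*" acceptor in OpenFST text format.
# Assumptions: POSIX sh; file names letters.syms and upper.txt are hypothetical.
set -e

# Same character set as the Tesseract tessedit_char_whitelist example.
WHITELIST=QWERTYUIOPASDFGHJKLZXCVBNM

# Symbol table: OpenFST reserves label 0 for epsilon.
printf '<eps> 0\n' > letters.syms
# Arc list, truncated to empty before we append to it.
: > upper.txt

n=1
for c in $(printf '%s\n' "$WHITELIST" | fold -w1); do
  # Give each permitted letter its own integer label.
  printf '%s %d\n' "$c" "$n" >> letters.syms
  # Self-loop on state 0: "src dest label" (acceptor format).
  printf '0 0 %s\n' "$c" >> upper.txt
  n=$((n + 1))
done

# State 0 is the start state (source of the first arc) and is also final,
# so any string of permitted letters, including the empty one, is accepted.
printf '0\n' >> upper.txt
```

If the OpenFST command-line tools are installed, something like

  $ fstcompile --acceptor --isymbols=letters.syms upper.txt upper.fst

should turn the text file into a binary FST; how that file is then handed to
OCRopus is not covered by this sketch.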

If you want good recognition performance, you should also retrain the
classifier on just your target character set.

I've written an overview paper describing how all the bits and pieces of
OCRopus fit together; I'll try to put it up publicly in a couple of
weeks.

After that, I'll revise the tutorial to conform to OCRopus 0.4.

Tom

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ocropus" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/ocropus?hl=en
-~----------~----~----~----~------~----~------~--~---
