To restrict the recognizer to a dictionary, you need to build a language
model.  Download the PyOpenFST project from Google, then look in the
"scripts" subdirectory.  It contains several scripts for building
language models, including dict2linefst.
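
As a rough illustration of what a dictionary language model looks like,
here is a minimal sketch that turns a word list into a character-level
FST in OpenFst's text format.  It is not one of the PyOpenFST scripts
and does not reproduce dict2linefst; the function name
words_to_fst_text and the trie layout are assumptions made for this
example, and it assumes the words contain no whitespace.

```python
def words_to_fst_text(words, eps="<eps>"):
    """Build a trie-shaped FST over characters (hypothetical helper).

    Returns (fst_lines, symbol_lines) in OpenFst text format:
    arc lines are "src dst ilabel olabel", final states are "state".
    """
    trie = {}        # (source state, character) -> destination state
    finals = set()   # states at which a complete word ends
    next_state = 1   # state 0 is the start state

    for word in words:
        if not word:
            continue  # skip empty entries
        state = 0
        for ch in word:
            key = (state, ch)
            if key not in trie:
                trie[key] = next_state
                next_state += 1
            state = trie[key]
        finals.add(state)

    # Sort arcs by (source, destination) so that the first line starts
    # at state 0, which OpenFst then treats as the start state.
    arcs = sorted((src, dst, ch) for (src, ch), dst in trie.items())
    fst_lines = ["%d %d %s %s" % (src, dst, ch, ch) for src, dst, ch in arcs]
    fst_lines += [str(state) for state in sorted(finals)]

    # Symbol table: epsilon is id 0, characters get ids 1..n.
    chars = sorted(set(ch for (_, ch) in trie))
    symbol_lines = ["%s 0" % eps]
    symbol_lines += ["%s %d" % (ch, i) for i, ch in enumerate(chars, start=1)]
    return fst_lines, symbol_lines


if __name__ == "__main__":
    fst_lines, symbol_lines = words_to_fst_text(["cat", "car"])
    print("\n".join(fst_lines))
    print("\n".join(symbol_lines))
```

If the two outputs are saved as dict.txt and chars.syms, OpenFst's
command-line tool should be able to compile them with
"fstcompile --isymbols=chars.syms --osymbols=chars.syms dict.txt dict.fst".
Whether the recognizer accepts the result unchanged depends on the symbol
conventions it expects, so treat this as a starting point only.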

We're currently benchmarking a whole bunch of standard language models
(n-grams, n-graphs, various smoothing and back-off strategies); I hope
we'll have a report on that in a few months.

(Note that the default recognizer has been trained only on UNLV and
does not perform all that well on other datasets.)

Tom


On Jan 2, 5:54 am, Benjamin Lambert <[email protected]> wrote:
> Hi all,
>
> Let's see, I'm running the latest version controlled OCRopus  (at least 
> within the last couple weeks), on Ubuntu.  It seems to be working.  My 
> question is:
> is there some way to specify a dictionary to the recognizer?
>
> For recognition, I'm getting output that looks like this:
> "|nd tl1e results of hi$ inVeStigations were pulfli6lled in l8815 # # y]"
>
> I'd like to be able to specify the set of words that can be recognized, and 
> have that not include strings like "hi$" and "pulfli6lled".  Is that possible 
> in OCRopus?
>
> Best,
> Ben
>
> --
> Benjamin Lambert
> Ph.D. Student of Computer Science
> Carnegie Mellon University
> www.cs.cmu.edu/~belamber
> Mobile: 617-869-1844
