You need to build a language model. Download the PyOpenFST project from Google, then look in the "scripts" subdirectory. There are a bunch of scripts for building language models, including dict2linefst.
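To make the idea concrete, here is a minimal sketch of what a dictionary language model does when it is expressed as a finite-state acceptor. It is plain Python and does not call PyOpenFST or reproduce dict2linefst. The names `build_dictionary_acceptor` and `accepts` are hypothetical, and the trie layout is an assumption chosen for illustration, not necessarily what the scripts generate.

```python
def build_dictionary_acceptor(words):
    """Build a trie-shaped acceptor from a word list.

    Illustration only: states are integers, arcs are labelled with single
    characters, and state 0 is the start state. This is not the PyOpenFST API.
    """
    transitions = [{}]  # transitions[state][char] -> next state
    final_states = set()
    for word in words:
        state = 0
        for ch in word:
            next_state = transitions[state].get(ch)
            if next_state is None:
                # No arc for this character yet: create a new state.
                next_state = len(transitions)
                transitions.append({})
                transitions[state][ch] = next_state
            state = next_state
        final_states.add(state)  # the end of a word is an accepting state
    return transitions, final_states


def accepts(acceptor, token):
    """Return True only if `token` spells a complete dictionary word."""
    transitions, final_states = acceptor
    state = 0
    for ch in token:
        state = transitions[state].get(ch)
        if state is None:
            return False  # no arc for this character: not in the dictionary
    return state in final_states


acceptor = build_dictionary_acceptor(["his", "results", "published"])
print(accepts(acceptor, "his"))
print(accepts(acceptor, "hi$"))
```

A real language model would usually attach costs to the arcs so the recognizer can weigh the dictionary against the character classifier's scores; this sketch only shows the accept/reject structure.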
We're currently benchmarking a whole bunch of standard language models (n-grams, n-graphs, various smoothing and back-off strategies); I hope we'll have a report on that in a few months.

(Note that the default recognizer has been trained only on UNLV and does not perform all that well on other datasets.)

Tom

On Jan 2, 5:54 am, Benjamin Lambert <[email protected]> wrote:
> Hi all,
>
> Let's see, I'm running the latest version controlled OCRopus (at least
> within the last couple weeks), on Ubuntu. It seems to be working. My
> question is:
> is there some way to specify a dictionary to the recognizer?
>
> For recognition, I'm getting output that looks like this:
> "|nd tl1e results of hi$ inVeStigations were pulfli6lled in l8815 # # y]"
>
> I'd like to be able to specify the set of words that can be recognized, and
> have that not include strings like "hi$" and "pulfli6lled". Is that possible
> in OCRopus?
>
> Best,
> Ben
>
> --
> Benjamin Lambert
> Ph.D. Student of Computer Science
> Carnegie Mellon University
> www.cs.cmu.edu/~belamber
> Mobile: 617-869-1844
