Hi all,

I am fairly new to OCRopus and OCR in general.  My recent work has been in 
language modeling for automatic speech recognition (ASR).

We recently started a new project in which we'll be applying various language 
modeling techniques from our ASR work to OCR.  OCRopus seems best suited for 
this, since it already supports language modeling through OpenFST.

The first thing I'm interested in experimenting with is post-processing of 
OCR recognition lattices (e.g. re-scoring those lattices with other language 
models).

Using the latest development Hg checkout, I've managed to get OCRopus to 
output lattices with the ocropy command "ocropus-linerec" (which seems to 
write them by default).  However, as far as I can tell, the binary OpenFST 
files do not contain embedded symbol tables.

That is, if I convert the binary FST to a text FST, I get something like this:
0       1       65537   33      16.7413559
0       24      65537   33      11.7413559
0       1       65537   49      17.3978558
0       25      65537   49      12.3978558

I believe the number 65537 is an "input label" and 33 and 49 are "output 
labels".  I am guessing that the input labels are image segment IDs, and the 
output labels are the IDs of letters or sequences of letters.  Is that right?
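
To make the question concrete, here is a minimal Python sketch of how I am 
reading those lines.  It assumes the usual OpenFST text layout for an arc 
(source state, destination state, input label, output label, weight) and only 
handles five-field arc lines.  The chr() step is purely a guess on my part 
that the output labels might be Unicode code points; I have not found 
anything that confirms it.  The names Arc and parse_arc are my own.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    src: int       # source state
    dst: int       # destination state
    ilabel: int    # input label (an image segment ID, if my guess is right)
    olabel: int    # output label (a letter ID, if my guess is right)
    weight: float  # arc cost


def parse_arc(line: str) -> Arc:
    """Parse one five-field arc line of a text-format FST."""
    fields = line.split()
    if len(fields) != 5:
        # Final-state lines and unweighted arcs have fewer fields;
        # this sketch deliberately does not handle them.
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    src, dst, ilabel, olabel = (int(f) for f in fields[:4])
    return Arc(src, dst, ilabel, olabel, float(fields[4]))


SAMPLE = """\
0       1       65537   33      16.7413559
0       24      65537   33      11.7413559
0       1       65537   49      17.3978558
0       25      65537   49      12.3978558
"""

arcs = [parse_arc(line) for line in SAMPLE.splitlines() if line.strip()]

for arc in arcs:
    # ASSUMPTION: the output label is a Unicode code point.
    guess = chr(arc.olabel)
    print(f"{arc.src} -> {arc.dst}  in={arc.ilabel}  "
          f"out={arc.olabel} ({guess!r}?)  cost={arc.weight}")
```

If the code-point guess is wrong, the chr() call would need to be replaced 
by a lookup in whatever symbol table OCRopus actually uses.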

If that's correct, what I am most interested in is how to access the letters 
corresponding to each "output" ID.  Is there any way to do this, and/or to 
add this feature?

Best,
Ben


--
Benjamin Lambert
Ph.D. Student of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~belamber
Mobile: 617-869-1844


