Thanks for the reply, Tom. I didn't see it right away for some reason. I'll report back when I have some results from the process you describe.
Dennis

On Thu, May 13, 2010 at 2:45 PM, Tom <[email protected]> wrote:
> Hi,
>
> sorry, there is no tutorial yet, and there are actually a number of
> different possibilities.
>
> The following is what works for the development branch if you have
> text lines and transcriptions and you already have a character
> recognition model that sort of works but not well. The process then
> is roughly as follows:
>
> - put the text line images into *.png files and the corresponding
>   ground truth into *.gt.txt files
> - run ocropus-calign -x .gt.txt -m my.cmodel *.png
> - run ocropus-extract-csegs *.png -o chars.db
> - optionally, correct the character labels with ocropus-cedit chars.db -t chars
> - optionally, cluster the character shapes with ocropus-cluster chars.db clusters.db
> - optionally, correct the cluster labels with ocropus-cedit clusters.db
> - train a new character recognition model with ocropus-ctrain -b clusters.db new.cmodel
>
> You can now recognize with "ocropus-calign -m new.cmodel ..."
>
> There are other recipes for completely new scripts (i.e., if you don't
> already have any model), for new scripts that differ from old scripts
> by only a few characters, etc.
>
> Also, there are two kinds of recognizers, the old C++ recognizer
> (ocropus-linerec) and the new Python recognizer (ocropus-calign); they
> work similarly but have some differences. For the official release,
> we're moving completely to the Python recognizer.
>
> Tom
>
> On May 9, 10:28 pm, Ted <[email protected]> wrote:
> > Did you ever get a reply to the documentation question?
> > I'm a new OCROpus user and have to use gocr. This creates problems
> > because it needs a lot of corrections. I'd like to know how to use
> > the trainer.
> >
> > I'd even be willing to write an introductory manual.
> >
> > On May 5, 5:53 pm, Dennis Rardin <[email protected]> wrote:
> > > All/Anyone,
> > >
> > > I have 2 large books broken into pages and then to lines. I'm ready to
> > > train. For both books, I have text files to compare against the images.
> > >
> > > How do I train OCROpus by using the text files to correct the results of
> > > the character recognition?
> > >
> > > Thank You Very Much,
> > > Dennis
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "ocropus" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to [email protected].
> > > For more options, visit this group at http://groups.google.com/group/ocropus?hl=en.
