Re: How To Train On Book Directory Against Ground Truth Text Files?

Dennis Rardin Thu, 20 May 2010 11:58:35 -0700

Thanks for the reply, Tom. I didn't see it right away for some reason. I'll
report back when I have some results from the process you describe.


Dennis

On Thu, May 13, 2010 at 2:45 PM, Tom <[email protected]> wrote:

> Hi,
>
> Sorry, there is no tutorial yet, and there are actually a number of
> different possibilities.
>
> The following is what works for the development branch if you have
> text lines and transcriptions and you already have a character
> recognition model that sort of works but not well.  The process then
> is roughly as follows:
>
> - put the text line images into *.png files and the corresponding
>   ground truth into *.gt.txt files
> - run ocropus-calign -x .gt.txt -m my.cmodel *.png
> - run ocropus-extract-csegs *.png -o chars.db
> - optionally, correct the character labels with
>   ocropus-cedit chars.db -t chars
> - optionally, cluster the character shapes with
>   ocropus-cluster chars.db clusters.db
> - optionally, correct the cluster labels with
>   ocropus-cedit clusters.db
> - train a new character recognition model with
>   ocropus-ctrain -b clusters.db new.cmodel
>
> You can now recognize with "ocropus-calign -m new.cmodel ..."
>
> There are other recipes for completely new scripts (i.e., if you don't
> already have any model), for new scripts that differ from old scripts
> by only a few characters, etc.
>
> Also, there are two kinds of recognizers, the old C++ recognizer
> (ocropus-linerec) and the new Python recognizer (ocropus-calign); they
> work similarly but have some differences.  For the official release,
> we're moving completely to the Python recognizer.
>
> Tom
>
>
>
>
> On May 9, 10:28 pm, Ted <[email protected]> wrote:
> > Did you ever get a reply to the documentation question?
> > I'm a new OCROpus user and have to use gocr.  This creates problems
> > because it needs a lot of corrections.  I'd like to know how to use
> > the trainer.
> >
> > I'd even be willing to write an introductory manual.
> >
> > On May 5, 5:53 pm, Dennis Rardin <[email protected]> wrote:
> >
> >
> >
> > > All/Anyone,
> >
> > > I have 2 large books broken into pages and then to lines. I'm ready to
> > > train. For both books, I have text files to compare against the images.
> >
> > > How do I train OCROpus by using the text files to correct the results
> > > of the character recognition?
> >
> > > Thank You Very Much,
> > > Dennis
> >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "ocropus" group.
> > > To post to this group, send email to [email protected].
> > > To unsubscribe from this group, send email to
> > > [email protected].
> > > For more options, visit this group at
> > > http://groups.google.com/group/ocropus?hl=en.
> >
>
>
>
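
For anyone who wants to script the recipe Tom gives in the quoted message,
here is a rough sketch in Python. It only assembles the command lines from
that message and prints them; nothing is executed unless dry_run is turned
off. The function names are made up for illustration, and two things are
assumptions rather than anything stated in the thread: that a line image
such as 0001.png pairs with 0001.gt.txt, and that clustering is always run
(the quoted training command reads clusters.db, so the sketch does not skip
that step).

```python
import shlex
import subprocess
from pathlib import Path


def missing_ground_truth(images):
    """Return the images that have no matching transcription file.

    Assumption: line image `0001.png` pairs with `0001.gt.txt`.
    """
    return [p for p in images if not Path(p).with_suffix(".gt.txt").exists()]


def build_commands(images, old_model="my.cmodel", new_model="new.cmodel",
                   edit_labels=False):
    """Assemble the command lines from the quoted recipe, in order.

    Only flags that appear in the quoted message are used. The interactive
    ocropus-cedit steps are included only when edit_labels is True.
    """
    images = [str(p) for p in images]
    cmds = [
        # Align the line images against their ground truth.
        ["ocropus-calign", "-x", ".gt.txt", "-m", old_model, *images],
        # Extract character segments into a database.
        ["ocropus-extract-csegs", *images, "-o", "chars.db"],
    ]
    if edit_labels:
        cmds.append(["ocropus-cedit", "chars.db", "-t", "chars"])
    # Cluster the character shapes (kept unconditionally, see note above).
    cmds.append(["ocropus-cluster", "chars.db", "clusters.db"])
    if edit_labels:
        cmds.append(["ocropus-cedit", "clusters.db"])
    # Train the new character recognition model.
    cmds.append(["ocropus-ctrain", "-b", "clusters.db", new_model])
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in cmds:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    images = sorted(Path(".").glob("*.png"))
    missing = missing_ground_truth(images)
    if missing:
        raise SystemExit("no .gt.txt for: " + ", ".join(map(str, missing)))
    run(build_commands(images), dry_run=True)
```

Run it from the directory holding the line images and check the printed
commands before switching dry_run to False. As in the quoted message, the
resulting model would then be used with "ocropus-calign -m new.cmodel ...".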
