I'm generating the lines from a set of training images, not from a set
of strings. I just know where in this particular data set the text is,
so I can grab the text from those regions and dump it to a file. I
still need to correctly transcribe it afterwards. Thanks for the
answers, I will keep my eyes peeled for the next release!

On Oct 19, 6:26 pm, Tom Breuel <[email protected]> wrote:
> > What is the recommended procedure for manually correcting cseg.gt.png
> > files? Is there a utility that I am overlooking?
>
> There isn't one yet; we've been working on it.
>
> > When generating text for training images, should this include spaces?
>
> Yes; however, the space handling in OCRopus is currently inconsistent
> so that the spaces are ignored.
>
> > My overall procedure: I have spent some time training ocropus on a
> > custom font, images from JPGs. I am using the following method:
>
> > 1) Generate a variety of single line training images programmatically
> > 2) Manually type the text contained in each training image
>
> If you generate it, why not save the text?
>
> > 3) Place these in a directory training/0000 or training/0001 etc
> > 4) run ocropus lines2fsts training
> > 5) replace the generated txt files with my txt files and run ocropus
> > align training to generate cseg.png
> > 6) run ocropus trainseg on training to generate a new model file
> > 7) goto 1 using the new training model
>
> If you can write a script that takes a text file and font and
> generates a book directory full of binary line images, corresponding
> csegs, and corresponding Unicode strings, that would be useful.
>
> Tom
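
For anyone who does generate lines from strings, here is a minimal
sketch of the bookkeeping half of the script Tom describes: it saves
each line's Unicode text at generation time so nothing has to be typed
in again afterwards. It does not draw images or csegs itself. The
function name write_book, the render hook, and the file naming
(NNNN/NNNN.gt.txt next to NNNN/NNNN.png) are assumptions made for
illustration, not confirmed OCRopus conventions, so adjust them to
whatever your version expects.

```python
import os


def write_book(lines, book_dir, render=None, lines_per_page=100):
    """Write one transcription file per non-empty text line.

    Assumed layout (hypothetical, not confirmed OCRopus naming):
        book_dir/0000/0000.gt.txt, book_dir/0000/0001.gt.txt, ...
    `render` is a hypothetical hook, called as render(text, png_path),
    that would draw the line image with the font of your choice.
    """
    written = []
    # Drop blank lines; keep the spaces inside each line.
    kept = [ln.strip() for ln in lines if ln.strip()]
    for i, text in enumerate(kept):
        page, index = divmod(i, lines_per_page)
        page_dir = os.path.join(book_dir, "%04d" % page)
        os.makedirs(page_dir, exist_ok=True)
        base = os.path.join(page_dir, "%04d" % index)
        # Save the ground-truth string next to where the image will go.
        with open(base + ".gt.txt", "w", encoding="utf-8") as f:
            f.write(text + "\n")
        if render is not None:
            render(text, base + ".png")
        written.append(base)
    return written
```

Calling write_book(open("corpus.txt", encoding="utf-8"), "training")
would fill training/0000, training/0001, and so on, where corpus.txt
is a placeholder file name. Passing a render function that draws each
string with an imaging library of your choice would add the line
images alongside the text.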