I'm generating the line images from a set of existing training images, not from a set of strings. I only know where the text sits in this particular data set, so I can crop those regions and dump each one to a file. I still need to transcribe them correctly afterwards.

Thanks for the answers, I will keep my eyes peeled for the next release!

On Oct 19, 6:26 pm, Tom Breuel <[email protected]> wrote:
> > What is the recommended procedure for manually correcting cseg.gt.png
> > files? Is there a utility that I am overlooking?
>
> There isn't one yet; we've been working on it.
>
> > When generating text for training images, should this include spaces?
>
> Yes; however, the space handling in OCRopus is currently inconsistent
> so that the spaces are ignored.
>
> > My overall procedure : I have spent some time training ocropus on a
> > custom font, images from JPGs. I am using the following methods :
>
> > 1) Generate a variety of single line training images programatically
> > 2) Manually type the text contained in each training image
>
> If you generate it, why not save the text?
>
> > 3) Places these in a directory training/0000 or training/0001 etc
> > 4) run ocropus lines2fsts training
> > 5) replace the generate txt files with my txt files and run ocropus
> > align training to generate cseg.png
> > 6) run ocropus trainseg on training to generate a new model file
> > 7) goto 1 using the new training model
>
> If you can write a script that takes a text file and font and
> generates a book directory full of binary line images, corresponding
> csegs, and corresponding Unicode strings, that would be useful.
>
> Tom
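
For what it's worth, here is a rough sketch of how steps 3 and 5 of the quoted procedure (placing line images under training/0000, training/0001, etc. and putting my own transcriptions next to them) could be scripted. It uses only the Python standard library. The function name `build_book_dir`, the `lines_per_page` parameter and the per-line file names (`0000.png`, `0000.txt`, ...) are my own assumptions for illustration, not something OCRopus is confirmed to require, so adjust them to whatever layout your version expects. It does not produce csegs.

```python
from pathlib import Path
import shutil


def build_book_dir(pairs, book_dir, lines_per_page=100):
    """Copy (image_path, text) pairs into book_dir/0000, book_dir/0001, ...

    Hypothetical layout: each line image is copied to NNNN.png and its
    transcription is written to NNNN.txt in the same numbered directory.
    Spaces in the transcription are kept as typed.
    """
    book = Path(book_dir)
    written = []
    for index, (image_path, text) in enumerate(pairs):
        # Numbered sub-directory, e.g. training/0000, training/0001
        page_dir = book / f"{index // lines_per_page:04d}"
        page_dir.mkdir(parents=True, exist_ok=True)

        # File stem within the sub-directory (assumed naming scheme)
        stem = f"{index % lines_per_page:04d}"
        image_out = page_dir / f"{stem}.png"
        text_out = page_dir / f"{stem}.txt"

        # Copy the line image unchanged and write the transcription as UTF-8
        shutil.copyfile(image_path, image_out)
        text_out.write_text(text + "\n", encoding="utf-8")
        written.append((image_out, text_out))
    return written
```

Calling `build_book_dir([("crop1.png", "some text"), ("crop2.png", "more text")], "training")` (with hypothetical input file names) would leave the pairs under `training/0000/`, after which the `ocropus` commands from steps 4 to 6 could be run on `training` as before.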
