> On a slightly different topic: I'm trying to create some training data, but I am unsure how to go about creating the 'cseg' files. Is that something OCRopus will do for me (how?), or is it done manually? If the latter, how are they created?
It depends on what you're training on. For an entirely new script, you need to create them by hand, but that's rare. You can also generate training data artificially from fonts and produce the cseg information automatically along with it.

The usual case is that you have some existing OCR results, probably with character bounding boxes. There are some OCRopus functions to convert bounding boxes into cseg files, which can then be used for training.

Once you have trained OCRopus, the usual way of creating cseg files is with "ocropus align", which aligns text lines with their corresponding transcriptions. Cseg files are also generated during regular recognition, and those can be used for training as well (this is the usual way to do book-adaptive training).

Have a look at the "Training" wiki page.

Tom
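To make the bounding-box route above concrete, here is a minimal illustrative sketch, not one of the OCRopus conversion functions mentioned in the reply. It assumes a cseg is a per-pixel label image in which every ink pixel of the k-th character (in transcription order) carries label k, background is 0, and labels are packed into RGB as `(r << 16) | (g << 8) | b` when saved as an image. The function names `boxes_to_cseg`, `encode_rgb`, and `decode_rgb` are hypothetical.

```python
# Hypothetical sketch, NOT the OCRopus API: turn per-character bounding
# boxes into a character-segmentation ("cseg") label image.
# Assumed format: ink pixels of the k-th character get label k (1-based),
# background is 0, and labels are packed into RGB when written to disk.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open: x0 <= x < x1


def boxes_to_cseg(ink: List[List[int]], boxes: List[Box]) -> List[List[int]]:
    """Label every ink pixel inside the k-th box with k+1; background stays 0.

    Boxes are assumed to be listed in transcription order, so label k+1
    corresponds to the k-th character of the ground-truth text.
    """
    height = len(ink)
    width = len(ink[0]) if ink else 0
    labels = [[0] * width for _ in range(height)]
    for k, (x0, y0, x1, y1) in enumerate(boxes):
        # Clip each box to the image so sloppy OCR boxes cannot index out of range.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                # Only ink pixels are labeled; where boxes overlap, the first box wins.
                if ink[y][x] and labels[y][x] == 0:
                    labels[y][x] = k + 1
    return labels


def encode_rgb(label: int) -> Tuple[int, int, int]:
    """Pack an integer label into an (r, g, b) triple (assumed encoding)."""
    return ((label >> 16) & 0xFF, (label >> 8) & 0xFF, label & 0xFF)


def decode_rgb(rgb: Tuple[int, int, int]) -> int:
    """Inverse of encode_rgb."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b
```

For a two-character line with ink `[[1, 1, 0, 0, 1], [1, 0, 0, 1, 1]]` and boxes `[(0, 0, 2, 2), (3, 0, 5, 2)]`, `boxes_to_cseg` returns `[[1, 1, 0, 0, 2], [1, 0, 0, 2, 2]]`. Labeling only ink pixels (rather than whole boxes) keeps touching or overlapping boxes from claiming each other's background; the real OCRopus tools may handle this differently.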
