Hi there,

There are lots of situations where it would be really useful to be able to extract some of the source files from a .traineddata file. For example, I am working on improving training for Ancient Greek (grc), which is basically the same as modern Greek (ell) but with some extra accents and similar additions. It would be really useful to be able to reuse all of the perfectly valid ell.traineddata data and just add training for the extra characters and symbols, rather than having to essentially redo the majority of the training for modern Greek as well as for Ancient Greek.

As far as I'm aware this should be possible, but I don't know of any tools to do it. Creating a .tr file from the .inttemp file might be some work, but from scanning the way it works it looks feasible, and creating a dawg2wordlist tool looks like it ought to be straightforward enough.

Has anybody else attempted this? Am I going about things the wrong way? If I write code to do this in a sane manner, would it be suitable for inclusion in the Tesseract codebase?

Thanks folks,
Nick
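
P.S. To make the dawg2wordlist idea a bit more concrete, here is a rough Python sketch of one way the core step might look: walking a word graph depth-first and emitting every word it contains. The `Dawg` dictionary layout, the `dawg_words` function and the `TOY_DAWG` data are all made up for illustration. They are not Tesseract's actual dawg file format or API, and a real tool would first have to read the dawg file into some structure like this.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory word graph: node id -> list of outgoing edges,
# each edge being (letter, next node id, ends_word).
# This is an assumption for illustration, NOT Tesseract's on-disk dawg format.
Edge = Tuple[str, int, bool]
Dawg = Dict[int, List[Edge]]


def dawg_words(dawg: Dawg, node: int = 0, prefix: str = "") -> Iterator[str]:
    """Yield every word stored in the graph using a depth-first traversal."""
    for letter, next_node, ends_word in dawg.get(node, []):
        word = prefix + letter
        if ends_word:
            yield word
        # Keep following edges: a complete word may also prefix longer words.
        yield from dawg_words(dawg, next_node, word)


# Toy graph holding "το", "τον" and "την"; node 3 is a shared end node.
TOY_DAWG: Dawg = {
    0: [("τ", 1, False)],
    1: [("ο", 2, True), ("η", 4, False)],
    2: [("ν", 3, True)],
    4: [("ν", 3, True)],
}

if __name__ == "__main__":
    for w in dawg_words(TOY_DAWG):
        print(w)
```

The traversal itself is the easy part; the real work would be decoding the stored dawg and mapping its character ids back to the right characters.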

