On Tue, Apr 17, 2012 at 06:13:12PM -0700, David Eger wrote:
> On Apr 17, 7:26 am, Nick White <[email protected]> wrote:
> > On Mon, Apr 16, 2012 at 06:38:01PM +0200, zdenko podobny wrote:
> > > I think in 3.02 will provide solution this cases: you can use more than one
> > > language for OCR. e.g. you can run something like this:
> >
> > > tesseract image output -l grc+ell
> >
> > Ah, that's a very good idea, and will indeed be useful. However for
> > my usecase (a script which is mostly the same, but with additions,
> > and an older version of the language), it would be useful to only
> > use one set of dictionary files (rather than presumably the union of
> > grc & ell, in the above example).
>
> The main difficult thing for you will be any characters that are not
> already trained. There's no easy way to "just add a few characters"
> you basically have to do a full retrain.

Really? Why can't I just train the extra characters I need in a new
traineddata file, then combine the two with something like
-l grc+ell? Why wouldn't that achieve pretty much the same thing
(with the dictionary specified separately, to avoid the ell one)?

Thanks a lot for your input.

Nick

-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

