Re: Specifying different dictionary files [was: Getting usable source files from traineddata files]

Nick White Wed, 18 Apr 2012 02:48:18 -0700

On Tue, Apr 17, 2012 at 06:13:12PM -0700, David Eger wrote:
> On Apr 17, 7:26 am, Nick White <[email protected]> wrote:
> > On Mon, Apr 16, 2012 at 06:38:01PM +0200, zdenko podobny wrote:
> > > I think 3.02 will provide a solution for such cases: you can use more
> > > than one language for OCR, e.g. you can run something like this:
> >
> > > tesseract image output -l grc+ell
> >
> > Ah, that's a very good idea, and will indeed be useful. However for
> > my use case (a script which is mostly the same, but with additions,
> > and an older version of the language), it would be useful to only
> > use one set of dictionary files (rather than presumably the union of
> > grc & ell, in the above example).
> 
> The main difficulty for you will be any characters that are not
> already trained.  There's no easy way to "just add a few characters";
> you basically have to do a full retrain.


Really? Why can't I just train the extra characters I need in a new
traineddata file, then combine the two with something like
-l grc+ell? Why wouldn't that achieve pretty much the same thing
(specifying a dictionary separately to avoid the ell one)?
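
To make the question concrete, here is a minimal sketch of the invocation I
have in mind. It only builds the command line and does not run Tesseract.
The name grc_extra is hypothetical: it stands for a traineddata file holding
just the extra characters. Whether recognition would actually work well with
such a split is exactly the open question.

```python
def build_tesseract_command(image, output_base, languages):
    """Build an argv list for a multi-language Tesseract run."""
    if not languages:
        raise ValueError("at least one language is required")
    # Languages are joined with '+', as in the "-l grc+ell" example above.
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


# "grc_extra" is a hypothetical traineddata name, not a shipped language.
command = build_tesseract_command("image", "output", ["grc", "grc_extra"])
print(" ".join(command))
```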

Thanks a lot for your input.

Nick
