Re: Extracting files from .tessdata

zdenko podobny Wed, 28 Apr 2010 06:55:49 -0700

Hello Ramon,

To extend an existing language you need "tif/box pairs". See
http://code.google.com/p/tesseract-ocr/wiki/FAQ, in particular the entry "How
do I add just one character or one font to my favourite language, without
having to retrain from scratch?"


Unfortunately, tif/box pairs are provided only for the eng, deu, fra, ita, nld
and spa languages. So you can either wait until somebody releases tif/box
pairs for your language, or start training from scratch. I chose the second
option, which is why I started testing the training process for
tesseract 3.00.
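
On the extraction part of the question, here is a minimal sketch. It assumes
the combine_tessdata tool from the Tesseract 3 training tools is installed and
on the PATH, and it uses that tool's -u (unpack) option. The helper names
unpack_command and unpack, and the output directory, are hypothetical and only
for illustration. It does not convert the dawg files back to utf8 word lists.

```python
import shutil
import subprocess
from pathlib import Path


def unpack_command(traineddata, out_dir):
    """Build the argv for `combine_tessdata -u <file> <output prefix>`."""
    traineddata = Path(traineddata)
    # The prefix ends with "<lang>." so each component is written next to it,
    # for example "<out_dir>/cat.unicharset" for cat.traineddata.
    prefix = str(Path(out_dir) / (traineddata.stem + "."))
    return ["combine_tessdata", "-u", str(traineddata), prefix]


def unpack(traineddata, out_dir):
    """Run the unpack command; return False if the tool is not installed."""
    if shutil.which("combine_tessdata") is None:
        return False  # assumption: the tool must be on PATH, otherwise do nothing
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(unpack_command(traineddata, out_dir), check=True)
    return True
```

Calling unpack("cat.traineddata", "cat_parts") should leave the individual
components of the Catalan data in the cat_parts directory, if the tool is
available.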

BR,

Zdenko


On Mon, Apr 26, 2010 at 11:06 AM, Ramon <[email protected]> wrote:

> Hi,
> After some tests I realized that the best option for me is to put effort
> into extending the Catalan dictionary that is in the svn repository (v3).
> It would be very useful if you could do one of these:
>
> -> provide the separate files that were combined to create the unified
> cat.traineddata file (the utf8 files used to generate the dawg would also
> be amazing!).
> -> show how to extract these files from cat.traineddata and how to convert
> the dawg back to utf8 (dawg2utf8), if that is possible.
>
> THANKS!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.
>
>

