Hello Ramon,

To extend an existing language you need "tif/box pairs". See http://code.google.com/p/tesseract-ocr/wiki/FAQ, in particular the entry "How do I add just one character or one font to my favourite language, without having to retrain from scratch?"
Unfortunately, tif/box pairs are provided only for the eng, deu, fra, ita, nld and spa languages. So you can either wait until somebody releases tif/box pairs for your language, or start training from scratch. I chose the second option, and this is the reason why I started testing the training process for tesseract 3.00.

BR,
Zdenko

On Mon, Apr 26, 2010 at 11:06 AM, Ramon <[email protected]> wrote:
> Hi,
> After some tests I realized the best for me is to put effort to extend
> the Catalan Diccionari which is in svn repository (v3).
> It will be so useful if you can do one of these:
>
> -> deliver the different files combined to create the cat.traineddata
> unified file. (the utf8 files used to generate the dawg would be also
> amazing!).
> -> show how to extract these files from the cat.traineddata and how to
> dawg2utf8 (if it is possible).
>
> THANKS!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<tesseract-ocr%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en.

